public inbox for cygwin@cygwin.com
* Tiled memory
@ 1997-03-14 16:38 root
  1997-03-14 21:00 ` Jim Balter
  1997-03-16 22:23 ` Chin Chee-Kai
  0 siblings, 2 replies; 8+ messages in thread
From: root @ 1997-03-14 16:38 UTC (permalink / raw)
  To: gnu-win32

Discussions in this group are really boring, and limit themselves 
to some obscure bugs in bash or so. Let's talk about something else.
Something new for a change.

I am adding MMX support to lcc-win32.
As you may know, MMX introduces SIMD parallelism to the x86 
architecture. Besides the obvious benefits of 8-byte memory moves
and other goodies, this parallelism feature of the new instruction 
set will be a challenge for compiler writers.

I will try to introduce the concept of a 'tiled' vector, using a
special datatype. These vectors will be handled in parallel by the
compiler, i.e. if you declare

	_tiled int vector1[1024],vector2[1024],vector3[1024];

you will be able to write something like:

	vector3 = vector1+vector2;

and the compiler will add those vectors two elements at a time, in
parallel. The dimensions must match, of course, and be known at compile time.
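
For readers without MMX hardware at hand, the effect of adding two _tiled int vectors can be sketched in portable C. This is only an emulation under stated assumptions: a 64-bit integer stands in for one MMX register holding two 32-bit lanes, and the names padd32 and tiled_add_i32 are hypothetical, not part of lcc-win32.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two 64-bit words as two independent 32-bit lanes (wraparound).
   Hypothetical helper emulating one packed 32-bit add. */
uint64_t padd32(uint64_t a, uint64_t b)
{
    uint64_t lo = (a + b) & 0xFFFFFFFFu;          /* carry out of lane 0 is dropped */
    uint64_t hi = ((a >> 32) + (b >> 32)) << 32;  /* carry out of lane 1 is shifted away */
    return hi | lo;
}

/* dst[i] = a[i] + b[i], two elements per step; n is assumed to be even. */
void tiled_add_i32(uint32_t *dst, const uint32_t *a,
                   const uint32_t *b, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        uint64_t x = ((uint64_t)a[i + 1] << 32) | a[i];
        uint64_t y = ((uint64_t)b[i + 1] << 32) | b[i];
        uint64_t s = padd32(x, y);
        dst[i]     = (uint32_t)s;
        dst[i + 1] = (uint32_t)(s >> 32);
    }
}
```

The point of the sketch is that a carry out of one lane never leaks into its neighbour, which is what distinguishes a packed add from an ordinary 64-bit add.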

If you declare:

	_tiled short vector1[2048],vector2[2048];

You will add the 16-bit numbers four at a time, in parallel. With byte 
operations the number goes to 8 operations in parallel. You will
be able to obtain a vector of bits by comparing two strings 8 bytes
at a time (using a _tiled char).
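
The "vector of bits" from an 8-byte compare can be illustrated with a small emulation in plain C. The names load8 and cmpeq8_mask are hypothetical, and the loops only model the semantics; real packed-compare hardware would do the eight comparisons in one step.

```c
#include <stdint.h>

/* Pack 8 bytes into a 64-bit word, byte i in bits 8*i .. 8*i+7. */
uint64_t load8(const char *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= (uint64_t)(unsigned char)p[i] << (8 * i);
    return w;
}

/* Compare 8 bytes of a and b; bit i of the result is set when a[i] == b[i]. */
unsigned cmpeq8_mask(const char *a, const char *b)
{
    uint64_t diff = load8(a) ^ load8(b);  /* a lane is zero exactly where bytes match */
    unsigned mask = 0;
    for (int i = 0; i < 8; i++)
        if (((diff >> (8 * i)) & 0xFF) == 0)
            mask |= 1u << i;
    return mask;
}
```

A result of 0xFF means all eight bytes matched; any cleared bit marks the position of a mismatch.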

Another new concept is saturating operations. With the
_saturated keyword, adds, subtracts, etc. will be done using saturation
arithmetic instead of the normal wraparound. For instance

	_saturated unsigned char a = 150,b = 150,c;
	c = a + b;

'c' now contains 255 instead of 300-256=44 as it does now.
These modifiers can be combined, of course.
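
The difference between the two kinds of arithmetic can be checked in standard C today. The sketch below assumes unsigned 8-bit values; add_wrap_u8 and add_sat_u8 are hypothetical helper names, not the proposed _saturated syntax.

```c
#include <stdint.h>

/* Ordinary wraparound: the result is taken modulo 256. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);        /* 150 + 150 = 300, and 300 mod 256 = 44 */
}

/* Saturating add: results above 255 are clamped to 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b; /* computed in a wider type, so no overflow */
    return sum > 255u ? (uint8_t)255u : (uint8_t)sum;
}
```

For a = b = 150, add_sat_u8 returns 255 and add_wrap_u8 returns 44.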

Special variables will let you use the MMX registers directly.
_mm0 to _mm7 denote the MMX registers and are 64 bits wide. These
registers, aliased to the FPU registers, are NOT organized as a stack
and can be addressed individually. The datatype can be described in C as:
typedef union {
	/* x86 is little-endian, so the first member of each struct
	   holds the lowest-numbered bits. */
	struct {
		int low_0_31;
		int high_32_63;
	} int32;
	struct {
		short low_0_15;
		short low_16_31;
		short high_32_47;
		short high_48_63;
	} int16;
	struct {
		char	low_0_7;
		char	low_8_15;
		char	low_16_23;
		char	low_24_31;
		char	high_32_39;
		char	high_40_47;
		char	high_48_55;
		char	high_56_63;
	} int8;
} _mmxData;

Individual bytes/shorts/ints must be individually addressed to be
able to control the pack/unpack operations.
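
Addressing one element inside a 64-bit register value can also be written with shifts and masks, which avoids any dependence on byte order. This is a sketch only; get_lane16 and set_lane16 are hypothetical names, and a uint64_t stands in for the register.

```c
#include <stdint.h>

/* Read 16-bit lane `lane` (0 = bits 0-15, 3 = bits 48-63). */
uint16_t get_lane16(uint64_t reg, unsigned lane)
{
    return (uint16_t)(reg >> (16 * lane));
}

/* Return reg with 16-bit lane `lane` replaced by value. */
uint64_t set_lane16(uint64_t reg, unsigned lane, uint16_t value)
{
    uint64_t mask = (uint64_t)0xFFFF << (16 * lane);
    return (reg & ~mask) | ((uint64_t)value << (16 * lane));
}
```

Both helpers assume lane is in the range 0 to 3; larger values would shift by 64 bits or more, which C leaves undefined.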

To come back to parallelism, I will borrow many concepts from the
once famous but now forgotten programming language APL. I will
introduce the vector operations as an extension of the normal operations,
and many of the APL goodies like the inner product, the outer product,
the reduce (+/ operator) etc. For instance:
	int sum = +/ vector;
This will add the vector in parallel 2/4/8 elements at a time. The
algorithm should be something like:

	_tiled vector[16];

	_mmx0 = 0;
	_mmx0 += vector[0] + vector[8];
	_mmx0 += vector[1] + vector[9];
	.....
	_mmx0 += vector[7] + vector[15];

To maximize the pipeline effect, we can use:

	_mmx0 = _mmx1 = _mmx2 ... = 0;
	_mmx0 += vector[0] + vector[8];
	_mmx1 += vector[1] + vector[9];
	...
	etc. 
The 8 MMX registers are then added together into _mmx0 at the
end of the operation. This allows a theoretical 8-stage
pipeline.
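
The idea of several independent accumulators that are combined at the end can be tried in plain C. The sketch below uses four scalar accumulators rather than eight MMX registers, and reduce_add is a hypothetical name; whether it actually runs faster depends on the compiler and the CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum n ints using four independent accumulators, so that consecutive
   additions do not depend on each other's results. */
int64_t reduce_add(const int32_t *v, size_t n)
{
    int64_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += v[i];
        acc1 += v[i + 1];
        acc2 += v[i + 2];
        acc3 += v[i + 3];
    }
    for (; i < n; i++)              /* leftover elements when n is not a multiple of 4 */
        acc0 += v[i];

    return (acc0 + acc1) + (acc2 + acc3);   /* final combine step */
}
```

Because integer addition is associative, splitting the sum this way gives the same result as a single running total.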

Similar to the reduce operator we have the +\ (scan)
operator.

Suppose we have

	_tiled int vector1[5], vector2[5] = { 1, 2, 3, 4, 5 };
	vector1 = +\vector2;
	gives:
	1       3        6          10          15
	(0+1) (0+1+2) (0+1+2+3) (0+1+2+3+4) (0+1+2+3+4+5)
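
What the +\ operator computes is a running sum, shown here as a serial reference version in plain C. The name scan_add is hypothetical; this is not how the proposed compiler would generate code, only a statement of the expected result.

```c
#include <stddef.h>

/* Inclusive running sum: dst[i] = src[0] + ... + src[i].
   dst may be the same array as src. */
void scan_add(int *dst, const int *src, size_t n)
{
    int running = 0;
    for (size_t i = 0; i < n; i++) {
        running += src[i];
        dst[i] = running;
    }
}
```

Unlike the reduce, each output here depends on the previous one, so splitting the work across lanes is less direct than for +/.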
---------------------------------------------------------------

Well, I will stop here, I am wasting bandwidth, that would be
better used discussing /groff/termcap/vi/bash/ls/less/old.

P.S. I still see mail about 'less'. It still exists somehow, and so does
termcap, even though there have been no terminals around for ages...

What is 'less'?
Its goal is to display a text file, isn't it? 

Imagine this:

Several years ago, Xerox (who else) researchers published the
results of experiments with a graphical control that presented
text to the user as a ROLL. You rolled text slowly into view.
The eye has been trained by millions of years of evolution to
see objects in 3 dimensions, so text that rolled from the back
left of the screen to the center and on to the right gave the
eye cues that eased the recognition of text.

A control that does that would be easy to write using the graphic
3D libraries that are everywhere...

Yes but how about the termcap file for that??? :-)

Have fun guys, and stop bashing bash!

-- 
Jacob Navia	Logiciels/Informatique
41 rue Maurice Ravel			Tel 01 48.23.51.44
93430 Villetaneuse 			Fax 01 48.23.95.39
France
-
For help on using this list, send a message to
"gnu-win32-request@cygnus.com" with one line of text: "help".


* Re: Tiled memory
  1997-03-14 16:38 Tiled memory root
@ 1997-03-14 21:00 ` Jim Balter
  1997-03-16 22:23 ` Chin Chee-Kai
  1 sibling, 0 replies; 8+ messages in thread
From: Jim Balter @ 1997-03-14 21:00 UTC (permalink / raw)
  To: root; +Cc: gnu-win32

root wrote:
> 
> Discussions in this group are really boring, and limit themselves
> to some obscure bugs in bash or so. Let's talk about something else.
> Something new for a change.

Yes, but those discussions, unlike yours, interesting as it is,
are specific to cygwin32.  Unless I've missed something, what you
are talking about is not specific to Windows.  I would think that
a more appropriate forum would be one of the x86 newsgroups,
or a gcc newsgroup, or comp.compilers.  Not that I have any objection
to seeing this here; I'm not in the business of telling people what to
post, other than spam.  But for your sake, I think you would be more
likely to get an informed response in one of those groups, esp.
comp.compilers, where a lot of extremely savvy compiler people hang out.

--
<J Q B>

* Re: Tiled memory
  1997-03-14 16:38 Tiled memory root
  1997-03-14 21:00 ` Jim Balter
@ 1997-03-16 22:23 ` Chin Chee-Kai
  1997-03-17  4:31   ` Fergus Henderson
  1997-03-17 12:14   ` Shankar Unni
  1 sibling, 2 replies; 8+ messages in thread
From: Chin Chee-Kai @ 1997-03-16 22:23 UTC (permalink / raw)
  To: root; +Cc: gnu-win32

Interesting detour here.  Just my 2 cents :

> i.e. if you declare
> 	_tiled int vector1[1024],vector2[1024],vector3[1024];

This sort of special treatment should be included into some
sort of optimizing flags, or parallelization flags to tell
compiler to automatically process, or group instructions to
process, them in parallel.  Using your example and IMHO,
I would have liked 
	int vector1[1024],vector2[1024],vector3[1024];

	vector3 = vector1+vector2;

more than the explicit "_tiled" declarative.  In the case in
which I want parallelization, I turn on the compiler flag,
or else for algorithmic debugging (instead of mixing up with
parallelization bugs), I would turn off the compiler parallelization
flag.  This way would be easier for the user.
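
Since standard C has no array assignment, the untyped version of this request amounts to an ordinary loop that the compiler is left to parallelize. A minimal sketch, assuming a compiler with an auto-vectorizer; add_arrays is a hypothetical name, and the restrict qualifier comes from C99, which postdates this thread.

```c
#include <stddef.h>

/* Plain C, no special vector type. `restrict` promises that the arrays
   do not overlap, which removes one obstacle to processing several
   elements per step. */
void add_arrays(int *restrict dst, const int *restrict a,
                const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

With optimization turned off, the same source runs as a simple scalar loop, which matches the debugging workflow described above.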


On the other hand, the "_saturated" declarative is perhaps
necessary as a type modifier (like "unsigned", "long long" etc).
I'm not familiar with MMX instructions, but if MMX has a fast
(perhaps single) instruction for "cropping" values like that,
then it will be extremely useful for applications in signal
processing, images, etc.  You might have to worry about
other complications like
	_saturated long long sllvar;
	_saturated double    sdvar;

BTW, why is it underscored?  (_saturated)




> Well, I will stop here, I am wasting bandwidth, that would be
> better used discussing /groff/termcap/vi/bash/ls/less/old.

Personally, I think these are just practical issues whose
answers users would like to know so they can fully utilise the
resources Cygnus has created.  I'd rather read answers to these
same questions twice (perhaps in different fonts :)  than read
an endless list of political debates and accusations.



Anyway, thanks for lcc, and keep up the good work!



Chin Chee-Kai (Last, First)
Internet Email-ID:	cheekai@gen.co.jp


* Re: Tiled memory
  1997-03-16 22:23 ` Chin Chee-Kai
@ 1997-03-17  4:31   ` Fergus Henderson
  1997-03-17 12:14   ` Shankar Unni
  1 sibling, 0 replies; 8+ messages in thread
From: Fergus Henderson @ 1997-03-17  4:31 UTC (permalink / raw)
  To: Chin Chee-Kai; +Cc: gnu-win32

[Apologies for continuing this off-topic thread.]

Chin Chee-Kai wrote:
> 
> BTW, why is it underscored?  (_saturated)

To avoid breaking existing programs that use `saturated' to mean something
else.

BTW, if you want ANSI/ISO C conformance, it should be `__saturated'
or `_Saturated' or something like that.  Names starting with an underscore
and a lower-case letter are not reserved for the implementation in all
contexts.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>   |  "I have always known that the pursuit
WWW: < http://www.cs.mu.oz.au/~fjh >   |  of excellence is a lethal habit"
PGP: finger fjh@128.250.37.3         |     -- the last words of T. S. Garp.

* Re: Tiled memory
  1997-03-16 22:23 ` Chin Chee-Kai
  1997-03-17  4:31   ` Fergus Henderson
@ 1997-03-17 12:14   ` Shankar Unni
  1997-03-18  9:38     ` Hans Zuidam
  1 sibling, 1 reply; 8+ messages in thread
From: Shankar Unni @ 1997-03-17 12:14 UTC (permalink / raw)
  To: Chin Chee-Kai; +Cc: root, gnu-win32

root <root@jacob.remcomp.fr> wrote:

> > i.e. if you declare
> >       _tiled int vector1[1024],vector2[1024],vector3[1024];

and Chin Chee-Kai <cheekai@gen.co.jp> replied:

> This sort of special treatment should be included into some
> sort of optimizing flags, or parallelization flags to tell
> compiler to automatically process, or group instructions to
> process, them in parallel.

I sort of agree with you here. But it's always possible to treat this on
two levels. 

One is to say that we consider this "multimedia data" (for lack of a
better term: "vectors"?) as a special data kind, made up of streams of
packed 8, 16 or 32-bit quantities, and have a special kind for them,
with restrictions (i.e. cannot declare single _tiled int's, and so on). 

This makes it easier on the compiler writer to get off the ground, and
it is not terribly hard on the programmer, because it is usually clear
which variables are your media (or otherwise packed streams or vectors),
and where you are passing them around. 

While you (as a programmer) won't get the most general and powerful
optimizations for all of your character data, you'll get a substantial
benefit for a small expenditure of effort.

> On the other hand, the "_saturated" declarative is perhaps
> necessary as a type modifier (like "unsigned", "long long" etc).

Here, on the other hand, I strongly disagree.  

Saturatedness is not a feature of the data per se, but a feature of the
operation you perform on them. I.e. there will be occasions on which you
want to perform saturating adds on vectors, and other times when you
want to perform non-saturating operations.

What you need for this case is some sort of operator or builtin function
to perform your saturating operations for you.  The most obvious way of
doing this is to add builtin functions for this:

  vector3 = __saturated_add(vector1, vector2)

or, if you want to play games, you can invent either inline operators
(+|, -|, etc.?) or some sort of functional notation like

  vector3 = __saturated(vector1 + vector2)

where __saturated(expression) is treated by the compiler as putting a
"__saturated" attribute on all the arithmetic operators contained
within, if applicable (and maybe error out if not applicable).
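
The operation-based view can be sketched in standard C: the same arrays are passed once to a saturating add and once to a wrapping add, with no change to their type. Both function names are hypothetical stand-ins for the proposed builtin, and signed 16-bit elements are an assumption chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __saturated_add on 16-bit signed vectors:
   sums are clamped to the range INT16_MIN .. INT16_MAX. */
void saturated_add_i16(int16_t *dst, const int16_t *a,
                       const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)a[i] + b[i];
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        dst[i] = (int16_t)sum;
    }
}

/* Ordinary wraparound add on the same data. The final conversion to
   int16_t is implementation-defined in C; two's complement is assumed. */
void wrapping_add_i16(int16_t *dst, const int16_t *a,
                      const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
}
```

Here the choice between saturating and wrapping is made at each call site, not in the declaration of the data.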

-- 
Shankar Unni                                  shankar@chromatic.com
Chromatic Research                            (408) 752-9488

* Re: Tiled memory
  1997-03-17 12:14   ` Shankar Unni
@ 1997-03-18  9:38     ` Hans Zuidam
  1997-03-19 10:28       ` Shankar Unni
  1997-03-19 10:28       ` Ron G. Minnich
  0 siblings, 2 replies; 8+ messages in thread
From: Hans Zuidam @ 1997-03-18  9:38 UTC (permalink / raw)
  To: gnu-win32

> Shankar Unni wrote:
> root <root@jacob.remcomp.fr> wrote:
> 
> > > i.e. if you declare
> > >       _tiled int vector1[1024],vector2[1024],vector3[1024];
> 
> and Chin Chee-Kai <cheekai@gen.co.jp> replied:
> 
> > This sort of special treatment should be included into some
> > sort of optimizing flags, or parallelization flags to tell
> > compiler to automatically process, or group instructions to
> > process, them in parallel.
Isn't there some work going on among the super-computer people to add
these kinds of extensions to the C language?  I vaguely remember
reading about addition vector operations to C in an issue of Dr. Dobbs
a long time ago.

Regards,
					Hans

-- 
H. Zuidam                        E-Mail: hans@brandinnovators.com
Brand Innovators B.V.            P-Mail: P.O. Box 1377
de Pinckart 54                   5602 BJ Eindhoven, The Netherlands
5674 CC Nuenen                   Tel. +31 40 2631134, Fax. +31 40 2831138

* Re: Tiled memory
  1997-03-18  9:38     ` Hans Zuidam
  1997-03-19 10:28       ` Shankar Unni
@ 1997-03-19 10:28       ` Ron G. Minnich
  1 sibling, 0 replies; 8+ messages in thread
From: Ron G. Minnich @ 1997-03-19 10:28 UTC (permalink / raw)
  To: Hans Zuidam; +Cc: gnu-win32

Yes, additions like this have been put into C for years, for the 
same reasons. A literature search makes sense.

ron

* Re: Tiled memory
  1997-03-18  9:38     ` Hans Zuidam
@ 1997-03-19 10:28       ` Shankar Unni
  1997-03-19 10:28       ` Ron G. Minnich
  1 sibling, 0 replies; 8+ messages in thread
From: Shankar Unni @ 1997-03-19 10:28 UTC (permalink / raw)
  To: Hans Zuidam; +Cc: gnu-win32

Hans Zuidam wrote:

> Isn't there some work going on among the super-computer people to add
> these kinds of extensions to the C language?  I vaguely remember
> reading about addition vector operations to C in an issue of Dr. Dobbs
> a long time ago.

There are two different initiatives going on.

One is a set of "Numerical Extensions" to C (from NCEG, a subcommittee of
ANSI X3J11). No idea what the state of this proposal is today (there was some
progress as recently as a year or two ago).

Another proposal is a set of extensions called "HPC" (analogous to the
HPF extensions to Fortran - "HP" stands for High Performance). I know
that the Univ of Illinois (Urbana) was involved in this, as were Cray and
some other participants. 

Both have their good and bad points. I'm not a big fan of the pragma
approach to extending the language, which is what HPC is doing - the
goal there is to write C code in a vanilla fashion, but splatter it with
pragmas like "#pragma doacross blah blah", which tells a smart code
generator and optimizer to tile the loops in a particular fashion, but
lets the code compile on "plain" C compilers by just stripping out or
ignoring the pragmas.

The Numerical Extensions group is actually proposing syntax extensions
like array slices, arrays as first class objects, etc., which actually
sound more elegant, but which will require a fair amount of work to
existing compilers to accommodate.

What Chin (<cheekai@gen.co.jp>) is proposing is sort of a middle ground:
extensions to the language in the form of new predefined types or
qualifiers (only), and some tweaks to the language operators to
recognize these types and generate MM instructions for them. The
attractive thing about this is that it's possible to implement with very
little effort in something like, say, GCC or LCC, and is still useful
enough for programming tight, well-controlled multimedia loops with only
a small expenditure of effort on the part of the programmer.

After all, you *do* have to think of the nature of your data anyway -
it's foolish to let the compiler do all of your thinking for you and
expect it to figure out automatically which variables are streaming
multimedia data, which are ordinary vectors, and which are simply
arrays of quantities.

-- 
Shankar Unni                                  shankar@chromatic.com
Chromatic Research                            (408) 752-9488