public inbox for gcc@gcc.gnu.org
* Status of SSE builtins
@ 2002-10-29 12:16 Jan Hubicka
  2002-10-29 13:51 ` Gerald Pfeifer
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Hubicka @ 2002-10-29 12:16 UTC (permalink / raw)
  To: gcc, rth, bernds, aj

Hi,
after three weeks of bug fixing, I think the SSE builtins are in pretty
good working shape: all of them result in valid instructions, though some
produce suboptimal code (like the *_set patterns).

There are still some critical bugs left, though:
  1) Outgoing argument alignment
     I believe my last patch is mostly valid, but it has not been reviewed.
  2) Stack alignment
     The patches for dynamic stack alignment have been sent to the list
     but were never applied.  Are there any plans for that?

     Without dynamic stack alignment we can either:
       a) Assume that the ABI guarantees the preferred stack alignment, so
          the stack is always aligned at function entry.
          This gives us working SSE support quite cheaply.
       b) Assume that the ABI does not mandate the preferred stack alignment.
          Then all moves must be taught to output unaligned moves
          depending on the alignment recorded in the MEM expression.
          We also need to add a target hook for x86-64, which should use a).

          Most importantly, we need to change the predicates of all SSE
          patterns to refuse misaligned memory operands.  This will
          break reload, which will happily use such operands for an 'm'
          constraint, so we will have to duplicate all the patterns:
          one for the 32-bit and one for the 64-bit version.
  3) ABI compatibility with ICC
     ICC evidently uses register-passing conventions in version 6.0;
     however, I haven't found any official documentation for them.
  4) Misaligned load/store builtins
     Using the misaligned loads/stores results in GCC eventually keeping
     the values in a register and producing an "internal" move for them,
     resulting in a trap.  I am not sure how to model this properly.
  5) Generic SIMD support is quite broken right now, as SSE does not
     allow scalar operations on elements of vector registers, as SPARC and
     other sane instruction sets most probably do.  I am not quite sure how
     to get around this, and I also think the RTL produced is invalid when
     dealing with vectors containing elements smaller than the word size.

     We should also probably rename most of the patterns to match the
     names the generic SIMD support handles.

     I am also quite confused about vec_select, which is quite
     redundant with SUBREG.  The SSE builtins appear to use
     vec_select, while the generic SIMD code appears to use subreg.

Any ideas how to deal with these?  If these won't be fixed in 3.3,
should we document them in "known bugs"?

Honza

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Status of SSE builtins
  2002-10-29 12:16 Status of SSE builtins Jan Hubicka
@ 2002-10-29 13:51 ` Gerald Pfeifer
  2002-10-29 15:11   ` Jan Hubicka
  0 siblings, 1 reply; 24+ messages in thread
From: Gerald Pfeifer @ 2002-10-29 13:51 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, rth, bernds, aj

On Tue, 29 Oct 2002, Jan Hubicka wrote:
> Any ideas how to deal with these?  If these won't be fixed in 3.3,
> should we document them in "known bugs"?

Yes, please.

(Also the release notes at gcc-3.3/changes.html probably need some
updating, right?)

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/

* Re: Status of SSE builtins
  2002-10-29 13:51 ` Gerald Pfeifer
@ 2002-10-29 15:11   ` Jan Hubicka
  2002-10-29 15:40     ` Gerald Pfeifer
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Hubicka @ 2002-10-29 15:11 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: Jan Hubicka, gcc, rth, bernds, aj

> On Tue, 29 Oct 2002, Jan Hubicka wrote:
> > Any ideas how to deal with these?  If these won't be fixed in 3.3,
> > should we document them in "known bugs"?
> 
> Yes, please.

OK, I will try to polish my English and send something once this thread
settles down (or has it already?).
> 
> (Also the release notes at gcc-3.3/changes.html probably need some
> updating, right?)

We mention that the SSE2 builtins have been added.  I guess we don't need to
add that they have been added and then fixed ;)

Honza

* Re: Status of SSE builtins
  2002-10-29 15:11   ` Jan Hubicka
@ 2002-10-29 15:40     ` Gerald Pfeifer
  2002-10-29 15:44       ` Jan Hubicka
  0 siblings, 1 reply; 24+ messages in thread
From: Gerald Pfeifer @ 2002-10-29 15:40 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, rth, bernds, aj

On Tue, 29 Oct 2002, Jan Hubicka wrote:
>> (Also the release notes at gcc-3.3/changes.html probably need some
>> updating, right?)
> We mention that the SSE2 builtins have been added.  I guess we don't need to
> add that they have been added and then fixed ;)

Well, consider that users have tried it and it did not work for them.
Adding a note that "Several problems concerning SSE2 builtins have been
fixed" might encourage them to try again.

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/

* Re: Status of SSE builtins
  2002-10-29 15:40     ` Gerald Pfeifer
@ 2002-10-29 15:44       ` Jan Hubicka
  2002-10-29 16:05         ` Gerald Pfeifer
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Hubicka @ 2002-10-29 15:44 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: Jan Hubicka, gcc, rth, bernds, aj

> On Tue, 29 Oct 2002, Jan Hubicka wrote:
> >> (Also the release notes at gcc-3.3/changes.html probably need some
> >> updating, right?)
> > We mention that the SSE2 builtins have been added.  I guess we don't need to
> > add that they have been added and then fixed ;)
> 
> Well, consider that users have tried it and it did not work for them.
> Adding a note that "Several problems concerning SSE2 builtins have been
> fixed" might encourage them to try again.
We haven't released a compiler with SSE2 support yet.  I guess if someone
tried them and they didn't work with the CVS compiler, he will expect them
to be fixed in the release.
I will add a note about the SSE1 builtins, as I've fixed a number of bugs there.
Should this go into the 3.2.1 notes, the 3.3 notes, or both?

Honza

* Re: Status of SSE builtins
  2002-10-29 15:44       ` Jan Hubicka
@ 2002-10-29 16:05         ` Gerald Pfeifer
  2002-10-29 18:22           ` Jan Hubicka
  0 siblings, 1 reply; 24+ messages in thread
From: Gerald Pfeifer @ 2002-10-29 16:05 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, rth, bernds, aj

On Tue, 29 Oct 2002, Jan Hubicka wrote:
> We haven't released a compiler with SSE2 support yet.  I guess if someone
> tried them and they didn't work with the CVS compiler, he will expect them
> to be fixed in the release.

You mean, SSE2 intrinsics, not general support for SSE2, which already
was in GCC 3.1, right? Or did I misread gcc-3.1/changes.html?

  The compiler now supports MMX, 3DNow!, SSE, and SSE2
  instructions. Options -mmmx, -m3dnow, -msse, and -msse2 will
  enable the respective instruction sets. Intel C++ compatible
  MMX/3DNow!/SSE intrinsics are implemented. SSE2 intrinsics
  will be added in next major release.

> I will add a note about the SSE1 builtins, as I've fixed a number of bugs there.
> Should this go into the 3.2.1 notes, the 3.3 notes, or both?

As far as I remember your patches, many of which were applied to both the 3.2
branch and mainline (IIRC), I'd say both.

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/

* Re: Status of SSE builtins
  2002-10-29 16:05         ` Gerald Pfeifer
@ 2002-10-29 18:22           ` Jan Hubicka
  2002-10-30  9:18             ` Gerald Pfeifer
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Hubicka @ 2002-10-29 18:22 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: Jan Hubicka, gcc, rth, bernds, aj

> On Tue, 29 Oct 2002, Jan Hubicka wrote:
> > We haven't released a compiler with SSE2 support yet.  I guess if someone
> > tried them and they didn't work with the CVS compiler, he will expect them
> > to be fixed in the release.
> 
> You mean, SSE2 intrinsics, not general support for SSE2, which already
> was in GCC 3.1, right? Or did I misread gcc-3.1/changes.html?

Yes.
I just noticed that the SSE2 intrinsics are not mentioned; I've added
them, as well as a few other changes I could come up with.  I will try to
mention more later.
> 
>   The compiler now supports MMX, 3DNow!, SSE, and SSE2
>   instructions. Options -mmmx, -m3dnow, -msse, and -msse2 will
>   enable the respective instruction sets. Intel C++ compatible
>   MMX/3DNow!/SSE intrinsics are implemented. SSE2 intrinsics
>   will be added in next major release.
> 
> > I will add a note about the SSE1 builtins, as I've fixed a number of bugs there.
> > Should this go into the 3.2.1 notes, the 3.3 notes, or both?
> 
> As far as I remember your patches, many of which were applied to both the 3.2
> branch and mainline (IIRC), I'd say both.
I am attaching the 3.2 changes.  If you think it is a good idea, I
will apply the same changes to 3.3 too.
I am not sure whether people see 3.3 as the successor of 3.2.1 or whether
they see them as independent (as they really are).

Index: gcc-3.2/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-3.2/changes.html,v
retrieving revision 1.33
diff -c -3 -p -r1.33 changes.html
*** gcc-3.2/changes.html	16 Oct 2002 19:26:07 -0000	1.33
--- gcc-3.2/changes.html	29 Oct 2002 20:35:07 -0000
*************** a list of bugs fixed in this release.</p
*** 86,96 ****
--- 86,107 ----
  
  <h2>New Targets and Target Specific Improvements</h2>
  
+ <h3>IA-32</h3>
+   <ul>
+    <li>Fixed number of bugs in SSE and MMX intrinsics.</li>
+    <li>Fixed common compiler crashes with SSE instruction set enabled
+        (implied by <code>-march=pentium3</code>, <code>pentium4</code>,
+         <code>athlon-xp</code>)</li>
+    <li>__m128 and __m128i is not 128bit aligned when used in structures.
+   </ul>
  <h3>x86-64</h3>
   
    <ul>
     <li>A bug whereby the compiler could generate bad code for
         <code>bzero</code> has been fixed.</li>
+    <li>ABI fixes (implying ABI incompatibilities with previous version in some
+    side cases)</li>
+    <li>Fixed prefetch code generation</li>
    </ul>
  
  </body>
Index: gcc-3.3/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-3.3/changes.html,v
retrieving revision 1.11
diff -c -3 -p -r1.11 changes.html
*** gcc-3.3/changes.html	14 Oct 2002 17:12:09 -0000	1.11
--- gcc-3.3/changes.html	29 Oct 2002 20:35:07 -0000
***************
*** 185,191 ****
      <li>The HP-PA port now defaults to scheduling for the PA8000 series
          of processors.  Scheduling support for the PA7300 processor has
  	been added.</li>
!     <li>The SPARC, HP-PA, SH4, and x86 ports have been converted to
          use the DFA processor pipeline description.</li>
      <li>The following NetBSD configurations for the SuperH processor family
  	have been added:
--- 185,191 ----
      <li>The HP-PA port now defaults to scheduling for the PA8000 series
          of processors.  Scheduling support for the PA7300 processor has
  	been added.</li>
!     <li>The SPARC, HP-PA, SH4, and partly x86 ports have been converted to
          use the DFA processor pipeline description.</li>
      <li>The following NetBSD configurations for the SuperH processor family
  	have been added:
***************
*** 201,206 ****
--- 201,213 ----
  	  <li>SH5, SHmedia, little-endian, 64-bit default,
  	      <code>sh64le-*-netbsd*</code></li>
  	</ul></li>
+     <li>The following changes have been made to the IA-32/x86-64 port:
+     	<ul>
+ 	  <li>SSE2 and 3dNOW! intrinsics are now supported</li>
+ 	  <li>Support for thread local storeage has been added to both IA-32
+ 	      and x86-64 ports.
+ 	  <li>The x86-64 port has been significantly improved</li>
+ 	</ul>
      <li>The following changes have been made to the MIPS port:
  	<ul>
  	  <li>All configurations now accept the <code>-mabi</code>

* Re: Status of SSE builtins
  2002-10-29 18:22           ` Jan Hubicka
@ 2002-10-30  9:18             ` Gerald Pfeifer
  2002-10-30 10:30               ` Jan Hubicka
  0 siblings, 1 reply; 24+ messages in thread
From: Gerald Pfeifer @ 2002-10-30  9:18 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, rth, bernds, aj

On Tue, 29 Oct 2002, Jan Hubicka wrote:
> I am attaching the 3.2 changes.  If you think it is a good idea, I
> will apply the same changes to 3.3 too.

Please.

> I am not sure whether people see 3.3 as the successor of 3.2.1 or whether
> they see them as independent (as they really are).

Usually, GCC 3.x.1 should not differ too much from 3.x, so if we made
a substantial change in both 3.x.1 and mainline, I believe we should
announce that in both places, because someone might have tried/looked
at 3.x and concluded that it was not useful for him.

> Index: gcc-3.2/changes.html
> ===================================================================
> +    <li>Fixed number of bugs in SSE and MMX intrinsics.</li>

...a number...

> +    <li>ABI fixes (implying ABI incompatibilities with previous version in some
> +    side cases)</li>

I believe this should be "corner case".

> Index: gcc-3.3/changes.html
> ===================================================================
> !     <li>The SPARC, HP-PA, SH4, and partly x86 ports have been converted to
>           use the DFA processor pipeline description.</li>

What does "partly" mean here? Can we give the exact list of ports
affected?

> + 	  <li>SSE2 and 3dNOW! intrinsics are now supported</li>

"supported." (with a full stop; I believe there is also a similar case
in gcc-3.2/changes.html).

> + 	  <li>Support for thread local storeage has been added to both IA-32

storage

> + 	      and x86-64 ports.

to the IA-32 and x86-64 ports

> + 	  <li>The x86-64 port has been significantly improved</li>

Full stop. (How has it been improved? In terms of performance of the
generated code?)

Both patches are fine with those minor changes; thanks!

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/



* Re: Status of SSE builtins
  2002-10-30  9:18             ` Gerald Pfeifer
@ 2002-10-30 10:30               ` Jan Hubicka
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Hubicka @ 2002-10-30 10:30 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: Jan Hubicka, gcc, rth, bernds, aj

> On Tue, 29 Oct 2002, Jan Hubicka wrote:
> > I am attaching the 3.2 changes.  If you think it is a good idea, I
> > will apply the same changes to 3.3 too.
> 
> Please.
> 
> > I am not sure whether people see 3.3 as the successor of 3.2.1 or whether
> > they see them as independent (as they really are).
> 
> Usually, GCC 3.x.1 should not differ too much from 3.x, so if we made
> a substantial change in both 3.x.1 and mainline, I believe we should
> announce that in both places, because someone might have tried/looked
> at 3.x and concluded that it was not useful for him.
> 
> > Index: gcc-3.2/changes.html
> > ===================================================================
> > +    <li>Fixed number of bugs in SSE and MMX intrinsics.</li>
> 
> ...a number...
> 
> > +    <li>ABI fixes (implying ABI incompatibilities with previous version in some
> > +    side cases)</li>
> 
> I believe this should be "corner case".
> 
> > Index: gcc-3.3/changes.html
> > ===================================================================
> > !     <li>The SPARC, HP-PA, SH4, and partly x86 ports have been converted to
> >           use the DFA processor pipeline description.</li>
> 
> What does "partly" mean here? Can we give the exact list of ports
> affected?
We converted the Athlon and Pentium descriptions, but not K6 and Pentium 4.
I didn't know how to fit this into the sentence.
> 
> > + 	  <li>SSE2 and 3dNOW! intrinsics are now supported</li>
> 
> "supported." (with a full stop; I believe there is also a similar case
> in gcc-3.2/changes.html).
> 
> > + 	  <li>Support for thread local storeage has been added to both IA-32
> 
> storage
Oops, missed ispell :(
> 
> > + 	      and x86-64 ports.
> 
> to the IA-32 and x86-64 ports
> 
> > + 	  <li>The x86-64 port has been significantly improved</li>
> 
> Full stop. (How has it been improved? In terms of performance of the
> generated code?)
> 
> Both patches are fine with those minor changes; thanks!

* Re: Status of SSE builtins
  2002-10-31 10:55       ` Richard Henderson
@ 2002-10-31 12:24         ` Aldy Hernandez
  0 siblings, 0 replies; 24+ messages in thread
From: Aldy Hernandez @ 2002-10-31 12:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jan Hubicka, Joern Rennecke, gcc, bernds, aj

>>>>> "Richard" == Richard Henderson <rth@redhat.com> writes:

Oh yeah, I'll probably tackle this next week.  I'm just a bit busy
with my try/finally and CLASS_CANNOT_CHANGE_MODE patches which are
both 98% finished ;-).

 > On Thu, Oct 31, 2002 at 05:54:39PM +0100, Jan Hubicka wrote:
 >> What names would you propose?
 >> Perhaps extract_vec_fieldM/set_vec_fieldM?

 > No idea.  I was hoping that Aldy would invent something.  ;-)

I was thinking more along the lines of aldyvec_fieldM :).

I think a combination of the vec_select and vec_merge patterns, which
we already have (but don't emit), could be used for most
architectures.  I'll look into it.
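One way to read Aldy's suggestion, as a small C model of the RTL semantics rather than of any md pattern: (vec_merge a b mask) takes element i from a when bit i of the constant mask is set and from b otherwise, and (vec_select v (parallel [k])) extracts element k. Setting a single element is then a merge of a broadcast value under the mask 1 << k. The function names and the fixed four-element int vectors are assumptions made for illustration.

```c
#include <assert.h>

#define NELTS 4

/* (vec_merge a b (const_int mask)): element i comes from A when bit i
   of MASK is set, and from B otherwise. */
static void vec_merge4(int out[NELTS], const int a[NELTS],
                       const int b[NELTS], unsigned mask)
{
    for (int i = 0; i < NELTS; i++)
        out[i] = ((mask >> i) & 1u) ? a[i] : b[i];
}

/* (vec_select v (parallel [(const_int k)])): extract element K. */
static int vec_select1(const int v[NELTS], int k)
{
    assert(k >= 0 && k < NELTS);
    return v[k];
}

/* Set element K of V to X: merge a broadcast of X into V under the
   single-bit mask 1 << K, i.e. (vec_merge (vec_duplicate x) v 1<<k). */
static void vec_set4(int out[NELTS], const int v[NELTS], int x, int k)
{
    int dup[NELTS];
    assert(k >= 0 && k < NELTS);
    for (int i = 0; i < NELTS; i++)
        dup[i] = x;
    vec_merge4(out, dup, v, 1u << k);
}
```

Because the merge names every element explicitly, the untouched elements are preserved by construction, which is what the subreg form cannot express.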

AldyVec

* Re: Status of SSE builtins
  2002-10-31 11:00     ` Joern Rennecke
  2002-10-31 11:00       ` Richard Henderson
@ 2002-10-31 11:03       ` Jan Hubicka
  1 sibling, 0 replies; 24+ messages in thread
From: Jan Hubicka @ 2002-10-31 11:03 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Richard Henderson, Jan Hubicka, gcc, bernds, aj

> We could also have a target macro that controls the word size (i.e.
> hard register size) being assumed for an access of a subpart in one
> mode of an entity in another mode.  For SSE2 targets, you would give the
> size of the SSE2 registers.
> This value can then be used instead of UNITS_PER_WORD in the generic SIMD
> code - and elsewhere if appropriate - to control whether to use subregs or
> extract_bit_field / store_bit_field.
> 
> This would make the target code a lot simpler than having to write expanders
> for every extraction.  Of course, that is assuming that extract_bit_field
> will work OK.

I don't think this is a good idea.  We will end up with something that
does a lot of shifting on the SSE register, finally subregs it into an
integer register, does the arithmetic, and repeats the shifting; that
will be difficult to simplify into the code we should produce.

Honza

* Re: Status of SSE builtins
  2002-10-31 11:00     ` Joern Rennecke
@ 2002-10-31 11:00       ` Richard Henderson
  2002-10-31 11:03       ` Jan Hubicka
  1 sibling, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2002-10-31 11:00 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Jan Hubicka, gcc, bernds, aj

On Thu, Oct 31, 2002 at 05:10:34PM +0000, Joern Rennecke wrote:
> This would make the target code a lot simpler than having to write expanders
> for every extraction.  Of course, that is assuming that extract_bit_field
> will work OK.

I think we'll wind up writing them anyway, since SSE, Altivec,
and BookE all provide a way to swizzle the elements of a vector.


r~

* Re: Status of SSE builtins
  2002-10-31 10:53   ` Richard Henderson
  2002-10-31 10:54     ` Jan Hubicka
@ 2002-10-31 11:00     ` Joern Rennecke
  2002-10-31 11:00       ` Richard Henderson
  2002-10-31 11:03       ` Jan Hubicka
  1 sibling, 2 replies; 24+ messages in thread
From: Joern Rennecke @ 2002-10-31 11:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jan Hubicka, gcc, bernds, aj

Richard Henderson wrote:
> 
> On Wed, Oct 30, 2002 at 10:41:40PM +0100, Jan Hubicka wrote:
> > We generate instructions dealing with elements of the vector using
> > subregs; e.g. (subreg:HI (reg:V4HI) 2) is expected to access and modify
> > only the second field of the vector.  However, subregs generally
> > clobber the whole word in GCC and are not allowed in such general forms.
> 
> Yes, I was talking with Aldy about this recently.  He has the
> same problem for Motorola BookE vectors.
> 
> I think the proper solution is to have named patterns in the md file
> that the rtl expander will use to satisfy these insertions and extractions.
> If the named patterns do not exist, or FAIL, then we fall back to a
> combination of subreg and insert/extract bitfield.

We could also have a target macro that controls the word size (i.e.
hard register size) being assumed for an access of a subpart in one
mode of an entity in another mode.  For SSE2 targets, you would give the
size of the SSE2 registers.
This value can then be used instead of UNITS_PER_WORD in the generic SIMD
code - and elsewhere if appropriate - to control whether to use subregs or
extract_bit_field / store_bit_field.

This would make the target code a lot simpler than having to write expanders
for every extraction.  Of course, that is assuming that extract_bit_field
will work OK.
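A sketch of the decision Joern describes, modelled on the store-into-subreg test used by the generic SIMD expander (the UNITS_PER_WORD check quoted elsewhere in this thread). The word_size parameter stands for either UNITS_PER_WORD or the proposed per-target register size; can_store_via_subreg is a hypothetical name, not a GCC function. With 2-byte elements and a 4-byte word, element 2 is stored through a subreg; with a 16-byte SSE register size only element 0 is, and the rest go through store_bit_field.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether element I (SUBSIZE bytes wide) of a vector held in a
   register may be stored through a subreg.  WORD_SIZE plays the role of
   UNITS_PER_WORD, or of the proposed per-target hard register size.
   Returns true for a subreg store, false for the store_bit_field path. */
static bool can_store_via_subreg(bool bytes_big_endian, int subsize,
                                 int i, int word_size)
{
    assert(subsize > 0 && word_size > 0 && i >= 0);
    if (bytes_big_endian)
        return subsize >= word_size;
    /* Little-endian: only an element starting at a word boundary. */
    return (i * subsize) % word_size == 0;
}
```

A subreg store of element 0 may clobber the rest of the word, which is acceptable only because the remaining elements are written afterwards.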
	
-- 
--------------------------
SuperH (UK) Ltd.
2410 Aztec West / Almondsbury / BRISTOL / BS32 4QX
T:+44 1454 465658

* Re: Status of SSE builtins
  2002-10-31 10:54     ` Jan Hubicka
@ 2002-10-31 10:55       ` Richard Henderson
  2002-10-31 12:24         ` Aldy Hernandez
  0 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2002-10-31 10:55 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Joern Rennecke, gcc, bernds, aj

On Thu, Oct 31, 2002 at 05:54:39PM +0100, Jan Hubicka wrote:
> What names would you propose?
> Perhaps extract_vec_fieldM/set_vec_fieldM?

No idea.  I was hoping that Aldy would invent something.  ;-)


r~

* Re: Status of SSE builtins
  2002-10-31 10:53   ` Richard Henderson
@ 2002-10-31 10:54     ` Jan Hubicka
  2002-10-31 10:55       ` Richard Henderson
  2002-10-31 11:00     ` Joern Rennecke
  1 sibling, 1 reply; 24+ messages in thread
From: Jan Hubicka @ 2002-10-31 10:54 UTC (permalink / raw)
  To: Richard Henderson, Jan Hubicka, Joern Rennecke, gcc, bernds, aj

> On Wed, Oct 30, 2002 at 10:41:40PM +0100, Jan Hubicka wrote:
> > We generate instructions dealing with elements of the vector using
> > subregs; e.g. (subreg:HI (reg:V4HI) 2) is expected to access and modify
> > only the second field of the vector.  However, subregs generally
> > clobber the whole word in GCC and are not allowed in such general forms.
> 
> Yes, I was talking with Aldy about this recently.  He has the
> same problem for Motorola BookE vectors.
> 
> I think the proper solution is to have named patterns in the md file
> that the rtl expander will use to satisfy these insertions and extractions.
> If the named patterns do not exist, or FAIL, then we fall back to a
> combination of subreg and insert/extract bitfield.

This sounds good to me, even though the SSE code would be lousy
(setting a given QImode element of a V16QI seems to be difficult).  I won't
be able to do this before the weekend, but I can take a look.
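For 16-bit elements SSE2 already offers a single in-register insert, pinsrw (exposed as _mm_insert_epi16); SSE2 has no byte-sized counterpart, which is one reason setting a QImode element of a V16QI is awkward. A minimal sketch, assuming an SSE2-enabled x86 target; set_v8hi_elt3 and v8hi_elt are hypothetical helpers. The lane number of pinsrw is an immediate, so an expander for a variable index would need a different strategy.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Set element 3 of a V8HI vector without leaving the SSE register.
   pinsrw takes the lane number as an immediate, hence the fixed index. */
static __m128i set_v8hi_elt3(__m128i v, short x)
{
    return _mm_insert_epi16(v, x, 3);
}

/* Read element I (any index) by storing the vector to memory. */
static short v8hi_elt(__m128i v, int i)
{
    short tmp[8];
    assert(i >= 0 && i < 8);
    _mm_storeu_si128((__m128i *) tmp, v);
    return tmp[i];
}
```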
What names would you propose?
Perhaps extract_vec_fieldM/set_vec_fieldM?

Honza
> 
> 
> r~

* Re: Status of SSE builtins
  2002-10-30 15:07 ` Jan Hubicka
  2002-10-30 20:31   ` Joern Rennecke
@ 2002-10-31 10:53   ` Richard Henderson
  2002-10-31 10:54     ` Jan Hubicka
  2002-10-31 11:00     ` Joern Rennecke
  1 sibling, 2 replies; 24+ messages in thread
From: Richard Henderson @ 2002-10-31 10:53 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Joern Rennecke, gcc, bernds, aj

On Wed, Oct 30, 2002 at 10:41:40PM +0100, Jan Hubicka wrote:
> We generate instructions dealing with elements of the vector using
> subregs; e.g. (subreg:HI (reg:V4HI) 2) is expected to access and modify
> only the second field of the vector.  However, subregs generally
> clobber the whole word in GCC and are not allowed in such general forms.

Yes, I was talking with Aldy about this recently.  He has the
same problem for Motorola BookE vectors.

I think the proper solution is to have named patterns in the md file
that the rtl expander will use to satisfy these insertions and extractions.
If the named patterns do not exist, or FAIL, then we fall back to a
combination of subreg and insert/extract bitfield.
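A toy model of the proposed expansion strategy, assuming a V4HI vector packed little-endian into a 64-bit integer: the expander first offers the insertion to an optional target pattern, which may decline (the analogue of FAIL), and otherwise falls back to a mask-and-shift insert like store_bit_field. All names here (expand_vec_set, set_elt0_only, and so on) are hypothetical and do not correspond to GCC's actual expanders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy V4HI: four 16-bit elements packed little-endian into 64 bits. */
typedef uint64_t v4hi_bits;

/* A target "named pattern": may handle the insertion and return true,
   or decline by returning false (the analogue of FAIL). */
typedef bool (*vec_set_pattern)(v4hi_bits *v, int idx, uint16_t val);

/* Generic fallback: mask-and-shift insertion, in the spirit of
   store_bit_field.  All other elements are preserved. */
static void insert_bitfield(v4hi_bits *v, int idx, uint16_t val)
{
    unsigned shift = 16u * (unsigned) idx;
    uint64_t mask = (uint64_t) 0xffff << shift;
    *v = (*v & ~mask) | ((uint64_t) val << shift);
}

/* Expander: try the target pattern if there is one, else fall back. */
static void expand_vec_set(vec_set_pattern pat, v4hi_bits *v,
                           int idx, uint16_t val)
{
    assert(idx >= 0 && idx < 4);
    if (pat != NULL && pat(v, idx, val))
        return;
    insert_bitfield(v, idx, val);
}

/* Example target pattern that only handles element 0. */
static bool set_elt0_only(v4hi_bits *v, int idx, uint16_t val)
{
    if (idx != 0)
        return false;                  /* FAIL: let the caller fall back */
    *v = (*v & ~(uint64_t) 0xffff) | val;
    return true;
}
```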


r~

* Re: Status of SSE builtins
  2002-10-31  8:52           ` Joern Rennecke
@ 2002-10-31 10:18             ` Jan Hubicka
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Hubicka @ 2002-10-31 10:18 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Jan Hubicka, gcc, rth, bernds, aj

> Jan Hubicka wrote:
> > 
> > > Jan Hubicka wrote:
> > > > Not yet; however, I think this is just part of the problem, as reload
> > > > will spill the register to memory, read it back, and clobber the upper
> > > > part.  I will check.
> > >
> > > You'll have to define reload patterns that copy the SSE register into the
> > > GP register first, then load the value into the desired subreg of the GPR,
> > Moving SSE into GP registers is kind of overkill.  You need 4 registers on a
> > 5-GPR machine ;(.
> 
> We are interested in a working compiler first.  And it's a 5-GPR
> machine only when compiling -fpic -fno-omit-frame-pointer.
> Of course, you can try to get this stuff through secondary memory, although
> I'm not sure that this will work unless you load memory by using a reload through
> GPR and a secondary reload through memory.
> Or if you reserve a few bytes just in case, or fiddle with machine_dependent_reorg ;-)
> 
> > Loading two SSE registers into GPRs is already
> > impossible.
> 
> You don't need to.  Just manipulate the SSE register with the word you want
> to change.
Reload definitely reloads the whole register.  I already ran into this problem
with an instruction dealing with (subreg:SI (reg:DF)).  In that case reload
decided to reload the whole subreg into integer registers (killing 4
registers) and to reload the subreg (an SImode register) into yet another
register (killing another 2), crashing on PIC.
I've fixed it to do this exercise only for output reloads, which saves 2
registers and makes it fit, but for SSE this won't work.
I will try to figure out how to deal with this.  At the moment we can't even
store vector modes in integer registers (even when it is possible).

Honza
> 	
> > Honza
> > > then write back to the SSE register.

* Re: Status of SSE builtins
  2002-10-31  1:51         ` Jan Hubicka
@ 2002-10-31  8:52           ` Joern Rennecke
  2002-10-31 10:18             ` Jan Hubicka
  0 siblings, 1 reply; 24+ messages in thread
From: Joern Rennecke @ 2002-10-31  8:52 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, rth, bernds, aj

Jan Hubicka wrote:
> 
> > Jan Hubicka wrote:
> > > Not yet; however, I think this is just part of the problem, as reload
> > > will spill the register to memory, read it back, and clobber the upper
> > > part.  I will check.
> >
> > You'll have to define reload patterns that copy the SSE register into the
> > GP register first, then load the value into the desired subreg of the GPR,
> Moving SSE into GP registers is kind of overkill.  You need 4 registers on a
> 5-GPR machine ;(.

We are interested in a working compiler first.  And it's a 5-GPR
machine only when compiling -fpic -fno-omit-frame-pointer.
Of course, you can try to get this stuff through secondary memory, although
I'm not sure that this will work unless you load memory by using a reload through
GPR and a secondary reload through memory.
Or if you reserve a few bytes just in case, or fiddle with machine_dependent_reorg ;-)
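A sketch of the secondary-memory route at the source level, assuming an SSE2-enabled x86 target: the whole vector is stored to a stack slot, one 32-bit word is patched in memory, and the whole vector is reloaded, so no more than that one word needs to pass through an integer register. set_word_via_memory and get_word are hypothetical helpers; what reload would actually emit depends on the secondary-reload patterns the target defines.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 integer vector intrinsics */

/* Replace 32-bit word K of a vector by going through a stack slot:
   store the whole vector, patch one word in memory, reload the whole
   vector.  The other three words are preserved. */
static __m128i set_word_via_memory(__m128i v, int k, uint32_t w)
{
    uint32_t slot[4];
    assert(k >= 0 && k < 4);
    _mm_storeu_si128((__m128i *) slot, v);
    slot[k] = w;
    return _mm_loadu_si128((const __m128i *) slot);
}

/* Read 32-bit word K of a vector, again via memory. */
static uint32_t get_word(__m128i v, int k)
{
    uint32_t slot[4];
    assert(k >= 0 && k < 4);
    _mm_storeu_si128((__m128i *) slot, v);
    return slot[k];
}
```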

> Loading two SSE registers into GPRs is already
> impossible.

You don't need to.  Just manipulate the SSE register with the word you want
to change.
	
> Honza
> > then write back to the SSE register.

-- 
--------------------------
SuperH (UK) Ltd.
2410 Aztec West / Almondsbury / BRISTOL / BS32 4QX
T:+44 1454 465658

* Re: Status of SSE builtins
  2002-10-30 20:39       ` Joern Rennecke
@ 2002-10-31  1:51         ` Jan Hubicka
  2002-10-31  8:52           ` Joern Rennecke
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Hubicka @ 2002-10-31  1:51 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Jan Hubicka, gcc, rth, bernds, aj

> Jan Hubicka wrote:
> > Not yet; however, I think this is just part of the problem, as reload
> > will spill the register to memory, read it back, and clobber the upper
> > part.  I will check.
> 
> You'll have to define reload patterns that copy the SSE register into the
> GP register first, then load the value into the desired subreg of the GPR,
Moving SSE into GP registers is kind of overkill.  You need 4 registers on a
5-GPR machine ;(.  Loading two SSE registers into GPRs is already
impossible.

Honza
> then write back to the SSE register.

* Re: Status of SSE builtins
  2002-10-30 20:33     ` Jan Hubicka
@ 2002-10-30 20:39       ` Joern Rennecke
  2002-10-31  1:51         ` Jan Hubicka
  0 siblings, 1 reply; 24+ messages in thread
From: Joern Rennecke @ 2002-10-30 20:39 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, rth, bernds, aj

Jan Hubicka wrote:
> Not yet; however, I think this is just part of the problem, as reload
> will spill the register to memory, read it back, and clobber the upper
> part.  I will check.

You'll have to define reload patterns that copy the SSE register into the
GP register first, then load the value into the desired subreg of the GPR,
then write back to the SSE register.
	
-- 
--------------------------
SuperH (UK) Ltd.
2410 Aztec West / Almondsbury / BRISTOL / BS32 4QX
T:+44 1454 465658

* Re: Status of SSE builtins
  2002-10-30 20:31   ` Joern Rennecke
@ 2002-10-30 20:33     ` Jan Hubicka
  2002-10-30 20:39       ` Joern Rennecke
  0 siblings, 1 reply; 24+ messages in thread
From: Jan Hubicka @ 2002-10-30 20:33 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Jan Hubicka, gcc, rth, bernds, aj

> > You can compile any of the simd-*.c testcases from the testsuite.
> 
> If I happen to have a freshly built x86 compiler sitting around.  Often
> I do, but not right now.
> 
> > I've sent few emals about previously.
> > We generate instruction dealing with elements of the vector using
> > subregs, like (subreg:HI (reg:V4HI) 2) is expected to access and modify
> > only the second field of the vector.  However the subregs gneerally
> > clobber whole word in GCC and are not allowed in such general forms.
> 
> Reading a sub-word subreg is well-defined and doesn't clobber the register;
Reading is not the problem; writing is.
For instance, (set (subreg:HI (reg:SI) 0) (const_int 0)) is interpreted
as clearing the whole register on i386.

> however, reading a sub-word subreg that is not the lowpart of a word is
> currently not implemented.
> Clobbering the whole register when you are going to write all the other
> parts subsequently is OK.
> 
> I've fixed expand_vector_unop / expand_vector_binop to use extract_bit_field
> for non-constant input operands, and store_bit_field unless we write the first
> part of a word.
> 
> rtl.texi says:
> 
>  Storing in a non-paradoxical @code{subreg} has undefined results for
>  bits belonging to the same word as the @code{subreg}.  This laxity makes
>  it easier to generate efficient code for such instructions.  To
>  represent an instruction that preserves all the bits outside of those in
>  the @code{subreg}, use @code{strict_low_part} around the @code{subreg}.
> 
> Accordingly, the test to see if we can store directly into a subreg of the
> target uses UNITS_PER_WORD:
> 
>           if (GET_CODE (target) == REG
>               && (BYTES_BIG_ENDIAN
>                   ? subsize < UNITS_PER_WORD
>                   : ((i * subsize) % UNITS_PER_WORD) != 0))
>             t = NULL_RTX;
>           else
>             t = simplify_gen_subreg (submode, target, mode, i * subsize);
> 
> 
> > The SIMD support works for PPC/SPARC because such subregs always simplify
> > to a specific register, but for SSE they do not.
> 
> I suppose the problem is that the SSE registers are larger than UNITS_PER_WORD,
> and you can't address individual words in the register?
Yes.
> Have you tried defining SECONDARY_*RELOAD_CLASS so that you go through general
> purpose registers in this case?
Not yet; however, I think this is just part of the problem, as reload
will spill the register to memory, read it back and clobber the upper
part.  I will check.

Honza

* Re: Status of SSE builtins
  2002-10-30 15:07 ` Jan Hubicka
@ 2002-10-30 20:31   ` Joern Rennecke
  2002-10-30 20:33     ` Jan Hubicka
  2002-10-31 10:53   ` Richard Henderson
  1 sibling, 1 reply; 24+ messages in thread
From: Joern Rennecke @ 2002-10-30 20:31 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, rth, bernds, aj

> You can compile any of the simd-*.c testcases from the testsuite.

If I happen to have a freshly built x86 compiler sitting around.  Often
I do, but not right now.

> I've sent a few emails about this previously.
> We generate instructions dealing with elements of the vector using
> subregs; for example, (subreg:HI (reg:V4HI) 2) is expected to access and
> modify only the second element of the vector.  However, subregs generally
> clobber the whole word in GCC and are not allowed in such general forms.

Reading a sub-word subreg is well-defined and doesn't clobber the register;
however, reading a sub-word subreg that is not the lowpart of a word is
currently not implemented.
Clobbering the whole register when you are going to write all the other
parts subsequently is OK.

I've fixed expand_vector_unop / expand_vector_binop to use extract_bit_field
for non-constant input operands, and store_bit_field unless we write the first
part of a word.
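
As a rough illustration of that lowering, the C sketch below treats a V4HI value as a 64-bit integer and performs the addition one HImode element at a time. extract_bits and store_bits are hypothetical stand-ins that only mimic what extract_bit_field and store_bit_field do to the bits; they are not GCC's interfaces. For simplicity the sketch always uses the bit-preserving store and skips the first-part-of-a-word case mentioned above.

```c
#include <stdint.h>

/* Hypothetical stand-in for extract_bit_field: WIDTH (< 64) bits at POS.  */
static uint64_t
extract_bits (uint64_t v, unsigned pos, unsigned width)
{
  return (v >> pos) & ((UINT64_C (1) << width) - 1);
}

/* Hypothetical stand-in for store_bit_field: all other bits preserved.  */
static uint64_t
store_bits (uint64_t v, unsigned pos, unsigned width, uint64_t x)
{
  uint64_t mask = ((UINT64_C (1) << width) - 1) << pos;
  return (v & ~mask) | ((x << pos) & mask);
}

/* V4HI addition lowered to four scalar HImode additions.  */
static uint64_t
v4hi_add (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  unsigned i;

  for (i = 0; i < 4; i++)
    {
      uint64_t x = extract_bits (a, i * 16, 16);
      uint64_t y = extract_bits (b, i * 16, 16);
      /* The mask in store_bits drops the carry, so elements wrap
         modulo 2^16 instead of spilling into their neighbour.  */
      r = store_bits (r, i * 16, 16, x + y);
    }
  return r;
}
```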

rtl.texi says:

 Storing in a non-paradoxical @code{subreg} has undefined results for
 bits belonging to the same word as the @code{subreg}.  This laxity makes
 it easier to generate efficient code for such instructions.  To
 represent an instruction that preserves all the bits outside of those in
 the @code{subreg}, use @code{strict_low_part} around the @code{subreg}.
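
To make the difference concrete, here is a C model of the two forms for an HImode store into the low part of an SImode register. The function names are hypothetical. Because rtl.texi leaves the remaining bits of the word undefined for a plain subreg store, zeroing them is just one permitted outcome (it matches the whole-register clear Honza describes for i386), not something code may rely on.

```c
#include <stdint.h>

/* Model of (set (subreg:HI (reg:SI r) 0) (const_int X)).
   The other bits of the word are undefined; this model zeroes them.  */
static uint32_t
set_subreg_hi (uint32_t reg, uint16_t x)
{
  (void) reg;          /* old contents are not preserved */
  return x;
}

/* Model of (set (strict_low_part (subreg:HI (reg:SI r) 0)) (const_int X)).
   Bits outside the HImode lowpart must be kept.  */
static uint32_t
set_strict_low_part_hi (uint32_t reg, uint16_t x)
{
  return (reg & UINT32_C (0xffff0000)) | x;
}
```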

Accordingly, the test to see if we can store directly into a subreg of the
target uses UNITS_PER_WORD:

          if (GET_CODE (target) == REG
              && (BYTES_BIG_ENDIAN
                  ? subsize < UNITS_PER_WORD
                  : ((i * subsize) % UNITS_PER_WORD) != 0))
            t = NULL_RTX;
          else
            t = simplify_gen_subreg (submode, target, mode, i * subsize);
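
The condition above can be restated as a standalone predicate, assuming target is a REG and turning GCC's target macros into parameters; direct_subreg_store_ok is a hypothetical name. On a little-endian target with 4-byte words, only HImode elements starting at byte offsets 0, 4, 8, ... pass. For a 16-byte SSE register, a subreg at byte offset 4 still passes this test even though that word cannot be addressed on its own, which is the difficulty discussed next.

```c
#include <stdbool.h>

/* May element I (SUBSIZE bytes wide) be stored through a direct subreg
   of a REG target?  Mirrors the quoted test with the macros as
   parameters: returns true where the original calls simplify_gen_subreg.  */
static bool
direct_subreg_store_ok (unsigned i, unsigned subsize,
                        unsigned units_per_word, bool bytes_big_endian)
{
  if (bytes_big_endian)
    return subsize >= units_per_word;
  /* Little-endian: the element must start on a word boundary.  */
  return (i * subsize) % units_per_word == 0;
}
```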


> The SIMD support works for PPC/SPARC because such subregs always simplify to
> a specific register, but for SSE they do not.

I suppose the problem is that the SSE registers are larger than UNITS_PER_WORD,
and you can't address individual words in the register?
Have you tried defining SECONDARY_*RELOAD_CLASS so that you go through general
purpose registers in this case?
	
-- 
--------------------------
SuperH (UK) Ltd.
2410 Aztec West / Almondsbury / BRISTOL / BS32 4QX
T:+44 1454 465658

* Re: Status of SSE builtins
  2002-10-30 14:55 Joern Rennecke
@ 2002-10-30 15:07 ` Jan Hubicka
  2002-10-30 20:31   ` Joern Rennecke
  2002-10-31 10:53   ` Richard Henderson
  0 siblings, 2 replies; 24+ messages in thread
From: Jan Hubicka @ 2002-10-30 15:07 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Jan Hubicka, gcc, rth, bernds, aj

> >  4) Misaligned load/store builtins
> >     The use of misaligned loads/stores results in GCC eventually
> >     keeping the values in a register and producing an "internal" move
> >     for it, resulting in a trap.  I am not sure how to model this properly.
> 
> Make the misaligned load explicit as a special operation.
> You can use insv / extv in your pattern, or use two memory references
> that are explicitly aligned upwards / downwards and put the pieces
> together with arithmetic.
I guess I need to use 16-byte memory references to be strictly correct,
or perhaps just using a BLKmode memory reference would work.
> 
> > 5) generic SIMD support is quite broken right now as SSE does not
> >     allow scalar operations on elements of vector registers, like Sparc and
> >     other sane instruction sets most probably do.  I am not quite sure how
> >     to get around this, and I also think the RTL produced is invalid when
> >     dealing with vectors containing elements smaller than word size.
> 
> Can you give an example?

You can compile any of the simd-*.c testcases from the testsuite.
I've sent a few emails about this previously.
We generate instructions dealing with elements of the vector using
subregs; for example, (subreg:HI (reg:V4HI) 2) is expected to access and
modify only the second element of the vector.  However, subregs generally
clobber the whole word in GCC and are not allowed in such general forms.

The SIMD support works for PPC/SPARC because such subregs always simplify to
a specific register, but for SSE they do not.

Honza

* Status of SSE builtins
@ 2002-10-30 14:55 Joern Rennecke
  2002-10-30 15:07 ` Jan Hubicka
  0 siblings, 1 reply; 24+ messages in thread
From: Joern Rennecke @ 2002-10-30 14:55 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, rth, bernds, aj

>  4) Misaligned load/store builtins
>     The use of misaligned loads/stores results in GCC eventually
>     keeping the values in a register and producing an "internal" move
>     for it, resulting in a trap.  I am not sure how to model this properly.

Make the misaligned load explicit as a special operation.
You can use insv / extv in your pattern, or use two memory references
that are explicitly aligned upwards / downwards and put the pieces
together with arithmetic.
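
A C sketch of the second approach, under simplified assumptions: memory is modeled as an array of aligned 32-bit words with little-endian byte numbering, and load_misaligned_u32 is a hypothetical helper rather than an actual i386 pattern. The misaligned value is assembled from the aligned-down and aligned-up words with shifts, so no individual access is ever misaligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 32-bit value starting at byte offset OFF in MEM using only
   aligned word reads.  Byte k of a word is bits 8k..8k+7.  If OFF is
   not a multiple of 4, MEM must hold at least OFF / 4 + 2 words.  */
static uint32_t
load_misaligned_u32 (const uint32_t *mem, size_t off)
{
  size_t word = off / 4;                      /* aligned-down word index */
  unsigned shift = (unsigned) (off % 4) * 8;  /* misalignment in bits */
  uint32_t lo = mem[word];                    /* aligned-down reference */

  if (shift == 0)
    return lo;                  /* already aligned: one read is enough */

  uint32_t hi = mem[word + 1];  /* aligned-up reference */
  /* Low bytes come from the top of LO, high bytes from the bottom of HI.  */
  return (lo >> shift) | (hi << (32 - shift));
}
```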

> 5) generic SIMD support is quite broken right now as SSE does not
>     allow scalar operations on elements of vector registers, like Sparc and
>     other sane instruction sets most probably do.  I am not quite sure how
>     to get around this, and I also think the RTL produced is invalid when
>     dealing with vectors containing elements smaller than word size.

Can you give an example?

-- 
--------------------------
SuperH (UK) Ltd.
2410 Aztec West / Almondsbury / BRISTOL / BS32 4QX
T:+44 1454 465658

Thread overview: 24+ messages
2002-10-29 12:16 Status of SSE builtins Jan Hubicka
2002-10-29 13:51 ` Gerald Pfeifer
2002-10-29 15:11   ` Jan Hubicka
2002-10-29 15:40     ` Gerald Pfeifer
2002-10-29 15:44       ` Jan Hubicka
2002-10-29 16:05         ` Gerald Pfeifer
2002-10-29 18:22           ` Jan Hubicka
2002-10-30  9:18             ` Gerald Pfeifer
2002-10-30 10:30               ` Jan Hubicka
2002-10-30 14:55 Joern Rennecke
2002-10-30 15:07 ` Jan Hubicka
2002-10-30 20:31   ` Joern Rennecke
2002-10-30 20:33     ` Jan Hubicka
2002-10-30 20:39       ` Joern Rennecke
2002-10-31  1:51         ` Jan Hubicka
2002-10-31  8:52           ` Joern Rennecke
2002-10-31 10:18             ` Jan Hubicka
2002-10-31 10:53   ` Richard Henderson
2002-10-31 10:54     ` Jan Hubicka
2002-10-31 10:55       ` Richard Henderson
2002-10-31 12:24         ` Aldy Hernandez
2002-10-31 11:00     ` Joern Rennecke
2002-10-31 11:00       ` Richard Henderson
2002-10-31 11:03       ` Jan Hubicka
