RFC: new rtl vec_set_unit/vec_get

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* RFC: new rtl vec_set_unit/vec_get_unit
@ 2003-03-27 23:04 Aldy Hernandez
  2003-03-28  0:03 ` Jan Hubicka
  0 siblings, 1 reply; 4+ messages in thread
From: Aldy Hernandez @ 2003-03-27 23:04 UTC (permalink / raw)
  To: GCC Mailinglist; +Cc: Jan Hubicka, Richard Henderson

I can't seem to find the original thread on the GCC archive, but... 
there was a discussion a while back between Jan, Richard, and me about 
subregs of SIMD types creating bogus code.

Particularly, when we have a hard register, both of the following 
snippets end up referencing r0 because we have no way of distinguishing 
the upper and the lower halves:

	(set (subreg:SI (reg:V2SI r0) 0) (reg:SI xx))
	(set (subreg:SI (reg:V2SI r0) 4) (reg:SI xx))

It was suggested that we add new RTL code to deal with this, but the 
exact semantics had not been proposed.  I'm taking this up again, and 
here is the proposed syntax:

	(vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))

and

	(set (reg:SI r88) (vec_get_unit:SI (reg:V2SI r9) 1))

Then, the expanders:

(define_expand "vec_set_unitv2si"
	(set (match_operand:V2SI 0)
	     (vec_set_unit:V2SI (match_operand:V2SI 1)
				(match_operand 2 immediate)
				(match_operand:SI 3)))

and...

(define_expand "vec_get_unitv2si"
   [(set (match_operand:SI 0)
	    (vec_get_unit:SI (match_operand:V2SI 1)
						 (match_operand:SI 2)))]

I think it's all pretty clear.  If no one objects as to the syntax, 
I'll start hacking away.

Aldy

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: new rtl vec_set_unit/vec_get_unit
  2003-03-27 23:04 RFC: new rtl vec_set_unit/vec_get_unit Aldy Hernandez
@ 2003-03-28  0:03 ` Jan Hubicka
  2003-03-31 20:03   ` Aldy Hernandez
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Hubicka @ 2003-03-28  0:03 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: GCC Mailinglist, Jan Hubicka, Richard Henderson

> I can't seem to find the original thread on the GCC archive, but... 
> there was a discussion a while back between Jan, Richard, and me about 
> subregs of SIMD types creating bogus code.
> 
> Particularly, when we have a hard register, both of the following 
> snippets end up referencing r0 because we have no way of distinguishing 
> the upper and the lower halves:
> 
> 	(set (subreg:SI (reg:V2SI r0) 0) (reg:SI xx))
> 	(set (subreg:SI (reg:V2SI r0) 4) (reg:SI xx))
> 
> It was suggested that we add new RTL code to deal with this, but the 
> exact semantics had not been proposed.  I'm taking this up again, and 
> here is the proposed syntax:
> 
> 	(vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))
> 
> and
> 
> 	(set (reg:SI r88) (vec_get_unit:SI (reg:V2SI r9) 1))
> 
> Then, the expanders:
> 
> (define_expand "vec_set_unitv2si"
> 	(set (match_operand:V2SI 0)
> 	     (vec_set_unit:V2SI (match_operand:V2SI 1)
> 				(match_operand 2 immediate)
> 				(match_operand:SI 3)))
> 
> and...
> 
> (define_expand "vec_get_unitv2si"
>   [(set (match_operand:SI 0)
> 	    (vec_get_unit:SI (match_operand:V2SI 1)
> 						 (match_operand:SI 2)))]
> 
> I think it's all pretty clear.  If no one objects as to the syntax, 
> I'll start hacking away.

This is still something I would like to look into.  The expanders to
get/set pariticular fields of the vector looks like obvious sollution.
However the problem is that the code generated for SSE would be ugly,
especially when taking into account V16QImode where to access paritcular
mode number of rotations on different temporaries needs to be made.

Most of the time we need to get/set all the fields of vector at once (to
simulate vector operation) so perhaps we should have both.
We probably need both mechanizms as in some cases it is deifnitly
desirable to access particular fields of the vector.

Also vec_set_unit/vec_get_unit can be expanded into
vec_select/vec_duplicate operations so there is probably no need to
invent the RTL construct for that, we only need the named patterns.

Honza
> 
> Aldy

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: new rtl vec_set_unit/vec_get_unit
  2003-03-28  0:03 ` Jan Hubicka
@ 2003-03-31 20:03   ` Aldy Hernandez
  2003-04-01 15:46     ` Jan Hubicka
  0 siblings, 1 reply; 4+ messages in thread
From: Aldy Hernandez @ 2003-03-31 20:03 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Mailinglist, Richard Henderson

Quoting the context...

>> It was suggested that we add new RTL code to deal with this, but the
>> exact semantics had not been proposed.  I'm taking this up again, and
>> here is the proposed syntax:
>>
>> 	(vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))
>>
>> and
>>
>> 	(set (reg:SI r88) (vec_get_unit:SI (reg:V2SI r9) 1))
>>
>> Then, the expanders:
>>
>> (define_expand "vec_set_unitv2si"
>> 	(set (match_operand:V2SI 0)
>> 	     (vec_set_unit:V2SI (match_operand:V2SI 1)
>> 				(match_operand 2 immediate)
>> 				(match_operand:SI 3)))
>>
>> and...
>>
>> (define_expand "vec_get_unitv2si"
>>   [(set (match_operand:SI 0)
>> 	    (vec_get_unit:SI (match_operand:V2SI 1)
>> 						 (match_operand:SI 2)))]
>>
>> I think it's all pretty clear.  If no one objects as to the syntax,
>> I'll start hacking away.
>
> This is still something I would like to look into.  The expanders to
> get/set pariticular fields of the vector looks like obvious sollution.

"Look into", as in you're volunteering?  I'd gladly tackle other 
things.  Let me know.

> However the problem is that the code generated for SSE would be ugly,
> especially when taking into account V16QImode where to access 
> paritcular
> mode number of rotations on different temporaries needs to be made.
>
> Most of the time we need to get/set all the fields of vector at once 
> (to
> simulate vector operation) so perhaps we should have both.
> We probably need both mechanizms as in some cases it is deifnitly
> desirable to access particular fields of the vector.

Example?

> Also vec_set_unit/vec_get_unit can be expanded into
> vec_select/vec_duplicate operations so there is probably no need to
> invent the RTL construct for that, we only need the named patterns.

Ok I see you're using vec_select in the x86 backend to get to a 
particular element.  This is definitely better than my approach, but I 
suggest we document it.

How does this look for extraction?:

(set (match_operand:SI 99)
      (vec_select:SI (match_operand:V4SI 3))

However... how do you suggest we do the set operation, as in setting an 
element of vector to a particular value. ??  You can't use vec_select 
as a left hand value without major surgery, pretty much every place we 
handle zero_extract, sign_extract, subreg, and strict_low_part.

For the set, I was suggesting completely different rtl, ala:

> 	(vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))
>
> Then, the expanders:
>
> (define_expand "vec_set_unitv2si"
> 	(set (match_operand:V2SI 0)
> 	     (vec_set_unit:V2SI (match_operand:V2SI 1)
> 				(match_operand 2 immediate)
> 				(match_operand:SI 3)))

Aldy

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: new rtl vec_set_unit/vec_get_unit
  2003-03-31 20:03   ` Aldy Hernandez
@ 2003-04-01 15:46     ` Jan Hubicka
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Hubicka @ 2003-04-01 15:46 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Jan Hubicka, GCC Mailinglist, Richard Henderson

> 
> Example?

Imagine expanding a*b into sequence of

 vec_extract a[1]
 vec_extract b[1]
 mult
 vec_set b[1]
 vec_extract a[2]
 vec_extract b[2]
 mult
 vec_set b[2]
 .....

To extract field 6 of V16QImode vector you need to first rotate around
the V4SImode, then move lowpart of V4SImode into SImode register and
rotate the SImode register to get the 6th field first.
To save the value you need about the same.

However the fastest way to do this in this particular case on Athlon to
move everything into memory and do it in memory (this is not the case
for Pentium4, where the fastest way is to use the rotations but one has
to optimize since as you need only one rotation per field and one
SSE->integer reg move per 4 fields.

This is also not the case of V4SFmode.  To extrace 3rd field of V4SF you
still need two rotations (I believe) and two temporaries, while the
whole operation can be done without them like:

a[0]=a[0]*b[0]  (there is such instruction)
rotate a by 4
rotate b by 4
a[0]=a[0]*b[0]
rotate a by 4
rotate b by 4
a[0]=a[0]*b[0]
rotate a by 4
rotate b by 4
a[0]=a[0]*b[0]

It would be nice to have expanders smart enought to be able to do such
tricks (offloading the memory, doing the computations with rotated
sources and destinations or extracting fields smartly reusing rotated
temporaries)
I am not sure about sane API to get this, but it is definitly important
for the perofmrance.

I was even thinking about something like we do for call expansion - an
target hooks that will have three functions
 - initialize (receiving the vector and flag whether we are going to
   read, write or read/write the operand, it would return target
   specific structure)
 - advance - this one will switch to next field in target specific order
 - extract (this one will get the current field)
 - set (this one will set the field

This way we can write everything we need in the i386, however I am not
sure whether is not too overenginered and it does not fit the GCC design
very well.  Since it is allowed to modify the input operands, it must be
initialized exactly once per each register making it unconfortable for
the midleend.

Perhaps plain flag to let middleend to choose one of the three methods -
extracting field by field,  rotating and modifying the first field,
offloading everything to memory and provide expanders for that.
> 
> >Also vec_set_unit/vec_get_unit can be expanded into
> >vec_select/vec_duplicate operations so there is probably no need to
> >invent the RTL construct for that, we only need the named patterns.
> 
> Ok I see you're using vec_select in the x86 backend to get to a 
> particular element.  This is definitely better than my approach, but I 
> suggest we document it.
> 
> How does this look for extraction?:
> 
> (set (match_operand:SI 99)
>      (vec_select:SI (match_operand:V4SI 3))
> 
> However... how do you suggest we do the set operation, as in setting an 
> element of vector to a particular value. ??  You can't use vec_select 

You need something like

(define_insn "sse_loadss_1"
  [(set (match_operand:V4SF 0 "register_operand" "=x")
	(vec_merge:V4SF
	 (vec_duplicate:V4SF (match_operand:SF 1 "memory_operand" "m"))
	 (match_operand:V4SF 2 "const0_operand" "X")
	 (const_int 1)))]
  "TARGET_SSE"
  "movss\t{%1, %0|%0, %1}"
  [(set_attr "type" "ssemov")
   (set_attr "mode" "SF")])
> as a left hand value without major surgery, pretty much every place we 
> handle zero_extract, sign_extract, subreg, and strict_low_part.

I don't think it is sane to have these and I would not like adding
similar construct.  It makes number of problems everywhere (SSA form,
dependency analysis and so on) - having plain expression to merge both
values into result is much more convenient to operate on.

Honza
> 
> For the set, I was suggesting completely different rtl, ala:
> 
> >	(vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))
> >
> >Then, the expanders:
> >
> >(define_expand "vec_set_unitv2si"
> >	(set (match_operand:V2SI 0)
> >	     (vec_set_unit:V2SI (match_operand:V2SI 1)
> >				(match_operand 2 immediate)
> >				(match_operand:SI 3)))
> 
> Aldy

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2003-04-01 14:36 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-03-27 23:04 RFC: new rtl vec_set_unit/vec_get_unit Aldy Hernandez
2003-03-28  0:03 ` Jan Hubicka
2003-03-31 20:03   ` Aldy Hernandez
2003-04-01 15:46     ` Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).