* RFC: new rtl vec_set_unit/vec_get_unit
From: Aldy Hernandez @ 2003-03-27 23:04 UTC
To: GCC Mailinglist; +Cc: Jan Hubicka, Richard Henderson

I can't seem to find the original thread on the GCC archive, but...
there was a discussion a while back between Jan, Richard, and me about
subregs of SIMD types creating bogus code.

Particularly, when we have a hard register, both of the following
snippets end up referencing r0 because we have no way of distinguishing
the upper and the lower halves:

    (set (subreg:SI (reg:V2SI r0) 0) (reg:SI xx))
    (set (subreg:SI (reg:V2SI r0) 4) (reg:SI xx))

It was suggested that we add new RTL code to deal with this, but the
exact semantics had not been proposed.  I'm taking this up again, and
here is the proposed syntax:

    (vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))

and

    (set (reg:SI r88) (vec_get_unit:SI (reg:V2SI r9) 1))

Then, the expanders:

(define_expand "vec_set_unitv2si"
  (set (match_operand:V2SI 0)
       (vec_set_unit:V2SI (match_operand:V2SI 1)
                          (match_operand 2 immediate)
                          (match_operand:SI 3)))

and...

(define_expand "vec_get_unitv2si"
  [(set (match_operand:SI 0)
        (vec_get_unit:SI (match_operand:V2SI 1)
                         (match_operand:SI 2)))]

I think it's all pretty clear.  If no one objects as to the syntax,
I'll start hacking away.

Aldy
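To make the intended semantics concrete, here is a small C model of the two proposed operations. It is only a sketch: the `v2si` struct and the C function names are illustrative assumptions, not GCC code, and the set operation is modeled as returning a new vector, which matches the expander form above where operand 0 is set from operand 1 with one element replaced. The point is that the element is named by index, not by a byte offset into a register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a V2SI value: two 32-bit elements. */
typedef struct { int32_t lane[2]; } v2si;

/* Model of (vec_get_unit:SI v idx): read one element by index. */
static int32_t vec_get_unit(v2si v, unsigned idx)
{
    assert(idx < 2);
    return v.lane[idx];
}

/* Model of (vec_set_unit:V2SI v idx x): a copy of v with element idx
   replaced by x.  The input vector is left untouched. */
static v2si vec_set_unit(v2si v, unsigned idx, int32_t x)
{
    assert(idx < 2);
    v.lane[idx] = x;
    return v;
}
```

With this model, setting element 1 and then reading element 0 still yields the old low half, which is exactly the distinction the two subreg forms above cannot express on a hard register.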
* Re: RFC: new rtl vec_set_unit/vec_get_unit
From: Jan Hubicka @ 2003-03-28  0:03 UTC
To: Aldy Hernandez; +Cc: GCC Mailinglist, Jan Hubicka, Richard Henderson

> I can't seem to find the original thread on the GCC archive, but...
> there was a discussion a while back between Jan, Richard, and me about
> subregs of SIMD types creating bogus code.
>
> Particularly, when we have a hard register, both of the following
> snippets end up referencing r0 because we have no way of distinguishing
> the upper and the lower halves:
>
>     (set (subreg:SI (reg:V2SI r0) 0) (reg:SI xx))
>     (set (subreg:SI (reg:V2SI r0) 4) (reg:SI xx))
>
> It was suggested that we add new RTL code to deal with this, but the
> exact semantics had not been proposed.  I'm taking this up again, and
> here is the proposed syntax:
>
>     (vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))
>
> and
>
>     (set (reg:SI r88) (vec_get_unit:SI (reg:V2SI r9) 1))
>
> Then, the expanders:
>
> (define_expand "vec_set_unitv2si"
>   (set (match_operand:V2SI 0)
>        (vec_set_unit:V2SI (match_operand:V2SI 1)
>                           (match_operand 2 immediate)
>                           (match_operand:SI 3)))
>
> and...
>
> (define_expand "vec_get_unitv2si"
>   [(set (match_operand:SI 0)
>         (vec_get_unit:SI (match_operand:V2SI 1)
>                          (match_operand:SI 2)))]
>
> I think it's all pretty clear.  If no one objects as to the syntax,
> I'll start hacking away.

This is still something I would like to look into.  The expanders to
get/set particular fields of the vector look like the obvious solution.
However, the problem is that the code generated for SSE would be ugly,
especially when taking into account V16QImode, where accessing a
particular field requires a number of rotations on different
temporaries.

Most of the time we need to get/set all the fields of the vector at once
(to simulate a vector operation), so perhaps we should have both.
We probably need both mechanisms, as in some cases it is definitely
desirable to access particular fields of the vector.

Also, vec_set_unit/vec_get_unit can be expanded into
vec_select/vec_duplicate operations, so there is probably no need to
invent a new RTL construct for that; we only need the named patterns.

Honza
>
> Aldy
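As a rough illustration of the point that extraction needs no new RTL code, here is a C model of `vec_select` and `vec_duplicate` as they are described in the GCC internals documentation: `vec_select` picks source elements according to a list of indices, and `vec_duplicate` broadcasts a scalar. The C signatures and the `get_unit` helper are assumptions made for this sketch; they are not GCC functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of (vec_select src (parallel [sel...])): result element i is
   source element sel[i]. */
static void vec_select(const int32_t *src, size_t src_lanes,
                       const unsigned *sel, size_t n, int32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        assert(sel[i] < src_lanes);
        out[i] = src[sel[i]];
    }
}

/* Model of (vec_duplicate x): every element of the result is x. */
static void vec_duplicate(int32_t x, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = x;
}

/* Hypothetical helper: extracting one element is a vec_select whose
   selection list has a single entry. */
static int32_t get_unit(const int32_t *src, size_t src_lanes, unsigned idx)
{
    int32_t out;
    vec_select(src, src_lanes, &idx, 1, &out);
    return out;
}
```

Under this reading, a "get unit" named pattern would simply expand to a one-element `vec_select`.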
* Re: RFC: new rtl vec_set_unit/vec_get_unit
From: Aldy Hernandez @ 2003-03-31 20:03 UTC
To: Jan Hubicka; +Cc: GCC Mailinglist, Richard Henderson

Quoting the context...

>> It was suggested that we add new RTL code to deal with this, but the
>> exact semantics had not been proposed.  I'm taking this up again, and
>> here is the proposed syntax:
>>
>>     (vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))
>>
>> and
>>
>>     (set (reg:SI r88) (vec_get_unit:SI (reg:V2SI r9) 1))
>>
>> Then, the expanders:
>>
>> (define_expand "vec_set_unitv2si"
>>   (set (match_operand:V2SI 0)
>>        (vec_set_unit:V2SI (match_operand:V2SI 1)
>>                           (match_operand 2 immediate)
>>                           (match_operand:SI 3)))
>>
>> and...
>>
>> (define_expand "vec_get_unitv2si"
>>   [(set (match_operand:SI 0)
>>         (vec_get_unit:SI (match_operand:V2SI 1)
>>                          (match_operand:SI 2)))]
>>
>> I think it's all pretty clear.  If no one objects as to the syntax,
>> I'll start hacking away.
>
> This is still something I would like to look into.  The expanders to
> get/set particular fields of the vector looks like obvious solution.

"Look into", as in you're volunteering?  I'd gladly tackle other things.
Let me know.

> However the problem is that the code generated for SSE would be ugly,
> especially when taking into account V16QImode where to access
> particular mode number of rotations on different temporaries needs to
> be made.
>
> Most of the time we need to get/set all the fields of vector at once
> (to simulate vector operation) so perhaps we should have both.
> We probably need both mechanisms as in some cases it is definitely
> desirable to access particular fields of the vector.

Example?

> Also vec_set_unit/vec_get_unit can be expanded into
> vec_select/vec_duplicate operations so there is probably no need to
> invent the RTL construct for that, we only need the named patterns.

OK, I see you're using vec_select in the x86 backend to get to a
particular element.  This is definitely better than my approach, but I
suggest we document it.

How does this look for extraction?:

(set (match_operand:SI 99)
     (vec_select:SI (match_operand:V4SI 3))

However... how do you suggest we do the set operation, as in setting an
element of a vector to a particular value?  You can't use vec_select
as a left-hand value without major surgery in pretty much every place
we handle zero_extract, sign_extract, subreg, and strict_low_part.

For the set, I was suggesting completely different rtl, a la:

> (vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))
>
> Then, the expanders:
>
> (define_expand "vec_set_unitv2si"
>   (set (match_operand:V2SI 0)
>        (vec_set_unit:V2SI (match_operand:V2SI 1)
>                           (match_operand 2 immediate)
>                           (match_operand:SI 3)))

Aldy
* Re: RFC: new rtl vec_set_unit/vec_get_unit
From: Jan Hubicka @ 2003-04-01 15:46 UTC
To: Aldy Hernandez; +Cc: Jan Hubicka, GCC Mailinglist, Richard Henderson

> Example?

Imagine expanding a*b into a sequence of

    vec_extract a[1]
    vec_extract b[1]
    mult
    vec_set b[1]
    vec_extract a[2]
    vec_extract b[2]
    mult
    vec_set b[2]
    .....

To extract field 6 of a V16QImode vector you first need to rotate the
vector as V4SImode, then move the low part of the V4SImode value into
an SImode register, and then rotate the SImode register so that the 6th
field comes first.  To save the value you need about the same.

However, the fastest way to do this in this particular case on Athlon is
to move everything into memory and do it in memory.  (This is not the
case for Pentium 4, where the fastest way is to use the rotations, but
one has to optimize, since you need only one rotation per field and one
SSE->integer register move per 4 fields.)

This is also not the case for V4SFmode.  To extract the 3rd field of
V4SF you still need two rotations (I believe) and two temporaries, while
the whole operation can be done without them, like:

    a[0]=a[0]*b[0] (there is such an instruction)
    rotate a by 4
    rotate b by 4
    a[0]=a[0]*b[0]
    rotate a by 4
    rotate b by 4
    a[0]=a[0]*b[0]
    rotate a by 4
    rotate b by 4
    a[0]=a[0]*b[0]

It would be nice to have expanders smart enough to be able to do such
tricks (offloading to memory, doing the computations with rotated
sources and destinations, or extracting fields smartly by reusing
rotated temporaries).  I am not sure about a sane API to get this, but
it is definitely important for performance.
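The two expansion strategies compared above can be sketched in plain C. This is a model of the data movement only, not of SSE instructions: `rotate1` stands in for "rotate by 4" (one 4-byte element), and all function names are assumptions made for this sketch. One deliberate difference from the sequence above: the loop rotates after the fourth multiply as well, so that both vectors end up back in their original element order.

```c
#include <assert.h>

#define LANES 4

/* Rotate the vector left by one element (4 bytes for V4SF). */
static void rotate1(float v[LANES])
{
    float first = v[0];
    for (int i = 0; i + 1 < LANES; i++)
        v[i] = v[i + 1];
    v[LANES - 1] = first;
}

/* Strategy 1: address every element by index (extract, mult, set). */
static void mul_by_extract(float a[LANES], const float b[LANES])
{
    for (int i = 0; i < LANES; i++)
        a[i] = a[i] * b[i];
}

/* Strategy 2: only ever operate on element 0 and rotate both operands.
   Note that b is modified along the way and only restored at the end. */
static void mul_by_rotate(float a[LANES], float b[LANES])
{
    for (int i = 0; i < LANES; i++) {
        a[0] = a[0] * b[0];
        rotate1(a);
        rotate1(b);
    }
}

/* Usage example: both strategies give the same element-wise product. */
static void check_strategies_agree(void)
{
    float a1[LANES] = {1.0f, 2.0f, 3.0f, 4.0f};
    float a2[LANES] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[LANES]  = {5.0f, 6.0f, 7.0f, 8.0f};

    mul_by_extract(a1, b);
    mul_by_rotate(a2, b);
    for (int i = 0; i < LANES; i++)
        assert(a1[i] == a2[i]);
}
```

The rotate variant never needs an indexed extract or a temporary per element, which is the property that makes it attractive for V4SF.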

I was even thinking about something like what we do for call expansion:
a set of target hooks with the following functions:

 - initialize: receives the vector and a flag saying whether we are
   going to read, write, or read/write the operand; returns a
   target-specific structure
 - advance: switches to the next field in target-specific order
 - extract: gets the current field
 - set: sets the current field

This way we can write everything we need in the i386 backend; however, I
am not sure whether it is not too overengineered, and it does not fit
the GCC design very well.  Since it is allowed to modify the input
operands, it must be initialized exactly once per register, making it
uncomfortable for the middle end.

Perhaps a plain flag to let the middle end choose one of the three
methods (extracting field by field, rotating and modifying the first
field, or offloading everything to memory), and provide expanders for
that.

> >Also vec_set_unit/vec_get_unit can be expanded into
> >vec_select/vec_duplicate operations so there is probably no need to
> >invent the RTL construct for that, we only need the named patterns.
>
> Ok I see you're using vec_select in the x86 backend to get to a
> particular element.  This is definitely better than my approach, but I
> suggest we document it.
>
> How does this look for extraction?:
>
> (set (match_operand:SI 99)
>      (vec_select:SI (match_operand:V4SI 3))
>
> However... how do you suggest we do the set operation, as in setting an
> element of vector to a particular value. ??
> You can't use vec_select

You need something like

(define_insn "sse_loadss_1"
  [(set (match_operand:V4SF 0 "register_operand" "=x")
        (vec_merge:V4SF
         (vec_duplicate:V4SF (match_operand:SF 1 "memory_operand" "m"))
         (match_operand:V4SF 2 "const0_operand" "X")
         (const_int 1)))]
  "TARGET_SSE"
  "movss\t{%1, %0|%0, %1}"
  [(set_attr "type" "ssemov")
   (set_attr "mode" "SF")])

> as a left hand value without major surgery, pretty much every place we
> handle zero_extract, sign_extract, subreg, and strict_low_part.

I don't think it is sane to have these, and I would not like to add a
similar construct.  It causes a number of problems everywhere (SSA form,
dependency analysis and so on); having a plain expression that merges
both values into the result is much more convenient to operate on.

Honza

> For the set, I was suggesting completely different rtl, ala:
>
> > (vec_set_unit:SI (reg:V2SI r9) 1 (reg:SI r5))
> >
> >Then, the expanders:
> >
> >(define_expand "vec_set_unitv2si"
> >  (set (match_operand:V2SI 0)
> >       (vec_set_unit:V2SI (match_operand:V2SI 1)
> >                          (match_operand 2 immediate)
> >                          (match_operand:SI 3)))
>
> Aldy
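The "plain expression" approach used by `sse_loadss_1` can be modeled in C as follows. This is a sketch of the documented `vec_merge` semantics (a set bit in the mask takes the element from the first operand, a clear bit takes it from the second); the C signatures and the `set_unit` helper are assumptions made for illustration, not GCC functions. Setting element `idx` of `v` to `x` then becomes a merge of a broadcast of `x` into `v` with mask `1 << idx`, with no lvalue construct involved.

```c
#include <assert.h>
#include <stddef.h>

/* Model of (vec_merge a b mask): result element i comes from a when
   bit i of mask is set, and from b otherwise. */
static void vec_merge(const float *a, const float *b, unsigned mask,
                      float *out, size_t n)
{
    assert(n <= 8 * sizeof mask);
    for (size_t i = 0; i < n; i++)
        out[i] = ((mask >> i) & 1u) ? a[i] : b[i];
}

/* Model of (vec_duplicate x): every element of the result is x. */
static void vec_duplicate(float x, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = x;
}

/* Hypothetical helper: the result is v with element idx replaced by x,
   written as a pure expression over v and x. */
static void set_unit(const float *v, size_t n, unsigned idx, float x,
                     float *out)
{
    float dup[32];
    assert(n <= 32 && idx < n);
    vec_duplicate(x, dup, n);
    vec_merge(dup, v, 1u << idx, out, n);
}
```

With a zero vector as the second operand and mask 1, the same merge gives the shape of the `sse_loadss_1` pattern: the scalar lands in element 0 and the remaining elements are zero.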