public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/58497] New: SLP vectorizes identical operations
From: glisse at gcc dot gnu.org @ 2013-09-22 7:04 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

            Bug ID: 58497
           Summary: SLP vectorizes identical operations
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org

typedef float float4 __attribute__((vector_size(16)));
float4 g(int x)
{
  float4 W;
  W[0]=W[1]=W[2]=W[3]=x+1;
  return W;
}

is vectorized by SLP to:

  vect_cst_.4_11 = {x_1(D), x_1(D), x_1(D), x_1(D)};
  vect__2.3_13 = vect_cst_.4_11 + { 1, 1, 1, 1 };
  vect__3.6_14 = (vector(4) floatD.38) vect__2.3_13;

Maybe when a vector is really the same scalar copied into all slots it would be
better not to turn the scalar ops into vector ops? (turning the 4 BIT_FIELD_REF
writes into a constructor is still good though)
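The suggestion in the report can be sketched by hand in C. The following is only an illustration under stated assumptions: `g` is the reporter's function, while `g_splat`, `same_result` and `self_check` are hypothetical names introduced here. `g_splat` does the add and the int-to-float conversion once on scalars and then builds the vector with a constructor, which is roughly the shape the report asks the compiler to prefer.

```c
#include <assert.h>
#include <string.h>

typedef float float4 __attribute__((vector_size(16)));

/* The function from the report: one scalar value stored into every lane. */
float4 g(int x)
{
    float4 W;
    W[0] = W[1] = W[2] = W[3] = x + 1;
    return W;
}

/* Hypothetical hand-written alternative (not compiler output): do the add
   and the conversion once on scalars, then splat with a constructor. */
float4 g_splat(int x)
{
    float s = (float)(x + 1);
    float4 W = { s, s, s, s };
    return W;
}

/* Returns 1 when both variants produce bit-identical vectors. */
int same_result(int x)
{
    float4 a = g(x);
    float4 b = g_splat(x);
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Small self-check over a few inputs (values chosen arbitrarily). */
void self_check(void)
{
    assert(same_result(0));
    assert(same_result(41));
    assert(same_result(-3));
}
```

Both functions compute the same value; the question raised in the bug is only which sequence of operations the compiler should emit for it.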
* [Bug tree-optimization/58497] SLP vectorizes identical operations
From: rguenth at gcc dot gnu.org @ 2013-09-23 8:33 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Keywords|                              |missed-optimization
             Target|                              |x86_64-*-*
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2013-09-23
         Depends on|                              |53947
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Heh ;)  I suppose this started with BIT_FIELD_REF support in SLP, 4.8 didn't
vectorize this at all.  Note that with for example

typedef float float4 __attribute__((vector_size(16)));
float4 g(int x)
{
  float4 W;
  W[0]=W[1]=x+1;
  W[2]=x+2;
  W[3]=x+3;
  return W;
}

vectorizing two same operations may be profitable.  But yes, if all scalars
are the same there is no point to do it.  And the cost model should have
disabled it as well (though likely the four "stores" made it profitable
in the end).

I will have a look at some point.

OTOH generated code is

g:
.LFB0:
        .cfi_startproc
        movl    %edi, -12(%rsp)
        movd    -12(%rsp), %xmm1
        pshufd  $0, %xmm1, %xmm0
        paddd   .LC0(%rip), %xmm0
        cvtdq2ps        %xmm0, %xmm0
        ret

vs. -fno-tree-vectorize:

g:
.LFB0:
        .cfi_startproc
        xorps   %xmm1, %xmm1
        addl    $1, %edi
        xorps   %xmm0, %xmm0
        cvtsi2ss        %edi, %xmm1
        movaps  %xmm0, %xmm2
        movss   %xmm1, %xmm2
        shufps  $36, %xmm2, %xmm0
        movaps  %xmm0, %xmm2
        movss   %xmm1, %xmm2
        shufps  $196, %xmm2, %xmm0
        movaps  %xmm0, %xmm2
        unpcklps        %xmm0, %xmm0
        movss   %xmm1, %xmm0
        shufps  $225, %xmm2, %xmm0
        movss   %xmm1, %xmm0
        ret

so clearly a win, but improvable to sth like

        addl    $1, %edi
        cvtsi2ss        %edi, %xmm1
        pshufd  $0, %xmm1, %xmm0

the above also shows that vector init by BIT_FIELD_REF is not expanded
very well (sth for a generalized vector shuffle recognition in the bswap pass).
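For the mixed example in comment #1, where only two of the four lanes share a value, the vector formulation can be written out by hand. This is a sketch, not what SLP emits: `g_mixed` is the function from the comment under a new name, and `g_mixed_vec`, `int4` and `mixed_matches` are hypothetical names introduced here. The vector variant uses one splat and one vector add, followed by a per-lane conversion to keep the example portable across compiler versions.

```c
#include <assert.h>
#include <string.h>

typedef float float4 __attribute__((vector_size(16)));
typedef int   int4   __attribute__((vector_size(16)));

/* The mixed example from comment #1: lanes 0 and 1 share a value. */
float4 g_mixed(int x)
{
    float4 W;
    W[0] = W[1] = x + 1;
    W[2] = x + 2;
    W[3] = x + 3;
    return W;
}

/* Hypothetical vector-style formulation: splat x, add a constant vector,
   then convert each lane to float. */
float4 g_mixed_vec(int x)
{
    int4 splat   = { x, x, x, x };
    int4 offsets = { 1, 1, 2, 3 };
    int4 sum     = splat + offsets;
    float4 W = { (float)sum[0], (float)sum[1], (float)sum[2], (float)sum[3] };
    return W;
}

/* Returns 1 when both variants produce bit-identical vectors. */
int mixed_matches(int x)
{
    float4 a = g_mixed(x);
    float4 b = g_mixed_vec(x);
    assert(sizeof a == sizeof b);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Here the lanes genuinely differ, so a single vector add replaces three distinct scalar adds, which is why the comment treats this case differently from the all-identical one.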
* [Bug tree-optimization/58497] SLP vectorizes identical operations
From: rguenth at gcc dot gnu.org @ 2013-09-23 9:03 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 30884
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30884&action=edit
prototype patch

A quick check shows generated code will be

g:
.LFB0:
        .cfi_startproc
        xorps   %xmm0, %xmm0
        addl    $1, %edi
        cvtsi2ss        %edi, %xmm0
        shufps  $0, %xmm0, %xmm0
        ret

and the patch shows possible issues with finding an insert location for
the init stmt (otherwise "external" is just outside of the current
basic-block).
* [Bug tree-optimization/58497] SLP vectorizes identical operations
From: rguenth at gcc dot gnu.org @ 2015-10-22 13:37 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Thu Oct 22 13:36:46 2015
New Revision: 229173

URL: https://gcc.gnu.org/viewcvs?rev=229173&root=gcc&view=rev
Log:
2015-10-22  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/58497
        * tree-vect-generic.c (ssa_uniform_vector_p): New helper.
        (expand_vector_operations_1): Use it.  Lower operations on
        all uniform vectors to scalar operations if the HW supports it.

        * gcc.dg/tree-ssa/vector-5.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/vector-5.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-generic.c

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Now we fix this up in veclower, still the bug should be addressed in SLP
directly (also because it affects cost decisions).
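The idea behind the commit above (detect that every operand is a uniform vector, do the operation once on scalars, then splat) can be illustrated outside the compiler. The sketch below is not the code from tree-vect-generic.c: GCC's `ssa_uniform_vector_p` works on GIMPLE at compile time, whereas this stand-in inspects plain arrays at run time. All names here (`LANES`, `uniform_value`, `add_lowered`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Returns 1 and stores the shared element in *out if every lane of v holds
   the same value; returns 0 otherwise.  A run-time stand-in for the
   compile-time uniformity check, for illustration only. */
int uniform_value(const int v[LANES], int *out)
{
    assert(out != NULL);
    for (size_t i = 1; i < LANES; i++)
        if (v[i] != v[0])
            return 0;
    *out = v[0];
    return 1;
}

/* Adds a and b lane-wise into r.  When both inputs are uniform, performs a
   single scalar add and copies the result into every lane.  Returns the
   number of scalar additions performed. */
int add_lowered(const int a[LANES], const int b[LANES], int r[LANES])
{
    int sa, sb;
    if (uniform_value(a, &sa) && uniform_value(b, &sb)) {
        int s = sa + sb;               /* one scalar operation */
        for (size_t i = 0; i < LANES; i++)
            r[i] = s;                  /* splat the scalar result */
        return 1;
    }
    for (size_t i = 0; i < LANES; i++) /* general case: one add per lane */
        r[i] = a[i] + b[i];
    return LANES;
}
```

With inputs `{5,5,5,5}` and `{1,1,1,1}` the function takes the uniform path; with `{5,5,5,5}` and `{1,1,2,3}` it falls back to the per-lane loop. As comment #4 notes, the real fix runs in veclower, after SLP has already made its cost decision.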
* [Bug tree-optimization/58497] SLP vectorizes identical operations
From: pinskia at gcc dot gnu.org @ 2021-08-14 23:27 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

--- Comment #13 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Mine for GCC 13, I have patches which turn:

  W_6 = BIT_INSERT_EXPR <W_5(D), _2, 96 (32 bits)>;
  W_7 = BIT_INSERT_EXPR <W_6, _2, 64 (32 bits)>;
  W_8 = BIT_INSERT_EXPR <W_7, _2, 32 (32 bits)>;
  W_9 = BIT_INSERT_EXPR <W_8, _2, 0 (32 bits)>;

Into:

  W_9 = {_2,_2,_2,_2};

This improvement deals with bitfields but vectors have a similar issue with
Bit_inserts so I deal with it there.
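The equivalence claimed in comment #13 can be checked with a small C model. This is only a sketch under stated assumptions: `bit_insert32` is a hypothetical helper that mimics one BIT_INSERT_EXPR by overwriting 32 bits at a given bit position, assuming the bit position maps to byte offset `bitpos / 8` as on a little-endian target such as x86_64. The dump starts from an uninitialized `W_5(D)`; the model starts from zero to keep the C well defined.

```c
#include <assert.h>
#include <string.h>

typedef float float4 __attribute__((vector_size(16)));

/* Hypothetical model of one BIT_INSERT_EXPR: returns a copy of vec with the
   32 bits starting at bitpos replaced by the bits of val. */
float4 bit_insert32(float4 vec, float val, unsigned bitpos)
{
    assert(bitpos % 32 == 0 && bitpos < 128);
    unsigned char bytes[sizeof vec];
    memcpy(bytes, &vec, sizeof vec);
    memcpy(bytes + bitpos / 8, &val, sizeof val);
    memcpy(&vec, bytes, sizeof vec);
    return vec;
}

/* The chain from the comment: the same scalar inserted at 96, 64, 32, 0. */
float4 chain_of_inserts(float s)
{
    float4 W = { 0.0f, 0.0f, 0.0f, 0.0f };
    W = bit_insert32(W, s, 96);
    W = bit_insert32(W, s, 64);
    W = bit_insert32(W, s, 32);
    W = bit_insert32(W, s, 0);
    return W;
}

/* The replacement form: a single vector constructor. */
float4 constructor(float s)
{
    float4 W = { s, s, s, s };
    return W;
}

/* Returns 1 when the chain and the constructor agree bit for bit. */
int chain_matches_constructor(float s)
{
    float4 a = chain_of_inserts(s);
    float4 b = constructor(s);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Because the four inserts together overwrite every lane, the initial contents of the vector do not affect the result, which is what makes the rewrite to a constructor valid.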