public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/58497] New: SLP vectorizes identical operations
@ 2013-09-22 7:04 glisse at gcc dot gnu.org
2013-09-23 8:33 ` [Bug tree-optimization/58497] " rguenth at gcc dot gnu.org
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: glisse at gcc dot gnu.org @ 2013-09-22 7:04 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497
Bug ID: 58497
Summary: SLP vectorizes identical operations
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
typedef float float4 __attribute__((vector_size(16)));
float4 g(int x)
{
  float4 W;
  W[0]=W[1]=W[2]=W[3]=x+1;
  return W;
}
is vectorized by SLP to:
vect_cst_.4_11 = {x_1(D), x_1(D), x_1(D), x_1(D)};
vect__2.3_13 = vect_cst_.4_11 + { 1, 1, 1, 1 };
vect__3.6_14 = (vector(4) floatD.38) vect__2.3_13;
Maybe when a vector is really the same scalar copied into all slots it would be
better not to turn the scalar ops into vector ops? (turning the 4 BIT_FIELD_REF
writes into a constructor is still good though)
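As a hedged source-level illustration of the suggestion above (not GCC output, and not part of the original report): the scalar work is done once and the result is copied into every lane by a vector constructor. The name g_splat is hypothetical, and the sketch assumes a compiler with GNU vector extensions (GCC or Clang).

```c
typedef float float4 __attribute__((vector_size(16)));

/* Testcase from the report: the same scalar is stored into all four lanes. */
float4 g(int x)
{
  float4 W;
  W[0] = W[1] = W[2] = W[3] = x + 1;
  return W;
}

/* Hypothetical hand-written form: one scalar add, one int-to-float
   conversion, then a constructor that copies the value into every lane. */
float4 g_splat(int x)
{
  float s = (float)(x + 1);
  float4 W = { s, s, s, s };
  return W;
}
```

Both functions return the same vector; the second only makes explicit that a single scalar add and a single conversion are enough before the broadcast.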
* [Bug tree-optimization/58497] SLP vectorizes identical operations
From: rguenth at gcc dot gnu.org @ 2013-09-23 8:33 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497
Richard Biener <rguenth at gcc dot gnu.org> changed:
               What|Removed                      |Added
----------------------------------------------------------------------------
           Keywords|                             |missed-optimization
             Target|                             |x86_64-*-*
             Status|UNCONFIRMED                  |ASSIGNED
   Last reconfirmed|                             |2013-09-23
         Depends on|                             |53947
           Assignee|unassigned at gcc dot gnu.org|rguenth at gcc dot gnu.org
     Ever confirmed|0                            |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Heh ;) I suppose this started with BIT_FIELD_REF support in SLP; 4.8 didn't
vectorize this at all.
Note that with for example
typedef float float4 __attribute__((vector_size(16)));
float4 g(int x)
{
  float4 W;
  W[0]=W[1]=x+1;
  W[2]=x+2;
  W[3]=x+3;
  return W;
}
vectorizing two identical operations may be profitable. But yes, if all
scalars are the same there is no point in doing it. And the cost model
should have disabled it as well (though likely the four "stores"
made it profitable in the end).
I will have a look at some point.
OTOH generated code is
g:
.LFB0:
        .cfi_startproc
        movl     %edi, -12(%rsp)
        movd     -12(%rsp), %xmm1
        pshufd   $0, %xmm1, %xmm0
        paddd    .LC0(%rip), %xmm0
        cvtdq2ps %xmm0, %xmm0
        ret
vs. -fno-tree-vectorize:
g:
.LFB0:
        .cfi_startproc
        xorps    %xmm1, %xmm1
        addl     $1, %edi
        xorps    %xmm0, %xmm0
        cvtsi2ss %edi, %xmm1
        movaps   %xmm0, %xmm2
        movss    %xmm1, %xmm2
        shufps   $36, %xmm2, %xmm0
        movaps   %xmm0, %xmm2
        movss    %xmm1, %xmm2
        shufps   $196, %xmm2, %xmm0
        movaps   %xmm0, %xmm2
        unpcklps %xmm0, %xmm0
        movss    %xmm1, %xmm0
        shufps   $225, %xmm2, %xmm0
        movss    %xmm1, %xmm0
        ret
so clearly a win, but improvable to something like
        addl     $1, %edi
        cvtsi2ss %edi, %xmm1
        pshufd   $0, %xmm1, %xmm0
The above also shows that vector init by BIT_FIELD_REF is not expanded
very well (something for a generalized vector shuffle recognition in the bswap pass).
* [Bug tree-optimization/58497] SLP vectorizes identical operations
From: rguenth at gcc dot gnu.org @ 2013-09-23 9:03 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 30884
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30884&action=edit
prototype patch
A quick check shows generated code will be
g:
.LFB0:
        .cfi_startproc
        xorps    %xmm0, %xmm0
        addl     $1, %edi
        cvtsi2ss %edi, %xmm0
        shufps   $0, %xmm0, %xmm0
        ret
and the patch shows possible issues with finding an insert location for
the init stmt (otherwise "external" is just outside of the current
basic-block).
* [Bug tree-optimization/58497] SLP vectorizes identical operations
From: rguenth at gcc dot gnu.org @ 2015-10-22 13:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Thu Oct 22 13:36:46 2015
New Revision: 229173
URL: https://gcc.gnu.org/viewcvs?rev=229173&root=gcc&view=rev
Log:
2015-10-22 Richard Biener <rguenther@suse.de>
PR tree-optimization/58497
* tree-vect-generic.c (ssa_uniform_vector_p): New helper.
(expand_vector_operations_1): Use it. Lower operations on
all uniform vectors to scalar operations if the HW supports it.
* gcc.dg/tree-ssa/vector-5.c: New testcase.
Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/vector-5.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-generic.c
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Now we fix this up in veclower; still, the bug should be addressed in SLP
directly (also because it affects cost decisions).
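The ChangeLog above describes lowering operations on all-uniform vectors to scalar operations. GCC does this statically on SSA definitions in veclower; what follows is only a hedged runtime analogue in C, not the code of ssa_uniform_vector_p. The names is_uniform4 and add_lowered are hypothetical, and the "if the HW supports it" condition is not modelled.

```c
typedef int int4 __attribute__((vector_size(16)));

/* Hypothetical runtime analogue of a uniformity check:
   true when all four lanes hold the same value. */
static int is_uniform4(int4 v)
{
  return v[0] == v[1] && v[1] == v[2] && v[2] == v[3];
}

/* Add two vectors.  When both operands are uniform, a single scalar add
   followed by a splat gives the same result as the full vector add. */
int4 add_lowered(int4 a, int4 b)
{
  if (is_uniform4(a) && is_uniform4(b))
    {
      int s = a[0] + b[0];
      int4 r = { s, s, s, s };
      return r;
    }
  return a + b;
}
```

The uniform branch is the shape the original testcase reduces to: {x, x, x, x} + {1, 1, 1, 1} becomes a splat of x + 1.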
* [Bug tree-optimization/58497] SLP vectorizes identical operations
From: pinskia at gcc dot gnu.org @ 2021-08-14 23:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58497
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
               What|Removed                      |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org|pinskia at gcc dot gnu.org
             Status|NEW                          |ASSIGNED
--- Comment #13 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Mine for GCC 13; I have patches which turn:
W_6 = BIT_INSERT_EXPR <W_5(D), _2, 96 (32 bits)>;
W_7 = BIT_INSERT_EXPR <W_6, _2, 64 (32 bits)>;
W_8 = BIT_INSERT_EXPR <W_7, _2, 32 (32 bits)>;
W_9 = BIT_INSERT_EXPR <W_8, _2, 0 (32 bits)>;
Into:
W_9 = {_2,_2,_2,_2};
This improvement deals with bit-fields, but vectors have a similar issue with
BIT_INSERT_EXPRs, so I deal with it there.
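A hedged C model of the transformation described above, not the patch itself: each 32-bit insert is modelled as a lane store, and once all four lanes have been overwritten with the same scalar the chain equals a constructor. The helper names are hypothetical, and mapping a bit offset to lane bit_offset / 32 is an assumption of this sketch.

```c
typedef float float4 __attribute__((vector_size(16)));

/* Hypothetical helper modelling one 32-bit insert: the bit offset is
   assumed to select lane bit_offset / 32. */
static float4 insert_at_bit(float4 v, float s, unsigned bit_offset)
{
  v[bit_offset / 32] = s;
  return v;
}

/* Chain of four inserts in the order shown above (bits 96, 64, 32, 0).
   Every lane of w is overwritten, so its incoming value is dead. */
float4 splat_by_inserts(float4 w, float s)
{
  w = insert_at_bit(w, s, 96);
  w = insert_at_bit(w, s, 64);
  w = insert_at_bit(w, s, 32);
  w = insert_at_bit(w, s, 0);
  return w;
}

/* Constructor form that the chain can be replaced with. */
float4 splat_by_ctor(float s)
{
  float4 w = { s, s, s, s };
  return w;
}
```

Whatever w held on entry, both functions return the same vector, which is why the starting value W_5(D) no longer appears in the constructor form.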