public inbox for gcc-bugs@sourceware.org
* [Bug target/65847] SSE2 code for adding two structs is much worse at -O3 than at -O2
[not found] <bug-65847-4@http.gcc.gnu.org/bugzilla/>
@ 2015-04-22 14:03 ` rguenth at gcc dot gnu.org
2021-03-24 12:57 ` rguenth at gcc dot gnu.org
1 sibling, 0 replies; 2+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-04-22 14:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65847
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-04-22
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. The issue is that the vectorizer thinks x and y reside in memory
and thus it vectorizes the code as
<bb 2>:
vect__2.5_11 = MEM[(double *)&x];
vect__3.8_13 = MEM[(double *)&y];
vect__4.9_14 = vect__2.5_11 + vect__3.8_13;
MEM[(double *)&D.1840] = vect__4.9_14;
return D.1840;
which looks good. But then the ABI comes into play and passes x, y and the
return value in registers ...
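For orientation, source of the kind being discussed can be sketched as below. Only the parameter names x and y come from the GIMPLE dump above; the struct name `vec2`, its field names and the function name `add` are assumptions made for illustration and are not taken from the report.

```c
/* Hypothetical reduction: type, field and function names are assumed. */
struct vec2 { double a; double b; };

struct vec2 add (struct vec2 x, struct vec2 y)
{
  struct vec2 res;
  res.a = x.a + y.a;   /* first lane of the group */
  res.b = x.b + y.b;   /* second lane of the group */
  return res;
}
```

The vectorizer sees two adjacent loads from each of x and y and two adjacent stores to the result, which is what makes the group look like a candidate for a single vector add.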
But even the best vectorized sequence would have four stmts - two to
pack arguments into vector registers, one add and one unpack for the
return value.
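The four-statement shape can be written out with the GCC/Clang vector extensions. This is only a sketch of the data flow, not what the vectorizer emits; the names `v2df`, `dpair` and `add_packed` are hypothetical.

```c
/* Sketch using GCC/Clang vector extensions; all names are illustrative. */
typedef double v2df __attribute__ ((vector_size (16)));

struct dpair { double a; double b; };

struct dpair add_packed (struct dpair x, struct dpair y)
{
  v2df vx = { x.a, x.b };                 /* stmt 1: pack x */
  v2df vy = { y.a, y.b };                 /* stmt 2: pack y */
  v2df sum = vx + vy;                     /* stmt 3: one vector add */
  struct dpair res = { sum[0], sum[1] };  /* stmt 4: unpack for the return */
  return res;
}
```

The scalar version needs only two adds when the arguments already sit in registers, so the pack and unpack steps are pure overhead in that situation.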
Thus it seems the vectorizer should be informed of this ABI detail
or, as a simple heuristic, should never consider function arguments
to be "memory" it can perform vector loads on (which probably means
disabling group analysis on them?).
On i?86 with SSE2 we get
movupd 8(%esp), %xmm1
movl 4(%esp), %eax
movupd 24(%esp), %xmm0
addpd %xmm1, %xmm0
movups %xmm0, (%eax)
vs.
movsd 16(%esp), %xmm0
movl 4(%esp), %eax
movsd 8(%esp), %xmm1
addsd 32(%esp), %xmm0
addsd 24(%esp), %xmm1
movsd %xmm0, 8(%eax)
movsd %xmm1, (%eax)
which possibly even looks profitable (with -mfpmath=sse).
So a simple heuristic might pessimize things too much.
Replicating calls.c code to compute how the arguments are passed sounds
odd though...
Possibly the target can pessimize the loads in the target cost model,
though (at least it can apply a more reasonable "heuristic").
* [Bug target/65847] SSE2 code for adding two structs is much worse at -O3 than at -O2
From: rguenth at gcc dot gnu.org @ 2021-03-24 12:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65847
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Similarly
struct X { int a; int b; int c; int d; };
struct X foo (struct X x, struct X y)
{
  struct X res;
  res.a = x.a + y.a;
  res.b = x.b + y.b;
  res.c = x.c + y.c;
  res.d = x.d + y.d;
  return res;
}
is vectorized as
foo:
.LFB0:
.cfi_startproc
movq %rdi, -40(%rsp)
movq %rsi, -32(%rsp)
movdqa -40(%rsp), %xmm0
movq %rdx, -24(%rsp)
movq %rcx, -16(%rsp)
paddd -24(%rsp), %xmm0
movaps %xmm0, -40(%rsp)
movq -40(%rsp), %rax
movq -32(%rsp), %rdx
ret
which is bad because the on-stack construction of %xmm0 causes a
store-to-load forwarding (STLF) failure.
Unvectorized code isn't necessarily worse, but the vectorized sequence
can be improved. The unvectorized code is
foo:
.LFB0:
.cfi_startproc
movq %rdi, %rax
movq %rdi, %r10
movq %rdx, %rdi
movq %rsi, %r9
sarq $32, %r10
sarq $32, %rdi
addl %edx, %eax
movq %rcx, %r8
addl %r10d, %edi
sarq $32, %r9
movl %eax, %eax
leal (%rsi,%rcx), %edx
movl %edi, %edi
sarq $32, %r8
salq $32, %rdi
orq %rdi, %rax
leal (%r9,%r8), %edi
salq $32, %rdi
orq %rdi, %rdx
ret
In this case the spill is caused by LRA not knowing how to reload
the TImode reg built by pieces by the RTL expansion code.
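As an illustration of the data flow an improved sequence would want, the sketch below builds each vector operand directly from the two 64-bit halves in which the arguments arrive (%rdi:%rsi for x and %rdx:%rcx for y in the listing above), instead of going through stack slots. It uses the GCC/Clang vector extensions; `foo_from_regs`, its parameter names and the vector typedefs are hypothetical, and no claim is made about which instructions a given compiler emits for it.

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: typedefs and function name are illustrative assumptions. */
typedef int32_t v4si __attribute__ ((vector_size (16)));
typedef int64_t v2di __attribute__ ((vector_size (16)));

struct X { int a; int b; int c; int d; };

struct X foo_from_regs (int64_t x_lo, int64_t x_hi,
                        int64_t y_lo, int64_t y_hi)
{
  v2di xv = { x_lo, x_hi };           /* build x from its two 64-bit halves */
  v2di yv = { y_lo, y_hi };           /* build y from its two 64-bit halves */
  v4si sum = (v4si) xv + (v4si) yv;   /* reinterpret as 4 x int32, then add */
  struct X res;
  memcpy (&res, &sum, sizeof res);    /* copy the four lanes into the struct */
  return res;
}
```

The point of the sketch is that the 128-bit operand is formed from two 64-bit values that are already in registers, so no narrow stores followed by a wide load are needed.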