public inbox for gcc-bugs@sourceware.org
* [Bug target/52572] New: suboptimal assignment to avx element
From: marc.glisse at normalesup dot org @ 2012-03-12 22:50 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572
Bug #: 52572
Summary: suboptimal assignment to avx element
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: marc.glisse@normalesup.org
For the following program:
#include <x86intrin.h>
__m256d f(__m256d x){
x[0]=0;
return x;
}
gcc -O3 generates:
vmovlpd .LC0(%rip), %xmm0, %xmm1
vinsertf128 $0x0, %xmm1, %ymm0, %ymm0
or with -Os:
vxorps %xmm2, %xmm2, %xmm2
vmovsd %xmm2, %xmm0, %xmm1
vinsertf128 $0x0, %xmm1, %ymm0, %ymm0
If I understand correctly, it first constructs {0,x[1],0,0} and then merges it
with the upper part of x. However, using the legacy movlpd instruction would
avoid zeroing the upper 128 bits and thus the vinsertf128 wouldn't be needed.
Is there a policy not to generate the non-VEX instructions anymore, or is this
a missed optimization?
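The register-level reasoning above can be checked with a small plain-C model. This is a sketch under stated assumptions: a ymm register is modelled as four doubles (the hypothetical type `ymm_t`, with lanes 0-1 standing for the xmm half), and each helper mimics the documented effect of one instruction. No real intrinsics are involved, and the model deliberately ignores the cost of mixing legacy and VEX encodings.

```c
/* Hypothetical model: a ymm register as 4 doubles; lanes 0-1 are the xmm half. */
typedef struct { double lane[4]; } ymm_t;

/* VEX.128 vmovlpd m64, src, dst: dst[0] = m64, dst[1] = src[1],
   and bits 255:128 of the destination ymm are zeroed. */
ymm_t vex_vmovlpd(double m64, ymm_t src) {
    ymm_t dst = {{m64, src.lane[1], 0.0, 0.0}};
    return dst;
}

/* Legacy SSE movlpd m64, dst: only dst[0] changes; the upper 128 bits are kept. */
ymm_t legacy_movlpd(double m64, ymm_t dst) {
    dst.lane[0] = m64;
    return dst;
}

/* vinsertf128 $0, xmm, src, dst: low 128 bits from xmm, high 128 bits from src. */
ymm_t vinsertf128_lo(ymm_t xmm, ymm_t src) {
    ymm_t dst = src;
    dst.lane[0] = xmm.lane[0];
    dst.lane[1] = xmm.lane[1];
    return dst;
}

/* The -O3 sequence from the report: build {0, x[1], 0, 0}, then merge. */
ymm_t gcc_sequence(ymm_t x) {
    ymm_t tmp = vex_vmovlpd(0.0, x);  /* {0, x[1], 0, 0} */
    return vinsertf128_lo(tmp, x);    /* {0, x[1], x[2], x[3]} */
}

/* The suggested single legacy movlpd (encoding-mix penalty not modelled). */
ymm_t legacy_sequence(ymm_t x) {
    return legacy_movlpd(0.0, x);
}
```

Both sequences yield the same register contents; the model only shows why GCC needs the extra vinsertf128 once it restricts itself to VEX encodings.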
Setting x[1] is similar. For x[2] or x[3], we get extract+mov+insert, but it
might be better to do something with vblendpd.
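The vblendpd idea can be sketched with the AVX intrinsic `_mm256_blend_pd`: bit i of its immediate selects lane i from the second operand, so blending with a zero vector under mask `1 << lane` clears exactly one lane, for x[2] and x[3] as cheaply as for x[0]. The names `ZERO_LANE`, `zero_lane0` and `zero_lane3` are hypothetical. `__attribute__((target("avx")))` is used only so the sketch compiles without `-mavx`; the array wrappers keep `__m256d` from being passed by value to or from non-AVX code, which would change the calling convention. What GCC actually emits for this source is not claimed here.

```c
#include <immintrin.h>

/* Replace lane `lane` of x with 0.0 in one vblendpd. The lane must be a
   compile-time constant because it ends up in the imm8 operand. */
#define ZERO_LANE(x, lane) _mm256_blend_pd((x), _mm256_setzero_pd(), 1 << (lane))

__attribute__((target("avx")))
static __m256d set0_lane0(__m256d x) { return ZERO_LANE(x, 0); } /* x[0] = 0 */

__attribute__((target("avx")))
static __m256d set0_lane3(__m256d x) { return ZERO_LANE(x, 3); } /* x[3] = 0 */

/* Array wrappers: load 4 doubles, clear one lane, store the result. */
__attribute__((target("avx")))
void zero_lane0(double out[4], const double in[4]) {
    _mm256_storeu_pd(out, set0_lane0(_mm256_loadu_pd(in)));
}

__attribute__((target("avx")))
void zero_lane3(double out[4], const double in[4]) {
    _mm256_storeu_pd(out, set0_lane3(_mm256_loadu_pd(in)));
}
```

In a real build one would normally compile the whole file with `-mavx` and write `return ZERO_LANE(x, 2);` directly in a function like `f` above.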
* [Bug target/52572] suboptimal assignment to avx element
From: jakub at gcc dot gnu.org @ 2012-03-13 7:55 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-13 07:54:14 UTC ---
Have you actually tried that? Mixing VEX encoded insns with legacy encoded
SSE* insns is very costly, for good performance there needs to be a vzeroupper
in between (but then you lose the upper bits). See e.g. 2.8 in the AVX
Programming Reference.
* [Bug target/52572] suboptimal assignment to avx element
From: marc.glisse at normalesup dot org @ 2012-03-13 8:17 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572
--- Comment #2 from Marc Glisse <marc.glisse at normalesup dot org> 2012-03-13 08:16:58 UTC ---
(In reply to comment #1)
> Have you actually tried that?
Ah, no, sorry, I only have occasional access to such a machine to benchmark the
code. From a -Os perspective it is still shorter (but indeed that matters less
to me than -O3 performance).
> Mixing VEX encoded insns with legacy encoded
> SSE* insns is very costly, for good performance there needs to be a vzeroupper
> in between (but then you lose the upper bits). See e.g. 2.8 in the AVX
> Programming Reference.
Thanks, I'd missed that.
The vblendpd solution should still apply (from the initial 'v' it sounds safe),
no?
* [Bug target/52572] suboptimal assignment to avx element
From: marc.glisse at normalesup dot org @ 2012-03-13 17:58 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572
--- Comment #3 from Marc Glisse <marc.glisse at normalesup dot org> 2012-03-13 17:57:58 UTC ---
Or for this variant:
__m256d f(__m256d *y){
__m256d x=*y;
x[0]=0; // or x[3]
return x;
}
it looks like vmaskmovpd could replace:
vmovapd (%rdi), %ymm0
vmovapd %xmm0, %xmm1
vmovlpd .LC0(%rip), %xmm1, %xmm1
vinsertf128 $0x0, %xmm1, %ymm0, %ymm0
(I tried a version with __builtin_shuffle but it wouldn't generate vmaskmovpd
either)
(sorry for the naive suggestions, there are too many possibilities to optimize
them all...)
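The vmaskmovpd suggestion corresponds to the AVX intrinsic `_mm256_maskload_pd`: lane i is loaded only when the sign bit of mask lane i is set, and masked-off lanes come back as 0.0, so a single masked load can produce `*y` with x[0] already cleared. This is a sketch; the function name `load_with_lane0_zeroed` is hypothetical, the `target("avx")` attribute only lets it build without `-mavx`, and whether it beats a plain load plus a blend depends on the microarchitecture.

```c
#include <immintrin.h>

/* Load y[0..3] with lane 0 forced to 0.0 using one vmaskmovpd. */
__attribute__((target("avx")))
void load_with_lane0_zeroed(double out[4], const double *y) {
    /* Lane 0 off (sign bit clear), lanes 1..3 on (-1 has the sign bit set).
       For the x[3] variant the mask would be (-1, -1, -1, 0). */
    const __m256i mask = _mm256_setr_epi64x(0, -1, -1, -1);
    __m256d x = _mm256_maskload_pd(y, mask);  /* masked-off lanes read as 0.0 */
    _mm256_storeu_pd(out, x);
}
```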
* [Bug target/52572] suboptimal assignment to avx element
From: pinskia at gcc dot gnu.org @ 2021-12-25 22:30 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Last reconfirmed| |2021-12-25
Target| |x86_64-linux-gnu
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
LLVM produces:
vxorps %xmm1, %xmm1, %xmm1
vblendps $3, %ymm1, %ymm0, %ymm0 # ymm0 = ymm1[0,1],ymm0[2,3,4,5,6,7]
and, for the pointer variant from comment #3:
vxorps %xmm0, %xmm0, %xmm0
vblendps $252, (%rdi), %ymm0, %ymm0 # ymm0 = ymm0[0,1],mem[2,3,4,5,6,7]
Which I suspect is better.