public inbox for gcc-bugs@sourceware.org
* [Bug target/60826] New: inefficient code for vector xor on SSE2
@ 2014-04-11 18:09 sunfish at mozilla dot com
2014-04-14 9:46 ` [Bug target/60826] " rguenth at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: sunfish at mozilla dot com @ 2014-04-11 18:09 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826
Bug ID: 60826
Summary: inefficient code for vector xor on SSE2
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: sunfish at mozilla dot com
On the following C testcase:
#include <stdint.h>

typedef double  v2f64 __attribute__((__vector_size__(16), may_alias));
typedef int64_t v2i64 __attribute__((__vector_size__(16), may_alias));

static inline v2f64 f_and (v2f64 l, v2f64 r)
{ return (v2f64)((v2i64)l & (v2i64)r); }

static inline v2f64 f_xor (v2f64 l, v2f64 r)
{ return (v2f64)((v2i64)l ^ (v2i64)r); }

static inline double vector_to_scalar(v2f64 v) { return v[0]; }

double test(v2f64 w, v2f64 x, v2f64 z)
{
    v2f64 y = f_and(w, x);
    return vector_to_scalar(f_xor(z, y));
}
GCC emits this code:
andpd %xmm1, %xmm0
movdqa %xmm0, %xmm3
pxor %xmm2, %xmm3
movdqa %xmm3, -24(%rsp)
movsd -24(%rsp), %xmm0
ret
GCC should move the result of the xor into the return register directly
instead of spilling it to the stack and reloading it. It should also avoid
the first movdqa, which is an unnecessary copy.
Also, this should ideally use xorpd instead of pxor, to avoid a
domain-crossing penalty on Nehalem and other microarchitectures (or xorps if
domain crossing doesn't matter, since it's smaller).
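Putting the points above together, the tighter sequence the report is asking
for would look roughly like this (a sketch of the expected-optimal code, not
verified GCC output):

```asm
andpd  %xmm1, %xmm0    # y = w & x, result already in %xmm0
xorpd  %xmm2, %xmm0    # z ^ y, staying in the floating-point domain
ret                    # low element of %xmm0 is the scalar return value
```

No spill, no reload, and no extra register-to-register copy.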
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug target/60826] inefficient code for vector xor on SSE2
2014-04-11 18:09 [Bug target/60826] New: inefficient code for vector xor on SSE2 sunfish at mozilla dot com
@ 2014-04-14 9:46 ` rguenth at gcc dot gnu.org
2014-04-14 16:47 ` sunfish at mozilla dot com
2021-07-26 22:03 ` pinskia at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-04-14 9:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization, ra
Target| |x86_64-*-*
Status|UNCONFIRMED |NEW
Last reconfirmed| |2014-04-14
CC| |vmakarov at gcc dot gnu.org
Ever confirmed|0 |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.
* [Bug target/60826] inefficient code for vector xor on SSE2
2014-04-11 18:09 [Bug target/60826] New: inefficient code for vector xor on SSE2 sunfish at mozilla dot com
2014-04-14 9:46 ` [Bug target/60826] " rguenth at gcc dot gnu.org
@ 2014-04-14 16:47 ` sunfish at mozilla dot com
2021-07-26 22:03 ` pinskia at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: sunfish at mozilla dot com @ 2014-04-14 16:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826
--- Comment #2 from Dan Gohman <sunfish at mozilla dot com> ---
A little more detail: I think I have seen GCC use a spill + movsd reload as a
way of zeroing the vector elements above index 0 in an xmm register; however,
that's either not what's happening here, or it's happening when it isn't
needed.
I think the x86-64 ABI doesn't require the unused parts of an xmm return
register to be zeroed, but even if it does, I can also reproduce the
unnecessary spill and reload when I modify the test function above to this:
void test(v2f64 w, v2f64 x, v2f64 z, double *p)
{
    v2f64 y = f_and(w, x);
    *p = vector_to_scalar(f_xor(z, y));
}
* [Bug target/60826] inefficient code for vector xor on SSE2
2014-04-11 18:09 [Bug target/60826] New: inefficient code for vector xor on SSE2 sunfish at mozilla dot com
2014-04-14 9:46 ` [Bug target/60826] " rguenth at gcc dot gnu.org
2014-04-14 16:47 ` sunfish at mozilla dot com
@ 2021-07-26 22:03 ` pinskia at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-26 22:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Target Milestone|--- |6.0
Resolution|--- |FIXED
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fixed fully in GCC 6+, in that there is no extra move or going through
memory.
In GCC 5, there is an extra move but no going through memory.
The RTL changed in GCC 7+ to use vec_select, which fixes the problem without
a register allocation issue with tying TI and DF modes.
end of thread, other threads:[~2021-07-26 22:03 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-11 18:09 [Bug target/60826] New: inefficient code for vector xor on SSE2 sunfish at mozilla dot com
2014-04-14 9:46 ` [Bug target/60826] " rguenth at gcc dot gnu.org
2014-04-14 16:47 ` sunfish at mozilla dot com
2021-07-26 22:03 ` pinskia at gcc dot gnu.org