public inbox for gcc-bugs@sourceware.org
* [Bug target/60826] New: inefficient code for vector xor on SSE2
@ 2014-04-11 18:09 sunfish at mozilla dot com
  2014-04-14  9:46 ` [Bug target/60826] " rguenth at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: sunfish at mozilla dot com @ 2014-04-11 18:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826

            Bug ID: 60826
           Summary: inefficient code for vector xor on SSE2
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: sunfish at mozilla dot com

On the following C testcase:

#include <stdint.h>

typedef double v2f64 __attribute__((__vector_size__(16), may_alias));
typedef int64_t v2i64 __attribute__((__vector_size__(16), may_alias));

static inline v2f64 f_and   (v2f64 l, v2f64 r) { return (v2f64)((v2i64)l & (v2i64)r); }
static inline v2f64 f_xor   (v2f64 l, v2f64 r) { return (v2f64)((v2i64)l ^ (v2i64)r); }
static inline double vector_to_scalar(v2f64 v) { return v[0]; }

double test(v2f64 w, v2f64 x, v2f64 z)
{
    v2f64 y = f_and(w, x);

    return vector_to_scalar(f_xor(z, y));
}
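
A command line along these lines reproduces it (the exact flags are an
assumption, since they aren't given here; some optimization level such as
-O2 is needed to get the vectorized output below):

    gcc -O2 -S test.c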

GCC emits this code:

    andpd    %xmm1, %xmm0
    movdqa    %xmm0, %xmm3
    pxor    %xmm2, %xmm3
    movdqa    %xmm3, -24(%rsp)
    movsd    -24(%rsp), %xmm0
    ret

GCC should move the result of the xor into the return register directly
instead of spilling it to the stack and reloading it. It should also avoid
the first movdqa, which is an unnecessary copy.

Also, this should ideally use xorpd instead of pxor, to avoid a
domain-crossing penalty on Nehalem and other micro-architectures (or xorps
if domain crossing doesn't matter, since it's smaller).
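
For comparison, an ideal sequence (a hand-written sketch, not output from
any actual compiler, assuming the spill, the extra copy, and the domain
crossing are all eliminated) would be just:

    andpd    %xmm1, %xmm0
    xorpd    %xmm2, %xmm0
    ret

with w, x, and z arriving in %xmm0, %xmm1, and %xmm2, and the result double
already sitting in the low element of %xmm0 for the return.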



* [Bug target/60826] inefficient code for vector xor on SSE2
  2014-04-11 18:09 [Bug target/60826] New: inefficient code for vector xor on SSE2 sunfish at mozilla dot com
@ 2014-04-14  9:46 ` rguenth at gcc dot gnu.org
  2014-04-14 16:47 ` sunfish at mozilla dot com
  2021-07-26 22:03 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-04-14  9:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-04-14
                 CC|                            |vmakarov at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.



* [Bug target/60826] inefficient code for vector xor on SSE2
  2014-04-11 18:09 [Bug target/60826] New: inefficient code for vector xor on SSE2 sunfish at mozilla dot com
  2014-04-14  9:46 ` [Bug target/60826] " rguenth at gcc dot gnu.org
@ 2014-04-14 16:47 ` sunfish at mozilla dot com
  2021-07-26 22:03 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: sunfish at mozilla dot com @ 2014-04-14 16:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826

--- Comment #2 from Dan Gohman <sunfish at mozilla dot com> ---
A little more detail: I think I have seen GCC use a spill + movsd reload as
a way of zeroing the non-zero-index vector elements of an xmm register;
however, either that's not what's happening here, or it's happening when it
isn't needed.

I think the x86-64 ABI doesn't require the unused parts of an xmm return
register to be zeroed, but even if it does, I can also reproduce the
unnecessary spill and reload when I modify the test function above to this:

void test(v2f64 w, v2f64 x, v2f64 z, double *p)
{
    v2f64 y = f_and(w, x);

    *p = vector_to_scalar(f_xor(z, y));
}
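
For this variant (with p arriving in %rdi under the x86-64 ABI), a sketch of
the expected code, again assuming the spill and reload are eliminated, would
be something like:

    andpd    %xmm1, %xmm0
    xorpd    %xmm2, %xmm0
    movsd    %xmm0, (%rdi)
    ret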



* [Bug target/60826] inefficient code for vector xor on SSE2
  2014-04-11 18:09 [Bug target/60826] New: inefficient code for vector xor on SSE2 sunfish at mozilla dot com
  2014-04-14  9:46 ` [Bug target/60826] " rguenth at gcc dot gnu.org
  2014-04-14 16:47 ` sunfish at mozilla dot com
@ 2021-07-26 22:03 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-07-26 22:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60826

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |6.0
         Resolution|---                         |FIXED

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fixed fully in GCC 6+, where there is no extra move and no going through
memory. In GCC 5, there is an extra move but no going through memory.

The RTL changed in GCC 7+ to use vec_select, which fixes the problem without
a register allocation issue from tying TI and DF modes.
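
For illustration, extracting element 0 of the vector as a DFmode value with
vec_select looks roughly like this in RTL (a hand-written sketch with
made-up pseudo register numbers, not an actual GCC dump):

    (set (reg:DF 91)
         (vec_select:DF (reg:V2DF 90)
                        (parallel [(const_int 0)])))

rather than a subreg such as (subreg:DF (reg:TI 90) 0), which tied TImode
and DFmode together for the register allocator.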

