[Bug target/102438] New: [x86-64] Failure to optimize out random extra store+load in vector code when memcpy is used

public inbox for gcc-bugs@sourceware.org

* [Bug target/102438] New: [x86-64] Failure to optimize out random extra store+load in vector code when memcpy is used
@ 2021-09-21 22:37 gabravier at gmail dot com
  2021-09-21 23:32 ` [Bug target/102438] [x86-64] Failure to optimize out spill in vector code when a cast " pinskia at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: gabravier at gmail dot com @ 2021-09-21 22:37 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102438

            Bug ID: 102438
           Summary: [x86-64] Failure to optimize out random extra
                    store+load in vector code when memcpy is used
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stddef.h>

typedef double simde_float64x1_t __attribute__((__vector_size__(8)));

simde_float64x1_t simde_vabs_f64(simde_float64x1_t a) {
    simde_float64x1_t r;
    r[0] = -a[0];
    return (simde_float64x1_t)r;
}

On AMD64 with -O3, GCC outputs this:

simde_vabs_f64(double __vector(1)):
        movsd   xmm0, QWORD PTR [rsp+8]
        xorpd   xmm0, XMMWORD PTR .LC0[rip]
        mov     rax, rdi
        movsd   QWORD PTR [rsp-24], xmm0
        mov     rdx, QWORD PTR [rsp-24]
        mov     QWORD PTR [rdi], rdx
        ret

If we just return `r` (without the cast), this is outputted instead:

simde_vabs_f64(double __vector(1)):
        movsd   xmm0, QWORD PTR [rsp+8]
        xorpd   xmm0, XMMWORD PTR .LC0[rip]
        mov     rax, rdi
        movsd   QWORD PTR [rdi], xmm0
        ret

It seems as though the presence of a cast (to the same type, no less) confuses
GCC into spilling the result into memory.
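
For easier comparison, the two variants described above can be put side by side
in one file. This is only a minimal sketch: the function names
`neg_with_cast` and `neg_without_cast` are made up for illustration and do not
appear in the original report. Both functions compute the same value, so any
difference should show up only in the generated assembly.

```c
typedef double simde_float64x1_t __attribute__((__vector_size__(8)));

/* Variant from the report: the result is cast to its own type on return.
   (Hypothetical name, used here only for illustration.) */
simde_float64x1_t neg_with_cast(simde_float64x1_t a) {
    simde_float64x1_t r;
    r[0] = -a[0];
    return (simde_float64x1_t)r;
}

/* Same computation, but r is returned directly without the cast.
   (Hypothetical name, used here only for illustration.) */
simde_float64x1_t neg_without_cast(simde_float64x1_t a) {
    simde_float64x1_t r;
    r[0] = -a[0];
    return r;
}
```

Compiling this with something like `gcc -O3 -S` and comparing the two function
bodies should show whether the extra store+load is still present.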

The GIMPLE optimized output is different for the two, so I don't know how much
of this is target-specific to x86, but I haven't been able to reproduce it
anywhere else, so ¯\_(ツ)_/¯.

PS: The same bug can also be reproduced with -m32


* [Bug target/102438] [x86-64] Failure to optimize out spill in vector code when a cast is used
  2021-09-21 22:37 [Bug target/102438] New: [x86-64] Failure to optimize out random extra store+load in vector code when memcpy is used gabravier at gmail dot com
@ 2021-09-21 23:32 ` pinskia at gcc dot gnu.org
  2021-09-21 23:33 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-21 23:32 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102438

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-09-21
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
There is an ABI difference between GCC and clang here.

But I suspect this is one of the standard return/argument issues with
respect to GCC.


* [Bug target/102438] [x86-64] Failure to optimize out spill in vector code when a cast is used
  2021-09-21 22:37 [Bug target/102438] New: [x86-64] Failure to optimize out random extra store+load in vector code when memcpy is used gabravier at gmail dot com
  2021-09-21 23:32 ` [Bug target/102438] [x86-64] Failure to optimize out spill in vector code when a cast " pinskia at gcc dot gnu.org
@ 2021-09-21 23:33 ` pinskia at gcc dot gnu.org
  2021-09-22  3:04 ` crazylht at gmail dot com
  2021-09-22  3:08 ` crazylht at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-21 23:33 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102438

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
There might be a dup of this bug too.


* [Bug target/102438] [x86-64] Failure to optimize out spill in vector code when a cast is used
  2021-09-21 22:37 [Bug target/102438] New: [x86-64] Failure to optimize out random extra store+load in vector code when memcpy is used gabravier at gmail dot com
  2021-09-21 23:32 ` [Bug target/102438] [x86-64] Failure to optimize out spill in vector code when a cast " pinskia at gcc dot gnu.org
  2021-09-21 23:33 ` pinskia at gcc dot gnu.org
@ 2021-09-22  3:04 ` crazylht at gmail dot com
  2021-09-22  3:08 ` crazylht at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: crazylht at gmail dot com @ 2021-09-22  3:04 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102438

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
currently the i386 backend doesn't support V1DFmode, and it's treated as a
DImode (an equal-size integer type) and passed by stack.

Shall we support V1DFmode?


* [Bug target/102438] [x86-64] Failure to optimize out spill in vector code when a cast is used
  2021-09-21 22:37 [Bug target/102438] New: [x86-64] Failure to optimize out random extra store+load in vector code when memcpy is used gabravier at gmail dot com
                   ` (2 preceding siblings ...)
  2021-09-22  3:04 ` crazylht at gmail dot com
@ 2021-09-22  3:08 ` crazylht at gmail dot com
  3 siblings, 0 replies; 5+ messages in thread
From: crazylht at gmail dot com @ 2021-09-22  3:08 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102438

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> currently the i386 backend doesn't support V1DFmode, and it's treated as a
> DImode (an equal-size integer type) and passed by stack.
Typo: that should read "moved through stack".
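
As a source-level illustration of the "equal-size integer type" remark in
comment #3, the sketch below checks that the one-element double vector has the
same size as a 64-bit integer and round-trips its bytes through one. The helper
names `vec_bits` and `vec_from_bits` are hypothetical, and this only shows the
size equivalence; it does not claim to reproduce what the i386 backend does
internally.

```c
#include <stdint.h>
#include <string.h>

typedef double simde_float64x1_t __attribute__((__vector_size__(8)));

/* The vector occupies exactly as many bytes as a 64-bit integer. */
_Static_assert(sizeof(simde_float64x1_t) == sizeof(uint64_t),
               "one-element double vector should be 8 bytes");

/* Copy the vector's bytes into a 64-bit integer (hypothetical helper). */
uint64_t vec_bits(simde_float64x1_t v) {
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    return bits;
}

/* Rebuild the vector from a 64-bit integer (hypothetical helper). */
simde_float64x1_t vec_from_bits(uint64_t bits) {
    simde_float64x1_t v;
    memcpy(&v, &bits, sizeof v);
    return v;
}
```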

