[Bug c++/113978] New: Misoptimize for long vector load operation

public inbox for gcc-bugs@sourceware.org

* [Bug c++/113978] New: Misoptimize for long vector load operation
@ 2024-02-18  6:16 xjkp2283572185 at gmail dot com
  2024-02-18  6:28 ` [Bug target/113978] " pinskia at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: xjkp2283572185 at gmail dot com @ 2024-02-18  6:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978

            Bug ID: 113978
           Summary: Misoptimize for long vector load operation
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: xjkp2283572185 at gmail dot com
  Target Milestone: ---

===
Compiler
===
Using built-in specs.
COLLECT_GCC=D:\Tools\gcc\bin\g++.exe
COLLECT_LTO_WRAPPER=D:/Tools/gcc/bin/../libexec/gcc/x86_64-w64-mingw32/14.0.1/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../configure --disable-werror
--prefix=/home/luo/x86_64-w64-mingw32-native-gcc14 --host=x86_64-w64-mingw32
--target=x86_64-w64-mingw32 --enable-multilib --enable-languages=c,c++
--disable-sjlj-exceptions --enable-threads=win32
Thread model: win32
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240130 (experimental) (GCC)

===
Source Code
===
using v [[using gnu: vector_size(128)]] = char;
auto f(v* p) noexcept
{
    return *p;
}

===
Command
===
g++ test.cpp -Ofast -march=znver4

===
Result
===
_Z1fPDv128_c:
.LFB0:
        subq    $248, %rsp
        .seh_stackalloc 248
        .seh_endprologue
        vmovdqa64       (%rdx), %zmm0
        movq    %rcx, %rax
        vmovdqa64       %zmm0, (%rcx)
        vmovdqa64       64(%rdx), %zmm0
        vmovdqa64       %zmm0, 64(%rcx)
        vzeroupper
        addq    $248, %rsp
        ret

GCC generates an extra stack operation, but clang just generates two loads:
_Z1fPDv128_c:                           # @_Z1fPDv128_c
# %bb.0:
        vmovaps (%rcx), %zmm0
        vmovaps 64(%rcx), %zmm1
        retq


* [Bug target/113978] Misoptimize for long vector load operation
  2024-02-18  6:16 [Bug c++/113978] New: Misoptimize for long vector load operation xjkp2283572185 at gmail dot com
@ 2024-02-18  6:28 ` pinskia at gcc dot gnu.org
  2024-02-18  6:31 ` pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-18  6:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ABI
          Component|c++                         |target
             Target|                            |x86_64-linux-gnu

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is an ABI issue. I am not sure if GCC is correct or LLVM is correct.

But basically clang/LLVM is returning in %zmm0 and %zmm1 while GCC is returning
via memory.
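
To make the difference concrete, here is a minimal sketch of what "returning
via memory" roughly amounts to at the source level. The name f_via_memory and
the explicit out parameter are hypothetical, introduced only for illustration;
in the real calling convention the result pointer is supplied implicitly by
the caller rather than written in the source.

```cpp
// Same vector type as in the report: 128 chars, i.e. 1024 bits.
using v [[using gnu: vector_size(128)]] = char;

// Hypothetical source-level picture of a memory return: the caller passes the
// address of the result slot, the callee stores the value there and hands the
// same address back.
v* f_via_memory(v* out, const v* p) noexcept
{
    *out = *p;   // two 64-byte loads and two 64-byte stores with AVX-512
    return out;  // corresponds to "movq %rcx, %rax" in the mingw output
}
```

Returning in %zmm0 and %zmm1 instead needs no result slot at all, which is why
the clang output consists of only the two loads.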


* [Bug target/113978] Misoptimize for long vector load operation
  2024-02-18  6:16 [Bug c++/113978] New: Misoptimize for long vector load operation xjkp2283572185 at gmail dot com
  2024-02-18  6:28 ` [Bug target/113978] " pinskia at gcc dot gnu.org
@ 2024-02-18  6:31 ` pinskia at gcc dot gnu.org
  2024-02-18  6:32 ` pinskia at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-18  6:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://github.com/llvm/llv
                   |                            |m-project/issues/82151

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Filed https://github.com/llvm/llvm-project/issues/82151 for the corresponding
LLVM issue for ABI compatibility.


* [Bug target/113978] Misoptimize for long vector load operation
  2024-02-18  6:16 [Bug c++/113978] New: Misoptimize for long vector load operation xjkp2283572185 at gmail dot com
  2024-02-18  6:28 ` [Bug target/113978] " pinskia at gcc dot gnu.org
  2024-02-18  6:31 ` pinskia at gcc dot gnu.org
@ 2024-02-18  6:32 ` pinskia at gcc dot gnu.org
  2024-02-18  6:36 ` xjkp2283572185 at gmail dot com
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-18  6:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>GCC generates extra stack operation

That is basically to realign the stack in case there is a spill; this happens
more when compiling for mingw than for Linux.
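
The need for realignment can be illustrated with a small sketch. It assumes
GCC's usual behaviour of aligning a generic vector type to its own size (so
alignof(v) would be 128 here), while the stack is only guaranteed to be
16-byte aligned at call boundaries. The helper name align_up_for_v is
hypothetical; it just shows the add-then-mask rounding a compiler uses to
carve an aligned slot out of a less aligned stack.

```cpp
#include <cstdint>

using v [[using gnu: vector_size(128)]] = char;

// Round an address up to the next multiple of alignof(v).
// Relies on alignof(v) being a power of two, which holds for any alignment.
constexpr std::uintptr_t align_up_for_v(std::uintptr_t addr) noexcept
{
    constexpr std::uintptr_t a = alignof(v);
    return (addr + (a - 1)) & ~(a - 1);
}
```

A spill slot for v therefore costs extra stack space (room for the padding)
plus the rounding arithmetic, even if the slot ends up unused.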


* [Bug target/113978] Misoptimize for long vector load operation
  2024-02-18  6:16 [Bug c++/113978] New: Misoptimize for long vector load operation xjkp2283572185 at gmail dot com
                   ` (2 preceding siblings ...)
  2024-02-18  6:32 ` pinskia at gcc dot gnu.org
@ 2024-02-18  6:36 ` xjkp2283572185 at gmail dot com
  2024-02-18  6:40 ` pinskia at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: xjkp2283572185 at gmail dot com @ 2024-02-18  6:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978

--- Comment #4 from 严 逍宇 <xjkp2283572185 at gmail dot com> ---
I found an example without the ABI problem:
===
Source Code
===
using v [[using gnu: vector_size(128)]] = char;
void f(v *pa, v *pb) noexcept
{
    v a{*pa}, b{*pb};
    *pa = b;
    *pb = a;
}

===
Command
===
g++ test.cpp -Ofast -march=znver4 -S

===
Result
===
_Z1fPDv128_cS0_:
.LFB0:
        subq    $376, %rsp
        .seh_stackalloc 376
        .seh_endprologue
        vmovdqa64       (%rcx), %zmm1
        vmovdqa64       64(%rcx), %zmm0
        leaq    127(%rsp), %rax
        andq    $-128, %rax
        vmovdqa64       (%rdx), %zmm3
        vmovdqa64       64(%rdx), %zmm2
        vmovdqa64       %zmm1, 128(%rax)
        vmovdqa64       %zmm0, 192(%rax)
        vmovdqa64       %zmm3, (%rcx)
        vmovdqa64       %zmm2, 64(%rcx)
        vmovdqa64       %zmm3, (%rax)
        vmovdqa64       %zmm2, 64(%rax)
        vmovdqa64       %zmm1, (%rdx)
        vmovdqa64       %zmm0, 64(%rdx)
        vzeroupper
        addq    $376, %rsp
        ret
But clang gets this right:
_Z1fPDv128_cS0_:                        # @_Z1fPDv128_cS0_
# %bb.0:
        vmovaps (%rcx), %zmm0
        vmovaps 64(%rcx), %zmm1
        vmovaps (%rdx), %zmm2
        vmovaps 64(%rdx), %zmm3
        vmovaps %zmm2, (%rcx)
        vmovaps %zmm3, 64(%rcx)
        vmovaps %zmm0, (%rdx)
        vmovaps %zmm1, 64(%rdx)
        vzeroupper
        retq


* [Bug target/113978] Misoptimize for long vector load operation
  2024-02-18  6:16 [Bug c++/113978] New: Misoptimize for long vector load operation xjkp2283572185 at gmail dot com
                   ` (3 preceding siblings ...)
  2024-02-18  6:36 ` xjkp2283572185 at gmail dot com
@ 2024-02-18  6:40 ` pinskia at gcc dot gnu.org
  2024-02-18  6:46 ` xjkp2283572185 at gmail dot com
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-18  6:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to 严 逍宇 from comment #4)
> I find an example without abi problem:

As I mentioned, that works on linux just fine:
```
        vmovdqa64       (%rdi), %zmm1
        vmovdqa64       64(%rdi), %zmm0
        vmovdqa64       (%rsi), %zmm3
        vmovdqa64       64(%rsi), %zmm2
        vmovdqa64       %zmm3, (%rdi)
        vmovdqa64       %zmm2, 64(%rdi)
        vmovdqa64       %zmm1, (%rsi)
        vmovdqa64       %zmm0, 64(%rsi)
        vzeroupper
        ret
```


* [Bug target/113978] Misoptimize for long vector load operation
  2024-02-18  6:16 [Bug c++/113978] New: Misoptimize for long vector load operation xjkp2283572185 at gmail dot com
                   ` (4 preceding siblings ...)
  2024-02-18  6:40 ` pinskia at gcc dot gnu.org
@ 2024-02-18  6:46 ` xjkp2283572185 at gmail dot com
  2024-02-18  7:51 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: xjkp2283572185 at gmail dot com @ 2024-02-18  6:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978

--- Comment #6 from 严 逍宇 <xjkp2283572185 at gmail dot com> ---
(In reply to Andrew Pinski from comment #5)

> As I mentioned, that works on linux just fine:

Thank you for your time. And when can I use this feature on mingw? I think the
behavior of swapping two long vectors should be platform-independent.


* [Bug target/113978] Misoptimize for long vector load operation
  2024-02-18  6:16 [Bug c++/113978] New: Misoptimize for long vector load operation xjkp2283572185 at gmail dot com
                   ` (5 preceding siblings ...)
  2024-02-18  6:46 ` xjkp2283572185 at gmail dot com
@ 2024-02-18  7:51 ` jakub at gcc dot gnu.org
  2024-02-18 11:47 ` jakub at gcc dot gnu.org
  2024-02-19  8:14 ` rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-18  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> This is an ABI issue. I am not sure if GCC is correct or LLVM is correct.

The psABI doesn't cover that.  It only talks about the __m128, __m256 and
__m512 types, and as both compilers use the GNU vector_size attribute
extension under the hood for those types, that is how
__attribute__((vector_size ({16,32,64}))) should behave.
Smaller vectors (vector_size 2, 4, 8) on x86_64 are in GCC passed like __m128
(I think); larger vectors, or even __m256/__m512 if AVX/AVX512 isn't
supported, are classified as MEMORY, like structures/unions larger than
16 bytes.

> But basically clang/LLVM is returning in %zmm0 and %zmm1 while GCC is
> returning via memory.

I believe there is certainly nothing in the psABI that would return something
in a %zmm0/%zmm1 pair.
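
The rule of thumb described above can be written down as a toy model. This is
only a restatement of the description in this comment, not GCC's actual
classification code; the names ReturnClass and classify_vector, and the idea
of passing the widest enabled register width as a parameter, are assumptions
made for illustration.

```cpp
#include <cstddef>

// Toy model only: where a vector_size(N) value is returned.
enum class ReturnClass { VectorRegister, Memory };

// size_bytes:           the vector_size of the type
// widest_enabled_bytes: widest vector register enabled by the ISA flags
//                       (16 for SSE, 32 for AVX, 64 for AVX-512)
constexpr ReturnClass classify_vector(std::size_t size_bytes,
                                      std::size_t widest_enabled_bytes) noexcept
{
    return size_bytes <= widest_enabled_bytes ? ReturnClass::VectorRegister
                                              : ReturnClass::Memory;
}

// The type from this report: 128 bytes, with AVX-512 (64-byte registers).
static_assert(classify_vector(128, 64) == ReturnClass::Memory,
              "a 128-byte vector does not fit one register");
```

Under this model no case ever yields a pair of registers, which matches the
point that a %zmm0/%zmm1 return is not something the psABI describes.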


* [Bug target/113978] Misoptimize for long vector load operation
  2024-02-18  6:16 [Bug c++/113978] New: Misoptimize for long vector load operation xjkp2283572185 at gmail dot com
                   ` (6 preceding siblings ...)
  2024-02-18  7:51 ` jakub at gcc dot gnu.org
@ 2024-02-18 11:47 ` jakub at gcc dot gnu.org
  2024-02-19  8:14 ` rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-18 11:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #7)
> The psABI doesn't cover that.  It only talks about __m128, __m256 and __m512
> types, and
> as both compilers use the GNU vector_size attribute extension under the hood
> for those types, that is how __attribute__((vector_size ({16,32,64})))
> should behave.
> Smaller vectors (vector_size 2, 4, 8) on x86_64 are in GCC passed like
> __m128 (I think), larger vectors or even __m256/__m512 if AVX/AVX512 isn't
> supported are classified as MEMORY like > 16 byte structures/unions.

And given that vector_size is a GNU extension and GCC has behaved that way
since GCC 4.0 (I think since
https://gcc.gnu.org/legacy-ml/gcc-patches/2004-07/msg01512.html ; before that,
typedef char V __attribute__((vector_size (128)));
V foo (V* p) { return *p; }
was rejected, as we only supported natively supported vectors in GCC 3.x),
I think LLVM needs to be fixed to match that.


* [Bug target/113978] Misoptimize for long vector load operation
  2024-02-18  6:16 [Bug c++/113978] New: Misoptimize for long vector load operation xjkp2283572185 at gmail dot com
                   ` (7 preceding siblings ...)
  2024-02-18 11:47 ` jakub at gcc dot gnu.org
@ 2024-02-19  8:14 ` rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-19  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|ABI                         |missed-optimization
             Target|x86_64-*-*                  |x86_64-w64-mingw32

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this is the mingw-specific missed-optimization issue of comment #4 then;
the ABI thing is an LLVM bug.

