public inbox for gcc-bugs@sourceware.org
* [Bug c++/113978] New: Misoptimize for long vector load operation
From: xjkp2283572185 at gmail dot com @ 2024-02-18 6:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
Bug ID: 113978
Summary: Misoptimize for long vector load operation
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: xjkp2283572185 at gmail dot com
Target Milestone: ---
===
Compiler
===
Using built-in specs.
COLLECT_GCC=D:\Tools\gcc\bin\g++.exe
COLLECT_LTO_WRAPPER=D:/Tools/gcc/bin/../libexec/gcc/x86_64-w64-mingw32/14.0.1/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../configure --disable-werror
--prefix=/home/luo/x86_64-w64-mingw32-native-gcc14 --host=x86_64-w64-mingw32
--target=x86_64-w64-mingw32 --enable-multilib --enable-languages=c,c++
--disable-sjlj-exceptions --enable-threads=win32
Thread model: win32
Supported LTO compression algorithms: zlib
gcc version 14.0.1 20240130 (experimental) (GCC)
===
Source Code
===
using v [[using gnu: vector_size(128)]] = char;
auto f(v* p) noexcept
{
return *p;
}
===
Command
===
g++ test.cpp -Ofast -march=znver4
===
Result
===
_Z1fPDv128_c:
.LFB0:
subq $248, %rsp
.seh_stackalloc 248
.seh_endprologue
vmovdqa64 (%rdx), %zmm0
movq %rcx, %rax
vmovdqa64 %zmm0, (%rcx)
vmovdqa64 64(%rdx), %zmm0
vmovdqa64 %zmm0, 64(%rcx)
vzeroupper
addq $248, %rsp
ret
GCC generates extra stack operations, whereas clang generates just two loads:
_Z1fPDv128_c: # @_Z1fPDv128_c
# %bb.0:
vmovaps (%rcx), %zmm0
vmovaps 64(%rcx), %zmm1
retq
* [Bug target/113978] Misoptimize for long vector load operation
From: pinskia at gcc dot gnu.org @ 2024-02-18 6:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |ABI
Component|c++ |target
Target| |x86_64-linux-gnu
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is an ABI issue. I am not sure if GCC is correct or LLVM is correct.
But basically clang/LLVM is returning in %zmm0 and %zmm1 while GCC is returning
via memory.
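As a rough illustration of what "returning via memory" means at the source level, the sketch below pairs the by-value function from the report with a hand-written variant in which the caller supplies the result storage. The names `load_by_value`, `load_via_out` and `same_result` are hypothetical and not part of the report; the out-pointer form only approximates how a compiler lowers a memory-class return, and it says nothing about which compiler's convention is correct.

```cpp
#include <cstring>

// 128-byte GNU vector of char, the same shape as the type in the report.
typedef char v __attribute__((vector_size(128)));

// By-value return, as in the reported test case.
v load_by_value(const v* p) noexcept { return *p; }

// Hypothetical hand-written counterpart: the caller passes the address
// of the result storage, which is roughly what a return in memory does.
void load_via_out(v* out, const v* p) noexcept { *out = *p; }

// Returns true when both forms produce byte-identical results.
bool same_result() {
    v src;
    for (int i = 0; i < 128; ++i) src[i] = static_cast<char>(i);
    v a = load_by_value(&src);
    v b;
    load_via_out(&b, &src);
    return std::memcmp(&a, &b, sizeof(v)) == 0;
}
```

Both forms compute the same value; the disagreement in this thread is only about where the callee is expected to leave the result.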
* [Bug target/113978] Misoptimize for long vector load operation
From: pinskia at gcc dot gnu.org @ 2024-02-18 6:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://github.com/llvm/llvm-project/issues/82151
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Filed https://github.com/llvm/llvm-project/issues/82151 for the corresponding
LLVM issue for ABI compatibility.
* [Bug target/113978] Misoptimize for long vector load operation
From: pinskia at gcc dot gnu.org @ 2024-02-18 6:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>GCC generates extra stack operation
That is basically to realign the stack just in case there is a spill. This
happens more when compiling for mingw than for Linux.
* [Bug target/113978] Misoptimize for long vector load operation
From: xjkp2283572185 at gmail dot com @ 2024-02-18 6:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
--- Comment #4 from 严 逍宇 <xjkp2283572185 at gmail dot com> ---
I found an example without the ABI problem:
===
Source Code
===
using v [[using gnu: vector_size(128)]] = char;
void f(v *pa, v *pb) noexcept
{
v a{*pa}, b{*pb};
*pa = b;
*pb = a;
}
===
Command
===
g++ test.cpp -Ofast -march=znver4 -S
===
Result
===
_Z1fPDv128_cS0_:
.LFB0:
subq $376, %rsp
.seh_stackalloc 376
.seh_endprologue
vmovdqa64 (%rcx), %zmm1
vmovdqa64 64(%rcx), %zmm0
leaq 127(%rsp), %rax
andq $-128, %rax
vmovdqa64 (%rdx), %zmm3
vmovdqa64 64(%rdx), %zmm2
vmovdqa64 %zmm1, 128(%rax)
vmovdqa64 %zmm0, 192(%rax)
vmovdqa64 %zmm3, (%rcx)
vmovdqa64 %zmm2, 64(%rcx)
vmovdqa64 %zmm3, (%rax)
vmovdqa64 %zmm2, 64(%rax)
vmovdqa64 %zmm1, (%rdx)
vmovdqa64 %zmm0, 64(%rdx)
vzeroupper
addq $376, %rsp
ret
But clang handles this correctly:
_Z1fPDv128_cS0_: # @_Z1fPDv128_cS0_
# %bb.0:
vmovaps (%rcx), %zmm0
vmovaps 64(%rcx), %zmm1
vmovaps (%rdx), %zmm2
vmovaps 64(%rdx), %zmm3
vmovaps %zmm2, (%rcx)
vmovaps %zmm3, 64(%rcx)
vmovaps %zmm0, (%rdx)
vmovaps %zmm1, 64(%rdx)
vzeroupper
retq
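The `leaq 127(%rsp), %rax` / `andq $-128, %rax` pair in the GCC listing above rounds an address up to the next 128-byte boundary, which is how the over-aligned scratch area for the two vector temporaries is carved out of the stack frame. A minimal sketch of that arithmetic follows; the helper name `align_up_128` is made up for illustration.

```cpp
#include <cstdint>

// Round addr up to the next multiple of 128: add 127, then clear the
// low seven bits. Masking with ~127 matches `andq $-128` in 64-bit
// two's-complement arithmetic.
constexpr std::uintptr_t align_up_128(std::uintptr_t addr) {
    return (addr + 127) & ~std::uintptr_t{127};
}

static_assert(align_up_128(0) == 0, "already aligned");
static_assert(align_up_128(1) == 128, "rounds up");
static_assert(align_up_128(128) == 128, "already aligned");
static_assert(align_up_128(129) == 256, "rounds up");
```

Because the incoming stack pointer is only guaranteed a smaller alignment, the frame has to reserve extra slack beyond the temporaries themselves so that the rounded-up address still lies inside it.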
* [Bug target/113978] Misoptimize for long vector load operation
From: pinskia at gcc dot gnu.org @ 2024-02-18 6:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to 严 逍宇 from comment #4)
> I find an example without abi problem:
As I mentioned, that works on linux just fine:
```
vmovdqa64 (%rdi), %zmm1
vmovdqa64 64(%rdi), %zmm0
vmovdqa64 (%rsi), %zmm3
vmovdqa64 64(%rsi), %zmm2
vmovdqa64 %zmm3, (%rdi)
vmovdqa64 %zmm2, 64(%rdi)
vmovdqa64 %zmm1, (%rsi)
vmovdqa64 %zmm0, 64(%rsi)
vzeroupper
ret
```
* [Bug target/113978] Misoptimize for long vector load operation
From: xjkp2283572185 at gmail dot com @ 2024-02-18 6:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
--- Comment #6 from 严 逍宇 <xjkp2283572185 at gmail dot com> ---
(In reply to Andrew Pinski from comment #5)
> As I mentioned, that works on linux just fine:
Thank you for your time. When will I be able to use this on mingw? I think the
behavior of swapping two long vectors should be platform-independent.
* [Bug target/113978] Misoptimize for long vector load operation
From: jakub at gcc dot gnu.org @ 2024-02-18 7:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #1)
> This is an ABI issue. I am not sure if GCC is correct or LLVM is correct.
The psABI doesn't cover that. It only talks about the __m128, __m256 and
__m512 types, and as both compilers use the GNU vector_size attribute extension
under the hood for those types, that is how
__attribute__((vector_size ({16,32,64}))) should behave.
Smaller vectors (vector_size 2, 4, 8) on x86_64 are in GCC passed like __m128
(I think); larger vectors, or even __m256/__m512 if AVX/AVX512 isn't supported,
are classified as MEMORY, like > 16 byte structures/unions.
> But basically clang/LLVM is returning in %zmm0 and %zmm1 while GCC is
> returning via memory.
I believe there is certainly nothing in the psABI that would return a value in
a %zmm0/%zmm1 pair.
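The classification described in this comment can be summarized as a small decision function. This is only a sketch of the rules as stated here (including the "I think" for the small sizes), not the psABI text and not GCC's actual implementation; `ReturnClass` and `classify_vector_return` are hypothetical names.

```cpp
// Where a vector_size(N) value ends up, per the description above.
enum class ReturnClass { Register, Memory };

// Assumption: sizes up to 16 bytes are treated like __m128, 32 bytes
// needs AVX, 64 bytes needs AVX512, and everything else (including
// the 128-byte vector from this report) is classified as MEMORY.
constexpr ReturnClass classify_vector_return(unsigned size_bytes,
                                             bool has_avx,
                                             bool has_avx512) {
    if (size_bytes <= 16) return ReturnClass::Register;
    if (size_bytes == 32 && has_avx) return ReturnClass::Register;
    if (size_bytes == 64 && has_avx512) return ReturnClass::Register;
    return ReturnClass::Memory;
}

// The 128-byte vector is MEMORY even with AVX512 available.
static_assert(classify_vector_return(128, true, true) == ReturnClass::Memory,
              "vector_size(128) is returned in memory");
```

Under these rules no size is ever split across two vector registers, which is the point being made about the %zmm0/%zmm1 pair.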
* [Bug target/113978] Misoptimize for long vector load operation
From: jakub at gcc dot gnu.org @ 2024-02-18 11:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #7)
> The psABI doesn't cover that. It only talks about __m128, __m256 and __m512
> types, and
> as both compilers use the GNU vector_size attribute extension under the hood
> for those types, that is how __attribute__((vector_size ({16,32,64})))
> should behave.
> Smaller vectors (vector_size 2, 4, 8) on x86_64 are in GCC passed like
> __m128 (I think), larger vectors or even __m256/__m512 if AVX/AVX512 isn't
> supported are classified as MEMORY like > 16 byte structures/unions.
And given that vector_size is a GNU extension and GCC has behaved that way
since GCC 4.0 (I think since
https://gcc.gnu.org/legacy-ml/gcc-patches/2004-07/msg01512.html; before that,
typedef char V __attribute__((vector_size (128)));
V foo (V* p) { return *p; }
was rejected, as GCC 3.x only supported natively supported vectors), I think
LLVM needs to be fixed to match that.
* [Bug target/113978] Misoptimize for long vector load operation
From: rguenth at gcc dot gnu.org @ 2024-02-19 8:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113978
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|ABI |missed-optimization
Target|x86_64-*-* |x86_64-w64-mingw32
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this is the mingw-specific missed-optimization issue from comment #4, then;
the ABI discrepancy is an LLVM bug.