public inbox for gcc-bugs@sourceware.org
* [Bug target/114319] New: htobe64-like function is not optimized on 32-bit x86
@ 2024-03-12 17:45 pali at kernel dot org
2024-03-12 17:49 ` [Bug middle-end/114319] " pinskia at gcc dot gnu.org
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: pali at kernel dot org @ 2024-03-12 17:45 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319
Bug ID: 114319
Summary: htobe64-like function is not optimized on 32-bit x86
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pali at kernel dot org
Target Milestone: ---
Target: x86
Here is a very simple and straightforward implementation of an htobe64-like
function, which takes a 64-bit number stored in an unsigned long long variable
and encodes it in big-endian order into an unsigned char[] byte buffer.
void test1(unsigned long long val, unsigned char *buf) {
    buf[0] = val >> 56;
    buf[1] = val >> 48;
    buf[2] = val >> 40;
    buf[3] = val >> 32;
    buf[4] = val >> 24;
    buf[5] = val >> 16;
    buf[6] = val >> 8;
    buf[7] = val;
}
Compiling it for 64-bit x86 via "gcc -m64 -O2" produces optimized code:
0000000000000000 <test1>:
0: 48 0f cf bswap %rdi
3: 48 89 3e mov %rdi,(%rsi)
6: c3 retq
But compiling it for 32-bit x86 via "gcc -m32 -O2" produces less optimized
code:
00000000 <test1>:
0: 8b 54 24 08 mov 0x8(%esp),%edx
4: 8b 44 24 0c mov 0xc(%esp),%eax
8: 89 d1 mov %edx,%ecx
a: 88 70 02 mov %dh,0x2(%eax)
d: c1 e9 18 shr $0x18,%ecx
10: 88 50 03 mov %dl,0x3(%eax)
13: 88 08 mov %cl,(%eax)
15: 89 d1 mov %edx,%ecx
17: 8b 54 24 04 mov 0x4(%esp),%edx
1b: c1 e9 10 shr $0x10,%ecx
1e: 0f ca bswap %edx
20: 88 48 01 mov %cl,0x1(%eax)
23: 89 50 04 mov %edx,0x4(%eax)
26: c3 ret
I tried to compile it for 32-bit powerpc via "powerpc-linux-gnu-gcc -m32 -O2"
and it produces optimized code:
00000000 <test1>:
0: 90 65 00 00 stw r3,0(r5)
4: 90 85 00 04 stw r4,4(r5)
8: 4e 80 00 20 blr
Same for 64-bit powerpc via "powerpc-linux-gnu-gcc -m64 -O2":
0000000000000000 <.test1>:
0: f8 64 00 00 std r3,0(r4)
4: 4e 80 00 20 blr
As a next experiment I tried to rewrite the simple implementation to use gcc
builtins.
void test2(unsigned long long val, unsigned char *buf) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    val = __builtin_bswap64(val);
#endif
    __builtin_memcpy(buf, &val, sizeof(val));
}
If I compile it for 32-bit x86 then I get optimized code:
00000030 <test2>:
30: 8b 4c 24 0c mov 0xc(%esp),%ecx
34: 8b 44 24 04 mov 0x4(%esp),%eax
38: 8b 54 24 08 mov 0x8(%esp),%edx
3c: 0f c8 bswap %eax
3e: 89 41 04 mov %eax,0x4(%ecx)
41: 0f ca bswap %edx
43: 89 11 mov %edx,(%ecx)
45: c3 ret
If I compile it for 64-bit x86 then I get exactly the same code as for test1:
0000000000000010 <test2>:
10: 48 0f cf bswap %rdi
13: 48 89 3e mov %rdi,(%rsi)
16: c3 retq
I compiled it for powerpc too, and the results for test1 and test2 were the
same.
So it looks like the issue is specific to 32-bit x86: there gcc does not
detect that the test1 function is doing a bswap64.
I ran all tests with Debian's gcc on amd64; for the powerpc target I used
Debian's powerpc-linux-gnu-gcc cross compiler.
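For anyone who wants to confirm that the two variants really are equivalent before comparing the generated code, here is a minimal sketch. The function names (store_be64_shifts, store_be64_builtin, same_output, check_equivalence) are made up for this illustration, and the builtins assume GCC or a compatible compiler.

```c
#include <assert.h>
#include <string.h>

/* Byte-by-byte big-endian store, same body as test1. */
void store_be64_shifts(unsigned long long val, unsigned char *buf)
{
    buf[0] = val >> 56;
    buf[1] = val >> 48;
    buf[2] = val >> 40;
    buf[3] = val >> 32;
    buf[4] = val >> 24;
    buf[5] = val >> 16;
    buf[6] = val >> 8;
    buf[7] = val;
}

/* Builtin-based store, same body as test2. */
void store_be64_builtin(unsigned long long val, unsigned char *buf)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    val = __builtin_bswap64(val);
#endif
    __builtin_memcpy(buf, &val, sizeof(val));
}

/* Returns 1 if both variants write identical bytes for val. */
int same_output(unsigned long long val)
{
    unsigned char a[8], b[8];

    store_be64_shifts(val, a);
    store_be64_builtin(val, b);
    return memcmp(a, b, sizeof(a)) == 0;
}

/* Spot-check a few values; aborts if the variants ever disagree. */
void check_equivalence(void)
{
    assert(same_output(0ULL));
    assert(same_output(0x0102030405060708ULL));
    assert(same_output(0xffffffff00000000ULL));
}
```

Since both functions store the same bytes, any difference in the disassembly is purely a code generation difference.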
* [Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
From: pinskia at gcc dot gnu.org @ 2024-03-12 17:49 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|target |middle-end
Severity|normal |enhancement
Keywords| |missed-optimization
* [Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
From: pinskia at gcc dot gnu.org @ 2024-03-12 17:56 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to fail| |11.4.0
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>But compiling it for 32-bit x86 via "gcc -m32 -O2" produces less optimized code:
I get that code generation for GCC 11.4.0 and before.
For GCC 12.1.0 and above I get:
```
movl 8(%esp), %ecx
bswap %ecx
movl %ecx, %eax
movl 4(%esp), %ecx
bswap %ecx
movl %ecx, %edx
movl 12(%esp), %ecx
movl %eax, (%ecx)
movl %edx, 4(%ecx)
ret
```
This just has a few extra moves.
With -mno-sse added, however, GCC 12 produces worse code.
* [Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
From: pinskia at gcc dot gnu.org @ 2024-03-12 18:04 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2024-03-12
Target|x86 |ILP32
Blocks| |94094
Ever confirmed|0 |1
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. I see that trunk, even without -mno-sse, does not produce the two bswaps.
It looks like the store-merging pass is not recognizing bswap<<32 for some reason.
Also, I thought there was a dup somewhere ...
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94094
[Bug 94094] [meta-bug] store-merging and/or bswap load/store-merging missed
optimizations
* [Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
From: pali at kernel dot org @ 2024-03-12 18:07 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319
--- Comment #3 from Pali Rohár <pali at kernel dot org> ---
For details, here is the compiler which produces the mentioned code:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14'
--with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-12
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-gcn/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14)
I guess that with these configure options you should be able to compile gcc
which produces the mentioned code.
* [Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
From: pinskia at gcc dot gnu.org @ 2024-03-12 18:10 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Pali Rohár from comment #3)
> --with-arch-32=i686
This basically causes SSE to be disabled for 32bit by default ...
With the default options to configure GCC, -m32 for x86_64 still enables sse
...
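One way to see which of the two configurations a given compiler invocation falls into is to look at the predefined macros. The sketch below assumes GCC's usual __SSE__, __SSE2__ and __i386__ predefines; the helper names (sse_enabled, sse2_enabled, targets_i386, check_sse_macros) are made up for this illustration.

```c
#include <assert.h>

/* Returns 1 when the compiler predefines __SSE__ for this build, else 0. */
int sse_enabled(void)
{
#ifdef __SSE__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when the compiler predefines __SSE2__ for this build, else 0. */
int sse2_enabled(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when compiling for 32-bit x86 (for example with -m32), else 0. */
int targets_i386(void)
{
#ifdef __i386__
    return 1;
#else
    return 0;
#endif
}

/* Sanity check: SSE2 being enabled implies that SSE is enabled too. */
void check_sse_macros(void)
{
    assert(!sse2_enabled() || sse_enabled());
}
```

Building this once with "-m32 -O2" and once with "-m32 -O2 -mno-sse" and comparing what sse_enabled() returns should show which case applies to a particular gcc build.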
* [Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
From: rguenth at gcc dot gnu.org @ 2024-03-13 9:47 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Coalescing successful!
Merged into 1 stores
32 bit bswap implementation found at: _37
It looks like we are only merging one store. Note that we cannot recognize
bswap to memory; this is a known issue. So for the bswap64 we need to
merge to a 64bit store, which we never do on a 32bit platform. We
could with SSE, but apparently we don't try, with the bswap trick
at least. The bswap trick also doesn't seem to consider the split
64bit bswap. Oddly enough, we also fail to merge the other store
(maybe missing a val >> 32 pre-shift "trick").
This could possibly be shown to be a similar issue with a 128bit bswap
on x86_64, which we could emulate with two 64bit bswaps.
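To illustrate what "split 64bit bswap" means at the source level, here is a minimal sketch. It is not the store-merging code itself, and the names bswap64_split and check_bswap64_split are made up for this example. It assumes the GCC builtins __builtin_bswap32 and __builtin_bswap64.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit byte swap built from two 32-bit byte swaps. */
uint64_t bswap64_split(uint64_t val)
{
    uint32_t lo = (uint32_t)val;
    uint32_t hi = (uint32_t)(val >> 32);

    /* The swapped low half becomes the new high half and vice versa. */
    return ((uint64_t)__builtin_bswap32(lo) << 32) | __builtin_bswap32(hi);
}

/* Compare against the 64-bit builtin for one sample value. */
void check_bswap64_split(void)
{
    uint64_t v = 0x0102030405060708ULL;

    assert(bswap64_split(v) == __builtin_bswap64(v));
}
```

On a 32-bit target each half fits in one register, so this form needs only two bswap instructions and two 32-bit stores.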
* [Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
From: cvs-commit at gcc dot gnu.org @ 2024-03-13 14:35 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319
--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:74bca21db31e3f4ab6543b56c3f26b4dfe586fef
commit r14-9453-g74bca21db31e3f4ab6543b56c3f26b4dfe586fef
Author: Jakub Jelinek <jakub@redhat.com>
Date: Wed Mar 13 15:34:59 2024 +0100
store-merging: Match bswap64 on 32-bit targets with bswapsi2 [PR114319]
gimple-ssa-store-merging.cc tests bswap_optab in 3 different places,
in 2 of them it has special exception for double-word bswap using pair
of word-mode bswap optabs, but in the last one it doesn't.
The following patch changes even the last spot.
We don't handle 128-bit bswaps in the passes at all, because currently we
just use uint64_t to represent the byte reshuffling (we'd need to use
offset_int or something like that instead) and we don't have
__builtin_bswap128 nor type-generic __builtin_bswap, so there is nothing
for 64-bit targets there.
2024-03-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114319
* gimple-ssa-store-merging.cc
(imm_store_chain_info::try_coalesce_bswap): For 32-bit targets
allow matching __builtin_bswap64 if there is bswapsi2 optab.
* gcc.target/i386/pr114319.c: New test.
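For reference, the 128-bit case mentioned above would follow the same double-word pattern at the source level. The sketch below is only an illustration and is not part of the patch: struct u128, bswap128_split and check_bswap128_split are hypothetical names, and the 128-bit value is modelled as two 64-bit halves because there is no __builtin_bswap128.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit halves. */
struct u128 {
    uint64_t hi;
    uint64_t lo;
};

/* 128-bit byte swap emulated with two 64-bit byte swaps. */
struct u128 bswap128_split(struct u128 v)
{
    struct u128 r;

    /* Swap the bytes within each half and exchange the halves. */
    r.hi = __builtin_bswap64(v.lo);
    r.lo = __builtin_bswap64(v.hi);
    return r;
}

/* Swapping twice must give back the original value. */
void check_bswap128_split(void)
{
    struct u128 v = { 0x0102030405060708ULL, 0x090a0b0c0d0e0f10ULL };
    struct u128 r = bswap128_split(bswap128_split(v));

    assert(r.hi == v.hi && r.lo == v.lo);
}
```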
* [Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
From: jakub at gcc dot gnu.org @ 2024-03-13 14:38 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
CC| |jakub at gcc dot gnu.org
Status|NEW |RESOLVED
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for GCC 14.
* [Bug middle-end/114319] htobe64-like function is not optimized on 32-bit x86
From: pali at kernel dot org @ 2024-03-13 21:31 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114319
--- Comment #8 from Pali Rohár <pali at kernel dot org> ---
Thanks for the quick response and for fixing this issue.