public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types)
@ 2020-12-24 16:03 gabravier at gmail dot com
2020-12-31 17:34 ` [Bug target/98438] " tkoenig at gcc dot gnu.org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: gabravier at gmail dot com @ 2020-12-24 16:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438
Bug ID: 98438
Summary: Rather bad optimization of midpoint implementation for
__int128 (and other types)
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
_Tp midpoint(_Tp __a, _Tp __b) noexcept
{
using _Up = std::make_unsigned_t<_Tp>;
constexpr _Up __bitshift = std::numeric_limits<_Up>::digits - 1;
_Up __diff = _Up(__b) - _Up(__a);
_Up __sign_bit = __b < __a;
_Up __half_diff = (__diff / 2) + (__sign_bit << __bitshift) + (__sign_bit &
__diff);
return __a + __half_diff;
}
For `_Tp` `int`, this results in somewhat bad code generation on x86,
presumably due to the fact that GCC does not seem to know that `sub` generates
flags :
midpoint(int, int):
mov edx, esi
xor ecx, ecx
sub edx, edi
cmp esi, edi
setl cl
mov eax, edx
mov esi, ecx
shr eax
and edx, ecx
sal esi, 31
add edx, edi
add eax, esi
add eax, edx
ret
Whereas LLVM has better results :
midpoint(int, int): # @midpoint(int, int)
xor eax, eax
sub esi, edi
setl al
mov ecx, esi
and esi, eax
shl eax, 31
shr ecx
add eax, edi
add eax, ecx
add eax, esi
ret
This seems to however be even worse with such types as __int128, where this is
the code generation from GCC :
midpoint(__int128, __int128):
mov r10, rdx
mov r11, rcx
mov rax, rcx
push r14
sub r10, rdi
push rbx
mov r8, rdi
mov r9, rsi
sbb r11, rsi
xor ebx, ebx
cmp rdx, rdi
mov ecx, 1
sbb rax, rsi
jl .L2
xor ecx, ecx
.L2:
mov rax, r10
xor esi, esi
mov r14, rcx
mov rdx, r11
shrd rax, r11, 1
sal r14, 63
shr rdx
add rax, rsi
adc rdx, r14
and rcx, r10
mov rsi, rcx
mov rcx, r11
and rcx, rbx
add rsi, r8
pop rbx
pop r14
mov rdi, rcx
adc rdi, r9
add rax, rsi
adc rdx, rdi
ret
And this is the code generation from LLVM :
midpoint(__int128, __int128): # @midpoint(__int128, __int128)
mov rax, rdx
sub rax, rdi
sbb rcx, rsi
mov r8, rax
setl dl
mov r9, rcx
shr r8
shr rcx
shl r9, 63
movzx edx, dl
and eax, edx
shl rdx, 63
or r9, r8
add rdx, rsi
add r9, rdi
adc rdx, rcx
add rax, r9
adc rdx, 0
ret
With the GCC version requiring such a large amount of registers that it even
has to use some callee saved registers.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
@ 2020-12-31 17:34 ` tkoenig at gcc dot gnu.org
2020-12-31 18:38 ` gabravier at gmail dot com
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-12-31 17:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Status|UNCONFIRMED |WAITING
Last reconfirmed| |2020-12-31
--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Could you make this into a self-contained example that compiles cleanly?
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
2020-12-31 17:34 ` [Bug target/98438] " tkoenig at gcc dot gnu.org
@ 2020-12-31 18:38 ` gabravier at gmail dot com
2020-12-31 19:21 ` tkoenig at gcc dot gnu.org
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: gabravier at gmail dot com @ 2020-12-31 18:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438
--- Comment #2 from Gabriel Ravier <gabravier at gmail dot com> ---
Directly based off the emitted GIMPLE, here's a self-contained example that
emits the same assembly I was observing before :
using _Tp = int;
using _Up = unsigned;
_Tp f(_Tp __a, _Tp __b)
{
_Up __sign_bit_12;
_Up __diff_11;
unsigned int __b_0_1;
unsigned int __a_1_2;
bool _3;
unsigned int _4;
unsigned int _5;
unsigned int _7;
unsigned int _8;
unsigned int _13;
_Tp _14;
unsigned int _17;
__b_0_1 = (unsigned int) __b;
__a_1_2 = (unsigned int) __a;
__diff_11 = __b_0_1 - __a_1_2;
_3 = __b < __a;
__sign_bit_12 = (_Up) _3;
_4 = __diff_11 >> 1;
_5 = __sign_bit_12 << 31;
_13 = _4 + _5;
_7 = __diff_11 & __sign_bit_12;
_17 = __a_1_2 + _7;
_8 = _13 + _17;
_14 = (_Tp) _8;
return _14;
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
2020-12-31 17:34 ` [Bug target/98438] " tkoenig at gcc dot gnu.org
2020-12-31 18:38 ` gabravier at gmail dot com
@ 2020-12-31 19:21 ` tkoenig at gcc dot gnu.org
2021-01-01 0:47 ` gabravier at gmail dot com
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-12-31 19:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438
--- Comment #3 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Could you simply post the complete C++ source code that you used
in the original example? This has the advantages of a) making it easier
to modify (for a non-C++-person such as me) and b) of conforming
to the gcc bug reporting guidelines at https://gcc.gnu.org/bugs/#need .
Thanks!
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
` (2 preceding siblings ...)
2020-12-31 19:21 ` tkoenig at gcc dot gnu.org
@ 2021-01-01 0:47 ` gabravier at gmail dot com
2021-01-01 0:53 ` gabravier at gmail dot com
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: gabravier at gmail dot com @ 2021-01-01 0:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438
--- Comment #4 from Gabriel Ravier <gabravier at gmail dot com> ---
Created attachment 49865
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49865&action=edit
Pre-processed source of a file reproducing the bug
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
` (3 preceding siblings ...)
2021-01-01 0:47 ` gabravier at gmail dot com
@ 2021-01-01 0:53 ` gabravier at gmail dot com
2021-01-01 12:50 ` [Bug rtl-optimization/98438] " tkoenig at gcc dot gnu.org
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: gabravier at gmail dot com @ 2021-01-01 0:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438
--- Comment #5 from Gabriel Ravier <gabravier at gmail dot com> ---
If you're wondering, the `using x = y;` syntax is roughly equivalent to
`typedef y x;`, and the code just expects _Up to be the unsigned counterpart of
_Tp, but here are the details as demanded by https://gcc.gnu.org/bugs/#need
(sorry if the preprocessed file is a bit large, I usually use a template for
examining GCC bugs):
Exact version of GCC: gcc version 11.0.0 20201230 (experimental)
(Compiler-Explorer-Build)
System type: Assuming that's the target, x86_64-linux-gnu
Options given when GCC was configured/built: ../gcc-trunk-20201231/configure
--prefix=/opt/compiler-explorer/gcc-build/staging --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap
--enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --enable-clocale=gnu --enable-languages=c,c++,fortran,ada,d
--enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto
--enable-plugins --enable-threads=posix
--with-pkgversion=Compiler-Explorer-Build
Complete command line that triggers the bug: g++ -g -o
/tmp/compiler-explorer-compiler202101-4546-1y1nk07.gl18/output.s -masm=intel -S
-fdiagnostics-color=always -O3 -std=gnu++2a -DNDEBUG -g0
/tmp/compiler-explorer-compiler202101-4546-1y1nk07.gl18/example.cpp
Compiler output (error messages, warnings, etc.): (nothing)
Preprocessed file (*.i*) that triggers the bug: Would have been sent as an
attachment to this message but was accidentally previously sent as an
attachment to a blank message, so you can find it there.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug rtl-optimization/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
` (4 preceding siblings ...)
2021-01-01 0:53 ` gabravier at gmail dot com
@ 2021-01-01 12:50 ` tkoenig at gcc dot gnu.org
2021-01-05 10:00 ` rguenth at gcc dot gnu.org
2021-09-02 8:56 ` pinskia at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2021-01-01 12:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |NEW
Severity|normal |enhancement
Keywords| |ra
Target Milestone|--- |11.0
Component|target |rtl-optimization
--- Comment #6 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Thanks for the expanded test case.
The assembly with gcc 8 for int is identical to that with trunk,
for _int128 it is a bit longer, so this does not appear to be
a regression.
The 128-bit case looks like a problem with register allocation.
Confirming, changing component, adding relevant keywords.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug rtl-optimization/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
` (5 preceding siblings ...)
2021-01-01 12:50 ` [Bug rtl-optimization/98438] " tkoenig at gcc dot gnu.org
@ 2021-01-05 10:00 ` rguenth at gcc dot gnu.org
2021-09-02 8:56 ` pinskia at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-05 10:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target|x86_64 i?86 |x86_64-*-* i?86-*-*
Target Milestone|11.0 |---
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
I wonder whether (_Tp)((type-with-double-size)__a + (..)__b)/2 will be faster
for some types.
Or whether we should transform such pattern to the fancy one.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug rtl-optimization/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
` (6 preceding siblings ...)
2021-01-05 10:00 ` rguenth at gcc dot gnu.org
@ 2021-09-02 8:56 ` pinskia at gcc dot gnu.org
7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-02 8:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2020-12-31 00:00:00 |2021-9-2
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Part of this is register allocation but the __int128_t one seems to be more and
it comes down to how we don't handle double register integer arthematic that
well. On the gimple level we leave it as normal and only split it up during
expansion to RTL.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-09-02 8:56 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
2020-12-31 17:34 ` [Bug target/98438] " tkoenig at gcc dot gnu.org
2020-12-31 18:38 ` gabravier at gmail dot com
2020-12-31 19:21 ` tkoenig at gcc dot gnu.org
2021-01-01 0:47 ` gabravier at gmail dot com
2021-01-01 0:53 ` gabravier at gmail dot com
2021-01-01 12:50 ` [Bug rtl-optimization/98438] " tkoenig at gcc dot gnu.org
2021-01-05 10:00 ` rguenth at gcc dot gnu.org
2021-09-02 8:56 ` pinskia at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).