public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types)
@ 2020-12-24 16:03 gabravier at gmail dot com
  2020-12-31 17:34 ` [Bug target/98438] " tkoenig at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: gabravier at gmail dot com @ 2020-12-24 16:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438

            Bug ID: 98438
           Summary: Rather bad optimization of midpoint implementation for
                    __int128 (and other types)
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

_Tp midpoint(_Tp __a, _Tp __b) noexcept
{
    using _Up = std::make_unsigned_t<_Tp>;
    constexpr _Up __bitshift = std::numeric_limits<_Up>::digits - 1;

    _Up __diff = _Up(__b) - _Up(__a);
    _Up __sign_bit = __b < __a;

    _Up __half_diff = (__diff / 2) + (__sign_bit << __bitshift) + (__sign_bit &
__diff);

    return __a + __half_diff;
}

For `_Tp` `int`, this results in somewhat bad code generation on x86,
presumably due to the fact that GCC does not seem to know that `sub` generates
flags :

midpoint(int, int):
  mov edx, esi
  xor ecx, ecx
  sub edx, edi
  cmp esi, edi
  setl cl
  mov eax, edx
  mov esi, ecx
  shr eax
  and edx, ecx
  sal esi, 31
  add edx, edi
  add eax, esi
  add eax, edx
  ret

Whereas LLVM has better results :

midpoint(int, int): # @midpoint(int, int)
  xor eax, eax
  sub esi, edi
  setl al
  mov ecx, esi
  and esi, eax
  shl eax, 31
  shr ecx
  add eax, edi
  add eax, ecx
  add eax, esi
  ret

This seems to however be even worse with such types as __int128, where this is
the code generation from GCC :

midpoint(__int128, __int128):
  mov r10, rdx
  mov r11, rcx
  mov rax, rcx
  push r14
  sub r10, rdi
  push rbx
  mov r8, rdi
  mov r9, rsi
  sbb r11, rsi
  xor ebx, ebx
  cmp rdx, rdi
  mov ecx, 1
  sbb rax, rsi
  jl .L2
  xor ecx, ecx
.L2:
  mov rax, r10
  xor esi, esi
  mov r14, rcx
  mov rdx, r11
  shrd rax, r11, 1
  sal r14, 63
  shr rdx
  add rax, rsi
  adc rdx, r14
  and rcx, r10
  mov rsi, rcx
  mov rcx, r11
  and rcx, rbx
  add rsi, r8
  pop rbx
  pop r14
  mov rdi, rcx
  adc rdi, r9
  add rax, rsi
  adc rdx, rdi
  ret

And this is the code generation from LLVM :

midpoint(__int128, __int128): # @midpoint(__int128, __int128)
  mov rax, rdx
  sub rax, rdi
  sbb rcx, rsi
  mov r8, rax
  setl dl
  mov r9, rcx
  shr r8
  shr rcx
  shl r9, 63
  movzx edx, dl
  and eax, edx
  shl rdx, 63
  or r9, r8
  add rdx, rsi
  add r9, rdi
  adc rdx, rcx
  add rax, r9
  adc rdx, 0
  ret

With the GCC version requiring such a large amount of registers that it even
has to use some callee saved registers.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
  2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
@ 2020-12-31 17:34 ` tkoenig at gcc dot gnu.org
  2020-12-31 18:38 ` gabravier at gmail dot com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-12-31 17:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2020-12-31

--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Could you make this into a self-contained example that compiles cleanly?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
  2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
  2020-12-31 17:34 ` [Bug target/98438] " tkoenig at gcc dot gnu.org
@ 2020-12-31 18:38 ` gabravier at gmail dot com
  2020-12-31 19:21 ` tkoenig at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: gabravier at gmail dot com @ 2020-12-31 18:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438

--- Comment #2 from Gabriel Ravier <gabravier at gmail dot com> ---
Directly based off the emitted GIMPLE, here's a self-contained example that
emits the same assembly I was observing before :

using _Tp = int;
using _Up = unsigned;
_Tp f(_Tp __a, _Tp __b)
{
  _Up __sign_bit_12;
  _Up __diff_11;
  unsigned int __b_0_1;
  unsigned int __a_1_2;
  bool _3;
  unsigned int _4;
  unsigned int _5;
  unsigned int _7;
  unsigned int _8;
  unsigned int _13;
  _Tp _14;
  unsigned int _17;

  __b_0_1 = (unsigned int) __b;
  __a_1_2 = (unsigned int) __a;
  __diff_11 = __b_0_1 - __a_1_2;
  _3 = __b < __a;
  __sign_bit_12 = (_Up) _3;
  _4 = __diff_11 >> 1;
  _5 = __sign_bit_12 << 31;
  _13 = _4 + _5;
  _7 = __diff_11 & __sign_bit_12;
  _17 = __a_1_2 + _7;
  _8 = _13 + _17;
  _14 = (_Tp) _8;
  return _14;
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
  2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
  2020-12-31 17:34 ` [Bug target/98438] " tkoenig at gcc dot gnu.org
  2020-12-31 18:38 ` gabravier at gmail dot com
@ 2020-12-31 19:21 ` tkoenig at gcc dot gnu.org
  2021-01-01  0:47 ` gabravier at gmail dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-12-31 19:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438

--- Comment #3 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Could you simply post the complete C++ source code that you used
in the original example? This has the advantages of a) making it easier
to modify (for a non-C++-person such as me) and b) of conforming
to the gcc bug reporting guidelines at https://gcc.gnu.org/bugs/#need .

Thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
  2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
                   ` (2 preceding siblings ...)
  2020-12-31 19:21 ` tkoenig at gcc dot gnu.org
@ 2021-01-01  0:47 ` gabravier at gmail dot com
  2021-01-01  0:53 ` gabravier at gmail dot com
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: gabravier at gmail dot com @ 2021-01-01  0:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438

--- Comment #4 from Gabriel Ravier <gabravier at gmail dot com> ---
Created attachment 49865
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49865&action=edit
Pre-processed source of a file reproducing the bug

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug target/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
  2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
                   ` (3 preceding siblings ...)
  2021-01-01  0:47 ` gabravier at gmail dot com
@ 2021-01-01  0:53 ` gabravier at gmail dot com
  2021-01-01 12:50 ` [Bug rtl-optimization/98438] " tkoenig at gcc dot gnu.org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: gabravier at gmail dot com @ 2021-01-01  0:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438

--- Comment #5 from Gabriel Ravier <gabravier at gmail dot com> ---
If you're wondering, the `using x = y;` syntax is roughly equivalent to
`typedef y x;`, and the code just expects _Up to be the unsigned counterpart of
_Tp, but here are the details as demanded by https://gcc.gnu.org/bugs/#need
(sorry if the preprocessed file is a bit large, I usually use a template for
examining GCC bugs):

Exact version of GCC: gcc version 11.0.0 20201230 (experimental)
(Compiler-Explorer-Build)
System type: Assuming that's the target, x86_64-linux-gnu
Options given when GCC was configured/built: ../gcc-trunk-20201231/configure
--prefix=/opt/compiler-explorer/gcc-build/staging --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu --disable-bootstrap
--enable-multiarch --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --enable-clocale=gnu --enable-languages=c,c++,fortran,ada,d
--enable-ld=yes --enable-gold=yes --enable-libstdcxx-debug
--enable-libstdcxx-time=yes --enable-linker-build-id --enable-lto
--enable-plugins --enable-threads=posix
--with-pkgversion=Compiler-Explorer-Build
Complete command line that triggers the bug: g++ -g -o
/tmp/compiler-explorer-compiler202101-4546-1y1nk07.gl18/output.s -masm=intel -S
-fdiagnostics-color=always -O3 -std=gnu++2a -DNDEBUG -g0
/tmp/compiler-explorer-compiler202101-4546-1y1nk07.gl18/example.cpp
Compiler output (error messages, warnings, etc.): (nothing)
Preprocessed file (*.i*) that triggers the bug: Would have been sent as an
attachment to this message but was accidentally previously sent as an
attachment to a blank message, so you can find it there.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
  2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
                   ` (4 preceding siblings ...)
  2021-01-01  0:53 ` gabravier at gmail dot com
@ 2021-01-01 12:50 ` tkoenig at gcc dot gnu.org
  2021-01-05 10:00 ` rguenth at gcc dot gnu.org
  2021-09-02  8:56 ` pinskia at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2021-01-01 12:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
           Severity|normal                      |enhancement
           Keywords|                            |ra
   Target Milestone|---                         |11.0
          Component|target                      |rtl-optimization

--- Comment #6 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Thanks for the expanded test case.

The assembly with gcc 8 for int is identical to that with trunk,
for _int128 it is a bit longer, so this does not appear to be
a regression.

The 128-bit case looks like a problem with register allocation.

Confirming, changing component, adding relevant keywords.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
  2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
                   ` (5 preceding siblings ...)
  2021-01-01 12:50 ` [Bug rtl-optimization/98438] " tkoenig at gcc dot gnu.org
@ 2021-01-05 10:00 ` rguenth at gcc dot gnu.org
  2021-09-02  8:56 ` pinskia at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-05 10:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|x86_64 i?86                 |x86_64-*-* i?86-*-*
   Target Milestone|11.0                        |---

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
I wonder whether (_Tp)((type-with-double-size)__a + (..)__b)/2 will be faster
for some types.

Or whether we should transform such pattern to the fancy one.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug rtl-optimization/98438] Rather bad optimization of midpoint implementation for __int128 (and other types)
  2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
                   ` (6 preceding siblings ...)
  2021-01-05 10:00 ` rguenth at gcc dot gnu.org
@ 2021-09-02  8:56 ` pinskia at gcc dot gnu.org
  7 siblings, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-02  8:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2020-12-31 00:00:00         |2021-9-2

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Part of this is register allocation but the __int128_t one seems to be more and
it comes down to how we don't handle double register integer arthematic that
well.  On the gimple level we leave it as normal and only split it up during
expansion to RTL.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-09-02  8:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-24 16:03 [Bug target/98438] New: Rather bad optimization of midpoint implementation for __int128 (and other types) gabravier at gmail dot com
2020-12-31 17:34 ` [Bug target/98438] " tkoenig at gcc dot gnu.org
2020-12-31 18:38 ` gabravier at gmail dot com
2020-12-31 19:21 ` tkoenig at gcc dot gnu.org
2021-01-01  0:47 ` gabravier at gmail dot com
2021-01-01  0:53 ` gabravier at gmail dot com
2021-01-01 12:50 ` [Bug rtl-optimization/98438] " tkoenig at gcc dot gnu.org
2021-01-05 10:00 ` rguenth at gcc dot gnu.org
2021-09-02  8:56 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).