public inbox for gcc-bugs@sourceware.org
* [Bug target/110104] New: gcc produces sub-optimal code for _addcarry_u64 chain
@ 2023-06-03 13:53 slash.tmp at free dot fr
  2023-06-04 16:12 ` [Bug target/110104] " roger at nextmovesoftware dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: slash.tmp at free dot fr @ 2023-06-03 13:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104

            Bug ID: 110104
           Summary: gcc produces sub-optimal code for _addcarry_u64 chain
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: slash.tmp at free dot fr
  Target Milestone: ---

Consider the following code:

#include <x86intrin.h>
typedef unsigned long long u64;
typedef unsigned __int128 u128;
void testcase1(u64 *acc, u64 a, u64 b)
{
  u128 res = (u128)a*b;
  u64 lo = res, hi = res >> 64;
  unsigned char cf = 0;
  cf = _addcarry_u64(cf, lo, acc[0], acc+0);
  cf = _addcarry_u64(cf, hi, acc[1], acc+1);
  cf = _addcarry_u64(cf,  0, acc[2], acc+2);
}
void testcase2(u64 *acc, u64 a, u64 b)
{
  u128 res = (u128)a * b;
  u64 lo = res, hi = res >> 64;
  asm("add %[LO], %[D0]\n\t" "adc %[HI], %[D1]\n\t" "adc $0, %[D2]" :
  [D0] "+m" (acc[0]), [D1] "+m" (acc[1]), [D2] "+m" (acc[2]) :
  [LO] "r" (lo), [HI] "r" (hi) : "cc");
}

Compiling with either
gcc-trunk -Wall -Wextra -O3 -S testcase.c
or
gcc-trunk -Wall -Wextra -Os -S testcase.c
generates the same code:

// rdi = acc, rsi = a, rdx = b

testcase1:
  movq %rsi, %rax
  mulq %rdx
  addq %rax, (%rdi)
  movq %rdx, %rax
  adcq 8(%rdi), %rax
  adcq $0, 16(%rdi)
  movq %rax, 8(%rdi)
  ret

testcase2:
  movq %rsi, %rax       ; rax = rsi = a
  mulq %rdx             ; rdx:rax = rax*rdx = a*b
  add %rax, (%rdi)      ; acc[0] += lo
  adc %rdx, 8(%rdi)     ; acc[1] += hi + cf
  adc $0, 16(%rdi)      ; acc[2] += cf
  ret


Conclusion:
gcc generates the expected code for testcase2.
However, the code for testcase1 is sub-optimal.

  movq %rdx, %rax
  adcq 8(%rdi), %rax
  movq %rax, 8(%rdi)

instead of

  adc %rdx, 8(%rdi)     ; acc[1] += hi + cf


The copy of rdx to rax is unnecessary.
The separate load/add and store ops can be merged into a single load/add/store op.
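
For readers who want to check that both testcases compute the same thing, here is a minimal reference sketch of the intended arithmetic: a 192-bit accumulator `acc[0..2]` (least significant limb first) is incremented by the 128-bit product `a*b`. The name `mac_ref` is hypothetical and not part of the original report; the sketch assumes a 64-bit GCC or Clang target where `unsigned __int128` is available, and it makes no claim about the code the compiler generates for it.

```c
typedef unsigned long long u64;
typedef unsigned __int128 u128;

/* Hypothetical reference: acc (three 64-bit limbs) += a * b.
   Carries are tracked in the upper half of a 128-bit temporary
   instead of the CPU carry flag. */
static void mac_ref(u64 acc[3], u64 a, u64 b)
{
  u128 res = (u128)a * b;
  u64 lo = (u64)res, hi = (u64)(res >> 64);

  u128 t = (u128)acc[0] + lo;            /* limb 0: acc[0] + lo          */
  acc[0] = (u64)t;

  t = (u128)acc[1] + hi + (u64)(t >> 64); /* limb 1: acc[1] + hi + carry  */
  acc[1] = (u64)t;

  acc[2] += (u64)(t >> 64);               /* limb 2: acc[2] + carry       */
}
```

Running testcase1 and testcase2 on the same inputs as `mac_ref` should leave identical values in `acc`; a carry out of `acc[2]` is dropped in all three versions.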

* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: roger at nextmovesoftware dot com @ 2023-06-04 16:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                       |Added
------------------------------------------------------------------------------
                 CC|                              |roger at nextmovesoftware dot com
             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1
           Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com
   Last reconfirmed|                              |2023-06-04

* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: roger at nextmovesoftware dot com @ 2023-06-08 20:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                          |Added
------------------------------------------------------------------------------
             Status|ASSIGNED                         |NEW
           Assignee|roger at nextmovesoftware dot com |unassigned at gcc dot gnu.org

--- Comment #1 from Roger Sayle <roger at nextmovesoftware dot com> ---
I proposed a fix at
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620823.html
but this was obsoleted by a much more comprehensive patch (for PR79193)
proposed by Jakub just an hour earlier:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620821.html

* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: slash.tmp at free dot fr @ 2023-06-14  9:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104

--- Comment #2 from Mason <slash.tmp at free dot fr> ---
You meant PR79173 ;)

Latest update:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621554.html

I didn't see my testcase specifically in Jakub's patch,
but I'll test trunk on godbolt when/if the patch lands.

* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: jakub at gcc dot gnu.org @ 2023-06-15  7:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC trunk now emits
        movq    %rsi, %rax
        mulq    %rdx
        addq    %rax, (%rdi)
        adcq    %rdx, 8(%rdi)
        adcq    $0, 16(%rdi)
        ret

* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: slash.tmp at free dot fr @ 2023-06-16 13:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104

--- Comment #4 from Mason <slash.tmp at free dot fr> ---
I confirm that trunk now emits the same code for testcase1 and testcase2.
Thanks Jakub and Roger, great work!

* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: slash.tmp at free dot fr @ 2023-07-07 11:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104

--- Comment #5 from Mason <slash.tmp at free dot fr> ---
FWIW, trunk (gcc14) translates testcase3 to the same code as the other
testcases, while remaining portable across all architectures:

$ gcc-trunk -O3 -march=bdver3 testcase3.c

typedef unsigned long long u64;
typedef unsigned __int128 u128;
void testcase3(u64 *acc, u64 a, u64 b)
{
  int c1, c2;
  u128 res = (u128)a * b;
  u64 lo = res, hi = res >> 64;
  c1 = __builtin_add_overflow(lo, acc[0], &acc[0]);
  c2 = __builtin_add_overflow(hi, acc[1], &acc[1])
     | __builtin_add_overflow(c1, acc[1], &acc[1]);
       __builtin_add_overflow(c2, acc[2], &acc[2]);
}

testcase3:
        movq    %rsi, %rax
        mulq    %rdx
        addq    %rax, (%rdi)
        adcq    %rdx, 8(%rdi)
        adcq    $0, 16(%rdi)
        ret

Thanks again, Jakub.
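
The two-step overflow pattern in testcase3 can be factored into a small add-with-carry helper, sketched below. The names `addc` and `mac_addc` are hypothetical and do not appear in the thread. The two overflow results can be combined with `|` because at most one of the two additions can wrap for a carry-in of 0 or 1. Whether a particular compiler version turns this into an add/adc chain is an assumption to verify by inspecting the generated assembly; the sketch only shows the source-level idiom and assumes a 64-bit GCC or Clang target.

```c
typedef unsigned long long u64;
typedef unsigned __int128 u128;

/* Hypothetical helper: returns x + y + carry_in (mod 2^64) and
   stores the carry-out (0 or 1) in *carry_out. */
static inline u64 addc(u64 x, u64 y, u64 carry_in, u64 *carry_out)
{
  u64 sum;
  u64 c1 = __builtin_add_overflow(x, y, &sum);
  u64 c2 = __builtin_add_overflow(sum, carry_in, &sum);
  *carry_out = c1 | c2;   /* at most one of c1, c2 is set */
  return sum;
}

/* Same arithmetic as testcase3: acc (three limbs) += a * b. */
static void mac_addc(u64 acc[3], u64 a, u64 b)
{
  u128 res = (u128)a * b;
  u64 lo = (u64)res, hi = (u64)(res >> 64);
  u64 carry;

  acc[0] = addc(acc[0], lo, 0, &carry);
  acc[1] = addc(acc[1], hi, carry, &carry);
  acc[2] = addc(acc[2], 0, carry, &carry);  /* final carry-out is dropped */
}
```

Compared with testcase3, the helper keeps each limb update on one line and makes the carry propagation explicit, at the cost of relying on inlining.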
