public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/110104] New: gcc produces sub-optimal code for _addcarry_u64 chain
@ 2023-06-03 13:53 slash.tmp at free dot fr
  2023-06-04 16:12 ` [Bug target/110104] " roger at nextmovesoftware dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: slash.tmp at free dot fr @ 2023-06-03 13:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104

            Bug ID: 110104
           Summary: gcc produces sub-optimal code for _addcarry_u64 chain
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: slash.tmp at free dot fr
  Target Milestone: ---

Consider the following code:

#include <x86intrin.h>
typedef unsigned long long u64;
typedef unsigned __int128 u128;
void testcase1(u64 *acc, u64 a, u64 b)
{
  u128 res = (u128)a*b;
  u64 lo = res, hi = res >> 64;
  unsigned char cf = 0;
  cf = _addcarry_u64(cf, lo, acc[0], acc+0);
  cf = _addcarry_u64(cf, hi, acc[1], acc+1);
  cf = _addcarry_u64(cf,  0, acc[2], acc+2);
}
void testcase2(u64 *acc, u64 a, u64 b)
{
  u128 res = (u128)a * b;
  u64 lo = res, hi = res >> 64;
  asm("add %[LO], %[D0]\n\t" "adc %[HI], %[D1]\n\t" "adc $0, %[D2]" :
  [D0] "+m" (acc[0]), [D1] "+m" (acc[1]), [D2] "+m" (acc[2]) :
  [LO] "r" (lo), [HI] "r" (hi) : "cc");
}

Compilation with either
gcc-trunk -Wall -Wextra -O3 -S testcase.c
gcc-trunk -Wall -Wextra -Os -S testcase.c
generate the same code:

// rdi = acc, rsi = a, rdx = b

testcase1:
  movq %rsi, %rax
  mulq %rdx
  addq %rax, (%rdi)
  movq %rdx, %rax
  adcq 8(%rdi), %rax
  adcq $0, 16(%rdi)
  movq %rax, 8(%rdi)
  ret

testcase2:
  movq %rsi, %rax       ; rax = rsi = a
  mulq %rdx             ; rdx:rax = rax*rdx = a*b
  add %rax, (%rdi)      ; acc[0] += lo
  adc %rdx, 8(%rdi)     ; acc[1] += hi + cf
  adc $0, 16(%rdi)      ; acc[2] += cf
  ret


Conclusion:
gcc generates the expected code for testcase2.
However, the code for testcase1 is sub-optimal.

  movq %rdx, %rax
  adcq 8(%rdi), %rax
  movq %rax, 8(%rdi)

instead of

  adc %rdx, 8(%rdi)     ; acc[1] += hi + cf


The copy of rdx to rax is useless.
The (load/add+store) ops can be merged into an load/add/store op.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-07-07 11:29 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-03 13:53 [Bug target/110104] New: gcc produces sub-optimal code for _addcarry_u64 chain slash.tmp at free dot fr
2023-06-04 16:12 ` [Bug target/110104] " roger at nextmovesoftware dot com
2023-06-08 20:02 ` roger at nextmovesoftware dot com
2023-06-14  9:22 ` slash.tmp at free dot fr
2023-06-15  7:37 ` jakub at gcc dot gnu.org
2023-06-16 13:50 ` slash.tmp at free dot fr
2023-07-07 11:29 ` slash.tmp at free dot fr

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).