public inbox for gcc-bugs@sourceware.org
* [Bug target/110104] New: gcc produces sub-optimal code for _addcarry_u64 chain
From: slash.tmp at free dot fr @ 2023-06-03 13:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
Bug ID: 110104
Summary: gcc produces sub-optimal code for _addcarry_u64 chain
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: slash.tmp at free dot fr
Target Milestone: ---
Consider the following code:
#include <x86intrin.h>
typedef unsigned long long u64;
typedef unsigned __int128 u128;
void testcase1(u64 *acc, u64 a, u64 b)
{
    u128 res = (u128)a * b;
    u64 lo = res, hi = res >> 64;
    unsigned char cf = 0;
    cf = _addcarry_u64(cf, lo, acc[0], acc+0);
    cf = _addcarry_u64(cf, hi, acc[1], acc+1);
    cf = _addcarry_u64(cf, 0, acc[2], acc+2);
}

void testcase2(u64 *acc, u64 a, u64 b)
{
    u128 res = (u128)a * b;
    u64 lo = res, hi = res >> 64;
    asm("add %[LO], %[D0]\n\t" "adc %[HI], %[D1]\n\t" "adc $0, %[D2]" :
        [D0] "+m" (acc[0]), [D1] "+m" (acc[1]), [D2] "+m" (acc[2]) :
        [LO] "r" (lo), [HI] "r" (hi) : "cc");
}
Compiling with either
gcc-trunk -Wall -Wextra -O3 -S testcase.c
gcc-trunk -Wall -Wextra -Os -S testcase.c
generates the same code:
// rdi = acc, rsi = a, rdx = b
testcase1:
movq %rsi, %rax
mulq %rdx
addq %rax, (%rdi)
movq %rdx, %rax
adcq 8(%rdi), %rax
adcq $0, 16(%rdi)
movq %rax, 8(%rdi)
ret
testcase2:
movq %rsi, %rax ; rax = rsi = a
mulq %rdx ; rdx:rax = rax*rdx = a*b
add %rax, (%rdi) ; acc[0] += lo
adc %rdx, 8(%rdi) ; acc[1] += hi + cf
adc $0, 16(%rdi) ; acc[2] += cf
ret
Conclusion:
gcc generates the expected code for testcase2.
However, the code for testcase1 is sub-optimal: it emits
movq %rdx, %rax
adcq 8(%rdi), %rax
movq %rax, 8(%rdi)
instead of
adc %rdx, 8(%rdi) ; acc[1] += hi + cf
The copy of rdx to rax is unnecessary.
The separate load/add and store ops can be merged into a single load/add/store op.
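For reference, the arithmetic both testcases are meant to perform can be spelled out with plain 128-bit integers. This is only a sketch to pin down the expected result; it is not part of the original report, and the names mac192_ref and mac192_ref_check are made up for illustration. It assumes a compiler that provides unsigned __int128 (GCC or Clang on a 64-bit target).

```c
#include <assert.h>

typedef unsigned long long u64;
typedef unsigned __int128 u128;

/* Hypothetical reference model (not from the report):
   acc[0..2] is a 192-bit little-endian accumulator, and acc += a * b. */
static void mac192_ref(u64 *acc, u64 a, u64 b)
{
    u128 res = (u128)a * b;
    u128 t;

    t = (u128)acc[0] + (u64)res;                          /* limb 0; carry lands in bit 64 */
    acc[0] = (u64)t;
    t = (u128)acc[1] + (u64)(res >> 64) + (u64)(t >> 64); /* limb 1 plus carry-in */
    acc[1] = (u64)t;
    acc[2] += (u64)(t >> 64);                             /* limb 2 absorbs the last carry */
}

/* The carry has to ripple through all three limbs here. */
static void mac192_ref_check(void)
{
    u64 acc[3] = { ~0ULL, ~0ULL, 0 };

    mac192_ref(acc, 1, 1);
    assert(acc[0] == 0 && acc[1] == 0 && acc[2] == 1);
}
```

Any of the testcases can be compared against this model on the same inputs; the model makes no claim about what code the compiler generates for it.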
* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: roger at nextmovesoftware dot com @ 2023-06-04 16:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |roger at nextmovesoftware dot com
Status|UNCONFIRMED |ASSIGNED
Ever confirmed|0 |1
Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com
Last reconfirmed| |2023-06-04
* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: roger at nextmovesoftware dot com @ 2023-06-08 20:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEW
Assignee|roger at nextmovesoftware dot com |unassigned at gcc dot gnu.org
--- Comment #1 from Roger Sayle <roger at nextmovesoftware dot com> ---
I proposed a fix at
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620823.html
but this was obsoleted by a much more comprehensive patch (for PR79193)
proposed by Jakub just an hour earlier:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620821.html
* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: slash.tmp at free dot fr @ 2023-06-14 9:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
--- Comment #2 from Mason <slash.tmp at free dot fr> ---
You meant PR79173 ;)
Latest update:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621554.html
I didn't see my testcase specifically in Jakub's patch,
but I'll test trunk on godbolt when/if the patch lands.
* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: jakub at gcc dot gnu.org @ 2023-06-15 7:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC trunk now emits
movq %rsi, %rax
mulq %rdx
addq %rax, (%rdi)
adcq %rdx, 8(%rdi)
adcq $0, 16(%rdi)
ret
* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: slash.tmp at free dot fr @ 2023-06-16 13:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
--- Comment #4 from Mason <slash.tmp at free dot fr> ---
I confirm that trunk now emits the same code for testcase1 and testcase2.
Thanks Jakub and Roger, great work!
* [Bug target/110104] gcc produces sub-optimal code for _addcarry_u64 chain
From: slash.tmp at free dot fr @ 2023-07-07 11:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110104
--- Comment #5 from Mason <slash.tmp at free dot fr> ---
FWIW, trunk (gcc14) translates testcase3 to the same code as the other
testcases, while remaining portable across all architectures:
$ gcc-trunk -O3 -march=bdver3 testcase3.c
typedef unsigned long long u64;
typedef unsigned __int128 u128;
void testcase3(u64 *acc, u64 a, u64 b)
{
    int c1, c2;
    u128 res = (u128)a * b;
    u64 lo = res, hi = res >> 64;
    c1 = __builtin_add_overflow(lo, acc[0], &acc[0]);
    c2 = __builtin_add_overflow(hi, acc[1], &acc[1])
       | __builtin_add_overflow(c1, acc[1], &acc[1]);
    __builtin_add_overflow(c2, acc[2], &acc[2]);
}
testcase3:
movq %rsi, %rax
mulq %rdx
addq %rax, (%rdi)
adcq %rdx, 8(%rdi)
adcq $0, 16(%rdi)
ret
Thanks again, Jakub.
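The add-with-carry pattern used in testcase3 can also be factored into a small helper, which keeps each limb update to one call. The following is a sketch only: addc64, mac192_portable and mac192_portable_check are hypothetical names, not GCC builtins, and nothing in this thread confirms that this exact form compiles to the same add/adc/adc sequence. It relies on __builtin_add_overflow and unsigned __int128, so it assumes GCC or Clang on a 64-bit target.

```c
#include <assert.h>

typedef unsigned long long u64;
typedef unsigned __int128 u128;

/* Hypothetical helper: *out = x + y + carry_in, returns the carry-out (0 or 1).
   At most one of the two additions can overflow, so OR-ing the flags is enough. */
static inline unsigned addc64(u64 x, u64 y, unsigned carry_in, u64 *out)
{
    u64 t;
    unsigned c1 = __builtin_add_overflow(x, y, &t);
    unsigned c2 = __builtin_add_overflow(t, (u64)carry_in, out);

    return c1 | c2;
}

/* Same operation as the testcases: 192-bit acc += a * b. */
static void mac192_portable(u64 *acc, u64 a, u64 b)
{
    u128 res = (u128)a * b;
    unsigned c;

    c = addc64(acc[0], (u64)res, 0, &acc[0]);
    c = addc64(acc[1], (u64)(res >> 64), c, &acc[1]);
    (void)addc64(acc[2], 0, c, &acc[2]);
}

/* The carry has to ripple through all three limbs here. */
static void mac192_portable_check(void)
{
    u64 acc[3] = { ~0ULL, ~0ULL, 0 };

    mac192_portable(acc, 1, 1);
    assert(acc[0] == 0 && acc[1] == 0 && acc[2] == 1);
}
```

The helper keeps the intermediate sum in a local instead of writing the destination limb twice; that is a choice made for this sketch, and the generated code should be checked on the compiler version of interest.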