public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions
@ 2024-02-18 19:07 janschultke at googlemail dot com
2024-02-18 19:20 ` [Bug middle-end/113982] " pinskia at gcc dot gnu.org
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: janschultke at googlemail dot com @ 2024-02-18 19:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
Bug ID: 113982
Summary: Poor codegen for 64-bit add with carry widening
functions
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: janschultke at googlemail dot com
Target Milestone: ---
I was trying to get optimal codegen for a 64-bit addition with a carry, but
it's tough to do with GCC:
> struct add_result {
> unsigned long long sum;
> bool carry;
> };
>
> add_result add_wide_1(unsigned long long x, unsigned long long y) {
> auto r = (unsigned __int128) x + y;
> return add_result{static_cast<unsigned long long>(r), bool(r >> 64)};
> }
>
> add_result add_wide_2(unsigned long long x, unsigned long long y) {
> unsigned long long r;
> bool carry = __builtin_add_overflow(x, y, &r);
> return add_result{r, carry};
> }
## Expected output (clang -march=x86-64-v4 -O3)
add_wide_1(unsigned long long, unsigned long long):
mov rax, rdi
add rax, rsi
setb dl
ret
add_wide_2(unsigned long long, unsigned long long):
mov rax, rdi
add rax, rsi
setb dl
ret
## Actual output (GCC -march=x86-64-v4 -O3) (https://godbolt.org/z/qGc9WeEvK)
add_wide_1(unsigned long long, unsigned long long):
mov rcx, rdi
lea rax, [rdi+rsi]
xor edx, edx
xor edi, edi
add rsi, rcx
adc rdi, 0
mov dl, dil
and dl, 1
ret
add_wide_2(unsigned long long, unsigned long long):
add rdi, rsi
mov edx, 0
mov rax, rdi
setc dl
ret
The output for the 128-bit version looks pretty bad.
It looks like GCC isn't aware that we only access the carry bit, so it doesn't
need to do full 128-bit arithmetic so to speak.
The add_wide_2 output also isn't optimal. Why would it output "mov edx, 0"
instead of "xor edx, edx"?
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
@ 2024-02-18 19:20 ` pinskia at gcc dot gnu.org
2024-02-18 19:23 ` pinskia at gcc dot gnu.org
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-18 19:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|target |middle-end
Last reconfirmed| |2024-02-18
Severity|normal |enhancement
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
aarch64 looks fine (ok there is an one extra mov):
```
add_wide_1(unsigned long long, unsigned long long):
adds x2, x0, x1
mov x0, x2
cset x1, cs
ret
add_wide_2(unsigned long long, unsigned long long):
adds x0, x0, x1
cset x1, cs
ret
```
So we have:
```
_1 = (__int128 unsigned) x_6(D);
_2 = (__int128 unsigned) y_7(D);
r_8 = _1 + _2;
_3 = x_6(D) + y_7(D);
D.2566.sum = _3;
_4 = r_8 >> 64;
_5 = (bool) _4;
D.2566.carry = _5;
```
So if we should convert _5 into:
_t = .ADD_OVERFLOW (x_3(D), y_4(D));
_t2 = IMAGPART_EXPR <_t>;
_5 = (bool) _t2;
And then later on see that r_8 is REALPART_EXPR<_t>;
It would just work ...
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
2024-02-18 19:20 ` [Bug middle-end/113982] " pinskia at gcc dot gnu.org
@ 2024-02-18 19:23 ` pinskia at gcc dot gnu.org
2024-02-18 19:42 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-18 19:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note some of this is due to return register issues.
If we instead do stores:
```
add_wide_1(unsigned long long, unsigned long long, add_result*):
mov rax, rdi
lea rcx, [rdi+rsi]
xor edi, edi
add rsi, rax
mov QWORD PTR [rdx], rcx
adc rdi, 0
mov BYTE PTR [rdx+8], dil
and BYTE PTR [rdx+8], 1
ret
```
Which is better.
add_wide_2 becomes:
```
add_wide_2(unsigned long long, unsigned long long, add_result*):
add rdi, rsi
setc BYTE PTR [rdx+8]
mov QWORD PTR [rdx], rdi
and BYTE PTR [rdx+8], 1
ret
```
Which is better, the extra and is filed already though I can't seem to find it.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
2024-02-18 19:20 ` [Bug middle-end/113982] " pinskia at gcc dot gnu.org
2024-02-18 19:23 ` pinskia at gcc dot gnu.org
@ 2024-02-18 19:42 ` jakub at gcc dot gnu.org
2024-02-19 8:16 ` rguenth at gcc dot gnu.org
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-18 19:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
We pattern recognize just something like (or r < y):
add_result add_wide_3(unsigned long long x, unsigned long long y) {
auto r = x + y;
return add_result{r, r < x};
}
and not doing the addition uselessly in twice as big mode + shift + comparison
against 0.
The reason for mov edx, 0 is not to clobber flags which are live at that point,
of course one could also move the clearing of edx one insn earlier and then it
could be xor edx, edx.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
` (2 preceding siblings ...)
2024-02-18 19:42 ` jakub at gcc dot gnu.org
@ 2024-02-19 8:16 ` rguenth at gcc dot gnu.org
2024-02-19 8:22 ` jakub at gcc dot gnu.org
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-19 8:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed for more pattern recog.
Possibly documenting GCC recognized idoms for these kind of operations
might be a nice thing to have.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
` (3 preceding siblings ...)
2024-02-19 8:16 ` rguenth at gcc dot gnu.org
@ 2024-02-19 8:22 ` jakub at gcc dot gnu.org
2024-05-10 13:02 ` jakub at gcc dot gnu.org
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-19 8:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
With signed +- overflow not sure what exactly to pattern match, people can be
really creative there. I guess
w = (__int128) x + y;
r = (long long) w;
ovf = (w >> 64) != (w >> 63);
or
w = (__int128) x + y;
r = (long long) w;
ovf = (w >> 63) + (unsigned __int128) 1 <= 1
etc.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
` (4 preceding siblings ...)
2024-02-19 8:22 ` jakub at gcc dot gnu.org
@ 2024-05-10 13:02 ` jakub at gcc dot gnu.org
2024-05-10 16:27 ` jakub at gcc dot gnu.org
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-05-10 13:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, since PR95853 we also recognize bool(r > ~0ULL) as the check rather than
bool(r >> 64).
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
` (5 preceding siblings ...)
2024-05-10 13:02 ` jakub at gcc dot gnu.org
@ 2024-05-10 16:27 ` jakub at gcc dot gnu.org
2024-05-13 9:17 ` cvs-commit at gcc dot gnu.org
2024-05-13 9:25 ` jakub at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-05-10 16:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
Status|NEW |ASSIGNED
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 58179
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58179&action=edit
gcc15-pr113982.patch
Untested fix.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
` (6 preceding siblings ...)
2024-05-10 16:27 ` jakub at gcc dot gnu.org
@ 2024-05-13 9:17 ` cvs-commit at gcc dot gnu.org
2024-05-13 9:25 ` jakub at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-13 9:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
--- Comment #8 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:b621482296f6dec0abb22ed39cc4ce6811535d47
commit r15-427-gb621482296f6dec0abb22ed39cc4ce6811535d47
Author: Jakub Jelinek <jakub@redhat.com>
Date: Mon May 13 11:15:27 2024 +0200
tree-ssa-math-opts: Pattern recognize yet another .ADD_OVERFLOW pattern
[PR113982]
We pattern recognize already many different patterns, and closest to the
requested one also
yc = (type) y;
zc = (type) z;
x = yc + zc;
w = (typeof_y) x;
if (x > max)
where y/z has the same unsigned type and type is a wider unsigned type
and max is maximum value of the narrower unsigned type.
But apparently people are creative in writing this in diffent ways,
this requests
yc = (type) y;
zc = (type) z;
x = yc + zc;
w = (typeof_y) x;
if (x >> narrower_type_bits)
The following patch implements that.
2024-05-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/113982
* tree-ssa-math-opts.cc (arith_overflow_check_p): Also return 1
for RSHIFT_EXPR by precision of maxval if shift result is only
used in a cast or comparison against zero.
(match_arith_overflow): Handle the RSHIFT_EXPR use case.
* gcc.dg/pr113982.c: New test.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
` (7 preceding siblings ...)
2024-05-13 9:17 ` cvs-commit at gcc dot gnu.org
@ 2024-05-13 9:25 ` jakub at gcc dot gnu.org
8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-05-13 9:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Should be fixed now.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-05-13 9:25 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
2024-02-18 19:20 ` [Bug middle-end/113982] " pinskia at gcc dot gnu.org
2024-02-18 19:23 ` pinskia at gcc dot gnu.org
2024-02-18 19:42 ` jakub at gcc dot gnu.org
2024-02-19 8:16 ` rguenth at gcc dot gnu.org
2024-02-19 8:22 ` jakub at gcc dot gnu.org
2024-05-10 13:02 ` jakub at gcc dot gnu.org
2024-05-10 16:27 ` jakub at gcc dot gnu.org
2024-05-13 9:17 ` cvs-commit at gcc dot gnu.org
2024-05-13 9:25 ` jakub at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).