[Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions
@ 2024-02-18 19:07 janschultke at googlemail dot com
  2024-02-18 19:20 ` [Bug middle-end/113982] " pinskia at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: janschultke at googlemail dot com @ 2024-02-18 19:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

            Bug ID: 113982
           Summary: Poor codegen for 64-bit add with carry widening
                    functions
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: janschultke at googlemail dot com
  Target Milestone: ---

I was trying to get optimal codegen for a 64-bit addition with a carry, but
it's tough to do with GCC:

> struct add_result {
>     unsigned long long sum;
>     bool carry;
> };
> 
> add_result add_wide_1(unsigned long long x, unsigned long long y) {
>     auto r = (unsigned __int128) x + y;
>     return add_result{static_cast<unsigned long long>(r), bool(r >> 64)};
> }
> 
> add_result add_wide_2(unsigned long long x, unsigned long long y) {
>     unsigned long long r;
>     bool carry = __builtin_add_overflow(x, y, &r);
>     return add_result{r, carry};
> }


## Expected output (clang -march=x86-64-v4 -O3)

add_wide_1(unsigned long long, unsigned long long):
        mov     rax, rdi
        add     rax, rsi
        setb    dl
        ret
add_wide_2(unsigned long long, unsigned long long):
        mov     rax, rdi
        add     rax, rsi
        setb    dl
        ret

## Actual output (GCC -march=x86-64-v4 -O3) (https://godbolt.org/z/qGc9WeEvK)

add_wide_1(unsigned long long, unsigned long long):
        mov     rcx, rdi
        lea     rax, [rdi+rsi]
        xor     edx, edx
        xor     edi, edi
        add     rsi, rcx
        adc     rdi, 0
        mov     dl, dil
        and     dl, 1
        ret
add_wide_2(unsigned long long, unsigned long long):
        add     rdi, rsi
        mov     edx, 0
        mov     rax, rdi
        setc    dl
        ret


The output for the 128-bit version looks pretty bad.
It looks like GCC isn't aware that we only access the carry bit, so it doesn't
need to do full 128-bit arithmetic so to speak.

The add_wide_2 output also isn't optimal. Why would it output "mov edx, 0"
instead of "xor edx, edx"?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
  2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
@ 2024-02-18 19:20 ` pinskia at gcc dot gnu.org
  2024-02-18 19:23 ` pinskia at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-18 19:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |middle-end
   Last reconfirmed|                            |2024-02-18
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
aarch64 looks fine (ok there is an one extra mov):
```
add_wide_1(unsigned long long, unsigned long long):
        adds    x2, x0, x1
        mov     x0, x2
        cset    x1, cs
        ret
add_wide_2(unsigned long long, unsigned long long):
        adds    x0, x0, x1
        cset    x1, cs
        ret

```

So we have:
```
  _1 = (__int128 unsigned) x_6(D);
  _2 = (__int128 unsigned) y_7(D);
  r_8 = _1 + _2;
  _3 = x_6(D) + y_7(D);
  D.2566.sum = _3;
  _4 = r_8 >> 64;
  _5 = (bool) _4;
  D.2566.carry = _5;
```

So if we should convert _5 into:
  _t = .ADD_OVERFLOW (x_3(D), y_4(D));
  _t2 = IMAGPART_EXPR <_t>;
  _5 = (bool) _t2;

And then later on see that r_8 is REALPART_EXPR<_t>;
It would just work ...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
  2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
  2024-02-18 19:20 ` [Bug middle-end/113982] " pinskia at gcc dot gnu.org
@ 2024-02-18 19:23 ` pinskia at gcc dot gnu.org
  2024-02-18 19:42 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-18 19:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note some of this is due to return register issues.

If we instead do stores:
```
add_wide_1(unsigned long long, unsigned long long, add_result*):
        mov     rax, rdi
        lea     rcx, [rdi+rsi]
        xor     edi, edi
        add     rsi, rax
        mov     QWORD PTR [rdx], rcx
        adc     rdi, 0
        mov     BYTE PTR [rdx+8], dil
        and     BYTE PTR [rdx+8], 1
        ret
```

Which is better.

add_wide_2 becomes:
```
add_wide_2(unsigned long long, unsigned long long, add_result*):
        add     rdi, rsi
        setc    BYTE PTR [rdx+8]
        mov     QWORD PTR [rdx], rdi
        and     BYTE PTR [rdx+8], 1
        ret
```

Which is better, the extra and is filed already though I can't seem to find it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
  2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
  2024-02-18 19:20 ` [Bug middle-end/113982] " pinskia at gcc dot gnu.org
  2024-02-18 19:23 ` pinskia at gcc dot gnu.org
@ 2024-02-18 19:42 ` jakub at gcc dot gnu.org
  2024-02-19  8:16 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-18 19:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
We pattern recognize just something like (or r < y):
add_result add_wide_3(unsigned long long x, unsigned long long y) {
    auto r = x + y;
    return add_result{r, r < x};
}
and not doing the addition uselessly in twice as big mode + shift + comparison
against 0.
The reason for mov edx, 0 is not to clobber flags which are live at that point,
of course one could also move the clearing of edx one insn earlier and then it
could be xor edx, edx.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
  2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
                   ` (2 preceding siblings ...)
  2024-02-18 19:42 ` jakub at gcc dot gnu.org
@ 2024-02-19  8:16 ` rguenth at gcc dot gnu.org
  2024-02-19  8:22 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-02-19  8:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed for more pattern recog.

Possibly documenting GCC recognized idoms for these kind of operations
might be a nice thing to have.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
  2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
                   ` (3 preceding siblings ...)
  2024-02-19  8:16 ` rguenth at gcc dot gnu.org
@ 2024-02-19  8:22 ` jakub at gcc dot gnu.org
  2024-05-10 13:02 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-02-19  8:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
With signed +- overflow not sure what exactly to pattern match, people can be
really creative there.  I guess
w = (__int128) x + y;
r = (long long) w;
ovf = (w >> 64) != (w >> 63);
or
w = (__int128) x + y;
r = (long long) w;
ovf = (w >> 63) + (unsigned __int128) 1 <= 1
etc.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
  2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
                   ` (4 preceding siblings ...)
  2024-02-19  8:22 ` jakub at gcc dot gnu.org
@ 2024-05-10 13:02 ` jakub at gcc dot gnu.org
  2024-05-10 16:27 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-05-10 13:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Note, since PR95853 we also recognize bool(r > ~0ULL) as the check rather than
bool(r >> 64).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
  2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
                   ` (5 preceding siblings ...)
  2024-05-10 13:02 ` jakub at gcc dot gnu.org
@ 2024-05-10 16:27 ` jakub at gcc dot gnu.org
  2024-05-13  9:17 ` cvs-commit at gcc dot gnu.org
  2024-05-13  9:25 ` jakub at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-05-10 16:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 58179
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58179&action=edit
gcc15-pr113982.patch

Untested fix.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
  2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
                   ` (6 preceding siblings ...)
  2024-05-10 16:27 ` jakub at gcc dot gnu.org
@ 2024-05-13  9:17 ` cvs-commit at gcc dot gnu.org
  2024-05-13  9:25 ` jakub at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-13  9:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

--- Comment #8 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:b621482296f6dec0abb22ed39cc4ce6811535d47

commit r15-427-gb621482296f6dec0abb22ed39cc4ce6811535d47
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Mon May 13 11:15:27 2024 +0200

    tree-ssa-math-opts: Pattern recognize yet another .ADD_OVERFLOW pattern
[PR113982]

    We pattern recognize already many different patterns, and closest to the
    requested one also
       yc = (type) y;
       zc = (type) z;
       x = yc + zc;
       w = (typeof_y) x;
       if (x > max)
    where y/z has the same unsigned type and type is a wider unsigned type
    and max is maximum value of the narrower unsigned type.
    But apparently people are creative in writing this in diffent ways,
    this requests
       yc = (type) y;
       zc = (type) z;
       x = yc + zc;
       w = (typeof_y) x;
       if (x >> narrower_type_bits)

    The following patch implements that.

    2024-05-13  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/113982
            * tree-ssa-math-opts.cc (arith_overflow_check_p): Also return 1
            for RSHIFT_EXPR by precision of maxval if shift result is only
            used in a cast or comparison against zero.
            (match_arith_overflow): Handle the RSHIFT_EXPR use case.

            * gcc.dg/pr113982.c: New test.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug middle-end/113982] Poor codegen for 64-bit add with carry widening functions
  2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
                   ` (7 preceding siblings ...)
  2024-05-13  9:17 ` cvs-commit at gcc dot gnu.org
@ 2024-05-13  9:25 ` jakub at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-05-13  9:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113982

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Should be fixed now.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-05-13  9:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-18 19:07 [Bug c++/113982] New: Poor codegen for 64-bit add with carry widening functions janschultke at googlemail dot com
2024-02-18 19:20 ` [Bug middle-end/113982] " pinskia at gcc dot gnu.org
2024-02-18 19:23 ` pinskia at gcc dot gnu.org
2024-02-18 19:42 ` jakub at gcc dot gnu.org
2024-02-19  8:16 ` rguenth at gcc dot gnu.org
2024-02-19  8:22 ` jakub at gcc dot gnu.org
2024-05-10 13:02 ` jakub at gcc dot gnu.org
2024-05-10 16:27 ` jakub at gcc dot gnu.org
2024-05-13  9:17 ` cvs-commit at gcc dot gnu.org
2024-05-13  9:25 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).