public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
@ 2020-11-15 21:21 ` already5chosen at yahoo dot com
  2020-11-15 21:24 ` already5chosen at yahoo dot com
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: already5chosen at yahoo dot com @ 2020-11-15 21:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Michael_S <already5chosen at yahoo dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |already5chosen at yahoo dot com

--- Comment #5 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Vincent Lefèvre from comment #0)
> There should be a way to support full add-with-carry and
> subtract-with-borrow by generating adc / sbb instructions on x86_64 (and
> similar instructions on other targets).
> 
> GCC could add builtins, such as __builtin_addc* and __builtin_subc* (two
> arguments, carry in, carry out, and the result), similar to Clang:
> http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-
> builtins
> as suggested in PR 60206 comment 3.
> 
> Detection of special constructs in standard C/... code would be useful too.
> Here are some examples from
> https://gcc.gnu.org/ml/gcc-help/2017-01/msg00067.html for subtraction:
> 
> typedef unsigned long T;
> 
> void sub1 (T *p, T u0, T u1, T u2, T v0, T v1, T v2)
> {
>   T t1;
>   int b0, b1;
> 
>   p[0] = u0 - v0;
>   b0 = u0 < v0;
>   t1 = u1 - v1;
>   b1 = u1 < v1;
>   p[1] = t1 - b0;
>   b1 |= p[1] > t1;
>   p[2] = u2 - v2 - b1;
> }
> 
> void sub2 (T *p, T u0, T u1, T u2, T v0, T v1, T v2)
> {
>   int b0, b1;
> 
>   p[0] = u0 - v0;
>   b0 = u0 < v0;
>   p[1] = u1 - v1 - b0;
>   b1 = u1 < v1 || (u1 == v1 && b0 != 0);
>   p[2] = u2 - v2 - b1;
> }
> 
> In the second example, the b1 line could also be replaced by:
> 
>   b1 = u1 < v1 + b0 || v1 + b0 < v1;
> 
> For the subtractions, optimal code would contain 1 sub and 2 sbb's.

I agree with regard to "other targets", first of all, aarch64, but x86_64
variant of gcc already provides requested functionality in for of
_subborrow_u64 () and _addcarry_u64() intrinsic functions.
The problem here is not lack of functionality, but very poor implementation
(mentioned many times on bugzilla with minimal effect).
In that regard gcc is more than decade behind MSVC and ~4 years behind
clang/llvm. Surprisingly, icc is also quite bad.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
  2020-11-15 21:21 ` [Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others) already5chosen at yahoo dot com
@ 2020-11-15 21:24 ` already5chosen at yahoo dot com
  2020-11-15 21:32 ` jakub at gcc dot gnu.org
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: already5chosen at yahoo dot com @ 2020-11-15 21:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #6 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Marc Glisse from comment #1)
> We could start with the simpler:
> 
> void f(unsigned*__restrict__ r,unsigned*__restrict__ s,unsigned a,unsigned
> b,unsigned c, unsigned d){
>   *r=a+b;
>   *s=c+d+(*r<a);
> }
> 

That works for dual-precision addition, but not for triple or more.

> After combine, we have:
> 
> (insn 34 12 20 2 (set (reg:SI 93 [ _15+4 ])
>         (ltu:SI (reg:CCC 17 flags)
>             (const_int 0 [0]))) 608 {*setcc_si_1_movzbl}
>      (expr_list:REG_DEAD (reg:CCC 17 flags)
>         (nil)))
> 
> (insn 21 20 22 2 (parallel [
>             (set (reg:SI 102)
>                 (plus:SI (reg:SI 37 r8 [ c ])
>                     (reg:SI 38 r9 [ d ])))
>             (clobber (reg:CC 17 flags))
>         ]) "a.c":3 213 {*addsi_1}
>      (expr_list:REG_DEAD (reg:SI 38 r9 [ d ])
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (expr_list:REG_DEAD (reg:SI 37 r8 [ c ])
>                 (nil)))))
> 
> (insn 25 24 26 2 (parallel [
>             (set (reg:SI 105)
>                 (plus:SI (reg:SI 102)
>                     (reg:SI 93 [ _15+4 ])))
>             (clobber (reg:CC 17 flags))
>         ]) "a.c":3 213 {*addsi_1}
>      (expr_list:REG_DEAD (reg:SI 93 [ _15+4 ])
>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>             (expr_list:REG_DEAD (reg:SI 102)
>                 (nil)))))
> 
> The combine dump says "Trying 21, 34 -> 25:" but the next line is blank and
> it moves on to trying something else.
> 
> If I use parentheses *s=c+(d+(*r<a)); and disable reassoc, I get:
> 
> Trying 23, 24 -> 25:
> Successfully matched this instruction:
> (parallel [
>         (set (reg:SI 105)
>             (plus:SI (plus:SI (ltu:SI (reg:CCC 17 flags)
>                         (const_int 0 [0]))
>                     (reg:SI 37 r8 [ c ]))
>                 (reg:SI 38 r9 [ d ])))
>         (clobber (reg:CC 17 flags))
>     ])
> Instruction not appropriate for target.
> 
> I don't know where that target restriction is coming from, but at least we
> seem to be getting somewhere.
> 
> If I remove c and keep *s=d+(*r<a), I get
> 
> Failed to match this instruction:
> (parallel [
>         (set (reg:SI 103)
>             (plus:SI (ltu:SI (reg:CCC 17 flags)
>                     (const_int 0 [0]))
>                 (reg:SI 38 r9 [ d ])))
>         (clobber (reg:CC 17 flags))
>     ])
> Failed to match this instruction:
> (set (reg:SI 103)
>     (plus:SI (ltu:SI (reg:CCC 17 flags)
>             (const_int 0 [0]))
>         (reg:SI 38 r9 [ d ])))
> 
> we would probably need a special pattern for this case, virtually adding 0.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
  2020-11-15 21:21 ` [Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others) already5chosen at yahoo dot com
  2020-11-15 21:24 ` already5chosen at yahoo dot com
@ 2020-11-15 21:32 ` jakub at gcc dot gnu.org
  2020-11-15 22:30 ` already5chosen at yahoo dot com
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-15 21:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Michael_S from comment #5)
> I agree with regard to "other targets", first of all, aarch64, but x86_64
> variant of gcc already provides requested functionality in for of
> _subborrow_u64 () and _addcarry_u64() intrinsic functions.
> The problem here is not lack of functionality, but very poor implementation
> (mentioned many times on bugzilla with minimal effect).
> In that regard gcc is more than decade behind MSVC and ~4 years behind
> clang/llvm. Surprisingly, icc is also quite bad.

Are you sure you have tested gcc trunk?
There have been fixes for this a month ago as part of PR97387 fixes.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2020-11-15 21:32 ` jakub at gcc dot gnu.org
@ 2020-11-15 22:30 ` already5chosen at yahoo dot com
  2020-11-15 22:45 ` already5chosen at yahoo dot com
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: already5chosen at yahoo dot com @ 2020-11-15 22:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #8 from Michael_S <already5chosen at yahoo dot com> ---
(In reply to Jakub Jelinek from comment #7)
> (In reply to Michael_S from comment #5)
> > I agree with regard to "other targets", first of all, aarch64, but x86_64
> > variant of gcc already provides requested functionality in for of
> > _subborrow_u64 () and _addcarry_u64() intrinsic functions.
> > The problem here is not lack of functionality, but very poor implementation
> > (mentioned many times on bugzilla with minimal effect).
> > In that regard gcc is more than decade behind MSVC and ~4 years behind
> > clang/llvm. Surprisingly, icc is also quite bad.
> 
> Are you sure you have tested gcc trunk?
> There have been fixes for this a month ago as part of PR97387 fixes.

I am sure I did not :-)
And I most likely am not going to test trunk, sorry. I'd wait for release.
Essentially I am posting here not because I deeply care about the topic right
now (I did care a years or so ago, but lost interest since then, see
https://www.realworldtech.com/forum/?threadid=188061&curpostid=188061), but
because I stepped on it occasionally while waiting for response to 97832.

So, sorry for intervention.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2020-11-15 22:30 ` already5chosen at yahoo dot com
@ 2020-11-15 22:45 ` already5chosen at yahoo dot com
  2021-05-28 15:34 ` tkoenig at gcc dot gnu.org
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: already5chosen at yahoo dot com @ 2020-11-15 22:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #9 from Michael_S <already5chosen at yahoo dot com> ---
Despite what I wrote above, I did took a look at the trunk on godbolt with same
old code from a year ago. Because it was so easy. And indeed a trunk looks ALOT
better.
But until it's released I wouldn't know if it's actually up to speed of MSVC
and clang.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2020-11-15 22:45 ` already5chosen at yahoo dot com
@ 2021-05-28 15:34 ` tkoenig at gcc dot gnu.org
  2021-09-09  6:48 ` [Bug middle-end/79173] " pinskia at gcc dot gnu.org
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2021-05-28 15:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu.org
   Last reconfirmed|2017-01-23 00:00:00         |2021-5-28

--- Comment #10 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Just had a look at trunk.

It currently produces

adc:
        leaq    800(%rsi), %rcx
        xorl    %edx, %edx
.L2:
        movq    (%rdi), %rax
        addb    $-1, %dl
        adcq    (%rsi), %rax
        setc    %dl
        addq    $8, %rsi
        movq    %rax, (%rdi)
        addq    $8, %rdi
        cmpq    %rcx, %rsi
        jne     .L2
        ret

Clang does

adc:                                    # @adc
        movl    $4, %eax
        xorl    %ecx, %ecx
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        movq    -32(%rsi,%rax,8), %rdx
        addb    $-1, %cl
        adcq    %rdx, -32(%rdi,%rax,8)
        movq    -24(%rsi,%rax,8), %rcx
        adcq    %rcx, -24(%rdi,%rax,8)
        movq    -16(%rsi,%rax,8), %rcx
        adcq    %rcx, -16(%rdi,%rax,8)
        movq    -8(%rsi,%rax,8), %rcx
        adcq    %rcx, -8(%rdi,%rax,8)
        movq    (%rsi,%rax,8), %rcx
        adcq    %rcx, (%rdi,%rax,8)
        setb    %cl
        addq    $5, %rax
        cmpq    $104, %rax
        jne     .LBB0_1
        retq

so it actually unrolls the loop and does the ideal sequence of
add with carry.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2021-05-28 15:34 ` tkoenig at gcc dot gnu.org
@ 2021-09-09  6:48 ` pinskia at gcc dot gnu.org
  2021-09-09  8:54 ` vincent-gcc at vinc17 dot net
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-09  6:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=97387
          Component|target                      |middle-end

--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
x86 has _addcarry_u64 which gets it mostly (see PR 97387).

The clang builtins __builtin_*_overflow are there but not the __builtin_add*
builtins.

GCC does do a decent job of optimizing the original code now too.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2021-09-09  6:48 ` [Bug middle-end/79173] " pinskia at gcc dot gnu.org
@ 2021-09-09  8:54 ` vincent-gcc at vinc17 dot net
  2021-09-09  9:05 ` pinskia at gcc dot gnu.org
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: vincent-gcc at vinc17 dot net @ 2021-09-09  8:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #12 from Vincent Lefèvre <vincent-gcc at vinc17 dot net> ---
(In reply to Andrew Pinski from comment #11)
> x86 has _addcarry_u64 which gets it mostly (see PR 97387).
> 
> The clang builtins __builtin_*_overflow are there but not the __builtin_add*
> builtins.
> 
> GCC does do a decent job of optimizing the original code now too.

By "original code", do you mean the code with _addcarry_u64 (I haven't tested)?
Otherwise, I don't see any optimization at all on the code I posted in Comment
0.

One issue is that _addcarry_u64 / x86intrin.h are not documented, so the
conditions of its use in portable code are not clear. I suppose that it is
designed to be used in a target-independent compiler builtin.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2021-09-09  8:54 ` vincent-gcc at vinc17 dot net
@ 2021-09-09  9:05 ` pinskia at gcc dot gnu.org
  2023-06-05  7:17 ` jakub at gcc dot gnu.org
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-09  9:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #13 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Vincent Lefèvre from comment #12) 
> One issue is that _addcarry_u64 / x86intrin.h are not documented, so the
> conditions of its use in portable code are not clear. I suppose that it is
> designed to be used in a target-independent compiler builtin.

None of the x86 intrinsics are documented except on the Intel intrinsics guide
page:
e.g:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_addcarry_u64

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2021-09-09  9:05 ` pinskia at gcc dot gnu.org
@ 2023-06-05  7:17 ` jakub at gcc dot gnu.org
  2023-06-05 13:01 ` chfast at gmail dot com
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-06-05  7:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Unfortunately, the clang __builtin_addc* and __builtin_subc* builtins are badly
designed.
Instead of adding or subtracting a 1-bit carry, where the result is guaranteed
to have 1-bit carry as well, they are:
unsigned __builtin_addc (unsigned x, unsigned y, unsigned carry_in, unsigned
*carry_out)
{
  unsigned r;
  unsigned c1 = __builtin_add_overflow (x, y, &r);
  unsigned c2 = __builtin_add_overflow (r, carry_in, &r);
  *carry_out = c1 + c2;
  return r; 
}

unsigned __builtin_subc (unsigned x, unsigned y, unsigned carry_in, unsigned
*carry_out)
{
  unsigned r;
  unsigned c1 = __builtin_sub_overflow (x, y, &r);
  unsigned c2 = __builtin_sub_overflow (r, carry_in, &r);
  *carry_out = c1 + c2;
  return r; 
}
So, instead of doing [0, 0xffffffff] + [0, 0xffffffff] + [0, 1] resulting in
[0, 0xffffffff] plus [0, 1] carry they actually do
[0, 0xffffffff] + [0, 0xffffffff] + [0, 0xffffffff] resulting in [0,
0xffffffff] plus [0, 2] carry.  So far for good "design".

So, am not really sure if it is worth implementing those builtins, one can use
__builtin_add_overflow/__builtin_sub_overflow instead, all we need is pattern
detect if they are chained and so start with 0 carry in and then the carry outs
are guaranteed to be [0, 1] and further pairs of .ADD_OVERFLOW/.SUB_OVERFLOW
again can count on [0, 1]
carry in and produce [0, 1] carry out.  And pattern detect that into some new
IFN which will try to produce efficient code for these.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2023-06-05  7:17 ` jakub at gcc dot gnu.org
@ 2023-06-05 13:01 ` chfast at gmail dot com
  2023-06-06 12:01 ` jakub at gcc dot gnu.org
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: chfast at gmail dot com @ 2023-06-05 13:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #15 from Paweł Bylica <chfast at gmail dot com> ---
For what it's worth, clang's __builtin_addc is implemented in frontend only as
a pair of __builtin_add_overflow. The commit from 11 year ago does not explain
why they were added.
https://github.com/llvm/llvm-project/commit/54398015bf8cbdc3af54dda74807d6f3c8436164

Producing a chain of ADC instructions out of __builtin_add_overflow patterns
has been done quite recently (~1 year ago). And this work is not fully finished
yet.

On the other hand, Go recently added "addc" like "builtins" in
https://pkg.go.dev/math/bits. And they are really pleasure to use in
multi-precision arithmetic.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2023-06-05 13:01 ` chfast at gmail dot com
@ 2023-06-06 12:01 ` jakub at gcc dot gnu.org
  2023-06-06 16:12 ` jakub at gcc dot gnu.org
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-06-06 12:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 55271
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55271&action=edit
gcc14-pr79173-wip.patch

Untested WIP (with backend implementation for x86 only so far).
Still, need to add/tweak some peephole2s to make pr79173-{1,2,3,4}.c tests
clean
on both x86-64 and i?86, and then as can be seen in pr79173-5.c need to do some
tweaks so that it pattern recognizes also the pattern recognized
__builtin_{add,sub}_overflow instead of those being used directly.
C23 is standardizing __builtin_{add,sub,mul}_overflow under the
ckd_{add,sub,mul} names, so pattern recognizing it that way is definitely
desirable.
Oh, and maybe incrementally check what happens if one of the addends or
subtrahends are immediate.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2023-06-06 12:01 ` jakub at gcc dot gnu.org
@ 2023-06-06 16:12 ` jakub at gcc dot gnu.org
  2023-06-15  7:07 ` cvs-commit at gcc dot gnu.org
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-06-06 16:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #55271|0                           |1
        is obsolete|                            |

--- Comment #17 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 55274
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55274&action=edit
gcc14-pr79173.patch

Full patch I'm going to test.
Unfortunately, I haven't been successful at getting the subc stuff working when
not using __builtin_sub_overflow builtins nor _subborrow_u* instrinsics, only
addc seems to work in that case, seems reassoc rewrites stuff that we no longer
are able to even pattern match __builtin_sub_overflow in that case.

So, maybe even adding the ugly clang builtins as a canned way how to express it
canonically would be useful, the pattern matching can't handle infinite number
of different ways how to write the same thing.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2023-06-06 16:12 ` jakub at gcc dot gnu.org
@ 2023-06-15  7:07 ` cvs-commit at gcc dot gnu.org
  2023-06-15  7:10 ` cvs-commit at gcc dot gnu.org
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-15  7:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:b6ca11407d4f5d16ccfb580ea2d3d9aa08d7cd11

commit r14-1835-gb6ca11407d4f5d16ccfb580ea2d3d9aa08d7cd11
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Jun 15 09:05:01 2023 +0200

    i386: Add peephole2 patterns to improve add with carry or subtract with
borrow with memory destination [PR79173]

    This patch adds various peephole2s which help to recognize add with
    carry or subtract with borrow with memory destination.

    2023-06-14  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/79173
            * config/i386/i386.md (*sub<mode>_3, @add<mode>3_carry,
            addcarry<mode>, @sub<mode>3_carry, *add<mode>3_cc_overflow_1): Add
            define_peephole2 TARGET_READ_MODIFY_WRITE/-Os patterns to prefer
            using memory destination in these patterns.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2023-06-15  7:07 ` cvs-commit at gcc dot gnu.org
@ 2023-06-15  7:10 ` cvs-commit at gcc dot gnu.org
  2023-06-15  7:16 ` cvs-commit at gcc dot gnu.org
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-15  7:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #19 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:ec52d228d6db7f77188ad099a8c0ff65dead3241

commit r14-1836-gec52d228d6db7f77188ad099a8c0ff65dead3241
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Jun 15 09:08:37 2023 +0200

    i386: Add peephole2 patterns to improve subtract with borrow with memory
destination [PR79173]

    This patch adds subborrow<mode> alternative so that it can have memory
    destination and adds various peephole2s which help to match it.

    2023-06-15  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/79173
            * config/i386/i386.md (subborrow<mode>): Add alternative with
            memory destination and add for it define_peephole2
            TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
            destination in these patterns.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2023-06-15  7:10 ` cvs-commit at gcc dot gnu.org
@ 2023-06-15  7:16 ` cvs-commit at gcc dot gnu.org
  2023-06-15  7:49 ` jakub at gcc dot gnu.org
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-15  7:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #20 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:43a3252c42af12ad90082e4088ea58eecd0bf582

commit r14-1837-g43a3252c42af12ad90082e4088ea58eecd0bf582
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Jun 15 09:12:40 2023 +0200

    middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

    The following patch introduces {add,sub}c5_optab and pattern recognizes
    various forms of add with carry and subtract with carry/borrow, see
    pr79173-{1,2,3,4,5,6}.c tests on what is matched.
    Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
    calls per limb (with just one for the least significant one), for
    add with carry even when it is hand written in C (for subtraction
    reassoc seems to change it too much so that the pattern recognition
    doesn't work).  __builtin_{add,sub}_overflow are standardized in C23
    under ckd_{add,sub} names, so it isn't any longer a GNU only extension.

    Note, clang has for these (IMHO badly designed)
    __builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
    a single bit of carry, but basically add 3 unsigned values or
    subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
    because of that.  If we wanted to introduce those for clang compatibility,
    we could and lower them early to just two __builtin_{add,sub}_overflow
    calls and let the pattern matching in this patch recognize it later.

    I've added expanders for this on ix86 and in addition to that
    added various peephole2s (in preparation patches for this patch) to make
    sure we get nice (and small) code for the common cases.  I think there are
    other PRs which request that e.g. for the _{addcarry,subborrow}_u{32,64}
    intrinsics, which the patch also improves.

    Would be nice if support for these optabs was added to many other targets,
    arm/aarch64 and powerpc* certainly have such instructions, I'd expect
    in fact that most targets do.

    The _BitInt support I'm working on will also need this to emit reasonable
    code.

    2023-06-15  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/79173
            * internal-fn.def (UADDC, USUBC): New internal functions.
            * internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
            (commutative_ternary_fn_p): Return true also for IFN_UADDC.
            * optabs.def (uaddc5_optab, usubc5_optab): New optabs.
            * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
            match_uaddc_usubc): New functions.
            (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
            for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
            other optimizations have been successful for those.
            * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and
IFN_USUBC.
            * fold-const-call.cc (fold_const_call): Likewise.
            * gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
            * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
            * doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named
            patterns.
            * config/i386/i386.md (uaddc<mode>5, usubc<mode>5): New
            define_expand patterns.
            (*setcc_qi_addqi3_cconly_overflow_1_<mode>, *setccc): Split
            into NOTE_INSN_DELETED note rather than nop instruction.
            (*setcc_qi_negqi_ccc_1_<mode>, *setcc_qi_negqi_ccc_2_<mode>):
            Likewise.

            * gcc.target/i386/pr79173-1.c: New test.
            * gcc.target/i386/pr79173-2.c: New test.
            * gcc.target/i386/pr79173-3.c: New test.
            * gcc.target/i386/pr79173-4.c: New test.
            * gcc.target/i386/pr79173-5.c: New test.
            * gcc.target/i386/pr79173-6.c: New test.
            * gcc.target/i386/pr79173-7.c: New test.
            * gcc.target/i386/pr79173-8.c: New test.
            * gcc.target/i386/pr79173-9.c: New test.
            * gcc.target/i386/pr79173-10.c: New test.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2023-06-15  7:16 ` cvs-commit at gcc dot gnu.org
@ 2023-06-15  7:49 ` jakub at gcc dot gnu.org
  2023-06-16 17:49 ` cvs-commit at gcc dot gnu.org
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-06-15  7:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |krebbel at gcc dot gnu.org,
                   |                            |law at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org

--- Comment #21 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
CCing some maintainers, could you please consider adding these
uaddc<mode>5/usubc<mode>5 expanders to rs6000, aarch64, arm, s390, riscv
targets if the targets have some appropriate instructions (add with carry and
subtract with carry/borrow), copy the above gcc.target/i386/ testcases to other
gcc.target subdirectories and verify there you get optimal code for those
sequences?
The _BitInt addition/subtraction code will also use these new internal
functions if possible, so implementing it will also help get usable code for
_BitInt later on.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2023-06-15  7:49 ` jakub at gcc dot gnu.org
@ 2023-06-16 17:49 ` cvs-commit at gcc dot gnu.org
  2023-06-17 15:37 ` law at gcc dot gnu.org
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-16 17:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:2b4e0415ad664cdb3ce87d1f7eee5ca26911a05b

commit r14-1896-g2b4e0415ad664cdb3ce87d1f7eee5ca26911a05b
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Jun 16 19:47:28 2023 +0200

    uiltins: Add support for clang compatible __builtin_{add,sub}c{,l,ll}
[PR79173]

    While the design of these builtins in clang is questionable,
    rather than being say
    unsigned __builtin_addc (unsigned, unsigned, bool, bool *)
    so that it is clear they add two [0, 0xffffffff] range numbers
    plus one [0, 1] range carry in and give [0, 0xffffffff] range
    return plus [0, 1] range carry out, they actually instead
    add 3 [0, 0xffffffff] values together but the carry out
    isn't then the expected [0, 2] value because
    0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd,
    but just [0, 1] whether there was any overflow at all.

    It is something used in the wild and shorter to write than the
    corresponding
     #define __builtin_addc(a,b,carry_in,carry_out) \
      ({ unsigned _s; \
         unsigned _c1 = __builtin_uadd_overflow (a, b, &_s); \
         unsigned _c2 = __builtin_uadd_overflow (_s, carry_in, &_s); \
         *(carry_out) = (_c1 | _c2); \
         _s; })
    and so a canned builtin for something people could often use.
    It isn't that hard to maintain on the GCC side, as we just lower
    it to two .ADD_OVERFLOW calls early, and the already committed
    pottern recognization code can then make .UADDC/.USUBC calls out of
    that if the carry in is in [0, 1] range and the corresponding
    optab is supported by the target.

    2023-06-16  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/79173
            * builtin-types.def (BT_FN_UINT_UINT_UINT_UINT_UINTPTR,
            BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR,
            BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR): New
            types.
            * builtins.def (BUILT_IN_ADDC, BUILT_IN_ADDCL, BUILT_IN_ADDCLL,
            BUILT_IN_SUBC, BUILT_IN_SUBCL, BUILT_IN_SUBCLL): New builtins.
            * builtins.cc (fold_builtin_addc_subc): New function.
            (fold_builtin_varargs): Handle BUILT_IN_{ADD,SUB}C{,L,LL}.
            * doc/extend.texi (__builtin_addc, __builtin_subc): Document.

            * gcc.target/i386/pr79173-11.c: New test.
            * gcc.dg/builtin-addc-1.c: New test.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2023-06-16 17:49 ` cvs-commit at gcc dot gnu.org
@ 2023-06-17 15:37 ` law at gcc dot gnu.org
  2023-06-17 16:34 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: law at gcc dot gnu.org @ 2023-06-17 15:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #23 from Jeffrey A. Law <law at gcc dot gnu.org> ---
risc-v doesn't have any special instructions to implement add-with-carry or
subtract-with-borrow.  Depending on who you talk do, it's either a feature or a
mis-design.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2023-06-17 15:37 ` law at gcc dot gnu.org
@ 2023-06-17 16:34 ` jakub at gcc dot gnu.org
  2023-06-20 18:18 ` cvs-commit at gcc dot gnu.org
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-06-17 16:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #24 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Sorry, in that case nothing needs to be done for riscv.  I'm sure aarch64, arm
has one (e.g. adcs), I think powerpc has some, but e.g. PR43892 is still open,
and I'm sure s390 has them too (alc*, slb*).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2023-06-17 16:34 ` jakub at gcc dot gnu.org
@ 2023-06-20 18:18 ` cvs-commit at gcc dot gnu.org
  2023-06-24  9:42 ` glisse at gcc dot gnu.org
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-20 18:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #25 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:f8f68c4ca622a24c2e8cf2b5f2f9fdcd47a7b369

commit r14-2001-gf8f68c4ca622a24c2e8cf2b5f2f9fdcd47a7b369
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jun 20 20:17:41 2023 +0200

    tree-ssa-math-opts: Small uaddc/usubc pattern matching improvement
[PR79173]

    In the following testcase we fail to pattern recognize the least
significant
    .UADDC call.  The reason is that arg3 in that case is
      _3 = .ADD_OVERFLOW (...);
      _2 = __imag__ _3;
      _1 = _2 != 0;
      arg3 = (unsigned long) _1;
    and while before the changes arg3 has a single use in some .ADD_OVERFLOW
    later on, we add a .UADDC call next to it (and gsi_remove/gsi_replace only
    what is strictly necessary and leave quite a few dead stmts around which
    next DCE cleans up) and so it all of sudden isn't used just once, but twice
    (.ADD_OVERFLOW and .UADDC) and so uaddc_cast fails.  While we could tweak
    uaddc_cast and not require has_single_use in these uses, there is also
    no vrp that would figure out that because __imag__ _3 is in [0, 1] range,
    it can just use arg3 = __imag__ _3; and drop the comparison and cast.

    We already search if either arg2 or arg3 is ultimately set from __imag__
    of .{{ADD,SUB}_OVERFLOW,U{ADD,SUB}C} call, so the following patch just
    remembers the lhs of __imag__ from that case and uses it later.

    2023-06-20  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/79173
            * tree-ssa-math-opts.cc (match_uaddc_usubc): Remember lhs of
            IMAGPART_EXPR of arg2/arg3 and use that as arg3 if it has the right
            type.

            * g++.target/i386/pr79173-1.C: New test.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (20 preceding siblings ...)
  2023-06-20 18:18 ` cvs-commit at gcc dot gnu.org
@ 2023-06-24  9:42 ` glisse at gcc dot gnu.org
  2023-06-24  9:51 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: glisse at gcc dot gnu.org @ 2023-06-24  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #26 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to CVS Commits from comment #22)
>     While the design of these builtins in clang is questionable,
>     rather than being say
>     unsigned __builtin_addc (unsigned, unsigned, bool, bool *)
>     so that it is clear they add two [0, 0xffffffff] range numbers
>     plus one [0, 1] range carry in and give [0, 0xffffffff] range
>     return plus [0, 1] range carry out, they actually instead
>     add 3 [0, 0xffffffff] values together but the carry out
>     isn't then the expected [0, 2] value because
>     0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd,
>     but just [0, 1] whether there was any overflow at all.

That is very strange. I always thought that the original intent was for
__builtin_addc to assume that its third argument was in [0, 1] and generate a
single addc instruction on hardware that has it, and the type only ended up
being the same as the others for convenience (also C used not to have a bool
type). The final overflow never being 2 confirms this.

It may be worth discussing with clang developers if they would be willing to
document such a [0, 1] restriction, and maybe have ubsan check it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (21 preceding siblings ...)
  2023-06-24  9:42 ` glisse at gcc dot gnu.org
@ 2023-06-24  9:51 ` jakub at gcc dot gnu.org
  2023-06-24 12:09 ` vincent-gcc at vinc17 dot net
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-06-24  9:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #27 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Given that the builtins exist for 10 years already, I think changing it for
them is too late, though they don't seem to take backwards compatibility as
seriously.
They don't document the [0, 1] restriction and the behavior implemented in GCC
is what I saw when trying it.
Note, in many cases it isn't that big deal, because if carry_in is in [0, 1]
range and compiler can see it from VRP, it can still optimize it.  And given
that carry_out is always in [0, 1] range, for chained cases worst case the
first additions will be less optimized but the chained will be already better.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (22 preceding siblings ...)
  2023-06-24  9:51 ` jakub at gcc dot gnu.org
@ 2023-06-24 12:09 ` vincent-gcc at vinc17 dot net
  2023-06-24 12:32 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: vincent-gcc at vinc17 dot net @ 2023-06-24 12:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #28 from Vincent Lefèvre <vincent-gcc at vinc17 dot net> ---
(In reply to Jakub Jelinek from comment #27)
> Given that the builtins exist for 10 years already, I think changing it for
> them is too late, though they don't seem to take backwards compatibility as
> seriously.
> They don't document the [0, 1] restriction and the behavior implemented in
> GCC is what I saw when trying it.

Their documentation at https://clang.llvm.org/docs/LanguageExtensions.html is
currently just

  unsigned sum = __builtin_addc(x, y, carryin, &carryout);

But a carry for a 2-ary addition is always 0 or 1, so the [0, 1] restriction is
implicit (by the language that is used).

And in their example, the carries are always 0 or 1.

> Note, in many cases it isn't that big deal, because if carry_in is in [0, 1]
> range and compiler can see it from VRP, it can still optimize it.  And given
> that carry_out is always in [0, 1] range, for chained cases worst case the
> first additions will be less optimized but the chained will be already
> better.

What do you mean by "the first additions will be less optimized"? (If you don't
know anything about the initial carryin and the arguments, you can't optimize
at all, AFAIK.)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (23 preceding siblings ...)
  2023-06-24 12:09 ` vincent-gcc at vinc17 dot net
@ 2023-06-24 12:32 ` jakub at gcc dot gnu.org
  2023-06-24 13:10 ` vincent-gcc at vinc17 dot net
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-06-24 12:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #29 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Vincent Lefèvre from comment #28)
> (In reply to Jakub Jelinek from comment #27)
> > Given that the builtins exist for 10 years already, I think changing it for
> > them is too late, though they don't seem to take backwards compatibility as
> > seriously.
> > They don't document the [0, 1] restriction and the behavior implemented in
> > GCC is what I saw when trying it.
> 
> Their documentation at https://clang.llvm.org/docs/LanguageExtensions.html
> is currently just
> 
>   unsigned sum = __builtin_addc(x, y, carryin, &carryout);
> 
> But a carry for a 2-ary addition is always 0 or 1, so the [0, 1] restriction
> is implicit (by the language that is used).

That is something that would need to be said explicitly, that it is undefined
behavior if it is some other value.  Like we document that e.g. __builtin_clz
is undefined behavior on 0 input.

> What do you mean by "the first additions will be less optimized"? (If you
> don't know anything about the initial carryin and the arguments, you can't
> optimize at all, AFAIK.)

I mean that if the compiler can't see it is in [0, 1], it will need to use 2
additions and or the 2 carry bits together.  But, because the ored carry bits
are in [0, 1] range, all the higher limbs could be done using addc.

If you try clang trunk with -O2
unsigned int foo (unsigned x, unsigned y, unsigned carry_in, unsigned
*carry_out) { return __builtin_addc (x, y, carry_in, carry_out); }
unsigned int bar (unsigned x, unsigned y, unsigned carry_in, unsigned
*carry_out) { if (carry_in > 1) __builtin_unreachable (); return __builtin_addc
(x, y, carry_in, carry_out); }
it shows exactly those 2 additions, rather than trying to optimize it, in both
cases.
GCC trunk emits something comparable for the first case, and 
unsigned int foo (unsigned x, unsigned y, unsigned carry_in, unsigned
*carry_out) { return __builtin_addc (x, y, carry_in, carry_out); }
you get
addb    $-1, %dl; adcl    %esi, %eax
for the main work.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (24 preceding siblings ...)
  2023-06-24 12:32 ` jakub at gcc dot gnu.org
@ 2023-06-24 13:10 ` vincent-gcc at vinc17 dot net
  2023-06-24 13:14 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 29+ messages in thread
From: vincent-gcc at vinc17 dot net @ 2023-06-24 13:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #30 from Vincent Lefèvre <vincent-gcc at vinc17 dot net> ---
(In reply to Jakub Jelinek from comment #29)
> (In reply to Vincent Lefèvre from comment #28)
> > What do you mean by "the first additions will be less optimized"? (If you
> > don't know anything about the initial carryin and the arguments, you can't
> > optimize at all, AFAIK.)
> 
> I mean that if the compiler can't see it is in [0, 1], it will need to use 2
> additions and or the 2 carry bits together.  But, because the ored carry
> bits are in [0, 1] range, all the higher limbs could be done using addc.

If the compiler can't see that carryin is in [0, 1], then it must not "or" the
carry bits; it needs to add them, as carryout may be 2. So each part of the
whole chain would need 2 __builtin_add_overflow and an addition of the carry
bits. However, if the compiler can detect that at some point, the arguments
cannot be both 0xffffffff at the same time (while carryin is in [0, 2]), then
an optimization is possible for the rest of the chain.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (25 preceding siblings ...)
  2023-06-24 13:10 ` vincent-gcc at vinc17 dot net
@ 2023-06-24 13:14 ` jakub at gcc dot gnu.org
  2023-06-25  0:22 ` vincent-gcc at vinc17 dot net
  2023-08-29  8:47 ` cvs-commit at gcc dot gnu.org
  28 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-06-24 13:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #31 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Vincent Lefèvre from comment #30)
> (In reply to Jakub Jelinek from comment #29)
> > (In reply to Vincent Lefèvre from comment #28)
> > > What do you mean by "the first additions will be less optimized"? (If you
> > > don't know anything about the initial carryin and the arguments, you can't
> > > optimize at all, AFAIK.)
> > 
> > I mean that if the compiler can't see it is in [0, 1], it will need to use 2
> > additions and or the 2 carry bits together.  But, because the ored carry
> > bits are in [0, 1] range, all the higher limbs could be done using addc.
> 
> If the compiler can't see that carryin is in [0, 1], then it must not "or"
> the carry bits; it needs to add them, as carryout may be 2.

That is not how the clang builtin works, which is why I've implemented the |
and documented it that way, as it is a compatibility builtin.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (26 preceding siblings ...)
  2023-06-24 13:14 ` jakub at gcc dot gnu.org
@ 2023-06-25  0:22 ` vincent-gcc at vinc17 dot net
  2023-08-29  8:47 ` cvs-commit at gcc dot gnu.org
  28 siblings, 0 replies; 29+ messages in thread
From: vincent-gcc at vinc17 dot net @ 2023-06-25  0:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #32 from Vincent Lefèvre <vincent-gcc at vinc17 dot net> ---
(In reply to Jakub Jelinek from comment #31)
> (In reply to Vincent Lefèvre from comment #30)
> > (In reply to Jakub Jelinek from comment #29)
> > > I mean that if the compiler can't see it is in [0, 1], it will need
> > > to use 2 additions and or the 2 carry bits together.  But, because
> > > the ored carry bits are in [0, 1] range, all the higher limbs could
> > > be done using addc.
> > 
> > If the compiler can't see that carryin is in [0, 1], then it must not "or"
> > the carry bits; it needs to add them, as carryout may be 2.
> 
> That is not how the clang builtin works, which is why I've implemented the |
> and documented it that way, as it is a compatibility builtin.

I'm confused. In Comment 14, you said that

  *carry_out = c1 + c2;

was used. This is an addition, not an OR.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
       [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
                   ` (27 preceding siblings ...)
  2023-06-25  0:22 ` vincent-gcc at vinc17 dot net
@ 2023-08-29  8:47 ` cvs-commit at gcc dot gnu.org
  28 siblings, 0 replies; 29+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-08-29  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #33 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a7aec76a74dd38524be325343158d3049b6ab3ac

commit r14-3541-ga7aec76a74dd38524be325343158d3049b6ab3ac
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Aug 29 10:46:01 2023 +0200

    tree-ssa-math-opts: Improve uaddc/usubc pattern matching [PR111209]

    The uaddc/usubc usual matching is of the .{ADD,SUB}_OVERFLOW pair in the
    middle, which adds/subtracts carry-in (from lower limbs) and computes
    carry-out (to higher limbs).  Before optimizations (unless user writes
    it intentionally that way already), all the steps look the same, but
    optimizations simplify the handling of the least significant limb
    (one which adds/subtracts 0 carry-in) to just a single
    .{ADD,SUB}_OVERFLOW and the handling of the most significant limb
    if the computed carry-out is ignored to normal addition/subtraction
    of multiple operands.
    Now, match_uaddc_usubc has code to turn that least significant
    .{ADD,SUB}_OVERFLOW call into .U{ADD,SUB}C call with 0 carry-in if
    a more significant limb above it is matched into .U{ADD,SUB}C; this
    isn't necessary for functionality, as .ADD_OVERFLOW (x, y) is
    functionally equal to .UADDC (x, y, 0) (provided the types of operands
    are the same and result is complex type with that type element), and
    it also has code to match the most significant limb with ignored carry-out
    (in that case one pattern match turns both the penultimate limb pair of
    .{ADD,SUB}_OVERFLOW into .U{ADD,SUB}C and the addition/subtraction
    of the 4 values (2 carries) into another .U{ADD,SUB}C.

    As the following patch shows, what we weren't handling is the case when
    one uses either the __builtin_{add,sub}c builtins or hand written forms
    thereof (either __builtin_*_overflow or even that written by hand) for
    just 2 limbs, where the least significant has 0 carry-in and the most
    significant ignores carry-out.  The following patch matches that, e.g.
      _16 = .ADD_OVERFLOW (_1, _2);
      _17 = REALPART_EXPR <_16>;
      _18 = IMAGPART_EXPR <_16>;
      _15 = _3 + _4;
      _12 = _15 + _18;
    into
      _16 = .UADDC (_1, _2, 0);
      _17 = REALPART_EXPR <_16>;
      _18 = IMAGPART_EXPR <_16>;
      _19 = .UADDC (_3, _4, _18);
      _12 = IMAGPART_EXPR <_19>;
    so that we can emit better code.

    As the 2 later comments show, we must do that carefully, because the
    pass walks the IL from first to last stmt in a bb and we must avoid
    pattern matching this way something that should be matched on a later
    instruction differently.

    2023-08-29  Jakub Jelinek  <jakub@redhat.com>

            PR middle-end/79173
            PR middle-end/111209
            * tree-ssa-math-opts.cc (match_uaddc_usubc): Match also
            just 2 limb uaddc/usubc with 0 carry-in on lower limb and ignored
            carry-out on higher limb.  Don't match it though if it could be
            matched later on 4 argument addition/subtraction.

            * gcc.target/i386/pr79173-12.c: New test.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-08-29  8:47 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-79173-4@http.gcc.gnu.org/bugzilla/>
2020-11-15 21:21 ` [Bug target/79173] add-with-carry and subtract-with-borrow support (x86_64 and others) already5chosen at yahoo dot com
2020-11-15 21:24 ` already5chosen at yahoo dot com
2020-11-15 21:32 ` jakub at gcc dot gnu.org
2020-11-15 22:30 ` already5chosen at yahoo dot com
2020-11-15 22:45 ` already5chosen at yahoo dot com
2021-05-28 15:34 ` tkoenig at gcc dot gnu.org
2021-09-09  6:48 ` [Bug middle-end/79173] " pinskia at gcc dot gnu.org
2021-09-09  8:54 ` vincent-gcc at vinc17 dot net
2021-09-09  9:05 ` pinskia at gcc dot gnu.org
2023-06-05  7:17 ` jakub at gcc dot gnu.org
2023-06-05 13:01 ` chfast at gmail dot com
2023-06-06 12:01 ` jakub at gcc dot gnu.org
2023-06-06 16:12 ` jakub at gcc dot gnu.org
2023-06-15  7:07 ` cvs-commit at gcc dot gnu.org
2023-06-15  7:10 ` cvs-commit at gcc dot gnu.org
2023-06-15  7:16 ` cvs-commit at gcc dot gnu.org
2023-06-15  7:49 ` jakub at gcc dot gnu.org
2023-06-16 17:49 ` cvs-commit at gcc dot gnu.org
2023-06-17 15:37 ` law at gcc dot gnu.org
2023-06-17 16:34 ` jakub at gcc dot gnu.org
2023-06-20 18:18 ` cvs-commit at gcc dot gnu.org
2023-06-24  9:42 ` glisse at gcc dot gnu.org
2023-06-24  9:51 ` jakub at gcc dot gnu.org
2023-06-24 12:09 ` vincent-gcc at vinc17 dot net
2023-06-24 12:32 ` jakub at gcc dot gnu.org
2023-06-24 13:10 ` vincent-gcc at vinc17 dot net
2023-06-24 13:14 ` jakub at gcc dot gnu.org
2023-06-25  0:22 ` vincent-gcc at vinc17 dot net
2023-08-29  8:47 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).