[Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow
@ 2023-11-17 21:40 redi at gcc dot gnu.org
  2023-11-17 21:41 ` [Bug middle-end/112600] " redi at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-17 21:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

            Bug ID: 112600
           Summary: Failed to optimize saturating addition using
                    __builtin_add_overflow
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

These two implementations of C++26 saturating addition (std::add_sat<unsigned>)
have equivalent behaviour:

unsigned
add_sat(unsigned x, unsigned y) noexcept
{
    unsigned z;
    if (!__builtin_add_overflow(x, y, &z))
            return z;
    return -1u;
}

unsigned
add_sat2(unsigned x, unsigned y) noexcept
{
    unsigned res;
    res = x + y;
    res |= -(res < x);
    return res;
}


For -O3 on x86_64 GCC uses a branch for the first one:

add_sat(unsigned int, unsigned int):
        add     edi, esi
        jc      .L3
        mov     eax, edi
        ret
.L3:
        or      eax, -1
        ret

For the second one we get better code:

add_sat2(unsigned int, unsigned int):
        add     edi, esi
        sbb     eax, eax
        or      eax, edi
        ret



Clang compiles them both to the same code:

add_sat(unsigned int, unsigned int):
        add     edi, esi
        mov     eax, -1
        cmovae  eax, edi
        ret

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
@ 2023-11-17 21:41 ` redi at gcc dot gnu.org
  2023-11-17 21:45 ` redi at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-17 21:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Similar results for aarch64 with GCC:

add_sat(unsigned int, unsigned int):
        adds    w0, w0, w1
        bcs     .L7
        ret
.L7:
        mov     w0, -1
        ret
add_sat2(unsigned int, unsigned int):
        adds    w0, w0, w1
        csinv   w0, w0, wzr, cc
        ret

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
  2023-11-17 21:41 ` [Bug middle-end/112600] " redi at gcc dot gnu.org
@ 2023-11-17 21:45 ` redi at gcc dot gnu.org
  2023-11-19 21:43 ` pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-17 21:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
For similar saturating subtraction functions:

unsigned
sub_sat(unsigned x, unsigned y) noexcept
{
    unsigned z;
    if (!__builtin_sub_overflow(x, y, &z))
            return z;
    return 0;
}

unsigned
sub_sat2(unsigned x, unsigned y) noexcept
{
    unsigned res;
    res = x - y;
    res &= -(res <= x);;
    return res;
}

GCC x86_64 gives:

sub_sat(unsigned int, unsigned int):
        sub     edi, esi
        jb      .L3
        mov     eax, edi
        ret
.L3:
        xor     eax, eax
        ret
sub_sat2(unsigned int, unsigned int):
        sub     edi, esi
        mov     eax, 0
        cmovnb  eax, edi
        ret

GCC aarch64 gives:

sub_sat(unsigned int, unsigned int):
        subs    w2, w0, w1
        mov     w3, 0
        cmp     w0, w1
        csel    w0, w2, w3, cs
        ret
sub_sat2(unsigned int, unsigned int):
        subs    w0, w0, w1
        csel    w0, w0, wzr, cs
        ret


Clang x86_64 gives:

sub_sat(unsigned int, unsigned int):
        xor     eax, eax
        sub     edi, esi
        cmovae  eax, edi
        ret
sub_sat2(unsigned int, unsigned int):
        xor     eax, eax
        sub     edi, esi
        cmovae  eax, edi
        ret

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
  2023-11-17 21:41 ` [Bug middle-end/112600] " redi at gcc dot gnu.org
  2023-11-17 21:45 ` redi at gcc dot gnu.org
@ 2023-11-19 21:43 ` pinskia at gcc dot gnu.org
  2023-11-20  9:23 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-19 21:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-11-19
     Ever confirmed|0                           |1

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-11-19 21:43 ` pinskia at gcc dot gnu.org
@ 2023-11-20  9:23 ` rguenth at gcc dot gnu.org
  2024-01-22 12:46 ` tnfchris at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-20  9:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note we don't have a good middle-end representation for (integer) saturation.

Maybe having variants of .ADD_OVERFLOW and friends specifying an alternate
value (or the value in case the actual value is left unspecified when
overflow occurs) as additional argument would work.  So have the first
fold into

  <bb 2> :
  _8 = .ADD_OVERFLOW (x_6(D), y_7(D), -1u);
  _1 = REALPART_EXPR (_8);
  return _1;

of course that defers the code-generation problem to RTL expansion and
would require to pattern match

    res = x + y;
    res |= -(res < x);

to the same for canonicalization purposes.  I would expect that some
targets implement saturating integer arithmetic (not sure about
multiplication or division though).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-11-20  9:23 ` rguenth at gcc dot gnu.org
@ 2024-01-22 12:46 ` tnfchris at gcc dot gnu.org
  2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-01-22 12:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org

--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Yeah, this is hurting us a lot on vectors as well:

https://godbolt.org/z/ecnGadxcG

The first one isn't vectorizable and the second one we generates too
complicated code as the pattern vec_cond is expanded to something quite
complicated.

It was too complicated for the intern we had at the time, but I think basically
we should still do the conclusion of this thread no?
https://www.mail-archive.com/gcc@gcc.gnu.org/msg95398.html

i.e. we should just make proper saturating IFN.

The only remaining question is, should we make them optab backed or can we do
something reasonably better for most target with better fallback code.

This seems to indicate yes since the REALPART_EXPR seems to screw things up a
bit.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-01-22 12:46 ` tnfchris at gcc dot gnu.org
@ 2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
  2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-16 12:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:52b0536710ff3f3ace72ab00ce9ef6c630cd1183

commit r15-576-g52b0536710ff3f3ace72ab00ce9ef6c630cd1183
Author: Pan Li <pan2.li@intel.com>
Date:   Wed May 15 10:14:05 2024 +0800

    Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

    This patch would like to add the middle-end presentation for the
    saturation add.  Aka set the result of add to the max when overflow.
    It will take the pattern similar as below.

    SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

    Take uint8_t as example, we will have:

    * SAT_ADD (1, 254)   => 255.
    * SAT_ADD (1, 255)   => 255.
    * SAT_ADD (2, 255)   => 255.
    * SAT_ADD (255, 255) => 255.

    Given below example for the unsigned scalar integer uint64_t:

    uint64_t sat_add_u64 (uint64_t x, uint64_t y)
    {
      return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
    }

    Before this patch:
    uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
    {
      long unsigned int _1;
      _Bool _2;
      long unsigned int _3;
      long unsigned int _4;
      uint64_t _7;
      long unsigned int _10;
      __complex__ long unsigned int _11;

    ;;   basic block 2, loop depth 0
    ;;    pred:       ENTRY
      _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
      _1 = REALPART_EXPR <_11>;
      _10 = IMAGPART_EXPR <_11>;
      _2 = _10 != 0;
      _3 = (long unsigned int) _2;
      _4 = -_3;
      _7 = _1 | _4;
      return _7;
    ;;    succ:       EXIT

    }

    After this patch:
    uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
    {
      uint64_t _7;

    ;;   basic block 2, loop depth 0
    ;;    pred:       ENTRY
      _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
      return _7;
    ;;    succ:       EXIT
    }

    The below tests are passed for this patch:
    1. The riscv fully regression tests.
    3. The x86 bootstrap tests.
    4. The x86 fully regression tests.

            PR target/51492
            PR target/112600

    gcc/ChangeLog:

            * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
            to the return true switch case(s).
            * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
            * match.pd: Add unsigned SAT_ADD match(es).
            * optabs.def (OPTAB_NL): Remove fixed-point limitation for
            us/ssadd.
            * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
            extern func decl generated in match.pd match.
            (match_saturation_arith): New func impl to match the saturation
arith.
            (math_opts_dom_walker::after_dom_children): Try match saturation
            arith when IOR expr.

    Signed-off-by: Pan Li <pan2.li@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
@ 2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
  2024-05-17 14:57 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-16 12:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:d4dee347b3fe1982bab26485ff31cd039c9df010

commit r15-577-gd4dee347b3fe1982bab26485ff31cd039c9df010
Author: Pan Li <pan2.li@intel.com>
Date:   Wed May 15 10:14:06 2024 +0800

    Vect: Support new IFN SAT_ADD for unsigned vector int

    For vectorize, we leverage the existing vect pattern recog to find
    the pattern similar to scalar and let the vectorizer to perform
    the rest part for standard name usadd<mode>3 in vector mode.
    The riscv vector backend have insn "Vector Single-Width Saturating
    Add and Subtract" which can be leveraged when expand the usadd<mode>3
    in vector mode.  For example:

    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      unsigned i;

      for (i = 0; i < n; i++)
        out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) <
x[i]));
    }

    Before this patch:
    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      ...
      _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
      ivtmp_58 = _80 * 8;
      vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
      vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
      vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
      mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
      vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615,
        ... }, vect__7.11_66);
      .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0,
vect__12.15_72);
      vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
      vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
      vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
      ivtmp_79 = ivtmp_78 - _80;
      ...
    }

    After this patch:
    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      ...
      _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
      ivtmp_46 = _62 * 8;
      vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
      vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
      vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
      .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0,
vect__12.11_54);
      ...
    }

    The below test suites are passed for this patch.
    * The riscv fully regression tests.
    * The x86 bootstrap tests.
    * The x86 fully regression tests.

            PR target/51492
            PR target/112600

    gcc/ChangeLog:

            * tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New
            func decl generated by match.pd match.
            (vect_recog_sat_add_pattern): New func impl to recog the pattern
            for unsigned SAT_ADD.

    Signed-off-by: Pan Li <pan2.li@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
@ 2024-05-17 14:57 ` cvs-commit at gcc dot gnu.org
  2024-05-18  2:17 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-17 14:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #8 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:b59de4113262f2bee14147eb17eb3592f03d9556

commit r15-634-gb59de4113262f2bee14147eb17eb3592f03d9556
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Fri May 17 09:55:49 2024 +0200

    i386: Rename sat_plusminus expanders to standard names [PR112600]

    Rename <sse2_avx2>_<insn><mode>3<mask_name> expander to a standard ssadd,
    usadd, sssub and ussub name to enable corresponding optab expansion.

    Also add named expander for MMX modes.

            PR middle-end/112600

    gcc/ChangeLog:

            * config/i386/mmx.md (<insn><mode>3): New expander.
            * config/i386/sse.md
            (<sse2_avx2>_<sat_plusminus:insn><mode>3<mask_name>):
            Rename expander to <sat_plusminus:insn><mode>3<mask_name>.
            (<umaxmin:code><mode>3): Update for rename.
            * config/i386/i386-builtin.def: Update for rename.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr112600-1a.c: New test.
            * gcc.target/i386/pr112600-1b.c: New test.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2024-05-17 14:57 ` cvs-commit at gcc dot gnu.org
@ 2024-05-18  2:17 ` cvs-commit at gcc dot gnu.org
  2024-06-05  8:38 ` cvs-commit at gcc dot gnu.org
  2024-06-05 20:42 ` ubizjak at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-18  2:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:34ed2b4593fa98b613632d0dde30b6ba3e7ecad9

commit r15-642-g34ed2b4593fa98b613632d0dde30b6ba3e7ecad9
Author: Pan Li <pan2.li@intel.com>
Date:   Fri May 17 18:49:46 2024 +0800

    RISC-V: Implement IFN SAT_ADD for both the scalar and vector

    The patch implement the SAT_ADD in the riscv backend as the
    sample for both the scalar and vector.  Given below vector
    as example:

    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      unsigned i;

      for (i = 0; i < n; i++)
        out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) <
x[i]));
    }

    Before this patch:
    vec_sat_add_u64:
      ...
      vsetvli a5,a3,e64,m1,ta,ma
      vle64.v v0,0(a1)
      vle64.v v1,0(a2)
      slli    a4,a5,3
      sub     a3,a3,a5
      add     a1,a1,a4
      add     a2,a2,a4
      vadd.vv v1,v0,v1
      vmsgtu.vv       v0,v0,v1
      vmerge.vim      v1,v1,-1,v0
      vse64.v v1,0(a0)
      ...

    After this patch:
    vec_sat_add_u64:
      ...
      vsetvli a5,a3,e64,m1,ta,ma
      vle64.v v1,0(a1)
      vle64.v v2,0(a2)
      slli    a4,a5,3
      sub     a3,a3,a5
      add     a1,a1,a4
      add     a2,a2,a4
      vsaddu.vv       v1,v1,v2  <=  Vector Single-Width Saturating Add
      vse64.v v1,0(a0)
      ...

    The below test suites are passed for this patch.
    * The riscv fully regression tests.
    * The aarch64 fully regression tests.
    * The x86 bootstrap tests.
    * The x86 fully regression tests.

            PR target/51492
            PR target/112600

    gcc/ChangeLog:

            * config/riscv/autovec.md (usadd<mode>3): New pattern expand for
            the unsigned SAT_ADD in vector mode.
            * config/riscv/riscv-protos.h (riscv_expand_usadd): New func decl
            to expand usadd<mode>3 pattern.
            (expand_vec_usadd): Ditto but for vector.
            * config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to emit
            the vsadd insn.
            (expand_vec_usadd): New func impl to expand usadd<mode>3 for
vector.
            * config/riscv/riscv.cc (riscv_expand_usadd): New func impl to
            expand usadd<mode>3 for scalar.
            * config/riscv/riscv.md (usadd<mode>3): New pattern expand for
            the unsigned SAT_ADD in scalar mode.
            * config/riscv/vector.md: Allow VLS mode for vsaddu.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New
test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New
test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New
test.
            * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New
test.
            * gcc.target/riscv/sat_arith.h: New test.
            * gcc.target/riscv/sat_u_add-1.c: New test.
            * gcc.target/riscv/sat_u_add-2.c: New test.
            * gcc.target/riscv/sat_u_add-3.c: New test.
            * gcc.target/riscv/sat_u_add-4.c: New test.
            * gcc.target/riscv/sat_u_add-run-1.c: New test.
            * gcc.target/riscv/sat_u_add-run-2.c: New test.
            * gcc.target/riscv/sat_u_add-run-3.c: New test.
            * gcc.target/riscv/sat_u_add-run-4.c: New test.
            * gcc.target/riscv/scalar_sat_binary.h: New test.

    Signed-off-by: Pan Li <pan2.li@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2024-05-18  2:17 ` cvs-commit at gcc dot gnu.org
@ 2024-06-05  8:38 ` cvs-commit at gcc dot gnu.org
  2024-06-05 20:42 ` ubizjak at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-06-05  8:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:abe6d39365476e6be724815d09d072e305018755

commit r15-1030-gabe6d39365476e6be724815d09d072e305018755
Author: Pan Li <pan2.li@intel.com>
Date:   Tue May 28 15:37:44 2024 +0800

    Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

    This patch would like to add the middle-end presentation for the
    saturation sub.  Aka set the result of add to the min when downflow.
    It will take the pattern similar as below.

    SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));

    For example for uint8_t, we have

    * SAT_SUB (255, 0)   => 255
    * SAT_SUB (1, 2)     => 0
    * SAT_SUB (254, 255) => 0
    * SAT_SUB (0, 255)   => 0

    Given below SAT_SUB for uint64

    uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
    {
      return (x - y) & (-(TYPE)(x >= y));
    }

    Before this patch:
    uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
    {
      _Bool _1;
      long unsigned int _3;
      uint64_t _6;

    ;;   basic block 2, loop depth 0
    ;;    pred:       ENTRY
      _1 = x_4(D) >= y_5(D);
      _3 = x_4(D) - y_5(D);
      _6 = _1 ? _3 : 0;
      return _6;
    ;;    succ:       EXIT
    }

    After this patch:
    uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
    {
      uint64_t _6;

    ;;   basic block 2, loop depth 0
    ;;    pred:       ENTRY
      _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
      return _6;
    ;;    succ:       EXIT
    }

    The below tests are running for this patch:
    *. The riscv fully regression tests.
    *. The x86 bootstrap tests.
    *. The x86 fully regression tests.

            PR target/51492
            PR target/112600

    gcc/ChangeLog:

            * internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
            * match.pd: Add new match for SAT_SUB.
            * optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
            * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
            new decl for generated in match.pd.
            (build_saturation_binary_arith_call): Add new helper function
            to build the gimple call to binary SAT alu.
            (match_saturation_arith): Rename from.
            (match_unsigned_saturation_add): Rename to.
            (match_unsigned_saturation_sub): Add new func to match the
            unsigned sat sub.
            (math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
            try when COND_EXPR.

    Signed-off-by: Pan Li <pan2.li@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
  2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2024-06-05  8:38 ` cvs-commit at gcc dot gnu.org
@ 2024-06-05 20:42 ` ubizjak at gmail dot com
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2024-06-05 20:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jonathan Wakely from comment #0)
> These two implementations of C++26 saturating addition
> (std::add_sat<unsigned>) have equivalent behaviour:
> 
> unsigned
> add_sat(unsigned x, unsigned y) noexcept
> {
>     unsigned z;
>     if (!__builtin_add_overflow(x, y, &z))
> 	    return z;
>     return -1u;
> }

[...]

> For -O3 on x86_64 GCC uses a branch for the first one:
> 
> add_sat(unsigned int, unsigned int):
>         add     edi, esi
>         jc      .L3
>         mov     eax, edi
>         ret
> .L3:
>         or      eax, -1
>         ret

The reason for failed if-conversion to cmove is due to the "weird" compare
arguments, the consequence of addsi3_cc_overflow_1 definition:

(insn 9 4 10 2 (parallel [
            (set (reg:CCC 17 flags)
                (compare:CCC (plus:SI (reg:SI 106)
                        (reg:SI 107))
                    (reg:SI 106)))
            (set (reg:SI 104)
                (plus:SI (reg:SI 106)
                    (reg:SI 107)))
        ]) "sadd.c":7:12 477 {addsi3_cc_overflow_1}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:SI 106)
            (nil))))

the noce_try_cmove path fails in noce_emit_cmove:

Breakpoint 1, noce_emit_cmove (if_info=0x7fffffffd750, x=0x7fffe9fe4e40,
code=LTU, cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8, 
    vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at
../../git/gcc/gcc/ifcvt.cc:1774
1774                return NULL_RTX;
(gdb) list
1766          /* Don't even try if the comparison operands are weird
1767             except that the target supports cbranchcc4.  */
1768          if (! general_operand (cmp_a, GET_MODE (cmp_a))
1769              || ! general_operand (cmp_b, GET_MODE (cmp_b)))
1770            {
1771              if (!have_cbranchcc4
1772                  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
1773                  || cmp_b != const0_rtx)
1774                return NULL_RTX;
1775            }
1776
1777          target = emit_conditional_move (x, { code, cmp_a, cmp_b, VOIDmode
},
1778                                          vtrue, vfalse, GET_MODE (x),
(gdb) bt
#0  noce_emit_cmove (if_info=0x7fffffffd750, x=0x7fffe9fe4e40, code=LTU,
cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8, 
    vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at
../../git/gcc/gcc/ifcvt.cc:1774
#1  0x00000000020d995b in noce_try_cmove (if_info=0x7fffffffd750) at
../../git/gcc/gcc/ifcvt.cc:1884
#2  0x00000000020dec37 in noce_process_if_block (if_info=0x7fffffffd750) at
../../git/gcc/gcc/ifcvt.cc:4149
#3  0x00000000020e0248 in noce_find_if_block (test_bb=0x7fffe9fb5d80,
then_edge=0x7fffe9fd7cc0, else_edge=0x7fffe9fd7c60, pass=1)
    at ../../git/gcc/gcc/ifcvt.cc:4716
#4  0x00000000020e08e9 in find_if_header (test_bb=0x7fffe9fb5d80, pass=1) at
../../git/gcc/gcc/ifcvt.cc:4921
#5  0x00000000020e3255 in if_convert (after_combine=true) at
../../git/gcc/gcc/ifcvt.cc:6068

(gdb) p debug_rtx (cmp_a)
(plus:SI (reg:SI 106)
    (reg:SI 107))
$1 = void
(gdb) p debug_rtx (cmp_b)
(reg:SI 106)
$2 = void

The above cmp_a RTX fails general_operand check.

Please note that similar testcase:

unsigned
sub_sat(unsigned x, unsigned y)
{
    unsigned z;
    return __builtin_sub_overflow(x, y, &z) ? 0 : z;
}

results in the expected:

        subl    %esi, %edi      # 52    [c=4 l=2]  *subsi_3/0
        movl    $0, %eax        # 53    [c=4 l=5]  *movsi_internal/0
        cmovnb  %edi, %eax      # 54    [c=4 l=3]  *movsicc_noc/0
        ret             # 50    [c=0 l=1]  simple_return_internal

due to:

(insn 9 4 10 2 (parallel [
            (set (reg:CC 17 flags)
                (compare:CC (reg:SI 106)
                    (reg:SI 107)))
            (set (reg:SI 104)
                (minus:SI (reg:SI 106)
                    (reg:SI 107)))
        ]) "sadd.c":28:12 416 {*subsi_3}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:SI 106)
            (nil))))

So, either addsi3_cc_overflow_1 RTX is not correct, or noce_emit_cmove should
be improved to handle the above "weird" operand form.

Let's ask Jakub.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-06-05 20:42 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
2023-11-17 21:41 ` [Bug middle-end/112600] " redi at gcc dot gnu.org
2023-11-17 21:45 ` redi at gcc dot gnu.org
2023-11-19 21:43 ` pinskia at gcc dot gnu.org
2023-11-20  9:23 ` rguenth at gcc dot gnu.org
2024-01-22 12:46 ` tnfchris at gcc dot gnu.org
2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
2024-05-17 14:57 ` cvs-commit at gcc dot gnu.org
2024-05-18  2:17 ` cvs-commit at gcc dot gnu.org
2024-06-05  8:38 ` cvs-commit at gcc dot gnu.org
2024-06-05 20:42 ` ubizjak at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).