public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow
@ 2023-11-17 21:40 redi at gcc dot gnu.org
2023-11-17 21:41 ` [Bug middle-end/112600] " redi at gcc dot gnu.org
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-17 21:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
Bug ID: 112600
Summary: Failed to optimize saturating addition using
__builtin_add_overflow
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: redi at gcc dot gnu.org
Target Milestone: ---
These two implementations of C++26 saturating addition (std::add_sat<unsigned>)
have equivalent behaviour:
unsigned
add_sat(unsigned x, unsigned y) noexcept
{
  unsigned z;
  if (!__builtin_add_overflow(x, y, &z))
    return z;
  return -1u;
}

unsigned
add_sat2(unsigned x, unsigned y) noexcept
{
  unsigned res;
  res = x + y;
  res |= -(res < x);
  return res;
}
For -O3 on x86_64 GCC uses a branch for the first one:
add_sat(unsigned int, unsigned int):
        add     edi, esi
        jc      .L3
        mov     eax, edi
        ret
.L3:
        or      eax, -1
        ret

For the second one we get better code:

add_sat2(unsigned int, unsigned int):
        add     edi, esi
        sbb     eax, eax
        or      eax, edi
        ret

Clang compiles them both to the same code:

add_sat(unsigned int, unsigned int):
        add     edi, esi
        mov     eax, -1
        cmovae  eax, edi
        ret
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
@ 2023-11-17 21:41 ` redi at gcc dot gnu.org
2023-11-17 21:45 ` redi at gcc dot gnu.org
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-17 21:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Similar results for aarch64 with GCC:
add_sat(unsigned int, unsigned int):
        adds    w0, w0, w1
        bcs     .L7
        ret
.L7:
        mov     w0, -1
        ret

add_sat2(unsigned int, unsigned int):
        adds    w0, w0, w1
        csinv   w0, w0, wzr, cc
        ret
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
2023-11-17 21:41 ` [Bug middle-end/112600] " redi at gcc dot gnu.org
@ 2023-11-17 21:45 ` redi at gcc dot gnu.org
2023-11-19 21:43 ` pinskia at gcc dot gnu.org
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: redi at gcc dot gnu.org @ 2023-11-17 21:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
For similar saturating subtraction functions:
unsigned
sub_sat(unsigned x, unsigned y) noexcept
{
  unsigned z;
  if (!__builtin_sub_overflow(x, y, &z))
    return z;
  return 0;
}

unsigned
sub_sat2(unsigned x, unsigned y) noexcept
{
  unsigned res;
  res = x - y;
  res &= -(res <= x);
  return res;
}
GCC x86_64 gives:
sub_sat(unsigned int, unsigned int):
        sub     edi, esi
        jb      .L3
        mov     eax, edi
        ret
.L3:
        xor     eax, eax
        ret

sub_sat2(unsigned int, unsigned int):
        sub     edi, esi
        mov     eax, 0
        cmovnb  eax, edi
        ret

GCC aarch64 gives:

sub_sat(unsigned int, unsigned int):
        subs    w2, w0, w1
        mov     w3, 0
        cmp     w0, w1
        csel    w0, w2, w3, cs
        ret

sub_sat2(unsigned int, unsigned int):
        subs    w0, w0, w1
        csel    w0, w0, wzr, cs
        ret

Clang x86_64 gives:

sub_sat(unsigned int, unsigned int):
        xor     eax, eax
        sub     edi, esi
        cmovae  eax, edi
        ret

sub_sat2(unsigned int, unsigned int):
        xor     eax, eax
        sub     edi, esi
        cmovae  eax, edi
        ret
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
2023-11-17 21:41 ` [Bug middle-end/112600] " redi at gcc dot gnu.org
2023-11-17 21:45 ` redi at gcc dot gnu.org
@ 2023-11-19 21:43 ` pinskia at gcc dot gnu.org
2023-11-20 9:23 ` rguenth at gcc dot gnu.org
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-19 21:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Status|UNCONFIRMED |NEW
Last reconfirmed| |2023-11-19
Ever confirmed|0 |1
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
` (2 preceding siblings ...)
2023-11-19 21:43 ` pinskia at gcc dot gnu.org
@ 2023-11-20 9:23 ` rguenth at gcc dot gnu.org
2024-01-22 12:46 ` tnfchris at gcc dot gnu.org
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-20 9:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note we don't have a good middle-end representation for (integer) saturation.
Maybe having variants of .ADD_OVERFLOW and friends that take the alternate
value (i.e. the value to use when overflow occurs, since the actual result is
left unspecified in that case) as an additional argument would work. So have
the first fold into

  <bb 2> :
  _8 = .ADD_OVERFLOW (x_6(D), y_7(D), -1u);
  _1 = REALPART_EXPR <_8>;
  return _1;

Of course that defers the code-generation problem to RTL expansion and
would require pattern-matching

  res = x + y;
  res |= -(res < x);

to the same form for canonicalization purposes. I would expect that some
targets implement saturating integer arithmetic (though I'm not sure about
multiplication or division).
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
` (3 preceding siblings ...)
2023-11-20 9:23 ` rguenth at gcc dot gnu.org
@ 2024-01-22 12:46 ` tnfchris at gcc dot gnu.org
2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2024-01-22 12:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
Tamar Christina <tnfchris at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tnfchris at gcc dot gnu.org
--- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Yeah, this is hurting us a lot on vectors as well:
https://godbolt.org/z/ecnGadxcG
The first one isn't vectorizable, and for the second one we generate code
that is too complicated, as the vec_cond pattern is expanded into something
quite involved.
It was too complicated for the intern we had at the time, but I think we
should basically still implement the conclusion of this thread, no?
https://www.mail-archive.com/gcc@gcc.gnu.org/msg95398.html
i.e. we should just add proper saturating IFNs.
The only remaining question is: should we make them optab-backed, or can we
do something reasonably better for most targets with better fallback code?
This seems to indicate yes, since the REALPART_EXPR seems to screw things up
a bit.
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
` (4 preceding siblings ...)
2024-01-22 12:46 ` tnfchris at gcc dot gnu.org
@ 2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-16 12:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:52b0536710ff3f3ace72ab00ce9ef6c630cd1183
commit r15-576-g52b0536710ff3f3ace72ab00ce9ef6c630cd1183
Author: Pan Li <pan2.li@intel.com>
Date: Wed May 15 10:14:05 2024 +0800
Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
This patch adds the middle-end representation for saturating addition,
i.e. the result of the add is set to the maximum value when the addition
overflows. It matches a pattern similar to the one below.

  SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Taking uint8_t as an example, we will have:
* SAT_ADD (1, 254) => 255.
* SAT_ADD (1, 255) => 255.
* SAT_ADD (2, 255) => 255.
* SAT_ADD (255, 255) => 255.
Given the below example for the unsigned scalar integer uint64_t:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}
Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
long unsigned int _1;
_Bool _2;
long unsigned int _3;
long unsigned int _4;
uint64_t _7;
long unsigned int _10;
__complex__ long unsigned int _11;
;; basic block 2, loop depth 0
;; pred: ENTRY
_11 = .ADD_OVERFLOW (x_5(D), y_6(D));
_1 = REALPART_EXPR <_11>;
_10 = IMAGPART_EXPR <_11>;
_2 = _10 != 0;
_3 = (long unsigned int) _2;
_4 = -_3;
_7 = _1 | _4;
return _7;
;; succ: EXIT
}
After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
uint64_t _7;
;; basic block 2, loop depth 0
;; pred: ENTRY
_7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
return _7;
;; succ: EXIT
}
The below tests passed for this patch:
1. The riscv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.
PR target/51492
PR target/112600
gcc/ChangeLog:
* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
to the return true switch case(s).
* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
* match.pd: Add unsigned SAT_ADD match(es).
* optabs.def (OPTAB_NL): Remove fixed-point limitation for
us/ssadd.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
extern func decl generated in match.pd match.
(match_saturation_arith): New func impl to match the saturation
arith.
(math_opts_dom_walker::after_dom_children): Try match saturation
arith when IOR expr.
Signed-off-by: Pan Li <pan2.li@intel.com>
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
` (5 preceding siblings ...)
2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
@ 2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
2024-05-17 14:57 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-16 12:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:d4dee347b3fe1982bab26485ff31cd039c9df010
commit r15-577-gd4dee347b3fe1982bab26485ff31cd039c9df010
Author: Pan Li <pan2.li@intel.com>
Date: Wed May 15 10:14:06 2024 +0800
Vect: Support new IFN SAT_ADD for unsigned vector int
For vectorization, we leverage the existing vect pattern recognition to find
a pattern similar to the scalar one and let the vectorizer perform the rest
for the standard name usadd<mode>3 in vector mode.
The riscv vector backend has "Vector Single-Width Saturating Add and
Subtract" insns, which can be leveraged when expanding usadd<mode>3 in
vector mode. For example:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;
  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}
Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
...
_80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
ivtmp_58 = _80 * 8;
vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615,
... }, vect__7.11_66);
.MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0,
vect__12.15_72);
vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
ivtmp_79 = ivtmp_78 - _80;
...
}
After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
...
_62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
ivtmp_46 = _62 * 8;
vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
.MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0,
vect__12.11_54);
...
}
The below test suites passed for this patch:
* The riscv full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.
PR target/51492
PR target/112600
gcc/ChangeLog:
* tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New
func decl generated by match.pd match.
(vect_recog_sat_add_pattern): New func impl to recog the pattern
for unsigned SAT_ADD.
Signed-off-by: Pan Li <pan2.li@intel.com>
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
` (6 preceding siblings ...)
2024-05-16 12:09 ` cvs-commit at gcc dot gnu.org
@ 2024-05-17 14:57 ` cvs-commit at gcc dot gnu.org
2024-05-18 2:17 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-17 14:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
--- Comment #8 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:
https://gcc.gnu.org/g:b59de4113262f2bee14147eb17eb3592f03d9556
commit r15-634-gb59de4113262f2bee14147eb17eb3592f03d9556
Author: Uros Bizjak <ubizjak@gmail.com>
Date: Fri May 17 09:55:49 2024 +0200
i386: Rename sat_plusminus expanders to standard names [PR112600]
Rename <sse2_avx2>_<insn><mode>3<mask_name> expander to a standard ssadd,
usadd, sssub and ussub name to enable corresponding optab expansion.
Also add named expander for MMX modes.
PR middle-end/112600
gcc/ChangeLog:
* config/i386/mmx.md (<insn><mode>3): New expander.
* config/i386/sse.md
(<sse2_avx2>_<sat_plusminus:insn><mode>3<mask_name>):
Rename expander to <sat_plusminus:insn><mode>3<mask_name>.
(<umaxmin:code><mode>3): Update for rename.
* config/i386/i386-builtin.def: Update for rename.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112600-1a.c: New test.
* gcc.target/i386/pr112600-1b.c: New test.
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
` (7 preceding siblings ...)
2024-05-17 14:57 ` cvs-commit at gcc dot gnu.org
@ 2024-05-18 2:17 ` cvs-commit at gcc dot gnu.org
2024-06-05 8:38 ` cvs-commit at gcc dot gnu.org
2024-06-05 20:42 ` ubizjak at gmail dot com
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-18 2:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:34ed2b4593fa98b613632d0dde30b6ba3e7ecad9
commit r15-642-g34ed2b4593fa98b613632d0dde30b6ba3e7ecad9
Author: Pan Li <pan2.li@intel.com>
Date: Fri May 17 18:49:46 2024 +0800
RISC-V: Implement IFN SAT_ADD for both the scalar and vector
This patch implements SAT_ADD in the riscv backend as a sample for both
scalar and vector. Given the below vector code as an example:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;
  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}
Before this patch:
vec_sat_add_u64:
...
vsetvli a5,a3,e64,m1,ta,ma
vle64.v v0,0(a1)
vle64.v v1,0(a2)
slli a4,a5,3
sub a3,a3,a5
add a1,a1,a4
add a2,a2,a4
vadd.vv v1,v0,v1
vmsgtu.vv v0,v0,v1
vmerge.vim v1,v1,-1,v0
vse64.v v1,0(a0)
...
After this patch:
vec_sat_add_u64:
...
vsetvli a5,a3,e64,m1,ta,ma
vle64.v v1,0(a1)
vle64.v v2,0(a2)
slli a4,a5,3
sub a3,a3,a5
add a1,a1,a4
add a2,a2,a4
vsaddu.vv v1,v1,v2 <= Vector Single-Width Saturating Add
vse64.v v1,0(a0)
...
The below test suites passed for this patch:
* The riscv full regression tests.
* The aarch64 full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.
PR target/51492
PR target/112600
gcc/ChangeLog:
* config/riscv/autovec.md (usadd<mode>3): New pattern expand for
the unsigned SAT_ADD in vector mode.
* config/riscv/riscv-protos.h (riscv_expand_usadd): New func decl
to expand usadd<mode>3 pattern.
(expand_vec_usadd): Ditto but for vector.
* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to emit
the vsadd insn.
(expand_vec_usadd): New func impl to expand usadd<mode>3 for
vector.
* config/riscv/riscv.cc (riscv_expand_usadd): New func impl to
expand usadd<mode>3 for scalar.
* config/riscv/riscv.md (usadd<mode>3): New pattern expand for
the unsigned SAT_ADD in scalar mode.
* config/riscv/vector.md: Allow VLS mode for vsaddu.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New
test.
* gcc.target/riscv/sat_arith.h: New test.
* gcc.target/riscv/sat_u_add-1.c: New test.
* gcc.target/riscv/sat_u_add-2.c: New test.
* gcc.target/riscv/sat_u_add-3.c: New test.
* gcc.target/riscv/sat_u_add-4.c: New test.
* gcc.target/riscv/sat_u_add-run-1.c: New test.
* gcc.target/riscv/sat_u_add-run-2.c: New test.
* gcc.target/riscv/sat_u_add-run-3.c: New test.
* gcc.target/riscv/sat_u_add-run-4.c: New test.
* gcc.target/riscv/scalar_sat_binary.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
` (8 preceding siblings ...)
2024-05-18 2:17 ` cvs-commit at gcc dot gnu.org
@ 2024-06-05 8:38 ` cvs-commit at gcc dot gnu.org
2024-06-05 20:42 ` ubizjak at gmail dot com
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-06-05 8:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
--- Comment #10 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:
https://gcc.gnu.org/g:abe6d39365476e6be724815d09d072e305018755
commit r15-1030-gabe6d39365476e6be724815d09d072e305018755
Author: Pan Li <pan2.li@intel.com>
Date: Tue May 28 15:37:44 2024 +0800
Internal-fn: Support new IFN SAT_SUB for unsigned scalar int
This patch adds the middle-end representation for saturating subtraction,
i.e. the result of the subtraction is clamped to the minimum value on
underflow. It matches a pattern similar to the one below.

  SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));

For example, for uint8_t we have:
* SAT_SUB (255, 0) => 255
* SAT_SUB (1, 2) => 0
* SAT_SUB (254, 255) => 0
* SAT_SUB (0, 255) => 0

Given the SAT_SUB below for uint64_t:

uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t)(x >= y));
}
Before this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
_Bool _1;
long unsigned int _3;
uint64_t _6;
;; basic block 2, loop depth 0
;; pred: ENTRY
_1 = x_4(D) >= y_5(D);
_3 = x_4(D) - y_5(D);
_6 = _1 ? _3 : 0;
return _6;
;; succ: EXIT
}
After this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
uint64_t _6;
;; basic block 2, loop depth 0
;; pred: ENTRY
_6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
return _6;
;; succ: EXIT
}
The below tests passed for this patch:
* The riscv full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.
PR target/51492
PR target/112600
gcc/ChangeLog:
* internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
* match.pd: Add new match for SAT_SUB.
* optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
new decl for generated in match.pd.
(build_saturation_binary_arith_call): Add new helper function
to build the gimple call to binary SAT alu.
(match_saturation_arith): Rename from.
(match_unsigned_saturation_add): Rename to.
(match_unsigned_saturation_sub): Add new func to match the
unsigned sat sub.
(math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
try when COND_EXPR.
Signed-off-by: Pan Li <pan2.li@intel.com>
* [Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow
2023-11-17 21:40 [Bug middle-end/112600] New: Failed to optimize saturating addition using __builtin_add_overflow redi at gcc dot gnu.org
` (9 preceding siblings ...)
2024-06-05 8:38 ` cvs-commit at gcc dot gnu.org
@ 2024-06-05 20:42 ` ubizjak at gmail dot com
10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2024-06-05 20:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
Uroš Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jonathan Wakely from comment #0)
> These two implementations of C++26 saturating addition
> (std::add_sat<unsigned>) have equivalent behaviour:
>
> unsigned
> add_sat(unsigned x, unsigned y) noexcept
> {
> unsigned z;
> if (!__builtin_add_overflow(x, y, &z))
> return z;
> return -1u;
> }
[...]
> For -O3 on x86_64 GCC uses a branch for the first one:
>
> add_sat(unsigned int, unsigned int):
> add edi, esi
> jc .L3
> mov eax, edi
> ret
> .L3:
> or eax, -1
> ret
The if-conversion to cmove fails because of the "weird" compare operands, a
consequence of the addsi3_cc_overflow_1 definition:
(insn 9 4 10 2 (parallel [
(set (reg:CCC 17 flags)
(compare:CCC (plus:SI (reg:SI 106)
(reg:SI 107))
(reg:SI 106)))
(set (reg:SI 104)
(plus:SI (reg:SI 106)
(reg:SI 107)))
]) "sadd.c":7:12 477 {addsi3_cc_overflow_1}
(expr_list:REG_DEAD (reg:SI 107)
(expr_list:REG_DEAD (reg:SI 106)
(nil))))
the noce_try_cmove path fails in noce_emit_cmove:
Breakpoint 1, noce_emit_cmove (if_info=0x7fffffffd750, x=0x7fffe9fe4e40,
code=LTU, cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8,
vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at
../../git/gcc/gcc/ifcvt.cc:1774
1774 return NULL_RTX;
(gdb) list
1766 /* Don't even try if the comparison operands are weird
1767 except that the target supports cbranchcc4. */
1768 if (! general_operand (cmp_a, GET_MODE (cmp_a))
1769 || ! general_operand (cmp_b, GET_MODE (cmp_b)))
1770 {
1771 if (!have_cbranchcc4
1772 || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
1773 || cmp_b != const0_rtx)
1774 return NULL_RTX;
1775 }
1776
1777 target = emit_conditional_move (x, { code, cmp_a, cmp_b, VOIDmode
},
1778 vtrue, vfalse, GET_MODE (x),
(gdb) bt
#0 noce_emit_cmove (if_info=0x7fffffffd750, x=0x7fffe9fe4e40, code=LTU,
cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8,
vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at
../../git/gcc/gcc/ifcvt.cc:1774
#1 0x00000000020d995b in noce_try_cmove (if_info=0x7fffffffd750) at
../../git/gcc/gcc/ifcvt.cc:1884
#2 0x00000000020dec37 in noce_process_if_block (if_info=0x7fffffffd750) at
../../git/gcc/gcc/ifcvt.cc:4149
#3 0x00000000020e0248 in noce_find_if_block (test_bb=0x7fffe9fb5d80,
then_edge=0x7fffe9fd7cc0, else_edge=0x7fffe9fd7c60, pass=1)
at ../../git/gcc/gcc/ifcvt.cc:4716
#4 0x00000000020e08e9 in find_if_header (test_bb=0x7fffe9fb5d80, pass=1) at
../../git/gcc/gcc/ifcvt.cc:4921
#5 0x00000000020e3255 in if_convert (after_combine=true) at
../../git/gcc/gcc/ifcvt.cc:6068
(gdb) p debug_rtx (cmp_a)
(plus:SI (reg:SI 106)
(reg:SI 107))
$1 = void
(gdb) p debug_rtx (cmp_b)
(reg:SI 106)
$2 = void
The above cmp_a RTX fails general_operand check.
Please note that a similar testcase:

unsigned
sub_sat(unsigned x, unsigned y)
{
  unsigned z;
  return __builtin_sub_overflow(x, y, &z) ? 0 : z;
}
results in the expected:
subl %esi, %edi # 52 [c=4 l=2] *subsi_3/0
movl $0, %eax # 53 [c=4 l=5] *movsi_internal/0
cmovnb %edi, %eax # 54 [c=4 l=3] *movsicc_noc/0
ret # 50 [c=0 l=1] simple_return_internal
due to:
(insn 9 4 10 2 (parallel [
(set (reg:CC 17 flags)
(compare:CC (reg:SI 106)
(reg:SI 107)))
(set (reg:SI 104)
(minus:SI (reg:SI 106)
(reg:SI 107)))
]) "sadd.c":28:12 416 {*subsi_3}
(expr_list:REG_DEAD (reg:SI 107)
(expr_list:REG_DEAD (reg:SI 106)
(nil))))
So, either the addsi3_cc_overflow_1 RTX is not correct, or noce_emit_cmove
should be improved to handle the above "weird" operand form.
Let's ask Jakub.