[PATCH v5 1/3] Internal-fn: Support new IFN SAT

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
@ 2024-05-15  2:14 pan2.li
  2024-05-15  2:14 ` [PATCH v5 2/3] Vect: Support new IFN SAT_ADD for unsigned vector int pan2.li
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: pan2.li @ 2024-05-15  2:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, tamar.christina, richard.guenther,
	hongtao.liu, Pan Li

From: Pan Li <pan2.li@intel.com>

This patch would like to add the middle-end presentation for the
saturation add.  Aka set the result of add to the max when overflow.
It will take the pattern similar as below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Take uint8_t as example, we will have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.

Given below example for the unsigned scalar integer uint64_t:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;    succ:       EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;    succ:       EXIT
}

The below tests are passed for this patch:
1. The riscv fully regression tests.
3. The x86 bootstrap tests.
4. The x86 fully regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
	to the return true switch case(s).
	* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
	* match.pd: Add unsigned SAT_ADD match(es).
	* optabs.def (OPTAB_NL): Remove fixed-point limitation for
	us/ssadd.
	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
	extern func decl generated in match.pd match.
	(match_saturation_arith): New func impl to match the saturation arith.
	(math_opts_dom_walker::after_dom_children): Try match saturation
	arith when IOR expr.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/internal-fn.cc        |  1 +
 gcc/internal-fn.def       |  2 ++
 gcc/match.pd              | 51 +++++++++++++++++++++++++++++++++++++++
 gcc/optabs.def            |  4 +--
 gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++
 5 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 0a7053c2286..73045ca8c8c 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_SAT_ADD:
     case IFN_VEC_WIDEN_PLUS:
     case IFN_VEC_WIDEN_PLUS_LO:
     case IFN_VEC_WIDEN_PLUS_HI:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 848bb9dbff3..25badbb86e5 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
 			      smulhrs, umulhrs, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
+
 DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
 DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
 DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 07e743ae464..0f9c34fa897 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        || POINTER_TYPE_P (itype))
       && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
 
+/* Unsigned Saturation Add */
+(match (usadd_left_part_1 @0 @1)
+ (plus:c @0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_left_part_2 @0 @1)
+ (realpart (IFN_ADD_OVERFLOW:c @0 @1))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_1 @0 @1)
+ (negate (convert (lt (plus:c @0 @1) @0)))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_1 @0 @1)
+ (negate (convert (gt @0 (plus:c @0 @1))))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_2 @0 @1)
+ (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
+   because the sub part of left_part_2 cannot work with right_part_1.
+   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
+   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
+
+/* Unsigned saturation add, case 1 (branchless):
+   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
+   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
+
+/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..3f2cb46aff8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
 OPTAB_NX(add_optab, "add$Q$a3")
 OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
 OPTAB_VX(addv_optab, "add$F$a3")
-OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
-OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
+OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
+OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
 OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(sub_optab, "sub$F$a3")
 OPTAB_NX(sub_optab, "sub$Q$a3")
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index e8c804f09b7..62da1c5ee08 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple *cast_stmt, gimple *&use_stmt,
   return 0;
 }
 
+extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+
+/*
+ * Try to match saturation arith pattern(s).
+ *   1. SAT_ADD (unsigned)
+ *      _7 = _4 + _6;
+ *      _8 = _4 > _7;
+ *      _9 = (long unsigned int) _8;
+ *      _10 = -_9;
+ *      _12 = _7 | _10;
+ *      =>
+ *      _12 = .SAT_ADD (_4, _6);  */
+static void
+match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt)
+{
+  gcall *call = NULL;
+
+  tree ops[2];
+  tree lhs = gimple_assign_lhs (stmt);
+
+  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
+      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
+					 OPTIMIZE_FOR_BOTH))
+    {
+      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
+      gimple_call_set_lhs (call, lhs);
+      gsi_replace (gsi, call, true);
+    }
+}
+
 /* Recognize for unsigned x
    x = y - z;
    if (x > y)
@@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
 	      break;
 
 	    case BIT_IOR_EXPR:
+	      match_saturation_arith (&gsi, as_a<gassign *> (stmt));
+	      /* fall-through  */
 	    case BIT_XOR_EXPR:
 	      match_uaddc_usubc (&gsi, stmt, code);
 	      break;
-- 
2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v5 2/3] Vect: Support new IFN SAT_ADD for unsigned vector int
  2024-05-15  2:14 [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int pan2.li
@ 2024-05-15  2:14 ` pan2.li
  2024-05-15  2:14 ` [PATCH v5 3/3] RISC-V: Implement IFN SAT_ADD for both the scalar and vector pan2.li
  2024-05-15  9:12 ` [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int Tamar Christina
  2 siblings, 0 replies; 13+ messages in thread
From: pan2.li @ 2024-05-15  2:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, tamar.christina, richard.guenther,
	hongtao.liu, Pan Li

From: Pan Li <pan2.li@intel.com>

For vectorize, we leverage the existing vect pattern recog to find
the pattern similar to scalar and let the vectorizer to perform
the rest part for standard name usadd<mode>3 in vector mode.
The riscv vector backend have insn "Vector Single-Width Saturating
Add and Subtract" which can be leveraged when expand the usadd<mode>3
in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615,
    ... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}

The below test suites are passed for this patch.
* The riscv fully regression tests.
* The x86 bootstrap tests.
* The x86 fully regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New
	func decl generated by match.pd match.
	(vect_recog_sat_add_pattern): New func impl to recog the pattern
	for unsigned SAT_ADD.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/tree-vect-patterns.cc | 52 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index dfb7d800526..a313dc64643 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4487,6 +4487,57 @@ vect_recog_mult_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+
+/*
+ * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
+ *   _7 = _4 + _6;
+ *   _8 = _4 > _7;
+ *   _9 = (long unsigned int) _8;
+ *   _10 = -_9;
+ *   _12 = _7 | _10;
+ *
+ * And then simplied to
+ *   _12 = .SAT_ADD (_4, _6);
+ */
+
+static gimple *
+vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
+			    tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+
+  if (!is_gimple_assign (last_stmt))
+    return NULL;
+
+  tree res_ops[2];
+  tree lhs = gimple_assign_lhs (last_stmt);
+
+  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
+    {
+      tree itype = TREE_TYPE (res_ops[0]);
+      tree vtype = get_vectype_for_scalar_type (vinfo, itype);
+
+      if (vtype != NULL_TREE
+	&& direct_internal_fn_supported_p (IFN_SAT_ADD, vtype,
+					   OPTIMIZE_FOR_BOTH))
+	{
+	  *type_out = vtype;
+	  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, res_ops[0],
+						    res_ops[1]);
+
+	  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
+	  gimple_call_set_nothrow (call, /* nothrow_p */ false);
+	  gimple_set_location (call, gimple_location (last_stmt));
+
+	  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
+	  return call;
+	}
+    }
+
+  return NULL;
+}
+
 /* Detect a signed division by a constant that wouldn't be
    otherwise vectorized:
 
@@ -6987,6 +7038,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
+  { vect_recog_sat_add_pattern, "sat_add" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
   { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
-- 
2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v5 3/3] RISC-V: Implement IFN SAT_ADD for both the scalar and vector
  2024-05-15  2:14 [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int pan2.li
  2024-05-15  2:14 ` [PATCH v5 2/3] Vect: Support new IFN SAT_ADD for unsigned vector int pan2.li
@ 2024-05-15  2:14 ` pan2.li
  2024-05-15  9:12 ` [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int Tamar Christina
  2 siblings, 0 replies; 13+ messages in thread
From: pan2.li @ 2024-05-15  2:14 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, tamar.christina, richard.guenther,
	hongtao.liu, Pan Li

From: Pan Li <pan2.li@intel.com>

The patch implement the SAT_ADD in the riscv backend as the
sample for both the scalar and vector.  Given below vector
as example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv       v0,v0,v1
  vmerge.vim      v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vsaddu.vv       v1,v1,v2  <=  Vector Single-Width Saturating Add
  vse64.v v1,0(a0)
  ...

The below test suites are passed for this patch.
* The riscv fully regression tests.
* The aarch64 fully regression tests.
* The x86 bootstrap tests.
* The x86 fully regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* config/riscv/autovec.md (usadd<mode>3): New pattern expand for
	the unsigned SAT_ADD in vector mode.
	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func decl
	to expand usadd<mode>3 pattern.
	(expand_vec_usadd): Ditto but for vector.
	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to emit
	the vsadd insn.
	(expand_vec_usadd): New func impl to expand usadd<mode>3 for vector.
	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl to
	expand usadd<mode>3 for scalar.
	* config/riscv/riscv.md (usadd<mode>3): New pattern expand for
	the unsigned SAT_ADD in scalar mode.
	* config/riscv/vector.md: Allow VLS mode for vsaddu.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New test.
	* gcc.target/riscv/sat_arith.h: New test.
	* gcc.target/riscv/sat_u_add-1.c: New test.
	* gcc.target/riscv/sat_u_add-2.c: New test.
	* gcc.target/riscv/sat_u_add-3.c: New test.
	* gcc.target/riscv/sat_u_add-4.c: New test.
	* gcc.target/riscv/sat_u_add-run-1.c: New test.
	* gcc.target/riscv/sat_u_add-run-2.c: New test.
	* gcc.target/riscv/sat_u_add-run-3.c: New test.
	* gcc.target/riscv/sat_u_add-run-4.c: New test.
	* gcc.target/riscv/scalar_sat_binary.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/autovec.md                   | 17 +++++
 gcc/config/riscv/riscv-protos.h               |  2 +
 gcc/config/riscv/riscv-v.cc                   | 16 ++++
 gcc/config/riscv/riscv.cc                     | 47 ++++++++++++
 gcc/config/riscv/riscv.md                     | 11 +++
 gcc/config/riscv/vector.md                    | 12 +--
 .../riscv/rvv/autovec/binop/vec_sat_binary.h  | 33 ++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-1.c | 19 +++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-2.c | 20 +++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-3.c | 20 +++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-4.c | 20 +++++
 .../rvv/autovec/binop/vec_sat_u_add-run-1.c   | 75 +++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-2.c   | 75 +++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-3.c   | 75 +++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-4.c   | 75 +++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_arith.h    | 31 ++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c  | 19 +++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c  | 21 ++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c  | 18 +++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c  | 17 +++++
 .../gcc.target/riscv/sat_u_add-run-1.c        | 25 +++++++
 .../gcc.target/riscv/sat_u_add-run-2.c        | 25 +++++++
 .../gcc.target/riscv/sat_u_add-run-3.c        | 25 +++++++
 .../gcc.target/riscv/sat_u_add-run-4.c        | 25 +++++++
 .../gcc.target/riscv/scalar_sat_binary.h      | 27 +++++++
 25 files changed, 744 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..7ceeb8d64f6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,20 @@ (define_expand "rawmemchr<ANYI:mode>"
     DONE;
   }
 )
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Saturation ALU.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - add
+;; -------------------------------------------------------------------------
+(define_expand "usadd<mode>3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_usadd (operands[0], operands[1], operands[2], <MODE>mode);
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5c8a52b78a2..1ada792a162 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -133,6 +133,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
 extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
+extern void riscv_expand_usadd (rtx, rtx, rtx);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
@@ -623,6 +624,7 @@ void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lround (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_usadd (rtx, rtx, rtx, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
 			  bool, void (*)(rtx *, rtx), enum avl_type);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c9e0feebca6..c34111f89b8 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4635,6 +4635,16 @@ emit_vec_cvt_x_f_rtz (rtx op_dest, rtx op_src, rtx mask,
     }
 }
 
+static void
+emit_vec_saddu (rtx op_dest, rtx op_1, rtx op_2, insn_type type,
+		machine_mode vec_mode)
+{
+  rtx ops[] = {op_dest, op_1, op_2};
+  insn_code icode = code_for_pred (US_PLUS, vec_mode);
+
+  emit_vlmax_insn (icode, type, ops);
+}
+
 void
 expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 		 machine_mode vec_int_mode)
@@ -4862,6 +4872,12 @@ expand_vec_lfloor (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 				vec_int_mode);
 }
 
+void
+expand_vec_usadd (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  emit_vec_saddu (op_0, op_1, op_2, BINARY_OP, vec_mode);
+}
+
 /* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that libgcc uses as
    well.  */
 void
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4067505270e..cc0185f64cd 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11281,6 +11281,53 @@ riscv_get_raw_result_mode (int regno)
   return default_get_reg_raw_mode (regno);
 }
 
+/* Implements the unsigned saturation add standard name usadd for int mode.  */
+
+void
+riscv_expand_usadd (rtx dest, rtx x, rtx y)
+{
+  machine_mode mode = GET_MODE (dest);
+  rtx xmode_sum = gen_reg_rtx (Xmode);
+  rtx xmode_lt = gen_reg_rtx (Xmode);
+  rtx xmode_x = gen_lowpart (Xmode, x);
+  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_dest = gen_reg_rtx (Xmode);
+
+  /* Step-1: sum = x + y  */
+  if (mode == SImode && mode != Xmode)
+    { /* Take addw to avoid the sum truncate.  */
+      rtx simode_sum = gen_reg_rtx (SImode);
+      riscv_emit_binary (PLUS, simode_sum, x, y);
+      emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));
+    }
+  else
+    riscv_emit_binary (PLUS, xmode_sum, xmode_x, xmode_y);
+
+  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
+  if (mode == HImode || mode == QImode)
+    {
+      int shift_bits = GET_MODE_BITSIZE (Xmode)
+	- GET_MODE_BITSIZE (mode).to_constant ();
+
+      gcc_assert (shift_bits > 0);
+
+      riscv_emit_binary (ASHIFT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+      riscv_emit_binary (LSHIFTRT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+    }
+
+  /* Step-2: lt = sum < x  */
+  riscv_emit_binary (LTU, xmode_lt, xmode_sum, xmode_x);
+
+  /* Step-3: lt = -lt  */
+  riscv_emit_unary (NEG, xmode_lt, xmode_lt);
+
+  /* Step-4: xmode_dest = sum | lt  */
+  riscv_emit_binary (IOR, xmode_dest, xmode_lt, xmode_sum);
+
+  /* Step-5: dest = xmode_dest */
+  emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ee15c63db10..85a34adea83 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4145,6 +4145,17 @@ (define_insn_and_split ""
   "{ operands[6] = gen_lowpart (SImode, operands[5]); }"
   [(set_attr "type" "arith")])
 
+(define_expand "usadd<mode>3"
+  [(match_operand:ANYI 0 "register_operand")
+   (match_operand:ANYI 1 "register_operand")
+   (match_operand:ANYI 2 "register_operand")]
+  ""
+  {
+    riscv_expand_usadd (operands[0], operands[1], operands[2]);
+    DONE;
+  }
+)
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 228d0f9a766..f8ed61f4a13 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -4062,8 +4062,8 @@ (define_insn "@pred_trunc<mode>"
 
 ;; Saturating Add and Subtract
 (define_insn "@pred_<optab><mode>"
-  [(set (match_operand:VI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
-	(if_then_else:VI
+  [(set (match_operand:V_VLSI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
+	(if_then_else:V_VLSI
 	  (unspec:<VM>
 	    [(match_operand:<VM> 1 "vector_mask_operand" " vm, vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
 	     (match_operand 5 "vector_length_operand"    " rK, rK, rK, rK, rK, rK, rK, rK")
@@ -4072,10 +4072,10 @@ (define_insn "@pred_<optab><mode>"
 	     (match_operand 8 "const_int_operand"        "  i,  i,  i,  i,  i,  i,  i,  i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-	  (any_sat_int_binop:VI
-	    (match_operand:VI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
-	    (match_operand:VI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
-	  (match_operand:VI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
+	  (any_sat_int_binop:V_VLSI
+	    (match_operand:V_VLSI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
+	    (match_operand:V_VLSI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
+	  (match_operand:V_VLSI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
   "TARGET_VECTOR"
   "@
    v<insn>.vv\t%0,%3,%4%p1
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
new file mode 100644
index 00000000000..0976ae97830
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
@@ -0,0 +1,33 @@
+#ifndef HAVE_DEFINED_VEC_SAT_BINARY
+#define HAVE_DEFINED_VEC_SAT_BINARY
+
+/* To leverage this header files for run test, you need to:
+   1. define T as the type, for example uint8_t,
+   2. defint N as the test array size, for example 16.
+   3. define RUN_VEC_SAT_BINARY as run function.
+   4. prepare the test_data for test cases.
+ */
+
+int
+main ()
+{
+  unsigned i, k;
+  T out[N];
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      T *op_1 = test_data[i][0];
+      T *op_2 = test_data[i][1];
+      T *expect = test_data[i][2];
+
+      RUN_VEC_SAT_BINARY (T, out, op_1, op_2, N);
+
+      for (k = 0; k < N; k++)
+	if (out[k] != expect[k])
+	  __builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
new file mode 100644
index 00000000000..dbbfa00afe2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint8_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
new file mode 100644
index 00000000000..1253fdb5f60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint16_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
new file mode 100644
index 00000000000..74bba9cadd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint32_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint32_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
new file mode 100644
index 00000000000..f3692b4cc25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint64_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint64_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
new file mode 100644
index 00000000000..1dcb333f687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
new file mode 100644
index 00000000000..dbf01ac863d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
new file mode 100644
index 00000000000..20ad2736403
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
new file mode 100644
index 00000000000..2f31edc527e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
new file mode 100644
index 00000000000..2ef9fd825f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -0,0 +1,31 @@
+#ifndef HAVE_SAT_ARITH
+#define HAVE_SAT_ARITH
+
+#include <stdint-gcc.h>
+
+#define DEF_SAT_U_ADD_FMT_1(T)             \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_1 (T x, T y)           \
+{                                          \
+  return (x + y) | (-(T)((T)(x + y) < x)); \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_1(T)                                   \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (x + y) | (-(T)((T)(x + y) < x));                     \
+    }                                                                \
+}
+
+#define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
+
+#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
new file mode 100644
index 00000000000..609e1ea343b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
new file mode 100644
index 00000000000..d30436c782a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
new file mode 100644
index 00000000000..12347c607bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint32_t_fmt_1:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint32_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
new file mode 100644
index 00000000000..f2c6b74d917
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint64_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint64_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
new file mode 100644
index 00000000000..f1972490006
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
new file mode 100644
index 00000000000..cb3879d0cde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
new file mode 100644
index 00000000000..c9a6080ca3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
new file mode 100644
index 00000000000..c19b7e22387
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
new file mode 100644
index 00000000000..cbb2d750107
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
@@ -0,0 +1,27 @@
+#ifndef HAVE_DEFINED_SCALAR_SAT_BINARY
+#define HAVE_DEFINED_SCALAR_SAT_BINARY
+
+/* To leverage this header files for run test, you need to:
+   1. define T as the type, for example uint8_t,
+   2. define RUN_SAT_BINARY as run function.
+   3. prepare the test_data for test cases.
+ */
+
+int
+main ()
+{
+  unsigned i;
+  T *d;
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      d = test_data[i];
+
+      if (RUN_SAT_BINARY (T, d[0], d[1]) != d[2])
+	__builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
-- 
2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-15  2:14 [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int pan2.li
  2024-05-15  2:14 ` [PATCH v5 2/3] Vect: Support new IFN SAT_ADD for unsigned vector int pan2.li
  2024-05-15  2:14 ` [PATCH v5 3/3] RISC-V: Implement IFN SAT_ADD for both the scalar and vector pan2.li
@ 2024-05-15  9:12 ` Tamar Christina
  2024-05-15 11:36   ` Li, Pan2
  2024-05-17  3:18   ` Li, Pan2
  2 siblings, 2 replies; 13+ messages in thread
From: Tamar Christina @ 2024-05-15  9:12 UTC (permalink / raw)
  To: pan2.li, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, hongtao.liu

Hi Pan,

Thanks!

> -----Original Message-----
> From: pan2.li@intel.com <pan2.li@intel.com>
> Sent: Wednesday, May 15, 2024 3:14 AM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar
> int
> 
> From: Pan Li <pan2.li@intel.com>
> 
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
> 
> Given below example for the unsigned scalar integer uint64_t:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;    succ:       EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;    succ:       EXIT
> }
> 
> The below tests are passed for this patch:
> 1. The riscv fully regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
> 
> 	PR target/51492
> 	PR target/112600
> 
> gcc/ChangeLog:
> 
> 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> 	to the return true switch case(s).
> 	* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> 	* match.pd: Add unsigned SAT_ADD match(es).
> 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for
> 	us/ssadd.
> 	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> 	extern func decl generated in match.pd match.
> 	(match_saturation_arith): New func impl to match the saturation arith.
> 	(math_opts_dom_walker::after_dom_children): Try match saturation
> 	arith when IOR expr.
> 

 LGTM but you'll need an OK from Richard,

Thanks for working on this!

Tamar

> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/internal-fn.cc        |  1 +
>  gcc/internal-fn.def       |  2 ++
>  gcc/match.pd              | 51 +++++++++++++++++++++++++++++++++++++++
>  gcc/optabs.def            |  4 +--
>  gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++
>  5 files changed, 88 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_SAT_ADD:
>      case IFN_VEC_WIDEN_PLUS:
>      case IFN_VEC_WIDEN_PLUS_LO:
>      case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..25badbb86e5 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> | ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> first,
>  			      smulhrs, umulhrs, binary)
> 
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> binary)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 07e743ae464..0f9c34fa897 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || POINTER_TYPE_P (itype))
>        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> 
> +/* Unsigned Saturation Add */
> +(match (usadd_left_part_1 @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_left_part_2 @0 @1)
> + (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_1 @0 @1)
> + (negate (convert (lt (plus:c @0 @1) @0)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_1 @0 @1)
> + (negate (convert (gt @0 (plus:c @0 @1))))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_2 @0 @1)
> + (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> integer_zerop)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> +   because the sub part of left_part_2 cannot work with right_part_1.
> +   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> +   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
> +
> +/* Unsigned saturation add, case 1 (branchless):
> +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
> +
> +/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..3f2cb46aff8 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
>  OPTAB_NX(add_optab, "add$Q$a3")
>  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
>  OPTAB_VX(addv_optab, "add$F$a3")
> -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> gen_signed_fixed_libfunc)
> -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> gen_unsigned_fixed_libfunc)
> +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> gen_signed_fixed_libfunc)
> +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> gen_unsigned_fixed_libfunc)
>  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(sub_optab, "sub$F$a3")
>  OPTAB_NX(sub_optab, "sub$Q$a3")
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index e8c804f09b7..62da1c5ee08 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple
> *cast_stmt, gimple *&use_stmt,
>    return 0;
>  }
> 
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to match saturation arith pattern(s).
> + *   1. SAT_ADD (unsigned)
> + *      _7 = _4 + _6;
> + *      _8 = _4 > _7;
> + *      _9 = (long unsigned int) _8;
> + *      _10 = -_9;
> + *      _12 = _7 | _10;
> + *      =>
> + *      _12 = .SAT_ADD (_4, _6);  */
> +static void
> +match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt)
> +{
> +  gcall *call = NULL;
> +
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> +					 OPTIMIZE_FOR_BOTH))
> +    {
> +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> +      gimple_call_set_lhs (call, lhs);
> +      gsi_replace (gsi, call, true);
> +    }
> +}
> +
>  /* Recognize for unsigned x
>     x = y - z;
>     if (x > y)
> @@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children
> (basic_block bb)
>  	      break;
> 
>  	    case BIT_IOR_EXPR:
> +	      match_saturation_arith (&gsi, as_a<gassign *> (stmt));
> +	      /* fall-through  */
>  	    case BIT_XOR_EXPR:
>  	      match_uaddc_usubc (&gsi, stmt, code);
>  	      break;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-15  9:12 ` [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int Tamar Christina
@ 2024-05-15 11:36   ` Li, Pan2
  2024-05-16  8:10     ` Richard Biener
  2024-05-17  3:18   ` Li, Pan2
  1 sibling, 1 reply; 13+ messages in thread
From: Li, Pan2 @ 2024-05-15 11:36 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

> LGTM but you'll need an OK from Richard,
> Thanks for working on this!

Thanks Tamar for help and coaching, let's wait Richard for a while,😊!

Pan

-----Original Message-----
From: Tamar Christina <Tamar.Christina@arm.com> 
Sent: Wednesday, May 15, 2024 5:12 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

Hi Pan,

Thanks!

> -----Original Message-----
> From: pan2.li@intel.com <pan2.li@intel.com>
> Sent: Wednesday, May 15, 2024 3:14 AM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar
> int
> 
> From: Pan Li <pan2.li@intel.com>
> 
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
> 
> Given below example for the unsigned scalar integer uint64_t:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;    succ:       EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;    succ:       EXIT
> }
> 
> The below tests are passed for this patch:
> 1. The riscv fully regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
> 
> 	PR target/51492
> 	PR target/112600
> 
> gcc/ChangeLog:
> 
> 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> 	to the return true switch case(s).
> 	* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> 	* match.pd: Add unsigned SAT_ADD match(es).
> 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for
> 	us/ssadd.
> 	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> 	extern func decl generated in match.pd match.
> 	(match_saturation_arith): New func impl to match the saturation arith.
> 	(math_opts_dom_walker::after_dom_children): Try match saturation
> 	arith when IOR expr.
> 

 LGTM but you'll need an OK from Richard,

Thanks for working on this!

Tamar

> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/internal-fn.cc        |  1 +
>  gcc/internal-fn.def       |  2 ++
>  gcc/match.pd              | 51 +++++++++++++++++++++++++++++++++++++++
>  gcc/optabs.def            |  4 +--
>  gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++
>  5 files changed, 88 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_SAT_ADD:
>      case IFN_VEC_WIDEN_PLUS:
>      case IFN_VEC_WIDEN_PLUS_LO:
>      case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..25badbb86e5 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> | ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> first,
>  			      smulhrs, umulhrs, binary)
> 
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> binary)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 07e743ae464..0f9c34fa897 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || POINTER_TYPE_P (itype))
>        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> 
> +/* Unsigned Saturation Add */
> +(match (usadd_left_part_1 @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_left_part_2 @0 @1)
> + (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_1 @0 @1)
> + (negate (convert (lt (plus:c @0 @1) @0)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_1 @0 @1)
> + (negate (convert (gt @0 (plus:c @0 @1))))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_2 @0 @1)
> + (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> integer_zerop)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> +   because the sub part of left_part_2 cannot work with right_part_1.
> +   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> +   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
> +
> +/* Unsigned saturation add, case 1 (branchless):
> +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
> +
> +/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..3f2cb46aff8 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
>  OPTAB_NX(add_optab, "add$Q$a3")
>  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
>  OPTAB_VX(addv_optab, "add$F$a3")
> -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> gen_signed_fixed_libfunc)
> -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> gen_unsigned_fixed_libfunc)
> +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> gen_signed_fixed_libfunc)
> +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> gen_unsigned_fixed_libfunc)
>  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(sub_optab, "sub$F$a3")
>  OPTAB_NX(sub_optab, "sub$Q$a3")
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index e8c804f09b7..62da1c5ee08 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple
> *cast_stmt, gimple *&use_stmt,
>    return 0;
>  }
> 
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to match saturation arith pattern(s).
> + *   1. SAT_ADD (unsigned)
> + *      _7 = _4 + _6;
> + *      _8 = _4 > _7;
> + *      _9 = (long unsigned int) _8;
> + *      _10 = -_9;
> + *      _12 = _7 | _10;
> + *      =>
> + *      _12 = .SAT_ADD (_4, _6);  */
> +static void
> +match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt)
> +{
> +  gcall *call = NULL;
> +
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> +					 OPTIMIZE_FOR_BOTH))
> +    {
> +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> +      gimple_call_set_lhs (call, lhs);
> +      gsi_replace (gsi, call, true);
> +    }
> +}
> +
>  /* Recognize for unsigned x
>     x = y - z;
>     if (x > y)
> @@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children
> (basic_block bb)
>  	      break;
> 
>  	    case BIT_IOR_EXPR:
> +	      match_saturation_arith (&gsi, as_a<gassign *> (stmt));
> +	      /* fall-through  */
>  	    case BIT_XOR_EXPR:
>  	      match_uaddc_usubc (&gsi, stmt, code);
>  	      break;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-15 11:36   ` Li, Pan2
@ 2024-05-16  8:10     ` Richard Biener
  2024-05-16  9:34       ` Li, Pan2
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Biener @ 2024-05-16  8:10 UTC (permalink / raw)
  To: Li, Pan2
  Cc: Tamar Christina, gcc-patches, juzhe.zhong, kito.cheng, Liu, Hongtao

On Wed, May 15, 2024 at 1:36 PM Li, Pan2 <pan2.li@intel.com> wrote:
>
> > LGTM but you'll need an OK from Richard,
> > Thanks for working on this!
>
> Thanks Tamar for help and coaching, let's wait Richard for a while,😊!

OK.

Thanks for the patience,
Richard.

> Pan
>
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Wednesday, May 15, 2024 5:12 PM
> To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
>
> Hi Pan,
>
> Thanks!
>
> > -----Original Message-----
> > From: pan2.li@intel.com <pan2.li@intel.com>
> > Sent: Wednesday, May 15, 2024 3:14 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> > <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> > hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar
> > int
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch would like to add the middle-end presentation for the
> > saturation add.  Aka set the result of add to the max when overflow.
> > It will take the pattern similar as below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given below example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> > }
> >
> > The below tests are passed for this patch:
> > 1. The riscv fully regression tests.
> > 3. The x86 bootstrap tests.
> > 4. The x86 fully regression tests.
> >
> >       PR target/51492
> >       PR target/112600
> >
> > gcc/ChangeLog:
> >
> >       * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> >       to the return true switch case(s).
> >       * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> >       * match.pd: Add unsigned SAT_ADD match(es).
> >       * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> >       us/ssadd.
> >       * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> >       extern func decl generated in match.pd match.
> >       (match_saturation_arith): New func impl to match the saturation arith.
> >       (math_opts_dom_walker::after_dom_children): Try match saturation
> >       arith when IOR expr.
> >
>
>  LGTM but you'll need an OK from Richard,
>
> Thanks for working on this!
>
> Tamar
>
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/internal-fn.cc        |  1 +
> >  gcc/internal-fn.def       |  2 ++
> >  gcc/match.pd              | 51 +++++++++++++++++++++++++++++++++++++++
> >  gcc/optabs.def            |  4 +--
> >  gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++
> >  5 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 0a7053c2286..73045ca8c8c 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_UBSAN_CHECK_MUL:
> >      case IFN_ADD_OVERFLOW:
> >      case IFN_MUL_OVERFLOW:
> > +    case IFN_SAT_ADD:
> >      case IFN_VEC_WIDEN_PLUS:
> >      case IFN_VEC_WIDEN_PLUS_LO:
> >      case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..25badbb86e5 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> > | ECF_NOTHROW, first,
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> > first,
> >                             smulhrs, umulhrs, binary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> > binary)
> > +
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 07e743ae464..0f9c34fa897 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         || POINTER_TYPE_P (itype))
> >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> >
> > +/* Unsigned Saturation Add */
> > +(match (usadd_left_part_1 @0 @1)
> > + (plus:c @0 @1)
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_left_part_2 @0 @1)
> > + (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (lt (plus:c @0 @1) @0)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (gt @0 (plus:c @0 @1))))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_2 @0 @1)
> > + (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> > integer_zerop)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> > +   because the sub part of left_part_2 cannot work with right_part_1.
> > +   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> > +   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
> > +
> > +/* Unsigned saturation add, case 1 (branchless):
> > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
> > +
> > +/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..3f2cb46aff8 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> >  OPTAB_NX(add_optab, "add$Q$a3")
> >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> >  OPTAB_VX(addv_optab, "add$F$a3")
> > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> > gen_signed_fixed_libfunc)
> > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> > gen_unsigned_fixed_libfunc)
> > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> > gen_signed_fixed_libfunc)
> > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> > gen_unsigned_fixed_libfunc)
> >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
> >  OPTAB_NX(sub_optab, "sub$F$a3")
> >  OPTAB_NX(sub_optab, "sub$Q$a3")
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index e8c804f09b7..62da1c5ee08 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple
> > *cast_stmt, gimple *&use_stmt,
> >    return 0;
> >  }
> >
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +/*
> > + * Try to match saturation arith pattern(s).
> > + *   1. SAT_ADD (unsigned)
> > + *      _7 = _4 + _6;
> > + *      _8 = _4 > _7;
> > + *      _9 = (long unsigned int) _8;
> > + *      _10 = -_9;
> > + *      _12 = _7 | _10;
> > + *      =>
> > + *      _12 = .SAT_ADD (_4, _6);  */
> > +static void
> > +match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt)
> > +{
> > +  gcall *call = NULL;
> > +
> > +  tree ops[2];
> > +  tree lhs = gimple_assign_lhs (stmt);
> > +
> > +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> > +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> > +                                      OPTIMIZE_FOR_BOTH))
> > +    {
> > +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> > +      gimple_call_set_lhs (call, lhs);
> > +      gsi_replace (gsi, call, true);
> > +    }
> > +}
> > +
> >  /* Recognize for unsigned x
> >     x = y - z;
> >     if (x > y)
> > @@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children
> > (basic_block bb)
> >             break;
> >
> >           case BIT_IOR_EXPR:
> > +           match_saturation_arith (&gsi, as_a<gassign *> (stmt));
> > +           /* fall-through  */
> >           case BIT_XOR_EXPR:
> >             match_uaddc_usubc (&gsi, stmt, code);
> >             break;
> > --
> > 2.34.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-16  8:10     ` Richard Biener
@ 2024-05-16  9:34       ` Li, Pan2
  2024-05-16 11:58         ` Richard Biener
  0 siblings, 1 reply; 13+ messages in thread
From: Li, Pan2 @ 2024-05-16  9:34 UTC (permalink / raw)
  To: Richard Biener
  Cc: Tamar Christina, gcc-patches, juzhe.zhong, kito.cheng, Liu, Hongtao

> OK.

Thanks Richard for help and coaching. To double confirm, are you OK with this patch only or for the series patch(es) of SAT middle-end?
Thanks again for reviewing and suggestions.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Thursday, May 16, 2024 4:10 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

On Wed, May 15, 2024 at 1:36 PM Li, Pan2 <pan2.li@intel.com> wrote:
>
> > LGTM but you'll need an OK from Richard,
> > Thanks for working on this!
>
> Thanks Tamar for help and coaching, let's wait Richard for a while,😊!

OK.

Thanks for the patience,
Richard.

> Pan
>
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Wednesday, May 15, 2024 5:12 PM
> To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
>
> Hi Pan,
>
> Thanks!
>
> > -----Original Message-----
> > From: pan2.li@intel.com <pan2.li@intel.com>
> > Sent: Wednesday, May 15, 2024 3:14 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> > <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> > hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar
> > int
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch would like to add the middle-end presentation for the
> > saturation add.  Aka set the result of add to the max when overflow.
> > It will take the pattern similar as below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given below example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> > }
> >
> > The below tests are passed for this patch:
> > 1. The riscv fully regression tests.
> > 3. The x86 bootstrap tests.
> > 4. The x86 fully regression tests.
> >
> >       PR target/51492
> >       PR target/112600
> >
> > gcc/ChangeLog:
> >
> >       * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> >       to the return true switch case(s).
> >       * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> >       * match.pd: Add unsigned SAT_ADD match(es).
> >       * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> >       us/ssadd.
> >       * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> >       extern func decl generated in match.pd match.
> >       (match_saturation_arith): New func impl to match the saturation arith.
> >       (math_opts_dom_walker::after_dom_children): Try match saturation
> >       arith when IOR expr.
> >
>
>  LGTM but you'll need an OK from Richard,
>
> Thanks for working on this!
>
> Tamar
>
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/internal-fn.cc        |  1 +
> >  gcc/internal-fn.def       |  2 ++
> >  gcc/match.pd              | 51 +++++++++++++++++++++++++++++++++++++++
> >  gcc/optabs.def            |  4 +--
> >  gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++
> >  5 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 0a7053c2286..73045ca8c8c 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_UBSAN_CHECK_MUL:
> >      case IFN_ADD_OVERFLOW:
> >      case IFN_MUL_OVERFLOW:
> > +    case IFN_SAT_ADD:
> >      case IFN_VEC_WIDEN_PLUS:
> >      case IFN_VEC_WIDEN_PLUS_LO:
> >      case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..25badbb86e5 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> > | ECF_NOTHROW, first,
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> > first,
> >                             smulhrs, umulhrs, binary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> > binary)
> > +
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 07e743ae464..0f9c34fa897 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         || POINTER_TYPE_P (itype))
> >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> >
> > +/* Unsigned Saturation Add */
> > +(match (usadd_left_part_1 @0 @1)
> > + (plus:c @0 @1)
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_left_part_2 @0 @1)
> > + (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (lt (plus:c @0 @1) @0)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (gt @0 (plus:c @0 @1))))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_2 @0 @1)
> > + (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> > integer_zerop)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> > +   because the sub part of left_part_2 cannot work with right_part_1.
> > +   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> > +   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
> > +
> > +/* Unsigned saturation add, case 1 (branchless):
> > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
> > +
> > +/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..3f2cb46aff8 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> >  OPTAB_NX(add_optab, "add$Q$a3")
> >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> >  OPTAB_VX(addv_optab, "add$F$a3")
> > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> > gen_signed_fixed_libfunc)
> > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> > gen_unsigned_fixed_libfunc)
> > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> > gen_signed_fixed_libfunc)
> > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> > gen_unsigned_fixed_libfunc)
> >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
> >  OPTAB_NX(sub_optab, "sub$F$a3")
> >  OPTAB_NX(sub_optab, "sub$Q$a3")
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index e8c804f09b7..62da1c5ee08 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple
> > *cast_stmt, gimple *&use_stmt,
> >    return 0;
> >  }
> >
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +/*
> > + * Try to match saturation arith pattern(s).
> > + *   1. SAT_ADD (unsigned)
> > + *      _7 = _4 + _6;
> > + *      _8 = _4 > _7;
> > + *      _9 = (long unsigned int) _8;
> > + *      _10 = -_9;
> > + *      _12 = _7 | _10;
> > + *      =>
> > + *      _12 = .SAT_ADD (_4, _6);  */
> > +static void
> > +match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt)
> > +{
> > +  gcall *call = NULL;
> > +
> > +  tree ops[2];
> > +  tree lhs = gimple_assign_lhs (stmt);
> > +
> > +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> > +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> > +                                      OPTIMIZE_FOR_BOTH))
> > +    {
> > +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> > +      gimple_call_set_lhs (call, lhs);
> > +      gsi_replace (gsi, call, true);
> > +    }
> > +}
> > +
> >  /* Recognize for unsigned x
> >     x = y - z;
> >     if (x > y)
> > @@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children
> > (basic_block bb)
> >             break;
> >
> >           case BIT_IOR_EXPR:
> > +           match_saturation_arith (&gsi, as_a<gassign *> (stmt));
> > +           /* fall-through  */
> >           case BIT_XOR_EXPR:
> >             match_uaddc_usubc (&gsi, stmt, code);
> >             break;
> > --
> > 2.34.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-16  9:34       ` Li, Pan2
@ 2024-05-16 11:58         ` Richard Biener
  2024-05-16 12:00           ` Li, Pan2
  2024-05-16 13:32           ` Jeff Law
  0 siblings, 2 replies; 13+ messages in thread
From: Richard Biener @ 2024-05-16 11:58 UTC (permalink / raw)
  To: Li, Pan2
  Cc: Tamar Christina, gcc-patches, juzhe.zhong, kito.cheng, Liu, Hongtao

On Thu, May 16, 2024 at 11:35 AM Li, Pan2 <pan2.li@intel.com> wrote:
>
> > OK.
>
> Thanks Richard for help and coaching. To double confirm, are you OK with this patch only or for the series patch(es) of SAT middle-end?
> Thanks again for reviewing and suggestions.

For the series, the riscv specific part of course needs riscv approval.

> Pan
>
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Thursday, May 16, 2024 4:10 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
> Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
>
> On Wed, May 15, 2024 at 1:36 PM Li, Pan2 <pan2.li@intel.com> wrote:
> >
> > > LGTM but you'll need an OK from Richard,
> > > Thanks for working on this!
> >
> > Thanks Tamar for help and coaching, let's wait Richard for a while,😊!
>
> OK.
>
> Thanks for the patience,
> Richard.
>
> > Pan
> >
> > -----Original Message-----
> > From: Tamar Christina <Tamar.Christina@arm.com>
> > Sent: Wednesday, May 15, 2024 5:12 PM
> > To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
> > Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
> >
> > Hi Pan,
> >
> > Thanks!
> >
> > > -----Original Message-----
> > > From: pan2.li@intel.com <pan2.li@intel.com>
> > > Sent: Wednesday, May 15, 2024 3:14 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> > > <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> > > hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> > > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar
> > > int
> > >
> > > From: Pan Li <pan2.li@intel.com>
> > >
> > > This patch would like to add the middle-end presentation for the
> > > saturation add.  Aka set the result of add to the max when overflow.
> > > It will take the pattern similar as below.
> > >
> > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as example, we will have:
> > >
> > > * SAT_ADD (1, 254)   => 255.
> > > * SAT_ADD (1, 255)   => 255.
> > > * SAT_ADD (2, 255)   => 255.
> > > * SAT_ADD (255, 255) => 255.
> > >
> > > Given below example for the unsigned scalar integer uint64_t:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
> > >
> > > Before this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > }
> > >
> > > After this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;    succ:       EXIT
> > > }
> > >
> > > The below tests are passed for this patch:
> > > 1. The riscv fully regression tests.
> > > 3. The x86 bootstrap tests.
> > > 4. The x86 fully regression tests.
> > >
> > >       PR target/51492
> > >       PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > >       * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > >       to the return true switch case(s).
> > >       * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > >       * match.pd: Add unsigned SAT_ADD match(es).
> > >       * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> > >       us/ssadd.
> > >       * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> > >       extern func decl generated in match.pd match.
> > >       (match_saturation_arith): New func impl to match the saturation arith.
> > >       (math_opts_dom_walker::after_dom_children): Try match saturation
> > >       arith when IOR expr.
> > >
> >
> >  LGTM but you'll need an OK from Richard,
> >
> > Thanks for working on this!
> >
> > Tamar
> >
> > > Signed-off-by: Pan Li <pan2.li@intel.com>
> > > ---
> > >  gcc/internal-fn.cc        |  1 +
> > >  gcc/internal-fn.def       |  2 ++
> > >  gcc/match.pd              | 51 +++++++++++++++++++++++++++++++++++++++
> > >  gcc/optabs.def            |  4 +--
> > >  gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++
> > >  5 files changed, 88 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > > index 0a7053c2286..73045ca8c8c 100644
> > > --- a/gcc/internal-fn.cc
> > > +++ b/gcc/internal-fn.cc
> > > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> > >      case IFN_UBSAN_CHECK_MUL:
> > >      case IFN_ADD_OVERFLOW:
> > >      case IFN_MUL_OVERFLOW:
> > > +    case IFN_SAT_ADD:
> > >      case IFN_VEC_WIDEN_PLUS:
> > >      case IFN_VEC_WIDEN_PLUS_LO:
> > >      case IFN_VEC_WIDEN_PLUS_HI:
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index 848bb9dbff3..25badbb86e5 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> > > | ECF_NOTHROW, first,
> > >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> > > first,
> > >                             smulhrs, umulhrs, binary)
> > >
> > > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> > > binary)
> > > +
> > >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> > >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> > >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 07e743ae464..0f9c34fa897 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >         || POINTER_TYPE_P (itype))
> > >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> > >
> > > +/* Unsigned Saturation Add */
> > > +(match (usadd_left_part_1 @0 @1)
> > > + (plus:c @0 @1)
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_left_part_2 @0 @1)
> > > + (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_right_part_1 @0 @1)
> > > + (negate (convert (lt (plus:c @0 @1) @0)))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_right_part_1 @0 @1)
> > > + (negate (convert (gt @0 (plus:c @0 @1))))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_right_part_2 @0 @1)
> > > + (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> > > integer_zerop)))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> > > +   because the sub part of left_part_2 cannot work with right_part_1.
> > > +   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> > > +   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
> > > +
> > > +/* Unsigned saturation add, case 1 (branchless):
> > > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
> > > +
> > > +/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> > > +
> > >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> > >     x >  y  &&  x == XXX_MIN  -->  false . */
> > >  (for eqne (eq ne)
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index ad14f9328b9..3f2cb46aff8 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> > >  OPTAB_NX(add_optab, "add$Q$a3")
> > >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> > >  OPTAB_VX(addv_optab, "add$F$a3")
> > > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> > > gen_signed_fixed_libfunc)
> > > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> > > gen_unsigned_fixed_libfunc)
> > > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> > > gen_signed_fixed_libfunc)
> > > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> > > gen_unsigned_fixed_libfunc)
> > >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
> > >  OPTAB_NX(sub_optab, "sub$F$a3")
> > >  OPTAB_NX(sub_optab, "sub$Q$a3")
> > > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > > index e8c804f09b7..62da1c5ee08 100644
> > > --- a/gcc/tree-ssa-math-opts.cc
> > > +++ b/gcc/tree-ssa-math-opts.cc
> > > @@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple
> > > *cast_stmt, gimple *&use_stmt,
> > >    return 0;
> > >  }
> > >
> > > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > > +
> > > +/*
> > > + * Try to match saturation arith pattern(s).
> > > + *   1. SAT_ADD (unsigned)
> > > + *      _7 = _4 + _6;
> > > + *      _8 = _4 > _7;
> > > + *      _9 = (long unsigned int) _8;
> > > + *      _10 = -_9;
> > > + *      _12 = _7 | _10;
> > > + *      =>
> > > + *      _12 = .SAT_ADD (_4, _6);  */
> > > +static void
> > > +match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt)
> > > +{
> > > +  gcall *call = NULL;
> > > +
> > > +  tree ops[2];
> > > +  tree lhs = gimple_assign_lhs (stmt);
> > > +
> > > +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> > > +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> > > +                                      OPTIMIZE_FOR_BOTH))
> > > +    {
> > > +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> > > +      gimple_call_set_lhs (call, lhs);
> > > +      gsi_replace (gsi, call, true);
> > > +    }
> > > +}
> > > +
> > >  /* Recognize for unsigned x
> > >     x = y - z;
> > >     if (x > y)
> > > @@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children
> > > (basic_block bb)
> > >             break;
> > >
> > >           case BIT_IOR_EXPR:
> > > +           match_saturation_arith (&gsi, as_a<gassign *> (stmt));
> > > +           /* fall-through  */
> > >           case BIT_XOR_EXPR:
> > >             match_uaddc_usubc (&gsi, stmt, code);
> > >             break;
> > > --
> > > 2.34.1
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-16 11:58         ` Richard Biener
@ 2024-05-16 12:00           ` Li, Pan2
  2024-05-16 13:32           ` Jeff Law
  1 sibling, 0 replies; 13+ messages in thread
From: Li, Pan2 @ 2024-05-16 12:00 UTC (permalink / raw)
  To: Richard Biener
  Cc: Tamar Christina, gcc-patches, juzhe.zhong, kito.cheng, Liu, Hongtao

> For the series, the riscv specific part of course needs riscv approval.

Thanks a lot, have a nice day!

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Thursday, May 16, 2024 7:59 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

On Thu, May 16, 2024 at 11:35 AM Li, Pan2 <pan2.li@intel.com> wrote:
>
> > OK.
>
> Thanks Richard for help and coaching. To double confirm, are you OK with this patch only or for the series patch(es) of SAT middle-end?
> Thanks again for reviewing and suggestions.

For the series, the riscv specific part of course needs riscv approval.

> Pan
>
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Thursday, May 16, 2024 4:10 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
> Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
>
> On Wed, May 15, 2024 at 1:36 PM Li, Pan2 <pan2.li@intel.com> wrote:
> >
> > > LGTM but you'll need an OK from Richard,
> > > Thanks for working on this!
> >
> > Thanks Tamar for help and coaching, let's wait Richard for a while,😊!
>
> OK.
>
> Thanks for the patience,
> Richard.
>
> > Pan
> >
> > -----Original Message-----
> > From: Tamar Christina <Tamar.Christina@arm.com>
> > Sent: Wednesday, May 15, 2024 5:12 PM
> > To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
> > Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
> >
> > Hi Pan,
> >
> > Thanks!
> >
> > > -----Original Message-----
> > > From: pan2.li@intel.com <pan2.li@intel.com>
> > > Sent: Wednesday, May 15, 2024 3:14 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> > > <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> > > hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> > > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar
> > > int
> > >
> > > From: Pan Li <pan2.li@intel.com>
> > >
> > > This patch would like to add the middle-end presentation for the
> > > saturation add.  Aka set the result of add to the max when overflow.
> > > It will take the pattern similar as below.
> > >
> > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as example, we will have:
> > >
> > > * SAT_ADD (1, 254)   => 255.
> > > * SAT_ADD (1, 255)   => 255.
> > > * SAT_ADD (2, 255)   => 255.
> > > * SAT_ADD (255, 255) => 255.
> > >
> > > Given below example for the unsigned scalar integer uint64_t:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
> > >
> > > Before this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > }
> > >
> > > After this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;    succ:       EXIT
> > > }
> > >
> > > The below tests are passed for this patch:
> > > 1. The riscv fully regression tests.
> > > 3. The x86 bootstrap tests.
> > > 4. The x86 fully regression tests.
> > >
> > >       PR target/51492
> > >       PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > >       * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > >       to the return true switch case(s).
> > >       * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > >       * match.pd: Add unsigned SAT_ADD match(es).
> > >       * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> > >       us/ssadd.
> > >       * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> > >       extern func decl generated in match.pd match.
> > >       (match_saturation_arith): New func impl to match the saturation arith.
> > >       (math_opts_dom_walker::after_dom_children): Try match saturation
> > >       arith when IOR expr.
> > >
> >
> >  LGTM but you'll need an OK from Richard,
> >
> > Thanks for working on this!
> >
> > Tamar
> >
> > > Signed-off-by: Pan Li <pan2.li@intel.com>
> > > ---
> > >  gcc/internal-fn.cc        |  1 +
> > >  gcc/internal-fn.def       |  2 ++
> > >  gcc/match.pd              | 51 +++++++++++++++++++++++++++++++++++++++
> > >  gcc/optabs.def            |  4 +--
> > >  gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++
> > >  5 files changed, 88 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > > index 0a7053c2286..73045ca8c8c 100644
> > > --- a/gcc/internal-fn.cc
> > > +++ b/gcc/internal-fn.cc
> > > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> > >      case IFN_UBSAN_CHECK_MUL:
> > >      case IFN_ADD_OVERFLOW:
> > >      case IFN_MUL_OVERFLOW:
> > > +    case IFN_SAT_ADD:
> > >      case IFN_VEC_WIDEN_PLUS:
> > >      case IFN_VEC_WIDEN_PLUS_LO:
> > >      case IFN_VEC_WIDEN_PLUS_HI:
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index 848bb9dbff3..25badbb86e5 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> > > | ECF_NOTHROW, first,
> > >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> > > first,
> > >                             smulhrs, umulhrs, binary)
> > >
> > > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> > > binary)
> > > +
> > >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> > >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> > >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 07e743ae464..0f9c34fa897 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >         || POINTER_TYPE_P (itype))
> > >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> > >
> > > +/* Unsigned Saturation Add */
> > > +(match (usadd_left_part_1 @0 @1)
> > > + (plus:c @0 @1)
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_left_part_2 @0 @1)
> > > + (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_right_part_1 @0 @1)
> > > + (negate (convert (lt (plus:c @0 @1) @0)))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_right_part_1 @0 @1)
> > > + (negate (convert (gt @0 (plus:c @0 @1))))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_right_part_2 @0 @1)
> > > + (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> > > integer_zerop)))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> > > +   because the sub part of left_part_2 cannot work with right_part_1.
> > > +   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> > > +   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
> > > +
> > > +/* Unsigned saturation add, case 1 (branchless):
> > > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
> > > +
> > > +/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> > > +
> > >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> > >     x >  y  &&  x == XXX_MIN  -->  false . */
> > >  (for eqne (eq ne)
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index ad14f9328b9..3f2cb46aff8 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> > >  OPTAB_NX(add_optab, "add$Q$a3")
> > >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> > >  OPTAB_VX(addv_optab, "add$F$a3")
> > > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> > > gen_signed_fixed_libfunc)
> > > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> > > gen_unsigned_fixed_libfunc)
> > > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> > > gen_signed_fixed_libfunc)
> > > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> > > gen_unsigned_fixed_libfunc)
> > >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
> > >  OPTAB_NX(sub_optab, "sub$F$a3")
> > >  OPTAB_NX(sub_optab, "sub$Q$a3")
> > > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > > index e8c804f09b7..62da1c5ee08 100644
> > > --- a/gcc/tree-ssa-math-opts.cc
> > > +++ b/gcc/tree-ssa-math-opts.cc
> > > @@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple
> > > *cast_stmt, gimple *&use_stmt,
> > >    return 0;
> > >  }
> > >
> > > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > > +
> > > +/*
> > > + * Try to match saturation arith pattern(s).
> > > + *   1. SAT_ADD (unsigned)
> > > + *      _7 = _4 + _6;
> > > + *      _8 = _4 > _7;
> > > + *      _9 = (long unsigned int) _8;
> > > + *      _10 = -_9;
> > > + *      _12 = _7 | _10;
> > > + *      =>
> > > + *      _12 = .SAT_ADD (_4, _6);  */
> > > +static void
> > > +match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt)
> > > +{
> > > +  gcall *call = NULL;
> > > +
> > > +  tree ops[2];
> > > +  tree lhs = gimple_assign_lhs (stmt);
> > > +
> > > +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> > > +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> > > +                                      OPTIMIZE_FOR_BOTH))
> > > +    {
> > > +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> > > +      gimple_call_set_lhs (call, lhs);
> > > +      gsi_replace (gsi, call, true);
> > > +    }
> > > +}
> > > +
> > >  /* Recognize for unsigned x
> > >     x = y - z;
> > >     if (x > y)
> > > @@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children
> > > (basic_block bb)
> > >             break;
> > >
> > >           case BIT_IOR_EXPR:
> > > +           match_saturation_arith (&gsi, as_a<gassign *> (stmt));
> > > +           /* fall-through  */
> > >           case BIT_XOR_EXPR:
> > >             match_uaddc_usubc (&gsi, stmt, code);
> > >             break;
> > > --
> > > 2.34.1
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-16 11:58         ` Richard Biener
  2024-05-16 12:00           ` Li, Pan2
@ 2024-05-16 13:32           ` Jeff Law
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Law @ 2024-05-16 13:32 UTC (permalink / raw)
  To: Richard Biener, Li, Pan2
  Cc: Tamar Christina, gcc-patches, juzhe.zhong, kito.cheng, Liu, Hongtao



On 5/16/24 5:58 AM, Richard Biener wrote:
> On Thu, May 16, 2024 at 11:35 AM Li, Pan2 <pan2.li@intel.com> wrote:
>>
>>> OK.
>>
>> Thanks Richard for help and coaching. To double confirm, are you OK with this patch only or for the series patch(es) of SAT middle-end?
>> Thanks again for reviewing and suggestions.
> 
> For the series, the riscv specific part of course needs riscv approval.
Yea, we'll take a look at it.  Tons of stuff to go through, but this is 
definitely on the list.

jeff


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-15  9:12 ` [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int Tamar Christina
  2024-05-15 11:36   ` Li, Pan2
@ 2024-05-17  3:18   ` Li, Pan2
  2024-05-17 14:46     ` Tamar Christina
  1 sibling, 1 reply; 13+ messages in thread
From: Li, Pan2 @ 2024-05-17  3:18 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

Hi Tamar,

I am trying to add more shape(s) like below branch version for SAT_ADD. I suspect that widening_mul may not be the best place to take care of this shape.
Because after_dom_children almost works on bb but we actually need to find the def/use cross the bb.

Thus, is there any suggestion for branch shape? Add new simplify to match.pd works well but it is not recommended per previous discussion.

Thanks a lot for help!

Pan

-------Source code---------

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint16_t)

-------Gimple---------

uint16_t sat_add_u_1_uint16_t (uint16_t x, uint16_t y)
{
  short unsigned int _1;
  uint16_t _2;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) + y_4(D);
  if (_1 >= x_3(D))
    goto <bb 3>; [65.00%]
  else
    goto <bb 4>; [35.00%]

  <bb 3> [local count: 697932184]:

  <bb 4> [local count: 1073741824]:
  # _2 = PHI <65535(2), _1(3)>
  return _2;
}

Pan

-----Original Message-----
From: Tamar Christina <Tamar.Christina@arm.com> 
Sent: Wednesday, May 15, 2024 5:12 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

Hi Pan,

Thanks!

> -----Original Message-----
> From: pan2.li@intel.com <pan2.li@intel.com>
> Sent: Wednesday, May 15, 2024 3:14 AM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar
> int
> 
> From: Pan Li <pan2.li@intel.com>
> 
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
> 
> Given below example for the unsigned scalar integer uint64_t:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;    succ:       EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;    succ:       EXIT
> }
> 
> The below tests are passed for this patch:
> 1. The riscv fully regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
> 
> 	PR target/51492
> 	PR target/112600
> 
> gcc/ChangeLog:
> 
> 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> 	to the return true switch case(s).
> 	* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> 	* match.pd: Add unsigned SAT_ADD match(es).
> 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for
> 	us/ssadd.
> 	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> 	extern func decl generated in match.pd match.
> 	(match_saturation_arith): New func impl to match the saturation arith.
> 	(math_opts_dom_walker::after_dom_children): Try match saturation
> 	arith when IOR expr.
> 

 LGTM but you'll need an OK from Richard,

Thanks for working on this!

Tamar

> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/internal-fn.cc        |  1 +
>  gcc/internal-fn.def       |  2 ++
>  gcc/match.pd              | 51 +++++++++++++++++++++++++++++++++++++++
>  gcc/optabs.def            |  4 +--
>  gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++
>  5 files changed, 88 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_SAT_ADD:
>      case IFN_VEC_WIDEN_PLUS:
>      case IFN_VEC_WIDEN_PLUS_LO:
>      case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..25badbb86e5 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> | ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> first,
>  			      smulhrs, umulhrs, binary)
> 
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> binary)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 07e743ae464..0f9c34fa897 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || POINTER_TYPE_P (itype))
>        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> 
> +/* Unsigned Saturation Add */
> +(match (usadd_left_part_1 @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_left_part_2 @0 @1)
> + (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_1 @0 @1)
> + (negate (convert (lt (plus:c @0 @1) @0)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_1 @0 @1)
> + (negate (convert (gt @0 (plus:c @0 @1))))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_2 @0 @1)
> + (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> integer_zerop)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> +   because the sub part of left_part_2 cannot work with right_part_1.
> +   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> +   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
> +
> +/* Unsigned saturation add, case 1 (branchless):
> +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
> +
> +/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..3f2cb46aff8 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
>  OPTAB_NX(add_optab, "add$Q$a3")
>  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
>  OPTAB_VX(addv_optab, "add$F$a3")
> -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> gen_signed_fixed_libfunc)
> -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> gen_unsigned_fixed_libfunc)
> +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> gen_signed_fixed_libfunc)
> +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> gen_unsigned_fixed_libfunc)
>  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(sub_optab, "sub$F$a3")
>  OPTAB_NX(sub_optab, "sub$Q$a3")
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index e8c804f09b7..62da1c5ee08 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple
> *cast_stmt, gimple *&use_stmt,
>    return 0;
>  }
> 
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to match saturation arith pattern(s).
> + *   1. SAT_ADD (unsigned)
> + *      _7 = _4 + _6;
> + *      _8 = _4 > _7;
> + *      _9 = (long unsigned int) _8;
> + *      _10 = -_9;
> + *      _12 = _7 | _10;
> + *      =>
> + *      _12 = .SAT_ADD (_4, _6);  */
> +static void
> +match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt)
> +{
> +  gcall *call = NULL;
> +
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> +					 OPTIMIZE_FOR_BOTH))
> +    {
> +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> +      gimple_call_set_lhs (call, lhs);
> +      gsi_replace (gsi, call, true);
> +    }
> +}
> +
>  /* Recognize for unsigned x
>     x = y - z;
>     if (x > y)
> @@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children
> (basic_block bb)
>  	      break;
> 
>  	    case BIT_IOR_EXPR:
> +	      match_saturation_arith (&gsi, as_a<gassign *> (stmt));
> +	      /* fall-through  */
>  	    case BIT_XOR_EXPR:
>  	      match_uaddc_usubc (&gsi, stmt, code);
>  	      break;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-17  3:18   ` Li, Pan2
@ 2024-05-17 14:46     ` Tamar Christina
  2024-05-18  5:24       ` Li, Pan2
  0 siblings, 1 reply; 13+ messages in thread
From: Tamar Christina @ 2024-05-17 14:46 UTC (permalink / raw)
  To: Li, Pan2, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

Hi Pan,

> 
> Hi Tamar,
> 
> I am trying to add more shape(s) like below branch version for SAT_ADD. I suspect
> that widening_mul may not be the best place to take care of this shape.
> Because after_dom_children almost works on bb but we actually need to find the
> def/use cross the bb.

It actually already does this, see for example optimize_spaceship which optimizes
across basic blocks. However...

> 
> Thus, is there any suggestion for branch shape? Add new simplify to match.pd
> works well but it is not recommended per previous discussion.

The objection previously was not to introduce the IFNs at match.pd, it doesn't
mean we can't use match.pd to force the versions with branches to banchless
code so the existing patterns can deal with them as is.

...in this case something like this:

#if GIMPLE
(simplify
 (cond (ge (plus:c@3 @0 @1) @0) @3 integer_minus_onep)
  (if (direct_internal_fn_supported_p (...))
   (bit_ior @3 (negate (...)))))
#endif

Works better I think.

That is, for targets we know we can optimize it later on, or do something with it
in the vectorizer we canonicalize it.  The reason I have it guarded with the IFN is
that some target maintainers objected to replacing the branch code with branchless
code as their targets can more optimally deal with branches.

Cheers,
Tamar
> 
> Thanks a lot for help!
> 
> Pan
> 
> -------Source code---------
> 
> #define SAT_ADD_U_1(T) \
> T sat_add_u_1_##T(T x, T y) \
> { \
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
> 
> SAT_ADD_U_1(uint16_t)
> 
> -------Gimple---------
> 
> uint16_t sat_add_u_1_uint16_t (uint16_t x, uint16_t y)
> {
>   short unsigned int _1;
>   uint16_t _2;
> 
>   <bb 2> [local count: 1073741824]:
>   _1 = x_3(D) + y_4(D);
>   if (_1 >= x_3(D))
>     goto <bb 3>; [65.00%]
>   else
>     goto <bb 4>; [35.00%]
> 
>   <bb 3> [local count: 697932184]:
> 
>   <bb 4> [local count: 1073741824]:
>   # _2 = PHI <65535(2), _1(3)>
>   return _2;
> }
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Wednesday, May 15, 2024 5:12 PM
> To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar int
> 
> Hi Pan,
> 
> Thanks!
> 
> > -----Original Message-----
> > From: pan2.li@intel.com <pan2.li@intel.com>
> > Sent: Wednesday, May 15, 2024 3:14 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> > <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> > hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar
> > int
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch would like to add the middle-end presentation for the
> > saturation add.  Aka set the result of add to the max when overflow.
> > It will take the pattern similar as below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given below example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> > }
> >
> > The below tests are passed for this patch:
> > 1. The riscv fully regression tests.
> > 3. The x86 bootstrap tests.
> > 4. The x86 fully regression tests.
> >
> > 	PR target/51492
> > 	PR target/112600
> >
> > gcc/ChangeLog:
> >
> > 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > 	to the return true switch case(s).
> > 	* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > 	* match.pd: Add unsigned SAT_ADD match(es).
> > 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for
> > 	us/ssadd.
> > 	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> > 	extern func decl generated in match.pd match.
> > 	(match_saturation_arith): New func impl to match the saturation arith.
> > 	(math_opts_dom_walker::after_dom_children): Try match saturation
> > 	arith when IOR expr.
> >
> 
>  LGTM but you'll need an OK from Richard,
> 
> Thanks for working on this!
> 
> Tamar
> 
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/internal-fn.cc        |  1 +
> >  gcc/internal-fn.def       |  2 ++
> >  gcc/match.pd              | 51 +++++++++++++++++++++++++++++++++++++++
> >  gcc/optabs.def            |  4 +--
> >  gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++
> >  5 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 0a7053c2286..73045ca8c8c 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_UBSAN_CHECK_MUL:
> >      case IFN_ADD_OVERFLOW:
> >      case IFN_MUL_OVERFLOW:
> > +    case IFN_SAT_ADD:
> >      case IFN_VEC_WIDEN_PLUS:
> >      case IFN_VEC_WIDEN_PLUS_LO:
> >      case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..25badbb86e5 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS,
> ECF_CONST
> > | ECF_NOTHROW, first,
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> > first,
> >  			      smulhrs, umulhrs, binary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> > binary)
> > +
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 07e743ae464..0f9c34fa897 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         || POINTER_TYPE_P (itype))
> >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> >
> > +/* Unsigned Saturation Add */
> > +(match (usadd_left_part_1 @0 @1)
> > + (plus:c @0 @1)
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_left_part_2 @0 @1)
> > + (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (lt (plus:c @0 @1) @0)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (gt @0 (plus:c @0 @1))))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_2 @0 @1)
> > + (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> > integer_zerop)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> > +   because the sub part of left_part_2 cannot work with right_part_1.
> > +   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> > +   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
> > +
> > +/* Unsigned saturation add, case 1 (branchless):
> > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
> > +
> > +/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..3f2cb46aff8 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> >  OPTAB_NX(add_optab, "add$Q$a3")
> >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> >  OPTAB_VX(addv_optab, "add$F$a3")
> > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> > gen_signed_fixed_libfunc)
> > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> > gen_unsigned_fixed_libfunc)
> > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> > gen_signed_fixed_libfunc)
> > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> > gen_unsigned_fixed_libfunc)
> >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3',
> gen_int_fp_fixed_libfunc)
> >  OPTAB_NX(sub_optab, "sub$F$a3")
> >  OPTAB_NX(sub_optab, "sub$Q$a3")
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index e8c804f09b7..62da1c5ee08 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple
> > *cast_stmt, gimple *&use_stmt,
> >    return 0;
> >  }
> >
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +/*
> > + * Try to match saturation arith pattern(s).
> > + *   1. SAT_ADD (unsigned)
> > + *      _7 = _4 + _6;
> > + *      _8 = _4 > _7;
> > + *      _9 = (long unsigned int) _8;
> > + *      _10 = -_9;
> > + *      _12 = _7 | _10;
> > + *      =>
> > + *      _12 = .SAT_ADD (_4, _6);  */
> > +static void
> > +match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt)
> > +{
> > +  gcall *call = NULL;
> > +
> > +  tree ops[2];
> > +  tree lhs = gimple_assign_lhs (stmt);
> > +
> > +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> > +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> > +					 OPTIMIZE_FOR_BOTH))
> > +    {
> > +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> > +      gimple_call_set_lhs (call, lhs);
> > +      gsi_replace (gsi, call, true);
> > +    }
> > +}
> > +
> >  /* Recognize for unsigned x
> >     x = y - z;
> >     if (x > y)
> > @@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children
> > (basic_block bb)
> >  	      break;
> >
> >  	    case BIT_IOR_EXPR:
> > +	      match_saturation_arith (&gsi, as_a<gassign *> (stmt));
> > +	      /* fall-through  */
> >  	    case BIT_XOR_EXPR:
> >  	      match_uaddc_usubc (&gsi, stmt, code);
> >  	      break;
> > --
> > 2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-17 14:46     ` Tamar Christina
@ 2024-05-18  5:24       ` Li, Pan2
  0 siblings, 0 replies; 13+ messages in thread
From: Li, Pan2 @ 2024-05-18  5:24 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

Thanks Tamer for enlightening, will have a try for the ingenious idea!

Pan

-----Original Message-----
From: Tamar Christina <Tamar.Christina@arm.com> 
Sent: Friday, May 17, 2024 10:46 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

Hi Pan,

> 
> Hi Tamar,
> 
> I am trying to add more shape(s) like below branch version for SAT_ADD. I suspect
> that widening_mul may not be the best place to take care of this shape.
> Because after_dom_children almost works on bb but we actually need to find the
> def/use cross the bb.

It actually already does this, see for example optimize_spaceship which optimizes
across basic blocks. However...

> 
> Thus, is there any suggestion for branch shape? Add new simplify to match.pd
> works well but it is not recommended per previous discussion.

The objection previously was not to introduce the IFNs at match.pd, it doesn't
mean we can't use match.pd to force the versions with branches to banchless
code so the existing patterns can deal with them as is.

...in this case something like this:

#if GIMPLE
(simplify
 (cond (ge (plus:c@3 @0 @1) @0) @3 integer_minus_onep)
  (if (direct_internal_fn_supported_p (...))
   (bit_ior @3 (negate (...)))))
#endif

Works better I think.

That is, for targets we know we can optimize it later on, or do something with it
in the vectorizer we canonicalize it.  The reason I have it guarded with the IFN is
that some target maintainers objected to replacing the branch code with branchless
code as their targets can more optimally deal with branches.

Cheers,
Tamar
> 
> Thanks a lot for help!
> 
> Pan
> 
> -------Source code---------
> 
> #define SAT_ADD_U_1(T) \
> T sat_add_u_1_##T(T x, T y) \
> { \
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
> 
> SAT_ADD_U_1(uint16_t)
> 
> -------Gimple---------
> 
> uint16_t sat_add_u_1_uint16_t (uint16_t x, uint16_t y)
> {
>   short unsigned int _1;
>   uint16_t _2;
> 
>   <bb 2> [local count: 1073741824]:
>   _1 = x_3(D) + y_4(D);
>   if (_1 >= x_3(D))
>     goto <bb 3>; [65.00%]
>   else
>     goto <bb 4>; [35.00%]
> 
>   <bb 3> [local count: 697932184]:
> 
>   <bb 4> [local count: 1073741824]:
>   # _2 = PHI <65535(2), _1(3)>
>   return _2;
> }
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Wednesday, May 15, 2024 5:12 PM
> To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar int
> 
> Hi Pan,
> 
> Thanks!
> 
> > -----Original Message-----
> > From: pan2.li@intel.com <pan2.li@intel.com>
> > Sent: Wednesday, May 15, 2024 3:14 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> > <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> > hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar
> > int
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch would like to add the middle-end presentation for the
> > saturation add.  Aka set the result of add to the max when overflow.
> > It will take the pattern similar as below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given below example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> > }
> >
> > The below tests are passed for this patch:
> > 1. The riscv fully regression tests.
> > 3. The x86 bootstrap tests.
> > 4. The x86 fully regression tests.
> >
> > 	PR target/51492
> > 	PR target/112600
> >
> > gcc/ChangeLog:
> >
> > 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > 	to the return true switch case(s).
> > 	* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > 	* match.pd: Add unsigned SAT_ADD match(es).
> > 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for
> > 	us/ssadd.
> > 	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> > 	extern func decl generated in match.pd match.
> > 	(match_saturation_arith): New func impl to match the saturation arith.
> > 	(math_opts_dom_walker::after_dom_children): Try match saturation
> > 	arith when IOR expr.
> >
> 
>  LGTM but you'll need an OK from Richard,
> 
> Thanks for working on this!
> 
> Tamar
> 
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/internal-fn.cc        |  1 +
> >  gcc/internal-fn.def       |  2 ++
> >  gcc/match.pd              | 51 +++++++++++++++++++++++++++++++++++++++
> >  gcc/optabs.def            |  4 +--
> >  gcc/tree-ssa-math-opts.cc | 32 ++++++++++++++++++++++++
> >  5 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 0a7053c2286..73045ca8c8c 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_UBSAN_CHECK_MUL:
> >      case IFN_ADD_OVERFLOW:
> >      case IFN_MUL_OVERFLOW:
> > +    case IFN_SAT_ADD:
> >      case IFN_VEC_WIDEN_PLUS:
> >      case IFN_VEC_WIDEN_PLUS_LO:
> >      case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..25badbb86e5 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS,
> ECF_CONST
> > | ECF_NOTHROW, first,
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> > first,
> >  			      smulhrs, umulhrs, binary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> > binary)
> > +
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 07e743ae464..0f9c34fa897 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3043,6 +3043,57 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         || POINTER_TYPE_P (itype))
> >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> >
> > +/* Unsigned Saturation Add */
> > +(match (usadd_left_part_1 @0 @1)
> > + (plus:c @0 @1)
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_left_part_2 @0 @1)
> > + (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (lt (plus:c @0 @1) @0)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (gt @0 (plus:c @0 @1))))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_2 @0 @1)
> > + (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> > integer_zerop)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +/* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> > +   because the sub part of left_part_2 cannot work with right_part_1.
> > +   For example, left_part_2 pattern focus one .ADD_OVERFLOW but the
> > +   right_part_1 has nothing to do with .ADD_OVERFLOW.  */
> > +
> > +/* Unsigned saturation add, case 1 (branchless):
> > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c (usadd_left_part_1 @0 @1) (usadd_right_part_1 @0 @1)))
> > +
> > +/* Unsigned saturation add, case 2 (branchless with .ADD_OVERFLOW).  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..3f2cb46aff8 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> >  OPTAB_NX(add_optab, "add$Q$a3")
> >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> >  OPTAB_VX(addv_optab, "add$F$a3")
> > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> > gen_signed_fixed_libfunc)
> > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> > gen_unsigned_fixed_libfunc)
> > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> > gen_signed_fixed_libfunc)
> > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> > gen_unsigned_fixed_libfunc)
> >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3',
> gen_int_fp_fixed_libfunc)
> >  OPTAB_NX(sub_optab, "sub$F$a3")
> >  OPTAB_NX(sub_optab, "sub$Q$a3")
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index e8c804f09b7..62da1c5ee08 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -4086,6 +4086,36 @@ arith_overflow_check_p (gimple *stmt, gimple
> > *cast_stmt, gimple *&use_stmt,
> >    return 0;
> >  }
> >
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +/*
> > + * Try to match saturation arith pattern(s).
> > + *   1. SAT_ADD (unsigned)
> > + *      _7 = _4 + _6;
> > + *      _8 = _4 > _7;
> > + *      _9 = (long unsigned int) _8;
> > + *      _10 = -_9;
> > + *      _12 = _7 | _10;
> > + *      =>
> > + *      _12 = .SAT_ADD (_4, _6);  */
> > +static void
> > +match_saturation_arith (gimple_stmt_iterator *gsi, gassign *stmt)
> > +{
> > +  gcall *call = NULL;
> > +
> > +  tree ops[2];
> > +  tree lhs = gimple_assign_lhs (stmt);
> > +
> > +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> > +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> > +					 OPTIMIZE_FOR_BOTH))
> > +    {
> > +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> > +      gimple_call_set_lhs (call, lhs);
> > +      gsi_replace (gsi, call, true);
> > +    }
> > +}
> > +
> >  /* Recognize for unsigned x
> >     x = y - z;
> >     if (x > y)
> > @@ -6048,6 +6078,8 @@ math_opts_dom_walker::after_dom_children
> > (basic_block bb)
> >  	      break;
> >
> >  	    case BIT_IOR_EXPR:
> > +	      match_saturation_arith (&gsi, as_a<gassign *> (stmt));
> > +	      /* fall-through  */
> >  	    case BIT_XOR_EXPR:
> >  	      match_uaddc_usubc (&gsi, stmt, code);
> >  	      break;
> > --
> > 2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-05-18  5:24 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-15  2:14 [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int pan2.li
2024-05-15  2:14 ` [PATCH v5 2/3] Vect: Support new IFN SAT_ADD for unsigned vector int pan2.li
2024-05-15  2:14 ` [PATCH v5 3/3] RISC-V: Implement IFN SAT_ADD for both the scalar and vector pan2.li
2024-05-15  9:12 ` [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int Tamar Christina
2024-05-15 11:36   ` Li, Pan2
2024-05-16  8:10     ` Richard Biener
2024-05-16  9:34       ` Li, Pan2
2024-05-16 11:58         ` Richard Biener
2024-05-16 12:00           ` Li, Pan2
2024-05-16 13:32           ` Jeff Law
2024-05-17  3:18   ` Li, Pan2
2024-05-17 14:46     ` Tamar Christina
2024-05-18  5:24       ` Li, Pan2

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).