[PATCH v1] Internal-fn: Introduce new internal function SAT

* [PATCH v1] Internal-fn: Introduce new internal function SAT_ADD
@ 2024-04-06 12:07 pan2.li
  2024-04-07  7:03 ` [PATCH v2] " pan2.li
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: pan2.li @ 2024-04-06 12:07 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, yanzhang.wang, tamar.christina,
	richard.guenther, hongtao.liu, Pan Li

From: Pan Li <pan2.li@intel.com>

This patch adds the middle-end representation for saturating add,
i.e. the result of the add is set to the type's maximum value on
overflow.  It matches patterns similar to the one below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Take uint8_t as example, we will have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.
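
As a reading aid, the pattern above can be written out for uint8_t as a
minimal C sketch.  The names sat_add_u8 and sat_add_u8_self_check are
hypothetical and are not part of the patch; the self-check only replays
the four examples listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): branchless unsigned
   saturating add for uint8_t, following the pattern
   (x + y) | (-(TYPE)((TYPE)(x + y) < x)).  */
uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);      /* Wraps modulo 256.  */
  uint8_t mask = (uint8_t) -(sum < x);  /* 0xff if the add wrapped, else 0.  */

  return sum | mask;
}

/* Replays the four uint8_t examples from the commit message.  */
void
sat_add_u8_self_check (void)
{
  assert (sat_add_u8 (1, 254) == 255);
  assert (sat_add_u8 (1, 255) == 255);
  assert (sat_add_u8 (2, 255) == 255);
  assert (sat_add_u8 (255, 255) == 255);
}
```

The comparison sum < x is true exactly when the addition wrapped, so the
negated result is an all-ones mask in the overflow case and zero
otherwise.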

The patch also implements SAT_ADD in the riscv backend as a sample,
for both scalar and vector.  Given the example below:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;    succ:       EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;    succ:       EXIT
}
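
The "before" dump above can be modelled at the source level as follows.
This is only a sketch to show why the two forms agree: it assumes the
GCC/Clang builtin __builtin_add_overflow as a stand-in for
.ADD_OVERFLOW, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The source-level pattern from the commit message.  */
uint64_t
sat_add_u64_pattern (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) ((uint64_t) (x + y) < x));
}

/* Hand-written model of the "before" GIMPLE: an overflow-checked add
   followed by a negate and a bitwise or.  */
uint64_t
sat_add_u64_overflow_model (uint64_t x, uint64_t y)
{
  uint64_t sum;
  bool overflow = __builtin_add_overflow (x, y, &sum); /* _1, _10 != 0  */
  uint64_t mask = -(uint64_t) overflow;                /* _4 = -_3  */

  return sum | mask;                                   /* _7 = _1 | _4  */
}

/* Both forms are expected to agree on a few boundary inputs.  */
void
sat_add_u64_self_check (void)
{
  const uint64_t max = UINT64_MAX;

  assert (sat_add_u64_pattern (0, 0) == sat_add_u64_overflow_model (0, 0));
  assert (sat_add_u64_pattern (max, 1) == sat_add_u64_overflow_model (max, 1));
  assert (sat_add_u64_pattern (1, max - 1)
	  == sat_add_u64_overflow_model (1, max - 1));
}
```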

For vectorization, we leverage the existing vect pattern recognition to
find the same pattern as in the scalar case, and let the vectorizer
handle the rest via the standard name usadd<mode>3 in vector mode.
The riscv vector backend has the insn "Vector Single-Width Saturating
Add and Subtract", which is used when expanding usadd<mode>3
in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv       v0,v0,v1
  vmerge.vim      v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vsaddu.vv       v1,v1,v2
  vse64.v v1,0(a0)
  ...
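
For reference, the comments in the match.pd hunk of this patch also list
a branch form, (X + Y) >= X ? (X + Y) : -1.  The sketch below is a
scalar, per-element reference for the loop above, written with that
branch form and compared against the branchless loop.  The names
sat_add_u64_branch, vec_sat_add_u64_ref and check_vec_sat_add_u64 are
hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Branch form: (x + y) >= x ? (x + y) : -1.  */
uint64_t
sat_add_u64_branch (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;

  return sum >= x ? sum : (uint64_t) -1;
}

/* Scalar reference: the value each element of out should hold.  */
void
vec_sat_add_u64_ref (uint64_t *out, const uint64_t *x, const uint64_t *y,
		     unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = sat_add_u64_branch (x[i], y[i]);
}

/* The branchless loop from the commit message.  */
void
vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

/* Runs both loops on a few boundary values and compares the results.  */
void
check_vec_sat_add_u64 (void)
{
  uint64_t x[5] = { 0, 1, UINT64_MAX, UINT64_MAX - 1, 42 };
  uint64_t y[5] = { 0, UINT64_MAX, UINT64_MAX, 1, 58 };
  uint64_t got[5], ref[5];

  vec_sat_add_u64 (got, x, y, 5);
  vec_sat_add_u64_ref (ref, x, y, 5);

  for (unsigned i = 0; i < 5; i++)
    assert (got[i] == ref[i]);
}
```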

To limit the patch size for review, only the unsigned version of
usadd<mode>3 is covered here.  The signed version will be covered
in follow-up patch(es).

The following test suites passed for this patch.
* The riscv full regression tests.
* The aarch64 full regression tests.
* The x86 full regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* config/riscv/autovec.md (usadd<mode>3): New pattern expand
	for unsigned SAT_ADD vector.
	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
	decl to expand usadd<mode>3 pattern.
	(expand_vec_usadd): Ditto but for vector.
	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
	emit the vsaddu insn.
	(expand_vec_usadd): New func impl to expand usadd<mode>3 for
	vector.
	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
	to expand usadd<mode>3 for scalar.
	* config/riscv/riscv.md (usadd<mode>3): New pattern expand
	for unsigned SAT_ADD scalar.
	* config/riscv/vector.md: Allow VLS mode for vsaddu.
	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
	* match.pd: Add unsigned SAT_ADD match and simplify.
	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
	* tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
	to build the IFN_SAT_ADD gimple call.
	(vect_recog_sat_add_pattern): New func impl to recog the pattern
	for unsigned SAT_ADD.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c: New test.
	* gcc.target/riscv/sat_arith.h: New test.
	* gcc.target/riscv/sat_u_add-1.c: New test.
	* gcc.target/riscv/sat_u_add-2.c: New test.
	* gcc.target/riscv/sat_u_add-3.c: New test.
	* gcc.target/riscv/sat_u_add-4.c: New test.
	* gcc.target/riscv/sat_u_add-run-1.c: New test.
	* gcc.target/riscv/sat_u_add-run-10.c: New test.
	* gcc.target/riscv/sat_u_add-run-11.c: New test.
	* gcc.target/riscv/sat_u_add-run-12.c: New test.
	* gcc.target/riscv/sat_u_add-run-2.c: New test.
	* gcc.target/riscv/sat_u_add-run-3.c: New test.
	* gcc.target/riscv/sat_u_add-run-4.c: New test.
	* gcc.target/riscv/sat_u_add-run-5.c: New test.
	* gcc.target/riscv/sat_u_add-run-6.c: New test.
	* gcc.target/riscv/sat_u_add-run-7.c: New test.
	* gcc.target/riscv/sat_u_add-run-8.c: New test.
	* gcc.target/riscv/sat_u_add-run-9.c: New test.
	* gcc.target/riscv/scalar_sat_binary.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/autovec.md                   | 17 ++++
 gcc/config/riscv/riscv-protos.h               |  2 +
 gcc/config/riscv/riscv-v.cc                   | 16 ++++
 gcc/config/riscv/riscv.cc                     | 47 +++++++++++
 gcc/config/riscv/riscv.md                     | 11 +++
 gcc/config/riscv/vector.md                    | 12 +--
 gcc/internal-fn.cc                            |  1 +
 gcc/internal-fn.def                           |  3 +
 gcc/match.pd                                  | 64 +++++++++++++++
 gcc/optabs.def                                |  4 +-
 .../riscv/rvv/autovec/binop/vec_sat_binary.h  | 33 ++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-1.c | 41 ++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-2.c | 44 +++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-3.c | 44 +++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-4.c | 44 +++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-1.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-10.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-11.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-12.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-2.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-3.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-4.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-5.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-6.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-7.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-8.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-9.c   | 75 ++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_arith.h    | 79 +++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c  | 44 +++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c  | 50 ++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c  | 41 ++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c  | 38 +++++++++
 .../gcc.target/riscv/sat_u_add-run-1.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-10.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-11.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-12.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-2.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-3.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-4.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-5.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-6.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-7.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-8.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-9.c        | 25 ++++++
 .../gcc.target/riscv/scalar_sat_binary.h      | 27 +++++++
 gcc/tree-vect-patterns.cc                     | 61 ++++++++++++++
 46 files changed, 1915 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b32369f68c..06a4c34863e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,20 @@ (define_expand "rawmemchr<ANYI:mode>"
     DONE;
   }
 )
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Saturation ALU.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - add
+;; -------------------------------------------------------------------------
+(define_expand "usadd<mode>3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_usadd (operands[0], operands[1], operands[2], <MODE>mode);
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b8735593805..fefd9a1c2c4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
 extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
+extern void riscv_expand_usadd (rtx, rtx, rtx);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
@@ -619,6 +620,7 @@ void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lround (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_usadd (rtx, rtx, rtx, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
 			  bool, void (*)(rtx *, rtx), enum avl_type);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 814c5febabe..eadbc63431b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4635,6 +4635,16 @@ emit_vec_cvt_x_f_rtz (rtx op_dest, rtx op_src, rtx mask,
     }
 }
 
+static void
+emit_vec_saddu (rtx op_dest, rtx op_1, rtx op_2, insn_type type,
+		machine_mode vec_mode)
+{
+  rtx ops[] = {op_dest, op_1, op_2};
+  insn_code icode = code_for_pred (US_PLUS, vec_mode);
+
+  emit_vlmax_insn (icode, type, ops);
+}
+
 void
 expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 		 machine_mode vec_int_mode)
@@ -4862,6 +4872,12 @@ expand_vec_lfloor (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 				vec_int_mode);
 }
 
+void
+expand_vec_usadd (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  emit_vec_saddu (op_0, op_1, op_2, BINARY_OP, vec_mode);
+}
+
 /* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that libgcc uses as
    well.  */
 void
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index fe9976bfffe..519a7684cc4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10840,6 +10840,53 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
   return true;
 }
 
+/* Implements the unsigned saturation add standard name usadd for int mode.  */
+
+void
+riscv_expand_usadd (rtx dest, rtx x, rtx y)
+{
+  machine_mode mode = GET_MODE (dest);
+  rtx xmode_sum = gen_reg_rtx (Xmode);
+  rtx xmode_lt = gen_reg_rtx (Xmode);
+  rtx xmode_x = gen_lowpart (Xmode, x);
+  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_dest = gen_reg_rtx (Xmode);
+
+  /* Step-1: sum = x + y  */
+  if (mode == SImode && mode != Xmode)
+    { /* Take addw to avoid the sum truncate.  */
+      rtx simode_sum = gen_reg_rtx (SImode);
+      riscv_emit_binary (PLUS, simode_sum, x, y);
+      emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));
+    }
+  else
+    riscv_emit_binary (PLUS, xmode_sum, xmode_x, xmode_y);
+
+  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
+  if (mode == HImode || mode == QImode)
+    {
+      int shift_bits = GET_MODE_BITSIZE (Xmode)
+	- GET_MODE_BITSIZE (mode).to_constant ();
+
+      gcc_assert (shift_bits > 0);
+
+      riscv_emit_binary (ASHIFT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+      riscv_emit_binary (LSHIFTRT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+    }
+
+  /* Step-2: lt = sum < x  */
+  riscv_emit_binary (LTU, xmode_lt, xmode_sum, xmode_x);
+
+  /* Step-3: lt = -lt  */
+  riscv_emit_unary (NEG, xmode_lt, xmode_lt);
+
+  /* Step-4: xmode_dest = sum | lt  */
+  riscv_emit_binary (IOR, xmode_dest, xmode_lt, xmode_sum);
+
+  /* Step-5: dest = xmode_dest */
+  emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 0346cc3859d..28d26579c3a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3839,6 +3839,17 @@ (define_insn "*large_load_address"
   [(set_attr "type" "load")
    (set (attr "length") (const_int 8))])
 
+(define_expand "usadd<mode>3"
+  [(match_operand:ANYI 0 "register_operand")
+   (match_operand:ANYI 1 "register_operand")
+   (match_operand:ANYI 2 "register_operand")]
+  ""
+  {
+    riscv_expand_usadd (operands[0], operands[1], operands[2]);
+    DONE;
+  }
+)
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 8b1c24c5d79..58abc2a2f9e 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -4073,8 +4073,8 @@ (define_insn "@pred_trunc<mode>"
 
 ;; Saturating Add and Subtract
 (define_insn "@pred_<optab><mode>"
-  [(set (match_operand:VI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
-	(if_then_else:VI
+  [(set (match_operand:V_VLSI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
+	(if_then_else:V_VLSI
 	  (unspec:<VM>
 	    [(match_operand:<VM> 1 "vector_mask_operand" " vm, vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
 	     (match_operand 5 "vector_length_operand"    " rK, rK, rK, rK, rK, rK, rK, rK")
@@ -4083,10 +4083,10 @@ (define_insn "@pred_<optab><mode>"
 	     (match_operand 8 "const_int_operand"        "  i,  i,  i,  i,  i,  i,  i,  i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-	  (any_sat_int_binop:VI
-	    (match_operand:VI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
-	    (match_operand:VI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
-	  (match_operand:VI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
+	  (any_sat_int_binop:V_VLSI
+	    (match_operand:V_VLSI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
+	    (match_operand:V_VLSI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
+	  (match_operand:V_VLSI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
   "TARGET_VECTOR"
   "@
    v<insn>.vv\t%0,%3,%4%p1
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5269f0ac528..e517ea7fbfb 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4181,6 +4181,7 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_SAT_ADD:
     case IFN_VEC_WIDEN_PLUS:
     case IFN_VEC_WIDEN_PLUS_LO:
     case IFN_VEC_WIDEN_PLUS_HI:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 848bb9dbff3..47326b7033c 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
 			      smulhrs, umulhrs, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
+			      ssadd, usadd, binary)
+
 DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
 DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
 DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 15a1e7350d4..6f8cdf074ed 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        || POINTER_TYPE_P (itype))
       && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
 
+/* Unsigned Saturation Add */
+(match (usadd_left_part_1 @0 @1)
+ (plus:c @0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_1 @0 @1)
+ (negate (convert (lt (plus:c @0 @1) @0)))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_2 @0 @1)
+ (negate (convert (gt @0 (plus:c @0 @1))))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+/* Unsigned saturation add. Case 1 (branchless):
+   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
+   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
+(simplify
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_1 @0 @1))
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+(simplify
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_2 @0 @1))
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+
+/* Unsigned saturation add. Case 2 (branch):
+   SAT_U_ADD = (X + Y) >= X ? (X + Y) : -1 or
+   SAT_U_ADD = X <= (X + Y) ? (X + Y) : -1.  */
+(simplify
+ (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+(simplify
+ (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+
+/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_1 @0 @1))
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_2 @0 @1))
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
+ (if (optimize)))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..3f2cb46aff8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
 OPTAB_NX(add_optab, "add$Q$a3")
 OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
 OPTAB_VX(addv_optab, "add$F$a3")
-OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
-OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
+OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
+OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
 OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(sub_optab, "sub$F$a3")
 OPTAB_NX(sub_optab, "sub$Q$a3")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
new file mode 100644
index 00000000000..0976ae97830
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
@@ -0,0 +1,33 @@
+#ifndef HAVE_DEFINED_VEC_SAT_BINARY
+#define HAVE_DEFINED_VEC_SAT_BINARY
+
+/* To leverage this header file for run tests, you need to:
+   1. define T as the type, for example uint8_t,
+   2. define N as the test array size, for example 16.
+   3. define RUN_VEC_SAT_BINARY as run function.
+   4. prepare the test_data for test cases.
+ */
+
+int
+main ()
+{
+  unsigned i, k;
+  T out[N];
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      T *op_1 = test_data[i][0];
+      T *op_2 = test_data[i][1];
+      T *expect = test_data[i][2];
+
+      RUN_VEC_SAT_BINARY (T, out, op_1, op_2, N);
+
+      for (k = 0; k < N; k++)
+	if (out[k] != expect[k])
+	  __builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
new file mode 100644
index 00000000000..4fb8b233ee9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint8_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint8_t)
+
+/*
+** vec_sat_u_add_uint8_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint8_t)
+
+/*
+** vec_sat_u_add_uint8_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint8_t, 0xffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
new file mode 100644
index 00000000000..10c112b77b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint16_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint16_t)
+
+/*
+** vec_sat_u_add_uint16_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint16_t)
+
+/*
+** vec_sat_u_add_uint16_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint16_t, 0xffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
new file mode 100644
index 00000000000..281036ea0ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint32_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint32_t)
+
+/*
+** vec_sat_u_add_uint32_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint32_t)
+
+/*
+** vec_sat_u_add_uint32_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint32_t, 0xffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
new file mode 100644
index 00000000000..f392533f114
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint64_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint64_t)
+
+/*
+** vec_sat_u_add_uint64_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint64_t)
+
+/*
+** vec_sat_u_add_uint64_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint64_t, 0xffffffffffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
new file mode 100644
index 00000000000..1dcb333f687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
new file mode 100644
index 00000000000..5a0e73303cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
new file mode 100644
index 00000000000..b3efc9243e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
new file mode 100644
index 00000000000..f478c244ff4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffffffffffffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
new file mode 100644
index 00000000000..dbf01ac863d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
new file mode 100644
index 00000000000..20ad2736403
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
new file mode 100644
index 00000000000..2f31edc527e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
new file mode 100644
index 00000000000..4201b31eb3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
new file mode 100644
index 00000000000..35ec9ea3455
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
new file mode 100644
index 00000000000..8b1abdb4ba8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
new file mode 100644
index 00000000000..8c72b567590
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
new file mode 100644
index 00000000000..f454f3997ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
new file mode 100644
index 00000000000..f233c74acfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -0,0 +1,79 @@
+#ifndef HAVE_SAT_ARITH
+#define HAVE_SAT_ARITH
+
+#include <stdint-gcc.h>
+
+#define DEF_SAT_U_ADD_FMT_1(T)             \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_1 (T x, T y)           \
+{                                          \
+  return (x + y) | (-(T)((T)(x + y) < x)); \
+}
+
+#define DEF_SAT_U_ADD_FMT_2(T)             \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_2 (T x, T y)           \
+{                                          \
+  return (T)(x + y) >= x ? (x + y) : -1;   \
+}
+
+#define DEF_SAT_U_ADD_FMT_3(T, MAX)        \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_3 (T x, T y)           \
+{                                          \
+  return (T)(x + y) >= x ? (x + y) : MAX;  \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_1(T)                                   \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (x + y) | (-(T)((T)(x + y) < x));                     \
+    }                                                                \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_2(T)                                   \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (T)(x + y) >= x ? (x + y) : -1;                       \
+    }                                                                \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_3(T, MAX)                              \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (T)(x + y) >= x ? (x + y) : MAX;                      \
+    }                                                                \
+}
+
+#define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
+#define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
+#define RUN_SAT_U_ADD_FMT_3(T, x, y) sat_u_add_##T##_fmt_3(x, y)
+
+#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
+
+#define RUN_VEC_SAT_U_ADD_FMT_2(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_2(out, op_1, op_2, N)
+
+#define RUN_VEC_SAT_U_ADD_FMT_3(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_3(out, op_1, op_2, N)
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
new file mode 100644
index 00000000000..b348d93f938
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint8_t)
+
+/*
+** sat_u_add_uint8_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint8_t)
+
+/*
+** sat_u_add_uint8_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint8_t, 0xffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
new file mode 100644
index 00000000000..df54b984110
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint16_t)
+
+/*
+** sat_u_add_uint16_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint16_t)
+
+/*
+** sat_u_add_uint16_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint16_t, 0xffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
new file mode 100644
index 00000000000..6ff2e6ac52b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint32_t_fmt_1:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint32_t)
+
+/*
+** sat_u_add_uint32_t_fmt_2:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint32_t)
+
+/*
+** sat_u_add_uint32_t_fmt_3:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint32_t, 0xffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
new file mode 100644
index 00000000000..1585f9a231f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint64_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint64_t)
+
+/*
+** sat_u_add_uint64_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint64_t)
+
+/*
+** sat_u_add_uint64_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint64_t, 0xffffffffffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
new file mode 100644
index 00000000000..f1972490006
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
new file mode 100644
index 00000000000..0b675666dc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffu)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
new file mode 100644
index 00000000000..ac9809e47c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffffffu)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
new file mode 100644
index 00000000000..110e9b14d7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffffffffffffffu)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
new file mode 100644
index 00000000000..cb3879d0cde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
new file mode 100644
index 00000000000..c9a6080ca3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
new file mode 100644
index 00000000000..c19b7e22387
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
new file mode 100644
index 00000000000..508531c09d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
new file mode 100644
index 00000000000..99b5c3a39f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
new file mode 100644
index 00000000000..13f59548935
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
new file mode 100644
index 00000000000..cdbea7b1b2c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
new file mode 100644
index 00000000000..04a857aed58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffu)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
new file mode 100644
index 00000000000..cbb2d750107
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
@@ -0,0 +1,27 @@
+#ifndef HAVE_DEFINED_SCALAR_SAT_BINARY
+#define HAVE_DEFINED_SCALAR_SAT_BINARY
+
+/* To use this header file for a run test, you need to:
+   1. define T as the type, for example uint8_t,
+   2. define RUN_SAT_BINARY as the function to run,
+   3. prepare the test_data array for the test cases.
+ */
+
+int
+main ()
+{
+  unsigned i;
+  T *d;
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      d = test_data[i];
+
+      if (RUN_SAT_BINARY (T, d[0], d[1]) != d[2])
+	__builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4f491c6b833..51b53dbb16f 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vector-builder.h"
 #include "vec-perm-indices.h"
 #include "gimple-range.h"
+#include "gimple-match-auto.h"
 
 
 /* TODO:  Note the vectorizer still builds COND_EXPRs with GENERIC compares
@@ -4498,6 +4499,65 @@ vect_recog_mult_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+static gimple *
+vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
+			 tree op_0, tree op_1)
+{
+  tree itype = TREE_TYPE (op_0);
+  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
+
+  if (vtype == NULL_TREE)
+    return NULL;
+
+  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
+    return NULL;
+
+  *type_out = vtype;
+
+  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
+  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
+  gimple_call_set_nothrow (call, /* nothrow_p */ true);
+  gimple_set_location (call, gimple_location (last_stmt));
+
+  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
+
+  return call;
+}
+
+/*
+ * Try to detect the saturation add pattern (SAT_ADD), i.e. the gimple below:
+ *   _7 = _4 + _6;
+ *   _8 = _4 > _7;
+ *   _9 = (long unsigned int) _8;
+ *   _10 = -_9;
+ *   _12 = _7 | _10;
+ *
+ * And then simplify it to
+ *   _12 = .SAT_ADD (_4, _6);
+ */
+static gimple *
+vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
+			    tree *type_out)
+{
+  gimple *last_stmt = stmt_vinfo->stmt;
+
+  if (!is_gimple_assign (last_stmt))
+    return NULL;
+
+  tree res_ops[2];
+  tree lhs = gimple_assign_lhs (last_stmt);
+
+  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
+    {
+      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
+					      res_ops[0], res_ops[1]);
+      if (call)
+	return call;
+    }
+
+  return NULL;
+}
+
 /* Detect a signed division by a constant that wouldn't be
    otherwise vectorized:
 
@@ -6998,6 +7058,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
+  { vect_recog_sat_add_pattern, "sat_add" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
   { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
-- 
2.34.1



* [PATCH v2] Internal-fn: Introduce new internal function SAT_ADD
  2024-04-06 12:07 [PATCH v1] Internal-fn: Introduce new internal function SAT_ADD pan2.li
@ 2024-04-07  7:03 ` pan2.li
  2024-04-28 12:10   ` Li, Pan2
  2024-04-29  7:53 ` [PATCH v3] " pan2.li
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: pan2.li @ 2024-04-07  7:03 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, yanzhang.wang, tamar.christina,
	richard.guenther, hongtao.liu, Pan Li

From: Pan Li <pan2.li@intel.com>

Update in v2:
* Fix one failure for x86 bootstrap.

Original log:

This patch adds the middle-end representation for saturating add,
i.e. the result of the add is set to the type's maximum on overflow.
It matches a pattern similar to the one below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Taking uint8_t as an example, we have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.
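
As a minimal sketch of the formula above with TYPE = uint8_t (the names
sat_add_u8 and check_sat_add_u8 are hypothetical and not part of the
patch), the four cases can be checked like this:

```c
#include <assert.h>
#include <stdint.h>

/* Branchless unsigned saturating add for uint8_t, following
   (x + y) | (-(TYPE)((TYPE)(x + y) < x)) with TYPE = uint8_t.
   Hypothetical helper, not part of the patch.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);      /* wraps modulo 256 */
  uint8_t mask = (uint8_t) -(sum < x);  /* 0xff on overflow, else 0 */
  return sum | mask;
}

/* Hypothetical self-check covering the four cases listed above.  */
static void
check_sat_add_u8 (void)
{
  assert (sat_add_u8 (1, 254) == 255);
  assert (sat_add_u8 (1, 255) == 255);
  assert (sat_add_u8 (2, 255) == 255);
  assert (sat_add_u8 (255, 255) == 255);
}
```

The wrapped sum is smaller than x exactly when the add overflowed, so
the negated comparison yields an all-ones mask only in that case.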

The patch also implements SAT_ADD in the riscv backend as a sample
for both scalar and vector.  Given the example below:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;    succ:       EXIT

}
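
For readers less familiar with .ADD_OVERFLOW, the dump above corresponds
roughly to the C below.  This is only a sketch: it assumes the GCC/Clang
builtin __builtin_add_overflow, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Source form from the commit message.  */
static uint64_t
sat_add_u64_formula (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

/* Rough C equivalent of the "before" GIMPLE: sum plays the role of
   REALPART_EXPR and ovf the role of IMAGPART_EXPR != 0.
   Hypothetical helper, not part of the patch.  */
static uint64_t
sat_add_u64_overflow (uint64_t x, uint64_t y)
{
  uint64_t sum;
  _Bool ovf = __builtin_add_overflow (x, y, &sum);
  return sum | -(uint64_t) ovf;
}

/* Hypothetical self-check that both forms agree on boundary values.  */
static void
check_sat_add_u64 (void)
{
  const uint64_t v[] = { 0, 1, 2, UINT64_MAX - 1, UINT64_MAX };
  for (unsigned i = 0; i < 5; i++)
    for (unsigned j = 0; j < 5; j++)
      assert (sat_add_u64_formula (v[i], v[j])
	      == sat_add_u64_overflow (v[i], v[j]));
}
```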

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;    succ:       EXIT
}

For vectorization, we leverage the existing vect pattern recognition to
find a pattern similar to the scalar one and let the vectorizer handle
the rest via the standard name usadd<mode>3 in vector mode.
The riscv vector backend has the insn "Vector Single-Width Saturating
Add and Subtract", which can be leveraged when expanding usadd<mode>3
in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv       v0,v0,v1
  vmerge.vim      v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vsaddu.vv       v1,v1,v2
  vse64.v v1,0(a0)
  ...
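
Whichever of the two code sequences above is emitted, the loop should
produce the same values.  A small sketch of how that could be checked on
the C side, comparing the loop kernel against a clamp-based reference
(ref_sat_add_u64 and check_vec_sat_add_u64 are hypothetical names, not
part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Loop kernel from the commit message: branchless unsigned
   saturating add over arrays.  */
static void
vec_sat_add_u64 (uint64_t *out, const uint64_t *x, const uint64_t *y,
		 unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

/* Hypothetical reference: clamp to UINT64_MAX without relying on
   wraparound of the sum.  */
static uint64_t
ref_sat_add_u64 (uint64_t x, uint64_t y)
{
  return x > UINT64_MAX - y ? UINT64_MAX : x + y;
}

/* Hypothetical self-check on a few boundary inputs.  */
static void
check_vec_sat_add_u64 (void)
{
  const uint64_t x[] = { 0, 1, 2, UINT64_MAX, UINT64_MAX - 1 };
  const uint64_t y[] = { 0, UINT64_MAX - 1, UINT64_MAX - 1, UINT64_MAX, 1 };
  uint64_t out[5];

  vec_sat_add_u64 (out, x, y, 5);
  for (unsigned i = 0; i < 5; i++)
    assert (out[i] == ref_sat_add_u64 (x[i], y[i]));
}
```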

To limit the patch size for review, only the unsigned version of
usadd<mode>3 is involved here.  The signed version will be covered
in follow-up patch(es).

The test suites below pass with this patch.
* The riscv full regression tests.
* The aarch64 full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* config/riscv/autovec.md (usadd<mode>3): New pattern expand
	for unsigned SAT_ADD vector.
	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
	decl to expand usadd<mode>3 pattern.
	(expand_vec_usadd): Ditto but for vector.
	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
	emit the vsadd insn.
	(expand_vec_usadd): New func impl to expand usadd<mode>3 for
	vector.
	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
	to expand usadd<mode>3 for scalar.
	* config/riscv/riscv.md (usadd<mode>3): New pattern expand
	for unsigned SAT_ADD scalar.
	* config/riscv/vector.md: Allow VLS mode for vsaddu.
	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
	* match.pd: Add unsigned SAT_ADD match and simplify.
	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
	* tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
	to build the IFN_SAT_ADD gimple call.
	(vect_recog_sat_add_pattern): New func impl to recog the pattern
	for unsigned SAT_ADD.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c: New test.
	* gcc.target/riscv/sat_arith.h: New test.
	* gcc.target/riscv/sat_u_add-1.c: New test.
	* gcc.target/riscv/sat_u_add-2.c: New test.
	* gcc.target/riscv/sat_u_add-3.c: New test.
	* gcc.target/riscv/sat_u_add-4.c: New test.
	* gcc.target/riscv/sat_u_add-run-1.c: New test.
	* gcc.target/riscv/sat_u_add-run-10.c: New test.
	* gcc.target/riscv/sat_u_add-run-11.c: New test.
	* gcc.target/riscv/sat_u_add-run-12.c: New test.
	* gcc.target/riscv/sat_u_add-run-2.c: New test.
	* gcc.target/riscv/sat_u_add-run-3.c: New test.
	* gcc.target/riscv/sat_u_add-run-4.c: New test.
	* gcc.target/riscv/sat_u_add-run-5.c: New test.
	* gcc.target/riscv/sat_u_add-run-6.c: New test.
	* gcc.target/riscv/sat_u_add-run-7.c: New test.
	* gcc.target/riscv/sat_u_add-run-8.c: New test.
	* gcc.target/riscv/sat_u_add-run-9.c: New test.
	* gcc.target/riscv/scalar_sat_binary.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/autovec.md                   | 17 ++++
 gcc/config/riscv/riscv-protos.h               |  2 +
 gcc/config/riscv/riscv-v.cc                   | 16 ++++
 gcc/config/riscv/riscv.cc                     | 47 +++++++++++
 gcc/config/riscv/riscv.md                     | 11 +++
 gcc/config/riscv/vector.md                    | 12 +--
 gcc/internal-fn.cc                            |  1 +
 gcc/internal-fn.def                           |  3 +
 gcc/match.pd                                  | 64 +++++++++++++++
 gcc/optabs.def                                |  4 +-
 .../riscv/rvv/autovec/binop/vec_sat_binary.h  | 33 ++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-1.c | 41 ++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-2.c | 44 +++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-3.c | 44 +++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-4.c | 44 +++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-1.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-10.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-11.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-12.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-2.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-3.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-4.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-5.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-6.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-7.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-8.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-9.c   | 75 ++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_arith.h    | 79 +++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c  | 44 +++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c  | 50 ++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c  | 41 ++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c  | 38 +++++++++
 .../gcc.target/riscv/sat_u_add-run-1.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-10.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-11.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-12.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-2.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-3.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-4.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-5.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-6.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-7.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-8.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-9.c        | 25 ++++++
 .../gcc.target/riscv/scalar_sat_binary.h      | 27 +++++++
 gcc/tree-vect-patterns.cc                     | 62 +++++++++++++++
 46 files changed, 1916 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b32369f68c..06a4c34863e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,20 @@ (define_expand "rawmemchr<ANYI:mode>"
     DONE;
   }
 )
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Saturation ALU.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - add
+;; -------------------------------------------------------------------------
+(define_expand "usadd<mode>3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_usadd (operands[0], operands[1], operands[2], <MODE>mode);
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b8735593805..fefd9a1c2c4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
 extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
+extern void riscv_expand_usadd (rtx, rtx, rtx);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
@@ -619,6 +620,7 @@ void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lround (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_usadd (rtx, rtx, rtx, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
 			  bool, void (*)(rtx *, rtx), enum avl_type);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 814c5febabe..eadbc63431b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4635,6 +4635,16 @@ emit_vec_cvt_x_f_rtz (rtx op_dest, rtx op_src, rtx mask,
     }
 }
 
+static void
+emit_vec_saddu (rtx op_dest, rtx op_1, rtx op_2, insn_type type,
+		machine_mode vec_mode)
+{
+  rtx ops[] = {op_dest, op_1, op_2};
+  insn_code icode = code_for_pred (US_PLUS, vec_mode);
+
+  emit_vlmax_insn (icode, type, ops);
+}
+
 void
 expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 		 machine_mode vec_int_mode)
@@ -4862,6 +4872,12 @@ expand_vec_lfloor (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 				vec_int_mode);
 }
 
+void
+expand_vec_usadd (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  emit_vec_saddu (op_0, op_1, op_2, BINARY_OP, vec_mode);
+}
+
 /* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that libgcc uses as
    well.  */
 void
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index fe9976bfffe..519a7684cc4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10840,6 +10840,53 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
   return true;
 }
 
+/* Implements the unsigned saturation add standard name usadd for int mode.  */
+
+void
+riscv_expand_usadd (rtx dest, rtx x, rtx y)
+{
+  machine_mode mode = GET_MODE (dest);
+  rtx xmode_sum = gen_reg_rtx (Xmode);
+  rtx xmode_lt = gen_reg_rtx (Xmode);
+  rtx xmode_x = gen_lowpart (Xmode, x);
+  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_dest = gen_reg_rtx (Xmode);
+
+  /* Step-1: sum = x + y  */
+  if (mode == SImode && mode != Xmode)
+    { /* Use addw to avoid truncating the sum.  */
+      rtx simode_sum = gen_reg_rtx (SImode);
+      riscv_emit_binary (PLUS, simode_sum, x, y);
+      emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));
+    }
+  else
+    riscv_emit_binary (PLUS, xmode_sum, xmode_x, xmode_y);
+
+  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
+  if (mode == HImode || mode == QImode)
+    {
+      int shift_bits = GET_MODE_BITSIZE (Xmode)
+	- GET_MODE_BITSIZE (mode).to_constant ();
+
+      gcc_assert (shift_bits > 0);
+
+      riscv_emit_binary (ASHIFT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+      riscv_emit_binary (LSHIFTRT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+    }
+
+  /* Step-2: lt = sum < x  */
+  riscv_emit_binary (LTU, xmode_lt, xmode_sum, xmode_x);
+
+  /* Step-3: lt = -lt  */
+  riscv_emit_unary (NEG, xmode_lt, xmode_lt);
+
+  /* Step-4: xmode_dest = sum | lt  */
+  riscv_emit_binary (IOR, xmode_dest, xmode_lt, xmode_sum);
+
+  /* Step-5: dest = xmode_dest */
+  emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 0346cc3859d..28d26579c3a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3839,6 +3839,17 @@ (define_insn "*large_load_address"
   [(set_attr "type" "load")
    (set (attr "length") (const_int 8))])
 
+(define_expand "usadd<mode>3"
+  [(match_operand:ANYI 0 "register_operand")
+   (match_operand:ANYI 1 "register_operand")
+   (match_operand:ANYI 2 "register_operand")]
+  ""
+  {
+    riscv_expand_usadd (operands[0], operands[1], operands[2]);
+    DONE;
+  }
+)
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 8b1c24c5d79..58abc2a2f9e 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -4073,8 +4073,8 @@ (define_insn "@pred_trunc<mode>"
 
 ;; Saturating Add and Subtract
 (define_insn "@pred_<optab><mode>"
-  [(set (match_operand:VI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
-	(if_then_else:VI
+  [(set (match_operand:V_VLSI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
+	(if_then_else:V_VLSI
 	  (unspec:<VM>
 	    [(match_operand:<VM> 1 "vector_mask_operand" " vm, vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
 	     (match_operand 5 "vector_length_operand"    " rK, rK, rK, rK, rK, rK, rK, rK")
@@ -4083,10 +4083,10 @@ (define_insn "@pred_<optab><mode>"
 	     (match_operand 8 "const_int_operand"        "  i,  i,  i,  i,  i,  i,  i,  i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-	  (any_sat_int_binop:VI
-	    (match_operand:VI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
-	    (match_operand:VI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
-	  (match_operand:VI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
+	  (any_sat_int_binop:V_VLSI
+	    (match_operand:V_VLSI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
+	    (match_operand:V_VLSI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
+	  (match_operand:V_VLSI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
   "TARGET_VECTOR"
   "@
    v<insn>.vv\t%0,%3,%4%p1
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5269f0ac528..e517ea7fbfb 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4181,6 +4181,7 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_SAT_ADD:
     case IFN_VEC_WIDEN_PLUS:
     case IFN_VEC_WIDEN_PLUS_LO:
     case IFN_VEC_WIDEN_PLUS_HI:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 848bb9dbff3..47326b7033c 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
 			      smulhrs, umulhrs, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
+			      ssadd, usadd, binary)
+
 DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
 DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
 DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 15a1e7350d4..6f8cdf074ed 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        || POINTER_TYPE_P (itype))
       && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
 
+/* Unsigned Saturation Add */
+(match (usadd_left_part_1 @0 @1)
+ (plus:c @0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_1 @0 @1)
+ (negate (convert (lt (plus:c @0 @1) @0)))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_2 @0 @1)
+ (negate (convert (gt @0 (plus:c @0 @1))))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+/* Unsigned saturation add. Case 1 (branchless):
+   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
+   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
+(simplify
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_1 @0 @1))
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+(simplify
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_2 @0 @1))
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+
+/* Unsigned saturation add. Case 2 (branch):
+   SAT_U_ADD = (X + Y) >= X ? (X + Y) : -1 or
+   SAT_U_ADD = X <= (X + Y) ? (X + Y) : -1.  */
+(simplify
+ (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+(simplify
+ (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+
+/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_1 @0 @1))
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_2 @0 @1))
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
+ (if (optimize)))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..3f2cb46aff8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
 OPTAB_NX(add_optab, "add$Q$a3")
 OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
 OPTAB_VX(addv_optab, "add$F$a3")
-OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
-OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
+OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
+OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
 OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(sub_optab, "sub$F$a3")
 OPTAB_NX(sub_optab, "sub$Q$a3")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
new file mode 100644
index 00000000000..0976ae97830
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
@@ -0,0 +1,33 @@
+#ifndef HAVE_DEFINED_VEC_SAT_BINARY
+#define HAVE_DEFINED_VEC_SAT_BINARY
+
+/* To use this header file for a run test, you need to:
+   1. define T as the type, for example uint8_t.
+   2. define N as the test array size, for example 16.
+   3. define RUN_VEC_SAT_BINARY as the run function.
+   4. prepare the test_data for the test cases.
+ */
+
+int
+main ()
+{
+  unsigned i, k;
+  T out[N];
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      T *op_1 = test_data[i][0];
+      T *op_2 = test_data[i][1];
+      T *expect = test_data[i][2];
+
+      RUN_VEC_SAT_BINARY (T, out, op_1, op_2, N);
+
+      for (k = 0; k < N; k++)
+	if (out[k] != expect[k])
+	  __builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
new file mode 100644
index 00000000000..4fb8b233ee9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint8_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint8_t)
+
+/*
+** vec_sat_u_add_uint8_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint8_t)
+
+/*
+** vec_sat_u_add_uint8_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint8_t, 0xffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
new file mode 100644
index 00000000000..10c112b77b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint16_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint16_t)
+
+/*
+** vec_sat_u_add_uint16_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint16_t)
+
+/*
+** vec_sat_u_add_uint16_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint16_t, 0xffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
new file mode 100644
index 00000000000..281036ea0ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint32_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint32_t)
+
+/*
+** vec_sat_u_add_uint32_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint32_t)
+
+/*
+** vec_sat_u_add_uint32_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint32_t, 0xffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
new file mode 100644
index 00000000000..f392533f114
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint64_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint64_t)
+
+/*
+** vec_sat_u_add_uint64_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint64_t)
+
+/*
+** vec_sat_u_add_uint64_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint64_t, 0xffffffffffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
new file mode 100644
index 00000000000..1dcb333f687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
new file mode 100644
index 00000000000..5a0e73303cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
new file mode 100644
index 00000000000..b3efc9243e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
new file mode 100644
index 00000000000..f478c244ff4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffffffffffffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
new file mode 100644
index 00000000000..dbf01ac863d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
new file mode 100644
index 00000000000..20ad2736403
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
new file mode 100644
index 00000000000..2f31edc527e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
new file mode 100644
index 00000000000..4201b31eb3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
new file mode 100644
index 00000000000..35ec9ea3455
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
new file mode 100644
index 00000000000..8b1abdb4ba8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
new file mode 100644
index 00000000000..8c72b567590
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
new file mode 100644
index 00000000000..f454f3997ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
new file mode 100644
index 00000000000..f233c74acfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -0,0 +1,79 @@
+#ifndef HAVE_SAT_ARITH
+#define HAVE_SAT_ARITH
+
+#include <stdint-gcc.h>
+
+#define DEF_SAT_U_ADD_FMT_1(T)             \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_1 (T x, T y)           \
+{                                          \
+  return (x + y) | (-(T)((T)(x + y) < x)); \
+}
+
+#define DEF_SAT_U_ADD_FMT_2(T)             \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_2 (T x, T y)           \
+{                                          \
+  return (T)(x + y) >= x ? (x + y) : -1;   \
+}
+
+#define DEF_SAT_U_ADD_FMT_3(T, MAX)        \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_3 (T x, T y)           \
+{                                          \
+  return (T)(x + y) >= x ? (x + y) : MAX;  \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_1(T)                                   \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (x + y) | (-(T)((T)(x + y) < x));                     \
+    }                                                                \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_2(T)                                   \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (T)(x + y) >= x ? (x + y) : -1;                       \
+    }                                                                \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_3(T, MAX)                              \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (T)(x + y) >= x ? (x + y) : MAX;                      \
+    }                                                                \
+}
+
+#define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
+#define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
+#define RUN_SAT_U_ADD_FMT_3(T, x, y) sat_u_add_##T##_fmt_3(x, y)
+
+#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
+
+#define RUN_VEC_SAT_U_ADD_FMT_2(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_2(out, op_1, op_2, N)
+
+#define RUN_VEC_SAT_U_ADD_FMT_3(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_3(out, op_1, op_2, N)
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
new file mode 100644
index 00000000000..b348d93f938
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint8_t)
+
+/*
+** sat_u_add_uint8_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint8_t)
+
+/*
+** sat_u_add_uint8_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint8_t, 0xffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
new file mode 100644
index 00000000000..df54b984110
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint16_t)
+
+/*
+** sat_u_add_uint16_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint16_t)
+
+/*
+** sat_u_add_uint16_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint16_t, 0xffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
new file mode 100644
index 00000000000..6ff2e6ac52b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint32_t_fmt_1:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint32_t)
+
+/*
+** sat_u_add_uint32_t_fmt_2:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint32_t)
+
+/*
+** sat_u_add_uint32_t_fmt_3:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint32_t, 0xffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
new file mode 100644
index 00000000000..1585f9a231f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint64_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint64_t)
+
+/*
+** sat_u_add_uint64_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint64_t)
+
+/*
+** sat_u_add_uint64_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint64_t, 0xffffffffffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
new file mode 100644
index 00000000000..f1972490006
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
new file mode 100644
index 00000000000..0b675666dc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffu)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
new file mode 100644
index 00000000000..ac9809e47c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffffffu)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
new file mode 100644
index 00000000000..110e9b14d7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffffffffffffffu)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
new file mode 100644
index 00000000000..cb3879d0cde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
new file mode 100644
index 00000000000..c9a6080ca3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
new file mode 100644
index 00000000000..c19b7e22387
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
new file mode 100644
index 00000000000..508531c09d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
new file mode 100644
index 00000000000..99b5c3a39f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
new file mode 100644
index 00000000000..13f59548935
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
new file mode 100644
index 00000000000..cdbea7b1b2c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
new file mode 100644
index 00000000000..04a857aed58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffu)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
new file mode 100644
index 00000000000..cbb2d750107
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
@@ -0,0 +1,27 @@
+#ifndef HAVE_DEFINED_SCALAR_SAT_BINARY
+#define HAVE_DEFINED_SCALAR_SAT_BINARY
+
+/* To leverage this header file for run tests, you need to:
+   1. define T as the type, for example uint8_t,
+   2. define RUN_SAT_BINARY as the run function,
+   3. prepare the test_data for the test cases.
+ */
+
+int
+main ()
+{
+  unsigned i;
+  T *d;
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      d = test_data[i];
+
+      if (RUN_SAT_BINARY (T, d[0], d[1]) != d[2])
+	__builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4f491c6b833..44ca182cfa3 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4498,6 +4498,67 @@ vect_recog_mult_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+static gimple *
+vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
+			 tree op_0, tree op_1)
+{
+  tree itype = TREE_TYPE (op_0);
+  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
+
+  if (vtype == NULL_TREE)
+    return NULL;
+
+  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
+    return NULL;
+
+  *type_out = vtype;
+
+  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
+  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
+  gimple_call_set_nothrow (call, /* nothrow_p */ true);
+  gimple_set_location (call, gimple_location (last_stmt));
+
+  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
+
+  return call;
+}
+
+/*
+ * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
+ *   _7 = _4 + _6;
+ *   _8 = _4 > _7;
+ *   _9 = (long unsigned int) _8;
+ *   _10 = -_9;
+ *   _12 = _7 | _10;
+ *
+ * And then simplified to
+ *   _12 = .SAT_ADD (_4, _6);
+ */
+extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+
+static gimple *
+vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
+			    tree *type_out)
+{
+  gimple *last_stmt = stmt_vinfo->stmt;
+
+  if (!is_gimple_assign (last_stmt))
+    return NULL;
+
+  tree res_ops[2];
+  tree lhs = gimple_assign_lhs (last_stmt);
+
+  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
+    {
+      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
+					      res_ops[0], res_ops[1]);
+      if (call)
+	return call;
+    }
+
+  return NULL;
+}
+
 /* Detect a signed division by a constant that wouldn't be
    otherwise vectorized:
 
@@ -6998,6 +7059,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
+  { vect_recog_sat_add_pattern, "sat_add" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
   { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
-- 
2.34.1


* RE: [PATCH v2] Internal-fn: Introduce new internal function SAT_ADD
  2024-04-07  7:03 ` [PATCH v2] " pan2.li
@ 2024-04-28 12:10   ` Li, Pan2
  0 siblings, 0 replies; 21+ messages in thread
From: Li, Pan2 @ 2024-04-28 12:10 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, tamar.christina, richard.guenther, Liu, Hongtao

Kind ping for SAT_ADD.

Pan

-----Original Message-----
From: Li, Pan2 <pan2.li@intel.com> 
Sent: Sunday, April 7, 2024 3:03 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Wang, Yanzhang <yanzhang.wang@intel.com>; tamar.christina@arm.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Li, Pan2 <pan2.li@intel.com>
Subject: [PATCH v2] Internal-fn: Introduce new internal function SAT_ADD

From: Pan Li <pan2.li@intel.com>

Update in v2:
* Fix one failure for x86 bootstrap.

Original log:

This patch adds the middle-end representation for saturating add,
i.e. the result of the add is set to the type's maximum on overflow.
It matches a pattern similar to the one below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Taking uint8_t as an example, we have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.
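
As a rough illustration only (not part of the patch), the uint8_t cases
above can be checked with a small standalone C sketch of the same
branchless formula.  The names sat_add_u8 and sat_add_u8_selfcheck are
made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from the patch: the branchless pattern
   (x + y) | (-(TYPE)((TYPE)(x + y) < x)) specialised to uint8_t.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  /* Truncate the promoted int sum back to 8 bits (wraps on overflow).  */
  uint8_t sum = (uint8_t) (x + y);

  /* sum < x holds exactly when the 8-bit add wrapped; negating the
     0/1 result and truncating gives a mask of 0x00 or 0xff.  */
  uint8_t mask = (uint8_t) -(uint8_t) (sum < x);

  return (uint8_t) (sum | mask);
}

/* Re-checks the four uint8_t cases listed above.  */
static void
sat_add_u8_selfcheck (void)
{
  assert (sat_add_u8 (1, 254) == 255);
  assert (sat_add_u8 (1, 255) == 255);
  assert (sat_add_u8 (2, 255) == 255);
  assert (sat_add_u8 (255, 255) == 255);
}
```

The explicit casts matter for the narrow types: x + y is computed in int
after integer promotion, so the sum has to be truncated to TYPE before
the comparison against x.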

The patch also implements SAT_ADD in the riscv backend as a sample
for both scalar and vector.  Given the example below:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;    succ:       EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;    succ:       EXIT
}
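
For reference, the .ADD_OVERFLOW / REALPART_EXPR / IMAGPART_EXPR shape
in the "before" dump is what one would expect from spelling the
same computation with GCC's __builtin_add_overflow.  Below is a minimal
sketch, assuming a GCC-compatible compiler; the function names are
hypothetical and not part of the patch, and whether this form is also
folded to .SAT_ADD is not claimed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the branchless form from the example above.  */
static uint64_t
sat_add_u64_branchless (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) ((uint64_t) (x + y) < x));
}

/* Hypothetical helper: same result, written with the overflow builtin.
   sum plays the role of REALPART_EXPR, ovf that of IMAGPART_EXPR != 0.  */
static uint64_t
sat_add_u64_via_overflow (uint64_t x, uint64_t y)
{
  uint64_t sum;
  bool ovf = __builtin_add_overflow (x, y, &sum);

  /* -(uint64_t) ovf is 0 without overflow and all-ones with it.  */
  return sum | -(uint64_t) ovf;
}

/* Compares both forms on a few boundary values.  */
static void
sat_add_u64_selfcheck (void)
{
  const uint64_t v[] = { 0, 1, 2, UINT64_MAX - 1, UINT64_MAX };
  const unsigned n = sizeof (v) / sizeof (v[0]);

  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      assert (sat_add_u64_branchless (v[i], v[j])
	      == sat_add_u64_via_overflow (v[i], v[j]));
}
```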

For vectorization, we leverage the existing vect pattern recognition to
find the same pattern as in the scalar case and let the vectorizer handle
the rest via the standard name usadd<mode>3 in vector mode.
The riscv vector backend has the "Vector Single-Width Saturating
Add and Subtract" insns, which can be used when expanding usadd<mode>3
in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv       v0,v0,v1
  vmerge.vim      v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vsaddu.vv       v1,v1,v2
  vse64.v v1,0(a0)
  ...

To limit the patch size for review, only the unsigned version,
usadd<mode>3, is involved here.  The signed version will be covered
in follow-up patch(es).

The test suites below pass with this patch.
* The riscv full regression tests.
* The aarch64 full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* config/riscv/autovec.md (usadd<mode>3): New pattern expand
	for unsigned SAT_ADD vector.
	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
	decl to expand usadd<mode>3 pattern.
	(expand_vec_usadd): Ditto but for vector.
	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
	emit the vsaddu insn.
	(expand_vec_usadd): New func impl to expand usadd<mode>3 for
	vector.
	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
	to expand usadd<mode>3 for scalar.
	* config/riscv/riscv.md (usadd<mode>3): New pattern expand
	for unsigned SAT_ADD scalar.
	* config/riscv/vector.md: Allow VLS mode for vsaddu.
	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
	* match.pd: Add unsigned SAT_ADD match and simplify.
	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
	* tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
	to build the IFN_SAT_ADD gimple call.
	(vect_recog_sat_add_pattern): New func impl to recog the pattern
	for unsigned SAT_ADD.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c: New test.
	* gcc.target/riscv/sat_arith.h: New test.
	* gcc.target/riscv/sat_u_add-1.c: New test.
	* gcc.target/riscv/sat_u_add-2.c: New test.
	* gcc.target/riscv/sat_u_add-3.c: New test.
	* gcc.target/riscv/sat_u_add-4.c: New test.
	* gcc.target/riscv/sat_u_add-run-1.c: New test.
	* gcc.target/riscv/sat_u_add-run-10.c: New test.
	* gcc.target/riscv/sat_u_add-run-11.c: New test.
	* gcc.target/riscv/sat_u_add-run-12.c: New test.
	* gcc.target/riscv/sat_u_add-run-2.c: New test.
	* gcc.target/riscv/sat_u_add-run-3.c: New test.
	* gcc.target/riscv/sat_u_add-run-4.c: New test.
	* gcc.target/riscv/sat_u_add-run-5.c: New test.
	* gcc.target/riscv/sat_u_add-run-6.c: New test.
	* gcc.target/riscv/sat_u_add-run-7.c: New test.
	* gcc.target/riscv/sat_u_add-run-8.c: New test.
	* gcc.target/riscv/sat_u_add-run-9.c: New test.
	* gcc.target/riscv/scalar_sat_binary.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/autovec.md                   | 17 ++++
 gcc/config/riscv/riscv-protos.h               |  2 +
 gcc/config/riscv/riscv-v.cc                   | 16 ++++
 gcc/config/riscv/riscv.cc                     | 47 +++++++++++
 gcc/config/riscv/riscv.md                     | 11 +++
 gcc/config/riscv/vector.md                    | 12 +--
 gcc/internal-fn.cc                            |  1 +
 gcc/internal-fn.def                           |  3 +
 gcc/match.pd                                  | 64 +++++++++++++++
 gcc/optabs.def                                |  4 +-
 .../riscv/rvv/autovec/binop/vec_sat_binary.h  | 33 ++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-1.c | 41 ++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-2.c | 44 +++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-3.c | 44 +++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-4.c | 44 +++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-1.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-10.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-11.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-12.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-2.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-3.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-4.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-5.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-6.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-7.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-8.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-9.c   | 75 ++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_arith.h    | 79 +++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c  | 44 +++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c  | 50 ++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c  | 41 ++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c  | 38 +++++++++
 .../gcc.target/riscv/sat_u_add-run-1.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-10.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-11.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-12.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-2.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-3.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-4.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-5.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-6.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-7.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-8.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-9.c        | 25 ++++++
 .../gcc.target/riscv/scalar_sat_binary.h      | 27 +++++++
 gcc/tree-vect-patterns.cc                     | 62 +++++++++++++++
 46 files changed, 1916 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b32369f68c..06a4c34863e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,20 @@ (define_expand "rawmemchr<ANYI:mode>"
     DONE;
   }
 )
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Saturation ALU.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - add
+;; -------------------------------------------------------------------------
+(define_expand "usadd<mode>3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_usadd (operands[0], operands[1], operands[2], <MODE>mode);
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index b8735593805..fefd9a1c2c4 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
 extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
+extern void riscv_expand_usadd (rtx, rtx, rtx);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
@@ -619,6 +620,7 @@ void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lround (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_usadd (rtx, rtx, rtx, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
 			  bool, void (*)(rtx *, rtx), enum avl_type);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 814c5febabe..eadbc63431b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4635,6 +4635,16 @@ emit_vec_cvt_x_f_rtz (rtx op_dest, rtx op_src, rtx mask,
     }
 }
 
+static void
+emit_vec_saddu (rtx op_dest, rtx op_1, rtx op_2, insn_type type,
+		machine_mode vec_mode)
+{
+  rtx ops[] = {op_dest, op_1, op_2};
+  insn_code icode = code_for_pred (US_PLUS, vec_mode);
+
+  emit_vlmax_insn (icode, type, ops);
+}
+
 void
 expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 		 machine_mode vec_int_mode)
@@ -4862,6 +4872,12 @@ expand_vec_lfloor (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 				vec_int_mode);
 }
 
+void
+expand_vec_usadd (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  emit_vec_saddu (op_0, op_1, op_2, BINARY_OP, vec_mode);
+}
+
 /* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that libgcc uses as
    well.  */
 void
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index fe9976bfffe..519a7684cc4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10840,6 +10840,53 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
   return true;
 }
 
+/* Implements the unsigned saturation add standard name usadd for int mode.  */
+
+void
+riscv_expand_usadd (rtx dest, rtx x, rtx y)
+{
+  machine_mode mode = GET_MODE (dest);
+  rtx xmode_sum = gen_reg_rtx (Xmode);
+  rtx xmode_lt = gen_reg_rtx (Xmode);
+  rtx xmode_x = gen_lowpart (Xmode, x);
+  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_dest = gen_reg_rtx (Xmode);
+
+  /* Step-1: sum = x + y  */
+  if (mode == SImode && mode != Xmode)
+    { /* Take addw to avoid the sum truncate.  */
+      rtx simode_sum = gen_reg_rtx (SImode);
+      riscv_emit_binary (PLUS, simode_sum, x, y);
+      emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));
+    }
+  else
+    riscv_emit_binary (PLUS, xmode_sum, xmode_x, xmode_y);
+
+  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
+  if (mode == HImode || mode == QImode)
+    {
+      int shift_bits = GET_MODE_BITSIZE (Xmode)
+	- GET_MODE_BITSIZE (mode).to_constant ();
+
+      gcc_assert (shift_bits > 0);
+
+      riscv_emit_binary (ASHIFT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+      riscv_emit_binary (LSHIFTRT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+    }
+
+  /* Step-2: lt = sum < x  */
+  riscv_emit_binary (LTU, xmode_lt, xmode_sum, xmode_x);
+
+  /* Step-3: lt = -lt  */
+  riscv_emit_unary (NEG, xmode_lt, xmode_lt);
+
+  /* Step-4: xmode_dest = sum | lt  */
+  riscv_emit_binary (IOR, xmode_dest, xmode_lt, xmode_sum);
+
+  /* Step-5: dest = xmode_dest */
+  emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 0346cc3859d..28d26579c3a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3839,6 +3839,17 @@ (define_insn "*large_load_address"
   [(set_attr "type" "load")
    (set (attr "length") (const_int 8))])
 
+(define_expand "usadd<mode>3"
+  [(match_operand:ANYI 0 "register_operand")
+   (match_operand:ANYI 1 "register_operand")
+   (match_operand:ANYI 2 "register_operand")]
+  ""
+  {
+    riscv_expand_usadd (operands[0], operands[1], operands[2]);
+    DONE;
+  }
+)
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 8b1c24c5d79..58abc2a2f9e 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -4073,8 +4073,8 @@ (define_insn "@pred_trunc<mode>"
 
 ;; Saturating Add and Subtract
 (define_insn "@pred_<optab><mode>"
-  [(set (match_operand:VI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
-	(if_then_else:VI
+  [(set (match_operand:V_VLSI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
+	(if_then_else:V_VLSI
 	  (unspec:<VM>
 	    [(match_operand:<VM> 1 "vector_mask_operand" " vm, vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
 	     (match_operand 5 "vector_length_operand"    " rK, rK, rK, rK, rK, rK, rK, rK")
@@ -4083,10 +4083,10 @@ (define_insn "@pred_<optab><mode>"
 	     (match_operand 8 "const_int_operand"        "  i,  i,  i,  i,  i,  i,  i,  i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-	  (any_sat_int_binop:VI
-	    (match_operand:VI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
-	    (match_operand:VI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
-	  (match_operand:VI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
+	  (any_sat_int_binop:V_VLSI
+	    (match_operand:V_VLSI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
+	    (match_operand:V_VLSI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
+	  (match_operand:V_VLSI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
   "TARGET_VECTOR"
   "@
    v<insn>.vv\t%0,%3,%4%p1
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 5269f0ac528..e517ea7fbfb 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4181,6 +4181,7 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_SAT_ADD:
     case IFN_VEC_WIDEN_PLUS:
     case IFN_VEC_WIDEN_PLUS_LO:
     case IFN_VEC_WIDEN_PLUS_HI:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 848bb9dbff3..47326b7033c 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
 			      smulhrs, umulhrs, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
+			      ssadd, usadd, binary)
+
 DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
 DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
 DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 15a1e7350d4..6f8cdf074ed 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        || POINTER_TYPE_P (itype))
       && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
 
+/* Unsigned Saturation Add */
+(match (usadd_left_part_1 @0 @1)
+ (plus:c @0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_1 @0 @1)
+ (negate (convert (lt (plus:c @0 @1) @0)))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_2 @0 @1)
+ (negate (convert (gt @0 (plus:c @0 @1))))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+/* Unsigned saturation add. Case 1 (branchless):
+   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
+   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
+(simplify
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_1 @0 @1))
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+(simplify
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_2 @0 @1))
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+
+/* Unsigned saturation add. Case 2 (branch):
+   SAT_U_ADD = (X + Y) >= X ? (X + Y) : -1 or
+   SAT_U_ADD = X <= (X + Y) ? (X + Y) : -1.  */
+(simplify
+ (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+(simplify
+ (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+
+/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_1 @0 @1))
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_2 @0 @1))
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
+ (if (optimize)))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..3f2cb46aff8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
 OPTAB_NX(add_optab, "add$Q$a3")
 OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
 OPTAB_VX(addv_optab, "add$F$a3")
-OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
-OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
+OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
+OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
 OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(sub_optab, "sub$F$a3")
 OPTAB_NX(sub_optab, "sub$Q$a3")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
new file mode 100644
index 00000000000..0976ae97830
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
@@ -0,0 +1,33 @@
+#ifndef HAVE_DEFINED_VEC_SAT_BINARY
+#define HAVE_DEFINED_VEC_SAT_BINARY
+
+/* To use this header file for a run test, you need to:
+   1. define T as the type, for example uint8_t,
+   2. define N as the test array size, for example 16,
+   3. define RUN_VEC_SAT_BINARY as the run function,
+   4. prepare the test_data for the test cases.
+ */
+
+int
+main ()
+{
+  unsigned i, k;
+  T out[N];
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      T *op_1 = test_data[i][0];
+      T *op_2 = test_data[i][1];
+      T *expect = test_data[i][2];
+
+      RUN_VEC_SAT_BINARY (T, out, op_1, op_2, N);
+
+      for (k = 0; k < N; k++)
+	if (out[k] != expect[k])
+	  __builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
new file mode 100644
index 00000000000..4fb8b233ee9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint8_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint8_t)
+
+/*
+** vec_sat_u_add_uint8_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint8_t)
+
+/*
+** vec_sat_u_add_uint8_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint8_t, 0xffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
new file mode 100644
index 00000000000..10c112b77b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint16_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint16_t)
+
+/*
+** vec_sat_u_add_uint16_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint16_t)
+
+/*
+** vec_sat_u_add_uint16_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint16_t, 0xffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
new file mode 100644
index 00000000000..281036ea0ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint32_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint32_t)
+
+/*
+** vec_sat_u_add_uint32_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint32_t)
+
+/*
+** vec_sat_u_add_uint32_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint32_t, 0xffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
new file mode 100644
index 00000000000..f392533f114
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint64_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint64_t)
+
+/*
+** vec_sat_u_add_uint64_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint64_t)
+
+/*
+** vec_sat_u_add_uint64_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint64_t, 0xffffffffffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
new file mode 100644
index 00000000000..1dcb333f687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
new file mode 100644
index 00000000000..5a0e73303cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
new file mode 100644
index 00000000000..b3efc9243e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
new file mode 100644
index 00000000000..f478c244ff4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffffffffffffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
new file mode 100644
index 00000000000..dbf01ac863d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
new file mode 100644
index 00000000000..20ad2736403
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
new file mode 100644
index 00000000000..2f31edc527e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
new file mode 100644
index 00000000000..4201b31eb3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
new file mode 100644
index 00000000000..35ec9ea3455
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
new file mode 100644
index 00000000000..8b1abdb4ba8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
new file mode 100644
index 00000000000..8c72b567590
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
new file mode 100644
index 00000000000..f454f3997ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
new file mode 100644
index 00000000000..f233c74acfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -0,0 +1,79 @@
+#ifndef HAVE_SAT_ARITH
+#define HAVE_SAT_ARITH
+
+#include <stdint-gcc.h>
+
+#define DEF_SAT_U_ADD_FMT_1(T)             \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_1 (T x, T y)           \
+{                                          \
+  return (x + y) | (-(T)((T)(x + y) < x)); \
+}
+
+#define DEF_SAT_U_ADD_FMT_2(T)             \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_2 (T x, T y)           \
+{                                          \
+  return (T)(x + y) >= x ? (x + y) : -1;   \
+}
+
+#define DEF_SAT_U_ADD_FMT_3(T, MAX)        \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_3 (T x, T y)           \
+{                                          \
+  return (T)(x + y) >= x ? (x + y) : MAX;  \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_1(T)                                   \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (x + y) | (-(T)((T)(x + y) < x));                     \
+    }                                                                \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_2(T)                                   \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (T)(x + y) >= x ? (x + y) : -1;                       \
+    }                                                                \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_3(T, MAX)                              \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (T)(x + y) >= x ? (x + y) : MAX;                      \
+    }                                                                \
+}
+
+#define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
+#define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
+#define RUN_SAT_U_ADD_FMT_3(T, x, y) sat_u_add_##T##_fmt_3(x, y)
+
+#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
+
+#define RUN_VEC_SAT_U_ADD_FMT_2(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_2(out, op_1, op_2, N)
+
+#define RUN_VEC_SAT_U_ADD_FMT_3(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_3(out, op_1, op_2, N)
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
new file mode 100644
index 00000000000..b348d93f938
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint8_t)
+
+/*
+** sat_u_add_uint8_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint8_t)
+
+/*
+** sat_u_add_uint8_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint8_t, 0xffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
new file mode 100644
index 00000000000..df54b984110
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint16_t)
+
+/*
+** sat_u_add_uint16_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint16_t)
+
+/*
+** sat_u_add_uint16_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint16_t, 0xffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
new file mode 100644
index 00000000000..6ff2e6ac52b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint32_t_fmt_1:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint32_t)
+
+/*
+** sat_u_add_uint32_t_fmt_2:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint32_t)
+
+/*
+** sat_u_add_uint32_t_fmt_3:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint32_t, 0xffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
new file mode 100644
index 00000000000..1585f9a231f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint64_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint64_t)
+
+/*
+** sat_u_add_uint64_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint64_t)
+
+/*
+** sat_u_add_uint64_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint64_t, 0xffffffffffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
new file mode 100644
index 00000000000..f1972490006
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
new file mode 100644
index 00000000000..0b675666dc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffu)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
new file mode 100644
index 00000000000..ac9809e47c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffffffu)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
new file mode 100644
index 00000000000..110e9b14d7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffffffffffffffu)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
new file mode 100644
index 00000000000..cb3879d0cde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
new file mode 100644
index 00000000000..c9a6080ca3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
new file mode 100644
index 00000000000..c19b7e22387
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
new file mode 100644
index 00000000000..508531c09d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
new file mode 100644
index 00000000000..99b5c3a39f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
new file mode 100644
index 00000000000..13f59548935
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
new file mode 100644
index 00000000000..cdbea7b1b2c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
new file mode 100644
index 00000000000..04a857aed58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffu)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
new file mode 100644
index 00000000000..cbb2d750107
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
@@ -0,0 +1,27 @@
+#ifndef HAVE_DEFINED_SCALAR_SAT_BINARY
+#define HAVE_DEFINED_SCALAR_SAT_BINARY
+
+/* To use this header file for a run test, you need to:
+   1. define T as the element type, for example uint8_t;
+   2. define RUN_SAT_BINARY as the function (or macro) under test;
+   3. prepare test_data with one {arg_0, arg_1, expect} row per case.
+ */
+
+int
+main ()
+{
+  unsigned i;
+  T *d;
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      d = test_data[i];
+
+      if (RUN_SAT_BINARY (T, d[0], d[1]) != d[2])
+	__builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 4f491c6b833..44ca182cfa3 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4498,6 +4498,67 @@ vect_recog_mult_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+static gimple *
+vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
+			 tree op_0, tree op_1)
+{
+  tree itype = TREE_TYPE (op_0);
+  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
+
+  if (vtype == NULL_TREE)
+    return NULL;
+
+  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
+    return NULL;
+
+  *type_out = vtype;
+
+  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
+  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
+  gimple_call_set_nothrow (call, /* nothrow_p */ true);
+  gimple_set_location (call, gimple_location (last_stmt));
+
+  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
+
+  return call;
+}
+
+/*
+ * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
+ *   _7 = _4 + _6;
+ *   _8 = _4 > _7;
+ *   _9 = (long unsigned int) _8;
+ *   _10 = -_9;
+ *   _12 = _7 | _10;
+ *
+ * And then simplified to
+ *   _12 = .SAT_ADD (_4, _6);
+ */
+extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+
+static gimple *
+vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
+			    tree *type_out)
+{
+  gimple *last_stmt = stmt_vinfo->stmt;
+
+  if (!is_gimple_assign (last_stmt))
+    return NULL;
+
+  tree res_ops[2];
+  tree lhs = gimple_assign_lhs (last_stmt);
+
+  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
+    {
+      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
+					      res_ops[0], res_ops[1]);
+      if (call)
+	return call;
+    }
+
+  return NULL;
+}
+
 /* Detect a signed division by a constant that wouldn't be
    otherwise vectorized:
 
@@ -6998,6 +7059,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
+  { vect_recog_sat_add_pattern, "sat_add" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
   { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
-- 
2.34.1



* [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
  2024-04-06 12:07 [PATCH v1] Internal-fn: Introduce new internal function SAT_ADD pan2.li
  2024-04-07  7:03 ` [PATCH v2] " pan2.li
@ 2024-04-29  7:53 ` pan2.li
  2024-05-01 17:06   ` Tamar Christina
  2024-05-06 14:48 ` [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int pan2.li
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: pan2.li @ 2024-04-29  7:53 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, tamar.christina, richard.guenther,
	hongtao.liu, Pan Li

From: Pan Li <pan2.li@intel.com>

Update in v3:
* Rebase upstream for conflict.

Update in v2:
* Fix one failure for x86 bootstrap.

Original log:

This patch adds the middle-end representation for saturating add,
i.e. the result of the add is set to the type's maximum on overflow.
It matches patterns similar to the one below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Take uint8_t as example, we will have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.
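
As a quick sanity check of the uint8_t values listed above, here is a
minimal sketch of the branchless form in plain C.  The names
sat_add_u8 and sat_add_u8_check are hypothetical and are not part of
the patch; the explicit casts only make the integer promotions visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): branchless unsigned
   saturating add for uint8_t, written in the
   (x + y) | (-(TYPE)((TYPE)(x + y) < x)) form.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  /* x + y is computed in int, so truncate to get the wrapped sum.  */
  uint8_t sum = (uint8_t) (x + y);

  /* (sum < x) is 1 exactly when the add wrapped; negating it and
     truncating gives an all-ones mask (0xff) or zero.  */
  uint8_t mask = (uint8_t) -(uint8_t) (sum < x);

  return (uint8_t) (sum | mask);
}

/* Hypothetical self-check covering the four cases listed above plus
   two non-overflowing ones.  */
static void
sat_add_u8_check (void)
{
  assert (sat_add_u8 (1, 254) == 255);
  assert (sat_add_u8 (1, 255) == 255);
  assert (sat_add_u8 (2, 255) == 255);
  assert (sat_add_u8 (255, 255) == 255);
  assert (sat_add_u8 (0, 0) == 0);
  assert (sat_add_u8 (1, 1) == 2);
}
```

Calling sat_add_u8_check () from a test main is enough to exercise it.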

The patch also implements SAT_ADD in the riscv backend as a sample
for both scalar and vector.  Given the example below:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;    succ:       EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;    succ:       EXIT
}
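
The "before" dump above computes the wrapped sum and an overflow flag
via .ADD_OVERFLOW and then ORs the sum with the negated flag.  The
sketch below mirrors that shape in C with the GCC/Clang builtin
__builtin_add_overflow and checks it against the branchless source
form.  The function names are hypothetical and only for illustration;
this is not how the patch itself is tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The source form from the example above.  */
static uint64_t
sat_add_u64_branchless (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

/* Hypothetical mirror of the "before" dump: sum plays the role of
   REALPART_EXPR and ovf the role of IMAGPART_EXPR != 0.  */
static uint64_t
sat_add_u64_overflow (uint64_t x, uint64_t y)
{
  uint64_t sum;
  bool ovf = __builtin_add_overflow (x, y, &sum);

  return sum | -(uint64_t) ovf;
}

/* Hypothetical self-check: both forms agree on a few edge values.  */
static void
sat_add_u64_check (void)
{
  const uint64_t v[] = { 0, 1, 2, UINT64_MAX - 1, UINT64_MAX };
  unsigned i, k;

  for (i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    for (k = 0; k < sizeof (v) / sizeof (v[0]); k++)
      assert (sat_add_u64_branchless (v[i], v[k])
              == sat_add_u64_overflow (v[i], v[k]));
}
```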

For vectorization, we leverage the existing vect pattern recog to find
the same pattern as in the scalar case and let the vectorizer perform
the rest via the standard name usadd<mode>3 in vector mode.
The riscv vector backend has the insn "Vector Single-Width Saturating
Add and Subtract", which can be leveraged when expanding usadd<mode>3
in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv       v0,v0,v1
  vmerge.vim      v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}

vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vsaddu.vv       v1,v1,v2
  vse64.v v1,0(a0)
  ...
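
The "before" vector dump above expresses each lane as a select
(mask ? all-ones : sum), while the source loop uses the branchless OR
form.  As a rough illustration that the two are the same function per
element, the sketch below runs the loop from this message against a
scalar select-style reference.  sat_add_u64_select and
vec_sat_add_u64_check are hypothetical names, and const was added to
the input pointers for the sketch only.

```c
#include <assert.h>
#include <stdint.h>

/* The loop from the example above (inputs made const here).  */
static void
vec_sat_add_u64 (uint64_t *out, const uint64_t *x, const uint64_t *y,
                 unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

/* Hypothetical per-element reference in select form, matching the
   x > sum ? -1 : sum shape of the "before" dump.  */
static uint64_t
sat_add_u64_select (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;

  return x > sum ? UINT64_MAX : sum;
}

/* Hypothetical self-check: every element of the loop result matches
   the select-form reference.  */
static void
vec_sat_add_u64_check (void)
{
  const uint64_t x[] = { 0, 1, 2, UINT64_MAX, UINT64_MAX - 1, 12345 };
  const uint64_t y[] = { 0, UINT64_MAX, UINT64_MAX, UINT64_MAX, 1, 67890 };
  uint64_t out[6];
  unsigned i;

  vec_sat_add_u64 (out, x, y, 6);

  for (i = 0; i < 6; i++)
    assert (out[i] == sat_add_u64_select (x[i], y[i]));
}
```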

To limit the patch size for review, only the unsigned version,
usadd<mode>3, is involved here.  The signed version will be covered
in follow-up patch(es).

The test suites below pass with this patch.
* The riscv full regression tests.
* The aarch64 full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* config/riscv/autovec.md (usadd<mode>3): New pattern expand
	for unsigned SAT_ADD vector.
	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
	decl to expand usadd<mode>3 pattern.
	(expand_vec_usadd): Ditto but for vector.
	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
	emit the vsaddu insn.
	(expand_vec_usadd): New func impl to expand usadd<mode>3 for
	vector.
	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
	to expand usadd<mode>3 for scalar.
	* config/riscv/riscv.md (usadd<mode>3): New pattern expand
	for unsigned SAT_ADD scalar.
	* config/riscv/vector.md: Allow VLS mode for vsaddu.
	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
	* match.pd: Add unsigned SAT_ADD match and simplify.
	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
	* tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
	to build the IFN_SAT_ADD gimple call.
	(vect_recog_sat_add_pattern): New func impl to recog the pattern
	for unsigned SAT_ADD.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c: New test.
	* gcc.target/riscv/sat_arith.h: New test.
	* gcc.target/riscv/sat_u_add-1.c: New test.
	* gcc.target/riscv/sat_u_add-2.c: New test.
	* gcc.target/riscv/sat_u_add-3.c: New test.
	* gcc.target/riscv/sat_u_add-4.c: New test.
	* gcc.target/riscv/sat_u_add-run-1.c: New test.
	* gcc.target/riscv/sat_u_add-run-10.c: New test.
	* gcc.target/riscv/sat_u_add-run-11.c: New test.
	* gcc.target/riscv/sat_u_add-run-12.c: New test.
	* gcc.target/riscv/sat_u_add-run-2.c: New test.
	* gcc.target/riscv/sat_u_add-run-3.c: New test.
	* gcc.target/riscv/sat_u_add-run-4.c: New test.
	* gcc.target/riscv/sat_u_add-run-5.c: New test.
	* gcc.target/riscv/sat_u_add-run-6.c: New test.
	* gcc.target/riscv/sat_u_add-run-7.c: New test.
	* gcc.target/riscv/sat_u_add-run-8.c: New test.
	* gcc.target/riscv/sat_u_add-run-9.c: New test.
	* gcc.target/riscv/scalar_sat_binary.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/autovec.md                   | 17 ++++
 gcc/config/riscv/riscv-protos.h               |  2 +
 gcc/config/riscv/riscv-v.cc                   | 16 ++++
 gcc/config/riscv/riscv.cc                     | 47 +++++++++++
 gcc/config/riscv/riscv.md                     | 11 +++
 gcc/config/riscv/vector.md                    | 12 +--
 gcc/internal-fn.cc                            |  1 +
 gcc/internal-fn.def                           |  3 +
 gcc/match.pd                                  | 64 +++++++++++++++
 gcc/optabs.def                                |  4 +-
 .../riscv/rvv/autovec/binop/vec_sat_binary.h  | 33 ++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-1.c | 41 ++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-2.c | 44 +++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-3.c | 44 +++++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-4.c | 44 +++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-1.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-10.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-11.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-12.c  | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-2.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-3.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-4.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-5.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-6.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-7.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-8.c   | 75 ++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-9.c   | 75 ++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_arith.h    | 79 +++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c  | 44 +++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c  | 50 ++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c  | 41 ++++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c  | 38 +++++++++
 .../gcc.target/riscv/sat_u_add-run-1.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-10.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-11.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-12.c       | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-2.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-3.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-4.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-5.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-6.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-7.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-8.c        | 25 ++++++
 .../gcc.target/riscv/sat_u_add-run-9.c        | 25 ++++++
 .../gcc.target/riscv/scalar_sat_binary.h      | 27 +++++++
 gcc/tree-vect-patterns.cc                     | 62 +++++++++++++++
 46 files changed, 1916 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..7ceeb8d64f6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,20 @@ (define_expand "rawmemchr<ANYI:mode>"
     DONE;
   }
 )
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Saturation ALU.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - add
+;; -------------------------------------------------------------------------
+(define_expand "usadd<mode>3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_usadd (operands[0], operands[1], operands[2], <MODE>mode);
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5d46a29d8b7..e211ffa45ce 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -133,6 +133,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
 extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
+extern void riscv_expand_usadd (rtx, rtx, rtx);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
@@ -621,6 +622,7 @@ void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lround (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_usadd (rtx, rtx, rtx, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
 			  bool, void (*)(rtx *, rtx), enum avl_type);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 814c5febabe..eadbc63431b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4635,6 +4635,16 @@ emit_vec_cvt_x_f_rtz (rtx op_dest, rtx op_src, rtx mask,
     }
 }
 
+static void
+emit_vec_saddu (rtx op_dest, rtx op_1, rtx op_2, insn_type type,
+		machine_mode vec_mode)
+{
+  rtx ops[] = {op_dest, op_1, op_2};
+  insn_code icode = code_for_pred (US_PLUS, vec_mode);
+
+  emit_vlmax_insn (icode, type, ops);
+}
+
 void
 expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 		 machine_mode vec_int_mode)
@@ -4862,6 +4872,12 @@ expand_vec_lfloor (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 				vec_int_mode);
 }
 
+void
+expand_vec_usadd (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  emit_vec_saddu (op_0, op_1, op_2, BINARY_OP, vec_mode);
+}
+
 /* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that libgcc uses as
    well.  */
 void
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0519e0679ed..e52521cd59f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11025,6 +11025,53 @@ riscv_get_raw_result_mode (int regno)
   return default_get_reg_raw_mode (regno);
 }
 
+/* Implements the unsigned saturation add standard name usadd for int mode.  */
+
+void
+riscv_expand_usadd (rtx dest, rtx x, rtx y)
+{
+  machine_mode mode = GET_MODE (dest);
+  rtx xmode_sum = gen_reg_rtx (Xmode);
+  rtx xmode_lt = gen_reg_rtx (Xmode);
+  rtx xmode_x = gen_lowpart (Xmode, x);
+  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_dest = gen_reg_rtx (Xmode);
+
+  /* Step-1: sum = x + y  */
+  if (mode == SImode && mode != Xmode)
+    { /* Take addw to avoid the sum truncate.  */
+      rtx simode_sum = gen_reg_rtx (SImode);
+      riscv_emit_binary (PLUS, simode_sum, x, y);
+      emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));
+    }
+  else
+    riscv_emit_binary (PLUS, xmode_sum, xmode_x, xmode_y);
+
+  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
+  if (mode == HImode || mode == QImode)
+    {
+      int shift_bits = GET_MODE_BITSIZE (Xmode)
+	- GET_MODE_BITSIZE (mode).to_constant ();
+
+      gcc_assert (shift_bits > 0);
+
+      riscv_emit_binary (ASHIFT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+      riscv_emit_binary (LSHIFTRT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+    }
+
+  /* Step-2: lt = sum < x  */
+  riscv_emit_binary (LTU, xmode_lt, xmode_sum, xmode_x);
+
+  /* Step-3: lt = -lt  */
+  riscv_emit_unary (NEG, xmode_lt, xmode_lt);
+
+  /* Step-4: xmode_dest = sum | lt  */
+  riscv_emit_binary (IOR, xmode_dest, xmode_lt, xmode_sum);
+
+  /* Step-5: dest = xmode_dest */
+  emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 455715ab2f7..34ef10c3cf0 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3809,6 +3809,17 @@ (define_insn "*large_load_address"
   [(set_attr "type" "load")
    (set (attr "length") (const_int 8))])
 
+(define_expand "usadd<mode>3"
+  [(match_operand:ANYI 0 "register_operand")
+   (match_operand:ANYI 1 "register_operand")
+   (match_operand:ANYI 2 "register_operand")]
+  ""
+  {
+    riscv_expand_usadd (operands[0], operands[1], operands[2]);
+    DONE;
+  }
+)
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 228d0f9a766..f8ed61f4a13 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -4062,8 +4062,8 @@ (define_insn "@pred_trunc<mode>"
 
 ;; Saturating Add and Subtract
 (define_insn "@pred_<optab><mode>"
-  [(set (match_operand:VI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
-	(if_then_else:VI
+  [(set (match_operand:V_VLSI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
+	(if_then_else:V_VLSI
 	  (unspec:<VM>
 	    [(match_operand:<VM> 1 "vector_mask_operand" " vm, vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
 	     (match_operand 5 "vector_length_operand"    " rK, rK, rK, rK, rK, rK, rK, rK")
@@ -4072,10 +4072,10 @@ (define_insn "@pred_<optab><mode>"
 	     (match_operand 8 "const_int_operand"        "  i,  i,  i,  i,  i,  i,  i,  i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-	  (any_sat_int_binop:VI
-	    (match_operand:VI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
-	    (match_operand:VI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
-	  (match_operand:VI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
+	  (any_sat_int_binop:V_VLSI
+	    (match_operand:V_VLSI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
+	    (match_operand:V_VLSI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
+	  (match_operand:V_VLSI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
   "TARGET_VECTOR"
   "@
    v<insn>.vv\t%0,%3,%4%p1
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 2c764441cde..1104bb03b41 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4200,6 +4200,7 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_SAT_ADD:
     case IFN_VEC_WIDEN_PLUS:
     case IFN_VEC_WIDEN_PLUS_LO:
     case IFN_VEC_WIDEN_PLUS_HI:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 848bb9dbff3..47326b7033c 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
 			      smulhrs, umulhrs, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
+			      ssadd, usadd, binary)
+
 DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
 DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
 DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index d401e7503e6..0b0298df829 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        || POINTER_TYPE_P (itype))
       && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
 
+/* Unsigned Saturation Add */
+(match (usadd_left_part_1 @0 @1)
+ (plus:c @0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_1 @0 @1)
+ (negate (convert (lt (plus:c @0 @1) @0)))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part_2 @0 @1)
+ (negate (convert (gt @0 (plus:c @0 @1))))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+/* Unsigned saturation add. Case 1 (branchless):
+   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
+   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
+(simplify
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_1 @0 @1))
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+(simplify
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_2 @0 @1))
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+
+/* Unsigned saturation add. Case 2 (branch):
+   SAT_U_ADD = (X + Y) >= X ? (X + Y) : -1 or
+   SAT_U_ADD = X <= (X + Y) ? (X + Y) : -1.  */
+(simplify
+ (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+(simplify
+ (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
+ (if (optimize) (IFN_SAT_ADD @0 @1)))
+
+/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_1 @0 @1))
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c
+  (usadd_left_part_1 @0 @1)
+  (usadd_right_part_2 @0 @1))
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
+ (if (optimize)))
+(match (unsigned_integer_sat_add @0 @1)
+ (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
+ (if (optimize)))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..3f2cb46aff8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
 OPTAB_NX(add_optab, "add$Q$a3")
 OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
 OPTAB_VX(addv_optab, "add$F$a3")
-OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
-OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
+OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
+OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
 OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(sub_optab, "sub$F$a3")
 OPTAB_NX(sub_optab, "sub$Q$a3")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
new file mode 100644
index 00000000000..0976ae97830
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
@@ -0,0 +1,33 @@
+#ifndef HAVE_DEFINED_VEC_SAT_BINARY
+#define HAVE_DEFINED_VEC_SAT_BINARY
+
+/* To leverage this header file for a run test, you need to:
+   1. define T as the type, for example uint8_t,
+   2. define N as the test array size, for example 16.
+   3. define RUN_VEC_SAT_BINARY as run function.
+   4. prepare the test_data for test cases.
+ */
+
+int
+main ()
+{
+  unsigned i, k;
+  T out[N];
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      T *op_1 = test_data[i][0];
+      T *op_2 = test_data[i][1];
+      T *expect = test_data[i][2];
+
+      RUN_VEC_SAT_BINARY (T, out, op_1, op_2, N);
+
+      for (k = 0; k < N; k++)
+	if (out[k] != expect[k])
+	  __builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
new file mode 100644
index 00000000000..4fb8b233ee9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint8_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint8_t)
+
+/*
+** vec_sat_u_add_uint8_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint8_t)
+
+/*
+** vec_sat_u_add_uint8_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint8_t, 0xffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
new file mode 100644
index 00000000000..10c112b77b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint16_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint16_t)
+
+/*
+** vec_sat_u_add_uint16_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint16_t)
+
+/*
+** vec_sat_u_add_uint16_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint16_t, 0xffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
new file mode 100644
index 00000000000..281036ea0ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint32_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint32_t)
+
+/*
+** vec_sat_u_add_uint32_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint32_t)
+
+/*
+** vec_sat_u_add_uint32_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint32_t, 0xffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
new file mode 100644
index 00000000000..f392533f114
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint64_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint64_t)
+
+/*
+** vec_sat_u_add_uint64_t_fmt_2:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_2(uint64_t)
+
+/*
+** vec_sat_u_add_uint64_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint64_t, 0xffffffffffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 12 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
new file mode 100644
index 00000000000..1dcb333f687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
new file mode 100644
index 00000000000..5a0e73303cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
new file mode 100644
index 00000000000..b3efc9243e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
new file mode 100644
index 00000000000..f478c244ff4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffffffffffffffffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
new file mode 100644
index 00000000000..dbf01ac863d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
new file mode 100644
index 00000000000..20ad2736403
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
new file mode 100644
index 00000000000..2f31edc527e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
new file mode 100644
index 00000000000..4201b31eb3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-5.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
new file mode 100644
index 00000000000..35ec9ea3455
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-6.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
new file mode 100644
index 00000000000..8b1abdb4ba8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-7.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
new file mode 100644
index 00000000000..8c72b567590
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-8.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_2
+
+DEF_VEC_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
new file mode 100644
index 00000000000..f454f3997ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_3
+
+DEF_VEC_SAT_U_ADD_FMT_3(T, 0xffu)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
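
Note on the run tests above: every test_data group pairs two operand rows
with an expected row, and the last group walks the boundary cases (for
uint8_t, 9 + 9 stays 18 while 254 + 255 clamps to 255).  The sketch below
is not part of the patch; it is a minimal standalone check, assuming
uint8_t only, that the three source forms used by sat_arith.h agree with a
widening reference.  The names sat_add_u8_ref and count_mismatches_u8 are
hypothetical helpers introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics: widen, add, clamp to the type maximum.
   Hypothetical helper, not taken from the patch.  */
static uint8_t sat_add_u8_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + (unsigned) y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Form 1: branchless, OR the wrapped sum with an all-ones mask on overflow.  */
static uint8_t sat_add_u8_fmt_1 (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t)((uint8_t)(x + y) < x));
}

/* Form 2: select -1 (all ones after conversion) when the sum wrapped.  */
static uint8_t sat_add_u8_fmt_2 (uint8_t x, uint8_t y)
{
  return (uint8_t)(x + y) >= x ? (x + y) : -1;
}

/* Form 3: same as form 2 with an explicit maximum constant.  */
static uint8_t sat_add_u8_fmt_3 (uint8_t x, uint8_t y)
{
  return (uint8_t)(x + y) >= x ? (x + y) : 0xffu;
}

/* Exhaustively compare all three forms against the reference and
   return the number of disagreements (0 means they all agree).  */
int count_mismatches_u8 (void)
{
  int bad = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
	uint8_t want = sat_add_u8_ref ((uint8_t) x, (uint8_t) y);
	bad += sat_add_u8_fmt_1 ((uint8_t) x, (uint8_t) y) != want;
	bad += sat_add_u8_fmt_2 ((uint8_t) x, (uint8_t) y) != want;
	bad += sat_add_u8_fmt_3 ((uint8_t) x, (uint8_t) y) != want;
      }
  return bad;
}
```

Calling count_mismatches_u8 () from a small main and checking that it
returns 0 covers all 65536 operand pairs.  The wider types cannot be
enumerated this way, which is presumably why the run tests rely on
hand-picked rows instead.
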
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
new file mode 100644
index 00000000000..f233c74acfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -0,0 +1,79 @@
+#ifndef HAVE_SAT_ARITH
+#define HAVE_SAT_ARITH
+
+#include <stdint-gcc.h>
+
+#define DEF_SAT_U_ADD_FMT_1(T)             \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_1 (T x, T y)           \
+{                                          \
+  return (x + y) | (-(T)((T)(x + y) < x)); \
+}
+
+#define DEF_SAT_U_ADD_FMT_2(T)             \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_2 (T x, T y)           \
+{                                          \
+  return (T)(x + y) >= x ? (x + y) : -1;   \
+}
+
+#define DEF_SAT_U_ADD_FMT_3(T, MAX)        \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_3 (T x, T y)           \
+{                                          \
+  return (T)(x + y) >= x ? (x + y) : MAX;  \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_1(T)                                   \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (x + y) | (-(T)((T)(x + y) < x));                     \
+    }                                                                \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_2(T)                                   \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_2 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (T)(x + y) >= x ? (x + y) : -1;                       \
+    }                                                                \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_3(T, MAX)                              \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (T)(x + y) >= x ? (x + y) : MAX;                      \
+    }                                                                \
+}
+
+#define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
+#define RUN_SAT_U_ADD_FMT_2(T, x, y) sat_u_add_##T##_fmt_2(x, y)
+#define RUN_SAT_U_ADD_FMT_3(T, x, y) sat_u_add_##T##_fmt_3(x, y)
+
+#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
+
+#define RUN_VEC_SAT_U_ADD_FMT_2(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_2(out, op_1, op_2, N)
+
+#define RUN_VEC_SAT_U_ADD_FMT_3(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_3(out, op_1, op_2, N)
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
new file mode 100644
index 00000000000..b348d93f938
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint8_t)
+
+/*
+** sat_u_add_uint8_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint8_t)
+
+/*
+** sat_u_add_uint8_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint8_t, 0xffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
new file mode 100644
index 00000000000..df54b984110
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint16_t)
+
+/*
+** sat_u_add_uint16_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint16_t)
+
+/*
+** sat_u_add_uint16_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint16_t, 0xffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
new file mode 100644
index 00000000000..6ff2e6ac52b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint32_t_fmt_1:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint32_t)
+
+/*
+** sat_u_add_uint32_t_fmt_2:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint32_t)
+
+/*
+** sat_u_add_uint32_t_fmt_3:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint32_t, 0xffffffffu)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
new file mode 100644
index 00000000000..1585f9a231f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint64_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint64_t)
+
+/*
+** sat_u_add_uint64_t_fmt_2:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_2(uint64_t)
+
+/*
+** sat_u_add_uint64_t_fmt_3:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_3(uint64_t, 0xffffffffffffffff)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 6 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
new file mode 100644
index 00000000000..f1972490006
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
new file mode 100644
index 00000000000..0b675666dc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffu)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
new file mode 100644
index 00000000000..ac9809e47c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffffffu)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
new file mode 100644
index 00000000000..110e9b14d7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffffffffffffffffu)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
new file mode 100644
index 00000000000..cb3879d0cde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
new file mode 100644
index 00000000000..c9a6080ca3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
new file mode 100644
index 00000000000..c19b7e22387
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
new file mode 100644
index 00000000000..508531c09d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-5.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
new file mode 100644
index 00000000000..99b5c3a39f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-6.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
new file mode 100644
index 00000000000..13f59548935
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-7.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
new file mode 100644
index 00000000000..cdbea7b1b2c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-8.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_2
+
+DEF_SAT_U_ADD_FMT_2(T)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
new file mode 100644
index 00000000000..04a857aed58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_3
+
+DEF_SAT_U_ADD_FMT_3(T, 0xffu)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
new file mode 100644
index 00000000000..cbb2d750107
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
@@ -0,0 +1,27 @@
+#ifndef HAVE_DEFINED_SCALAR_SAT_BINARY
+#define HAVE_DEFINED_SCALAR_SAT_BINARY
+
+/* To use this header file for a run test, you need to:
+   1. define T as the type, for example uint8_t,
+   2. define RUN_SAT_BINARY as the run function,
+   3. prepare test_data for the test cases.
+ */
+
+int
+main ()
+{
+  unsigned i;
+  T *d;
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      d = test_data[i];
+
+      if (RUN_SAT_BINARY (T, d[0], d[1]) != d[2])
+	__builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 87c2acff386..77924cf10f8 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4487,6 +4487,67 @@ vect_recog_mult_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+static gimple *
+vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
+			 tree op_0, tree op_1)
+{
+  tree itype = TREE_TYPE (op_0);
+  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
+
+  if (vtype == NULL_TREE)
+    return NULL;
+
+  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
+    return NULL;
+
+  *type_out = vtype;
+
+  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
+  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
+  gimple_call_set_nothrow (call, /* nothrow_p */ true);
+  gimple_set_location (call, gimple_location (last_stmt));
+
+  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
+
+  return call;
+}
+
+/*
+ * Try to detect the saturation add pattern (SAT_ADD), i.e. the gimple below:
+ *   _7 = _4 + _6;
+ *   _8 = _4 > _7;
+ *   _9 = (long unsigned int) _8;
+ *   _10 = -_9;
+ *   _12 = _7 | _10;
+ *
+ * And then simplified to
+ *   _12 = .SAT_ADD (_4, _6);
+ */
+extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+
+static gimple *
+vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
+			    tree *type_out)
+{
+  gimple *last_stmt = stmt_vinfo->stmt;
+
+  if (!is_gimple_assign (last_stmt))
+    return NULL;
+
+  tree res_ops[2];
+  tree lhs = gimple_assign_lhs (last_stmt);
+
+  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
+    {
+      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
+					      res_ops[0], res_ops[1]);
+      if (call)
+	return call;
+    }
+
+  return NULL;
+}
+
 /* Detect a signed division by a constant that wouldn't be
    otherwise vectorized:
 
@@ -6987,6 +7048,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
+  { vect_recog_sat_add_pattern, "sat_add" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
   { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
-- 
2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
  2024-04-29  7:53 ` [PATCH v3] " pan2.li
@ 2024-05-01 17:06   ` Tamar Christina
  2024-05-02  3:10     ` Li, Pan2
  0 siblings, 1 reply; 21+ messages in thread
From: Tamar Christina @ 2024-05-01 17:06 UTC (permalink / raw)
  To: pan2.li, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, hongtao.liu

Hi,

> From: Pan Li <pan2.li@intel.com>
> 
> Update in v3:
> * Rebase upstream for conflict.
> 
> Update in v2:
> * Fix one failure for x86 bootstrap.
> 
> Original log:
> 
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
> 
> The patch also implement the SAT_ADD in the riscv backend as
> the sample for both the scalar and vector.  Given below example:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;    succ:       EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;    succ:       EXIT
> }
> 
> For vectorize, we leverage the existing vect pattern recog to find
> the pattern similar to scalar and let the vectorizer to perform
> the rest part for standard name usadd<mode>3 in vector mode.
> The riscv vector backend have insn "Vector Single-Width Saturating
> Add and Subtract" which can be leveraged when expand the usadd<mode>3
> in vector mode.  For example:
> 
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   unsigned i;
> 
>   for (i = 0; i < n; i++)
>     out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> }
> 
> Before this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
>   ivtmp_58 = _80 * 8;
>   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
>   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
>   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
>   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
>   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
>   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
>   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
>   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
>   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
>   ivtmp_79 = ivtmp_78 - _80;
>   ...
> }
> 
> vec_sat_add_u64:
>   ...
>   vsetvli a5,a3,e64,m1,ta,ma
>   vle64.v v0,0(a1)
>   vle64.v v1,0(a2)
>   slli    a4,a5,3
>   sub     a3,a3,a5
>   add     a1,a1,a4
>   add     a2,a2,a4
>   vadd.vv v1,v0,v1
>   vmsgtu.vv       v0,v0,v1
>   vmerge.vim      v1,v1,-1,v0
>   vse64.v v1,0(a0)
>   ...
> 
> After this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
>   ivtmp_46 = _62 * 8;
>   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
>   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
>   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
>   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
>   ...
> }
> 
> vec_sat_add_u64:
>   ...
>   vsetvli a5,a3,e64,m1,ta,ma
>   vle64.v v1,0(a1)
>   vle64.v v2,0(a2)
>   slli    a4,a5,3
>   sub     a3,a3,a5
>   add     a1,a1,a4
>   add     a2,a2,a4
>   vsaddu.vv       v1,v1,v2
>   vse64.v v1,0(a0)
>   ...
> 
> To limit the patch size for review, only unsigned version of
> usadd<mode>3 are involved here. The signed version will be covered
> in the underlying patch(es).
> 
> The below test suites are passed for this patch.
> * The riscv fully regression tests.
> * The aarch64 fully regression tests.
> * The x86 bootstrap tests.
> * The x86 fully regression tests.
> 
> 	PR target/51492
> 	PR target/112600
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/autovec.md (usadd<mode>3): New pattern expand
> 	for unsigned SAT_ADD vector.
> 	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
> 	decl to expand usadd<mode>3 pattern.
> 	(expand_vec_usadd): Ditto but for vector.
> 	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
> 	emit the vsadd insn.
> 	(expand_vec_usadd): New func impl to expand usadd<mode>3 for
> 	vector.
> 	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
> 	to expand usadd<mode>3 for scalar.
> 	* config/riscv/riscv.md (usadd<mode>3): New pattern expand
> 	for unsigned SAT_ADD scalar.
> 	* config/riscv/vector.md: Allow VLS mode for vsaddu.
> 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
> 	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
> 	* match.pd: Add unsigned SAT_ADD match and simply.
> 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> 	* tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
> 	to build the IFN_SAT_ADD gimple call.
> 	(vect_recog_sat_add_pattern): New func impl to recog the pattern
> 	for unsigned SAT_ADD.
> 

Could you split the generic changes off from the RISCV changes? The RISCV changes need to be reviewed by the backend maintainer.

Could you also split the vectorizer change off from the scalar recognition one? Typically I would structure a change like this as:

1. Create types/structures + scalar recognition
2. Vector recognition code
3. Backend changes

That makes review and bisection easier. I'll focus only on the generic bits.

> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 2c764441cde..1104bb03b41 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4200,6 +4200,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_SAT_ADD:
>      case IFN_VEC_WIDEN_PLUS:
>      case IFN_VEC_WIDEN_PLUS_LO:
>      case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..47326b7033c 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
>  			      smulhrs, umulhrs, binary)
> 
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
> +			      ssadd, usadd, binary)
> +

Is ECF_NOTHROW correct here? At least on most targets, I believe the scalar version
can set flags or throw exceptions when saturation happens.

>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d401e7503e6..0b0298df829 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || POINTER_TYPE_P (itype))
>        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> 

Hmm, I believe Richi mentioned that he wanted the recognition done in isel?

The problem with doing it in match.pd is that it replaces the operations quite
early in the pipeline. Did I miss an email perhaps? The early replacement means we
lose optimizations and things such as range calculations, since e.g. ranger
doesn't know these internal functions.

I think Richi will want this in isel or mult widening, but I'll continue with the match.pd
review just in case.

> +/* Unsigned Saturation Add */
> +(match (usadd_left_part_1 @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_1 @0 @1)
> + (negate (convert (lt (plus:c @0 @1) @0)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part_2 @0 @1)
> + (negate (convert (gt @0 (plus:c @0 @1))))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))

Predicates can be overloaded, so these two can just be usadd_right_part which then...

> +
> +/* Unsigned saturation add. Case 1 (branchless):
> +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> +(simplify
> + (bit_ior:c
> +  (usadd_left_part_1 @0 @1)
> +  (usadd_right_part_1 @0 @1))
> + (if (optimize) (IFN_SAT_ADD @0 @1)))


The optimize checks in the match.pd file are weird, as they seem to check whether we have
optimizations enabled?

We don't typically need to do this.

> +(simplify
> + (bit_ior:c
> +  (usadd_left_part_1 @0 @1)
> +  (usadd_right_part_2 @0 @1))
> + (if (optimize) (IFN_SAT_ADD @0 @1)))
> +

Allows you to collapse rules like these into one line. Similarly for below.

Note that even when moving to gimple-isel you can reuse the match.pd code by
leveraging it to build the predicates for you and calling them from another pass.
See how ctz_table_index is used for example.

Doing this, moving it to gimple-isel.cc should be easy.
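
As a side note on why the four source forms in the patch (two branchless, two branching) can share one predicate: they all compute the same value. A minimal C sketch, assuming uint8_t so the whole input space can be checked exhaustively; the function names are made up for illustration and are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Case 1 (branchless): (X + Y) | -((X + Y) < X).  */
uint8_t sat_add_u8_lt (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return (uint8_t) (sum | -(uint8_t) (sum < x));
}

/* Case 1 (branchless), operands swapped: (X + Y) | -(X > (X + Y)).  */
uint8_t sat_add_u8_gt (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return (uint8_t) (sum | -(uint8_t) (x > sum));
}

/* Case 2 (branch): (X + Y) >= X ? (X + Y) : -1.  */
uint8_t sat_add_u8_ge (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum >= x ? sum : (uint8_t) -1;
}

/* Case 2 (branch), operands swapped: X <= (X + Y) ? (X + Y) : -1.  */
uint8_t sat_add_u8_le (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return x <= sum ? sum : (uint8_t) -1;
}

/* Reference: add in a wider type and clamp to UINT8_MAX.  */
uint8_t sat_add_u8_ref (uint8_t x, uint8_t y)
{
  unsigned wide = (unsigned) x + y;
  return wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
}

/* Assert that every form matches the reference on all 65536 input pairs.  */
void check_sat_add_forms (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
	uint8_t want = sat_add_u8_ref ((uint8_t) x, (uint8_t) y);
	assert (sat_add_u8_lt ((uint8_t) x, (uint8_t) y) == want);
	assert (sat_add_u8_gt ((uint8_t) x, (uint8_t) y) == want);
	assert (sat_add_u8_ge ((uint8_t) x, (uint8_t) y) == want);
	assert (sat_add_u8_le ((uint8_t) x, (uint8_t) y) == want);
      }
}
```

Calling check_sat_add_forms () from a test driver exercises all pairs; the casts back to uint8_t matter because C promotes the operands to int before the add and the negate.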

> +/* Unsigned saturation add. Case 2 (branch):
> +   SAT_U_ADD = (X + Y) >= x ? (X + Y) : -1 or
> +   SAT_U_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> +(simplify
> + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> + (if (optimize) (IFN_SAT_ADD @0 @1)))
> +(simplify
> + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> + (if (optimize) (IFN_SAT_ADD @0 @1)))
> +
> +/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c
> +  (usadd_left_part_1 @0 @1)
> +  (usadd_right_part_1 @0 @1))
> + (if (optimize)))
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c
> +  (usadd_left_part_1 @0 @1)
> +  (usadd_right_part_2 @0 @1))
> + (if (optimize)))
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> + (if (optimize)))
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> + (if (optimize)))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..3f2cb46aff8 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
>  OPTAB_NX(add_optab, "add$Q$a3")
>  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
>  OPTAB_VX(addv_optab, "add$F$a3")
> -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
>  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(sub_optab, "sub$F$a3")
>  OPTAB_NX(sub_optab, "sub$Q$a3")
...
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 87c2acff386..77924cf10f8 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4487,6 +4487,67 @@ vect_recog_mult_pattern (vec_info *vinfo,
>    return pattern_stmt;
>  }
> 
> +static gimple *
> +vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
> +			 tree op_0, tree op_1)
> +{
> +  tree itype = TREE_TYPE (op_0);
> +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +
> +  if (vtype == NULL_TREE)
> +    return NULL;
> +
> +  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
> +    return NULL;
> +
> +  *type_out = vtype;
> +
> +  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
> +  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> +  gimple_call_set_nothrow (call, /* nothrow_p */ true);
> +  gimple_set_location (call, gimple_location (last_stmt));
> +
> +  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> +
> +  return call;
> +}

The function has only one caller, so you should just inline it into the pattern.

> +/*
> + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> + *   _7 = _4 + _6;
> + *   _8 = _4 > _7;
> + *   _9 = (long unsigned int) _8;
> + *   _10 = -_9;
> + *   _12 = _7 | _10;
> + *
> + * And then simplied to
> + *   _12 = .SAT_ADD (_4, _6);
> + */
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +static gimple *
> +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> +			    tree *type_out)
> +{
> +  gimple *last_stmt = stmt_vinfo->stmt;
> +

STMT_VINFO_STMT (stmt_vinfo);

> +  if (!is_gimple_assign (last_stmt))
> +    return NULL;
> +
> +  tree res_ops[2];
> +  tree lhs = gimple_assign_lhs (last_stmt);

Once you inline vect_sat_add_build_call you can do the check for
vtype here, which is the cheaper check so perform it early.

Otherwise this looks really good!

Thanks for working on it,

Tamar

> +
> +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> +    {
> +      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
> +					      res_ops[0], res_ops[1]);
> +      if (call)
> +	return call;
> +    }
> +
> +  return NULL;
> +}
> +
>  /* Detect a signed division by a constant that wouldn't be
>     otherwise vectorized:
> 
> @@ -6987,6 +7048,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>    { vect_recog_divmod_pattern, "divmod" },
>    { vect_recog_mult_pattern, "mult" },
> +  { vect_recog_sat_add_pattern, "sat_add" },
>    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
>    { vect_recog_gcond_pattern, "gcond" },
>    { vect_recog_bool_pattern, "bool" },
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
  2024-05-01 17:06   ` Tamar Christina
@ 2024-05-02  3:10     ` Li, Pan2
  2024-05-02  3:25       ` Tamar Christina
  0 siblings, 1 reply; 21+ messages in thread
From: Li, Pan2 @ 2024-05-02  3:10 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

Thanks Tamar

> Could you also split off the vectorizer change from scalar recog one? Typically I would structure a change like this as:

> 1. create types/structures + scalar recogn
> 2. Vector recog code
> 3. Backend changes

Sure thing, will rearrange the patch like this.

> Is ECF_NOTHROW correct here? At least on most targets I believe the scalar version
> can set flags/throw exceptions if the saturation happens?

I see, will remove that.

> Hmm I believe Richi mentioned that he wanted the recognition done in isel?

> The problem with doing it in match.pd is that it replaces the operations quite
> early in the pipeline. Did I miss an email perhaps? The early replacement means we
> lose optimizations and things such as range calculations, since e.g. ranger
> doesn't know these internal functions.

> I think Richi will want this in isel or mult widening, but I'll continue with the match.pd
> review just in case.

If I understand correctly, Richard suggested trying vectorizer patterns first and then possibly isel.
Thus, I haven't tried SAT_ADD in ISEL, as the vectorizer patterns work well for SAT_ADD.
Let's wait for confirmation from Richard. Below are the original words from the previous mail for reference.

>> As I said, use vectorizer patterns and possibly do instruction
>> selection at ISEL/widen_mult time.

> The optimize checks in the match.pd file are weird as it seems to check if we have
> optimizations enabled?

> We don't typically need to do this.

Sure, will remove this.

> The function has only one caller, you should just inline it into the pattern.

Sure thing.

> Once you inline vect_sat_add_build_call you can do the check for
> vtype here, which is the cheaper check so perform it early.

Sure thing.

Thanks again; I will send v4 with all comments addressed, as well as the test results.

Pan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
  2024-05-02  3:10     ` Li, Pan2
@ 2024-05-02  3:25       ` Tamar Christina
  2024-05-02 10:57         ` Li, Pan2
  0 siblings, 1 reply; 21+ messages in thread
From: Tamar Christina @ 2024-05-02  3:25 UTC (permalink / raw)
  To: Li, Pan2, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

> -----Original Message-----
> From: Li, Pan2 <pan2.li@intel.com>
> Sent: Thursday, May 2, 2024 4:11 AM
> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> Thanks Tamar
> 
> > Could you also split off the vectorizer change from scalar recog one? Typically I
> would structure a change like this as:
> 
> > 1. create types/structures + scalar recogn
> > 2. Vector recog code
> > 3. Backend changes
> 
> Sure thing, will rearrange the patch like this.
> 
> > Is ECF_NOTHROW correct here? At least on most targets I believe the scalar
> version
> > can set flags/throw exceptions if the saturation happens?
> 
> I see, will remove that.
> 
> > Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> 
> > The problem with doing it in match.pd is that it replaces the operations quite
> > early in the pipeline. Did I miss an email perhaps? The early replacement means we
> > lose optimizations and things such as range calculations, since e.g. ranger
> > doesn't know these internal functions.
> 
> > I think Richi will want this in isel or mult widening, but I'll continue with the match.pd
> > review just in case.
> 
> If I understand is correct, Richard suggested try vectorizer patterns first and then
> possible isel.
> Thus, I don't have a try for SAT_ADD in ISEL as vectorizer patterns works well for
> SAT_ADD.
> Let's wait the confirmation from Richard. Below are the original words from
> previous mail for reference.
> 

I think the comment he made was this

> > Given we have saturating integer alu like below, could you help to coach me the most reasonable way to represent
> > It in scalar as well as vectorize part? Sorry not familiar with this part and still dig into how it works...
> 
> As in your v2, .SAT_ADD for both sat_uadd and sat_sadd, similar for
> the other cases.
>
> As I said, use vectorizer patterns and possibly do instruction
> selection at ISEL/widen_mult time.

So he was responding to how to do it for the vectorizer and scalar parts.
Remember that the goal is not to introduce new gimple IL that can block other optimizations.
The vectorizer already introduces new IL (various IFNs), but this is fine as we don't track things like ranges for
vector instructions.  So we don't lose any information here.

Now for the scalar, if we do an early replacement like in match.pd we prevent a lot of other optimizations
because they don't know what IFN_SAT_ADD does. gimple-isel runs pretty late, and so at this point we don't
expect many more optimizations to happen, so it's a safe spot to insert more IL with "unknown semantics".
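
To make the range argument concrete, here is a small C sketch of the kind of code where early replacement could hurt. sat_add_u64 is the example from the patch description; sum_of_bytes is a hypothetical caller invented for illustration. Both operands are masked to 8 bits, so the overflow test can never be true. While the statements are still plain arithmetic, a range-aware pass could in principle prove that and leave only the add; once the whole expression is an opaque .SAT_ADD call, a pass that doesn't know the IFN cannot. Whether a given GCC version actually performs this fold is not claimed here.

```c
#include <assert.h>
#include <stdint.h>

/* The scalar form from the patch description.  */
uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

/* Hypothetical caller: both operands are at most 255, so x + y is at
   most 510 and cannot wrap in 64 bits.  The saturating branch is dead.  */
uint64_t sum_of_bytes (uint64_t a, uint64_t b)
{
  uint64_t x = a & 0xff;
  uint64_t y = b & 0xff;

  /* Documents the invariant a range analysis could derive.  */
  assert (x + y >= x);

  return sat_add_u64 (x, y);
}
```

With the bounded inputs the result is always just x + y, e.g. sum_of_bytes (0x1ff, 0x2ff) is 255 + 255.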

Was that your intention Richi?

Thanks,
Tamar

> >> As I said, use vectorizer patterns and possibly do instruction
> >> selection at ISEL/widen_mult time.
> 
> > The optimize checks in the match.pd file are weird as it seems to check if we have
> > optimizations enabled?
> 
> > We don't typically need to do this.
> 
> Sure, will remove this.
> 
> > The function has only one caller, you should just inline it into the pattern.
> 
> Sure thing.
> 
> > Once you inline vect_sat_add_build_call you can do the check for
> > vtype here, which is the cheaper check so perform it early.
> 
> Sure thing.
> 
> Thanks again and will send the v4 with all comments addressed, as well as the test
> results.
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Thursday, May 2, 2024 1:06 AM
> To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> Hi,
> 
> > From: Pan Li <pan2.li@intel.com>
> >
> > Update in v3:
> > * Rebase upstream for conflict.
> >
> > Update in v2:
> > * Fix one failure for x86 bootstrap.
> >
> > Original log:
> >
> > This patch would like to add the middle-end presentation for the
> > saturation add.  Aka set the result of add to the max when overflow.
> > It will take the pattern similar as below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > The patch also implement the SAT_ADD in the riscv backend as
> > the sample for both the scalar and vector.  Given below example:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> > }
> >
> > For vectorize, we leverage the existing vect pattern recog to find
> > the pattern similar to scalar and let the vectorizer to perform
> > the rest part for standard name usadd<mode>3 in vector mode.
> > The riscv vector backend have insn "Vector Single-Width Saturating
> > Add and Subtract" which can be leveraged when expand the usadd<mode>3
> > in vector mode.  For example:
> >
> > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > {
> >   unsigned i;
> >
> >   for (i = 0; i < n; i++)
> >     out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> > }
> >
> > Before this patch:
> > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > {
> >   ...
> >   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
> >   ivtmp_58 = _80 * 8;
> >   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
> >   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
> >   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
> >   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
> >   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, {
> 18446744073709551615,
> > ... }, vect__7.11_66);
> >   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
> >   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
> >   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
> >   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
> >   ivtmp_79 = ivtmp_78 - _80;
> >   ...
> > }
> >
> > vec_sat_add_u64:
> >   ...
> >   vsetvli a5,a3,e64,m1,ta,ma
> >   vle64.v v0,0(a1)
> >   vle64.v v1,0(a2)
> >   slli    a4,a5,3
> >   sub     a3,a3,a5
> >   add     a1,a1,a4
> >   add     a2,a2,a4
> >   vadd.vv v1,v0,v1
> >   vmsgtu.vv       v0,v0,v1
> >   vmerge.vim      v1,v1,-1,v0
> >   vse64.v v1,0(a0)
> >   ...
> >
> > After this patch:
> > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > {
> >   ...
> >   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
> >   ivtmp_46 = _62 * 8;
> >   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
> >   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
> >   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
> >   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
> >   ...
> > }
> >
> > vec_sat_add_u64:
> >   ...
> >   vsetvli a5,a3,e64,m1,ta,ma
> >   vle64.v v1,0(a1)
> >   vle64.v v2,0(a2)
> >   slli    a4,a5,3
> >   sub     a3,a3,a5
> >   add     a1,a1,a4
> >   add     a2,a2,a4
> >   vsaddu.vv       v1,v1,v2
> >   vse64.v v1,0(a0)
> >   ...
> >
> > To limit the patch size for review, only unsigned version of
> > usadd<mode>3 are involved here. The signed version will be covered
> > in the underlying patch(es).
> >
> > The below test suites are passed for this patch.
> > * The riscv fully regression tests.
> > * The aarch64 fully regression tests.
> > * The x86 bootstrap tests.
> > * The x86 fully regression tests.
> >
> > 	PR target/51492
> > 	PR target/112600
> >
> > gcc/ChangeLog:
> >
> > 	* config/riscv/autovec.md (usadd<mode>3): New pattern expand
> > 	for unsigned SAT_ADD vector.
> > 	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
> > 	decl to expand usadd<mode>3 pattern.
> > 	(expand_vec_usadd): Ditto but for vector.
> > 	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
> > 	emit the vsadd insn.
> > 	(expand_vec_usadd): New func impl to expand usadd<mode>3 for
> > 	vector.
> > 	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
> > 	to expand usadd<mode>3 for scalar.
> > 	* config/riscv/riscv.md (usadd<mode>3): New pattern expand
> > 	for unsigned SAT_ADD scalar.
> > 	* config/riscv/vector.md: Allow VLS mode for vsaddu.
> > 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
> > 	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
> > 	* match.pd: Add unsigned SAT_ADD match and simply.
> > 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> > 	* tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
> > 	to build the IFN_SAT_ADD gimple call.
> > 	(vect_recog_sat_add_pattern): New func impl to recog the pattern
> > 	for unsigned SAT_ADD.
> >
> 
> Could you split the generic changes off from the RISCV changes? The RISCV
> changes need to be reviewed by the backend maintainer.
> 
> Could you also split off the vectorizer change from scalar recog one? Typically I
> would structure a change like this as:
> 
> 1. create types/structures + scalar recogn
> 2. Vector recog code
> 3. Backend changes
> 
> Which makes review and bisect easier. I'll only focus on the generic bits.
> 
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 2c764441cde..1104bb03b41 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4200,6 +4200,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_UBSAN_CHECK_MUL:
> >      case IFN_ADD_OVERFLOW:
> >      case IFN_MUL_OVERFLOW:
> > +    case IFN_SAT_ADD:
> >      case IFN_VEC_WIDEN_PLUS:
> >      case IFN_VEC_WIDEN_PLUS_LO:
> >      case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..47326b7033c 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
> >  			      smulhrs, umulhrs, binary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
> > +			      ssadd, usadd, binary)
> > +
> 
> Is ECF_NOTHROW correct here? At least on most targets I believe the scalar version
> can set flags/throw exceptions if the saturation happens?
> 
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index d401e7503e6..0b0298df829 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         || POINTER_TYPE_P (itype))
> >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> >
> 
> Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> 
> The problem with doing it in match.pd is that it replaces the operations quite
> early in the pipeline. Did I miss an email perhaps? The early replacement means we
> lose optimizations and things such as range calculations etc, since e.g. ranger
> doesn't know these internal functions.
> 
> I think Richi will want this in isel or mult widening but I'll continue with match.pd
> review just in case.
> 
> > +/* Unsigned Saturation Add */
> > +(match (usadd_left_part_1 @0 @1)
> > + (plus:c @0 @1)
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (lt (plus:c @0 @1) @0)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_2 @0 @1)
> > + (negate (convert (gt @0 (plus:c @0 @1))))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> 
> Predicates can be overloaded, so these two can just be usadd_right_part which
> then...
> 
> > +
> > +/* Unsigned saturation add. Case 1 (branchless):
> > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > +(simplify
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_1 @0 @1))
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> 
> 
> The optimize checks in the match.pd file are weird as it seems to check if we have
> optimizations enabled?
> 
> We don't typically need to do this.
> 
> > +(simplify
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_2 @0 @1))
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > +
> 
> Allows you to collapse rules like these into one line. Similarly for below.
> 
> Note that even when moving to gimple-isel you can reuse the match.pd code by
> leveraging it to build the predicates for you and call them from another pass.
> See how ctz_table_index is used for example.
> 
> Doing this, moving it to gimple-isel.cc should be easy.
> 
> > +/* Unsigned saturation add. Case 2 (branch):
> > +   SAT_U_ADD = (X + Y) >= x ? (X + Y) : -1 or
> > +   SAT_U_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> > +(simplify
> > + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > +(simplify
> > + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > +
> > +/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_1 @0 @1))
> > + (if (optimize)))
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_2 @0 @1))
> > + (if (optimize)))
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> > + (if (optimize)))
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> > + (if (optimize)))
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..3f2cb46aff8 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> >  OPTAB_NX(add_optab, "add$Q$a3")
> >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> >  OPTAB_VX(addv_optab, "add$F$a3")
> > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
> >  OPTAB_NX(sub_optab, "sub$F$a3")
> >  OPTAB_NX(sub_optab, "sub$Q$a3")
> ...
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 87c2acff386..77924cf10f8 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -4487,6 +4487,67 @@ vect_recog_mult_pattern (vec_info *vinfo,
> >    return pattern_stmt;
> >  }
> >
> > +static gimple *
> > +vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
> > +			 tree op_0, tree op_1)
> > +{
> > +  tree itype = TREE_TYPE (op_0);
> > +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> > +
> > +  if (vtype == NULL_TREE)
> > +    return NULL;
> > +
> > +  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
> > +    return NULL;
> > +
> > +  *type_out = vtype;
> > +
> > +  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
> > +  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> > +  gimple_call_set_nothrow (call, /* nothrow_p */ true);
> > +  gimple_set_location (call, gimple_location (last_stmt));
> > +
> > +  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> > +
> > +  return call;
> > +}
> 
> The function has only one caller, you should just inline it into the pattern.
> 
> > +/*
> > + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> > + *   _7 = _4 + _6;
> > + *   _8 = _4 > _7;
> > + *   _9 = (long unsigned int) _8;
> > + *   _10 = -_9;
> > + *   _12 = _7 | _10;
> > + *
> > + * And then simplied to
> > + *   _12 = .SAT_ADD (_4, _6);
> > + */
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +static gimple *
> > +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> > +			    tree *type_out)
> > +{
> > +  gimple *last_stmt = stmt_vinfo->stmt;
> > +
> 
> STMT_VINFO_STMT (stmt_vinfo);
> 
> > +  if (!is_gimple_assign (last_stmt))
> > +    return NULL;
> > +
> > +  tree res_ops[2];
> > +  tree lhs = gimple_assign_lhs (last_stmt);
> 
> Once you inline vect_sat_add_build_call you can do the check for
> vtype here, which is the cheaper check so perform it early.
> 
> Otherwise this looks really good!
> 
> Thanks for working on it,
> 
> Tamar
> 
> > +
> > +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> > +    {
> > +      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
> > +					      res_ops[0], res_ops[1]);
> > +      if (call)
> > +	return call;
> > +    }
> > +
> > +  return NULL;
> > +}
> > +
> >  /* Detect a signed division by a constant that wouldn't be
> >     otherwise vectorized:
> >
> > @@ -6987,6 +7048,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
> >    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> >    { vect_recog_divmod_pattern, "divmod" },
> >    { vect_recog_mult_pattern, "mult" },
> > +  { vect_recog_sat_add_pattern, "sat_add" },
> >    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> >    { vect_recog_gcond_pattern, "gcond" },
> >    { vect_recog_bool_pattern, "bool" },
> > --
> > 2.34.1


* RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
  2024-05-02  3:25       ` Tamar Christina
@ 2024-05-02 10:57         ` Li, Pan2
  2024-05-02 12:57           ` Tamar Christina
  0 siblings, 1 reply; 21+ messages in thread
From: Li, Pan2 @ 2024-05-02 10:57 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

> So he was responding about how to do it for the vectorizer and scalar parts.
> Remember that the goal is not to introduce new gimple IL that can block other optimizations.
> The vectorizer already introduces new IL (various IFN) but this is fine as we don't track things like ranges for
> vector instructions.  So we don't lose any information here.

> Now for the scalar, if we do an early replacement like in match.pd we prevent a lot of other optimizations
> because they don't know what IFN_SAT_ADD does. gimple-isel runs pretty late, and so at this point we don't
> expect many more optimizations to happen, so it's a safe spot to insert more IL with "unknown semantics".

> Was that your intention Richi?

Thanks Tamar for the clear explanation. Does that mean both the scalar and the vector parts will take the isel approach? If so,
I misunderstood earlier and thought it was only for the vectorizer.

I understand the point that we would like to do the pattern match late, but I have a question here.
Given that the SAT_ADD pattern is fairly complicated, a sub-expression of SAT_ADD may be optimized
by other passes early on, and we can then hardly catch the shape later.

For example, there is a plus expression in SAT_ADD, and an early pass may optimize it to .ADD_OVERFLOW, so
the pattern looks quite different by the time a later pass tries to recognize it.
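
To make the concern concrete, here is a minimal sketch (not part of the patch). The first function is the source form from the patch description. The second is an assumed, hand-written equivalent of the shape shown in the "Before this patch" dump, where __builtin_add_overflow stands in for .ADD_OVERFLOW; the name sat_add_u64_ovf is hypothetical.

```c
#include <stdint.h>

/* Source form from the patch description: the sum is OR-ed with an
   all-ones mask when the addition wrapped around.  */
uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

/* Hypothetical equivalent after the plus has become an overflow check.
   __builtin_add_overflow is used here only as a stand-in for the
   .ADD_OVERFLOW internal function seen in the gimple dump.  */
uint64_t sat_add_u64_ovf (uint64_t x, uint64_t y)
{
  uint64_t sum;
  uint64_t ovf = __builtin_add_overflow (x, y, &sum); /* 0 or 1 */
  return sum | -ovf;  /* -1 is all ones, so the result saturates */
}
```

Both functions compute the same value, but a late matcher written only against the first shape would not see the second one.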

Sorry, I am not sure whether my understanding is correct; feel free to correct me.

Pan

-----Original Message-----
From: Tamar Christina <Tamar.Christina@arm.com> 
Sent: Thursday, May 2, 2024 11:26 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

> -----Original Message-----
> From: Li, Pan2 <pan2.li@intel.com>
> Sent: Thursday, May 2, 2024 4:11 AM
> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> Thanks Tamar
> 
> > Could you also split off the vectorizer change from scalar recog one? Typically I
> would structure a change like this as:
> 
> > 1. create types/structures + scalar recogn
> > 2. Vector recog code
> > 3. Backend changes
> 
> Sure thing, will rearrange the patch like this.
> 
> > Is ECF_NOTHROW correct here? At least on most targets I believe the scalar
> version
> > can set flags/throw exceptions if the saturation happens?
> 
> I see, will remove that.
> 
> > Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> 
> > The problem with doing it in match.pd is that it replaces the operations quite
> > early in the pipeline. Did I miss an email perhaps? The early replacement means we
> > lose optimizations and things such as range calculations etc, since e.g. ranger
> > doesn't know these internal functions.
> 
> > I think Richi will want this in isel or mult widening but I'll continue with match.pd
> > review just in case.
> 
> If I understand correctly, Richard suggested trying vectorizer patterns first and then
> possibly isel.
> Thus, I have not tried SAT_ADD in ISEL, as the vectorizer patterns work well for
> SAT_ADD.
> Let's wait for confirmation from Richard. Below are the original words from the
> previous mail for reference.
> 

I think the comment he made was this

> > Given we have saturating integer alu like below, could you advise on the most reasonable way to represent
> > it in the scalar as well as the vectorized part? Sorry, I am not familiar with this part and am still digging into how it works...
> 
> As in your v2, .SAT_ADD for both sat_uadd and sat_sadd, similar for
> the other cases.
>
> As I said, use vectorizer patterns and possibly do instruction
> selection at ISEL/widen_mult time.

So he was responding about how to do it for the vectorizer and scalar parts.
Remember that the goal is not to introduce new gimple IL that can block other optimizations.
The vectorizer already introduces new IL (various IFN) but this is fine as we don't track things like ranges for
vector instructions.  So we don't lose any information here.

Now for the scalar, if we do an early replacement like in match.pd we prevent a lot of other optimizations
because they don't know what IFN_SAT_ADD does. gimple-isel runs pretty late, and so at this point we don't
expect many more optimizations to happen, so it's a safe spot to insert more IL with "unknown semantics".
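
As a rough illustration of the kind of information at stake, here is a sketch (not taken from the patch; all function names are made up for this example). When both operands are known to be small, the saturating form is just a plain add. A pass that understands the open-coded compare and negate could in principle derive that from value ranges, whereas an opaque internal function call hides it unless every such pass is taught what the call means. Whether a given GCC pass actually performs this fold is not claimed here.

```c
#include <stdint.h>

/* Branchless unsigned saturating add on 8 bits, following the formula
   in the patch description.  The casts undo C integer promotion.  */
static inline uint8_t sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t)(x + y);
  return (uint8_t)(sum | (uint8_t)-(uint8_t)(sum < x));
}

/* Both operands are masked into [0, 127], so the sum is at most 254
   and can never wrap: the saturation check is always false here.  */
uint8_t sat_add_small (uint8_t x, uint8_t y)
{
  return sat_add_u8 ((uint8_t)(x & 0x7f), (uint8_t)(y & 0x7f));
}

/* What a range-aware optimization could reduce sat_add_small to.  */
uint8_t plain_add_small (uint8_t x, uint8_t y)
{
  return (uint8_t)((x & 0x7f) + (y & 0x7f));
}
```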

Was that your intention Richi?

Thanks,
Tamar

> >> As I said, use vectorizer patterns and possibly do instruction
> >> selection at ISEL/widen_mult time.
> 
> > The optimize checks in the match.pd file are weird as it seems to check if we have
> > optimizations enabled?
> 
> > We don't typically need to do this.
> 
> Sure, will remove this.
> 
> > The function has only one caller, you should just inline it into the pattern.
> 
> Sure thing.
> 
> > Once you inline vect_sat_add_build_call you can do the check for
> > vtype here, which is the cheaper check so perform it early.
> 
> Sure thing.
> 
> Thanks again and will send the v4 with all comments addressed, as well as the test
> results.
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Thursday, May 2, 2024 1:06 AM
> To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> Hi,
> 
> > From: Pan Li <pan2.li@intel.com>
> >
> > Update in v3:
> > * Rebase upstream for conflict.
> >
> > Update in v2:
> > * Fix one failure for x86 bootstrap.
> >
> > Original log:
> >
> > This patch would like to add the middle-end presentation for the
> > saturation add.  Aka set the result of add to the max when overflow.
> > It will take the pattern similar as below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > The patch also implement the SAT_ADD in the riscv backend as
> > the sample for both the scalar and vector.  Given below example:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> > }
> >
> > For vectorize, we leverage the existing vect pattern recog to find
> > the pattern similar to scalar and let the vectorizer to perform
> > the rest part for standard name usadd<mode>3 in vector mode.
> > The riscv vector backend have insn "Vector Single-Width Saturating
> > Add and Subtract" which can be leveraged when expand the usadd<mode>3
> > in vector mode.  For example:
> >
> > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > {
> >   unsigned i;
> >
> >   for (i = 0; i < n; i++)
> >     out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> > }
> >
> > Before this patch:
> > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > {
> >   ...
> >   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
> >   ivtmp_58 = _80 * 8;
> >   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
> >   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
> >   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
> >   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
> >   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
> >   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
> >   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
> >   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
> >   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
> >   ivtmp_79 = ivtmp_78 - _80;
> >   ...
> > }
> >
> > vec_sat_add_u64:
> >   ...
> >   vsetvli a5,a3,e64,m1,ta,ma
> >   vle64.v v0,0(a1)
> >   vle64.v v1,0(a2)
> >   slli    a4,a5,3
> >   sub     a3,a3,a5
> >   add     a1,a1,a4
> >   add     a2,a2,a4
> >   vadd.vv v1,v0,v1
> >   vmsgtu.vv       v0,v0,v1
> >   vmerge.vim      v1,v1,-1,v0
> >   vse64.v v1,0(a0)
> >   ...
> >
> > After this patch:
> > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > {
> >   ...
> >   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
> >   ivtmp_46 = _62 * 8;
> >   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
> >   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
> >   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
> >   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
> >   ...
> > }
> >
> > vec_sat_add_u64:
> >   ...
> >   vsetvli a5,a3,e64,m1,ta,ma
> >   vle64.v v1,0(a1)
> >   vle64.v v2,0(a2)
> >   slli    a4,a5,3
> >   sub     a3,a3,a5
> >   add     a1,a1,a4
> >   add     a2,a2,a4
> >   vsaddu.vv       v1,v1,v2
> >   vse64.v v1,0(a0)
> >   ...
> >
> > To limit the patch size for review, only unsigned version of
> > usadd<mode>3 are involved here. The signed version will be covered
> > in the underlying patch(es).
> >
> > The below test suites are passed for this patch.
> > * The riscv fully regression tests.
> > * The aarch64 fully regression tests.
> > * The x86 bootstrap tests.
> > * The x86 fully regression tests.
> >
> > 	PR target/51492
> > 	PR target/112600
> >
> > gcc/ChangeLog:
> >
> > 	* config/riscv/autovec.md (usadd<mode>3): New pattern expand
> > 	for unsigned SAT_ADD vector.
> > 	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
> > 	decl to expand usadd<mode>3 pattern.
> > 	(expand_vec_usadd): Ditto but for vector.
> > 	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
> > 	emit the vsadd insn.
> > 	(expand_vec_usadd): New func impl to expand usadd<mode>3 for
> > 	vector.
> > 	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
> > 	to expand usadd<mode>3 for scalar.
> > 	* config/riscv/riscv.md (usadd<mode>3): New pattern expand
> > 	for unsigned SAT_ADD scalar.
> > 	* config/riscv/vector.md: Allow VLS mode for vsaddu.
> > 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
> > 	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
> > 	* match.pd: Add unsigned SAT_ADD match and simply.
> > 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> > 	* tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
> > 	to build the IFN_SAT_ADD gimple call.
> > 	(vect_recog_sat_add_pattern): New func impl to recog the pattern
> > 	for unsigned SAT_ADD.
> >
> 
> Could you split the generic changes off from the RISCV changes? The RISCV
> changes need to be reviewed by the backend maintainer.
> 
> Could you also split off the vectorizer change from scalar recog one? Typically I
> would structure a change like this as:
> 
> 1. create types/structures + scalar recogn
> 2. Vector recog code
> 3. Backend changes
> 
> Which makes review and bisect easier. I'll only focus on the generic bits.
> 
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 2c764441cde..1104bb03b41 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4200,6 +4200,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_UBSAN_CHECK_MUL:
> >      case IFN_ADD_OVERFLOW:
> >      case IFN_MUL_OVERFLOW:
> > +    case IFN_SAT_ADD:
> >      case IFN_VEC_WIDEN_PLUS:
> >      case IFN_VEC_WIDEN_PLUS_LO:
> >      case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..47326b7033c 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
> >  			      smulhrs, umulhrs, binary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
> > +			      ssadd, usadd, binary)
> > +
> 
> Is ECF_NOTHROW correct here? At least on most targets I believe the scalar version
> can set flags/throw exceptions if the saturation happens?
> 
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index d401e7503e6..0b0298df829 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         || POINTER_TYPE_P (itype))
> >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> >
> 
> Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> 
> The problem with doing it in match.pd is that it replaces the operations quite
> early in the pipeline. Did I miss an email perhaps? The early replacement means we
> lose optimizations and things such as range calculations etc, since e.g. ranger
> doesn't know these internal functions.
> 
> I think Richi will want this in isel or mult widening but I'll continue with match.pd
> review just in case.
> 
> > +/* Unsigned Saturation Add */
> > +(match (usadd_left_part_1 @0 @1)
> > + (plus:c @0 @1)
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_1 @0 @1)
> > + (negate (convert (lt (plus:c @0 @1) @0)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part_2 @0 @1)
> > + (negate (convert (gt @0 (plus:c @0 @1))))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> 
> Predicates can be overloaded, so these two can just be usadd_right_part which
> then...
> 
> > +
> > +/* Unsigned saturation add. Case 1 (branchless):
> > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > +(simplify
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_1 @0 @1))
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> 
> 
> The optimize checks in the match.pd file are weird as it seems to check if we have
> optimizations enabled?
> 
> We don't typically need to do this.
> 
> > +(simplify
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_2 @0 @1))
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > +
> 
> Allows you to collapse rules like these into one line. Similarly for below.
> 
> Note that even when moving to gimple-isel you can reuse the match.pd code by
> leveraging it to build the predicates for you and call them from another pass.
> See how ctz_table_index is used for example.
> 
> Doing this, moving it to gimple-isel.cc should be easy.
> 
> > +/* Unsigned saturation add. Case 2 (branch):
> > +   SAT_U_ADD = (X + Y) >= x ? (X + Y) : -1 or
> > +   SAT_U_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> > +(simplify
> > + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > +(simplify
> > + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > +
> > +/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_1 @0 @1))
> > + (if (optimize)))
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c
> > +  (usadd_left_part_1 @0 @1)
> > +  (usadd_right_part_2 @0 @1))
> > + (if (optimize)))
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> > + (if (optimize)))
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> > + (if (optimize)))
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..3f2cb46aff8 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> >  OPTAB_NX(add_optab, "add$Q$a3")
> >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> >  OPTAB_VX(addv_optab, "add$F$a3")
> > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
> >  OPTAB_NX(sub_optab, "sub$F$a3")
> >  OPTAB_NX(sub_optab, "sub$Q$a3")
> ...
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 87c2acff386..77924cf10f8 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -4487,6 +4487,67 @@ vect_recog_mult_pattern (vec_info *vinfo,
> >    return pattern_stmt;
> >  }
> >
> > +static gimple *
> > +vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
> > +			 tree op_0, tree op_1)
> > +{
> > +  tree itype = TREE_TYPE (op_0);
> > +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> > +
> > +  if (vtype == NULL_TREE)
> > +    return NULL;
> > +
> > +  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
> > +    return NULL;
> > +
> > +  *type_out = vtype;
> > +
> > +  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
> > +  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> > +  gimple_call_set_nothrow (call, /* nothrow_p */ true);
> > +  gimple_set_location (call, gimple_location (last_stmt));
> > +
> > +  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> > +
> > +  return call;
> > +}
> 
> The function has only one caller, you should just inline it into the pattern.
> 
> > +/*
> > + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> > + *   _7 = _4 + _6;
> > + *   _8 = _4 > _7;
> > + *   _9 = (long unsigned int) _8;
> > + *   _10 = -_9;
> > + *   _12 = _7 | _10;
> > + *
> > + * And then simplied to
> > + *   _12 = .SAT_ADD (_4, _6);
> > + */
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +static gimple *
> > +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> > +			    tree *type_out)
> > +{
> > +  gimple *last_stmt = stmt_vinfo->stmt;
> > +
> 
> STMT_VINFO_STMT (stmt_vinfo);
> 
> > +  if (!is_gimple_assign (last_stmt))
> > +    return NULL;
> > +
> > +  tree res_ops[2];
> > +  tree lhs = gimple_assign_lhs (last_stmt);
> 
> Once you inline vect_sat_add_build_call you can do the check for
> vtype here, which is the cheaper check so perform it early.
> 
> Otherwise this looks really good!
> 
> Thanks for working on it,
> 
> Tamar
> 
> > +
> > +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> > +    {
> > +      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
> > +					      res_ops[0], res_ops[1]);
> > +      if (call)
> > +	return call;
> > +    }
> > +
> > +  return NULL;
> > +}
> > +
> >  /* Detect a signed division by a constant that wouldn't be
> >     otherwise vectorized:
> >
> > @@ -6987,6 +7048,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] =
> {
> >    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> >    { vect_recog_divmod_pattern, "divmod" },
> >    { vect_recog_mult_pattern, "mult" },
> > +  { vect_recog_sat_add_pattern, "sat_add" },
> >    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> >    { vect_recog_gcond_pattern, "gcond" },
> >    { vect_recog_bool_pattern, "bool" },
> > --
> > 2.34.1



* RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
  2024-05-02 10:57         ` Li, Pan2
@ 2024-05-02 12:57           ` Tamar Christina
  2024-05-03  1:45             ` Li, Pan2
  0 siblings, 1 reply; 21+ messages in thread
From: Tamar Christina @ 2024-05-02 12:57 UTC (permalink / raw)
  To: Li, Pan2, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

> > So he was responding for how to do it for the vectorizer and scalar parts.
> > Remember that the goal is not to introduce new gimple IL that can block other
> > optimizations.
> > The vectorizer already introduces new IL (various IFN) but this is fine as we don't
> > track things like ranges for vector instructions.  So we don't lose any information
> > here.
> >
> > Now for the scalar, if we do an early replacement like in match.pd we prevent a lot
> > of other optimizations because they don't know what IFN_SAT_ADD does.  gimple-isel
> > runs pretty late, and so at this point we don't expect many more optimizations to
> > happen, so it's a safe spot to insert more IL with "unknown semantics".
> >
> > Was that your intention Richi?
> 
> Thanks Tamar for the clear explanation. Does that mean both the scalar and vector parts
> will go with the isel approach? If so, I may have misunderstood previously that it is
> only for the vectorizer.

No, the isel would only be for the scalar.  The vectorizer will still use the vect_pattern.
It needs to, so that we can cost the operation correctly, and in some cases, depending on
how the saturation is described, you are unable to vectorize.  The pattern allows us to
catch these cases and still vectorize.

But you should be able to use the same match.pd predicate for both the vectorizer pattern
and isel.

> 
> I understand the point that we would like to put the pattern match late, but I have a
> question here.
> Given that the SAT_ADD pattern is fairly complicated, it is possible that a
> sub-expression of SAT_ADD is optimized by an earlier pass, and we can hardly catch the
> shape later.
> 
> For example, there is a plus expression in SAT_ADD; an early pass may optimize it to
> .ADD_OVERFLOW, and then the pattern looks quite different by the time a later pass
> sees it.
> 

Yeah, it looks like this transformation is done in widening_mul, which is the other
place Richi suggested to recognize SAT_ADD.  widening_mul already runs quite
late as well, so it's also OK.

If you put it there, before the code that transforms the sequence to .ADD_OVERFLOW,
it should work.
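
To make the shape change concrete, here is a minimal sketch (not part of the patch;
the function names are made up for illustration) of the branchless source idiom next
to a hand-written C model of what it looks like once the addition has been turned into
.ADD_OVERFLOW.  It assumes GCC or Clang for __builtin_add_overflow, and uses the builtin
only to mimic the REALPART/IMAGPART pair from the dump in the patch description:

```c
#include <stdint.h>

/* Source-level branchless idiom, as written in the patch description.  */
uint64_t sat_add_branchless (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t) ((uint64_t) (x + y) < x));
}

/* Hypothetical model of the same computation after the plus has become
   .ADD_OVERFLOW: the wrapped sum plays the role of REALPART_EXPR and the
   overflow flag (0 or 1) the role of IMAGPART_EXPR.  */
uint64_t sat_add_overflow_shape (uint64_t x, uint64_t y)
{
  uint64_t sum;
  uint64_t ovf = __builtin_add_overflow (x, y, &sum);
  return sum | -ovf;   /* -ovf is all ones on overflow, zero otherwise */
}
```

Both functions compute the same value, but a matcher written against the first shape
will not see the comparison against x in the second one, which is why the ordering
inside the pass matters.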

Eventually we do need to recognize this variant, since:

uint64_t
add_sat(uint64_t x, uint64_t y) noexcept
{
    uint64_t z;
    if (!__builtin_add_overflow(x, y, &z))
        return z;
    return UINT64_MAX;  /* saturate to the 64-bit maximum */
}

is a valid and common way to do saturation too.

But for now, it's fine.
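
For reference, a small sketch of the two spellings side by side, assuming GCC or Clang
for __builtin_add_overflow; the function names are only illustrative and nothing here is
taken from the patch itself.  Both should return the same value for every input, which
is why a later recognizer would want to map both to .SAT_ADD:

```c
#include <stdint.h>

/* Branchless spelling: OR the wrapped sum with an all-ones mask on overflow.  */
uint64_t sat_add_u64_branchless (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t) ((uint64_t) (x + y) < x));
}

/* Overflow-builtin spelling: return the sum unless the addition wrapped.  */
uint64_t sat_add_u64_builtin (uint64_t x, uint64_t y)
{
  uint64_t z;
  if (!__builtin_add_overflow (x, y, &z))
    return z;
  /* The saturated value has to be a 64-bit all-ones constant.  A plain -1u
     has type unsigned int, so with a 32-bit unsigned int it would convert
     to 0xffffffff rather than UINT64_MAX.  */
  return UINT64_MAX;
}
```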

Cheers,
Tamar

> Sorry not sure if my understanding is correct, feel free to correct me.
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Thursday, May 2, 2024 11:26 AM
> To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> > -----Original Message-----
> > From: Li, Pan2 <pan2.li@intel.com>
> > Sent: Thursday, May 2, 2024 4:11 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> > Liu, Hongtao <hongtao.liu@intel.com>
> > Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> >
> > Thanks Tamar
> >
> > > Could you also split off the vectorizer change from scalar recog one? Typically I
> > would structure a change like this as:
> >
> > > 1. create types/structures + scalar recogn
> > > 2. Vector recog code
> > > 3. Backend changes
> >
> > Sure thing, will rearrange the patch like this.
> >
> > > Is ECF_NOTHROW correct here? At least on most targets I believe the scalar
> > version
> > > can set flags/throw exceptions if the saturation happens?
> >
> > I see, will remove that.
> >
> > > Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> >
> > > The problem with doing it in match.pd is that it replaces the operations quite
> > > early in the pipeline. Did I miss an email perhaps? The early replacement means
> > > we lose optimizations and things such as range calculations etc, since e.g. ranger
> > > doesn't know these internal functions.
> >
> > > I think Richi will want this in isel or mult widening but I'll continue with
> > > match.pd review just in case.
> >
> > If I understand is correct, Richard suggested try vectorizer patterns first and then
> > possible isel.
> > Thus, I don't have a try for SAT_ADD in ISEL as vectorizer patterns works well for
> > SAT_ADD.
> > Let's wait the confirmation from Richard. Below are the original words from
> > previous mail for reference.
> >
> 
> I think the comment he made was this
> 
> > > Given we have saturating integer alu like below, could you help to coach me the
> most reasonable way to represent
> > > It in scalar as well as vectorize part? Sorry not familiar with this part and still dig
> into how it works...
> >
> > As in your v2, .SAT_ADD for both sat_uadd and sat_sadd, similar for
> > the other cases.
> >
> > As I said, use vectorizer patterns and possibly do instruction
> > selection at ISEL/widen_mult time.
> 
> So he was responding for how to do it for the vectorizer and scalar parts.
> Remember that the goal is not to introduce new gimple IL that can block other
> optimizations.
> The vectorizer already introduces new IL (various IFN) but this is fine as we don't
> track things like ranges for
> vector instructions.  So we don't lose any information here.
> 
> Now for the scalar, if we do an early replacement like in match.pd we prevent a lot
> of other optimizations
> because they don't know what IFN_SAT_ADD does. gimple-isel runs pretty late,
> and so at this point we don't
> expect many more optimizations to happen, so it's a safe spot to insert more IL
> with "unknown semantics".
> 
> Was that your intention Richi?
> 
> Thanks,
> Tamar
> 
> > >> As I said, use vectorizer patterns and possibly do instruction
> > >> selection at ISEL/widen_mult time.
> >
> > > The optimize checks in the match.pd file are weird as it seems to check if we
> have
> > > optimizations enabled?
> >
> > > We don't typically need to do this.
> >
> > Sure, will remove this.
> >
> > > The function has only one caller, you should just inline it into the pattern.
> >
> > Sure thing.
> >
> > > Once you inline vect_sat_add_build_call you can do the check for
> > > vtype here, which is the cheaper check so perform it early.
> >
> > Sure thing.
> >
> > Thanks again and will send the v4 with all comments addressed, as well as the
> test
> > results.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Tamar Christina <Tamar.Christina@arm.com>
> > Sent: Thursday, May 2, 2024 1:06 AM
> > To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> > Liu, Hongtao <hongtao.liu@intel.com>
> > Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> >
> > Hi,
> >
> > > From: Pan Li <pan2.li@intel.com>
> > >
> > > Update in v3:
> > > * Rebase upstream for conflict.
> > >
> > > Update in v2:
> > > * Fix one failure for x86 bootstrap.
> > >
> > > Original log:
> > >
> > > This patch would like to add the middle-end presentation for the
> > > saturation add.  Aka set the result of add to the max when overflow.
> > > It will take the pattern similar as below.
> > >
> > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as example, we will have:
> > >
> > > * SAT_ADD (1, 254)   => 255.
> > > * SAT_ADD (1, 255)   => 255.
> > > * SAT_ADD (2, 255)   => 255.
> > > * SAT_ADD (255, 255) => 255.
> > >
> > > The patch also implement the SAT_ADD in the riscv backend as
> > > the sample for both the scalar and vector.  Given below example:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
> > >
> > > Before this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > }
> > >
> > > After this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;    succ:       EXIT
> > > }
> > >
> > > For vectorize, we leverage the existing vect pattern recog to find
> > > the pattern similar to scalar and let the vectorizer to perform
> > > the rest part for standard name usadd<mode>3 in vector mode.
> > > The riscv vector backend have insn "Vector Single-Width Saturating
> > > Add and Subtract" which can be leveraged when expand the usadd<mode>3
> > > in vector mode.  For example:
> > >
> > > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > > {
> > >   unsigned i;
> > >
> > >   for (i = 0; i < n; i++)
> > >     out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> > > }
> > >
> > > Before this patch:
> > > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > > {
> > >   ...
> > >   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
> > >   ivtmp_58 = _80 * 8;
> > >   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
> > >   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
> > >   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
> > >   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
> > >   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, {
> > 18446744073709551615,
> > > ... }, vect__7.11_66);
> > >   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0,
> vect__12.15_72);
> > >   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
> > >   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
> > >   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
> > >   ivtmp_79 = ivtmp_78 - _80;
> > >   ...
> > > }
> > >
> > > vec_sat_add_u64:
> > >   ...
> > >   vsetvli a5,a3,e64,m1,ta,ma
> > >   vle64.v v0,0(a1)
> > >   vle64.v v1,0(a2)
> > >   slli    a4,a5,3
> > >   sub     a3,a3,a5
> > >   add     a1,a1,a4
> > >   add     a2,a2,a4
> > >   vadd.vv v1,v0,v1
> > >   vmsgtu.vv       v0,v0,v1
> > >   vmerge.vim      v1,v1,-1,v0
> > >   vse64.v v1,0(a0)
> > >   ...
> > >
> > > After this patch:
> > > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > > {
> > >   ...
> > >   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
> > >   ivtmp_46 = _62 * 8;
> > >   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
> > >   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
> > >   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
> > >   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0,
> vect__12.11_54);
> > >   ...
> > > }
> > >
> > > vec_sat_add_u64:
> > >   ...
> > >   vsetvli a5,a3,e64,m1,ta,ma
> > >   vle64.v v1,0(a1)
> > >   vle64.v v2,0(a2)
> > >   slli    a4,a5,3
> > >   sub     a3,a3,a5
> > >   add     a1,a1,a4
> > >   add     a2,a2,a4
> > >   vsaddu.vv       v1,v1,v2
> > >   vse64.v v1,0(a0)
> > >   ...
> > >
> > > To limit the patch size for review, only unsigned version of
> > > usadd<mode>3 are involved here. The signed version will be covered
> > > in the underlying patch(es).
> > >
> > > The below test suites are passed for this patch.
> > > * The riscv fully regression tests.
> > > * The aarch64 fully regression tests.
> > > * The x86 bootstrap tests.
> > > * The x86 fully regression tests.
> > >
> > > 	PR target/51492
> > > 	PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* config/riscv/autovec.md (usadd<mode>3): New pattern expand
> > > 	for unsigned SAT_ADD vector.
> > > 	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
> > > 	decl to expand usadd<mode>3 pattern.
> > > 	(expand_vec_usadd): Ditto but for vector.
> > > 	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
> > > 	emit the vsadd insn.
> > > 	(expand_vec_usadd): New func impl to expand usadd<mode>3 for
> > > 	vector.
> > > 	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
> > > 	to expand usadd<mode>3 for scalar.
> > > 	* config/riscv/riscv.md (usadd<mode>3): New pattern expand
> > > 	for unsigned SAT_ADD scalar.
> > > 	* config/riscv/vector.md: Allow VLS mode for vsaddu.
> > > 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
> > > 	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
> > > 	* match.pd: Add unsigned SAT_ADD match and simply.
> > > 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> > > 	* tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
> > > 	to build the IFN_SAT_ADD gimple call.
> > > 	(vect_recog_sat_add_pattern): New func impl to recog the pattern
> > > 	for unsigned SAT_ADD.
> > >
> >
> > Could you split the generic changes off from the RISCV changes? The RISCV
> > changes need to be reviewed by the backend maintainer.
> >
> > Could you also split off the vectorizer change from scalar recog one? Typically I
> > would structure a change like this as:
> >
> > 1. create types/structures + scalar recogn
> > 2. Vector recog code
> > 3. Backend changes
> >
> > Which makes review and bisect easier. I'll only focus on the generic bits.
> >
> > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > > index 2c764441cde..1104bb03b41 100644
> > > --- a/gcc/internal-fn.cc
> > > +++ b/gcc/internal-fn.cc
> > > @@ -4200,6 +4200,7 @@ commutative_binary_fn_p (internal_fn fn)
> > >      case IFN_UBSAN_CHECK_MUL:
> > >      case IFN_ADD_OVERFLOW:
> > >      case IFN_MUL_OVERFLOW:
> > > +    case IFN_SAT_ADD:
> > >      case IFN_VEC_WIDEN_PLUS:
> > >      case IFN_VEC_WIDEN_PLUS_LO:
> > >      case IFN_VEC_WIDEN_PLUS_HI:
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index 848bb9dbff3..47326b7033c 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS,
> > ECF_CONST
> > > | ECF_NOTHROW, first,
> > >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> > > first,
> > >  			      smulhrs, umulhrs, binary)
> > >
> > > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST |
> ECF_NOTHROW,
> > > first,
> > > +			      ssadd, usadd, binary)
> > > +
> >
> > Is ECF_NOTHROW correct here? At least on most targets I believe the scalar
> version
> > can set flags/throw exceptions if the saturation happens?
> >
> > >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> > >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> > >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index d401e7503e6..0b0298df829 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >         || POINTER_TYPE_P (itype))
> > >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> > >
> >
> > Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> >
> > The problem with doing it in match.pd is that it replaces the operations quite
> > early in the pipeline. Did I miss an email perhaps? The early replacement means we
> > lose optimizations and things such as range calculations etc, since e.g. ranger
> > doesn't know these internal functions.
> >
> > I think Richi will want this in isel or mult widening but I'll continue with match.pd
> > review just in case.
> >
> > > +/* Unsigned Saturation Add */
> > > +(match (usadd_left_part_1 @0 @1)
> > > + (plus:c @0 @1)
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_right_part_1 @0 @1)
> > > + (negate (convert (lt (plus:c @0 @1) @0)))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_right_part_2 @0 @1)
> > > + (negate (convert (gt @0 (plus:c @0 @1))))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> >
> > Predicates can be overloaded, so these two can just be usadd_right_part which
> > then...
> >
> > > +
> > > +/* Unsigned saturation add. Case 1 (branchless):
> > > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > > +(simplify
> > > + (bit_ior:c
> > > +  (usadd_left_part_1 @0 @1)
> > > +  (usadd_right_part_1 @0 @1))
> > > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> >
> >
> > The optimize checks in the match.pd file are weird as it seems to check if we have
> > optimizations enabled?
> >
> > We don't typically need to do this.
> >
> > > +(simplify
> > > + (bit_ior:c
> > > +  (usadd_left_part_1 @0 @1)
> > > +  (usadd_right_part_2 @0 @1))
> > > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > > +
> >
> > Allows you to collapse rules like these into one line. Similarly for below.
> >
> > Note that even when moving to gimple-isel you can reuse the match.pd code by
> > leveraging it to build the predicates for you and calling them from another pass.
> > See how ctz_table_index is used for example.
> >
> > Doing this, moving it to gimple-isel.cc should be easy.
> >
> > > +/* Unsigned saturation add. Case 2 (branch):
> > > +   SAT_U_ADD = (X + Y) >= x ? (X + Y) : -1 or
> > > +   SAT_U_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> > > +(simplify
> > > + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> > > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > > +(simplify
> > > + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> > > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > > +
> > > +/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (bit_ior:c
> > > +  (usadd_left_part_1 @0 @1)
> > > +  (usadd_right_part_1 @0 @1))
> > > + (if (optimize)))
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (bit_ior:c
> > > +  (usadd_left_part_1 @0 @1)
> > > +  (usadd_right_part_2 @0 @1))
> > > + (if (optimize)))
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> > > + (if (optimize)))
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> > > + (if (optimize)))
> > > +
> > >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> > >     x >  y  &&  x == XXX_MIN  -->  false . */
> > >  (for eqne (eq ne)
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index ad14f9328b9..3f2cb46aff8 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> > >  OPTAB_NX(add_optab, "add$Q$a3")
> > >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> > >  OPTAB_VX(addv_optab, "add$F$a3")
> > > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> > > gen_signed_fixed_libfunc)
> > > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> > > gen_unsigned_fixed_libfunc)
> > > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> > > gen_signed_fixed_libfunc)
> > > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> > > gen_unsigned_fixed_libfunc)
> > >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3',
> > gen_int_fp_fixed_libfunc)
> > >  OPTAB_NX(sub_optab, "sub$F$a3")
> > >  OPTAB_NX(sub_optab, "sub$Q$a3")
> > ...
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index 87c2acff386..77924cf10f8 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -4487,6 +4487,67 @@ vect_recog_mult_pattern (vec_info *vinfo,
> > >    return pattern_stmt;
> > >  }
> > >
> > > +static gimple *
> > > +vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
> > > +			 tree op_0, tree op_1)
> > > +{
> > > +  tree itype = TREE_TYPE (op_0);
> > > +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> > > +
> > > +  if (vtype == NULL_TREE)
> > > +    return NULL;
> > > +
> > > +  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype,
> > > OPTIMIZE_FOR_SPEED))
> > > +    return NULL;
> > > +
> > > +  *type_out = vtype;
> > > +
> > > +  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
> > > +  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> > > +  gimple_call_set_nothrow (call, /* nothrow_p */ true);
> > > +  gimple_set_location (call, gimple_location (last_stmt));
> > > +
> > > +  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> > > +
> > > +  return call;
> > > +}
> >
> > The function has only one caller, you should just inline it into the pattern.
> >
> > > +/*
> > > + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> > > + *   _7 = _4 + _6;
> > > + *   _8 = _4 > _7;
> > > + *   _9 = (long unsigned int) _8;
> > > + *   _10 = -_9;
> > > + *   _12 = _7 | _10;
> > > + *
> > > + * And then simplied to
> > > + *   _12 = .SAT_ADD (_4, _6);
> > > + */
> > > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > > +
> > > +static gimple *
> > > +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> > > +			    tree *type_out)
> > > +{
> > > +  gimple *last_stmt = stmt_vinfo->stmt;
> > > +
> >
> > STMT_VINFO_STMT (stmt_vinfo);
> >
> > > +  if (!is_gimple_assign (last_stmt))
> > > +    return NULL;
> > > +
> > > +  tree res_ops[2];
> > > +  tree lhs = gimple_assign_lhs (last_stmt);
> >
> > Once you inline vect_sat_add_build_call you can do the check for
> > vtype here, which is the cheaper check so perform it early.
> >
> > Otherwise this looks really good!
> >
> > Thanks for working on it,
> >
> > Tamar
> >
> > > +
> > > +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> > > +    {
> > > +      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
> > > +					      res_ops[0], res_ops[1]);
> > > +      if (call)
> > > +	return call;
> > > +    }
> > > +
> > > +  return NULL;
> > > +}
> > > +
> > >  /* Detect a signed division by a constant that wouldn't be
> > >     otherwise vectorized:
> > >
> > > @@ -6987,6 +7048,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[]
> =
> > {
> > >    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> > >    { vect_recog_divmod_pattern, "divmod" },
> > >    { vect_recog_mult_pattern, "mult" },
> > > +  { vect_recog_sat_add_pattern, "sat_add" },
> > >    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> > >    { vect_recog_gcond_pattern, "gcond" },
> > >    { vect_recog_bool_pattern, "bool" },
> > > --
> > > 2.34.1



* RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
  2024-05-02 12:57           ` Tamar Christina
@ 2024-05-03  1:45             ` Li, Pan2
  0 siblings, 0 replies; 21+ messages in thread
From: Li, Pan2 @ 2024-05-03  1:45 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

> No, the isel would only be for the scalar.  The vectorizer will still use the vect_pattern.
> It needs to, so that we can cost the operation correctly, and in some cases, depending on
> how the saturation is described, you are unable to vectorize.  The pattern allows us to
> catch these cases and still vectorize.

> But you should be able to use the same match.pd predicate for both the vectorizer pattern
> and isel.

Thanks, got it. I will try isel for the scalar part and remove the simplify in match.pd.

> Eventually we do need to recognize this variant, since:
>
> uint64_t
> add_sat(uint64_t x, uint64_t y) noexcept
> {
>     uint64_t z;
>     if (!__builtin_add_overflow(x, y, &z))
>         return z;
>     return UINT64_MAX;  /* saturate to the 64-bit maximum */
> }
> 
> is a valid and common way to do saturation too.

Sure thing, will cover this later.

Pan

-----Original Message-----
From: Tamar Christina <Tamar.Christina@arm.com> 
Sent: Thursday, May 2, 2024 8:58 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD

> > So he was responding for how to do it for the vectorizer and scalar parts.
> > Remember that the goal is not to introduce new gimple IL that can block other
> > optimizations.
> > The vectorizer already introduces new IL (various IFN) but this is fine as we don't
> > track things like ranges for vector instructions.  So we don't lose any information
> > here.
> >
> > Now for the scalar, if we do an early replacement like in match.pd we prevent a lot
> > of other optimizations because they don't know what IFN_SAT_ADD does.  gimple-isel
> > runs pretty late, and so at this point we don't expect many more optimizations to
> > happen, so it's a safe spot to insert more IL with "unknown semantics".
> >
> > Was that your intention Richi?
> 
> Thanks Tamar for the clear explanation. Does that mean both the scalar and vector parts
> will go with the isel approach? If so, I may have misunderstood previously that it is
> only for the vectorizer.

No, the isel would only be for the scalar.  The vectorizer will still use the vect_pattern.
It needs to, so that we can cost the operation correctly, and in some cases, depending on
how the saturation is described, you are unable to vectorize.  The pattern allows us to
catch these cases and still vectorize.

But you should be able to use the same match.pd predicate for both the vectorizer pattern
and isel.

> 
> I understand the point that we would like to put the pattern match late, but I have a
> question here.
> Given that the SAT_ADD pattern is fairly complicated, it is possible that a
> sub-expression of SAT_ADD is optimized by an earlier pass, and we can hardly catch the
> shape later.
> 
> For example, there is a plus expression in SAT_ADD; an early pass may optimize it to
> .ADD_OVERFLOW, and then the pattern looks quite different by the time a later pass
> sees it.
> 

Yeah, it looks like this transformation is done in widening_mul, which is the other
place Richi suggested to recognize SAT_ADD.  widening_mul already runs quite
late as well, so it's also OK.

If you put it there, before the code that transforms the sequence to .ADD_OVERFLOW,
it should work.

Eventually we do need to recognize this variant, since:

uint64_t
add_sat(uint64_t x, uint64_t y) noexcept
{
    uint64_t z;
    if (!__builtin_add_overflow(x, y, &z))
        return z;
    return UINT64_MAX;  /* saturate to the 64-bit maximum */
}

is a valid and common way to do saturation too.

But for now, it's fine.

Cheers,
Tamar

> Sorry not sure if my understanding is correct, feel free to correct me.
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Thursday, May 2, 2024 11:26 AM
> To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> 
> > -----Original Message-----
> > From: Li, Pan2 <pan2.li@intel.com>
> > Sent: Thursday, May 2, 2024 4:11 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> > Liu, Hongtao <hongtao.liu@intel.com>
> > Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> >
> > Thanks Tamar
> >
> > > Could you also split off the vectorizer change from scalar recog one? Typically I
> > would structure a change like this as:
> >
> > > 1. create types/structures + scalar recogn
> > > 2. Vector recog code
> > > 3. Backend changes
> >
> > Sure thing, will rearrange the patch like this.
> >
> > > Is ECF_NOTHROW correct here? At least on most targets I believe the scalar
> > version
> > > can set flags/throw exceptions if the saturation happens?
> >
> > I see, will remove that.
> >
> > > Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> >
> > > The problem with doing it in match.pd is that it replaces the operations quite
> > > early in the pipeline. Did I miss an email perhaps? The early replacement means
> > > we lose optimizations and things such as range calculations etc, since e.g. ranger
> > > doesn't know these internal functions.
> >
> > > I think Richi will want this in isel or mult widening but I'll continue with
> > > match.pd review just in case.
> >
> > If I understand is correct, Richard suggested try vectorizer patterns first and then
> > possible isel.
> > Thus, I don't have a try for SAT_ADD in ISEL as vectorizer patterns works well for
> > SAT_ADD.
> > Let's wait the confirmation from Richard. Below are the original words from
> > previous mail for reference.
> >
> 
> I think the comment he made was this
> 
> > > Given we have saturating integer alu like below, could you help to coach me on the
> > > most reasonable way to represent it in the scalar as well as the vectorized part?
> > > Sorry, I'm not familiar with this part and am still digging into how it works...
> >
> > As in your v2, .SAT_ADD for both sat_uadd and sat_sadd, similar for
> > the other cases.
> >
> > As I said, use vectorizer patterns and possibly do instruction
> > selection at ISEL/widen_mult time.
> 
> So he was responding to how to do it for the vectorizer and scalar parts.
> Remember that the goal is not to introduce new gimple IL that can block other
> optimizations.
> The vectorizer already introduces new IL (various IFN) but this is fine as we don't
> track things like ranges for vector instructions.  So we don't lose any information
> here.
> 
> Now for the scalar, if we do an early replacement like in match.pd we prevent a lot
> of other optimizations because they don't know what IFN_SAT_ADD does. gimple-isel
> runs pretty late, and so at this point we don't expect many more optimizations to
> happen, so it's a safe spot to insert more IL with "unknown semantics".
> 
> Was that your intention Richi?
> 
> Thanks,
> Tamar
> 
> > >> As I said, use vectorizer patterns and possibly do instruction
> > >> selection at ISEL/widen_mult time.
> >
> > > The optimize checks in the match.pd file are weird as it seems to check if we
> > > have optimizations enabled?
> >
> > > We don't typically need to do this.
> >
> > Sure, will remove this.
> >
> > > The function has only one caller, you should just inline it into the pattern.
> >
> > Sure thing.
> >
> > > Once you inline vect_sat_add_build_call you can do the check for
> > > vtype here, which is the cheaper check so perform it early.
> >
> > Sure thing.
> >
> > Thanks again and will send the v4 with all comments addressed, as well as the
> > test results.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Tamar Christina <Tamar.Christina@arm.com>
> > Sent: Thursday, May 2, 2024 1:06 AM
> > To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> > Liu, Hongtao <hongtao.liu@intel.com>
> > Subject: RE: [PATCH v3] Internal-fn: Introduce new internal function SAT_ADD
> >
> > Hi,
> >
> > > From: Pan Li <pan2.li@intel.com>
> > >
> > > Update in v3:
> > > * Rebase upstream for conflict.
> > >
> > > Update in v2:
> > > * Fix one failure for x86 bootstrap.
> > >
> > > Original log:
> > >
> > > This patch would like to add the middle-end presentation for the
> > > saturation add.  Aka set the result of add to the max when overflow.
> > > It will take the pattern similar as below.
> > >
> > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as example, we will have:
> > >
> > > * SAT_ADD (1, 254)   => 255.
> > > * SAT_ADD (1, 255)   => 255.
> > > * SAT_ADD (2, 255)   => 255.
> > > * SAT_ADD (255, 255) => 255.
> > >
> > > The patch also implement the SAT_ADD in the riscv backend as
> > > the sample for both the scalar and vector.  Given below example:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
> > >
> > > Before this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;    succ:       EXIT
> > >
> > > }
> > >
> > > After this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;    pred:       ENTRY
> > >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;    succ:       EXIT
> > > }
> > >
> > > For vectorize, we leverage the existing vect pattern recog to find
> > > the pattern similar to scalar and let the vectorizer to perform
> > > the rest part for standard name usadd<mode>3 in vector mode.
> > > The riscv vector backend have insn "Vector Single-Width Saturating
> > > Add and Subtract" which can be leveraged when expand the usadd<mode>3
> > > in vector mode.  For example:
> > >
> > > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > > {
> > >   unsigned i;
> > >
> > >   for (i = 0; i < n; i++)
> > >     out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> > > }
> > >
> > > Before this patch:
> > > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > > {
> > >   ...
> > >   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
> > >   ivtmp_58 = _80 * 8;
> > >   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
> > >   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
> > >   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
> > >   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
> > >   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
> > >   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
> > >   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
> > >   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
> > >   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
> > >   ivtmp_79 = ivtmp_78 - _80;
> > >   ...
> > > }
> > >
> > > vec_sat_add_u64:
> > >   ...
> > >   vsetvli a5,a3,e64,m1,ta,ma
> > >   vle64.v v0,0(a1)
> > >   vle64.v v1,0(a2)
> > >   slli    a4,a5,3
> > >   sub     a3,a3,a5
> > >   add     a1,a1,a4
> > >   add     a2,a2,a4
> > >   vadd.vv v1,v0,v1
> > >   vmsgtu.vv       v0,v0,v1
> > >   vmerge.vim      v1,v1,-1,v0
> > >   vse64.v v1,0(a0)
> > >   ...
> > >
> > > After this patch:
> > > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> > > {
> > >   ...
> > >   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
> > >   ivtmp_46 = _62 * 8;
> > >   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
> > >   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
> > >   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
> > >   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
> > >   ...
> > > }
> > >
> > > vec_sat_add_u64:
> > >   ...
> > >   vsetvli a5,a3,e64,m1,ta,ma
> > >   vle64.v v1,0(a1)
> > >   vle64.v v2,0(a2)
> > >   slli    a4,a5,3
> > >   sub     a3,a3,a5
> > >   add     a1,a1,a4
> > >   add     a2,a2,a4
> > >   vsaddu.vv       v1,v1,v2
> > >   vse64.v v1,0(a0)
> > >   ...
> > >
> > > To limit the patch size for review, only unsigned version of
> > > usadd<mode>3 are involved here. The signed version will be covered
> > > in the underlying patch(es).
> > >
> > > The below test suites are passed for this patch.
> > > * The riscv fully regression tests.
> > > * The aarch64 fully regression tests.
> > > * The x86 bootstrap tests.
> > > * The x86 fully regression tests.
> > >
> > > 	PR target/51492
> > > 	PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > > 	* config/riscv/autovec.md (usadd<mode>3): New pattern expand
> > > 	for unsigned SAT_ADD vector.
> > > 	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func
> > > 	decl to expand usadd<mode>3 pattern.
> > > 	(expand_vec_usadd): Ditto but for vector.
> > > 	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to
> > > 	emit the vsadd insn.
> > > 	(expand_vec_usadd): New func impl to expand usadd<mode>3 for
> > > 	vector.
> > > 	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl
> > > 	to expand usadd<mode>3 for scalar.
> > > 	* config/riscv/riscv.md (usadd<mode>3): New pattern expand
> > > 	for unsigned SAT_ADD scalar.
> > > 	* config/riscv/vector.md: Allow VLS mode for vsaddu.
> > > 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD.
> > > 	* internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD.
> > > 	* match.pd: Add unsigned SAT_ADD match and simply.
> > > 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> > > 	* tree-vect-patterns.cc (vect_sat_add_build_call): New func impl
> > > 	to build the IFN_SAT_ADD gimple call.
> > > 	(vect_recog_sat_add_pattern): New func impl to recog the pattern
> > > 	for unsigned SAT_ADD.
> > >
> >
> > Could you split the generic changes off from the RISCV changes? The RISCV
> > changes need to be reviewed by the backend maintainer.
> >
> > Could you also split off the vectorizer change from scalar recog one? Typically I
> > would structure a change like this as:
> >
> > 1. create types/structures + scalar recogn
> > 2. Vector recog code
> > 3. Backend changes
> >
> > Which makes review and bisect easier. I'll only focus on the generic bits.
> >
> > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > > index 2c764441cde..1104bb03b41 100644
> > > --- a/gcc/internal-fn.cc
> > > +++ b/gcc/internal-fn.cc
> > > @@ -4200,6 +4200,7 @@ commutative_binary_fn_p (internal_fn fn)
> > >      case IFN_UBSAN_CHECK_MUL:
> > >      case IFN_ADD_OVERFLOW:
> > >      case IFN_MUL_OVERFLOW:
> > > +    case IFN_SAT_ADD:
> > >      case IFN_VEC_WIDEN_PLUS:
> > >      case IFN_VEC_WIDEN_PLUS_LO:
> > >      case IFN_VEC_WIDEN_PLUS_HI:
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index 848bb9dbff3..47326b7033c 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -275,6 +275,9 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
> > >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
> > >  			      smulhrs, umulhrs, binary)
> > >
> > > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
> > > +			      ssadd, usadd, binary)
> > > +
> >
> > Is ECF_NOTHROW correct here? At least on most targets I believe the scalar
> > version can set flags/throw exceptions if the saturation happens?
> >
> > >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> > >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> > >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index d401e7503e6..0b0298df829 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -3043,6 +3043,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >         || POINTER_TYPE_P (itype))
> > >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> > >
> >
> > Hmm I believe Richi mentioned that he wanted the recognition done in isel?
> >
> > The problem with doing it in match.pd is that it replaces the operations quite
> > early in the pipeline. Did I miss an email perhaps? The early replacement means we
> > lose optimizations and things such as range calculations etc, since e.g. ranger
> > doesn't know these internal functions.
> >
> > I think Richi will want this in isel or mult widening but I'll continue with match.pd
> > review just in case.
> >
> > > +/* Unsigned Saturation Add */
> > > +(match (usadd_left_part_1 @0 @1)
> > > + (plus:c @0 @1)
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_right_part_1 @0 @1)
> > > + (negate (convert (lt (plus:c @0 @1) @0)))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> > > +
> > > +(match (usadd_right_part_2 @0 @1)
> > > + (negate (convert (gt @0 (plus:c @0 @1))))
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@0))
> > > +      && types_match (type, TREE_TYPE (@1)))))
> >
> > Predicates can be overloaded, so these two can just be usadd_right_part which
> > then...
> >
> > > +
> > > +/* Unsigned saturation add. Case 1 (branchless):
> > > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > > +(simplify
> > > + (bit_ior:c
> > > +  (usadd_left_part_1 @0 @1)
> > > +  (usadd_right_part_1 @0 @1))
> > > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> >
> >
> > The optimize checks in the match.pd file are weird as it seems to check if we have
> > optimizations enabled?
> >
> > We don't typically need to do this.
> >
> > > +(simplify
> > > + (bit_ior:c
> > > +  (usadd_left_part_1 @0 @1)
> > > +  (usadd_right_part_2 @0 @1))
> > > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > > +
> >
> > Allows you to collapse rules like these into one line. Similarly for below.
> >
> > Note that even when moving to gimple-isel you can reuse the match.pd code by
> > leveraging it to build the predicates for you and call them from another pass.
> > See how ctz_table_index is used for example.
> >
> > Doing this, moving it to gimple-isel.cc should be easy.
> >
> > > +/* Unsigned saturation add. Case 2 (branch):
> > > +   SAT_U_ADD = (X + Y) >= x ? (X + Y) : -1 or
> > > +   SAT_U_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> > > +(simplify
> > > + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> > > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > > +(simplify
> > > + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> > > + (if (optimize) (IFN_SAT_ADD @0 @1)))
> > > +
> > > +/* Vect recog pattern will leverage unsigned_integer_sat_add.  */
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (bit_ior:c
> > > +  (usadd_left_part_1 @0 @1)
> > > +  (usadd_right_part_1 @0 @1))
> > > + (if (optimize)))
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (bit_ior:c
> > > +  (usadd_left_part_1 @0 @1)
> > > +  (usadd_right_part_2 @0 @1))
> > > + (if (optimize)))
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (cond (ge (usadd_left_part_1@2 @0 @1) @0) @2 integer_minus_onep)
> > > + (if (optimize)))
> > > +(match (unsigned_integer_sat_add @0 @1)
> > > + (cond (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)
> > > + (if (optimize)))
> > > +
> > >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> > >     x >  y  &&  x == XXX_MIN  -->  false . */
> > >  (for eqne (eq ne)
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index ad14f9328b9..3f2cb46aff8 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> > >  OPTAB_NX(add_optab, "add$Q$a3")
> > >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> > >  OPTAB_VX(addv_optab, "add$F$a3")
> > > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> > > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> > > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> > > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> > >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
> > >  OPTAB_NX(sub_optab, "sub$F$a3")
> > >  OPTAB_NX(sub_optab, "sub$Q$a3")
> > ...
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index 87c2acff386..77924cf10f8 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -4487,6 +4487,67 @@ vect_recog_mult_pattern (vec_info *vinfo,
> > >    return pattern_stmt;
> > >  }
> > >
> > > +static gimple *
> > > +vect_sat_add_build_call (vec_info *vinfo, gimple *last_stmt, tree *type_out,
> > > +			 tree op_0, tree op_1)
> > > +{
> > > +  tree itype = TREE_TYPE (op_0);
> > > +  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> > > +
> > > +  if (vtype == NULL_TREE)
> > > +    return NULL;
> > > +
> > > +  if (!direct_internal_fn_supported_p (IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
> > > +    return NULL;
> > > +
> > > +  *type_out = vtype;
> > > +
> > > +  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, op_0, op_1);
> > > +  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> > > +  gimple_call_set_nothrow (call, /* nothrow_p */ true);
> > > +  gimple_set_location (call, gimple_location (last_stmt));
> > > +
> > > +  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> > > +
> > > +  return call;
> > > +}
> >
> > The function has only one caller, you should just inline it into the pattern.
> >
> > > +/*
> > > + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> > > + *   _7 = _4 + _6;
> > > + *   _8 = _4 > _7;
> > > + *   _9 = (long unsigned int) _8;
> > > + *   _10 = -_9;
> > > + *   _12 = _7 | _10;
> > > + *
> > > + * And then simplified to
> > > + *   _12 = .SAT_ADD (_4, _6);
> > > + */
> > > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > > +
> > > +static gimple *
> > > +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> > > +			    tree *type_out)
> > > +{
> > > +  gimple *last_stmt = stmt_vinfo->stmt;
> > > +
> >
> > STMT_VINFO_STMT (stmt_vinfo);
> >
> > > +  if (!is_gimple_assign (last_stmt))
> > > +    return NULL;
> > > +
> > > +  tree res_ops[2];
> > > +  tree lhs = gimple_assign_lhs (last_stmt);
> >
> > Once you inline vect_sat_add_build_call you can do the check for
> > vtype here, which is the cheaper check so perform it early.
> >
> > Otherwise this looks really good!
> >
> > Thanks for working on it,
> >
> > Tamar
> >
> > > +
> > > +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> > > +    {
> > > +      gimple *call = vect_sat_add_build_call (vinfo, last_stmt, type_out,
> > > +					      res_ops[0], res_ops[1]);
> > > +      if (call)
> > > +	return call;
> > > +    }
> > > +
> > > +  return NULL;
> > > +}
> > > +
> > >  /* Detect a signed division by a constant that wouldn't be
> > >     otherwise vectorized:
> > >
> > > @@ -6987,6 +7048,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
> > >    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
> > >    { vect_recog_divmod_pattern, "divmod" },
> > >    { vect_recog_mult_pattern, "mult" },
> > > +  { vect_recog_sat_add_pattern, "sat_add" },
> > >    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> > >    { vect_recog_gcond_pattern, "gcond" },
> > >    { vect_recog_bool_pattern, "bool" },
> > > --
> > > 2.34.1



* [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-04-06 12:07 [PATCH v1] Internal-fn: Introduce new internal function SAT_ADD pan2.li
  2024-04-07  7:03 ` [PATCH v2] " pan2.li
  2024-04-29  7:53 ` [PATCH v3] " pan2.li
@ 2024-05-06 14:48 ` pan2.li
  2024-05-13  9:09   ` Tamar Christina
  2024-05-14 13:18   ` Richard Biener
  2024-05-06 14:49 ` [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int pan2.li
  2024-05-06 14:50 ` [PATCH v4 3/3] RISC-V: Implement IFN SAT_ADD for both the scalar and vector pan2.li
  4 siblings, 2 replies; 21+ messages in thread
From: pan2.li @ 2024-05-06 14:48 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, tamar.christina, richard.guenther,
	hongtao.liu, Pan Li

From: Pan Li <pan2.li@intel.com>

This patch adds the middle-end representation for saturating add, i.e.
the result of the addition is set to the type's maximum value on overflow.
It matches a pattern similar to the one below.

SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))

Taking uint8_t as an example, we have:

* SAT_ADD (1, 254)   => 255.
* SAT_ADD (1, 255)   => 255.
* SAT_ADD (2, 255)   => 255.
* SAT_ADD (255, 255) => 255.
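
As a quick sanity check of the four cases above, the formula can be
exercised directly on uint8_t.  This is only a sketch and not part of
the patch: the name sat_add_u8 is made up for illustration, and the
explicit casts are there because uint8_t operands are promoted to int
in C.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: branchless unsigned
   saturating add on uint8_t, following the SAT_ADD formula.  */
uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  /* Operands are promoted to int, so truncate to get the wrapped sum.  */
  uint8_t sum = (uint8_t) (x + y);
  /* (sum < x) is 1 exactly when the addition wrapped; negating it and
     truncating gives 0xff on overflow and 0 otherwise.  */
  uint8_t mask = (uint8_t) -(sum < x);
  return sum | mask;
}
```

With this, sat_add_u8 (1, 254) reaches 255 without wrapping, while the
other three cases wrap and are forced to 255 by the mask.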

Given the example below for the unsigned scalar integer type uint64_t:

uint64_t sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

Before this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  _Bool _2;
  long unsigned int _3;
  long unsigned int _4;
  uint64_t _7;
  long unsigned int _10;
  __complex__ long unsigned int _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
  _1 = REALPART_EXPR <_11>;
  _10 = IMAGPART_EXPR <_11>;
  _2 = _10 != 0;
  _3 = (long unsigned int) _2;
  _4 = -_3;
  _7 = _1 | _4;
  return _7;
;;    succ:       EXIT

}

After this patch:
uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _7;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  return _7;
;;    succ:       EXIT
}

We perform the transform during widen_mult because the sub-expression of
SAT_ADD would otherwise be optimized to .ADD_OVERFLOW there.  We need to
try the .SAT_ADD pattern first and then .ADD_OVERFLOW, or we may never
catch the .SAT_ADD pattern.  Meanwhile, the isel pass runs after
widen_mult, so we cannot perform the .SAT_ADD pattern match there, as the
sub-expression will already have been optimized to .ADD_OVERFLOW.
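
To make the relation between the two forms concrete: the "before" dump
is roughly what one would get from writing the same operation with the
overflow builtin, so both functions below should compute the same
value.  This is a sketch under that assumption and not part of the
patch; the function names are made up, and __builtin_add_overflow is
the GCC/Clang builtin.

```c
#include <stdint.h>

/* The branchless source form from the example above.  */
uint64_t
sat_add_u64_branchless (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}

/* Hypothetical equivalent written with the overflow builtin, mirroring
   the .ADD_OVERFLOW / REALPART_EXPR / IMAGPART_EXPR sequence.  */
uint64_t
sat_add_u64_overflow (uint64_t x, uint64_t y)
{
  uint64_t sum;
  /* ovf is 1 if x + y wrapped, 0 otherwise; sum holds the wrapped result.  */
  uint64_t ovf = __builtin_add_overflow (x, y, &sum);
  /* -ovf is all ones on overflow, so the OR saturates the result.  */
  return sum | -ovf;
}
```

Once the addition and comparison have been folded into .ADD_OVERFLOW,
the plus/lt/negate/bit_ior shape that the match.pd pattern looks for is
no longer present, which is why the order of the two matches matters.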

The following tests pass with this patch:
1. The riscv full regression tests.
2. The aarch64 full regression tests.
3. The x86 bootstrap tests.
4. The x86 full regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
	to the return true switch case(s).
	* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
	* match.pd: Add unsigned SAT_ADD match.
	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern
	func decl generated in match.pd match.
	(match_saturation_arith): New func impl to match the saturation arith.
	(math_opts_dom_walker::after_dom_children): Try match saturation
	arith.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/internal-fn.cc        |  1 +
 gcc/internal-fn.def       |  2 ++
 gcc/match.pd              | 28 ++++++++++++++++++++++++
 gcc/optabs.def            |  4 ++--
 gcc/tree-ssa-math-opts.cc | 46 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 0a7053c2286..73045ca8c8c 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_SAT_ADD:
     case IFN_VEC_WIDEN_PLUS:
     case IFN_VEC_WIDEN_PLUS_LO:
     case IFN_VEC_WIDEN_PLUS_HI:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 848bb9dbff3..25badbb86e5 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
 			      smulhrs, umulhrs, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
+
 DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
 DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
 DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index d401e7503e6..7058e4cbe29 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3043,6 +3043,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        || POINTER_TYPE_P (itype))
       && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
 
+/* Unsigned Saturation Add */
+(match (usadd_left_part @0 @1)
+ (plus:c @0 @1)
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part @0 @1)
+ (negate (convert (lt (plus:c @0 @1) @0)))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+(match (usadd_right_part @0 @1)
+ (negate (convert (gt @0 (plus:c @0 @1))))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@0))
+      && types_match (type, TREE_TYPE (@1)))))
+
+/* Unsigned saturation add, case 1 (branchless):
+   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
+   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
+(match (unsigned_integer_sat_add @0 @1)
+ (bit_ior:c (usadd_left_part @0 @1) (usadd_right_part @0 @1)))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
    x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..3f2cb46aff8 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
 OPTAB_NX(add_optab, "add$Q$a3")
 OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
 OPTAB_VX(addv_optab, "add$F$a3")
-OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
-OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
+OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
+OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
 OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(sub_optab, "sub$F$a3")
 OPTAB_NX(sub_optab, "sub$Q$a3")
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 705f4a4695a..35a46edc9f6 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4026,6 +4026,44 @@ arith_overflow_check_p (gimple *stmt, gimple *cast_stmt, gimple *&use_stmt,
   return 0;
 }
 
+extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+
+/*
+ * Try to match saturation arith pattern(s).
+ *   1. SAT_ADD (unsigned)
+ *      _7 = _4 + _6;
+ *      _8 = _4 > _7;
+ *      _9 = (long unsigned int) _8;
+ *      _10 = -_9;
+ *      _12 = _7 | _10;
+ *      =>
+ *      _12 = .SAT_ADD (_4, _6);  */
+static bool
+match_saturation_arith (gimple_stmt_iterator *gsi, gimple *stmt,
+			bool *cfg_changed_p)
+{
+  gcall *call = NULL;
+  bool changed_p = false;
+
+  gcc_assert (is_gimple_assign (stmt));
+
+  tree ops[2];
+  tree lhs = gimple_assign_lhs (stmt);
+
+  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
+      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
+					OPTIMIZE_FOR_SPEED))
+    {
+      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
+      gimple_call_set_lhs (call, lhs);
+      gsi_replace (gsi, call, true);
+      changed_p = true;
+      *cfg_changed_p = changed_p;
+    }
+
+  return changed_p;
+}
+
 /* Recognize for unsigned x
    x = y - z;
    if (x > y)
@@ -5886,6 +5924,14 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
 
   fma_deferring_state fma_state (param_avoid_fma_max_bits > 0);
 
+  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+
+      if (is_gimple_assign (stmt))
+	match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
+    }
+
   for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
     {
       gimple *stmt = gsi_stmt (gsi);
-- 
2.34.1



* [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int
  2024-04-06 12:07 [PATCH v1] Internal-fn: Introduce new internal function SAT_ADD pan2.li
                   ` (2 preceding siblings ...)
  2024-05-06 14:48 ` [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int pan2.li
@ 2024-05-06 14:49 ` pan2.li
  2024-05-13  9:10   ` Tamar Christina
  2024-05-14 13:21   ` Richard Biener
  2024-05-06 14:50 ` [PATCH v4 3/3] RISC-V: Implement IFN SAT_ADD for both the scalar and vector pan2.li
  4 siblings, 2 replies; 21+ messages in thread
From: pan2.li @ 2024-05-06 14:49 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, tamar.christina, richard.guenther,
	hongtao.liu, Pan Li

From: Pan Li <pan2.li@intel.com>

This patch depends on the scalar enabling patch below:

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650822.html

For vectorization, we leverage the existing vect pattern recognition to
find the same pattern as in the scalar case, and let the vectorizer
handle the rest via the standard name usadd<mode>3 in vector mode.
The riscv vector backend has the "Vector Single-Width Saturating
Add and Subtract" instructions, which can be leveraged when expanding
usadd<mode>3 in vector mode.  For example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
  ivtmp_58 = _80 * 8;
  vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
  vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
  vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
  mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
  vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
  .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
  vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
  vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
  vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
  ivtmp_79 = ivtmp_78 - _80;
  ...
}

After this patch:
void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
  ivtmp_46 = _62 * 8;
  vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
  vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
  vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
  .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
  ...
}
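
One way to convince oneself that a single .SAT_ADD per element is the
right replacement is to compare the loop against a plain clamp.  A
minimal sketch, not part of the patch: sat_add_ref and
vec_sat_add_selfcheck are hypothetical names, and the input values are
arbitrary.

```c
#include <stdint.h>

/* The loop from the example above.  */
void
vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

/* Hypothetical reference: clamp to UINT64_MAX without relying on
   wraparound.  */
static uint64_t
sat_add_ref (uint64_t x, uint64_t y)
{
  return y > UINT64_MAX - x ? UINT64_MAX : x + y;
}

/* Hypothetical self-check: returns 1 if every element produced by the
   loop matches the reference, 0 otherwise.  */
int
vec_sat_add_selfcheck (void)
{
  uint64_t x[] = { 0, 1, UINT64_MAX, UINT64_MAX - 1, 1234567 };
  uint64_t y[] = { 0, UINT64_MAX, UINT64_MAX, 1, 7654321 };
  uint64_t out[5];
  unsigned i;

  vec_sat_add_u64 (out, x, y, 5);
  for (i = 0; i < 5; i++)
    if (out[i] != sat_add_ref (x[i], y[i]))
      return 0;
  return 1;
}
```

The inputs include both wrapping and non-wrapping pairs, so both sides
of the mask are exercised.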

The following test suites pass with this patch.
* The riscv full regression tests.
* The aarch64 full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New func
	decl generated by match.pd match.
	(vect_recog_sat_add_pattern): New func impl to recog the pattern
	for unsigned SAT_ADD.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/tree-vect-patterns.cc | 51 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 87c2acff386..8ffcaf71d5c 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4487,6 +4487,56 @@ vect_recog_mult_pattern (vec_info *vinfo,
   return pattern_stmt;
 }
 
+extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+
+/*
+ * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
+ *   _7 = _4 + _6;
+ *   _8 = _4 > _7;
+ *   _9 = (long unsigned int) _8;
+ *   _10 = -_9;
+ *   _12 = _7 | _10;
+ *
+ * And then simplified to
+ *   _12 = .SAT_ADD (_4, _6);
+ */
+
+static gimple *
+vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
+			    tree *type_out)
+{
+  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
+
+  if (!is_gimple_assign (last_stmt))
+    return NULL;
+
+  tree res_ops[2];
+  tree lhs = gimple_assign_lhs (last_stmt);
+
+  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
+    {
+      tree itype = TREE_TYPE (res_ops[0]);
+      tree vtype = get_vectype_for_scalar_type (vinfo, itype);
+
+      if (vtype != NULL_TREE && direct_internal_fn_supported_p (
+	IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
+	{
+	  *type_out = vtype;
+	  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, res_ops[0],
+						    res_ops[1]);
+
+	  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
+	  gimple_call_set_nothrow (call, /* nothrow_p */ false);
+	  gimple_set_location (call, gimple_location (last_stmt));
+
+	  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
+	  return call;
+	}
+    }
+
+  return NULL;
+}
+
 /* Detect a signed division by a constant that wouldn't be
    otherwise vectorized:
 
@@ -6987,6 +7037,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
   { vect_recog_divmod_pattern, "divmod" },
   { vect_recog_mult_pattern, "mult" },
+  { vect_recog_sat_add_pattern, "sat_add" },
   { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
   { vect_recog_gcond_pattern, "gcond" },
   { vect_recog_bool_pattern, "bool" },
-- 
2.34.1



* [PATCH v4 3/3] RISC-V: Implement IFN SAT_ADD for both the scalar and vector
  2024-04-06 12:07 [PATCH v1] Internal-fn: Introduce new internal function SAT_ADD pan2.li
                   ` (3 preceding siblings ...)
  2024-05-06 14:49 ` [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int pan2.li
@ 2024-05-06 14:50 ` pan2.li
  4 siblings, 0 replies; 21+ messages in thread
From: pan2.li @ 2024-05-06 14:50 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, tamar.christina, richard.guenther,
	hongtao.liu, Pan Li

From: Pan Li <pan2.li@intel.com>

This patch depends on the middle-end enabling patches below for scalar and vector.

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650822.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650823.html

This patch implements SAT_ADD in the riscv backend as a sample for
both scalar and vector.  Take the vector case below as an
example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}
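
As a sanity check on the branchless form used in the loop above, the
sketch below compares it with a plain widen-and-clamp reference over
every uint8_t input pair.  The function names are made up for this
illustration and are not part of the patch; uint8_t is chosen only
because it can be checked exhaustively.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form from the commit message, specialised to uint8_t.
   Note that x + y is computed in int after integer promotion, so the
   cast back to uint8_t is what makes the overflow test work.  */
static uint8_t
sat_add_u8_branchless (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t)((uint8_t)(x + y) < x));
}

/* Reference: add in a wider type, then clamp to the maximum.  */
static uint8_t
sat_add_u8_reference (uint8_t x, uint8_t y)
{
  unsigned wide = (unsigned) x + y;
  return wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
}

/* Return 1 if both forms agree for all 65536 input pairs, else 0.  */
static int
check_all_u8 (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      if (sat_add_u8_branchless ((uint8_t) x, (uint8_t) y)
	  != sat_add_u8_reference ((uint8_t) x, (uint8_t) y))
	return 0;
  return 1;
}
```

Calling check_all_u8 () from a test driver should return 1; the same
comparison is expected to hold for the wider unsigned types, although
those cannot be checked exhaustively this way.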

Before this patch:
vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv       v0,v0,v1
  vmerge.vim      v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli    a4,a5,3
  sub     a3,a3,a5
  add     a1,a1,a4
  add     a2,a2,a4
  vsaddu.vv       v1,v1,v2  <=  Vector Single-Width Saturating Add
  vse64.v v1,0(a0)
  ...
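
To make the difference between the two listings concrete, here is a
rough per-element model in C.  It assumes the usual reading of the
instructions (vadd.vv wraps, vmsgtu.vv sets a mask bit where x > sum,
vmerge.vim selects -1 where the mask is set); the function names are
hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the "before" sequence: vadd.vv, vmsgtu.vv, vmerge.vim.  */
static void
model_add_cmp_merge (uint64_t *out, const uint64_t *x, const uint64_t *y,
		     size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint64_t sum = x[i] + y[i];	  /* vadd.vv: wrapping add  */
      int mask = x[i] > sum;		  /* vmsgtu.vv: overflow mask  */
      out[i] = mask ? UINT64_MAX : sum;	  /* vmerge.vim: -1 where masked  */
    }
}

/* Model of the "after" sequence: one unsigned saturating add.  */
static void
model_saturating_add (uint64_t *out, const uint64_t *x, const uint64_t *y,
		      size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = y[i] > UINT64_MAX - x[i] ? UINT64_MAX : x[i] + y[i];
}

/* Return 1 if both models agree on a few boundary inputs, else 0.  */
static int
models_agree (void)
{
  static const uint64_t x[] = { 0, 1, 5, UINT64_MAX - 1, UINT64_MAX, 9 };
  static const uint64_t y[] = { 0, UINT64_MAX - 1, UINT64_MAX, UINT64_MAX,
				UINT64_MAX, 9 };
  uint64_t a[6], b[6];

  model_add_cmp_merge (a, x, y, 6);
  model_saturating_add (b, x, y, 6);

  for (size_t i = 0; i < 6; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

The two models are meant to produce identical results, which is why
the three-instruction sequence can be replaced by a single vsaddu.vv.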

The following test suites pass with this patch.
* The riscv full regression tests.
* The aarch64 full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

	PR target/51492
	PR target/112600

gcc/ChangeLog:

	* config/riscv/autovec.md (usadd<mode>3): New pattern expand for
	the unsigned SAT_ADD in vector mode.
	* config/riscv/riscv-protos.h (riscv_expand_usadd): New func decl
	to expand usadd<mode>3 pattern.
	(expand_vec_usadd): Ditto but for vector.
	* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to emit
	the vsaddu insn.
	(expand_vec_usadd): New func impl to expand usadd<mode>3 for vector.
	* config/riscv/riscv.cc (riscv_expand_usadd): New func impl to
	expand usadd<mode>3 for scalar.
	* config/riscv/riscv.md (usadd<mode>3): New pattern expand for
	the unsigned SAT_ADD in scalar mode.
	* config/riscv/vector.md: Allow VLS mode for vsaddu.
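
For readers who want to follow the scalar expansion in
riscv_expand_usadd without reading RTL, the sketch below mirrors its
steps in plain C for the QI/HI case on a 64-bit target.  It is only a
model: the helper name is hypothetical, the inputs are assumed to be
zero-extended in their 64-bit registers, and the final lowpart move is
modelled as a truncation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the usadd expansion for an unsigned value of
   width BITS (8 or 16) held zero-extended in a 64-bit register.  */
static uint64_t
usadd_narrow_model (uint64_t x, uint64_t y, unsigned bits)
{
  unsigned shift = 64 - bits;

  uint64_t sum = x + y;			/* Step-1: sum = x + y  */
  sum = (sum << shift) >> shift;	/* Step-1.1: truncate the sum  */
  uint64_t lt = sum < x;		/* Step-2: lt = sum < x (sltu)  */
  lt = -lt;				/* Step-3: lt = -lt (0 or all ones)  */
  uint64_t dest = sum | lt;		/* Step-4: dest = sum | lt  */
  return (dest << shift) >> shift;	/* Step-5: keep the low BITS bits  */
}

/* Return 1 if the model saturates correctly for every uint8_t pair.  */
static int
check_usadd_u8 (void)
{
  for (uint64_t x = 0; x <= UINT8_MAX; x++)
    for (uint64_t y = 0; y <= UINT8_MAX; y++)
      {
	uint64_t expect = x + y > UINT8_MAX ? UINT8_MAX : x + y;
	if (usadd_narrow_model (x, y, 8) != expect)
	  return 0;
      }
  return 1;
}
```

The truncation in Step-1.1 matters because the comparison in Step-2 is
done on the full register: without it a wrapped 8-bit or 16-bit sum
would still compare as greater than x.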

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New test.
	* gcc.target/riscv/sat_arith.h: New test.
	* gcc.target/riscv/sat_u_add-1.c: New test.
	* gcc.target/riscv/sat_u_add-2.c: New test.
	* gcc.target/riscv/sat_u_add-3.c: New test.
	* gcc.target/riscv/sat_u_add-4.c: New test.
	* gcc.target/riscv/sat_u_add-run-1.c: New test.
	* gcc.target/riscv/sat_u_add-run-2.c: New test.
	* gcc.target/riscv/sat_u_add-run-3.c: New test.
	* gcc.target/riscv/sat_u_add-run-4.c: New test.
	* gcc.target/riscv/scalar_sat_binary.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/autovec.md                   | 17 +++++
 gcc/config/riscv/riscv-protos.h               |  2 +
 gcc/config/riscv/riscv-v.cc                   | 16 ++++
 gcc/config/riscv/riscv.cc                     | 47 ++++++++++++
 gcc/config/riscv/riscv.md                     | 11 +++
 gcc/config/riscv/vector.md                    | 12 +--
 .../riscv/rvv/autovec/binop/vec_sat_binary.h  | 33 ++++++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-1.c | 19 +++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-2.c | 20 +++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-3.c | 20 +++++
 .../riscv/rvv/autovec/binop/vec_sat_u_add-4.c | 20 +++++
 .../rvv/autovec/binop/vec_sat_u_add-run-1.c   | 75 +++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-2.c   | 75 +++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-3.c   | 75 +++++++++++++++++++
 .../rvv/autovec/binop/vec_sat_u_add-run-4.c   | 75 +++++++++++++++++++
 gcc/testsuite/gcc.target/riscv/sat_arith.h    | 31 ++++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c  | 19 +++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c  | 21 ++++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c  | 18 +++++
 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c  | 17 +++++
 .../gcc.target/riscv/sat_u_add-run-1.c        | 25 +++++++
 .../gcc.target/riscv/sat_u_add-run-2.c        | 25 +++++++
 .../gcc.target/riscv/sat_u_add-run-3.c        | 25 +++++++
 .../gcc.target/riscv/sat_u_add-run-4.c        | 25 +++++++
 .../gcc.target/riscv/scalar_sat_binary.h      | 27 +++++++
 25 files changed, 744 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_arith.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..7ceeb8d64f6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,20 @@ (define_expand "rawmemchr<ANYI:mode>"
     DONE;
   }
 )
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Saturation ALU.
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - add
+;; -------------------------------------------------------------------------
+(define_expand "usadd<mode>3"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")
+   (match_operand:V_VLSI 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_usadd (operands[0], operands[1], operands[2], <MODE>mode);
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index e5aebf3fc3d..0d95ecb6508 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -133,6 +133,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *);
 extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
+extern void riscv_expand_usadd (rtx, rtx, rtx);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0);
@@ -621,6 +622,7 @@ void expand_vec_lrint (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lround (rtx, rtx, machine_mode, machine_mode, machine_mode);
 void expand_vec_lceil (rtx, rtx, machine_mode, machine_mode);
 void expand_vec_lfloor (rtx, rtx, machine_mode, machine_mode);
+void expand_vec_usadd (rtx, rtx, rtx, machine_mode);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
 			  bool, void (*)(rtx *, rtx), enum avl_type);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c9e0feebca6..c34111f89b8 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4635,6 +4635,16 @@ emit_vec_cvt_x_f_rtz (rtx op_dest, rtx op_src, rtx mask,
     }
 }
 
+static void
+emit_vec_saddu (rtx op_dest, rtx op_1, rtx op_2, insn_type type,
+		machine_mode vec_mode)
+{
+  rtx ops[] = {op_dest, op_1, op_2};
+  insn_code icode = code_for_pred (US_PLUS, vec_mode);
+
+  emit_vlmax_insn (icode, type, ops);
+}
+
 void
 expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 		 machine_mode vec_int_mode)
@@ -4862,6 +4872,12 @@ expand_vec_lfloor (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 				vec_int_mode);
 }
 
+void
+expand_vec_usadd (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  emit_vec_saddu (op_0, op_1, op_2, BINARY_OP, vec_mode);
+}
+
 /* Vectorize popcount by the Wilkes-Wheeler-Gill algorithm that libgcc uses as
    well.  */
 void
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 44945d47fd6..4445d77d5a9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11127,6 +11127,53 @@ riscv_get_raw_result_mode (int regno)
   return default_get_reg_raw_mode (regno);
 }
 
+/* Implements the unsigned saturation add standard name usadd for int mode.  */
+
+void
+riscv_expand_usadd (rtx dest, rtx x, rtx y)
+{
+  machine_mode mode = GET_MODE (dest);
+  rtx xmode_sum = gen_reg_rtx (Xmode);
+  rtx xmode_lt = gen_reg_rtx (Xmode);
+  rtx xmode_x = gen_lowpart (Xmode, x);
+  rtx xmode_y = gen_lowpart (Xmode, y);
+  rtx xmode_dest = gen_reg_rtx (Xmode);
+
+  /* Step-1: sum = x + y  */
+  if (mode == SImode && mode != Xmode)
+    { /* Use addw to avoid truncating the sum separately.  */
+      rtx simode_sum = gen_reg_rtx (SImode);
+      riscv_emit_binary (PLUS, simode_sum, x, y);
+      emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));
+    }
+  else
+    riscv_emit_binary (PLUS, xmode_sum, xmode_x, xmode_y);
+
+  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  */
+  if (mode == HImode || mode == QImode)
+    {
+      int shift_bits = GET_MODE_BITSIZE (Xmode)
+	- GET_MODE_BITSIZE (mode).to_constant ();
+
+      gcc_assert (shift_bits > 0);
+
+      riscv_emit_binary (ASHIFT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+      riscv_emit_binary (LSHIFTRT, xmode_sum, xmode_sum, GEN_INT (shift_bits));
+    }
+
+  /* Step-2: lt = sum < x  */
+  riscv_emit_binary (LTU, xmode_lt, xmode_sum, xmode_x);
+
+  /* Step-3: lt = -lt  */
+  riscv_emit_unary (NEG, xmode_lt, xmode_lt);
+
+  /* Step-4: xmode_dest = sum | lt  */
+  riscv_emit_binary (IOR, xmode_dest, xmode_lt, xmode_sum);
+
+  /* Step-5: dest = xmode_dest */
+  emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index d4676507b45..b8639e9bc15 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4006,6 +4006,17 @@ (define_insn "*large_load_address"
   [(set_attr "type" "load")
    (set (attr "length") (const_int 8))])
 
+(define_expand "usadd<mode>3"
+  [(match_operand:ANYI 0 "register_operand")
+   (match_operand:ANYI 1 "register_operand")
+   (match_operand:ANYI 2 "register_operand")]
+  ""
+  {
+    riscv_expand_usadd (operands[0], operands[1], operands[2]);
+    DONE;
+  }
+)
+
 (include "bitmanip.md")
 (include "crypto.md")
 (include "sync.md")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 228d0f9a766..f8ed61f4a13 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -4062,8 +4062,8 @@ (define_insn "@pred_trunc<mode>"
 
 ;; Saturating Add and Subtract
 (define_insn "@pred_<optab><mode>"
-  [(set (match_operand:VI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
-	(if_then_else:VI
+  [(set (match_operand:V_VLSI 0 "register_operand"           "=vd, vd, vr, vr, vd, vd, vr, vr")
+	(if_then_else:V_VLSI
 	  (unspec:<VM>
 	    [(match_operand:<VM> 1 "vector_mask_operand" " vm, vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
 	     (match_operand 5 "vector_length_operand"    " rK, rK, rK, rK, rK, rK, rK, rK")
@@ -4072,10 +4072,10 @@ (define_insn "@pred_<optab><mode>"
 	     (match_operand 8 "const_int_operand"        "  i,  i,  i,  i,  i,  i,  i,  i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-	  (any_sat_int_binop:VI
-	    (match_operand:VI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
-	    (match_operand:VI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
-	  (match_operand:VI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
+	  (any_sat_int_binop:V_VLSI
+	    (match_operand:V_VLSI 3 "<binop_rhs1_predicate>" " vr, vr, vr, vr, vr, vr, vr, vr")
+	    (match_operand:V_VLSI 4 "<binop_rhs2_predicate>" "<binop_rhs2_constraint>"))
+	  (match_operand:V_VLSI 2 "vector_merge_operand"     " vu,  0, vu,  0, vu,  0, vu,  0")))]
   "TARGET_VECTOR"
   "@
    v<insn>.vv\t%0,%3,%4%p1
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
new file mode 100644
index 00000000000..0976ae97830
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
@@ -0,0 +1,33 @@
+#ifndef HAVE_DEFINED_VEC_SAT_BINARY
+#define HAVE_DEFINED_VEC_SAT_BINARY
+
+/* To use this header file for a run test, you need to:
+   1. define T as the type, for example uint8_t,
+   2. define N as the test array size, for example 16,
+   3. define RUN_VEC_SAT_BINARY as the run function,
+   4. prepare the test_data for the test cases.
+ */
+
+int
+main ()
+{
+  unsigned i, k;
+  T out[N];
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      T *op_1 = test_data[i][0];
+      T *op_2 = test_data[i][1];
+      T *expect = test_data[i][2];
+
+      RUN_VEC_SAT_BINARY (T, out, op_1, op_2, N);
+
+      for (k = 0; k < N; k++)
+	if (out[k] != expect[k])
+	  __builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
new file mode 100644
index 00000000000..dbbfa00afe2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint8_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e8,\s*m1,\s*ta,\s*ma
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle8\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
new file mode 100644
index 00000000000..1253fdb5f60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint16_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
new file mode 100644
index 00000000000..74bba9cadd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint32_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
+** ...
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint32_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
new file mode 100644
index 00000000000..f3692b4cc25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint64_t_fmt_1:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
+** ...
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_1(uint64_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
new file mode 100644
index 00000000000..1dcb333f687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint8_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+    {
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+    },
+  },
+  {
+    {
+	0,   0,   1,   0,
+	1,   2,   3,   0,
+	1,   2,   3,   4,
+	5, 254, 255,   9,
+    },
+    {
+	0,   1,   1, 254,
+      254, 254, 254, 255,
+      255, 255, 255, 255,
+      255, 255, 255,   9,
+    },
+    {
+	0,   1,   2, 254,
+      255, 255, 255, 255,
+      255, 255, 255, 255,
+      255, 255, 255,  18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
new file mode 100644
index 00000000000..dbf01ac863d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint16_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+    {
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+    },
+  },
+  {
+    {
+	  0,     0,     1,     0,
+	  1,     2,     3,     0,
+	  1,     2,     3,     4,
+	  5, 65534, 65535,     9,
+    },
+    {
+	  0,     1,     1, 65534,
+      65534, 65534, 65534, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,     9,
+    },
+    {
+	  0,     1,     2, 65534,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535, 65535,
+      65535, 65535, 65535,    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
new file mode 100644
index 00000000000..20ad2736403
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint32_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+    {
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+    },
+  },
+  {
+    {
+	       0,          0,          1,          0,
+	       1,          2,          3,          0,
+	       1,          2,          3,          4,
+	       5, 4294967294, 4294967295,          9,
+    },
+    {
+	       0,          1,          1, 4294967294,
+      4294967294, 4294967294, 4294967294, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,          9,
+    },
+    {
+	       0,          1,          2, 4294967294,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295, 4294967295,
+      4294967295, 4294967295, 4294967295,         18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
new file mode 100644
index 00000000000..2f31edc527e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c
@@ -0,0 +1,75 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "../../../sat_arith.h"
+
+#define T                  uint64_t
+#define N                  16
+#define RUN_VEC_SAT_BINARY RUN_VEC_SAT_U_ADD_FMT_1
+
+DEF_VEC_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3][N] = {
+  {
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_0 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* arg_1 */
+    {
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+      0, 0, 0, 0,
+    }, /* expect */
+  },
+  {
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+    {
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+    },
+  },
+  {
+    {
+			  0,                     0,                     1,                     0,
+			  1,                     2,                     3,                     0,
+			  1,                     2,                     3,                     4,
+			  5, 18446744073709551614u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     1, 18446744073709551614u,
+      18446744073709551614u, 18446744073709551614u, 18446744073709551614u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                     9,
+    },
+    {
+			  0,                     1,                     2, 18446744073709551614u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u, 18446744073709551615u,
+      18446744073709551615u, 18446744073709551615u, 18446744073709551615u,                    18,
+    },
+  },
+};
+
+#include "vec_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h b/gcc/testsuite/gcc.target/riscv/sat_arith.h
new file mode 100644
index 00000000000..2ef9fd825f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -0,0 +1,31 @@
+#ifndef HAVE_SAT_ARITH
+#define HAVE_SAT_ARITH
+
+#include <stdint-gcc.h>
+
+#define DEF_SAT_U_ADD_FMT_1(T)             \
+T __attribute__((noinline))                \
+sat_u_add_##T##_fmt_1 (T x, T y)           \
+{                                          \
+  return (x + y) | (-(T)((T)(x + y) < x)); \
+}
+
+#define DEF_VEC_SAT_U_ADD_FMT_1(T)                                   \
+void __attribute__((noinline))                                       \
+vec_sat_u_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \
+{                                                                    \
+  unsigned i;                                                        \
+  for (i = 0; i < limit; i++)                                        \
+    {                                                                \
+      T x = op_1[i];                                                 \
+      T y = op_2[i];                                                 \
+      out[i] = (x + y) | (-(T)((T)(x + y) < x));                     \
+    }                                                                \
+}
+
+#define RUN_SAT_U_ADD_FMT_1(T, x, y) sat_u_add_##T##_fmt_1(x, y)
+
+#define RUN_VEC_SAT_U_ADD_FMT_1(T, out, op_1, op_2, N) \
+  vec_sat_u_add_##T##_fmt_1(out, op_1, op_2, N)
+
+#endif
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
new file mode 100644
index 00000000000..609e1ea343b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint8_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
new file mode 100644
index 00000000000..d30436c782a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint16_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
new file mode 100644
index 00000000000..12347c607bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint32_t_fmt_1:
+** addw\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** sext.w\s+a0,\s*a0
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint32_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
new file mode 100644
index 00000000000..f2c6b74d917
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_uint64_t_fmt_1:
+** add\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+a0,\s*[atx][0-9]+,\s*[atx][0-9]+
+** ret
+*/
+DEF_SAT_U_ADD_FMT_1(uint64_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
new file mode 100644
index 00000000000..f1972490006
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint8_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0,   254,    254, },
+  {      1,   254,    255, },
+  {      2,   254,    255, },
+  {      0,   255,    255, },
+  {      1,   255,    255, },
+  {      2,   255,    255, },
+  {    255,   255,    255, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
new file mode 100644
index 00000000000..cb3879d0cde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-2.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint16_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /* arg_0, arg_1, expect */
+  {      0,     0,      0, },
+  {      0,     1,      1, },
+  {      1,     1,      2, },
+  {      0, 65534,  65534, },
+  {      1, 65534,  65535, },
+  {      2, 65534,  65535, },
+  {      0, 65535,  65535, },
+  {      1, 65535,  65535, },
+  {      2, 65535,  65535, },
+  {  65535, 65535,  65535, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
new file mode 100644
index 00000000000..c9a6080ca3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-3.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint32_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*     arg_0,      arg_1,      expect */
+  {          0,          0,           0, },
+  {          0,          1,           1, },
+  {          1,          1,           2, },
+  {          0, 4294967294,  4294967294, },
+  {          1, 4294967294,  4294967295, },
+  {          2, 4294967294,  4294967295, },
+  {          0, 4294967295,  4294967295, },
+  {          1, 4294967295,  4294967295, },
+  {          2, 4294967295,  4294967295, },
+  { 4294967295, 4294967295,  4294967295, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
new file mode 100644
index 00000000000..c19b7e22387
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add-run-4.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+
+#define T              uint64_t
+#define RUN_SAT_BINARY RUN_SAT_U_ADD_FMT_1
+
+DEF_SAT_U_ADD_FMT_1(T)
+
+T test_data[][3] = {
+  /*                arg_0,                 arg_1,                 expect */
+  {                     0,                     0,                      0, },
+  {                     0,                     1,                      1, },
+  {                     1,                     1,                      2, },
+  {                     0, 18446744073709551614u,  18446744073709551614u, },
+  {                     1, 18446744073709551614u,  18446744073709551615u, },
+  {                     2, 18446744073709551614u,  18446744073709551615u, },
+  {                     0, 18446744073709551615u,  18446744073709551615u, },
+  {                     1, 18446744073709551615u,  18446744073709551615u, },
+  {                     2, 18446744073709551615u,  18446744073709551615u, },
+  { 18446744073709551615u, 18446744073709551615u,  18446744073709551615u, },
+};
+
+#include "scalar_sat_binary.h"
diff --git a/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
new file mode 100644
index 00000000000..cbb2d750107
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/scalar_sat_binary.h
@@ -0,0 +1,27 @@
+#ifndef HAVE_DEFINED_SCALAR_SAT_BINARY
+#define HAVE_DEFINED_SCALAR_SAT_BINARY
+
+/* To leverage this header files for run test, you need to:
+   1. define T as the type, for example uint8_t,
+   2. define RUN_SAT_BINARY as run function.
+   3. prepare the test_data for test cases.
+ */
+
+int
+main ()
+{
+  unsigned i;
+  T *d;
+
+  for (i = 0; i < sizeof (test_data) / sizeof (test_data[0]); i++)
+    {
+      d = test_data[i];
+
+      if (RUN_SAT_BINARY (T, d[0], d[1]) != d[2])
+	__builtin_abort ();
+    }
+
+  return 0;
+}
+
+#endif
-- 
2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-06 14:48 ` [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int pan2.li
@ 2024-05-13  9:09   ` Tamar Christina
  2024-05-13 13:36     ` Li, Pan2
  2024-05-14 13:18   ` Richard Biener
  1 sibling, 1 reply; 21+ messages in thread
From: Tamar Christina @ 2024-05-13  9:09 UTC (permalink / raw)
  To: pan2.li, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, hongtao.liu

Hi Pan,

> -----Original Message-----
> From: pan2.li@intel.com <pan2.li@intel.com>
> Sent: Monday, May 6, 2024 3:48 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar
> int
> 
> From: Pan Li <pan2.li@intel.com>
> 
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
> 
> Given below example for the unsigned scalar integer uint64_t:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;    succ:       EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;    succ:       EXIT
> }
> 
> We perform the transform during widen_mult because the sub-expr of
> SAT_ADD will be optimized to .ADD_OVERFLOW.  We need to try the .SAT_ADD
> pattern first and then .ADD_OVERFLOW, or we may never catch the pattern
> .SAT_ADD.  Meanwhile, the isel pass is after widen_mult and then we
> cannot perform the .SAT_ADD pattern match as the sub-expr will be
> optimized to .ADD_OVERFLOW first.
> 
> The tests below pass with this patch:
> 1. The riscv full regression tests.
> 2. The aarch64 full regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 full regression tests.
> 
> 	PR target/51492
> 	PR target/112600
> 
> gcc/ChangeLog:
> 
> 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> 	to the return true switch case(s).
> 	* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> 	* match.pd: Add unsigned SAT_ADD match.
> 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> 	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern
> 	func decl generated in match.pd match.
> 	(match_saturation_arith): New func impl to match the saturation arith.
> 	(math_opts_dom_walker::after_dom_children): Try match saturation
> 	arith.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/internal-fn.cc        |  1 +
>  gcc/internal-fn.def       |  2 ++
>  gcc/match.pd              | 28 ++++++++++++++++++++++++
>  gcc/optabs.def            |  4 ++--
>  gcc/tree-ssa-math-opts.cc | 46
> +++++++++++++++++++++++++++++++++++++++
>  5 files changed, 79 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_SAT_ADD:
>      case IFN_VEC_WIDEN_PLUS:
>      case IFN_VEC_WIDEN_PLUS_LO:
>      case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..25badbb86e5 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> | ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> first,
>  			      smulhrs, umulhrs, binary)
> 
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> binary)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d401e7503e6..7058e4cbe29 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || POINTER_TYPE_P (itype))
>        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> 
> +/* Unsigned Saturation Add */
> +(match (usadd_left_part @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part @0 @1)
> + (negate (convert (lt (plus:c @0 @1) @0)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part @0 @1)
> + (negate (convert (gt @0 (plus:c @0 @1))))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +/* Unsigned saturation add, case 1 (branchless):
> +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c (usadd_left_part @0 @1) (usadd_right_part @0 @1)))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..3f2cb46aff8 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
>  OPTAB_NX(add_optab, "add$Q$a3")
>  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
>  OPTAB_VX(addv_optab, "add$F$a3")
> -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> gen_signed_fixed_libfunc)
> -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> gen_unsigned_fixed_libfunc)
> +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> gen_signed_fixed_libfunc)
> +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> gen_unsigned_fixed_libfunc)
>  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(sub_optab, "sub$F$a3")
>  OPTAB_NX(sub_optab, "sub$Q$a3")
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 705f4a4695a..35a46edc9f6 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4026,6 +4026,44 @@ arith_overflow_check_p (gimple *stmt, gimple
> *cast_stmt, gimple *&use_stmt,
>    return 0;
>  }
> 
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to match saturation arith pattern(s).
> + *   1. SAT_ADD (unsigned)
> + *      _7 = _4 + _6;
> + *      _8 = _4 > _7;
> + *      _9 = (long unsigned int) _8;
> + *      _10 = -_9;
> + *      _12 = _7 | _10;
> + *      =>
> + *      _12 = .SAT_ADD (_4, _6);  */
> +static bool
> +match_saturation_arith (gimple_stmt_iterator *gsi, gimple *stmt,
> +			bool *cfg_changed_p)
> +{
> +  gcall *call = NULL;
> +  bool changed_p = false;
> +
> +  gcc_assert (is_gimple_assign (stmt));
> +
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> +					OPTIMIZE_FOR_SPEED))

I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when optimizing for size.
> +    {
> +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> +      gimple_call_set_lhs (call, lhs);
> +      gsi_replace (gsi, call, true);
> +      changed_p = true;
> +      *cfg_changed_p = changed_p;
> +    }
> +
> +  return changed_p;
> +}
> +
>  /* Recognize for unsigned x
>     x = y - z;
>     if (x > y)
> @@ -5886,6 +5924,14 @@ math_opts_dom_walker::after_dom_children
> (basic_block bb)
> 
>    fma_deferring_state fma_state (param_avoid_fma_max_bits > 0);
> 
> +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple *stmt = gsi_stmt (gsi);
> +
> +      if (is_gimple_assign (stmt))
> +	match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
> +    }
> +

Hmm, why do you iterate independently over the statements? The block below already visits
every statement, doesn't it?

The root of your match is a BIT_IOR_EXPR expression, so I think you just need to change the entry below to:

	    case BIT_IOR_EXPR:
	      match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
	      /* fall-through */
	    case BIT_XOR_EXPR:
	      match_uaddc_usubc (&gsi, stmt, code);
	      break;
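
To make the point about the root concrete, here is a small standalone C
sketch (not part of the patch; the name sat_add_steps is made up for
illustration) that spells out the branchless form one statement at a time,
mirroring the gimple listed in the comment above match_saturation_arith.
The statement that defines the result is the bitwise OR:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.  Mirrors the gimple
   _7 = _4 + _6; _8 = _4 > _7; _9 = (T) _8; _10 = -_9; _12 = _7 | _10.  */
uint64_t
sat_add_steps (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;               /* _7: wrapping add.  */
  _Bool wrapped = x > sum;            /* _8: true iff the add wrapped.  */
  uint64_t flag = (uint64_t) wrapped; /* _9: 0 or 1.  */
  uint64_t mask = -flag;              /* _10: 0 or all ones.  */
  return sum | mask;                  /* _12: the IOR at the root.  */
}
```

With this shape, sat_add_steps (1, 2) gives 3 and any wrapping add gives
UINT64_MAX, because the mask is all ones exactly when the add wrapped.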

Patch is looking good! Thanks again for working on this.

Regards,
Tamar

>    for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
>      {
>        gimple *stmt = gsi_stmt (gsi);
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int
  2024-05-06 14:49 ` [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int pan2.li
@ 2024-05-13  9:10   ` Tamar Christina
  2024-05-14 13:21   ` Richard Biener
  1 sibling, 0 replies; 21+ messages in thread
From: Tamar Christina @ 2024-05-13  9:10 UTC (permalink / raw)
  To: pan2.li, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, hongtao.liu

Hi Pan,

> -----Original Message-----
> From: pan2.li@intel.com <pan2.li@intel.com>
> Sent: Monday, May 6, 2024 3:49 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> Subject: [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int
> 
> From: Pan Li <pan2.li@intel.com>
> 
> This patch depends on below scalar enabling patch:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650822.html
> 
> For vectorization, we leverage the existing vect pattern recog to find
> the same pattern as the scalar case and let the vectorizer handle
> the rest for the standard name usadd<mode>3 in vector mode.
> The riscv vector backend has the insn "Vector Single-Width Saturating
> Add and Subtract", which can be leveraged when expanding usadd<mode>3
> in vector mode.  For example:
> 
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   unsigned i;
> 
>   for (i = 0; i < n; i++)
>     out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> }
> 
> Before this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
>   ivtmp_58 = _80 * 8;
>   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
>   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
>   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
>   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
>   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615,
> ... }, vect__7.11_66);
>   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
>   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
>   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
>   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
>   ivtmp_79 = ivtmp_78 - _80;
>   ...
> }
> 
> After this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
>   ivtmp_46 = _62 * 8;
>   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
>   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
>   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
>   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
>   ...
> }
> 
> The test suites below pass with this patch.
> * The riscv full regression tests.
> * The aarch64 full regression tests.
> * The x86 bootstrap tests.
> * The x86 full regression tests.
> 
> 	PR target/51492
> 	PR target/112600
> 
> gcc/ChangeLog:
> 
> 	* tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New func
> 	decl generated by match.pd match.
> 	(vect_recog_sat_add_pattern): New func impl to recog the pattern
> 	for unsigned SAT_ADD.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>

Patch looks good to me, but I cannot approve so I'll pass it on to Richi.

Cheers,
Tamar

> ---
>  gcc/tree-vect-patterns.cc | 51 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 87c2acff386..8ffcaf71d5c 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4487,6 +4487,56 @@ vect_recog_mult_pattern (vec_info *vinfo,
>    return pattern_stmt;
>  }
> 
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> + *   _7 = _4 + _6;
> + *   _8 = _4 > _7;
> + *   _9 = (long unsigned int) _8;
> + *   _10 = -_9;
> + *   _12 = _7 | _10;
> + *
> + * And then simplied to
> + *   _12 = .SAT_ADD (_4, _6);
> + */
> +
> +static gimple *
> +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> +			    tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +
> +  if (!is_gimple_assign (last_stmt))
> +    return NULL;
> +
> +  tree res_ops[2];
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> +    {
> +      tree itype = TREE_TYPE (res_ops[0]);
> +      tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +
> +      if (vtype != NULL_TREE && direct_internal_fn_supported_p (
> +	IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))
> +	{
> +	  *type_out = vtype;
> +	  gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, res_ops[0],
> +						    res_ops[1]);
> +
> +	  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> +	  gimple_call_set_nothrow (call, /* nothrow_p */ false);
> +	  gimple_set_location (call, gimple_location (last_stmt));
> +
> +	  vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> +	  return call;
> +	}
> +    }
> +
> +  return NULL;
> +}
> +
>  /* Detect a signed division by a constant that wouldn't be
>     otherwise vectorized:
> 
> @@ -6987,6 +7037,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>    { vect_recog_divmod_pattern, "divmod" },
>    { vect_recog_mult_pattern, "mult" },
> +  { vect_recog_sat_add_pattern, "sat_add" },
>    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
>    { vect_recog_gcond_pattern, "gcond" },
>    { vect_recog_bool_pattern, "bool" },
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-13  9:09   ` Tamar Christina
@ 2024-05-13 13:36     ` Li, Pan2
  2024-05-13 15:03       ` Tamar Christina
  0 siblings, 1 reply; 21+ messages in thread
From: Li, Pan2 @ 2024-05-13 13:36 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

Thanks Tamar for the comments.

> I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when optimizing for size.

Sure thing, let me update it in v5.

> Hmm, why do you iterate independently over the statements? The block below already visits
> every statement, doesn't it?

Because the walk below hits .ADD_OVERFLOW first; once that rewrite changes the shape, the SAT_ADD pattern never matches. Or shall we move it to an earlier pass?
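
As an illustration only (a sketch, not part of the patch), the two shapes
correspond roughly to the C below.  GCC's __builtin_add_overflow stands in
for the .ADD_OVERFLOW internal function, and both function names are made
up.  In the second shape the compare against x is gone, so a matcher that
looks for (x + y) < x has nothing left to match:

```c
#include <stdint.h>

/* Source-level shape that the match.pd pattern looks for.  */
uint64_t
sat_add_cmp (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) ((uint64_t) (x + y) < x));
}

/* Rough equivalent of the shape after the .ADD_OVERFLOW rewrite:
   the sum and the overflow flag come from one operation.  */
uint64_t
sat_add_ovf (uint64_t x, uint64_t y)
{
  uint64_t sum;
  _Bool ovf = __builtin_add_overflow (x, y, &sum);
  return sum | (-(uint64_t) ovf);
}
```

Both functions compute the same value for every input; only the shape of
the expression differs.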

> The root of your match is a BIT_IOR_EXPR expression, so I think you just need to change the entry below to:
>
> 	    case BIT_IOR_EXPR:
> 	      match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
> 	      /* fall-through */
> 	    case BIT_XOR_EXPR:
> 	      match_uaddc_usubc (&gsi, stmt, code);
> 	      break;

There are other shapes of SAT_ADD (not covered in this patch), like the branch version below, so IOR is only one of the possible roots. That is why
I didn't add a case here.  Shall we instead add a case for each shape here? Either works for me.

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint32_t)
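
For reference, a minimal standalone sketch (assumed names, not part of the
patch) that checks the branch shape above against the branchless shape for
uint8_t.  The type is small enough to compare all 65536 input pairs:

```c
#include <stdint.h>

/* Branchless shape, as matched by this patch.  */
uint8_t
sat_add_u8_branchless (uint8_t x, uint8_t y)
{
  return (x + y) | (-(uint8_t) ((uint8_t) (x + y) < x));
}

/* Branch shape, same as SAT_ADD_U_1 with T = uint8_t.  */
uint8_t
sat_add_u8_branch (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) >= x ? (x + y) : -1;
}

/* Hypothetical checker: counts the input pairs where the two shapes
   disagree.  */
unsigned
count_mismatches (void)
{
  unsigned bad = 0;

  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      if (sat_add_u8_branchless (x, y) != sat_add_u8_branch (x, y))
	bad++;

  return bad;
}
```

The two shapes agree on every input, so both are candidates for .SAT_ADD
even though only the first has an IOR at its root.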

Pan


-----Original Message-----
From: Tamar Christina <Tamar.Christina@arm.com> 
Sent: Monday, May 13, 2024 5:10 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

Hi Pan,

> -----Original Message-----
> From: pan2.li@intel.com <pan2.li@intel.com>
> Sent: Monday, May 6, 2024 3:48 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar
> int
> 
> From: Pan Li <pan2.li@intel.com>
> 
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
> 
> Given below example for the unsigned scalar integer uint64_t:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;    succ:       EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;    succ:       EXIT
> }
> 
> We perform the tranform during widen_mult because that the sub-expr of
> SAT_ADD will be optimized to .ADD_OVERFLOW.  We need to try the .SAT_ADD
> pattern first and then .ADD_OVERFLOW,  or we may never catch the pattern
> .SAT_ADD.  Meanwhile, the isel pass is after widen_mult and then we
> cannot perform the .SAT_ADD pattern match as the sub-expr will be
> optmized to .ADD_OVERFLOW first.
> 
> The below tests are passed for this patch:
> 1. The riscv fully regression tests.
> 2. The aarch64 fully regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
> 
> 	PR target/51492
> 	PR target/112600
> 
> gcc/ChangeLog:
> 
> 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> 	to the return true switch case(s).
> 	* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> 	* match.pd: Add unsigned SAT_ADD match.
> 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> 	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern
> 	func decl generated in match.pd match.
> 	(match_saturation_arith): New func impl to match the saturation arith.
> 	(math_opts_dom_walker::after_dom_children): Try match saturation
> 	arith.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/internal-fn.cc        |  1 +
>  gcc/internal-fn.def       |  2 ++
>  gcc/match.pd              | 28 ++++++++++++++++++++++++
>  gcc/optabs.def            |  4 ++--
>  gcc/tree-ssa-math-opts.cc | 46
> +++++++++++++++++++++++++++++++++++++++
>  5 files changed, 79 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_SAT_ADD:
>      case IFN_VEC_WIDEN_PLUS:
>      case IFN_VEC_WIDEN_PLUS_LO:
>      case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..25badbb86e5 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> | ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> first,
>  			      smulhrs, umulhrs, binary)
> 
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> binary)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d401e7503e6..7058e4cbe29 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || POINTER_TYPE_P (itype))
>        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> 
> +/* Unsigned Saturation Add */
> +(match (usadd_left_part @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part @0 @1)
> + (negate (convert (lt (plus:c @0 @1) @0)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part @0 @1)
> + (negate (convert (gt @0 (plus:c @0 @1))))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +/* Unsigned saturation add, case 1 (branchless):
> +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c (usadd_left_part @0 @1) (usadd_right_part @0 @1)))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..3f2cb46aff8 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
>  OPTAB_NX(add_optab, "add$Q$a3")
>  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
>  OPTAB_VX(addv_optab, "add$F$a3")
> -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> gen_signed_fixed_libfunc)
> -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> gen_unsigned_fixed_libfunc)
> +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> gen_signed_fixed_libfunc)
> +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> gen_unsigned_fixed_libfunc)
>  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(sub_optab, "sub$F$a3")
>  OPTAB_NX(sub_optab, "sub$Q$a3")
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 705f4a4695a..35a46edc9f6 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4026,6 +4026,44 @@ arith_overflow_check_p (gimple *stmt, gimple
> *cast_stmt, gimple *&use_stmt,
>    return 0;
>  }
> 
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to match saturation arith pattern(s).
> + *   1. SAT_ADD (unsigned)
> + *      _7 = _4 + _6;
> + *      _8 = _4 > _7;
> + *      _9 = (long unsigned int) _8;
> + *      _10 = -_9;
> + *      _12 = _7 | _10;
> + *      =>
> + *      _12 = .SAT_ADD (_4, _6);  */
> +static bool
> +match_saturation_arith (gimple_stmt_iterator *gsi, gimple *stmt,
> +			bool *cfg_changed_p)
> +{
> +  gcall *call = NULL;
> +  bool changed_p = false;
> +
> +  gcc_assert (is_gimple_assign (stmt));
> +
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> +					OPTIMIZE_FOR_SPEED))

I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when optimizing for size.
> +    {
> +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> +      gimple_call_set_lhs (call, lhs);
> +      gsi_replace (gsi, call, true);
> +      changed_p = true;
> +      *cfg_changed_p = changed_p;
> +    }
> +
> +  return changed_p;
> +}
> +
>  /* Recognize for unsigned x
>     x = y - z;
>     if (x > y)
> @@ -5886,6 +5924,14 @@ math_opts_dom_walker::after_dom_children
> (basic_block bb)
> 
>    fma_deferring_state fma_state (param_avoid_fma_max_bits > 0);
> 
> +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple *stmt = gsi_stmt (gsi);
> +
> +      if (is_gimple_assign (stmt))
> +	match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
> +    }
> +

Hmm, why do you iterate independently over the statements? The block below already visits
every statement, doesn't it?

The root of your match is a BIT_IOR_EXPR expression, so I think you just need to change the entry below to:

	    case BIT_IOR_EXPR:
	      match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
	      /* fall-through */
	    case BIT_XOR_EXPR:
	      match_uaddc_usubc (&gsi, stmt, code);
	      break;

Patch is looking good! Thanks again for working on this.

Regards,
Tamar

>    for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
>      {
>        gimple *stmt = gsi_stmt (gsi);
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-13 13:36     ` Li, Pan2
@ 2024-05-13 15:03       ` Tamar Christina
  2024-05-14  1:50         ` Li, Pan2
  0 siblings, 1 reply; 21+ messages in thread
From: Tamar Christina @ 2024-05-13 15:03 UTC (permalink / raw)
  To: Li, Pan2, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao

> 
> Thanks Tamer for comments.
> 
> > I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when
> optimizing for size.
> 
> Sure thing, let me update it in v5.
> 
> > Hmm why do you iterate independently over the statements? The block below
> already visits
> > Every statement doesn't it?
> 
> Because it will hit .ADD_OVERFLOW first, then it will never hit SAT_ADD as the
> shape changed, or shall we put it to the previous pass ?
> 

That's just a matter of matching the overflow as an additional case, no?
I.e. you can add an overload of unsigned_integer_sat_add that matches
IFN_ADD_OVERFLOW and uses the realpart and imagpart helpers.

I think that would be better, as it avoids visiting all the statements twice and also
extends the matching to some __builtin_add_overflow uses, and it should be fairly
simple.
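
To illustrate the kind of __builtin_add_overflow use meant here, below is a
minimal sketch at the source level.  The function names are hypothetical and
only for illustration.  It shows the branchless form from the patch description
next to an equivalent form written with the builtin, which yields the
realpart (sum) and imagpart (overflow flag) of .ADD_OVERFLOW directly.  Whether
a given GCC version folds the second form to .SAT_ADD depends on the extra
match being added, so take this as an assumption and not as current behaviour.

```c
#include <stdint.h>

/* Branchless form from the patch description:
   (x + y) | -(TYPE)((TYPE)(x + y) < x).  */
static uint64_t
sat_add_u64_branchless (uint64_t x, uint64_t y)
{
  return (x + y) | (-(uint64_t) ((uint64_t) (x + y) < x));
}

/* Same operation written with the GCC/Clang builtin.  SUM corresponds to
   the realpart and OVF to the imagpart of the .ADD_OVERFLOW result.  */
static uint64_t
sat_add_u64_builtin (uint64_t x, uint64_t y)
{
  uint64_t sum;
  _Bool ovf = __builtin_add_overflow (x, y, &sum);

  /* -(uint64_t) 1 is all ones, so the OR saturates on overflow.  */
  return sum | -(uint64_t) ovf;
}
```

Both functions return UINT64_MAX whenever x + y wraps, and the plain sum
otherwise.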

> > The root of your match is a BIT_IOR_EXPR expression, so I think you just need to
> change the entry below to:
> >
> > 	    case BIT_IOR_EXPR:
> > 	      match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
> > 	      /* fall-through */
> > 	    case BIT_XOR_EXPR:
> > 	      match_uaddc_usubc (&gsi, stmt, code);
> > 	      break;
> 
> There are other shapes (not covered in this patch) of SAT_ADD like below branch
> version, the IOR should be one of the ROOT. Thus doesn't
> add case here.  Then, shall we take case for each shape here ? Both works for me.
> 

Yeah, I think that's better than iterating over the statements twice.  It also fits better
with the existing code.
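
For reference, the two source shapes discussed here compute the same value and
differ only in the root of the expression.  A minimal sketch for uint32_t, with
hypothetical function names: the first is the branchless shape this patch
matches (rooted at a bit-or), the second is the branch shape from the
SAT_ADD_U_1 macro quoted below (rooted at a conditional), which would need its
own case.

```c
#include <stdint.h>

/* Branchless shape: the root of the expression is a bit-or.  */
static uint32_t
sat_add_u32_ior (uint32_t x, uint32_t y)
{
  return (x + y) | (-(uint32_t) ((uint32_t) (x + y) < x));
}

/* Branch shape: the root is a conditional.  -1 converted to uint32_t is
   the maximum value of the type.  */
static uint32_t
sat_add_u32_cond (uint32_t x, uint32_t y)
{
  return (uint32_t) (x + y) >= x ? (x + y) : (uint32_t) -1;
}
```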

Tamar.

> #define SAT_ADD_U_1(T) \
> T sat_add_u_1_##T(T x, T y) \
> { \
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
> 
> SAT_ADD_U_1(uint32_t)
> 
> Pan
> 
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Monday, May 13, 2024 5:10 PM
> To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; richard.guenther@gmail.com;
> Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar int
> 
> Hi Pan,
> 
> > -----Original Message-----
> > From: pan2.li@intel.com <pan2.li@intel.com>
> > Sent: Monday, May 6, 2024 3:48 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> > <Tamar.Christina@arm.com>; richard.guenther@gmail.com;
> > hongtao.liu@intel.com; Pan Li <pan2.li@intel.com>
> > Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar
> > int
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch would like to add the middle-end presentation for the
> > saturation add.  Aka set the result of add to the max when overflow.
> > It will take the pattern similar as below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given below example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;    succ:       EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;    pred:       ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;    succ:       EXIT
> > }
> >
> > We perform the tranform during widen_mult because that the sub-expr of
> > SAT_ADD will be optimized to .ADD_OVERFLOW.  We need to try the .SAT_ADD
> > pattern first and then .ADD_OVERFLOW,  or we may never catch the pattern
> > .SAT_ADD.  Meanwhile, the isel pass is after widen_mult and then we
> > cannot perform the .SAT_ADD pattern match as the sub-expr will be
> > optmized to .ADD_OVERFLOW first.
> >
> > The below tests are passed for this patch:
> > 1. The riscv fully regression tests.
> > 2. The aarch64 fully regression tests.
> > 3. The x86 bootstrap tests.
> > 4. The x86 fully regression tests.
> >
> > 	PR target/51492
> > 	PR target/112600
> >
> > gcc/ChangeLog:
> >
> > 	* internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > 	to the return true switch case(s).
> > 	* internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > 	* match.pd: Add unsigned SAT_ADD match.
> > 	* optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
> > 	* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern
> > 	func decl generated in match.pd match.
> > 	(match_saturation_arith): New func impl to match the saturation arith.
> > 	(math_opts_dom_walker::after_dom_children): Try match saturation
> > 	arith.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/internal-fn.cc        |  1 +
> >  gcc/internal-fn.def       |  2 ++
> >  gcc/match.pd              | 28 ++++++++++++++++++++++++
> >  gcc/optabs.def            |  4 ++--
> >  gcc/tree-ssa-math-opts.cc | 46
> > +++++++++++++++++++++++++++++++++++++++
> >  5 files changed, 79 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 0a7053c2286..73045ca8c8c 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> >      case IFN_UBSAN_CHECK_MUL:
> >      case IFN_ADD_OVERFLOW:
> >      case IFN_MUL_OVERFLOW:
> > +    case IFN_SAT_ADD:
> >      case IFN_VEC_WIDEN_PLUS:
> >      case IFN_VEC_WIDEN_PLUS_LO:
> >      case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..25badbb86e5 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS,
> ECF_CONST
> > | ECF_NOTHROW, first,
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW,
> > first,
> >  			      smulhrs, umulhrs, binary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd,
> > binary)
> > +
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> >  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index d401e7503e6..7058e4cbe29 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3043,6 +3043,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >         || POINTER_TYPE_P (itype))
> >        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
> >
> > +/* Unsigned Saturation Add */
> > +(match (usadd_left_part @0 @1)
> > + (plus:c @0 @1)
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part @0 @1)
> > + (negate (convert (lt (plus:c @0 @1) @0)))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +(match (usadd_right_part @0 @1)
> > + (negate (convert (gt @0 (plus:c @0 @1))))
> > + (if (INTEGRAL_TYPE_P (type)
> > +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@0))
> > +      && types_match (type, TREE_TYPE (@1)))))
> > +
> > +/* Unsigned saturation add, case 1 (branchless):
> > +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> > +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> > +(match (unsigned_integer_sat_add @0 @1)
> > + (bit_ior:c (usadd_left_part @0 @1) (usadd_right_part @0 @1)))
> > +
> >  /* x >  y  &&  x != XXX_MIN  -->  x > y
> >     x >  y  &&  x == XXX_MIN  -->  false . */
> >  (for eqne (eq ne)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index ad14f9328b9..3f2cb46aff8 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
> >  OPTAB_NX(add_optab, "add$Q$a3")
> >  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
> >  OPTAB_VX(addv_optab, "add$F$a3")
> > -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3',
> > gen_signed_fixed_libfunc)
> > -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3',
> > gen_unsigned_fixed_libfunc)
> > +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3',
> > gen_signed_fixed_libfunc)
> > +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3',
> > gen_unsigned_fixed_libfunc)
> >  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3',
> gen_int_fp_fixed_libfunc)
> >  OPTAB_NX(sub_optab, "sub$F$a3")
> >  OPTAB_NX(sub_optab, "sub$Q$a3")
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index 705f4a4695a..35a46edc9f6 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -4026,6 +4026,44 @@ arith_overflow_check_p (gimple *stmt, gimple
> > *cast_stmt, gimple *&use_stmt,
> >    return 0;
> >  }
> >
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +/*
> > + * Try to match saturation arith pattern(s).
> > + *   1. SAT_ADD (unsigned)
> > + *      _7 = _4 + _6;
> > + *      _8 = _4 > _7;
> > + *      _9 = (long unsigned int) _8;
> > + *      _10 = -_9;
> > + *      _12 = _7 | _10;
> > + *      =>
> > + *      _12 = .SAT_ADD (_4, _6);  */
> > +static bool
> > +match_saturation_arith (gimple_stmt_iterator *gsi, gimple *stmt,
> > +			bool *cfg_changed_p)
> > +{
> > +  gcall *call = NULL;
> > +  bool changed_p = false;
> > +
> > +  gcc_assert (is_gimple_assign (stmt));
> > +
> > +  tree ops[2];
> > +  tree lhs = gimple_assign_lhs (stmt);
> > +
> > +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> > +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> > +					OPTIMIZE_FOR_SPEED))
> 
> I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when optimizing
> for size.
> > +    {
> > +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> > +      gimple_call_set_lhs (call, lhs);
> > +      gsi_replace (gsi, call, true);
> > +      changed_p = true;
> > +      *cfg_changed_p = changed_p;
> > +    }
> > +
> > +  return changed_p;
> > +}
> > +
> >  /* Recognize for unsigned x
> >     x = y - z;
> >     if (x > y)
> > @@ -5886,6 +5924,14 @@ math_opts_dom_walker::after_dom_children
> > (basic_block bb)
> >
> >    fma_deferring_state fma_state (param_avoid_fma_max_bits > 0);
> >
> > +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > +    {
> > +      gimple *stmt = gsi_stmt (gsi);
> > +
> > +      if (is_gimple_assign (stmt))
> > +	match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
> > +    }
> > +
> 
> Hmm why do you iterate independently over the statements? The block below
> already visits
> Every statement doesn't it?
> 
> The root of your match is a BIT_IOR_EXPR expression, so I think you just need to
> change the entry below to:
> 
> 	    case BIT_IOR_EXPR:
> 	      match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
> 	      /* fall-through */
> 	    case BIT_XOR_EXPR:
> 	      match_uaddc_usubc (&gsi, stmt, code);
> 	      break;
> 
> Patch is looking good! Thanks again for working on this.
> 
> Regards,
> Tamar
> 
> >    for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
> >      {
> >        gimple *stmt = gsi_stmt (gsi);
> > --
> > 2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-13 15:03       ` Tamar Christina
@ 2024-05-14  1:50         ` Li, Pan2
  0 siblings, 0 replies; 21+ messages in thread
From: Li, Pan2 @ 2024-05-14  1:50 UTC (permalink / raw)
  To: Tamar Christina, gcc-patches
  Cc: juzhe.zhong, kito.cheng, richard.guenther, Liu, Hongtao


> That's just a matter of matching the overflow as an additional case no?
> i.e. you can add an overload for unsigned_integer_sat_add matching the
> IFN_ ADD_OVERFLOW and using the realpart and imagpart helpers.

> I think that would be better as it avoid visiting all the statements twice but also
> extends the matching to some __builtin_add_overflow uses and should be fairly
> simple.

Thanks Tamar, got the point here; I will try adding an overload of unsigned_integer_sat_add for that.

> Yeah, I think that's better than iterating over the statements twice.  It also fits better
> In the existing code.

Ack, will follow the existing code.

Pan




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-06 14:48 ` [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int pan2.li
  2024-05-13  9:09   ` Tamar Christina
@ 2024-05-14 13:18   ` Richard Biener
  2024-05-14 14:14     ` Li, Pan2
  1 sibling, 1 reply; 21+ messages in thread
From: Richard Biener @ 2024-05-14 13:18 UTC (permalink / raw)
  To: pan2.li
  Cc: gcc-patches, juzhe.zhong, kito.cheng, tamar.christina, hongtao.liu

On Mon, May 6, 2024 at 4:48 PM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
>
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
>
> Take uint8_t as example, we will have:
>
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
>
> Given below example for the unsigned scalar integer uint64_t:
>
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
>
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;    succ:       EXIT
>
> }
>
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
>
> ;;   basic block 2, loop depth 0
> ;;    pred:       ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;    succ:       EXIT
> }
>
> We perform the tranform during widen_mult because that the sub-expr of
> SAT_ADD will be optimized to .ADD_OVERFLOW.  We need to try the .SAT_ADD
> pattern first and then .ADD_OVERFLOW,  or we may never catch the pattern
> .SAT_ADD.  Meanwhile, the isel pass is after widen_mult and then we
> cannot perform the .SAT_ADD pattern match as the sub-expr will be
> optmized to .ADD_OVERFLOW first.
>
> The below tests are passed for this patch:
> 1. The riscv fully regression tests.
> 2. The aarch64 fully regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
>         PR target/51492
>         PR target/112600
>
> gcc/ChangeLog:
>
>         * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
>         to the return true switch case(s).
>         * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
>         * match.pd: Add unsigned SAT_ADD match.
>         * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd.
>         * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern
>         func decl generated in match.pd match.
>         (match_saturation_arith): New func impl to match the saturation arith.
>         (math_opts_dom_walker::after_dom_children): Try match saturation
>         arith.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/internal-fn.cc        |  1 +
>  gcc/internal-fn.def       |  2 ++
>  gcc/match.pd              | 28 ++++++++++++++++++++++++
>  gcc/optabs.def            |  4 ++--
>  gcc/tree-ssa-math-opts.cc | 46 +++++++++++++++++++++++++++++++++++++++
>  5 files changed, 79 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_SAT_ADD:
>      case IFN_VEC_WIDEN_PLUS:
>      case IFN_VEC_WIDEN_PLUS_LO:
>      case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..25badbb86e5 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
>  DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
>                               smulhrs, umulhrs, binary)
>
> +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
> +
>  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
>  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
>  DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index d401e7503e6..7058e4cbe29 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3043,6 +3043,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>         || POINTER_TYPE_P (itype))
>        && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))))))
>
> +/* Unsigned Saturation Add */
> +(match (usadd_left_part @0 @1)
> + (plus:c @0 @1)
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part @0 @1)
> + (negate (convert (lt (plus:c @0 @1) @0)))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +(match (usadd_right_part @0 @1)
> + (negate (convert (gt @0 (plus:c @0 @1))))
> + (if (INTEGRAL_TYPE_P (type)
> +      && TYPE_UNSIGNED (TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@0))
> +      && types_match (type, TREE_TYPE (@1)))))
> +
> +/* Unsigned saturation add, case 1 (branchless):
> +   SAT_U_ADD = (X + Y) | - ((X + Y) < X) or
> +   SAT_U_ADD = (X + Y) | - (X > (X + Y)).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (bit_ior:c (usadd_left_part @0 @1) (usadd_right_part @0 @1)))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
>     x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..3f2cb46aff8 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -111,8 +111,8 @@ OPTAB_NX(add_optab, "add$F$a3")
>  OPTAB_NX(add_optab, "add$Q$a3")
>  OPTAB_VL(addv_optab, "addv$I$a3", PLUS, "add", '3', gen_intv_fp_libfunc)
>  OPTAB_VX(addv_optab, "add$F$a3")
> -OPTAB_NL(ssadd_optab, "ssadd$Q$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> -OPTAB_NL(usadd_optab, "usadd$Q$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
> +OPTAB_NL(ssadd_optab, "ssadd$a3", SS_PLUS, "ssadd", '3', gen_signed_fixed_libfunc)
> +OPTAB_NL(usadd_optab, "usadd$a3", US_PLUS, "usadd", '3', gen_unsigned_fixed_libfunc)
>  OPTAB_NL(sub_optab, "sub$P$a3", MINUS, "sub", '3', gen_int_fp_fixed_libfunc)
>  OPTAB_NX(sub_optab, "sub$F$a3")
>  OPTAB_NX(sub_optab, "sub$Q$a3")
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 705f4a4695a..35a46edc9f6 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -4026,6 +4026,44 @@ arith_overflow_check_p (gimple *stmt, gimple *cast_stmt, gimple *&use_stmt,
>    return 0;
>  }
>
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to match saturation arith pattern(s).
> + *   1. SAT_ADD (unsigned)
> + *      _7 = _4 + _6;
> + *      _8 = _4 > _7;
> + *      _9 = (long unsigned int) _8;
> + *      _10 = -_9;
> + *      _12 = _7 | _10;
> + *      =>
> + *      _12 = .SAT_ADD (_4, _6);  */
> +static bool
> +match_saturation_arith (gimple_stmt_iterator *gsi, gimple *stmt,
> +                       bool *cfg_changed_p)
> +{
> +  gcall *call = NULL;
> +  bool changed_p = false;
> +
> +  gcc_assert (is_gimple_assign (stmt));

If you require a gassign please statically type your function
argument as gassign * instead and remove this assert.

> +
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> +      && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> +                                       OPTIMIZE_FOR_SPEED))
> +    {
> +      call = gimple_build_call_internal (IFN_SAT_ADD, 2, ops[0], ops[1]);
> +      gimple_call_set_lhs (call, lhs);
> +      gsi_replace (gsi, call, true);
> +      changed_p = true;
> +      *cfg_changed_p = changed_p;

In addition to Tamar's good comments, why do you set *cfg_changed_p to
true?  You are not changing the CFG after all?

> +    }
> +
> +  return changed_p;
> +}
> +
>  /* Recognize for unsigned x
>     x = y - z;
>     if (x > y)
> @@ -5886,6 +5924,14 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
>
>    fma_deferring_state fma_state (param_avoid_fma_max_bits > 0);
>
> +  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple *stmt = gsi_stmt (gsi);
> +
> +      if (is_gimple_assign (stmt))
> +       match_saturation_arith (&gsi, stmt, m_cfg_changed_p);
> +    }
> +
>    for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi);)
>      {
>        gimple *stmt = gsi_stmt (gsi);
> --
> 2.34.1
>
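
For readers following the match.pd hunk quoted above: the two
usadd_right_part forms, (X + Y) < X and X > (X + Y), are the same
unsigned overflow check, and OR-ing the negated result into the sum
gives all-ones exactly when the add wraps.  A minimal C sketch, not
part of the patch, that checks both forms exhaustively for uint8_t
against a widen-and-clamp reference.  All function names here are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Form 1 from the match.pd comment: (X + Y) | -((X + Y) < X).  */
uint8_t
sat_add_u8_lt (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return (uint8_t) (sum | (uint8_t) -(uint8_t) (sum < x));
}

/* Form 2 from the match.pd comment: (X + Y) | -(X > (X + Y)).  */
uint8_t
sat_add_u8_gt (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return (uint8_t) (sum | (uint8_t) -(uint8_t) (x > sum));
}

/* Reference: add in a wider type and clamp to UINT8_MAX.  */
uint8_t
sat_add_u8_ref (uint8_t x, uint8_t y)
{
  unsigned wide = (unsigned) x + (unsigned) y;
  return wide > UINT8_MAX ? UINT8_MAX : (uint8_t) wide;
}

/* Assert that both forms agree with the reference on every input
   pair; return the number of pairs checked.  */
unsigned
sat_add_u8_check_all (void)
{
  unsigned checked = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t ref = sat_add_u8_ref ((uint8_t) x, (uint8_t) y);
        assert (sat_add_u8_lt ((uint8_t) x, (uint8_t) y) == ref);
        assert (sat_add_u8_gt ((uint8_t) x, (uint8_t) y) == ref);
        checked++;
      }
  return checked;
}
```

Calling sat_add_u8_check_all () walks all 256 * 256 input pairs, which
covers the uint8_t examples from the cover letter such as
SAT_ADD (1, 254) and SAT_ADD (255, 255).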


* Re: [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int
  2024-05-06 14:49 ` [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int pan2.li
  2024-05-13  9:10   ` Tamar Christina
@ 2024-05-14 13:21   ` Richard Biener
  1 sibling, 0 replies; 21+ messages in thread
From: Richard Biener @ 2024-05-14 13:21 UTC (permalink / raw)
  To: pan2.li
  Cc: gcc-patches, juzhe.zhong, kito.cheng, tamar.christina, hongtao.liu

On Mon, May 6, 2024 at 4:49 PM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch depends on below scalar enabling patch:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650822.html
>
> For vectorization, we leverage the existing vect pattern recog to find
> the pattern similar to the scalar one and let the vectorizer perform
> the rest for the standard name usadd<mode>3 in vector mode.
> The riscv vector backend has the insn "Vector Single-Width Saturating
> Add and Subtract", which can be leveraged when expanding usadd<mode>3
> in vector mode.  For example:
>
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   unsigned i;
>
>   for (i = 0; i < n; i++)
>     out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
> }
>
> Before this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
>   ivtmp_58 = _80 * 8;
>   vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
>   vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
>   vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
>   mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
>   vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
>   .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
>   vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
>   vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
>   vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
>   ivtmp_79 = ivtmp_78 - _80;
>   ...
> }
>
> After this patch:
> void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
> {
>   ...
>   _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
>   ivtmp_46 = _62 * 8;
>   vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
>   vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
>   vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
>   .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
>   ...
> }
>
> The test suites below pass with this patch.
> * The riscv full regression tests.
> * The aarch64 full regression tests.
> * The x86 bootstrap tests.
> * The x86 full regression tests.
>
>         PR target/51492
>         PR target/112600
>
> gcc/ChangeLog:
>
>         * tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New func
>         decl generated by match.pd match.
>         (vect_recog_sat_add_pattern): New func impl to recog the pattern
>         for unsigned SAT_ADD.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-patterns.cc | 51 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 87c2acff386..8ffcaf71d5c 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4487,6 +4487,56 @@ vect_recog_mult_pattern (vec_info *vinfo,
>    return pattern_stmt;
>  }
>
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/*
> + * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
> + *   _7 = _4 + _6;
> + *   _8 = _4 > _7;
> + *   _9 = (long unsigned int) _8;
> + *   _10 = -_9;
> + *   _12 = _7 | _10;
> + *
> + * And then simplified to
> + *   _12 = .SAT_ADD (_4, _6);
> + */
> +
> +static gimple *
> +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> +                           tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +
> +  if (!is_gimple_assign (last_stmt))
> +    return NULL;
> +
> +  tree res_ops[2];
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +
> +  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
> +    {
> +      tree itype = TREE_TYPE (res_ops[0]);
> +      tree vtype = get_vectype_for_scalar_type (vinfo, itype);
> +
> +      if (vtype != NULL_TREE && direct_internal_fn_supported_p (
> +       IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED))

Please break the line before the && instead, like

  if (vtype != NULL_TREE
      && direct_internal_fn_supported_p (...

Otherwise this is OK once 1/3 is approved.

Thanks,
Richard.

> +       {
> +         *type_out = vtype;
> +         gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, res_ops[0],
> +                                                   res_ops[1]);
> +
> +         gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
> +         gimple_call_set_nothrow (call, /* nothrow_p */ false);
> +         gimple_set_location (call, gimple_location (last_stmt));
> +
> +         vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt);
> +         return call;
> +       }
> +    }
> +
> +  return NULL;
> +}
> +
>  /* Detect a signed division by a constant that wouldn't be
>     otherwise vectorized:
>
> @@ -6987,6 +7037,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
>    { vect_recog_vector_vector_shift_pattern, "vector_vector_shift" },
>    { vect_recog_divmod_pattern, "divmod" },
>    { vect_recog_mult_pattern, "mult" },
> +  { vect_recog_sat_add_pattern, "sat_add" },
>    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
>    { vect_recog_gcond_pattern, "gcond" },
>    { vect_recog_bool_pattern, "bool" },
> --
> 2.34.1
>
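
The "before" dump in the quoted cover letter shows the vectorizer
lowering the branchless form to a compare plus a .VCOND_MASK that
selects all-ones; in scalar terms that is x > sum ? max : sum.  A
small C sketch, not part of the patch, that runs the cover letter's
loop next to that select form on a few boundary inputs.
vec_sat_add_u64 mirrors the cover letter (with const added to the
inputs); the other two names are hypothetical.

```c
#include <stdint.h>

/* The loop from the cover letter: branchless unsigned saturating add.  */
void
vec_sat_add_u64 (uint64_t *out, const uint64_t *x, const uint64_t *y,
                 unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

/* Hypothetical select form, the scalar reading of the compare plus
   .VCOND_MASK sequence in the "before" dump.  */
void
vec_sat_add_u64_select (uint64_t *out, const uint64_t *x, const uint64_t *y,
                        unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    {
      uint64_t sum = x[i] + y[i];
      out[i] = x[i] > sum ? UINT64_MAX : sum;
    }
}

/* Return 1 if both loops produce identical output on a handful of
   boundary inputs, 0 otherwise.  */
int
vec_sat_add_u64_agree (void)
{
  const uint64_t x[6] = { 0, 1, UINT64_MAX, UINT64_MAX - 1,
                          UINT64_C (1) << 63, 42 };
  const uint64_t y[6] = { 0, UINT64_MAX, UINT64_MAX, 1,
                          UINT64_C (1) << 63, 58 };
  uint64_t a[6], b[6];

  vec_sat_add_u64 (a, x, y, 6);
  vec_sat_add_u64_select (b, x, y, 6);

  for (unsigned i = 0; i < 6; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Both loops compute the same values; the point of the pattern in this
patch is that a target providing usadd<mode>3 can emit a single
saturating-add instruction instead of the add, compare and select.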


* RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
  2024-05-14 13:18   ` Richard Biener
@ 2024-05-14 14:14     ` Li, Pan2
  0 siblings, 0 replies; 21+ messages in thread
From: Li, Pan2 @ 2024-05-14 14:14 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, kito.cheng, tamar.christina, Liu, Hongtao

Thanks, Richard, for the comments.

> If you require a gassign please statically type your function
> argument as gassign * instead and remove this assert.

Sure

> In addition to Tamar's good comments, why do you set *cfg_changed_p to
> true?  You are
> not changing the CFG after all?

Yes, we can add it back in the future if we really change the CFG.  Will update in v5 (including vect patch 2/3) after all tests pass.

Pan




Thread overview: 21+ messages
2024-04-06 12:07 [PATCH v1] Internal-fn: Introduce new internal function SAT_ADD pan2.li
2024-04-07  7:03 ` [PATCH v2] " pan2.li
2024-04-28 12:10   ` Li, Pan2
2024-04-29  7:53 ` [PATCH v3] " pan2.li
2024-05-01 17:06   ` Tamar Christina
2024-05-02  3:10     ` Li, Pan2
2024-05-02  3:25       ` Tamar Christina
2024-05-02 10:57         ` Li, Pan2
2024-05-02 12:57           ` Tamar Christina
2024-05-03  1:45             ` Li, Pan2
2024-05-06 14:48 ` [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int pan2.li
2024-05-13  9:09   ` Tamar Christina
2024-05-13 13:36     ` Li, Pan2
2024-05-13 15:03       ` Tamar Christina
2024-05-14  1:50         ` Li, Pan2
2024-05-14 13:18   ` Richard Biener
2024-05-14 14:14     ` Li, Pan2
2024-05-06 14:49 ` [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int pan2.li
2024-05-13  9:10   ` Tamar Christina
2024-05-14 13:21   ` Richard Biener
2024-05-06 14:50 ` [PATCH v4 3/3] RISC-V: Implement IFN SAT_ADD for both the scalar and vector pan2.li
