public inbox for gcc-patches@gcc.gnu.org
* [PATCH] RISC-V: Support non-SLP unordered reduction
@ 2023-07-14 12:30 juzhe.zhong
  2023-07-14 12:38 ` Kito Cheng
  2023-07-17  7:00 ` Kito Cheng
  0 siblings, 2 replies; 7+ messages in thread
From: juzhe.zhong @ 2023-07-14 12:30 UTC
  To: gcc-patches; +Cc: kito.cheng, palmer, rdapp.gcc, jeffreyalaw, Ju-Zhe Zhong

From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

This patch adds the reduc_*_scal patterns to support reduction
auto-vectorization.

COND_LEN_* together with reduc_*_scal is used to support unordered non-SLP
reduction auto-vectorization.

Consider the following case:
int __attribute__((noipa))
and_loop (int32_t * __restrict x, int32_t n, int res)
{
  for (int i = 0; i < n; ++i)
    res &= x[i];
  return res;
}

ASM:
and_loop:
	ble	a1,zero,.L4
	vsetvli	a3,zero,e32,m1,ta,ma
	vmv.v.i	v1,-1
.L3:
	vsetvli	a5,a1,e32,m1,tu,ma	# must be "tu" (tail undisturbed)
	slli	a4,a5,2
	sub	a1,a1,a5
	vle32.v	v2,0(a0)
	add	a0,a0,a4
	vand.vv	v1,v2,v1
	bne	a1,zero,.L3
	vsetivli	zero,1,e32,m1,ta,ma
	vmv.v.i	v2,-1
	vsetvli	a3,zero,e32,m1,ta,ma
	vredand.vs	v1,v1,v2
	vmv.x.s	a5,v1
	and	a0,a2,a5
	ret
.L4:
	mv	a0,a2
	ret
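
To see why the vsetvli inside the loop has to use the tail-undisturbed
policy, the following scalar C model mirrors the assembly above.  It is only
a sketch: VLMAX is fixed at a hypothetical 4 lanes, the name and_loop_model
is made up for illustration, and vl is assumed to be min (n, VLMAX) in every
iteration, which real hardware need not guarantee.

```c
#include <stdint.h>

#define VLMAX 4 /* Hypothetical number of 32-bit lanes in one vector register.  */

/* Scalar model of the vectorized and_loop (illustrative, not GCC code).  */
int32_t
and_loop_model (const int32_t *x, int32_t n, int32_t res)
{
  int32_t acc[VLMAX];

  /* vmv.v.i v1,-1: every lane starts at the identity for AND.  */
  for (int i = 0; i < VLMAX; ++i)
    acc[i] = -1;

  while (n > 0)
    {
      /* vsetvli a5,a1,e32,m1,tu,ma (assumed to return min (n, VLMAX)).  */
      int32_t vl = n < VLMAX ? n : VLMAX;

      /* vle32.v + vand.vv: only lanes 0..vl-1 are written.  */
      for (int i = 0; i < vl; ++i)
	acc[i] &= x[i];

      /* Tail undisturbed: lanes vl..VLMAX-1 keep the partial results of
	 earlier iterations.  Under a tail-agnostic policy they could be
	 overwritten with all ones and those results would be lost.  */
      x += vl;
      n -= vl;
    }

  /* vmv.v.i v2,-1 followed by vredand.vs over all VLMAX lanes.  */
  int32_t r = -1;
  for (int i = 0; i < VLMAX; ++i)
    r &= acc[i];

  /* and a0,a2,a5: fold in the incoming scalar.  */
  return res & r;
}
```

The final reduction reads all VLMAX lanes, including lanes that the last
(short) iteration did not touch, so the result is only correct if those
lanes still hold their accumulated values.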

This patch also fixes a bug in the VSETVL pass that was exposed by the
reduction testcases.

SLP reduction and floating-point in-order reduction are not supported yet.
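
The distinction between unordered and in-order reduction matters for
floating point because addition is not associative.  The sketch below is
plain C and not part of the patch; the function names sum_in_order and
sum_two_lanes are hypothetical.  It compares a strict left-to-right sum with
a sum that keeps two interleaved partial results, as a two-lane unordered
reduction might.

```c
#include <stddef.h>

/* Strict left-to-right sum: the order an in-order reduction must honour.
   The volatile accumulator forces rounding to float after every step.  */
float
sum_in_order (const float *x, size_t n)
{
  volatile float acc = 0.0f;
  for (size_t i = 0; i < n; ++i)
    acc = acc + x[i];
  return acc;
}

/* Two interleaved partial sums that are combined at the end.  This models
   one possible association order of an unordered reduction.  */
float
sum_two_lanes (const float *x, size_t n)
{
  volatile float lane0 = 0.0f;
  volatile float lane1 = 0.0f;
  for (size_t i = 0; i < n; ++i)
    {
      if (i % 2 == 0)
	lane0 = lane0 + x[i];
      else
	lane1 = lane1 + x[i];
    }
  return lane0 + lane1;
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f} the in-order sum is 1.0f, because
1e8f + 1.0f rounds back to 1e8f, while the two-lane sum is 2.0f.  This is
why the unordered form is only used where reassociation is permitted, for
example under -ffast-math as in the tests.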

gcc/ChangeLog:

        * config/riscv/autovec.md (reduc_plus_scal_<mode>): New pattern.
        (reduc_smax_scal_<mode>): Ditto.
        (reduc_umax_scal_<mode>): Ditto.
        (reduc_smin_scal_<mode>): Ditto.
        (reduc_umin_scal_<mode>): Ditto.
        (reduc_and_scal_<mode>): Ditto.
        (reduc_ior_scal_<mode>): Ditto.
        (reduc_xor_scal_<mode>): Ditto.
        * config/riscv/riscv-protos.h (enum insn_type): New enum.
        (emit_nonvlmax_integer_move_insn): Add reduction.
        (expand_reduction): New function.
        * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
        (emit_vlmax_fp_reduction_insn): Ditto.
        (get_m1_mode): Ditto.
        (expand_cond_len_binop): Fix name.
        (expand_reduction): New function.
        * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix bug.
        (change_insn): Ditto.
        (change_vsetvl_insn): Ditto.
        (pass_vsetvl::backward_demand_fusion): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.

---
 gcc/config/riscv/autovec.md                   | 138 ++++++++++++++++++
 gcc/config/riscv/riscv-protos.h               |   3 +
 gcc/config/riscv/riscv-v.cc                   |  84 ++++++++++-
 gcc/config/riscv/riscv-vsetvl.cc              |  28 +++-
 .../riscv/rvv/autovec/reduc/reduc-1.c         | 118 +++++++++++++++
 .../riscv/rvv/autovec/reduc/reduc-2.c         | 129 ++++++++++++++++
 .../riscv/rvv/autovec/reduc/reduc-3.c         |  65 +++++++++
 .../riscv/rvv/autovec/reduc/reduc-4.c         |  59 ++++++++
 .../riscv/rvv/autovec/reduc/reduc_run-1.c     |  56 +++++++
 .../riscv/rvv/autovec/reduc/reduc_run-2.c     |  79 ++++++++++
 .../riscv/rvv/autovec/reduc/reduc_run-3.c     |  49 +++++++
 .../riscv/rvv/autovec/reduc/reduc_run-4.c     |  66 +++++++++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
 13 files changed, 868 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0476b1dea45..a74f66f41ac 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1531,3 +1531,141 @@
   riscv_vector::expand_cond_len_binop (<CODE>, operands);
   DONE;
 })
+
+;; =========================================================================
+;; == Reductions
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Tree reductions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vredsum.vs
+;; - vredmaxu.vs
+;; - vredmax.vs
+;; - vredminu.vs
+;; - vredmin.vs
+;; - vredand.vs
+;; - vredor.vs
+;; - vredxor.vs
+;; -------------------------------------------------------------------------
+
+(define_expand "reduc_plus_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx min = immed_wide_int_const (wi::min_value (prec, SIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (SMAX, operands, min);
+  DONE;
+})
+
+(define_expand "reduc_umax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (UMAX, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx max = immed_wide_int_const (wi::max_value (prec, SIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (SMIN, operands, max);
+  DONE;
+})
+
+(define_expand "reduc_umin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx max = immed_wide_int_const (wi::max_value (prec, UNSIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (UMIN, operands, max);
+  DONE;
+})
+
+(define_expand "reduc_and_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (AND, operands, CONSTM1_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_ior_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (IOR, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_xor_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (XOR, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] Tree reductions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfredusum.vs
+;; - vfredmax.vs
+;; - vfredmin.vs
+;; -------------------------------------------------------------------------
+
+(define_expand "reduc_plus_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  REAL_VALUE_TYPE rv;
+  real_inf (&rv, true);
+  rtx f = const_double_from_real_value (rv, <VEL>mode);
+  riscv_vector::expand_reduction (SMAX, operands, f);
+  DONE;
+})
+
+(define_expand "reduc_smin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  REAL_VALUE_TYPE rv;
+  real_inf (&rv, false);
+  rtx f = const_double_from_real_value (rv, <VEL>mode);
+  riscv_vector::expand_reduction (SMIN, operands, f);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 1a622c58f4b..f19b7fc2b8d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -197,6 +197,7 @@ enum insn_type
   RVV_COMPRESS_OP = 4,
   RVV_GATHER_M_OP = 5,
   RVV_SCATTER_M_OP = 4,
+  RVV_REDUCTION_OP = 3,
 };
 enum vlmul_type
 {
@@ -247,6 +248,7 @@ void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
 void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
 void emit_scalar_move_insn (unsigned, rtx *);
 void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
+//void emit_vlmax_reduction_insn (unsigned, rtx *);
 enum vlmul_type get_vlmul (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
@@ -280,6 +282,7 @@ bool has_vi_variant_p (rtx_code, rtx);
 void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
 bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
 void expand_cond_len_binop (rtx_code, rtx *);
+void expand_reduction (rtx_code, rtx *, rtx);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
 			  bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 90da63889bd..ccf0f6ff852 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1137,6 +1137,43 @@ emit_vlmax_compress_insn (unsigned icode, rtx *ops)
   e.emit_insn ((enum insn_code) icode, ops);
 }
 
+/* Emit reduction instruction.  */
+static void
+emit_vlmax_reduction_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
+					  /* HAS_DEST_P */ true,
+					  /* FULLY_UNMASKED_P */ true,
+					  /* USE_REAL_MERGE_P */ false,
+					  /* HAS_AVL_P */ true,
+					  /* VLMAX_P */ true, dest_mode,
+					  mask_mode);
+
+  e.set_policy (TAIL_ANY);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
+/* Emit floating-point reduction instruction.  */
+static void
+emit_vlmax_fp_reduction_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
+					  /* HAS_DEST_P */ true,
+					  /* FULLY_UNMASKED_P */ true,
+					  /* USE_REAL_MERGE_P */ false,
+					  /* HAS_AVL_P */ true,
+					  /* VLMAX_P */ true, dest_mode,
+					  mask_mode);
+
+  e.set_policy (TAIL_ANY);
+  e.set_rounding_mode (FRM_DYN);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
 /* Emit merge instruction.  */
 
 static machine_mode
@@ -1629,6 +1666,17 @@ get_mask_mode (machine_mode mode)
   return get_vector_mode (BImode, GET_MODE_NUNITS (mode));
 }
 
+/* Return the appropriate M1 mode for MODE.  */
+
+static opt_machine_mode
+get_m1_mode (machine_mode mode)
+{
+  scalar_mode smode = GET_MODE_INNER (mode);
+  unsigned int bytes = GET_MODE_SIZE (smode);
+  poly_uint64 m1_nunits = exact_div (BYTES_PER_RISCV_VECTOR, bytes);
+  return get_vector_mode (smode, m1_nunits);
+}
+
 /* Return the RVV vector mode that has NUNITS elements of mode INNER_MODE.
    This function is not only used by builtins, but also will be used by
    auto-vectorization in the future.  */
@@ -3099,9 +3147,9 @@ expand_cond_len_binop (rtx_code code, rtx *ops)
       rtx ops[] = {dest, mask, merge, src1, src2};
       insn_code icode = code_for_pred (code, mode);
       if (needs_fp_rounding (code, mode))
-	emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_MU, ops, len);
+	emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_TU, ops, len);
       else
-	emit_nonvlmax_tu_insn (icode, RVV_BINOP_MU, ops, len);
+	emit_nonvlmax_tu_insn (icode, RVV_BINOP_TU, ops, len);
     }
   else
     /* FIXME: Enable this case when we support it in the middle-end.  */
@@ -3267,4 +3315,36 @@ expand_gather_scatter (rtx *ops, bool is_load)
     }
 }
 
+/* Expand reduction operations.  */
+void
+expand_reduction (rtx_code code, rtx *ops, rtx init)
+{
+  machine_mode vmode = GET_MODE (ops[1]);
+  machine_mode m1_mode = get_m1_mode (vmode).require ();
+  machine_mode m1_mmode = get_mask_mode (m1_mode).require ();
+
+  rtx m1_tmp = gen_reg_rtx (m1_mode);
+  rtx m1_mask = gen_scalar_move_mask (m1_mmode);
+  rtx m1_undef = RVV_VUNDEF (m1_mode);
+  rtx scalar_move_ops[] = {m1_tmp, m1_mask, m1_undef, init};
+  emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops);
+
+  rtx m1_tmp2 = gen_reg_rtx (m1_mode);
+  rtx reduc_ops[] = {m1_tmp2, ops[1], m1_tmp};
+
+  if (FLOAT_MODE_P (vmode) && code == PLUS)
+    {
+      insn_code icode
+	= code_for_pred_reduc_plus (UNSPEC_UNORDERED, vmode, m1_mode);
+      emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
+    }
+  else
+    {
+      insn_code icode = code_for_pred_reduc (code, vmode, m1_mode);
+      emit_vlmax_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
+    }
+
+  emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 586dc8e5379..97a9dad8a77 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -646,7 +646,8 @@ gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
 }
 
 static rtx
-gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
+gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
+		rtx vl = NULL_RTX)
 {
   rtx new_pat;
   vl_vtype_info new_info = info;
@@ -657,7 +658,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
   if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
     {
       rtx dest = get_vl (rinsn);
-      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, dest);
+      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl ? vl : dest);
     }
   else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
     new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
@@ -818,7 +819,8 @@ change_insn (rtx_insn *rinsn, rtx new_pat)
       print_rtl_single (dump_file, PATTERN (rinsn));
     }
 
-  validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  bool change_p = validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  gcc_assert (change_p);
 
   if (dump_file)
     {
@@ -931,7 +933,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
 }
 
 static void
-change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
+change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info,
+		    rtx vl = NULL_RTX)
 {
   rtx_insn *rinsn;
   if (vector_config_insn_p (insn->rtl ()))
@@ -945,7 +948,7 @@ change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
       rinsn = PREV_INSN (insn->rtl ());
       gcc_assert (vector_config_insn_p (rinsn));
     }
-  rtx new_pat = gen_vsetvl_pat (rinsn, info);
+  rtx new_pat = gen_vsetvl_pat (rinsn, info, vl);
   change_insn (rinsn, new_pat);
 }
 
@@ -3377,7 +3380,20 @@ pass_vsetvl::backward_demand_fusion (void)
 				       new_info))
 		continue;
 
-	      change_vsetvl_insn (new_info.get_insn (), new_info);
+	      rtx vl = NULL_RTX;
+	      /* Backward VLMAX VL:
+		   bb 3:
+		     vsetivli zero, 1 ... -> vsetvli t1, zero
+		     vmv.s.x
+		   bb 5:
+		     vsetvli t1, zero ... -> to be elided.
+		     vlse16.v
+
+		   We should forward "t1".  */
+	      if (!block_info.reaching_out.has_avl_reg ()
+		  && vlmax_avl_p (new_info.get_avl ()))
+		vl = get_vl (prop.get_insn ()->rtl ());
+	      change_vsetvl_insn (new_info.get_insn (), new_info, vl);
 	      if (block_info.local_dem == block_info.reaching_out)
 		block_info.local_dem = new_info;
 	      block_info.reaching_out = new_info;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
new file mode 100644
index 00000000000..0d543af13ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
@@ -0,0 +1,118 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+#define DEF_REDUC_PLUS(TYPE)			\
+TYPE __attribute__ ((noinline, noclone))	\
+reduc_plus_##TYPE (TYPE *a, int n)		\
+{						\
+  TYPE r = 0;					\
+  for (int i = 0; i < n; ++i)			\
+    r += a[i];					\
+  return r;					\
+}
+
+#define TEST_PLUS(T)				\
+  T (int8_t)					\
+  T (int16_t)					\
+  T (int32_t)					\
+  T (int64_t)					\
+  T (uint8_t)					\
+  T (uint16_t)					\
+  T (uint32_t)					\
+  T (uint64_t)					\
+  T (_Float16)					\
+  T (float)					\
+  T (double)
+
+TEST_PLUS (DEF_REDUC_PLUS)
+
+#define DEF_REDUC_MAXMIN(TYPE, NAME, CMP_OP)	\
+TYPE __attribute__ ((noinline, noclone))	\
+reduc_##NAME##_##TYPE (TYPE *a, int n)		\
+{						\
+  TYPE r = 13;					\
+  for (int i = 0; i < n; ++i)			\
+    r = a[i] CMP_OP r ? a[i] : r;		\
+  return r;					\
+}
+
+#define TEST_MAXMIN(T)				\
+  T (int8_t, max, >)				\
+  T (int16_t, max, >)				\
+  T (int32_t, max, >)				\
+  T (int64_t, max, >)				\
+  T (uint8_t, max, >)				\
+  T (uint16_t, max, >)				\
+  T (uint32_t, max, >)				\
+  T (uint64_t, max, >)				\
+  T (_Float16, max, >)				\
+  T (float, max, >)				\
+  T (double, max, >)				\
+						\
+  T (int8_t, min, <)				\
+  T (int16_t, min, <)				\
+  T (int32_t, min, <)				\
+  T (int64_t, min, <)				\
+  T (uint8_t, min, <)				\
+  T (uint16_t, min, <)				\
+  T (uint32_t, min, <)				\
+  T (uint64_t, min, <)				\
+  T (_Float16, min, <)				\
+  T (float, min, <)				\
+  T (double, min, <)
+
+TEST_MAXMIN (DEF_REDUC_MAXMIN)
+
+#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP)	\
+TYPE __attribute__ ((noinline, noclone))	\
+reduc_##NAME##_##TYPE (TYPE *a, int n)		\
+{						\
+  TYPE r = 13;					\
+  for (int i = 0; i < n; ++i)			\
+    r BIT_OP a[i];				\
+  return r;					\
+}
+
+#define TEST_BITWISE(T)				\
+  T (int8_t, and, &=)				\
+  T (int16_t, and, &=)				\
+  T (int32_t, and, &=)				\
+  T (int64_t, and, &=)				\
+  T (uint8_t, and, &=)				\
+  T (uint16_t, and, &=)				\
+  T (uint32_t, and, &=)				\
+  T (uint64_t, and, &=)				\
+						\
+  T (int8_t, ior, |=)				\
+  T (int16_t, ior, |=)				\
+  T (int32_t, ior, |=)				\
+  T (int64_t, ior, |=)				\
+  T (uint8_t, ior, |=)				\
+  T (uint16_t, ior, |=)				\
+  T (uint32_t, ior, |=)				\
+  T (uint64_t, ior, |=)				\
+						\
+  T (int8_t, xor, ^=)				\
+  T (int16_t, xor, ^=)				\
+  T (int32_t, xor, ^=)				\
+  T (int64_t, xor, ^=)				\
+  T (uint8_t, xor, ^=)				\
+  T (uint16_t, xor, ^=)				\
+  T (uint32_t, xor, ^=)				\
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
new file mode 100644
index 00000000000..136a8a378bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
@@ -0,0 +1,129 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+#define NUM_ELEMS(TYPE) (1024 / sizeof (TYPE))
+
+#define DEF_REDUC_PLUS(TYPE)					\
+void __attribute__ ((noinline, noclone))			\
+reduc_plus_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],	\
+		   TYPE *restrict r, int n)			\
+{								\
+  for (int i = 0; i < n; i++)					\
+    {								\
+      r[i] = 0;							\
+      for (int j = 0; j < NUM_ELEMS (TYPE); j++)		\
+        r[i] += a[i][j];					\
+    }								\
+}
+
+#define TEST_PLUS(T)				\
+  T (int8_t)					\
+  T (int16_t)					\
+  T (int32_t)					\
+  T (int64_t)					\
+  T (uint8_t)					\
+  T (uint16_t)					\
+  T (uint32_t)					\
+  T (uint64_t)					\
+  T (_Float16)					\
+  T (float)					\
+  T (double)
+
+TEST_PLUS (DEF_REDUC_PLUS)
+
+#define DEF_REDUC_MAXMIN(TYPE, NAME, CMP_OP)			\
+void __attribute__ ((noinline, noclone))			\
+reduc_##NAME##_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],	\
+		       TYPE *restrict r, int n)			\
+{								\
+  for (int i = 0; i < n; i++)					\
+    {								\
+      r[i] = a[i][0];						\
+      for (int j = 0; j < NUM_ELEMS (TYPE); j++)		\
+        r[i] = a[i][j] CMP_OP r[i] ? a[i][j] : r[i];		\
+    }								\
+}
+
+#define TEST_MAXMIN(T)				\
+  T (int8_t, max, >)				\
+  T (int16_t, max, >)				\
+  T (int32_t, max, >)				\
+  T (int64_t, max, >)				\
+  T (uint8_t, max, >)				\
+  T (uint16_t, max, >)				\
+  T (uint32_t, max, >)				\
+  T (uint64_t, max, >)				\
+  T (_Float16, max, >)				\
+  T (float, max, >)				\
+  T (double, max, >)				\
+						\
+  T (int8_t, min, <)				\
+  T (int16_t, min, <)				\
+  T (int32_t, min, <)				\
+  T (int64_t, min, <)				\
+  T (uint8_t, min, <)				\
+  T (uint16_t, min, <)				\
+  T (uint32_t, min, <)				\
+  T (uint64_t, min, <)				\
+  T (_Float16, min, <)				\
+  T (float, min, <)				\
+  T (double, min, <)
+
+TEST_MAXMIN (DEF_REDUC_MAXMIN)
+
+#define DEF_REDUC_BITWISE(TYPE,NAME,BIT_OP)			\
+void __attribute__ ((noinline, noclone))			\
+reduc_##NAME##TYPE (TYPE (*restrict a)[NUM_ELEMS(TYPE)],	\
+		    TYPE *restrict r, int n)			\
+{								\
+  for (int i = 0; i < n; i++)					\
+    {								\
+      r[i] = a[i][0];						\
+      for (int j = 0; j < NUM_ELEMS(TYPE); j++)			\
+        r[i] BIT_OP a[i][j];					\
+    }								\
+}
+
+#define TEST_BITWISE(T)				\
+  T (int8_t, and, &=)				\
+  T (int16_t, and, &=)				\
+  T (int32_t, and, &=)				\
+  T (int64_t, and, &=)				\
+  T (uint8_t, and, &=)				\
+  T (uint16_t, and, &=)				\
+  T (uint32_t, and, &=)				\
+  T (uint64_t, and, &=)				\
+						\
+  T (int8_t, ior, |=)				\
+  T (int16_t, ior, |=)				\
+  T (int32_t, ior, |=)				\
+  T (int64_t, ior, |=)				\
+  T (uint8_t, ior, |=)				\
+  T (uint16_t, ior, |=)				\
+  T (uint32_t, ior, |=)				\
+  T (uint64_t, ior, |=)				\
+						\
+  T (int8_t, xor, ^=)				\
+  T (int16_t, xor, ^=)				\
+  T (int32_t, xor, ^=)				\
+  T (int64_t, xor, ^=)				\
+  T (uint8_t, xor, ^=)				\
+  T (uint16_t, xor, ^=)				\
+  T (uint32_t, xor, ^=)				\
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
new file mode 100644
index 00000000000..c3638344f80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+unsigned short __attribute__((noipa))
+add_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res += x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+min_loop (unsigned short *x, int n)
+{
+  unsigned short res = ~0;
+  for (int i = 0; i < n; ++i)
+    res = res < x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+max_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res = res > x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+and_loop (unsigned short *x, int n)
+{
+  unsigned short res = ~0;
+  for (int i = 0; i < n; ++i)
+    res &= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+or_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res |= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+eor_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res ^= x[i];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
new file mode 100644
index 00000000000..f00a12826c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+unsigned short __attribute__((noipa))
+add_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res += x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+min_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res = res < x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+max_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res = res > x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+and_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res &= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+or_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res |= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+eor_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res ^= x[i];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
new file mode 100644
index 00000000000..b500f857598
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
@@ -0,0 +1,56 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-1.c"
+
+#define NUM_ELEMS(TYPE) (73 + sizeof (TYPE))
+
+#define INIT_VECTOR(TYPE)				\
+  TYPE a[NUM_ELEMS (TYPE) + 1];				\
+  for (int i = 0; i < NUM_ELEMS (TYPE) + 1; i++)	\
+    {							\
+      a[i] = ((i * 2) * (i & 1 ? 1 : -1) | 3);		\
+      asm volatile ("" ::: "memory");			\
+    }
+
+#define TEST_REDUC_PLUS(TYPE)				\
+  {							\
+    INIT_VECTOR (TYPE);					\
+    TYPE r1 = reduc_plus_##TYPE (a, NUM_ELEMS (TYPE));	\
+    volatile TYPE r2 = 0;				\
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)		\
+      r2 += a[i];					\
+    if (r1 != r2)					\
+      __builtin_abort ();				\
+  }
+
+#define TEST_REDUC_MAXMIN(TYPE, NAME, CMP_OP)			\
+  {								\
+    INIT_VECTOR (TYPE);						\
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));	\
+    volatile TYPE r2 = 13;					\
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)			\
+      r2 = a[i] CMP_OP r2 ? a[i] : r2;				\
+    if (r1 != r2)						\
+      __builtin_abort ();					\
+  }
+
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)			\
+  {								\
+    INIT_VECTOR (TYPE);						\
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));	\
+    volatile TYPE r2 = 13;					\
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)			\
+      r2 BIT_OP a[i];						\
+    if (r1 != r2)						\
+      __builtin_abort ();					\
+  }
+
+int main ()
+{
+  TEST_PLUS (TEST_REDUC_PLUS)
+  TEST_MAXMIN (TEST_REDUC_MAXMIN)
+  TEST_BITWISE (TEST_REDUC_BITWISE)
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
new file mode 100644
index 00000000000..3c2f62557b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "reduc-2.c"
+
+#define NROWS 53
+
+/* -ffast-math fuzz for PLUS.  */
+#define CMP__Float16(X, Y) ((X) >= (Y) * 0.875 && (X) <= (Y) * 1.125)
+#define CMP_float(X, Y) ((X) == (Y))
+#define CMP_double(X, Y) ((X) == (Y))
+#define CMP_int8_t(X, Y) ((X) == (Y))
+#define CMP_int16_t(X, Y) ((X) == (Y))
+#define CMP_int32_t(X, Y) ((X) == (Y))
+#define CMP_int64_t(X, Y) ((X) == (Y))
+#define CMP_uint8_t(X, Y) ((X) == (Y))
+#define CMP_uint16_t(X, Y) ((X) == (Y))
+#define CMP_uint32_t(X, Y) ((X) == (Y))
+#define CMP_uint64_t(X, Y) ((X) == (Y))
+
+#define INIT_MATRIX(TYPE)				\
+  TYPE mat[NROWS][NUM_ELEMS (TYPE)];			\
+  TYPE r[NROWS];					\
+  for (int i = 0; i < NROWS; i++)			\
+    for (int j = 0; j < NUM_ELEMS (TYPE); j++)		\
+      {							\
+	mat[i][j] = i + (j * 2) * (j & 1 ? 1 : -1);	\
+	asm volatile ("" ::: "memory");			\
+      }
+
+#define TEST_REDUC_PLUS(TYPE)				\
+  {							\
+    INIT_MATRIX (TYPE);					\
+    reduc_plus_##TYPE (mat, r, NROWS);			\
+    for (int i = 0; i < NROWS; i++)			\
+      {							\
+	volatile TYPE r2 = 0;				\
+	for (int j = 0; j < NUM_ELEMS (TYPE); ++j)	\
+	  r2 += mat[i][j];				\
+	if (!CMP_##TYPE (r[i], r2))			\
+	  __builtin_abort ();				\
+      }							\
+    }
+
+#define TEST_REDUC_MAXMIN(TYPE, NAME, CMP_OP)		\
+  {							\
+    INIT_MATRIX (TYPE);					\
+    reduc_##NAME##_##TYPE (mat, r, NROWS);		\
+    for (int i = 0; i < NROWS; i++)			\
+      {							\
+	volatile TYPE r2 = mat[i][0];			\
+	for (int j = 0; j < NUM_ELEMS (TYPE); ++j)	\
+	  r2 = mat[i][j] CMP_OP r2 ? mat[i][j] : r2;	\
+	if (r[i] != r2)					\
+	  __builtin_abort ();				\
+      }							\
+    }
+
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)		\
+  {							\
+    INIT_MATRIX (TYPE);					\
+    reduc_##NAME##_##TYPE (mat, r, NROWS);		\
+    for (int i = 0; i < NROWS; i++)			\
+      {							\
+	volatile TYPE r2 = mat[i][0];			\
+	for (int j = 0; j < NUM_ELEMS (TYPE); ++j)	\
+	  r2 BIT_OP mat[i][j];				\
+	if (r[i] != r2)					\
+	  __builtin_abort ();				\
+      }							\
+    }
+
+int main ()
+{
+  TEST_PLUS (TEST_REDUC_PLUS)
+  TEST_MAXMIN (TEST_REDUC_MAXMIN)
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
new file mode 100644
index 00000000000..d1b22c0d69a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
@@ -0,0 +1,49 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-3.c"
+
+#define N 0x1100
+
+int
+main (void)
+{
+  unsigned short x[N];
+  for (int i = 0; i < N; ++i)
+    x[i] = (i + 1) * (i + 2);
+
+  if (add_loop (x, 0) != 0
+      || add_loop (x, 11) != 572
+      || add_loop (x, 0x100) != 22016
+      || add_loop (x, 0xfff) != 20480
+      || max_loop (x, 0) != 0
+      || max_loop (x, 11) != 132
+      || max_loop (x, 0x100) != 65280
+      || max_loop (x, 0xfff) != 65504
+      || or_loop (x, 0) != 0
+      || or_loop (x, 11) != 0xfe
+      || or_loop (x, 0x80) != 0x7ffe
+      || or_loop (x, 0xb4) != 0x7ffe
+      || or_loop (x, 0xb5) != 0xfffe
+      || eor_loop (x, 0) != 0
+      || eor_loop (x, 11) != 0xe8
+      || eor_loop (x, 0x100) != 0xcf00
+      || eor_loop (x, 0xfff) != 0xa000)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    x[i] = ~x[i];
+
+  if (min_loop (x, 0) != 65535
+      || min_loop (x, 11) != 65403
+      || min_loop (x, 0x100) != 255
+      || min_loop (x, 0xfff) != 31
+      || and_loop (x, 0) != 0xffff
+      || and_loop (x, 11) != 0xff01
+      || and_loop (x, 0x80) != 0x8001
+      || and_loop (x, 0xb4) != 0x8001
+      || and_loop (x, 0xb5) != 1)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
new file mode 100644
index 00000000000..c17e125a763
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
@@ -0,0 +1,66 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-4.c"
+
+#define N 0x1100
+
+int
+main (void)
+{
+  unsigned short x[N];
+  for (int i = 0; i < N; ++i)
+    x[i] = (i + 1) * (i + 2);
+
+  if (add_loop (x, 0, 10) != 10
+      || add_loop (x, 11, 42) != 614
+      || add_loop (x, 0x100, 84) != 22100
+      || add_loop (x, 0xfff, 20) != 20500
+      || max_loop (x, 0, 10) != 10
+      || max_loop (x, 11, 131) != 132
+      || max_loop (x, 11, 133) != 133
+      || max_loop (x, 0x100, 65279) != 65280
+      || max_loop (x, 0x100, 65281) != 65281
+      || max_loop (x, 0xfff, 65503) != 65504
+      || max_loop (x, 0xfff, 65505) != 65505
+      || or_loop (x, 0, 0x71) != 0x71
+      || or_loop (x, 11, 0) != 0xfe
+      || or_loop (x, 11, 0xb3c) != 0xbfe
+      || or_loop (x, 0x80, 0) != 0x7ffe
+      || or_loop (x, 0x80, 1) != 0x7fff
+      || or_loop (x, 0xb4, 0) != 0x7ffe
+      || or_loop (x, 0xb4, 1) != 0x7fff
+      || or_loop (x, 0xb5, 0) != 0xfffe
+      || or_loop (x, 0xb5, 1) != 0xffff
+      || eor_loop (x, 0, 0x3e) != 0x3e
+      || eor_loop (x, 11, 0) != 0xe8
+      || eor_loop (x, 11, 0x1ff) != 0x117
+      || eor_loop (x, 0x100, 0) != 0xcf00
+      || eor_loop (x, 0x100, 0xeee) != 0xc1ee
+      || eor_loop (x, 0xfff, 0) != 0xa000
+      || eor_loop (x, 0xfff, 0x8888) != 0x2888)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    x[i] = ~x[i];
+
+  if (min_loop (x, 0, 10000) != 10000
+      || min_loop (x, 11, 65404) != 65403
+      || min_loop (x, 11, 65402) != 65402
+      || min_loop (x, 0x100, 256) != 255
+      || min_loop (x, 0x100, 254) != 254
+      || min_loop (x, 0xfff, 32) != 31
+      || min_loop (x, 0xfff, 30) != 30
+      || and_loop (x, 0, 0x1234) != 0x1234
+      || and_loop (x, 11, 0xffff) != 0xff01
+      || and_loop (x, 11, 0xcdef) != 0xcd01
+      || and_loop (x, 0x80, 0xffff) != 0x8001
+      || and_loop (x, 0x80, 0xfffe) != 0x8000
+      || and_loop (x, 0xb4, 0xffff) != 0x8001
+      || and_loop (x, 0xb4, 0xfffe) != 0x8000
+      || and_loop (x, 0xb5, 0xffff) != 1
+      || and_loop (x, 0xb5, 0xfffe) != 0)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 19589fa9638..532c17c4065 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -71,6 +71,8 @@ foreach op $AUTOVEC_TEST_OPTS {
     "" "$op"
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/ternop/*.\[cS\]]] \
     "" "$op"
+  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/reduc/*.\[cS\]]] \
+    "" "$op"
 }
 
 # widening operation only test on LMUL < 8
-- 
2.36.3
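
For reference, the expected values hard-coded in reduc_run-3.c can be
cross-checked with plain scalar loops.  The following is a minimal sketch,
not part of the patch: the helper names (fill_input, ref_add, ref_max,
ref_or, ref_xor) are hypothetical, and it assumes the loops in reduc-3.c
accumulate in unsigned short starting from 0.

```c
#include <stdint.h>

/* Hypothetical scalar reference helpers; the names are illustrative
   and do not appear in the patch.  */
#define N 0x1100

/* Same initialisation as reduc_run-3.c: x[i] = (i + 1) * (i + 2),
   truncated to 16 bits.  */
static void
fill_input (uint16_t *x)
{
  for (int i = 0; i < N; ++i)
    x[i] = (uint16_t) ((i + 1) * (i + 2));
}

/* Sum modulo 2^16 of the first n elements.  */
static uint16_t
ref_add (const uint16_t *x, int n)
{
  uint16_t r = 0;
  for (int i = 0; i < n; ++i)
    r = (uint16_t) (r + x[i]);
  return r;
}

/* Unsigned maximum of the first n elements (0 for n == 0).  */
static uint16_t
ref_max (const uint16_t *x, int n)
{
  uint16_t r = 0;
  for (int i = 0; i < n; ++i)
    r = x[i] > r ? x[i] : r;
  return r;
}

/* Bitwise OR of the first n elements.  */
static uint16_t
ref_or (const uint16_t *x, int n)
{
  uint16_t r = 0;
  for (int i = 0; i < n; ++i)
    r |= x[i];
  return r;
}

/* Bitwise XOR of the first n elements.  */
static uint16_t
ref_xor (const uint16_t *x, int n)
{
  uint16_t r = 0;
  for (int i = 0; i < n; ++i)
    r ^= x[i];
  return r;
}
```

As a sanity check on the arithmetic, the sum of k * (k + 1) for k = 1..11
is 11 * 12 * 13 / 3 = 572, which is the value the test expects from
add_loop (x, 11).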



* Re: [PATCH] RISC-V: Support non-SLP unordered reduction
  2023-07-14 12:30 [PATCH] RISC-V: Support non-SLP unordered reduction juzhe.zhong
@ 2023-07-14 12:38 ` Kito Cheng
  2023-07-14 12:47   ` 钟居哲
  2023-07-14 12:51   ` 钟居哲
  2023-07-17  7:00 ` Kito Cheng
  1 sibling, 2 replies; 7+ messages in thread
From: Kito Cheng @ 2023-07-14 12:38 UTC (permalink / raw)
  To: 钟居哲
  Cc: GCC Patches, Kito Cheng, Palmer Dabbelt, Robin Dapp, Jeff Law


On Fri, Jul 14, 2023 at 20:31, <juzhe.zhong@rivai.ai> wrote:

> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
>
> This patch adds reduc_*_scal patterns to support reduction auto-vectorization.
>
> Use COND_LEN_* + reduc_*_scal to support unordered non-SLP
> auto-vectorization.
>
> Consider the following case:
> int __attribute__((noipa))
> and_loop (int32_t * __restrict x,
> int32_t n, int res)
> {
>   for (int i = 0; i < n; ++i)
>     res &= x[i];
>   return res;
> }
>
> ASM:
> and_loop:
>         ble     a1,zero,.L4
>         vsetvli a3,zero,e32,m1,ta,ma
>         vmv.v.i v1,-1
> .L3:
>         vsetvli a5,a1,e32,m1,tu,ma       ------------> MUST BE "TU".
>         slli    a4,a5,2
>         sub     a1,a1,a5
>         vle32.v v2,0(a0)
>         add     a0,a0,a4
>         vand.vv v1,v2,v1
>         bne     a1,zero,.L3
>         vsetivli        zero,1,e32,m1,ta,ma
>         vmv.v.i v2,-1
>         vsetvli a3,zero,e32,m1,ta,ma
>         vredand.vs      v1,v1,v2
>         vmv.x.s a5,v1
>         and     a0,a2,a5
>         ret
> .L4:
>         mv      a0,a2
>         ret
>
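
The "MUST BE TU" remark can be illustrated with a scalar model of the
vectorized loop.  This is only a sketch under stated assumptions, not
compiler output: VLMAX is fixed at 4 lanes, the function name
sum_loop_model is hypothetical, and addition is used instead of AND
because a tail-agnostic implementation may overwrite tail elements with
all ones, which happens to be the identity for AND but not for a sum.

```c
#include <stdint.h>

#define VLMAX 4 /* Hypothetical number of lanes in one vector register.  */

/* Scalar model of a length-controlled reduction loop.  The accumulator
   has VLMAX lanes; each iteration only updates the first vl lanes, and
   the final reduction runs over all VLMAX lanes.  When TAIL_UNDISTURBED
   is zero, lanes past vl are overwritten with all ones, which is one of
   the behaviours a tail-agnostic policy permits.  */
int32_t
sum_loop_model (const int32_t *x, int32_t n, int32_t res,
		int tail_undisturbed)
{
  int32_t acc[VLMAX];
  for (int i = 0; i < VLMAX; ++i)
    acc[i] = 0; /* Broadcast the identity of the operation.  */

  while (n > 0)
    {
      int vl = n < VLMAX ? n : VLMAX; /* Models vsetvli a5,a1,...  */
      for (int i = 0; i < vl; ++i)
	acc[i] += x[i]; /* Active elements.  */
      if (!tail_undisturbed)
	for (int i = vl; i < VLMAX; ++i)
	  acc[i] = -1; /* Tail clobbered with all ones.  */
      x += vl;
      n -= vl;
    }

  /* Final reduction over the whole register, then fold in the
     incoming scalar.  */
  int32_t r = 0;
  for (int i = 0; i < VLMAX; ++i)
    r += acc[i];
  return res + r;
}
```

With six input elements the last iteration has vl = 2, so lanes 2 and 3
still hold partial results from the first iteration; only the
tail-undisturbed variant keeps them for the final reduction.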
> Fix a bug in the VSETVL pass that is exposed by the reduction test cases.
>


Is this a performance bug or a correctness bug? If it is a correctness bug,
does it also appear in GCC 13?


> SLP reduction and floating-point in-order reduction are not supported yet.
>
> gcc/ChangeLog:
>
>         * config/riscv/autovec.md (reduc_plus_scal_<mode>): New pattern.
>         (reduc_smax_scal_<mode>): Ditto.
>         (reduc_umax_scal_<mode>): Ditto.
>         (reduc_smin_scal_<mode>): Ditto.
>         (reduc_umin_scal_<mode>): Ditto.
>         (reduc_and_scal_<mode>): Ditto.
>         (reduc_ior_scal_<mode>): Ditto.
>         (reduc_xor_scal_<mode>): Ditto.
>         * config/riscv/riscv-protos.h (enum insn_type): New enum.
>         (emit_nonvlmax_integer_move_insn): Add reduction.
>         (expand_reduction): New function.
>         * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
>         (emit_vlmax_fp_reduction_insn): Ditto.
>         (get_m1_mode): Ditto.
>         (expand_cond_len_binop): Fix name.
>         (expand_reduction): New function.
>         * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix bug.
>         (change_insn): Ditto.
>         (change_vsetvl_insn): Ditto.
>         (pass_vsetvl::backward_demand_fusion): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
>         * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md                   | 138 ++++++++++++++++++
>  gcc/config/riscv/riscv-protos.h               |   3 +
>  gcc/config/riscv/riscv-v.cc                   |  84 ++++++++++-
>  gcc/config/riscv/riscv-vsetvl.cc              |  28 +++-
>  .../riscv/rvv/autovec/reduc/reduc-1.c         | 118 +++++++++++++++
>  .../riscv/rvv/autovec/reduc/reduc-2.c         | 129 ++++++++++++++++
>  .../riscv/rvv/autovec/reduc/reduc-3.c         |  65 +++++++++
>  .../riscv/rvv/autovec/reduc/reduc-4.c         |  59 ++++++++
>  .../riscv/rvv/autovec/reduc/reduc_run-1.c     |  56 +++++++
>  .../riscv/rvv/autovec/reduc/reduc_run-2.c     |  79 ++++++++++
>  .../riscv/rvv/autovec/reduc/reduc_run-3.c     |  49 +++++++
>  .../riscv/rvv/autovec/reduc/reduc_run-4.c     |  66 +++++++++
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
>  13 files changed, 868 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 0476b1dea45..a74f66f41ac 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1531,3 +1531,141 @@
>    riscv_vector::expand_cond_len_binop (<CODE>, operands);
>    DONE;
>  })
> +
> +;; =========================================================================
> +;; == Reductions
> +;; =========================================================================
> +
> +;; -------------------------------------------------------------------------
> +;; ---- [INT] Tree reductions
> +;; -------------------------------------------------------------------------
> +;; Includes:
> +;; - vredsum.vs
> +;; - vredmaxu.vs
> +;; - vredmax.vs
> +;; - vredminu.vs
> +;; - vredmin.vs
> +;; - vredand.vs
> +;; - vredor.vs
> +;; - vredxor.vs
> +;; -------------------------------------------------------------------------
> +
> +(define_expand "reduc_plus_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (<VEL>mode));
> +  DONE;
> +})
> +
> +(define_expand "reduc_smax_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  int prec = GET_MODE_PRECISION (<VEL>mode);
> +  rtx min = immed_wide_int_const (wi::min_value (prec, SIGNED), <VEL>mode);
> +  riscv_vector::expand_reduction (SMAX, operands, min);
> +  DONE;
> +})
> +
> +(define_expand "reduc_umax_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_reduction (UMAX, operands, CONST0_RTX (<VEL>mode));
> +  DONE;
> +})
> +
> +(define_expand "reduc_smin_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  int prec = GET_MODE_PRECISION (<VEL>mode);
> +  rtx max = immed_wide_int_const (wi::max_value (prec, SIGNED), <VEL>mode);
> +  riscv_vector::expand_reduction (SMIN, operands, max);
> +  DONE;
> +})
> +
> +(define_expand "reduc_umin_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  int prec = GET_MODE_PRECISION (<VEL>mode);
> +  rtx max = immed_wide_int_const (wi::max_value (prec, UNSIGNED), <VEL>mode);
> +  riscv_vector::expand_reduction (UMIN, operands, max);
> +  DONE;
> +})
> +
> +(define_expand "reduc_and_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_reduction (AND, operands, CONSTM1_RTX (<VEL>mode));
> +  DONE;
> +})
> +
> +(define_expand "reduc_ior_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_reduction (IOR, operands, CONST0_RTX (<VEL>mode));
> +  DONE;
> +})
> +
> +(define_expand "reduc_xor_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VI 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_reduction (XOR, operands, CONST0_RTX (<VEL>mode));
> +  DONE;
> +})
> +
> +;; -------------------------------------------------------------------------
> +;; ---- [FP] Tree reductions
> +;; -------------------------------------------------------------------------
> +;; Includes:
> +;; - vfredusum.vs
> +;; - vfredmax.vs
> +;; - vfredmin.vs
> +;; -------------------------------------------------------------------------
> +
> +(define_expand "reduc_plus_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VF 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (<VEL>mode));
> +  DONE;
> +})
> +
> +(define_expand "reduc_smax_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VF 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  REAL_VALUE_TYPE rv;
> +  real_inf (&rv, true);
> +  rtx f = const_double_from_real_value (rv, <VEL>mode);
> +  riscv_vector::expand_reduction (SMAX, operands, f);
> +  DONE;
> +})
> +
> +(define_expand "reduc_smin_scal_<mode>"
> +  [(match_operand:<VEL> 0 "register_operand")
> +   (match_operand:VF 1 "register_operand")]
> +  "TARGET_VECTOR"
> +{
> +  REAL_VALUE_TYPE rv;
> +  real_inf (&rv, false);
> +  rtx f = const_double_from_real_value (rv, <VEL>mode);
> +  riscv_vector::expand_reduction (SMIN, operands, f);
> +  DONE;
> +})
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 1a622c58f4b..f19b7fc2b8d 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -197,6 +197,7 @@ enum insn_type
>    RVV_COMPRESS_OP = 4,
>    RVV_GATHER_M_OP = 5,
>    RVV_SCATTER_M_OP = 4,
> +  RVV_REDUCTION_OP = 3,
>  };
>  enum vlmul_type
>  {
> @@ -247,6 +248,7 @@ void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
>  void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
>  void emit_scalar_move_insn (unsigned, rtx *);
>  void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
> +//void emit_vlmax_reduction_insn (unsigned, rtx *);
>  enum vlmul_type get_vlmul (machine_mode);
>  unsigned int get_ratio (machine_mode);
>  unsigned int get_nf (machine_mode);
> @@ -280,6 +282,7 @@ bool has_vi_variant_p (rtx_code, rtx);
>  void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
>  bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
>  void expand_cond_len_binop (rtx_code, rtx *);
> +void expand_reduction (rtx_code, rtx *, rtx);
>  #endif
>  bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
>                           bool, void (*)(rtx *, rtx));
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 90da63889bd..ccf0f6ff852 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -1137,6 +1137,43 @@ emit_vlmax_compress_insn (unsigned icode, rtx *ops)
>    e.emit_insn ((enum insn_code) icode, ops);
>  }
>
> +/* Emit reduction instruction.  */
> +static void
> +emit_vlmax_reduction_insn (unsigned icode, int op_num, rtx *ops)
> +{
> +  machine_mode dest_mode = GET_MODE (ops[0]);
> +  machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
> +  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
> +                                         /* HAS_DEST_P */ true,
> +                                         /* FULLY_UNMASKED_P */ true,
> +                                         /* USE_REAL_MERGE_P */ false,
> +                                         /* HAS_AVL_P */ true,
> +                                         /* VLMAX_P */ true, dest_mode,
> +                                         mask_mode);
> +
> +  e.set_policy (TAIL_ANY);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
> +
> +/* Emit reduction instruction.  */
> +static void
> +emit_vlmax_fp_reduction_insn (unsigned icode, int op_num, rtx *ops)
> +{
> +  machine_mode dest_mode = GET_MODE (ops[0]);
> +  machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
> +  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
> +                                         /* HAS_DEST_P */ true,
> +                                         /* FULLY_UNMASKED_P */ true,
> +                                         /* USE_REAL_MERGE_P */ false,
> +                                         /* HAS_AVL_P */ true,
> +                                         /* VLMAX_P */ true, dest_mode,
> +                                         mask_mode);
> +
> +  e.set_policy (TAIL_ANY);
> +  e.set_rounding_mode (FRM_DYN);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
> +
>  /* Emit merge instruction.  */
>
>  static machine_mode
> @@ -1629,6 +1666,17 @@ get_mask_mode (machine_mode mode)
>    return get_vector_mode (BImode, GET_MODE_NUNITS (mode));
>  }
>
> +/* Return the appropriate M1 mode for MODE.  */
> +
> +static opt_machine_mode
> +get_m1_mode (machine_mode mode)
> +{
> +  scalar_mode smode = GET_MODE_INNER (mode);
> +  unsigned int bytes = GET_MODE_SIZE (smode);
> +  poly_uint64 m1_nunits = exact_div (BYTES_PER_RISCV_VECTOR, bytes);
> +  return get_vector_mode (smode, m1_nunits);
> +}
> +
>  /* Return the RVV vector mode that has NUNITS elements of mode INNER_MODE.
>     This function is not only used by builtins, but also will be used by
>     auto-vectorization in the future.  */
> @@ -3099,9 +3147,9 @@ expand_cond_len_binop (rtx_code code, rtx *ops)
>        rtx ops[] = {dest, mask, merge, src1, src2};
>        insn_code icode = code_for_pred (code, mode);
>        if (needs_fp_rounding (code, mode))
> -       emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_MU, ops, len);
> +       emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_TU, ops, len);
>        else
> -       emit_nonvlmax_tu_insn (icode, RVV_BINOP_MU, ops, len);
> +       emit_nonvlmax_tu_insn (icode, RVV_BINOP_TU, ops, len);
>      }
>    else
>      /* FIXME: Enable this case when we support it in the middle-end.  */
> @@ -3267,4 +3315,36 @@ expand_gather_scatter (rtx *ops, bool is_load)
>      }
>  }
>
> +/* Expand reduction operations.  */
> +void
> +expand_reduction (rtx_code code, rtx *ops, rtx init)
> +{
> +  machine_mode vmode = GET_MODE (ops[1]);
> +  machine_mode m1_mode = get_m1_mode (vmode).require ();
> +  machine_mode m1_mmode = get_mask_mode (m1_mode).require ();
> +
> +  rtx m1_tmp = gen_reg_rtx (m1_mode);
> +  rtx m1_mask = gen_scalar_move_mask (m1_mmode);
> +  rtx m1_undef = RVV_VUNDEF (m1_mode);
> +  rtx scalar_move_ops[] = {m1_tmp, m1_mask, m1_undef, init};
> +  emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops);
> +
> +  rtx m1_tmp2 = gen_reg_rtx (m1_mode);
> +  rtx reduc_ops[] = {m1_tmp2, ops[1], m1_tmp};
> +
> +  if (FLOAT_MODE_P (vmode) && code == PLUS)
> +    {
> +      insn_code icode
> +       = code_for_pred_reduc_plus (UNSPEC_UNORDERED, vmode, m1_mode);
> +      emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
> +    }
> +  else
> +    {
> +      insn_code icode = code_for_pred_reduc (code, vmode, m1_mode);
> +      emit_vlmax_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
> +    }
> +
> +  emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
> +}
> +
>  } // namespace riscv_vector
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index 586dc8e5379..97a9dad8a77 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -646,7 +646,8 @@ gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
>  }
>
>  static rtx
> -gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
> +gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
> +               rtx vl = NULL_RTX)
>  {
>    rtx new_pat;
>    vl_vtype_info new_info = info;
> @@ -657,7 +658,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
>    if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
>      {
>        rtx dest = get_vl (rinsn);
> -      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, dest);
> +      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl ? vl : dest);
>      }
>    else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
>      new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
> @@ -818,7 +819,8 @@ change_insn (rtx_insn *rinsn, rtx new_pat)
>        print_rtl_single (dump_file, PATTERN (rinsn));
>      }
>
> -  validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
> +  bool change_p = validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
> +  gcc_assert (change_p);
>
>    if (dump_file)
>      {
> @@ -931,7 +933,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
>  }
>
>  static void
> -change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
> +change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info,
> +                   rtx vl = NULL_RTX)
>  {
>    rtx_insn *rinsn;
>    if (vector_config_insn_p (insn->rtl ()))
> @@ -945,7 +948,7 @@ change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
>        rinsn = PREV_INSN (insn->rtl ());
>        gcc_assert (vector_config_insn_p (rinsn));
>      }
> -  rtx new_pat = gen_vsetvl_pat (rinsn, info);
> +  rtx new_pat = gen_vsetvl_pat (rinsn, info, vl);
>    change_insn (rinsn, new_pat);
>  }
>
> @@ -3377,7 +3380,20 @@ pass_vsetvl::backward_demand_fusion (void)
>                                        new_info))
>                 continue;
>
> -             change_vsetvl_insn (new_info.get_insn (), new_info);
> +             rtx vl = NULL_RTX;
> +             /* Backward VLMAX VL:
> +                  bb 3:
> +                    vsetivli zero, 1 ... -> vsetvli t1, zero
> +                    vmv.s.x
> +                  bb 5:
> +                    vsetvli t1, zero ... -> to be elided.
> +                    vlse16.v
> +
> +                  We should forward "t1".  */
> +             if (!block_info.reaching_out.has_avl_reg ()
> +                 && vlmax_avl_p (new_info.get_avl ()))
> +               vl = get_vl (prop.get_insn ()->rtl ());
> +             change_vsetvl_insn (new_info.get_insn (), new_info, vl);
>               if (block_info.local_dem == block_info.reaching_out)
>                 block_info.local_dem = new_info;
>               block_info.reaching_out = new_info;
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
> new file mode 100644
> index 00000000000..0d543af13ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
> @@ -0,0 +1,118 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
> +
> +#include <stdint-gcc.h>
> +
> +#define DEF_REDUC_PLUS(TYPE)                   \
> +TYPE __attribute__ ((noinline, noclone))       \
> +reduc_plus_##TYPE (TYPE *a, int n)             \
> +{                                              \
> +  TYPE r = 0;                                  \
> +  for (int i = 0; i < n; ++i)                  \
> +    r += a[i];                                 \
> +  return r;                                    \
> +}
> +
> +#define TEST_PLUS(T)                           \
> +  T (int8_t)                                   \
> +  T (int16_t)                                  \
> +  T (int32_t)                                  \
> +  T (int64_t)                                  \
> +  T (uint8_t)                                  \
> +  T (uint16_t)                                 \
> +  T (uint32_t)                                 \
> +  T (uint64_t)                                 \
> +  T (_Float16)                                 \
> +  T (float)                                    \
> +  T (double)
> +
> +TEST_PLUS (DEF_REDUC_PLUS)
> +
> +#define DEF_REDUC_MAXMIN(TYPE, NAME, CMP_OP)   \
> +TYPE __attribute__ ((noinline, noclone))       \
> +reduc_##NAME##_##TYPE (TYPE *a, int n)         \
> +{                                              \
> +  TYPE r = 13;                                 \
> +  for (int i = 0; i < n; ++i)                  \
> +    r = a[i] CMP_OP r ? a[i] : r;              \
> +  return r;                                    \
> +}
> +
> +#define TEST_MAXMIN(T)                         \
> +  T (int8_t, max, >)                           \
> +  T (int16_t, max, >)                          \
> +  T (int32_t, max, >)                          \
> +  T (int64_t, max, >)                          \
> +  T (uint8_t, max, >)                          \
> +  T (uint16_t, max, >)                         \
> +  T (uint32_t, max, >)                         \
> +  T (uint64_t, max, >)                         \
> +  T (_Float16, max, >)                         \
> +  T (float, max, >)                            \
> +  T (double, max, >)                           \
> +                                               \
> +  T (int8_t, min, <)                           \
> +  T (int16_t, min, <)                          \
> +  T (int32_t, min, <)                          \
> +  T (int64_t, min, <)                          \
> +  T (uint8_t, min, <)                          \
> +  T (uint16_t, min, <)                         \
> +  T (uint32_t, min, <)                         \
> +  T (uint64_t, min, <)                         \
> +  T (_Float16, min, <)                         \
> +  T (float, min, <)                            \
> +  T (double, min, <)
> +
> +TEST_MAXMIN (DEF_REDUC_MAXMIN)
> +
> +#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP)  \
> +TYPE __attribute__ ((noinline, noclone))       \
> +reduc_##NAME##_##TYPE (TYPE *a, int n)         \
> +{                                              \
> +  TYPE r = 13;                                 \
> +  for (int i = 0; i < n; ++i)                  \
> +    r BIT_OP a[i];                             \
> +  return r;                                    \
> +}
> +
> +#define TEST_BITWISE(T)                                \
> +  T (int8_t, and, &=)                          \
> +  T (int16_t, and, &=)                         \
> +  T (int32_t, and, &=)                         \
> +  T (int64_t, and, &=)                         \
> +  T (uint8_t, and, &=)                         \
> +  T (uint16_t, and, &=)                                \
> +  T (uint32_t, and, &=)                                \
> +  T (uint64_t, and, &=)                                \
> +                                               \
> +  T (int8_t, ior, |=)                          \
> +  T (int16_t, ior, |=)                         \
> +  T (int32_t, ior, |=)                         \
> +  T (int64_t, ior, |=)                         \
> +  T (uint8_t, ior, |=)                         \
> +  T (uint16_t, ior, |=)                                \
> +  T (uint32_t, ior, |=)                                \
> +  T (uint64_t, ior, |=)                                \
> +                                               \
> +  T (int8_t, xor, ^=)                          \
> +  T (int16_t, xor, ^=)                         \
> +  T (int32_t, xor, ^=)                         \
> +  T (int64_t, xor, ^=)                         \
> +  T (uint8_t, xor, ^=)                         \
> +  T (uint16_t, xor, ^=)                                \
> +  T (uint32_t, xor, ^=)                                \
> +  T (uint64_t, xor, ^=)
> +
> +TEST_BITWISE (DEF_REDUC_BITWISE)
> +
> +/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
> +/* { dg-final { scan-assembler-times {vredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
> +/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
> +/* { dg-final { scan-assembler-times {vredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
> +/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
> +/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
> +/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
> +/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
> +/* { dg-final { scan-assembler-times {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
> +/* { dg-final { scan-assembler-times {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
> +/* { dg-final { scan-assembler-times {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
> new file mode 100644
> index 00000000000..136a8a378bf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
> @@ -0,0 +1,129 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
> +
> +#include <stdint-gcc.h>
> +
> +#define NUM_ELEMS(TYPE) (1024 / sizeof (TYPE))
> +
> +#define DEF_REDUC_PLUS(TYPE)                                   \
> +void __attribute__ ((noinline, noclone))                       \
> +reduc_plus_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],       \
> +                  TYPE *restrict r, int n)                     \
> +{                                                              \
> +  for (int i = 0; i < n; i++)                                  \
> +    {                                                          \
> +      r[i] = 0;                                                        \
> +      for (int j = 0; j < NUM_ELEMS (TYPE); j++)               \
> +        r[i] += a[i][j];                                       \
> +    }                                                          \
> +}
> +
> +#define TEST_PLUS(T)                           \
> +  T (int8_t)                                   \
> +  T (int16_t)                                  \
> +  T (int32_t)                                  \
> +  T (int64_t)                                  \
> +  T (uint8_t)                                  \
> +  T (uint16_t)                                 \
> +  T (uint32_t)                                 \
> +  T (uint64_t)                                 \
> +  T (_Float16)                                 \
> +  T (float)                                    \
> +  T (double)
> +
> +TEST_PLUS (DEF_REDUC_PLUS)
> +
> +#define DEF_REDUC_MAXMIN(TYPE, NAME, CMP_OP)                   \
> +void __attribute__ ((noinline, noclone))                       \
> +reduc_##NAME##_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],   \
> +                      TYPE *restrict r, int n)                 \
> +{                                                              \
> +  for (int i = 0; i < n; i++)                                  \
> +    {                                                          \
> +      r[i] = a[i][0];                                          \
> +      for (int j = 0; j < NUM_ELEMS (TYPE); j++)               \
> +        r[i] = a[i][j] CMP_OP r[i] ? a[i][j] : r[i];           \
> +    }                                                          \
> +}
> +
> +#define TEST_MAXMIN(T)                         \
> +  T (int8_t, max, >)                           \
> +  T (int16_t, max, >)                          \
> +  T (int32_t, max, >)                          \
> +  T (int64_t, max, >)                          \
> +  T (uint8_t, max, >)                          \
> +  T (uint16_t, max, >)                         \
> +  T (uint32_t, max, >)                         \
> +  T (uint64_t, max, >)                         \
> +  T (_Float16, max, >)                         \
> +  T (float, max, >)                            \
> +  T (double, max, >)                           \
> +                                               \
> +  T (int8_t, min, <)                           \
> +  T (int16_t, min, <)                          \
> +  T (int32_t, min, <)                          \
> +  T (int64_t, min, <)                          \
> +  T (uint8_t, min, <)                          \
> +  T (uint16_t, min, <)                         \
> +  T (uint32_t, min, <)                         \
> +  T (uint64_t, min, <)                         \
> +  T (_Float16, min, <)                         \
> +  T (float, min, <)                            \
> +  T (double, min, <)
> +
> +TEST_MAXMIN (DEF_REDUC_MAXMIN)
> +
> +#define DEF_REDUC_BITWISE(TYPE,NAME,BIT_OP)                    \
> +void __attribute__ ((noinline, noclone))                       \
> +reduc_##NAME##TYPE (TYPE (*restrict a)[NUM_ELEMS(TYPE)],       \
> +                   TYPE *restrict r, int n)                    \
> +{                                                              \
> +  for (int i = 0; i < n; i++)                                  \
> +    {                                                          \
> +      r[i] = a[i][0];                                          \
> +      for (int j = 0; j < NUM_ELEMS(TYPE); j++)                        \
> +        r[i] BIT_OP a[i][j];                                   \
> +    }                                                          \
> +}
> +
> +#define TEST_BITWISE(T)                                \
> +  T (int8_t, and, &=)                          \
> +  T (int16_t, and, &=)                         \
> +  T (int32_t, and, &=)                         \
> +  T (int64_t, and, &=)                         \
> +  T (uint8_t, and, &=)                         \
> +  T (uint16_t, and, &=)                                \
> +  T (uint32_t, and, &=)                                \
> +  T (uint64_t, and, &=)                                \
> +                                               \
> +  T (int8_t, ior, |=)                          \
> +  T (int16_t, ior, |=)                         \
> +  T (int32_t, ior, |=)                         \
> +  T (int64_t, ior, |=)                         \
> +  T (uint8_t, ior, |=)                         \
> +  T (uint16_t, ior, |=)                                \
> +  T (uint32_t, ior, |=)                                \
> +  T (uint64_t, ior, |=)                                \
> +                                               \
> +  T (int8_t, xor, ^=)                          \
> +  T (int16_t, xor, ^=)                         \
> +  T (int32_t, xor, ^=)                         \
> +  T (int64_t, xor, ^=)                         \
> +  T (uint8_t, xor, ^=)                         \
> +  T (uint16_t, xor, ^=)                                \
> +  T (uint32_t, xor, ^=)                                \
> +  T (uint64_t, xor, ^=)
> +
> +TEST_BITWISE (DEF_REDUC_BITWISE)
> +
> +/* { dg-final { scan-assembler-times
> {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
> +/* { dg-final { scan-assembler-times
> {vredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
> +/* { dg-final { scan-assembler-times
> {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
> +/* { dg-final { scan-assembler-times
> {vredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
> +/* { dg-final { scan-assembler-times
> {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
> +/* { dg-final { scan-assembler-times
> {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
> +/* { dg-final { scan-assembler-times
> {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
> +/* { dg-final { scan-assembler-times
> {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
> +/* { dg-final { scan-assembler-times
> {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
> +/* { dg-final { scan-assembler-times
> {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
> +/* { dg-final { scan-assembler-times
> {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
> new file mode 100644
> index 00000000000..c3638344f80
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
> @@ -0,0 +1,65 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d
> --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model"
> } */
> +
> +#include <stdint-gcc.h>
> +
> +unsigned short __attribute__((noipa))
> +add_loop (unsigned short *x, int n)
> +{
> +  unsigned short res = 0;
> +  for (int i = 0; i < n; ++i)
> +    res += x[i];
> +  return res;
> +}
> +
> +unsigned short __attribute__((noipa))
> +min_loop (unsigned short *x, int n)
> +{
> +  unsigned short res = ~0;
> +  for (int i = 0; i < n; ++i)
> +    res = res < x[i] ? res : x[i];
> +  return res;
> +}
> +
> +unsigned short __attribute__((noipa))
> +max_loop (unsigned short *x, int n)
> +{
> +  unsigned short res = 0;
> +  for (int i = 0; i < n; ++i)
> +    res = res > x[i] ? res : x[i];
> +  return res;
> +}
> +
> +unsigned short __attribute__((noipa))
> +and_loop (unsigned short *x, int n)
> +{
> +  unsigned short res = ~0;
> +  for (int i = 0; i < n; ++i)
> +    res &= x[i];
> +  return res;
> +}
> +
> +unsigned short __attribute__((noipa))
> +or_loop (unsigned short *x, int n)
> +{
> +  unsigned short res = 0;
> +  for (int i = 0; i < n; ++i)
> +    res |= x[i];
> +  return res;
> +}
> +
> +unsigned short __attribute__((noipa))
> +eor_loop (unsigned short *x, int n)
> +{
> +  unsigned short res = 0;
> +  for (int i = 0; i < n; ++i)
> +    res ^= x[i];
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
> new file mode 100644
> index 00000000000..f00a12826c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d
> --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model"
> } */
> +
> +#include <stdint-gcc.h>
> +
> +unsigned short __attribute__((noipa))
> +add_loop (unsigned short *x, int n, unsigned short res)
> +{
> +  for (int i = 0; i < n; ++i)
> +    res += x[i];
> +  return res;
> +}
> +
> +unsigned short __attribute__((noipa))
> +min_loop (unsigned short *x, int n, unsigned short res)
> +{
> +  for (int i = 0; i < n; ++i)
> +    res = res < x[i] ? res : x[i];
> +  return res;
> +}
> +
> +unsigned short __attribute__((noipa))
> +max_loop (unsigned short *x, int n, unsigned short res)
> +{
> +  for (int i = 0; i < n; ++i)
> +    res = res > x[i] ? res : x[i];
> +  return res;
> +}
> +
> +unsigned short __attribute__((noipa))
> +and_loop (unsigned short *x, int n, unsigned short res)
> +{
> +  for (int i = 0; i < n; ++i)
> +    res &= x[i];
> +  return res;
> +}
> +
> +unsigned short __attribute__((noipa))
> +or_loop (unsigned short *x, int n, unsigned short res)
> +{
> +  for (int i = 0; i < n; ++i)
> +    res |= x[i];
> +  return res;
> +}
> +
> +unsigned short __attribute__((noipa))
> +eor_loop (unsigned short *x, int n, unsigned short res)
> +{
> +  for (int i = 0; i < n; ++i)
> +    res ^= x[i];
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times
> {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times
> {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
> new file mode 100644
> index 00000000000..b500f857598
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
> @@ -0,0 +1,56 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param=riscv-autovec-preference=scalable
> -ffast-math -fno-vect-cost-model" } */
> +
> +#include "reduc-1.c"
> +
> +#define NUM_ELEMS(TYPE) (73 + sizeof (TYPE))
> +
> +#define INIT_VECTOR(TYPE)                              \
> +  TYPE a[NUM_ELEMS (TYPE) + 1];                                \
> +  for (int i = 0; i < NUM_ELEMS (TYPE) + 1; i++)       \
> +    {                                                  \
> +      a[i] = ((i * 2) * (i & 1 ? 1 : -1) | 3);         \
> +      asm volatile ("" ::: "memory");                  \
> +    }
> +
> +#define TEST_REDUC_PLUS(TYPE)                          \
> +  {                                                    \
> +    INIT_VECTOR (TYPE);                                        \
> +    TYPE r1 = reduc_plus_##TYPE (a, NUM_ELEMS (TYPE)); \
> +    volatile TYPE r2 = 0;                              \
> +    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)         \
> +      r2 += a[i];                                      \
> +    if (r1 != r2)                                      \
> +      __builtin_abort ();                              \
> +  }
> +
> +#define TEST_REDUC_MAXMIN(TYPE, NAME, CMP_OP)                  \
> +  {                                                            \
> +    INIT_VECTOR (TYPE);                                                \
> +    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));     \
> +    volatile TYPE r2 = 13;                                     \
> +    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)                 \
> +      r2 = a[i] CMP_OP r2 ? a[i] : r2;                         \
> +    if (r1 != r2)                                              \
> +      __builtin_abort ();                                      \
> +  }
> +
> +#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)                 \
> +  {                                                            \
> +    INIT_VECTOR (TYPE);                                                \
> +    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));     \
> +    volatile TYPE r2 = 13;                                     \
> +    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)                 \
> +      r2 BIT_OP a[i];                                          \
> +    if (r1 != r2)                                              \
> +      __builtin_abort ();                                      \
> +  }
> +
> +int main ()
> +{
> +  TEST_PLUS (TEST_REDUC_PLUS)
> +  TEST_MAXMIN (TEST_REDUC_MAXMIN)
> +  TEST_BITWISE (TEST_REDUC_BITWISE)
> +
> +  return 0;
> +}
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
> new file mode 100644
> index 00000000000..3c2f62557b1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
> @@ -0,0 +1,79 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param=riscv-autovec-preference=scalable" }
> */
> +
> +#include "reduc-2.c"
> +
> +#define NROWS 53
> +
> +/* -ffast-math fuzz for PLUS.  */
> +#define CMP__Float16(X, Y) ((X) >= (Y) * 0.875 && (X) <= (Y) * 1.125)
> +#define CMP_float(X, Y) ((X) == (Y))
> +#define CMP_double(X, Y) ((X) == (Y))
> +#define CMP_int8_t(X, Y) ((X) == (Y))
> +#define CMP_int16_t(X, Y) ((X) == (Y))
> +#define CMP_int32_t(X, Y) ((X) == (Y))
> +#define CMP_int64_t(X, Y) ((X) == (Y))
> +#define CMP_uint8_t(X, Y) ((X) == (Y))
> +#define CMP_uint16_t(X, Y) ((X) == (Y))
> +#define CMP_uint32_t(X, Y) ((X) == (Y))
> +#define CMP_uint64_t(X, Y) ((X) == (Y))
> +
> +#define INIT_MATRIX(TYPE)                              \
> +  TYPE mat[NROWS][NUM_ELEMS (TYPE)];                   \
> +  TYPE r[NROWS];                                       \
> +  for (int i = 0; i < NROWS; i++)                      \
> +    for (int j = 0; j < NUM_ELEMS (TYPE); j++)         \
> +      {                                                        \
> +       mat[i][j] = i + (j * 2) * (j & 1 ? 1 : -1);     \
> +       asm volatile ("" ::: "memory");                 \
> +      }
> +
> +#define TEST_REDUC_PLUS(TYPE)                          \
> +  {                                                    \
> +    INIT_MATRIX (TYPE);                                        \
> +    reduc_plus_##TYPE (mat, r, NROWS);                 \
> +    for (int i = 0; i < NROWS; i++)                    \
> +      {                                                        \
> +       volatile TYPE r2 = 0;                           \
> +       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
> +         r2 += mat[i][j];                              \
> +       if (!CMP_##TYPE (r[i], r2))                     \
> +         __builtin_abort ();                           \
> +      }                                                        \
> +    }
> +
> +#define TEST_REDUC_MAXMIN(TYPE, NAME, CMP_OP)          \
> +  {                                                    \
> +    INIT_MATRIX (TYPE);                                        \
> +    reduc_##NAME##_##TYPE (mat, r, NROWS);             \
> +    for (int i = 0; i < NROWS; i++)                    \
> +      {                                                        \
> +       volatile TYPE r2 = mat[i][0];                   \
> +       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
> +         r2 = mat[i][j] CMP_OP r2 ? mat[i][j] : r2;    \
> +       if (r[i] != r2)                                 \
> +         __builtin_abort ();                           \
> +      }                                                        \
> +    }
> +
> +#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)         \
> +  {                                                    \
> +    INIT_MATRIX (TYPE);                                        \
> +    reduc_##NAME##_##TYPE (mat, r, NROWS);             \
> +    for (int i = 0; i < NROWS; i++)                    \
> +      {                                                        \
> +       volatile TYPE r2 = mat[i][0];                   \
> +       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
> +         r2 BIT_OP mat[i][j];                          \
> +       if (r[i] != r2)                                 \
> +         __builtin_abort ();                           \
> +      }                                                        \
> +    }
> +
> +int main ()
> +{
> +  TEST_PLUS (TEST_REDUC_PLUS)
> +  TEST_MAXMIN (TEST_REDUC_MAXMIN)
> +
> +  return 0;
> +}
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
> new file mode 100644
> index 00000000000..d1b22c0d69a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
> @@ -0,0 +1,49 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param=riscv-autovec-preference=scalable
> -ffast-math -fno-vect-cost-model" } */
> +
> +#include "reduc-3.c"
> +
> +#define N 0x1100
> +
> +int
> +main (void)
> +{
> +  unsigned short x[N];
> +  for (int i = 0; i < N; ++i)
> +    x[i] = (i + 1) * (i + 2);
> +
> +  if (add_loop (x, 0) != 0
> +      || add_loop (x, 11) != 572
> +      || add_loop (x, 0x100) != 22016
> +      || add_loop (x, 0xfff) != 20480
> +      || max_loop (x, 0) != 0
> +      || max_loop (x, 11) != 132
> +      || max_loop (x, 0x100) != 65280
> +      || max_loop (x, 0xfff) != 65504
> +      || or_loop (x, 0) != 0
> +      || or_loop (x, 11) != 0xfe
> +      || or_loop (x, 0x80) != 0x7ffe
> +      || or_loop (x, 0xb4) != 0x7ffe
> +      || or_loop (x, 0xb5) != 0xfffe
> +      || eor_loop (x, 0) != 0
> +      || eor_loop (x, 11) != 0xe8
> +      || eor_loop (x, 0x100) != 0xcf00
> +      || eor_loop (x, 0xfff) != 0xa000)
> +    __builtin_abort ();
> +
> +  for (int i = 0; i < N; ++i)
> +    x[i] = ~x[i];
> +
> +  if (min_loop (x, 0) != 65535
> +      || min_loop (x, 11) != 65403
> +      || min_loop (x, 0x100) != 255
> +      || min_loop (x, 0xfff) != 31
> +      || and_loop (x, 0) != 0xffff
> +      || and_loop (x, 11) != 0xff01
> +      || and_loop (x, 0x80) != 0x8001
> +      || and_loop (x, 0xb4) != 0x8001
> +      || and_loop (x, 0xb5) != 1)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
> new file mode 100644
> index 00000000000..c17e125a763
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
> @@ -0,0 +1,66 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param=riscv-autovec-preference=scalable
> -ffast-math -fno-vect-cost-model" } */
> +
> +#include "reduc-4.c"
> +
> +#define N 0x1100
> +
> +int
> +main (void)
> +{
> +  unsigned short x[N];
> +  for (int i = 0; i < N; ++i)
> +    x[i] = (i + 1) * (i + 2);
> +
> +  if (add_loop (x, 0, 10) != 10
> +      || add_loop (x, 11, 42) != 614
> +      || add_loop (x, 0x100, 84) != 22100
> +      || add_loop (x, 0xfff, 20) != 20500
> +      || max_loop (x, 0, 10) != 10
> +      || max_loop (x, 11, 131) != 132
> +      || max_loop (x, 11, 133) != 133
> +      || max_loop (x, 0x100, 65279) != 65280
> +      || max_loop (x, 0x100, 65281) != 65281
> +      || max_loop (x, 0xfff, 65503) != 65504
> +      || max_loop (x, 0xfff, 65505) != 65505
> +      || or_loop (x, 0, 0x71) != 0x71
> +      || or_loop (x, 11, 0) != 0xfe
> +      || or_loop (x, 11, 0xb3c) != 0xbfe
> +      || or_loop (x, 0x80, 0) != 0x7ffe
> +      || or_loop (x, 0x80, 1) != 0x7fff
> +      || or_loop (x, 0xb4, 0) != 0x7ffe
> +      || or_loop (x, 0xb4, 1) != 0x7fff
> +      || or_loop (x, 0xb5, 0) != 0xfffe
> +      || or_loop (x, 0xb5, 1) != 0xffff
> +      || eor_loop (x, 0, 0x3e) != 0x3e
> +      || eor_loop (x, 11, 0) != 0xe8
> +      || eor_loop (x, 11, 0x1ff) != 0x117
> +      || eor_loop (x, 0x100, 0) != 0xcf00
> +      || eor_loop (x, 0x100, 0xeee) != 0xc1ee
> +      || eor_loop (x, 0xfff, 0) != 0xa000
> +      || eor_loop (x, 0xfff, 0x8888) != 0x2888)
> +    __builtin_abort ();
> +
> +  for (int i = 0; i < N; ++i)
> +    x[i] = ~x[i];
> +
> +  if (min_loop (x, 0, 10000) != 10000
> +      || min_loop (x, 11, 65404) != 65403
> +      || min_loop (x, 11, 65402) != 65402
> +      || min_loop (x, 0x100, 256) != 255
> +      || min_loop (x, 0x100, 254) != 254
> +      || min_loop (x, 0xfff, 32) != 31
> +      || min_loop (x, 0xfff, 30) != 30
> +      || and_loop (x, 0, 0x1234) != 0x1234
> +      || and_loop (x, 11, 0xffff) != 0xff01
> +      || and_loop (x, 11, 0xcdef) != 0xcd01
> +      || and_loop (x, 0x80, 0xffff) != 0x8001
> +      || and_loop (x, 0x80, 0xfffe) != 0x8000
> +      || and_loop (x, 0xb4, 0xffff) != 0x8001
> +      || and_loop (x, 0xb4, 0xfffe) != 0x8000
> +      || and_loop (x, 0xb5, 0xffff) != 1
> +      || and_loop (x, 0xb5, 0xfffe) != 0)
> +    __builtin_abort ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> index 19589fa9638..532c17c4065 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> @@ -71,6 +71,8 @@ foreach op $AUTOVEC_TEST_OPTS {
>      "" "$op"
>    dg-runtest [lsort [glob -nocomplain
> $srcdir/$subdir/autovec/ternop/*.\[cS\]]] \
>      "" "$op"
> +  dg-runtest [lsort [glob -nocomplain
> $srcdir/$subdir/autovec/reduc/*.\[cS\]]] \
> +    "" "$op"
>  }
>
>  # widening operation only test on LMUL < 8
> --
> 2.36.3
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: [PATCH] RISC-V: Support non-SLP unordered reduction
  2023-07-14 12:38 ` Kito Cheng
@ 2023-07-14 12:47   ` 钟居哲
  2023-07-14 12:51   ` 钟居哲
  1 sibling, 0 replies; 7+ messages in thread
From: 钟居哲 @ 2023-07-14 12:47 UTC (permalink / raw)
  To: kito.cheng; +Cc: gcc-patches, kito.cheng, palmer, rdapp.gcc, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 47670 bytes --]

>> Is it a performance bug or a correctness bug? If it's a correctness bug, does it also appear in GCC 13?

It's a correctness bug.

The bug is as follows:

vsetvli zero, 1, e16, m1, ta, ma  ----> the VSETVL pass detects that it can be fused as "t1,zero,e16,m2,ta,ma", but the update fails in change_insn
vmv.s.x v1,a5
...
vsetvli t1,zero,e16,m2,ta,ma  -----> elided 
vlse16.v v2...

So finally, we end up with:

vsetvli zero, 1, e16, m1, ta, ma 
vmv.s.x v1,a5
...
vlse16.v v2...

which is incorrect: vlse16.v now executes under the stale "zero, 1, e16, m1" configuration instead of "t1,zero,e16,m2".
I tried to reproduce this with intrinsics but failed.
It seems that it can only be reproduced by reduction auto-vectorization.



juzhe.zhong@rivai.ai
 
From: Kito Cheng
Date: 2023-07-14 20:38
To: 钟居哲
CC: GCC Patches; Kito Cheng; Palmer Dabbelt; Robin Dapp; Jeff Law
Subject: Re: [PATCH] RISC-V: Support non-SLP unordered reduction


On Fri, Jul 14, 2023 at 20:31, <juzhe.zhong@rivai.ai> wrote:
From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

This patch adds reduc_*_scal patterns to support reduction auto-vectorization.

Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.

Consider the following case:
int __attribute__((noipa))
and_loop (int32_t * __restrict x, 
int32_t n, int res)
{
  for (int i = 0; i < n; ++i)
    res &= x[i];
  return res;
}

ASM:
and_loop:
        ble     a1,zero,.L4
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.v.i v1,-1
.L3:
        vsetvli a5,a1,e32,m1,tu,ma       ------------> MUST BE "TU".
        slli    a4,a5,2
        sub     a1,a1,a5
        vle32.v v2,0(a0)
        add     a0,a0,a4
        vand.vv v1,v2,v1
        bne     a1,zero,.L3
        vsetivli        zero,1,e32,m1,ta,ma
        vmv.v.i v2,-1
        vsetvli a3,zero,e32,m1,ta,ma
        vredand.vs      v1,v1,v2
        vmv.x.s a5,v1
        and     a0,a2,a5
        ret
.L4:
        mv      a0,a2
        ret
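
The MUST BE "TU" note above can be illustrated with a scalar C model of the loop. This is only a sketch under stated assumptions, not GCC code: VLMAX = 4 and the name and_reduce_model are hypothetical, and the non-TU branch picks one behaviour that the tail-agnostic policy permits (overwriting tail lanes with all ones).

```c
#include <stddef.h>
#include <stdint.h>

#define VLMAX 4  /* hypothetical number of 32-bit lanes in one vector register */

/* Model of the vectorized and_loop.  If tail_undisturbed is zero, lanes at
   index >= vl are overwritten with all ones, which is one of the outcomes
   allowed under the tail-agnostic ("ta") policy.  */
static int32_t
and_reduce_model (const int32_t *x, size_t n, int32_t res,
                  int tail_undisturbed)
{
  int32_t acc[VLMAX];
  for (size_t lane = 0; lane < VLMAX; lane++)
    acc[lane] = -1;                       /* vmv.v.i v1,-1 */

  while (n > 0)
    {
      size_t vl = n < VLMAX ? n : VLMAX;  /* vsetvli a5,a1,e32,m1,...  */
      for (size_t lane = 0; lane < VLMAX; lane++)
        {
          if (lane < vl)
            acc[lane] &= x[lane];         /* vle32.v + vand.vv */
          else if (!tail_undisturbed)
            acc[lane] = -1;               /* tail lane clobbered under "ta" */
        }
      x += vl;
      n -= vl;
    }

  for (size_t lane = 0; lane < VLMAX; lane++)
    res &= acc[lane];                     /* vredand.vs + scalar and */
  return res;
}
```

With x = {0xff, 0xff, 0xff, 0x0f, 0xff, 0xff}, n = 6 and res = -1, the undisturbed variant returns 0x0f. The clobbering variant returns 0xff, because the lanes holding the partial results for x[2] and x[3] are lost in the final iteration, where vl is 2.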

Fix a bug in the VSETVL pass that is exposed by the reduction testcases.


Is it a performance bug or a correctness bug? If it's a correctness bug, does it also appear in GCC 13?


SLP reduction and floating-point in-order reduction are not supported yet.

gcc/ChangeLog:

        * config/riscv/autovec.md (reduc_plus_scal_<mode>): New pattern.
        (reduc_smax_scal_<mode>): Ditto.
        (reduc_umax_scal_<mode>): Ditto.
        (reduc_smin_scal_<mode>): Ditto.
        (reduc_umin_scal_<mode>): Ditto.
        (reduc_and_scal_<mode>): Ditto.
        (reduc_ior_scal_<mode>): Ditto.
        (reduc_xor_scal_<mode>): Ditto.
        * config/riscv/riscv-protos.h (enum insn_type): New enum.
        (emit_nonvlmax_integer_move_insn): Add reduction.
        (expand_reduction): New function.
        * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
        (emit_vlmax_fp_reduction_insn): Ditto.
        (get_m1_mode): Ditto.
        (expand_cond_len_binop): Fix name.
        (expand_reduction): New function.
        * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix bug.
        (change_insn): Ditto.
        (change_vsetvl_insn): Ditto.
        (pass_vsetvl::backward_demand_fusion): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.

---
 gcc/config/riscv/autovec.md                   | 138 ++++++++++++++++++
 gcc/config/riscv/riscv-protos.h               |   3 +
 gcc/config/riscv/riscv-v.cc                   |  84 ++++++++++-
 gcc/config/riscv/riscv-vsetvl.cc              |  28 +++-
 .../riscv/rvv/autovec/reduc/reduc-1.c         | 118 +++++++++++++++
 .../riscv/rvv/autovec/reduc/reduc-2.c         | 129 ++++++++++++++++
 .../riscv/rvv/autovec/reduc/reduc-3.c         |  65 +++++++++
 .../riscv/rvv/autovec/reduc/reduc-4.c         |  59 ++++++++
 .../riscv/rvv/autovec/reduc/reduc_run-1.c     |  56 +++++++
 .../riscv/rvv/autovec/reduc/reduc_run-2.c     |  79 ++++++++++
 .../riscv/rvv/autovec/reduc/reduc_run-3.c     |  49 +++++++
 .../riscv/rvv/autovec/reduc/reduc_run-4.c     |  66 +++++++++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
 13 files changed, 868 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0476b1dea45..a74f66f41ac 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1531,3 +1531,141 @@
   riscv_vector::expand_cond_len_binop (<CODE>, operands);
   DONE;
 })
+
+;; =========================================================================
+;; == Reductions
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Tree reductions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vredsum.vs
+;; - vredmaxu.vs
+;; - vredmax.vs
+;; - vredminu.vs
+;; - vredmin.vs
+;; - vredand.vs
+;; - vredor.vs
+;; - vredxor.vs
+;; -------------------------------------------------------------------------
+
+(define_expand "reduc_plus_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx min = immed_wide_int_const (wi::min_value (prec, SIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (SMAX, operands, min);
+  DONE;
+})
+
+(define_expand "reduc_umax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (UMAX, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx max = immed_wide_int_const (wi::max_value (prec, SIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (SMIN, operands, max);
+  DONE;
+})
+
+(define_expand "reduc_umin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx max = immed_wide_int_const (wi::max_value (prec, UNSIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (UMIN, operands, max);
+  DONE;
+})
+
+(define_expand "reduc_and_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (AND, operands, CONSTM1_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_ior_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (IOR, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_xor_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (XOR, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] Tree reductions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfredusum.vs
+;; - vfredmax.vs
+;; - vfredmin.vs
+;; -------------------------------------------------------------------------
+
+(define_expand "reduc_plus_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  REAL_VALUE_TYPE rv;
+  real_inf (&rv, true);
+  rtx f = const_double_from_real_value (rv, <VEL>mode);
+  riscv_vector::expand_reduction (SMAX, operands, f);
+  DONE;
+})
+
+(define_expand "reduc_smin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  REAL_VALUE_TYPE rv;
+  real_inf (&rv, false);
+  rtx f = const_double_from_real_value (rv, <VEL>mode);
+  riscv_vector::expand_reduction (SMIN, operands, f);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 1a622c58f4b..f19b7fc2b8d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -197,6 +197,7 @@ enum insn_type
   RVV_COMPRESS_OP = 4,
   RVV_GATHER_M_OP = 5,
   RVV_SCATTER_M_OP = 4,
+  RVV_REDUCTION_OP = 3,
 };
 enum vlmul_type
 {
@@ -247,6 +248,7 @@ void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
 void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
 void emit_scalar_move_insn (unsigned, rtx *);
 void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
+//void emit_vlmax_reduction_insn (unsigned, rtx *);
 enum vlmul_type get_vlmul (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
@@ -280,6 +282,7 @@ bool has_vi_variant_p (rtx_code, rtx);
 void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
 bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
 void expand_cond_len_binop (rtx_code, rtx *);
+void expand_reduction (rtx_code, rtx *, rtx);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
                          bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 90da63889bd..ccf0f6ff852 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1137,6 +1137,43 @@ emit_vlmax_compress_insn (unsigned icode, rtx *ops)
   e.emit_insn ((enum insn_code) icode, ops);
 }

+/* Emit reduction instruction.  */
+static void
+emit_vlmax_reduction_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
+                                         /* HAS_DEST_P */ true,
+                                         /* FULLY_UNMASKED_P */ true,
+                                         /* USE_REAL_MERGE_P */ false,
+                                         /* HAS_AVL_P */ true,
+                                         /* VLMAX_P */ true, dest_mode,
+                                         mask_mode);
+
+  e.set_policy (TAIL_ANY);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
+/* Emit floating-point reduction instruction with dynamic rounding mode.  */
+static void
+emit_vlmax_fp_reduction_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
+                                         /* HAS_DEST_P */ true,
+                                         /* FULLY_UNMASKED_P */ true,
+                                         /* USE_REAL_MERGE_P */ false,
+                                         /* HAS_AVL_P */ true,
+                                         /* VLMAX_P */ true, dest_mode,
+                                         mask_mode);
+
+  e.set_policy (TAIL_ANY);
+  e.set_rounding_mode (FRM_DYN);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
 /* Emit merge instruction.  */

 static machine_mode
@@ -1629,6 +1666,17 @@ get_mask_mode (machine_mode mode)
   return get_vector_mode (BImode, GET_MODE_NUNITS (mode));
 }

+/* Return the LMUL = 1 (M1) vector mode whose element mode matches MODE.  */
+
+static opt_machine_mode
+get_m1_mode (machine_mode mode)
+{
+  scalar_mode smode = GET_MODE_INNER (mode);
+  unsigned int bytes = GET_MODE_SIZE (smode);
+  poly_uint64 m1_nunits = exact_div (BYTES_PER_RISCV_VECTOR, bytes);
+  return get_vector_mode (smode, m1_nunits);
+}
+
 /* Return the RVV vector mode that has NUNITS elements of mode INNER_MODE.
    This function is not only used by builtins, but also will be used by
    auto-vectorization in the future.  */
@@ -3099,9 +3147,9 @@ expand_cond_len_binop (rtx_code code, rtx *ops)
       rtx ops[] = {dest, mask, merge, src1, src2};
       insn_code icode = code_for_pred (code, mode);
       if (needs_fp_rounding (code, mode))
-       emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_MU, ops, len);
+       emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_TU, ops, len);
       else
-       emit_nonvlmax_tu_insn (icode, RVV_BINOP_MU, ops, len);
+       emit_nonvlmax_tu_insn (icode, RVV_BINOP_TU, ops, len);
     }
   else
     /* FIXME: Enable this case when we support it in the middle-end.  */
@@ -3267,4 +3315,36 @@ expand_gather_scatter (rtx *ops, bool is_load)
     }
 }

+/* Expand reduction CODE of OPS[1] into scalar OPS[0], seeded with INIT.  */
+void
+expand_reduction (rtx_code code, rtx *ops, rtx init)
+{
+  machine_mode vmode = GET_MODE (ops[1]);
+  machine_mode m1_mode = get_m1_mode (vmode).require ();
+  machine_mode m1_mmode = get_mask_mode (m1_mode).require ();
+
+  rtx m1_tmp = gen_reg_rtx (m1_mode);
+  rtx m1_mask = gen_scalar_move_mask (m1_mmode);
+  rtx m1_undef = RVV_VUNDEF (m1_mode);
+  rtx scalar_move_ops[] = {m1_tmp, m1_mask, m1_undef, init};
+  emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops);
+
+  rtx m1_tmp2 = gen_reg_rtx (m1_mode);
+  rtx reduc_ops[] = {m1_tmp2, ops[1], m1_tmp};
+
+  if (FLOAT_MODE_P (vmode) && code == PLUS)
+    {
+      insn_code icode
+       = code_for_pred_reduc_plus (UNSPEC_UNORDERED, vmode, m1_mode);
+      emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
+    }
+  else
+    {
+      insn_code icode = code_for_pred_reduc (code, vmode, m1_mode);
+      emit_vlmax_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
+    }
+
+  emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 586dc8e5379..97a9dad8a77 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -646,7 +646,8 @@ gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
 }

 static rtx
-gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
+gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
+               rtx vl = NULL_RTX)
 {
   rtx new_pat;
   vl_vtype_info new_info = info;
@@ -657,7 +658,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
   if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
     {
       rtx dest = get_vl (rinsn);
-      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, dest);
+      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl ? vl : dest);
     }
   else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
     new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
@@ -818,7 +819,8 @@ change_insn (rtx_insn *rinsn, rtx new_pat)
       print_rtl_single (dump_file, PATTERN (rinsn));
     }

-  validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  bool change_p = validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  gcc_assert (change_p);

   if (dump_file)
     {
@@ -931,7 +933,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
 }

 static void
-change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
+change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info,
+                   rtx vl = NULL_RTX)
 {
   rtx_insn *rinsn;
   if (vector_config_insn_p (insn->rtl ()))
@@ -945,7 +948,7 @@ change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
       rinsn = PREV_INSN (insn->rtl ());
       gcc_assert (vector_config_insn_p (rinsn));
     }
-  rtx new_pat = gen_vsetvl_pat (rinsn, info);
+  rtx new_pat = gen_vsetvl_pat (rinsn, info, vl);
   change_insn (rinsn, new_pat);
 }

@@ -3377,7 +3380,20 @@ pass_vsetvl::backward_demand_fusion (void)
                                       new_info))
                continue;

-             change_vsetvl_insn (new_info.get_insn (), new_info);
+             rtx vl = NULL_RTX;
+             /* Backward fusion of a VLMAX vsetvli:
+                  bb 3:
+                    vsetivli zero, 1 ... -> vsetvli t1, zero
+                    vmv.s.x
+                  bb 5:
+                    vsetvli t1, zero ... -> to be elided.
+                    vlse16.v
+                  The rewritten vsetvli in bb 3 must set the VL register
+                  "t1" of the elided vsetvli, so forward it.  */
+             if (!block_info.reaching_out.has_avl_reg ()
+                 && vlmax_avl_p (new_info.get_avl ()))
+               vl = get_vl (prop.get_insn ()->rtl ());
+             change_vsetvl_insn (new_info.get_insn (), new_info, vl);
              if (block_info.local_dem == block_info.reaching_out)
                block_info.local_dem = new_info;
              block_info.reaching_out = new_info;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
new file mode 100644
index 00000000000..0d543af13ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
@@ -0,0 +1,118 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+#define DEF_REDUC_PLUS(TYPE)                   \
+TYPE __attribute__ ((noinline, noclone))       \
+reduc_plus_##TYPE (TYPE *a, int n)             \
+{                                              \
+  TYPE r = 0;                                  \
+  for (int i = 0; i < n; ++i)                  \
+    r += a[i];                                 \
+  return r;                                    \
+}
+
+#define TEST_PLUS(T)                           \
+  T (int8_t)                                   \
+  T (int16_t)                                  \
+  T (int32_t)                                  \
+  T (int64_t)                                  \
+  T (uint8_t)                                  \
+  T (uint16_t)                                 \
+  T (uint32_t)                                 \
+  T (uint64_t)                                 \
+  T (_Float16)                                 \
+  T (float)                                    \
+  T (double)
+
+TEST_PLUS (DEF_REDUC_PLUS)
+
+#define DEF_REDUC_MAXMIN(TYPE, NAME, CMP_OP)   \
+TYPE __attribute__ ((noinline, noclone))       \
+reduc_##NAME##_##TYPE (TYPE *a, int n)         \
+{                                              \
+  TYPE r = 13;                                 \
+  for (int i = 0; i < n; ++i)                  \
+    r = a[i] CMP_OP r ? a[i] : r;              \
+  return r;                                    \
+}
+
+#define TEST_MAXMIN(T)                         \
+  T (int8_t, max, >)                           \
+  T (int16_t, max, >)                          \
+  T (int32_t, max, >)                          \
+  T (int64_t, max, >)                          \
+  T (uint8_t, max, >)                          \
+  T (uint16_t, max, >)                         \
+  T (uint32_t, max, >)                         \
+  T (uint64_t, max, >)                         \
+  T (_Float16, max, >)                         \
+  T (float, max, >)                            \
+  T (double, max, >)                           \
+                                               \
+  T (int8_t, min, <)                           \
+  T (int16_t, min, <)                          \
+  T (int32_t, min, <)                          \
+  T (int64_t, min, <)                          \
+  T (uint8_t, min, <)                          \
+  T (uint16_t, min, <)                         \
+  T (uint32_t, min, <)                         \
+  T (uint64_t, min, <)                         \
+  T (_Float16, min, <)                         \
+  T (float, min, <)                            \
+  T (double, min, <)
+
+TEST_MAXMIN (DEF_REDUC_MAXMIN)
+
+#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP)  \
+TYPE __attribute__ ((noinline, noclone))       \
+reduc_##NAME##_##TYPE (TYPE *a, int n)         \
+{                                              \
+  TYPE r = 13;                                 \
+  for (int i = 0; i < n; ++i)                  \
+    r BIT_OP a[i];                             \
+  return r;                                    \
+}
+
+#define TEST_BITWISE(T)                                \
+  T (int8_t, and, &=)                          \
+  T (int16_t, and, &=)                         \
+  T (int32_t, and, &=)                         \
+  T (int64_t, and, &=)                         \
+  T (uint8_t, and, &=)                         \
+  T (uint16_t, and, &=)                                \
+  T (uint32_t, and, &=)                                \
+  T (uint64_t, and, &=)                                \
+                                               \
+  T (int8_t, ior, |=)                          \
+  T (int16_t, ior, |=)                         \
+  T (int32_t, ior, |=)                         \
+  T (int64_t, ior, |=)                         \
+  T (uint8_t, ior, |=)                         \
+  T (uint16_t, ior, |=)                                \
+  T (uint32_t, ior, |=)                                \
+  T (uint64_t, ior, |=)                                \
+                                               \
+  T (int8_t, xor, ^=)                          \
+  T (int16_t, xor, ^=)                         \
+  T (int32_t, xor, ^=)                         \
+  T (int64_t, xor, ^=)                         \
+  T (uint8_t, xor, ^=)                         \
+  T (uint16_t, xor, ^=)                                \
+  T (uint32_t, xor, ^=)                                \
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
new file mode 100644
index 00000000000..136a8a378bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
@@ -0,0 +1,129 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+#define NUM_ELEMS(TYPE) (1024 / sizeof (TYPE))
+
+#define DEF_REDUC_PLUS(TYPE)                                   \
+void __attribute__ ((noinline, noclone))                       \
+reduc_plus_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],       \
+                  TYPE *restrict r, int n)                     \
+{                                                              \
+  for (int i = 0; i < n; i++)                                  \
+    {                                                          \
+      r[i] = 0;                                                        \
+      for (int j = 0; j < NUM_ELEMS (TYPE); j++)               \
+        r[i] += a[i][j];                                       \
+    }                                                          \
+}
+
+#define TEST_PLUS(T)                           \
+  T (int8_t)                                   \
+  T (int16_t)                                  \
+  T (int32_t)                                  \
+  T (int64_t)                                  \
+  T (uint8_t)                                  \
+  T (uint16_t)                                 \
+  T (uint32_t)                                 \
+  T (uint64_t)                                 \
+  T (_Float16)                                 \
+  T (float)                                    \
+  T (double)
+
+TEST_PLUS (DEF_REDUC_PLUS)
+
+#define DEF_REDUC_MAXMIN(TYPE, NAME, CMP_OP)                   \
+void __attribute__ ((noinline, noclone))                       \
+reduc_##NAME##_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],   \
+                      TYPE *restrict r, int n)                 \
+{                                                              \
+  for (int i = 0; i < n; i++)                                  \
+    {                                                          \
+      r[i] = a[i][0];                                          \
+      for (int j = 0; j < NUM_ELEMS (TYPE); j++)               \
+        r[i] = a[i][j] CMP_OP r[i] ? a[i][j] : r[i];           \
+    }                                                          \
+}
+
+#define TEST_MAXMIN(T)                         \
+  T (int8_t, max, >)                           \
+  T (int16_t, max, >)                          \
+  T (int32_t, max, >)                          \
+  T (int64_t, max, >)                          \
+  T (uint8_t, max, >)                          \
+  T (uint16_t, max, >)                         \
+  T (uint32_t, max, >)                         \
+  T (uint64_t, max, >)                         \
+  T (_Float16, max, >)                         \
+  T (float, max, >)                            \
+  T (double, max, >)                           \
+                                               \
+  T (int8_t, min, <)                           \
+  T (int16_t, min, <)                          \
+  T (int32_t, min, <)                          \
+  T (int64_t, min, <)                          \
+  T (uint8_t, min, <)                          \
+  T (uint16_t, min, <)                         \
+  T (uint32_t, min, <)                         \
+  T (uint64_t, min, <)                         \
+  T (_Float16, min, <)                         \
+  T (float, min, <)                            \
+  T (double, min, <)
+
+TEST_MAXMIN (DEF_REDUC_MAXMIN)
+
+#define DEF_REDUC_BITWISE(TYPE,NAME,BIT_OP)                    \
+void __attribute__ ((noinline, noclone))                       \
+reduc_##NAME##_##TYPE (TYPE (*restrict a)[NUM_ELEMS(TYPE)],    \
+                      TYPE *restrict r, int n)                 \
+{                                                              \
+  for (int i = 0; i < n; i++)                                  \
+    {                                                          \
+      r[i] = a[i][0];                                          \
+      for (int j = 0; j < NUM_ELEMS(TYPE); j++)                        \
+        r[i] BIT_OP a[i][j];                                   \
+    }                                                          \
+}
+
+#define TEST_BITWISE(T)                                \
+  T (int8_t, and, &=)                          \
+  T (int16_t, and, &=)                         \
+  T (int32_t, and, &=)                         \
+  T (int64_t, and, &=)                         \
+  T (uint8_t, and, &=)                         \
+  T (uint16_t, and, &=)                                \
+  T (uint32_t, and, &=)                                \
+  T (uint64_t, and, &=)                                \
+                                               \
+  T (int8_t, ior, |=)                          \
+  T (int16_t, ior, |=)                         \
+  T (int32_t, ior, |=)                         \
+  T (int64_t, ior, |=)                         \
+  T (uint8_t, ior, |=)                         \
+  T (uint16_t, ior, |=)                                \
+  T (uint32_t, ior, |=)                                \
+  T (uint64_t, ior, |=)                                \
+                                               \
+  T (int8_t, xor, ^=)                          \
+  T (int16_t, xor, ^=)                         \
+  T (int32_t, xor, ^=)                         \
+  T (int64_t, xor, ^=)                         \
+  T (uint8_t, xor, ^=)                         \
+  T (uint16_t, xor, ^=)                                \
+  T (uint32_t, xor, ^=)                                \
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
new file mode 100644
index 00000000000..c3638344f80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+unsigned short __attribute__((noipa))
+add_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res += x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+min_loop (unsigned short *x, int n)
+{
+  unsigned short res = ~0;
+  for (int i = 0; i < n; ++i)
+    res = res < x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+max_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res = res > x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+and_loop (unsigned short *x, int n)
+{
+  unsigned short res = ~0;
+  for (int i = 0; i < n; ++i)
+    res &= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+or_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res |= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+eor_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res ^= x[i];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
new file mode 100644
index 00000000000..f00a12826c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+unsigned short __attribute__((noipa))
+add_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res += x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+min_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res = res < x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+max_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res = res > x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+and_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res &= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+or_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res |= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+eor_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res ^= x[i];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
new file mode 100644
index 00000000000..b500f857598
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
@@ -0,0 +1,56 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-1.c"
+
+#define NUM_ELEMS(TYPE) (73 + sizeof (TYPE))
+
+#define INIT_VECTOR(TYPE)                              \
+  TYPE a[NUM_ELEMS (TYPE) + 1];                                \
+  for (int i = 0; i < NUM_ELEMS (TYPE) + 1; i++)       \
+    {                                                  \
+      a[i] = ((i * 2) * (i & 1 ? 1 : -1) | 3);         \
+      asm volatile ("" ::: "memory");                  \
+    }
+
+#define TEST_REDUC_PLUS(TYPE)                          \
+  {                                                    \
+    INIT_VECTOR (TYPE);                                        \
+    TYPE r1 = reduc_plus_##TYPE (a, NUM_ELEMS (TYPE)); \
+    volatile TYPE r2 = 0;                              \
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)         \
+      r2 += a[i];                                      \
+    if (r1 != r2)                                      \
+      __builtin_abort ();                              \
+  }
+
+#define TEST_REDUC_MAXMIN(TYPE, NAME, CMP_OP)                  \
+  {                                                            \
+    INIT_VECTOR (TYPE);                                                \
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));     \
+    volatile TYPE r2 = 13;                                     \
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)                 \
+      r2 = a[i] CMP_OP r2 ? a[i] : r2;                         \
+    if (r1 != r2)                                              \
+      __builtin_abort ();                                      \
+  }
+
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)                 \
+  {                                                            \
+    INIT_VECTOR (TYPE);                                                \
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));     \
+    volatile TYPE r2 = 13;                                     \
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)                 \
+      r2 BIT_OP a[i];                                          \
+    if (r1 != r2)                                              \
+      __builtin_abort ();                                      \
+  }
+
+int main ()
+{
+  TEST_PLUS (TEST_REDUC_PLUS)
+  TEST_MAXMIN (TEST_REDUC_MAXMIN)
+  TEST_BITWISE (TEST_REDUC_BITWISE)
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
new file mode 100644
index 00000000000..3c2f62557b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "reduc-2.c"
+
+#define NROWS 53
+
+/* -ffast-math fuzz for PLUS.  */
+#define CMP__Float16(X, Y) ((X) >= (Y) * 0.875 && (X) <= (Y) * 1.125)
+#define CMP_float(X, Y) ((X) == (Y))
+#define CMP_double(X, Y) ((X) == (Y))
+#define CMP_int8_t(X, Y) ((X) == (Y))
+#define CMP_int16_t(X, Y) ((X) == (Y))
+#define CMP_int32_t(X, Y) ((X) == (Y))
+#define CMP_int64_t(X, Y) ((X) == (Y))
+#define CMP_uint8_t(X, Y) ((X) == (Y))
+#define CMP_uint16_t(X, Y) ((X) == (Y))
+#define CMP_uint32_t(X, Y) ((X) == (Y))
+#define CMP_uint64_t(X, Y) ((X) == (Y))
+
+#define INIT_MATRIX(TYPE)                              \
+  TYPE mat[NROWS][NUM_ELEMS (TYPE)];                   \
+  TYPE r[NROWS];                                       \
+  for (int i = 0; i < NROWS; i++)                      \
+    for (int j = 0; j < NUM_ELEMS (TYPE); j++)         \
+      {                                                        \
+       mat[i][j] = i + (j * 2) * (j & 1 ? 1 : -1);     \
+       asm volatile ("" ::: "memory");                 \
+      }
+
+#define TEST_REDUC_PLUS(TYPE)                          \
+  {                                                    \
+    INIT_MATRIX (TYPE);                                        \
+    reduc_plus_##TYPE (mat, r, NROWS);                 \
+    for (int i = 0; i < NROWS; i++)                    \
+      {                                                        \
+       volatile TYPE r2 = 0;                           \
+       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
+         r2 += mat[i][j];                              \
+       if (!CMP_##TYPE (r[i], r2))                     \
+         __builtin_abort ();                           \
+      }                                                        \
+    }
+
+#define TEST_REDUC_MAXMIN(TYPE, NAME, CMP_OP)          \
+  {                                                    \
+    INIT_MATRIX (TYPE);                                        \
+    reduc_##NAME##_##TYPE (mat, r, NROWS);             \
+    for (int i = 0; i < NROWS; i++)                    \
+      {                                                        \
+       volatile TYPE r2 = mat[i][0];                   \
+       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
+         r2 = mat[i][j] CMP_OP r2 ? mat[i][j] : r2;    \
+       if (r[i] != r2)                                 \
+         __builtin_abort ();                           \
+      }                                                        \
+    }
+
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)         \
+  {                                                    \
+    INIT_MATRIX (TYPE);                                        \
+    reduc_##NAME##_##TYPE (mat, r, NROWS);             \
+    for (int i = 0; i < NROWS; i++)                    \
+      {                                                        \
+       volatile TYPE r2 = mat[i][0];                   \
+       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
+         r2 BIT_OP mat[i][j];                          \
+       if (r[i] != r2)                                 \
+         __builtin_abort ();                           \
+      }                                                        \
+    }
+
+int main ()
+{
+  TEST_PLUS (TEST_REDUC_PLUS)
+  TEST_MAXMIN (TEST_REDUC_MAXMIN)
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
new file mode 100644
index 00000000000..d1b22c0d69a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
@@ -0,0 +1,49 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-3.c"
+
+#define N 0x1100
+
+int
+main (void)
+{
+  unsigned short x[N];
+  for (int i = 0; i < N; ++i)
+    x[i] = (i + 1) * (i + 2);
+
+  if (add_loop (x, 0) != 0
+      || add_loop (x, 11) != 572
+      || add_loop (x, 0x100) != 22016
+      || add_loop (x, 0xfff) != 20480
+      || max_loop (x, 0) != 0
+      || max_loop (x, 11) != 132
+      || max_loop (x, 0x100) != 65280
+      || max_loop (x, 0xfff) != 65504
+      || or_loop (x, 0) != 0
+      || or_loop (x, 11) != 0xfe
+      || or_loop (x, 0x80) != 0x7ffe
+      || or_loop (x, 0xb4) != 0x7ffe
+      || or_loop (x, 0xb5) != 0xfffe
+      || eor_loop (x, 0) != 0
+      || eor_loop (x, 11) != 0xe8
+      || eor_loop (x, 0x100) != 0xcf00
+      || eor_loop (x, 0xfff) != 0xa000)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    x[i] = ~x[i];
+
+  if (min_loop (x, 0) != 65535
+      || min_loop (x, 11) != 65403
+      || min_loop (x, 0x100) != 255
+      || min_loop (x, 0xfff) != 31
+      || and_loop (x, 0) != 0xffff
+      || and_loop (x, 11) != 0xff01
+      || and_loop (x, 0x80) != 0x8001
+      || and_loop (x, 0xb4) != 0x8001
+      || and_loop (x, 0xb5) != 1)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
new file mode 100644
index 00000000000..c17e125a763
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
@@ -0,0 +1,66 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-4.c"
+
+#define N 0x1100
+
+int
+main (void)
+{
+  unsigned short x[N];
+  for (int i = 0; i < N; ++i)
+    x[i] = (i + 1) * (i + 2);
+
+  if (add_loop (x, 0, 10) != 10
+      || add_loop (x, 11, 42) != 614
+      || add_loop (x, 0x100, 84) != 22100
+      || add_loop (x, 0xfff, 20) != 20500
+      || max_loop (x, 0, 10) != 10
+      || max_loop (x, 11, 131) != 132
+      || max_loop (x, 11, 133) != 133
+      || max_loop (x, 0x100, 65279) != 65280
+      || max_loop (x, 0x100, 65281) != 65281
+      || max_loop (x, 0xfff, 65503) != 65504
+      || max_loop (x, 0xfff, 65505) != 65505
+      || or_loop (x, 0, 0x71) != 0x71
+      || or_loop (x, 11, 0) != 0xfe
+      || or_loop (x, 11, 0xb3c) != 0xbfe
+      || or_loop (x, 0x80, 0) != 0x7ffe
+      || or_loop (x, 0x80, 1) != 0x7fff
+      || or_loop (x, 0xb4, 0) != 0x7ffe
+      || or_loop (x, 0xb4, 1) != 0x7fff
+      || or_loop (x, 0xb5, 0) != 0xfffe
+      || or_loop (x, 0xb5, 1) != 0xffff
+      || eor_loop (x, 0, 0x3e) != 0x3e
+      || eor_loop (x, 11, 0) != 0xe8
+      || eor_loop (x, 11, 0x1ff) != 0x117
+      || eor_loop (x, 0x100, 0) != 0xcf00
+      || eor_loop (x, 0x100, 0xeee) != 0xc1ee
+      || eor_loop (x, 0xfff, 0) != 0xa000
+      || eor_loop (x, 0xfff, 0x8888) != 0x2888)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    x[i] = ~x[i];
+
+  if (min_loop (x, 0, 10000) != 10000
+      || min_loop (x, 11, 65404) != 65403
+      || min_loop (x, 11, 65402) != 65402
+      || min_loop (x, 0x100, 256) != 255
+      || min_loop (x, 0x100, 254) != 254
+      || min_loop (x, 0xfff, 32) != 31
+      || min_loop (x, 0xfff, 30) != 30
+      || and_loop (x, 0, 0x1234) != 0x1234
+      || and_loop (x, 11, 0xffff) != 0xff01
+      || and_loop (x, 11, 0xcdef) != 0xcd01
+      || and_loop (x, 0x80, 0xffff) != 0x8001
+      || and_loop (x, 0x80, 0xfffe) != 0x8000
+      || and_loop (x, 0xb4, 0xffff) != 0x8001
+      || and_loop (x, 0xb4, 0xfffe) != 0x8000
+      || and_loop (x, 0xb5, 0xffff) != 1
+      || and_loop (x, 0xb5, 0xfffe) != 0)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 19589fa9638..532c17c4065 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -71,6 +71,8 @@ foreach op $AUTOVEC_TEST_OPTS {
     "" "$op"
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/ternop/*.\[cS\]]] \
     "" "$op"
+  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/reduc/*.\[cS\]]] \
+    "" "$op"
 }

 # widening operation only test on LMUL < 8
-- 
2.36.3



* Re: Re: [PATCH] RISC-V: Support non-SLP unordered reduction
  2023-07-14 12:38 ` Kito Cheng
  2023-07-14 12:47   ` 钟居哲
@ 2023-07-14 12:51   ` 钟居哲
  2023-07-16  2:18     ` Li, Pan2
  1 sibling, 1 reply; 7+ messages in thread
From: 钟居哲 @ 2023-07-14 12:51 UTC (permalink / raw)
  To: kito.cheng; +Cc: gcc-patches, kito.cheng, palmer, rdapp.gcc, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 47161 bytes --]

So to be safe, I think it should be backported to GCC 13, even though I don't have an intrinsic testcase that reproduces it.



juzhe.zhong@rivai.ai
 
From: Kito Cheng
Date: 2023-07-14 20:38
To: 钟居哲
CC: GCC Patches; Kito Cheng; Palmer Dabbelt; Robin Dapp; Jeff Law
Subject: Re: [PATCH] RISC-V: Support non-SLP unordered reduction


On Fri, Jul 14, 2023 at 20:31, <juzhe.zhong@rivai.ai> wrote:
From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

This patch adds the reduc_*_scal patterns to support reduction auto-vectorization.

Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.

Consider the following case:
int __attribute__((noipa))
and_loop (int32_t * __restrict x, int32_t n, int res)
{
  for (int i = 0; i < n; ++i)
    res &= x[i];
  return res;
}

ASM:
and_loop:
        ble     a1,zero,.L4
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.v.i v1,-1
.L3:
        vsetvli a5,a1,e32,m1,tu,ma       # Must be "tu": the tail lanes of v1 hold partial results.
        slli    a4,a5,2
        sub     a1,a1,a5
        vle32.v v2,0(a0)
        add     a0,a0,a4
        vand.vv v1,v2,v1
        bne     a1,zero,.L3
        vsetivli        zero,1,e32,m1,ta,ma
        vmv.v.i v2,-1
        vsetvli a3,zero,e32,m1,ta,ma
        vredand.vs      v1,v1,v2
        vmv.x.s a5,v1
        and     a0,a2,a5
        ret
.L4:
        mv      a0,a2
        ret
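
To illustrate why the loop needs the "tu" (tail undisturbed) policy, here is a minimal lane-by-lane sketch of the sequence above in plain C. It is not part of the patch. VLMAX, and_loop_ref, and_loop_model and the tail_agnostic flag are hypothetical names for this illustration, and the vsetvli result is simplified to min (n, VLMAX). With the flag set, tail lanes are overwritten with all ones, which a tail-agnostic implementation is allowed to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLMAX 4 /* Hypothetical lane count; real hardware derives it from VLEN.  */

/* Scalar reference: the loop from the commit message.  */
int32_t
and_loop_ref (const int32_t *x, int32_t n, int32_t res)
{
  for (int32_t i = 0; i < n; ++i)
    res &= x[i];
  return res;
}

/* Lane-by-lane model of the vectorized loop.  When TAIL_AGNOSTIC is true,
   lanes at or beyond VL are overwritten with all ones, which destroys the
   partial results they held from earlier iterations.  */
int32_t
and_loop_model (const int32_t *x, int32_t n, int32_t res, bool tail_agnostic)
{
  if (n <= 0)
    return res;                            /* ble a1,zero,.L4 */

  int32_t acc[VLMAX];
  for (int l = 0; l < VLMAX; ++l)
    acc[l] = -1;                           /* vmv.v.i v1,-1 */

  while (n > 0)
    {
      int32_t vl = n < VLMAX ? n : VLMAX;  /* vsetvli a5,a1,... (simplified) */
      assert (vl > 0);
      for (int l = 0; l < VLMAX; ++l)
        {
          if (l < vl)
            acc[l] &= x[l];                /* vle32.v + vand.vv */
          else if (tail_agnostic)
            acc[l] = -1;                   /* tail lane clobbered */
        }
      x += vl;
      n -= vl;
    }

  int32_t r = -1;                          /* vmv.v.i v2,-1 */
  for (int l = 0; l < VLMAX; ++l)
    r &= acc[l];                           /* vredand.vs over all VLMAX lanes */
  return res & r;                          /* and a0,a2,a5 */
}
```

In this model the final vredand.vs runs over all VLMAX lanes, so a lane that was active in an earlier iteration but falls into the tail of the last one must keep its value. With n = 7 and VLMAX = 4, lane 3 is such a lane.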

Also fix a bug in the VSETVL pass that was exposed by the reduction testcases.


Is it a performance bug or a correctness bug? If it is a correctness bug, does it also appear in GCC 13?


SLP reduction and floating-point in-order reduction are not supported yet.
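
As background on the in-order versus unordered distinction: floating-point addition is not associative, so an unordered reduction can return a different value than the source-order loop. This is presumably why the new tests are built with -ffast-math. The sketch below is only an illustration and is not part of the patch. sum_in_order and sum_two_lanes are hypothetical names, and the two-lane split stands in for whatever grouping the hardware chooses. Build it without -ffast-math so the compiler keeps the written order.

```c
#include <assert.h>
#include <stddef.h>

/* In-order (strict) reduction: one accumulator, elements added in source
   order.  */
float
sum_in_order (const float *a, size_t n)
{
  assert (a != NULL || n == 0);
  float r = 0.0f;
  for (size_t i = 0; i < n; ++i)
    r += a[i];
  return r;
}

/* Unordered model: two independent lane accumulators that are combined
   only at the end, similar in spirit to a vector accumulator followed by a
   final reduction.  */
float
sum_two_lanes (const float *a, size_t n)
{
  assert (a != NULL || n == 0);
  float lane[2] = {0.0f, 0.0f};
  for (size_t i = 0; i < n; ++i)
    lane[i % 2] += a[i];
  return lane[0] + lane[1];
}
```

For the input {1e30f, 1.0f, -1e30f, 1.0f}, IEEE 754 single precision gives 1 for the in-order sum, because the first 1.0f is absorbed by 1e30f. The two-lane sum gives 2, because the large values cancel in one lane and the small ones add up in the other.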

gcc/ChangeLog:

        * config/riscv/autovec.md (reduc_plus_scal_<mode>): New pattern.
        (reduc_smax_scal_<mode>): Ditto.
        (reduc_umax_scal_<mode>): Ditto.
        (reduc_smin_scal_<mode>): Ditto.
        (reduc_umin_scal_<mode>): Ditto.
        (reduc_and_scal_<mode>): Ditto.
        (reduc_ior_scal_<mode>): Ditto.
        (reduc_xor_scal_<mode>): Ditto.
        * config/riscv/riscv-protos.h (enum insn_type): New enum.
        (emit_nonvlmax_integer_move_insn): Add reduction.
        (expand_reduction): New function.
        * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
        (emit_vlmax_fp_reduction_insn): Ditto.
        (get_m1_mode): Ditto.
        (expand_cond_len_binop): Fix name.
        (expand_reduction): New function.
        * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix bug.
        (change_insn): Ditto.
        (change_vsetvl_insn): Ditto.
        (pass_vsetvl::backward_demand_fusion): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.
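
Each reduc_*_scal expander in this patch passes a start value to expand_reduction: zero for plus/ior/xor/umax, all ones for and, and the minimum or maximum of the element type for smax/smin/umin. The following is a minimal scalar sketch of why those values are neutral, restricted to int32_t. The enum and the function names are hypothetical and exist only for this illustration; they are not GCC interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum reduc_op { R_PLUS, R_SMAX, R_SMIN, R_AND, R_IOR, R_XOR };

/* Neutral start value for OP on int32_t, mirroring the constants used by
   the expanders in the patch.  */
int32_t
reduc_identity (enum reduc_op op)
{
  switch (op)
    {
    case R_SMAX: return INT32_MIN;
    case R_SMIN: return INT32_MAX;
    case R_AND:  return -1;
    default:     return 0;  /* PLUS, IOR, XOR */
    }
}

/* Combine the accumulator ACC with one element V.  */
int32_t
reduc_apply (enum reduc_op op, int32_t acc, int32_t v)
{
  switch (op)
    {
    case R_PLUS: return (int32_t) ((uint32_t) acc + (uint32_t) v);
    case R_SMAX: return v > acc ? v : acc;
    case R_SMIN: return v < acc ? v : acc;
    case R_AND:  return acc & v;
    case R_IOR:  return acc | v;
    case R_XOR:  return acc ^ v;
    }
  return acc;  /* Not reached for a valid OP.  */
}

/* Reduce V[0..N-1], seeding the accumulator with the neutral value so the
   seed never changes the result.  */
int32_t
reduce (enum reduc_op op, const int32_t *v, size_t n)
{
  assert (v != NULL || n == 0);
  int32_t acc = reduc_identity (op);
  for (size_t i = 0; i < n; ++i)
    acc = reduc_apply (op, acc, v[i]);
  return acc;
}
```

Because the seed is neutral, reducing an empty range returns the seed itself, and reducing a non-empty range returns the same value as combining the elements alone.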

---
 gcc/config/riscv/autovec.md                   | 138 ++++++++++++++++++
 gcc/config/riscv/riscv-protos.h               |   3 +
 gcc/config/riscv/riscv-v.cc                   |  84 ++++++++++-
 gcc/config/riscv/riscv-vsetvl.cc              |  28 +++-
 .../riscv/rvv/autovec/reduc/reduc-1.c         | 118 +++++++++++++++
 .../riscv/rvv/autovec/reduc/reduc-2.c         | 129 ++++++++++++++++
 .../riscv/rvv/autovec/reduc/reduc-3.c         |  65 +++++++++
 .../riscv/rvv/autovec/reduc/reduc-4.c         |  59 ++++++++
 .../riscv/rvv/autovec/reduc/reduc_run-1.c     |  56 +++++++
 .../riscv/rvv/autovec/reduc/reduc_run-2.c     |  79 ++++++++++
 .../riscv/rvv/autovec/reduc/reduc_run-3.c     |  49 +++++++
 .../riscv/rvv/autovec/reduc/reduc_run-4.c     |  66 +++++++++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
 13 files changed, 868 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0476b1dea45..a74f66f41ac 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1531,3 +1531,141 @@
   riscv_vector::expand_cond_len_binop (<CODE>, operands);
   DONE;
 })
+
+;; =========================================================================
+;; == Reductions
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Tree reductions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vredsum.vs
+;; - vredmaxu.vs
+;; - vredmax.vs
+;; - vredminu.vs
+;; - vredmin.vs
+;; - vredand.vs
+;; - vredor.vs
+;; - vredxor.vs
+;; -------------------------------------------------------------------------
+
+(define_expand "reduc_plus_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx min = immed_wide_int_const (wi::min_value (prec, SIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (SMAX, operands, min);
+  DONE;
+})
+
+(define_expand "reduc_umax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (UMAX, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx max = immed_wide_int_const (wi::max_value (prec, SIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (SMIN, operands, max);
+  DONE;
+})
+
+(define_expand "reduc_umin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx max = immed_wide_int_const (wi::max_value (prec, UNSIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (UMIN, operands, max);
+  DONE;
+})
+
+(define_expand "reduc_and_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (AND, operands, CONSTM1_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_ior_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (IOR, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_xor_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (XOR, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] Tree reductions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfredusum.vs
+;; - vfredmax.vs
+;; - vfredmin.vs
+;; -------------------------------------------------------------------------
+
+(define_expand "reduc_plus_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  REAL_VALUE_TYPE rv;
+  real_inf (&rv, true);
+  rtx f = const_double_from_real_value (rv, <VEL>mode);
+  riscv_vector::expand_reduction (SMAX, operands, f);
+  DONE;
+})
+
+(define_expand "reduc_smin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  REAL_VALUE_TYPE rv;
+  real_inf (&rv, false);
+  rtx f = const_double_from_real_value (rv, <VEL>mode);
+  riscv_vector::expand_reduction (SMIN, operands, f);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 1a622c58f4b..f19b7fc2b8d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -197,6 +197,7 @@ enum insn_type
   RVV_COMPRESS_OP = 4,
   RVV_GATHER_M_OP = 5,
   RVV_SCATTER_M_OP = 4,
+  RVV_REDUCTION_OP = 3,
 };
 enum vlmul_type
 {
@@ -247,6 +248,7 @@ void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
 void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
 void emit_scalar_move_insn (unsigned, rtx *);
 void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
+//void emit_vlmax_reduction_insn (unsigned, rtx *);
 enum vlmul_type get_vlmul (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
@@ -280,6 +282,7 @@ bool has_vi_variant_p (rtx_code, rtx);
 void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
 bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
 void expand_cond_len_binop (rtx_code, rtx *);
+void expand_reduction (rtx_code, rtx *, rtx);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
                          bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 90da63889bd..ccf0f6ff852 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1137,6 +1137,43 @@ emit_vlmax_compress_insn (unsigned icode, rtx *ops)
   e.emit_insn ((enum insn_code) icode, ops);
 }

+/* Emit reduction instruction.  */
+static void
+emit_vlmax_reduction_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
+                                         /* HAS_DEST_P */ true,
+                                         /* FULLY_UNMASKED_P */ true,
+                                         /* USE_REAL_MERGE_P */ false,
+                                         /* HAS_AVL_P */ true,
+                                         /* VLMAX_P */ true, dest_mode,
+                                         mask_mode);
+
+  e.set_policy (TAIL_ANY);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
+/* Emit floating-point reduction instruction (dynamic rounding mode).  */
+static void
+emit_vlmax_fp_reduction_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
+                                         /* HAS_DEST_P */ true,
+                                         /* FULLY_UNMASKED_P */ true,
+                                         /* USE_REAL_MERGE_P */ false,
+                                         /* HAS_AVL_P */ true,
+                                         /* VLMAX_P */ true, dest_mode,
+                                         mask_mode);
+
+  e.set_policy (TAIL_ANY);
+  e.set_rounding_mode (FRM_DYN);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
 /* Emit merge instruction.  */

 static machine_mode
@@ -1629,6 +1666,17 @@ get_mask_mode (machine_mode mode)
   return get_vector_mode (BImode, GET_MODE_NUNITS (mode));
 }

+/* Return the appropriate M1 mode for MODE.  */
+
+static opt_machine_mode
+get_m1_mode (machine_mode mode)
+{
+  scalar_mode smode = GET_MODE_INNER (mode);
+  unsigned int bytes = GET_MODE_SIZE (smode);
+  poly_uint64 m1_nunits = exact_div (BYTES_PER_RISCV_VECTOR, bytes);
+  return get_vector_mode (smode, m1_nunits);
+}
+
 /* Return the RVV vector mode that has NUNITS elements of mode INNER_MODE.
    This function is not only used by builtins, but also will be used by
    auto-vectorization in the future.  */
@@ -3099,9 +3147,9 @@ expand_cond_len_binop (rtx_code code, rtx *ops)
       rtx ops[] = {dest, mask, merge, src1, src2};
       insn_code icode = code_for_pred (code, mode);
       if (needs_fp_rounding (code, mode))
-       emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_MU, ops, len);
+       emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_TU, ops, len);
       else
-       emit_nonvlmax_tu_insn (icode, RVV_BINOP_MU, ops, len);
+       emit_nonvlmax_tu_insn (icode, RVV_BINOP_TU, ops, len);
     }
   else
     /* FIXME: Enable this case when we support it in the middle-end.  */
@@ -3267,4 +3315,36 @@ expand_gather_scatter (rtx *ops, bool is_load)
     }
 }

+/* Expand reduction operations.  */
+void
+expand_reduction (rtx_code code, rtx *ops, rtx init)
+{
+  machine_mode vmode = GET_MODE (ops[1]);
+  machine_mode m1_mode = get_m1_mode (vmode).require ();
+  machine_mode m1_mmode = get_mask_mode (m1_mode).require ();
+
+  rtx m1_tmp = gen_reg_rtx (m1_mode);
+  rtx m1_mask = gen_scalar_move_mask (m1_mmode);
+  rtx m1_undef = RVV_VUNDEF (m1_mode);
+  rtx scalar_move_ops[] = {m1_tmp, m1_mask, m1_undef, init};
+  emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops);
+
+  rtx m1_tmp2 = gen_reg_rtx (m1_mode);
+  rtx reduc_ops[] = {m1_tmp2, ops[1], m1_tmp};
+
+  if (FLOAT_MODE_P (vmode) && code == PLUS)
+    {
+      insn_code icode
+       = code_for_pred_reduc_plus (UNSPEC_UNORDERED, vmode, m1_mode);
+      emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
+    }
+  else
+    {
+      insn_code icode = code_for_pred_reduc (code, vmode, m1_mode);
+      emit_vlmax_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
+    }
+
+  emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 586dc8e5379..97a9dad8a77 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -646,7 +646,8 @@ gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
 }

 static rtx
-gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
+gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
+               rtx vl = NULL_RTX)
 {
   rtx new_pat;
   vl_vtype_info new_info = info;
@@ -657,7 +658,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
   if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
     {
       rtx dest = get_vl (rinsn);
-      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, dest);
+      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl ? vl : dest);
     }
   else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
     new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
@@ -818,7 +819,8 @@ change_insn (rtx_insn *rinsn, rtx new_pat)
       print_rtl_single (dump_file, PATTERN (rinsn));
     }

-  validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  bool change_p = validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  gcc_assert (change_p);

   if (dump_file)
     {
@@ -931,7 +933,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
 }

 static void
-change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
+change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info,
+                   rtx vl = NULL_RTX)
 {
   rtx_insn *rinsn;
   if (vector_config_insn_p (insn->rtl ()))
@@ -945,7 +948,7 @@ change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
       rinsn = PREV_INSN (insn->rtl ());
       gcc_assert (vector_config_insn_p (rinsn));
     }
-  rtx new_pat = gen_vsetvl_pat (rinsn, info);
+  rtx new_pat = gen_vsetvl_pat (rinsn, info, vl);
   change_insn (rinsn, new_pat);
 }

@@ -3377,7 +3380,20 @@ pass_vsetvl::backward_demand_fusion (void)
                                       new_info))
                continue;

-             change_vsetvl_insn (new_info.get_insn (), new_info);
+             rtx vl = NULL_RTX;
+             /* Backward VLMAX VL:
+                  bb 3:
+                    vsetivli zero, 1 ... -> vsetvli t1, zero
+                    vmv.s.x
+                  bb 5:
+                    vsetvli t1, zero ... -> to be elided.
+                    vlse16.v
+
+                  We should forward "t1".  */
+             if (!block_info.reaching_out.has_avl_reg ()
+                 && vlmax_avl_p (new_info.get_avl ()))
+               vl = get_vl (prop.get_insn ()->rtl ());
+             change_vsetvl_insn (new_info.get_insn (), new_info, vl);
              if (block_info.local_dem == block_info.reaching_out)
                block_info.local_dem = new_info;
              block_info.reaching_out = new_info;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
new file mode 100644
index 00000000000..0d543af13ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
@@ -0,0 +1,118 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+#define DEF_REDUC_PLUS(TYPE)                   \
+TYPE __attribute__ ((noinline, noclone))       \
+reduc_plus_##TYPE (TYPE *a, int n)             \
+{                                              \
+  TYPE r = 0;                                  \
+  for (int i = 0; i < n; ++i)                  \
+    r += a[i];                                 \
+  return r;                                    \
+}
+
+#define TEST_PLUS(T)                           \
+  T (int8_t)                                   \
+  T (int16_t)                                  \
+  T (int32_t)                                  \
+  T (int64_t)                                  \
+  T (uint8_t)                                  \
+  T (uint16_t)                                 \
+  T (uint32_t)                                 \
+  T (uint64_t)                                 \
+  T (_Float16)                                 \
+  T (float)                                    \
+  T (double)
+
+TEST_PLUS (DEF_REDUC_PLUS)
+
+#define DEF_REDUC_MAXMIN(TYPE, NAME, CMP_OP)   \
+TYPE __attribute__ ((noinline, noclone))       \
+reduc_##NAME##_##TYPE (TYPE *a, int n)         \
+{                                              \
+  TYPE r = 13;                                 \
+  for (int i = 0; i < n; ++i)                  \
+    r = a[i] CMP_OP r ? a[i] : r;              \
+  return r;                                    \
+}
+
+#define TEST_MAXMIN(T)                         \
+  T (int8_t, max, >)                           \
+  T (int16_t, max, >)                          \
+  T (int32_t, max, >)                          \
+  T (int64_t, max, >)                          \
+  T (uint8_t, max, >)                          \
+  T (uint16_t, max, >)                         \
+  T (uint32_t, max, >)                         \
+  T (uint64_t, max, >)                         \
+  T (_Float16, max, >)                         \
+  T (float, max, >)                            \
+  T (double, max, >)                           \
+                                               \
+  T (int8_t, min, <)                           \
+  T (int16_t, min, <)                          \
+  T (int32_t, min, <)                          \
+  T (int64_t, min, <)                          \
+  T (uint8_t, min, <)                          \
+  T (uint16_t, min, <)                         \
+  T (uint32_t, min, <)                         \
+  T (uint64_t, min, <)                         \
+  T (_Float16, min, <)                         \
+  T (float, min, <)                            \
+  T (double, min, <)
+
+TEST_MAXMIN (DEF_REDUC_MAXMIN)
+
+#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP)  \
+TYPE __attribute__ ((noinline, noclone))       \
+reduc_##NAME##_##TYPE (TYPE *a, int n)         \
+{                                              \
+  TYPE r = 13;                                 \
+  for (int i = 0; i < n; ++i)                  \
+    r BIT_OP a[i];                             \
+  return r;                                    \
+}
+
+#define TEST_BITWISE(T)                                \
+  T (int8_t, and, &=)                          \
+  T (int16_t, and, &=)                         \
+  T (int32_t, and, &=)                         \
+  T (int64_t, and, &=)                         \
+  T (uint8_t, and, &=)                         \
+  T (uint16_t, and, &=)                                \
+  T (uint32_t, and, &=)                                \
+  T (uint64_t, and, &=)                                \
+                                               \
+  T (int8_t, ior, |=)                          \
+  T (int16_t, ior, |=)                         \
+  T (int32_t, ior, |=)                         \
+  T (int64_t, ior, |=)                         \
+  T (uint8_t, ior, |=)                         \
+  T (uint16_t, ior, |=)                                \
+  T (uint32_t, ior, |=)                                \
+  T (uint64_t, ior, |=)                                \
+                                               \
+  T (int8_t, xor, ^=)                          \
+  T (int16_t, xor, ^=)                         \
+  T (int32_t, xor, ^=)                         \
+  T (int64_t, xor, ^=)                         \
+  T (uint8_t, xor, ^=)                         \
+  T (uint16_t, xor, ^=)                                \
+  T (uint32_t, xor, ^=)                                \
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
new file mode 100644
index 00000000000..136a8a378bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
@@ -0,0 +1,129 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+#define NUM_ELEMS(TYPE) (1024 / sizeof (TYPE))
+
+#define DEF_REDUC_PLUS(TYPE)                                   \
+void __attribute__ ((noinline, noclone))                       \
+reduc_plus_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],       \
+                  TYPE *restrict r, int n)                     \
+{                                                              \
+  for (int i = 0; i < n; i++)                                  \
+    {                                                          \
+      r[i] = 0;                                                        \
+      for (int j = 0; j < NUM_ELEMS (TYPE); j++)               \
+        r[i] += a[i][j];                                       \
+    }                                                          \
+}
+
+#define TEST_PLUS(T)                           \
+  T (int8_t)                                   \
+  T (int16_t)                                  \
+  T (int32_t)                                  \
+  T (int64_t)                                  \
+  T (uint8_t)                                  \
+  T (uint16_t)                                 \
+  T (uint32_t)                                 \
+  T (uint64_t)                                 \
+  T (_Float16)                                 \
+  T (float)                                    \
+  T (double)
+
+TEST_PLUS (DEF_REDUC_PLUS)
+
+#define DEF_REDUC_MAXMIN(TYPE, NAME, CMP_OP)                   \
+void __attribute__ ((noinline, noclone))                       \
+reduc_##NAME##_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],   \
+                      TYPE *restrict r, int n)                 \
+{                                                              \
+  for (int i = 0; i < n; i++)                                  \
+    {                                                          \
+      r[i] = a[i][0];                                          \
+      for (int j = 0; j < NUM_ELEMS (TYPE); j++)               \
+        r[i] = a[i][j] CMP_OP r[i] ? a[i][j] : r[i];           \
+    }                                                          \
+}
+
+#define TEST_MAXMIN(T)                         \
+  T (int8_t, max, >)                           \
+  T (int16_t, max, >)                          \
+  T (int32_t, max, >)                          \
+  T (int64_t, max, >)                          \
+  T (uint8_t, max, >)                          \
+  T (uint16_t, max, >)                         \
+  T (uint32_t, max, >)                         \
+  T (uint64_t, max, >)                         \
+  T (_Float16, max, >)                         \
+  T (float, max, >)                            \
+  T (double, max, >)                           \
+                                               \
+  T (int8_t, min, <)                           \
+  T (int16_t, min, <)                          \
+  T (int32_t, min, <)                          \
+  T (int64_t, min, <)                          \
+  T (uint8_t, min, <)                          \
+  T (uint16_t, min, <)                         \
+  T (uint32_t, min, <)                         \
+  T (uint64_t, min, <)                         \
+  T (_Float16, min, <)                         \
+  T (float, min, <)                            \
+  T (double, min, <)
+
+TEST_MAXMIN (DEF_REDUC_MAXMIN)
+
+#define DEF_REDUC_BITWISE(TYPE,NAME,BIT_OP)                    \
+void __attribute__ ((noinline, noclone))                       \
+reduc_##NAME##_##TYPE (TYPE (*restrict a)[NUM_ELEMS(TYPE)],    \
+                      TYPE *restrict r, int n)                 \
+{                                                              \
+  for (int i = 0; i < n; i++)                                  \
+    {                                                          \
+      r[i] = a[i][0];                                          \
+      for (int j = 0; j < NUM_ELEMS(TYPE); j++)                        \
+        r[i] BIT_OP a[i][j];                                   \
+    }                                                          \
+}
+
+#define TEST_BITWISE(T)                                \
+  T (int8_t, and, &=)                          \
+  T (int16_t, and, &=)                         \
+  T (int32_t, and, &=)                         \
+  T (int64_t, and, &=)                         \
+  T (uint8_t, and, &=)                         \
+  T (uint16_t, and, &=)                                \
+  T (uint32_t, and, &=)                                \
+  T (uint64_t, and, &=)                                \
+                                               \
+  T (int8_t, ior, |=)                          \
+  T (int16_t, ior, |=)                         \
+  T (int32_t, ior, |=)                         \
+  T (int64_t, ior, |=)                         \
+  T (uint8_t, ior, |=)                         \
+  T (uint16_t, ior, |=)                                \
+  T (uint32_t, ior, |=)                                \
+  T (uint64_t, ior, |=)                                \
+                                               \
+  T (int8_t, xor, ^=)                          \
+  T (int16_t, xor, ^=)                         \
+  T (int32_t, xor, ^=)                         \
+  T (int64_t, xor, ^=)                         \
+  T (uint8_t, xor, ^=)                         \
+  T (uint16_t, xor, ^=)                                \
+  T (uint32_t, xor, ^=)                                \
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
new file mode 100644
index 00000000000..c3638344f80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+unsigned short __attribute__((noipa))
+add_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res += x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+min_loop (unsigned short *x, int n)
+{
+  unsigned short res = ~0;
+  for (int i = 0; i < n; ++i)
+    res = res < x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+max_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res = res > x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+and_loop (unsigned short *x, int n)
+{
+  unsigned short res = ~0;
+  for (int i = 0; i < n; ++i)
+    res &= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+or_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res |= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+eor_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res ^= x[i];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
new file mode 100644
index 00000000000..f00a12826c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+unsigned short __attribute__((noipa))
+add_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res += x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+min_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res = res < x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+max_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res = res > x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+and_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res &= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+or_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res |= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+eor_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res ^= x[i];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
new file mode 100644
index 00000000000..b500f857598
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
@@ -0,0 +1,56 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-1.c"
+
+#define NUM_ELEMS(TYPE) (73 + sizeof (TYPE))
+
+#define INIT_VECTOR(TYPE)                              \
+  TYPE a[NUM_ELEMS (TYPE) + 1];                                \
+  for (int i = 0; i < NUM_ELEMS (TYPE) + 1; i++)       \
+    {                                                  \
+      a[i] = ((i * 2) * (i & 1 ? 1 : -1) | 3);         \
+      asm volatile ("" ::: "memory");                  \
+    }
+
+#define TEST_REDUC_PLUS(TYPE)                          \
+  {                                                    \
+    INIT_VECTOR (TYPE);                                        \
+    TYPE r1 = reduc_plus_##TYPE (a, NUM_ELEMS (TYPE)); \
+    volatile TYPE r2 = 0;                              \
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)         \
+      r2 += a[i];                                      \
+    if (r1 != r2)                                      \
+      __builtin_abort ();                              \
+  }
+
+#define TEST_REDUC_MAXMIN(TYPE, NAME, CMP_OP)                  \
+  {                                                            \
+    INIT_VECTOR (TYPE);                                                \
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));     \
+    volatile TYPE r2 = 13;                                     \
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)                 \
+      r2 = a[i] CMP_OP r2 ? a[i] : r2;                         \
+    if (r1 != r2)                                              \
+      __builtin_abort ();                                      \
+  }
+
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)                 \
+  {                                                            \
+    INIT_VECTOR (TYPE);                                                \
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));     \
+    volatile TYPE r2 = 13;                                     \
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)                 \
+      r2 BIT_OP a[i];                                          \
+    if (r1 != r2)                                              \
+      __builtin_abort ();                                      \
+  }
+
+int main ()
+{
+  TEST_PLUS (TEST_REDUC_PLUS)
+  TEST_MAXMIN (TEST_REDUC_MAXMIN)
+  TEST_BITWISE (TEST_REDUC_BITWISE)
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
new file mode 100644
index 00000000000..3c2f62557b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "reduc-2.c"
+
+#define NROWS 53
+
+/* -ffast-math fuzz for PLUS.  */
+#define CMP__Float16(X, Y) ((X) >= (Y) * 0.875 && (X) <= (Y) * 1.125)
+#define CMP_float(X, Y) ((X) == (Y))
+#define CMP_double(X, Y) ((X) == (Y))
+#define CMP_int8_t(X, Y) ((X) == (Y))
+#define CMP_int16_t(X, Y) ((X) == (Y))
+#define CMP_int32_t(X, Y) ((X) == (Y))
+#define CMP_int64_t(X, Y) ((X) == (Y))
+#define CMP_uint8_t(X, Y) ((X) == (Y))
+#define CMP_uint16_t(X, Y) ((X) == (Y))
+#define CMP_uint32_t(X, Y) ((X) == (Y))
+#define CMP_uint64_t(X, Y) ((X) == (Y))
+
+#define INIT_MATRIX(TYPE)                              \
+  TYPE mat[NROWS][NUM_ELEMS (TYPE)];                   \
+  TYPE r[NROWS];                                       \
+  for (int i = 0; i < NROWS; i++)                      \
+    for (int j = 0; j < NUM_ELEMS (TYPE); j++)         \
+      {                                                        \
+       mat[i][j] = i + (j * 2) * (j & 1 ? 1 : -1);     \
+       asm volatile ("" ::: "memory");                 \
+      }
+
+#define TEST_REDUC_PLUS(TYPE)                          \
+  {                                                    \
+    INIT_MATRIX (TYPE);                                        \
+    reduc_plus_##TYPE (mat, r, NROWS);                 \
+    for (int i = 0; i < NROWS; i++)                    \
+      {                                                        \
+       volatile TYPE r2 = 0;                           \
+       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
+         r2 += mat[i][j];                              \
+       if (!CMP_##TYPE (r[i], r2))                     \
+         __builtin_abort ();                           \
+      }                                                        \
+    }
+
+#define TEST_REDUC_MAXMIN(TYPE, NAME, CMP_OP)          \
+  {                                                    \
+    INIT_MATRIX (TYPE);                                        \
+    reduc_##NAME##_##TYPE (mat, r, NROWS);             \
+    for (int i = 0; i < NROWS; i++)                    \
+      {                                                        \
+       volatile TYPE r2 = mat[i][0];                   \
+       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
+         r2 = mat[i][j] CMP_OP r2 ? mat[i][j] : r2;    \
+       if (r[i] != r2)                                 \
+         __builtin_abort ();                           \
+      }                                                        \
+    }
+
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)         \
+  {                                                    \
+    INIT_MATRIX (TYPE);                                        \
+    reduc_##NAME##_##TYPE (mat, r, NROWS);             \
+    for (int i = 0; i < NROWS; i++)                    \
+      {                                                        \
+       volatile TYPE r2 = mat[i][0];                   \
+       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
+         r2 BIT_OP mat[i][j];                          \
+       if (r[i] != r2)                                 \
+         __builtin_abort ();                           \
+      }                                                        \
+    }
+
+int main ()
+{
+  TEST_PLUS (TEST_REDUC_PLUS)
+  TEST_MAXMIN (TEST_REDUC_MAXMIN)
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
new file mode 100644
index 00000000000..d1b22c0d69a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
@@ -0,0 +1,49 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-3.c"
+
+#define N 0x1100
+
+int
+main (void)
+{
+  unsigned short x[N];
+  for (int i = 0; i < N; ++i)
+    x[i] = (i + 1) * (i + 2);
+
+  if (add_loop (x, 0) != 0
+      || add_loop (x, 11) != 572
+      || add_loop (x, 0x100) != 22016
+      || add_loop (x, 0xfff) != 20480
+      || max_loop (x, 0) != 0
+      || max_loop (x, 11) != 132
+      || max_loop (x, 0x100) != 65280
+      || max_loop (x, 0xfff) != 65504
+      || or_loop (x, 0) != 0
+      || or_loop (x, 11) != 0xfe
+      || or_loop (x, 0x80) != 0x7ffe
+      || or_loop (x, 0xb4) != 0x7ffe
+      || or_loop (x, 0xb5) != 0xfffe
+      || eor_loop (x, 0) != 0
+      || eor_loop (x, 11) != 0xe8
+      || eor_loop (x, 0x100) != 0xcf00
+      || eor_loop (x, 0xfff) != 0xa000)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    x[i] = ~x[i];
+
+  if (min_loop (x, 0) != 65535
+      || min_loop (x, 11) != 65403
+      || min_loop (x, 0x100) != 255
+      || min_loop (x, 0xfff) != 31
+      || and_loop (x, 0) != 0xffff
+      || and_loop (x, 11) != 0xff01
+      || and_loop (x, 0x80) != 0x8001
+      || and_loop (x, 0xb4) != 0x8001
+      || and_loop (x, 0xb5) != 1)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
new file mode 100644
index 00000000000..c17e125a763
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
@@ -0,0 +1,66 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-4.c"
+
+#define N 0x1100
+
+int
+main (void)
+{
+  unsigned short x[N];
+  for (int i = 0; i < N; ++i)
+    x[i] = (i + 1) * (i + 2);
+
+  if (add_loop (x, 0, 10) != 10
+      || add_loop (x, 11, 42) != 614
+      || add_loop (x, 0x100, 84) != 22100
+      || add_loop (x, 0xfff, 20) != 20500
+      || max_loop (x, 0, 10) != 10
+      || max_loop (x, 11, 131) != 132
+      || max_loop (x, 11, 133) != 133
+      || max_loop (x, 0x100, 65279) != 65280
+      || max_loop (x, 0x100, 65281) != 65281
+      || max_loop (x, 0xfff, 65503) != 65504
+      || max_loop (x, 0xfff, 65505) != 65505
+      || or_loop (x, 0, 0x71) != 0x71
+      || or_loop (x, 11, 0) != 0xfe
+      || or_loop (x, 11, 0xb3c) != 0xbfe
+      || or_loop (x, 0x80, 0) != 0x7ffe
+      || or_loop (x, 0x80, 1) != 0x7fff
+      || or_loop (x, 0xb4, 0) != 0x7ffe
+      || or_loop (x, 0xb4, 1) != 0x7fff
+      || or_loop (x, 0xb5, 0) != 0xfffe
+      || or_loop (x, 0xb5, 1) != 0xffff
+      || eor_loop (x, 0, 0x3e) != 0x3e
+      || eor_loop (x, 11, 0) != 0xe8
+      || eor_loop (x, 11, 0x1ff) != 0x117
+      || eor_loop (x, 0x100, 0) != 0xcf00
+      || eor_loop (x, 0x100, 0xeee) != 0xc1ee
+      || eor_loop (x, 0xfff, 0) != 0xa000
+      || eor_loop (x, 0xfff, 0x8888) != 0x2888)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    x[i] = ~x[i];
+
+  if (min_loop (x, 0, 10000) != 10000
+      || min_loop (x, 11, 65404) != 65403
+      || min_loop (x, 11, 65402) != 65402
+      || min_loop (x, 0x100, 256) != 255
+      || min_loop (x, 0x100, 254) != 254
+      || min_loop (x, 0xfff, 32) != 31
+      || min_loop (x, 0xfff, 30) != 30
+      || and_loop (x, 0, 0x1234) != 0x1234
+      || and_loop (x, 11, 0xffff) != 0xff01
+      || and_loop (x, 11, 0xcdef) != 0xcd01
+      || and_loop (x, 0x80, 0xffff) != 0x8001
+      || and_loop (x, 0x80, 0xfffe) != 0x8000
+      || and_loop (x, 0xb4, 0xffff) != 0x8001
+      || and_loop (x, 0xb4, 0xfffe) != 0x8000
+      || and_loop (x, 0xb5, 0xffff) != 1
+      || and_loop (x, 0xb5, 0xfffe) != 0)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 19589fa9638..532c17c4065 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -71,6 +71,8 @@ foreach op $AUTOVEC_TEST_OPTS {
     "" "$op"
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/ternop/*.\[cS\]]] \
     "" "$op"
+  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/reduc/*.\[cS\]]] \
+    "" "$op"
 }

 # widening operation only test on LMUL < 8
-- 
2.36.3



* RE: Re: [PATCH] RISC-V: Support non-SLP unordered reduction
  2023-07-14 12:51   ` 钟居哲
@ 2023-07-16  2:18     ` Li, Pan2
  0 siblings, 0 replies; 7+ messages in thread
From: Li, Pan2 @ 2023-07-16  2:18 UTC (permalink / raw)
  To: 钟居哲, kito.cheng
  Cc: gcc-patches, kito.cheng, palmer, rdapp.gcc, Jeff Law

I filed a separate patch targeting GCC 13 for this bug; the rvv.exp and riscv.exp tests pass with it. Unfortunately, the bug is not easy to reproduce with the intrinsic API.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624574.html

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of 钟居哲
Sent: Friday, July 14, 2023 8:51 PM
To: kito.cheng <kito.cheng@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng <kito.cheng@sifive.com>; palmer <palmer@rivosinc.com>; rdapp.gcc <rdapp.gcc@gmail.com>; Jeff Law <jeffreyalaw@gmail.com>
Subject: Re: Re: [PATCH] RISC-V: Support non-SLP unordered reduction

So to be safe, I think it should be backported to GCC 13, even though I don't have an intrinsic testcase that reproduces it.



juzhe.zhong@rivai.ai
 
From: Kito Cheng
Date: 2023-07-14 20:38
To: 钟居哲
CC: GCC Patches; Kito Cheng; Palmer Dabbelt; Robin Dapp; Jeff Law
Subject: Re: [PATCH] RISC-V: Support non-SLP unordered reduction


On Fri, Jul 14, 2023 at 20:31, <juzhe.zhong@rivai.ai> wrote:
From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

This patch adds reduc_*_scal patterns to support reduction auto-vectorization.

Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.

Consider the following case:
int __attribute__((noipa))
and_loop (int32_t * __restrict x, 
int32_t n, int res)
{
  for (int i = 0; i < n; ++i)
    res &= x[i];
  return res;
}

ASM:
and_loop:
        ble     a1,zero,.L4
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.v.i v1,-1
.L3:
        vsetvli a5,a1,e32,m1,tu,ma       ------------> MUST BE "TU".
        slli    a4,a5,2
        sub     a1,a1,a5
        vle32.v v2,0(a0)
        add     a0,a0,a4
        vand.vv v1,v2,v1
        bne     a1,zero,.L3
        vsetivli        zero,1,e32,m1,ta,ma
        vmv.v.i v2,-1
        vsetvli a3,zero,e32,m1,ta,ma
        vredand.vs      v1,v1,v2
        vmv.x.s a5,v1
        and     a0,a2,a5
        ret
.L4:
        mv      a0,a2
        ret

Fix a bug in the VSETVL pass that is exposed by the reduction testcases.


Is it a performance bug or a correctness bug? If it is a correctness bug, does it also appear in GCC 13?


SLP reduction and floating-point in-order reduction are not supported yet.

gcc/ChangeLog:

        * config/riscv/autovec.md (reduc_plus_scal_<mode>): New pattern.
        (reduc_smax_scal_<mode>): Ditto.
        (reduc_umax_scal_<mode>): Ditto.
        (reduc_smin_scal_<mode>): Ditto.
        (reduc_umin_scal_<mode>): Ditto.
        (reduc_and_scal_<mode>): Ditto.
        (reduc_ior_scal_<mode>): Ditto.
        (reduc_xor_scal_<mode>): Ditto.
        * config/riscv/riscv-protos.h (enum insn_type): New enum.
        (emit_nonvlmax_integer_move_insn): Add reduction.
        (expand_reduction): New function.
        * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
        (emit_vlmax_fp_reduction_insn): Ditto.
        (get_m1_mode): Ditto.
        (expand_cond_len_binop): Fix name.
        (expand_reduction): New function.
        * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix bug.
        (change_insn): Ditto.
        (change_vsetvl_insn): Ditto.
        (pass_vsetvl::backward_demand_fusion): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
        * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.

---
 gcc/config/riscv/autovec.md                   | 138 ++++++++++++++++++
 gcc/config/riscv/riscv-protos.h               |   3 +
 gcc/config/riscv/riscv-v.cc                   |  84 ++++++++++-
 gcc/config/riscv/riscv-vsetvl.cc              |  28 +++-
 .../riscv/rvv/autovec/reduc/reduc-1.c         | 118 +++++++++++++++
 .../riscv/rvv/autovec/reduc/reduc-2.c         | 129 ++++++++++++++++
 .../riscv/rvv/autovec/reduc/reduc-3.c         |  65 +++++++++
 .../riscv/rvv/autovec/reduc/reduc-4.c         |  59 ++++++++
 .../riscv/rvv/autovec/reduc/reduc_run-1.c     |  56 +++++++
 .../riscv/rvv/autovec/reduc/reduc_run-2.c     |  79 ++++++++++
 .../riscv/rvv/autovec/reduc/reduc_run-3.c     |  49 +++++++
 .../riscv/rvv/autovec/reduc/reduc_run-4.c     |  66 +++++++++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
 13 files changed, 868 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0476b1dea45..a74f66f41ac 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1531,3 +1531,141 @@
   riscv_vector::expand_cond_len_binop (<CODE>, operands);
   DONE;
 })
+
+;; =========================================================================
+;; == Reductions
+;; =========================================================================
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Tree reductions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vredsum.vs
+;; - vredmaxu.vs
+;; - vredmax.vs
+;; - vredminu.vs
+;; - vredmin.vs
+;; - vredand.vs
+;; - vredor.vs
+;; - vredxor.vs
+;; -------------------------------------------------------------------------
+
+(define_expand "reduc_plus_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx min = immed_wide_int_const (wi::min_value (prec, SIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (SMAX, operands, min);
+  DONE;
+})
+
+(define_expand "reduc_umax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (UMAX, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx max = immed_wide_int_const (wi::max_value (prec, SIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (SMIN, operands, max);
+  DONE;
+})
+
+(define_expand "reduc_umin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  int prec = GET_MODE_PRECISION (<VEL>mode);
+  rtx max = immed_wide_int_const (wi::max_value (prec, UNSIGNED), <VEL>mode);
+  riscv_vector::expand_reduction (UMIN, operands, max);
+  DONE;
+})
+
+(define_expand "reduc_and_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (AND, operands, CONSTM1_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_ior_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (IOR, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_xor_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (XOR, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] Tree reductions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfredusum.vs
+;; - vfredmax.vs
+;; - vfredmin.vs
+;; -------------------------------------------------------------------------
+
+(define_expand "reduc_plus_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_reduction (PLUS, operands, CONST0_RTX (<VEL>mode));
+  DONE;
+})
+
+(define_expand "reduc_smax_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  REAL_VALUE_TYPE rv;
+  real_inf (&rv, true);
+  rtx f = const_double_from_real_value (rv, <VEL>mode);
+  riscv_vector::expand_reduction (SMAX, operands, f);
+  DONE;
+})
+
+(define_expand "reduc_smin_scal_<mode>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:VF 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  REAL_VALUE_TYPE rv;
+  real_inf (&rv, false);
+  rtx f = const_double_from_real_value (rv, <VEL>mode);
+  riscv_vector::expand_reduction (SMIN, operands, f);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 1a622c58f4b..f19b7fc2b8d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -197,6 +197,7 @@ enum insn_type
   RVV_COMPRESS_OP = 4,
   RVV_GATHER_M_OP = 5,
   RVV_SCATTER_M_OP = 4,
+  RVV_REDUCTION_OP = 3,
 };
 enum vlmul_type
 {
@@ -247,6 +248,7 @@ void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
 void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
 void emit_scalar_move_insn (unsigned, rtx *);
 void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
+//void emit_vlmax_reduction_insn (unsigned, rtx *);
 enum vlmul_type get_vlmul (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
@@ -280,6 +282,7 @@ bool has_vi_variant_p (rtx_code, rtx);
 void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
 bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
 void expand_cond_len_binop (rtx_code, rtx *);
+void expand_reduction (rtx_code, rtx *, rtx);
 #endif
 bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
                          bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 90da63889bd..ccf0f6ff852 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1137,6 +1137,43 @@ emit_vlmax_compress_insn (unsigned icode, rtx *ops)
   e.emit_insn ((enum insn_code) icode, ops);
 }

+/* Emit reduction instruction.  */
+static void
+emit_vlmax_reduction_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
+                                         /* HAS_DEST_P */ true,
+                                         /* FULLY_UNMASKED_P */ true,
+                                         /* USE_REAL_MERGE_P */ false,
+                                         /* HAS_AVL_P */ true,
+                                         /* VLMAX_P */ true, dest_mode,
+                                         mask_mode);
+
+  e.set_policy (TAIL_ANY);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
+/* Emit floating-point reduction instruction.  */
+static void
+emit_vlmax_fp_reduction_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
+  insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
+                                         /* HAS_DEST_P */ true,
+                                         /* FULLY_UNMASKED_P */ true,
+                                         /* USE_REAL_MERGE_P */ false,
+                                         /* HAS_AVL_P */ true,
+                                         /* VLMAX_P */ true, dest_mode,
+                                         mask_mode);
+
+  e.set_policy (TAIL_ANY);
+  e.set_rounding_mode (FRM_DYN);
+  e.emit_insn ((enum insn_code) icode, ops);
+}
+
 /* Emit merge instruction.  */

 static machine_mode
@@ -1629,6 +1666,17 @@ get_mask_mode (machine_mode mode)
   return get_vector_mode (BImode, GET_MODE_NUNITS (mode));
 }

+/* Return the appropriate M1 mode for MODE.  */
+
+static opt_machine_mode
+get_m1_mode (machine_mode mode)
+{
+  scalar_mode smode = GET_MODE_INNER (mode);
+  unsigned int bytes = GET_MODE_SIZE (smode);
+  poly_uint64 m1_nunits = exact_div (BYTES_PER_RISCV_VECTOR, bytes);
+  return get_vector_mode (smode, m1_nunits);
+}
+
 /* Return the RVV vector mode that has NUNITS elements of mode INNER_MODE.
    This function is not only used by builtins, but also will be used by
    auto-vectorization in the future.  */
@@ -3099,9 +3147,9 @@ expand_cond_len_binop (rtx_code code, rtx *ops)
       rtx ops[] = {dest, mask, merge, src1, src2};
       insn_code icode = code_for_pred (code, mode);
       if (needs_fp_rounding (code, mode))
-       emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_MU, ops, len);
+       emit_nonvlmax_fp_tu_insn (icode, RVV_BINOP_TU, ops, len);
       else
-       emit_nonvlmax_tu_insn (icode, RVV_BINOP_MU, ops, len);
+       emit_nonvlmax_tu_insn (icode, RVV_BINOP_TU, ops, len);
     }
   else
     /* FIXME: Enable this case when we support it in the middle-end.  */
@@ -3267,4 +3315,36 @@ expand_gather_scatter (rtx *ops, bool is_load)
     }
 }

+/* Expand reduction operations.  */
+void
+expand_reduction (rtx_code code, rtx *ops, rtx init)
+{
+  machine_mode vmode = GET_MODE (ops[1]);
+  machine_mode m1_mode = get_m1_mode (vmode).require ();
+  machine_mode m1_mmode = get_mask_mode (m1_mode).require ();
+
+  rtx m1_tmp = gen_reg_rtx (m1_mode);
+  rtx m1_mask = gen_scalar_move_mask (m1_mmode);
+  rtx m1_undef = RVV_VUNDEF (m1_mode);
+  rtx scalar_move_ops[] = {m1_tmp, m1_mask, m1_undef, init};
+  emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops);
+
+  rtx m1_tmp2 = gen_reg_rtx (m1_mode);
+  rtx reduc_ops[] = {m1_tmp2, ops[1], m1_tmp};
+
+  if (FLOAT_MODE_P (vmode) && code == PLUS)
+    {
+      insn_code icode
+       = code_for_pred_reduc_plus (UNSPEC_UNORDERED, vmode, m1_mode);
+      emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
+    }
+  else
+    {
+      insn_code icode = code_for_pred_reduc (code, vmode, m1_mode);
+      emit_vlmax_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
+    }
+
+  emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
+}
+
 } // namespace riscv_vector
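
As a rough scalar model of what the riscv-v.cc changes above arrange for a bitwise AND reduction (a sketch under stated assumptions, not the generated code): the loop body updates only the first VL lanes of the accumulator and leaves the tail undisturbed, and the final step folds all lanes together with the scalar start value. The lane count VLMAX = 4 and the name and_reduce_model are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define VLMAX 4	/* Hypothetical number of lanes per vector register.  */

/* Scalar model of a length-controlled AND reduction.  */
int32_t
and_reduce_model (const int32_t *x, size_t n, int32_t res)
{
  int32_t acc[VLMAX];
  for (size_t lane = 0; lane < VLMAX; ++lane)
    acc[lane] = -1;		/* All ones: identity element for AND.  */

  while (n > 0)
    {
      size_t vl = n < VLMAX ? n : VLMAX;
      /* Only lanes [0, vl) are written.  Lanes [vl, VLMAX) keep their
	 previous contents, which models the tail-undisturbed policy.  */
      for (size_t lane = 0; lane < vl; ++lane)
	acc[lane] &= x[lane];
      x += vl;
      n -= vl;
    }

  /* Final reduction over all lanes, seeded with the identity, then
     combined with the scalar start value.  */
  int32_t folded = -1;
  for (size_t lane = 0; lane < VLMAX; ++lane)
    folded &= acc[lane];
  return res & folded;
}
```

If the tail lanes were overwritten with arbitrary values in the last, partial iteration, the final fold over all VLMAX lanes would pick up garbage; that is the reason the model keeps them intact.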
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 586dc8e5379..97a9dad8a77 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -646,7 +646,8 @@ gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
 }

 static rtx
-gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
+gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
+               rtx vl = NULL_RTX)
 {
   rtx new_pat;
   vl_vtype_info new_info = info;
@@ -657,7 +658,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
   if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
     {
       rtx dest = get_vl (rinsn);
-      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, dest);
+      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl ? vl : dest);
     }
   else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
     new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
@@ -818,7 +819,8 @@ change_insn (rtx_insn *rinsn, rtx new_pat)
       print_rtl_single (dump_file, PATTERN (rinsn));
     }

-  validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  bool change_p = validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
+  gcc_assert (change_p);

   if (dump_file)
     {
@@ -931,7 +933,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
 }

 static void
-change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
+change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info,
+                   rtx vl = NULL_RTX)
 {
   rtx_insn *rinsn;
   if (vector_config_insn_p (insn->rtl ()))
@@ -945,7 +948,7 @@ change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
       rinsn = PREV_INSN (insn->rtl ());
       gcc_assert (vector_config_insn_p (rinsn));
     }
-  rtx new_pat = gen_vsetvl_pat (rinsn, info);
+  rtx new_pat = gen_vsetvl_pat (rinsn, info, vl);
   change_insn (rinsn, new_pat);
 }

@@ -3377,7 +3380,20 @@ pass_vsetvl::backward_demand_fusion (void)
                                       new_info))
                continue;

-             change_vsetvl_insn (new_info.get_insn (), new_info);
+             rtx vl = NULL_RTX;
+             /* Backward propagation of a VLMAX VL:
+                  bb 3:
+                    vsetivli zero, 1 ... -> vsetvli t1, zero
+                    vmv.s.x
+                  bb 5:
+                    vsetvli t1, zero ... -> to be elided.
+                    vlse16.v
+
+                  We should reuse VL register "t1" in the rewritten vsetvli.  */
+             if (!block_info.reaching_out.has_avl_reg ()
+                 && vlmax_avl_p (new_info.get_avl ()))
+               vl = get_vl (prop.get_insn ()->rtl ());
+             change_vsetvl_insn (new_info.get_insn (), new_info, vl);
              if (block_info.local_dem == block_info.reaching_out)
                block_info.local_dem = new_info;
              block_info.reaching_out = new_info;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
new file mode 100644
index 00000000000..0d543af13ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
@@ -0,0 +1,118 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+#define DEF_REDUC_PLUS(TYPE)                   \
+TYPE __attribute__ ((noinline, noclone))       \
+reduc_plus_##TYPE (TYPE *a, int n)             \
+{                                              \
+  TYPE r = 0;                                  \
+  for (int i = 0; i < n; ++i)                  \
+    r += a[i];                                 \
+  return r;                                    \
+}
+
+#define TEST_PLUS(T)                           \
+  T (int8_t)                                   \
+  T (int16_t)                                  \
+  T (int32_t)                                  \
+  T (int64_t)                                  \
+  T (uint8_t)                                  \
+  T (uint16_t)                                 \
+  T (uint32_t)                                 \
+  T (uint64_t)                                 \
+  T (_Float16)                                 \
+  T (float)                                    \
+  T (double)
+
+TEST_PLUS (DEF_REDUC_PLUS)
+
+#define DEF_REDUC_MAXMIN(TYPE, NAME, CMP_OP)   \
+TYPE __attribute__ ((noinline, noclone))       \
+reduc_##NAME##_##TYPE (TYPE *a, int n)         \
+{                                              \
+  TYPE r = 13;                                 \
+  for (int i = 0; i < n; ++i)                  \
+    r = a[i] CMP_OP r ? a[i] : r;              \
+  return r;                                    \
+}
+
+#define TEST_MAXMIN(T)                         \
+  T (int8_t, max, >)                           \
+  T (int16_t, max, >)                          \
+  T (int32_t, max, >)                          \
+  T (int64_t, max, >)                          \
+  T (uint8_t, max, >)                          \
+  T (uint16_t, max, >)                         \
+  T (uint32_t, max, >)                         \
+  T (uint64_t, max, >)                         \
+  T (_Float16, max, >)                         \
+  T (float, max, >)                            \
+  T (double, max, >)                           \
+                                               \
+  T (int8_t, min, <)                           \
+  T (int16_t, min, <)                          \
+  T (int32_t, min, <)                          \
+  T (int64_t, min, <)                          \
+  T (uint8_t, min, <)                          \
+  T (uint16_t, min, <)                         \
+  T (uint32_t, min, <)                         \
+  T (uint64_t, min, <)                         \
+  T (_Float16, min, <)                         \
+  T (float, min, <)                            \
+  T (double, min, <)
+
+TEST_MAXMIN (DEF_REDUC_MAXMIN)
+
+#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP)  \
+TYPE __attribute__ ((noinline, noclone))       \
+reduc_##NAME##_##TYPE (TYPE *a, int n)         \
+{                                              \
+  TYPE r = 13;                                 \
+  for (int i = 0; i < n; ++i)                  \
+    r BIT_OP a[i];                             \
+  return r;                                    \
+}
+
+#define TEST_BITWISE(T)                                \
+  T (int8_t, and, &=)                          \
+  T (int16_t, and, &=)                         \
+  T (int32_t, and, &=)                         \
+  T (int64_t, and, &=)                         \
+  T (uint8_t, and, &=)                         \
+  T (uint16_t, and, &=)                                \
+  T (uint32_t, and, &=)                                \
+  T (uint64_t, and, &=)                                \
+                                               \
+  T (int8_t, ior, |=)                          \
+  T (int16_t, ior, |=)                         \
+  T (int32_t, ior, |=)                         \
+  T (int64_t, ior, |=)                         \
+  T (uint8_t, ior, |=)                         \
+  T (uint16_t, ior, |=)                                \
+  T (uint32_t, ior, |=)                                \
+  T (uint64_t, ior, |=)                                \
+                                               \
+  T (int8_t, xor, ^=)                          \
+  T (int16_t, xor, ^=)                         \
+  T (int32_t, xor, ^=)                         \
+  T (int64_t, xor, ^=)                         \
+  T (uint8_t, xor, ^=)                         \
+  T (uint16_t, xor, ^=)                                \
+  T (uint32_t, xor, ^=)                                \
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
new file mode 100644
index 00000000000..136a8a378bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
@@ -0,0 +1,129 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+#define NUM_ELEMS(TYPE) (1024 / sizeof (TYPE))
+
+#define DEF_REDUC_PLUS(TYPE)                                   \
+void __attribute__ ((noinline, noclone))                       \
+reduc_plus_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],       \
+                  TYPE *restrict r, int n)                     \
+{                                                              \
+  for (int i = 0; i < n; i++)                                  \
+    {                                                          \
+      r[i] = 0;                                                        \
+      for (int j = 0; j < NUM_ELEMS (TYPE); j++)               \
+        r[i] += a[i][j];                                       \
+    }                                                          \
+}
+
+#define TEST_PLUS(T)                           \
+  T (int8_t)                                   \
+  T (int16_t)                                  \
+  T (int32_t)                                  \
+  T (int64_t)                                  \
+  T (uint8_t)                                  \
+  T (uint16_t)                                 \
+  T (uint32_t)                                 \
+  T (uint64_t)                                 \
+  T (_Float16)                                 \
+  T (float)                                    \
+  T (double)
+
+TEST_PLUS (DEF_REDUC_PLUS)
+
+#define DEF_REDUC_MAXMIN(TYPE, NAME, CMP_OP)                   \
+void __attribute__ ((noinline, noclone))                       \
+reduc_##NAME##_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)],   \
+                      TYPE *restrict r, int n)                 \
+{                                                              \
+  for (int i = 0; i < n; i++)                                  \
+    {                                                          \
+      r[i] = a[i][0];                                          \
+      for (int j = 0; j < NUM_ELEMS (TYPE); j++)               \
+        r[i] = a[i][j] CMP_OP r[i] ? a[i][j] : r[i];           \
+    }                                                          \
+}
+
+#define TEST_MAXMIN(T)                         \
+  T (int8_t, max, >)                           \
+  T (int16_t, max, >)                          \
+  T (int32_t, max, >)                          \
+  T (int64_t, max, >)                          \
+  T (uint8_t, max, >)                          \
+  T (uint16_t, max, >)                         \
+  T (uint32_t, max, >)                         \
+  T (uint64_t, max, >)                         \
+  T (_Float16, max, >)                         \
+  T (float, max, >)                            \
+  T (double, max, >)                           \
+                                               \
+  T (int8_t, min, <)                           \
+  T (int16_t, min, <)                          \
+  T (int32_t, min, <)                          \
+  T (int64_t, min, <)                          \
+  T (uint8_t, min, <)                          \
+  T (uint16_t, min, <)                         \
+  T (uint32_t, min, <)                         \
+  T (uint64_t, min, <)                         \
+  T (_Float16, min, <)                         \
+  T (float, min, <)                            \
+  T (double, min, <)
+
+TEST_MAXMIN (DEF_REDUC_MAXMIN)
+
+#define DEF_REDUC_BITWISE(TYPE,NAME,BIT_OP)                    \
+void __attribute__ ((noinline, noclone))                       \
+reduc_##NAME##_##TYPE (TYPE (*restrict a)[NUM_ELEMS(TYPE)],    \
+                      TYPE *restrict r, int n)                 \
+{                                                              \
+  for (int i = 0; i < n; i++)                                  \
+    {                                                          \
+      r[i] = a[i][0];                                          \
+      for (int j = 0; j < NUM_ELEMS(TYPE); j++)                        \
+        r[i] BIT_OP a[i][j];                                   \
+    }                                                          \
+}
+
+#define TEST_BITWISE(T)                                \
+  T (int8_t, and, &=)                          \
+  T (int16_t, and, &=)                         \
+  T (int32_t, and, &=)                         \
+  T (int64_t, and, &=)                         \
+  T (uint8_t, and, &=)                         \
+  T (uint16_t, and, &=)                                \
+  T (uint32_t, and, &=)                                \
+  T (uint64_t, and, &=)                                \
+                                               \
+  T (int8_t, ior, |=)                          \
+  T (int16_t, ior, |=)                         \
+  T (int32_t, ior, |=)                         \
+  T (int64_t, ior, |=)                         \
+  T (uint8_t, ior, |=)                         \
+  T (uint16_t, ior, |=)                                \
+  T (uint32_t, ior, |=)                                \
+  T (uint64_t, ior, |=)                                \
+                                               \
+  T (int8_t, xor, ^=)                          \
+  T (int16_t, xor, ^=)                         \
+  T (int32_t, xor, ^=)                         \
+  T (int64_t, xor, ^=)                         \
+  T (uint8_t, xor, ^=)                         \
+  T (uint16_t, xor, ^=)                                \
+  T (uint32_t, xor, ^=)                                \
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 8 } } */
+/* { dg-final { scan-assembler-times {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
new file mode 100644
index 00000000000..c3638344f80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+unsigned short __attribute__((noipa))
+add_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res += x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+min_loop (unsigned short *x, int n)
+{
+  unsigned short res = ~0;
+  for (int i = 0; i < n; ++i)
+    res = res < x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+max_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res = res > x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+and_loop (unsigned short *x, int n)
+{
+  unsigned short res = ~0;
+  for (int i = 0; i < n; ++i)
+    res &= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+or_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res |= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+eor_loop (unsigned short *x, int n)
+{
+  unsigned short res = 0;
+  for (int i = 0; i < n; ++i)
+    res ^= x[i];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
new file mode 100644
index 00000000000..f00a12826c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+unsigned short __attribute__((noipa))
+add_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res += x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+min_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res = res < x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+max_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res = res > x[i] ? res : x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+and_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res &= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+or_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res |= x[i];
+  return res;
+}
+
+unsigned short __attribute__((noipa))
+eor_loop (unsigned short *x, int n, unsigned short res)
+{
+  for (int i = 0; i < n; ++i)
+    res ^= x[i];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {vredsum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredmaxu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredminu\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredand\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vredxor\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
new file mode 100644
index 00000000000..b500f857598
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
@@ -0,0 +1,56 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-1.c"
+
+#define NUM_ELEMS(TYPE) (73 + sizeof (TYPE))
+
+#define INIT_VECTOR(TYPE)                              \
+  TYPE a[NUM_ELEMS (TYPE) + 1];                                \
+  for (int i = 0; i < NUM_ELEMS (TYPE) + 1; i++)       \
+    {                                                  \
+      a[i] = ((i * 2) * (i & 1 ? 1 : -1) | 3);         \
+      asm volatile ("" ::: "memory");                  \
+    }
+
+#define TEST_REDUC_PLUS(TYPE)                          \
+  {                                                    \
+    INIT_VECTOR (TYPE);                                        \
+    TYPE r1 = reduc_plus_##TYPE (a, NUM_ELEMS (TYPE)); \
+    volatile TYPE r2 = 0;                              \
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)         \
+      r2 += a[i];                                      \
+    if (r1 != r2)                                      \
+      __builtin_abort ();                              \
+  }
+
+#define TEST_REDUC_MAXMIN(TYPE, NAME, CMP_OP)                  \
+  {                                                            \
+    INIT_VECTOR (TYPE);                                                \
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));     \
+    volatile TYPE r2 = 13;                                     \
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)                 \
+      r2 = a[i] CMP_OP r2 ? a[i] : r2;                         \
+    if (r1 != r2)                                              \
+      __builtin_abort ();                                      \
+  }
+
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)                 \
+  {                                                            \
+    INIT_VECTOR (TYPE);                                                \
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));     \
+    volatile TYPE r2 = 13;                                     \
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)                 \
+      r2 BIT_OP a[i];                                          \
+    if (r1 != r2)                                              \
+      __builtin_abort ();                                      \
+  }
+
+int main ()
+{
+  TEST_PLUS (TEST_REDUC_PLUS)
+  TEST_MAXMIN (TEST_REDUC_MAXMIN)
+  TEST_BITWISE (TEST_REDUC_BITWISE)
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
new file mode 100644
index 00000000000..3c2f62557b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
@@ -0,0 +1,79 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "reduc-2.c"
+
+#define NROWS 53
+
+/* -ffast-math fuzz for PLUS.  */
+#define CMP__Float16(X, Y) ((X) >= (Y) * 0.875 && (X) <= (Y) * 1.125)
+#define CMP_float(X, Y) ((X) == (Y))
+#define CMP_double(X, Y) ((X) == (Y))
+#define CMP_int8_t(X, Y) ((X) == (Y))
+#define CMP_int16_t(X, Y) ((X) == (Y))
+#define CMP_int32_t(X, Y) ((X) == (Y))
+#define CMP_int64_t(X, Y) ((X) == (Y))
+#define CMP_uint8_t(X, Y) ((X) == (Y))
+#define CMP_uint16_t(X, Y) ((X) == (Y))
+#define CMP_uint32_t(X, Y) ((X) == (Y))
+#define CMP_uint64_t(X, Y) ((X) == (Y))
+
+#define INIT_MATRIX(TYPE)                              \
+  TYPE mat[NROWS][NUM_ELEMS (TYPE)];                   \
+  TYPE r[NROWS];                                       \
+  for (int i = 0; i < NROWS; i++)                      \
+    for (int j = 0; j < NUM_ELEMS (TYPE); j++)         \
+      {                                                        \
+       mat[i][j] = i + (j * 2) * (j & 1 ? 1 : -1);     \
+       asm volatile ("" ::: "memory");                 \
+      }
+
+#define TEST_REDUC_PLUS(TYPE)                          \
+  {                                                    \
+    INIT_MATRIX (TYPE);                                        \
+    reduc_plus_##TYPE (mat, r, NROWS);                 \
+    for (int i = 0; i < NROWS; i++)                    \
+      {                                                        \
+       volatile TYPE r2 = 0;                           \
+       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
+         r2 += mat[i][j];                              \
+       if (!CMP_##TYPE (r[i], r2))                     \
+         __builtin_abort ();                           \
+      }                                                        \
+    }
+
+#define TEST_REDUC_MAXMIN(TYPE, NAME, CMP_OP)          \
+  {                                                    \
+    INIT_MATRIX (TYPE);                                        \
+    reduc_##NAME##_##TYPE (mat, r, NROWS);             \
+    for (int i = 0; i < NROWS; i++)                    \
+      {                                                        \
+       volatile TYPE r2 = mat[i][0];                   \
+       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
+         r2 = mat[i][j] CMP_OP r2 ? mat[i][j] : r2;    \
+       if (r[i] != r2)                                 \
+         __builtin_abort ();                           \
+      }                                                        \
+    }
+
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)         \
+  {                                                    \
+    INIT_MATRIX (TYPE);                                        \
+    reduc_##NAME##_##TYPE (mat, r, NROWS);             \
+    for (int i = 0; i < NROWS; i++)                    \
+      {                                                        \
+       volatile TYPE r2 = mat[i][0];                   \
+       for (int j = 0; j < NUM_ELEMS (TYPE); ++j)      \
+         r2 BIT_OP mat[i][j];                          \
+       if (r[i] != r2)                                 \
+         __builtin_abort ();                           \
+      }                                                        \
+    }
+
+int main ()
+{
+  TEST_PLUS (TEST_REDUC_PLUS)
+  TEST_MAXMIN (TEST_REDUC_MAXMIN)
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
new file mode 100644
index 00000000000..d1b22c0d69a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
@@ -0,0 +1,49 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-3.c"
+
+#define N 0x1100
+
+int
+main (void)
+{
+  unsigned short x[N];
+  for (int i = 0; i < N; ++i)
+    x[i] = (i + 1) * (i + 2);
+
+  if (add_loop (x, 0) != 0
+      || add_loop (x, 11) != 572
+      || add_loop (x, 0x100) != 22016
+      || add_loop (x, 0xfff) != 20480
+      || max_loop (x, 0) != 0
+      || max_loop (x, 11) != 132
+      || max_loop (x, 0x100) != 65280
+      || max_loop (x, 0xfff) != 65504
+      || or_loop (x, 0) != 0
+      || or_loop (x, 11) != 0xfe
+      || or_loop (x, 0x80) != 0x7ffe
+      || or_loop (x, 0xb4) != 0x7ffe
+      || or_loop (x, 0xb5) != 0xfffe
+      || eor_loop (x, 0) != 0
+      || eor_loop (x, 11) != 0xe8
+      || eor_loop (x, 0x100) != 0xcf00
+      || eor_loop (x, 0xfff) != 0xa000)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    x[i] = ~x[i];
+
+  if (min_loop (x, 0) != 65535
+      || min_loop (x, 11) != 65403
+      || min_loop (x, 0x100) != 255
+      || min_loop (x, 0xfff) != 31
+      || and_loop (x, 0) != 0xffff
+      || and_loop (x, 11) != 0xff01
+      || and_loop (x, 0x80) != 0x8001
+      || and_loop (x, 0xb4) != 0x8001
+      || and_loop (x, 0xb5) != 1)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
new file mode 100644
index 00000000000..c17e125a763
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
@@ -0,0 +1,66 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math -fno-vect-cost-model" } */
+
+#include "reduc-4.c"
+
+#define N 0x1100
+
+int
+main (void)
+{
+  unsigned short x[N];
+  for (int i = 0; i < N; ++i)
+    x[i] = (i + 1) * (i + 2);
+
+  if (add_loop (x, 0, 10) != 10
+      || add_loop (x, 11, 42) != 614
+      || add_loop (x, 0x100, 84) != 22100
+      || add_loop (x, 0xfff, 20) != 20500
+      || max_loop (x, 0, 10) != 10
+      || max_loop (x, 11, 131) != 132
+      || max_loop (x, 11, 133) != 133
+      || max_loop (x, 0x100, 65279) != 65280
+      || max_loop (x, 0x100, 65281) != 65281
+      || max_loop (x, 0xfff, 65503) != 65504
+      || max_loop (x, 0xfff, 65505) != 65505
+      || or_loop (x, 0, 0x71) != 0x71
+      || or_loop (x, 11, 0) != 0xfe
+      || or_loop (x, 11, 0xb3c) != 0xbfe
+      || or_loop (x, 0x80, 0) != 0x7ffe
+      || or_loop (x, 0x80, 1) != 0x7fff
+      || or_loop (x, 0xb4, 0) != 0x7ffe
+      || or_loop (x, 0xb4, 1) != 0x7fff
+      || or_loop (x, 0xb5, 0) != 0xfffe
+      || or_loop (x, 0xb5, 1) != 0xffff
+      || eor_loop (x, 0, 0x3e) != 0x3e
+      || eor_loop (x, 11, 0) != 0xe8
+      || eor_loop (x, 11, 0x1ff) != 0x117
+      || eor_loop (x, 0x100, 0) != 0xcf00
+      || eor_loop (x, 0x100, 0xeee) != 0xc1ee
+      || eor_loop (x, 0xfff, 0) != 0xa000
+      || eor_loop (x, 0xfff, 0x8888) != 0x2888)
+    __builtin_abort ();
+
+  for (int i = 0; i < N; ++i)
+    x[i] = ~x[i];
+
+  if (min_loop (x, 0, 10000) != 10000
+      || min_loop (x, 11, 65404) != 65403
+      || min_loop (x, 11, 65402) != 65402
+      || min_loop (x, 0x100, 256) != 255
+      || min_loop (x, 0x100, 254) != 254
+      || min_loop (x, 0xfff, 32) != 31
+      || min_loop (x, 0xfff, 30) != 30
+      || and_loop (x, 0, 0x1234) != 0x1234
+      || and_loop (x, 11, 0xffff) != 0xff01
+      || and_loop (x, 11, 0xcdef) != 0xcd01
+      || and_loop (x, 0x80, 0xffff) != 0x8001
+      || and_loop (x, 0x80, 0xfffe) != 0x8000
+      || and_loop (x, 0xb4, 0xffff) != 0x8001
+      || and_loop (x, 0xb4, 0xfffe) != 0x8000
+      || and_loop (x, 0xb5, 0xffff) != 1
+      || and_loop (x, 0xb5, 0xfffe) != 0)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 19589fa9638..532c17c4065 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -71,6 +71,8 @@ foreach op $AUTOVEC_TEST_OPTS {
     "" "$op"
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/ternop/*.\[cS\]]] \
     "" "$op"
+  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/reduc/*.\[cS\]]] \
+    "" "$op"
 }

 # widening operation only test on LMUL < 8
-- 
2.36.3
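
A note on where the constants in reduc_run-3.c come from: the input is x[i] = (i + 1) * (i + 2) truncated to unsigned short, so the sum of the first n elements is n(n+1)(n+2)/3 modulo 65536 (for n = 11 that is 11*12*13/3 = 572). The sketch below recomputes a few of the expected values with plain scalar loops. The helper names (make_input, ref_add, ref_max, ref_or, ref_xor) are hypothetical and are not part of the patch. It also assumes the loops in reduc-3.c accumulate in unsigned short starting from the operation's identity, which that file would have to confirm.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not part of the patch.

// Same initialisation as the test: values wrap modulo 65536.
static std::vector<uint16_t> make_input (int n)
{
  std::vector<uint16_t> x (n);
  for (int i = 0; i < n; ++i)
    x[i] = static_cast<uint16_t> ((i + 1) * (i + 2));
  return x;
}

// Scalar reference for a wrapping 16-bit sum of the first n elements.
static uint16_t ref_add (const std::vector<uint16_t> &x, int n)
{
  uint16_t res = 0;
  for (int i = 0; i < n; ++i)
    res = static_cast<uint16_t> (res + x[i]);
  return res;
}

// Scalar reference for an unsigned maximum (identity 0).
static uint16_t ref_max (const std::vector<uint16_t> &x, int n)
{
  uint16_t res = 0;
  for (int i = 0; i < n; ++i)
    res = res < x[i] ? x[i] : res;
  return res;
}

// Scalar reference for a bitwise OR reduction (identity 0).
static uint16_t ref_or (const std::vector<uint16_t> &x, int n)
{
  uint16_t res = 0;
  for (int i = 0; i < n; ++i)
    res = static_cast<uint16_t> (res | x[i]);
  return res;
}

// Scalar reference for a bitwise XOR reduction (identity 0).
static uint16_t ref_xor (const std::vector<uint16_t> &x, int n)
{
  uint16_t res = 0;
  for (int i = 0; i < n; ++i)
    res = static_cast<uint16_t> (res ^ x[i]);
  return res;
}

// Cross-check against a handful of constants used by reduc_run-3.c.
static void check_expected_values ()
{
  std::vector<uint16_t> x = make_input (0x1100);
  assert (ref_add (x, 11) == 572);
  assert (ref_max (x, 11) == 132);
  assert (ref_or (x, 11) == 0xfe);
  assert (ref_xor (x, 11) == 0xe8);
}
```

Calling check_expected_values () from a small driver is enough to confirm the n = 11 cases. The larger cases wrap around 16 bits, which is why add_loop (x, 0x100) is expected to be 22016 rather than the full sum 5658112.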


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] RISC-V: Support non-SLP unordered reduction
  2023-07-14 12:30 [PATCH] RISC-V: Support non-SLP unordered reduction juzhe.zhong
  2023-07-14 12:38 ` Kito Cheng
@ 2023-07-17  7:00 ` Kito Cheng
  2023-07-17  8:22   ` juzhe.zhong
  1 sibling, 1 reply; 7+ messages in thread
From: Kito Cheng @ 2023-07-17  7:00 UTC (permalink / raw)
  To: juzhe.zhong; +Cc: gcc-patches, kito.cheng, palmer, rdapp.gcc, jeffreyalaw

> @@ -247,6 +248,7 @@ void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
>  void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
>  void emit_scalar_move_insn (unsigned, rtx *);
>  void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
> +//void emit_vlmax_reduction_insn (unsigned, rtx *);

Please drop this commented-out declaration.


> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index 586dc8e5379..97a9dad8a77 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -646,7 +646,8 @@ gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
>  }
>
>  static rtx
> -gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
> +gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
> +               rtx vl = NULL_RTX)
>  {
>    rtx new_pat;
>    vl_vtype_info new_info = info;
> @@ -657,7 +658,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
>    if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
>      {
>        rtx dest = get_vl (rinsn);

rtx dest = vl ? vl : get_vl (rinsn);

> -      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, dest);
> +      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl ? vl : dest);

and keep passing dest here, since dest would then already hold the override.

>      }
>    else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
>      new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);

Should we handle the case where vl is non-null in the else-if and else
branches? If we don't handle it, please add `assert (vl == NULL_RTX)` there.
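
To make the two suggestions above concrete, here is a minimal standalone sketch of the control flow. It is not GCC code: rtx_model, insn_kind, pat_model and gen_vsetvl_pat_model are hypothetical stand-ins, a null pointer plays the role of NULL_RTX, and the real condition (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ())) is reduced to a single enum value.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for GCC internals; a null pointer models NULL_RTX.
using rtx_model = const char *;

enum class insn_kind { vsetvl, vtype_change_only, other };

struct pat_model
{
  std::string kind; // which vsetvl pattern was generated
  rtx_model vl;     // destination VL register, or nullptr if none
};

static pat_model
gen_vsetvl_pat_model (insn_kind kind, rtx_model insn_vl,
                      rtx_model vl = nullptr)
{
  if (kind == insn_kind::vsetvl)
    {
      // Resolve the override once, so the call below just passes dest.
      rtx_model dest = vl ? vl : insn_vl;
      return {"normal", dest};
    }

  // The override is not handled on the remaining paths, so a caller
  // passing one should fail loudly instead of being silently ignored.
  assert (vl == nullptr);

  if (kind == insn_kind::vtype_change_only)
    return {"vtype_change_only", nullptr};
  return {"other", nullptr};
}
```

With this shape, gen_vsetvl_pat_model (insn_kind::vsetvl, "a5", "a3") yields a pattern whose vl is "a3", while omitting the third argument falls back to the instruction's own "a5".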

> @@ -818,7 +819,8 @@ change_insn (rtx_insn *rinsn, rtx new_pat)
>        print_rtl_single (dump_file, PATTERN (rinsn));
>      }
>
> -  validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
> +  bool change_p = validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
> +  gcc_assert (change_p);

I think we could create a wrapper around validate_change that asserts it
returns true, and use that wrapper at all the other call sites as well.

e.g. validate_change_or_fail?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Re: [PATCH] RISC-V: Support non-SLP unordered reduction
  2023-07-17  7:00 ` Kito Cheng
@ 2023-07-17  8:22   ` juzhe.zhong
  0 siblings, 0 replies; 7+ messages in thread
From: juzhe.zhong @ 2023-07-17  8:22 UTC (permalink / raw)
  To: kito.cheng; +Cc: gcc-patches, Kito.cheng, palmer, Robin Dapp, jeffreyalaw

[-- Attachment #1: Type: text/plain, Size: 2705 bytes --]

Comments addressed.

V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624638.html 

I added:

+/* Change insn and Assert the change always happens.  */
+static void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
as you suggested.
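
For readers outside GCC, the point of the wrapper can be shown with a small standalone model. Nothing below is GCC's API: validate_change_model is a hypothetical stand-in that rejects negative values, and the wrapper uses plain assert where the patch uses gcc_assert.

```cpp
#include <cassert>

// Hypothetical stand-in for validate_change: the "change" is accepted
// only when the new value is non-negative; otherwise *loc is untouched.
static bool
validate_change_model (int *loc, int new_value)
{
  if (new_value < 0)
    return false;
  *loc = new_value;
  return true;
}

// Wrapper in the spirit of validate_change_or_fail: callers that cannot
// recover from a rejected change get a hard failure instead of a
// silently ignored return value.
static void
validate_change_or_fail_model (int *loc, int new_value)
{
  bool change_p = validate_change_model (loc, new_value);
  assert (change_p);
  (void) change_p; // keeps builds with NDEBUG free of unused warnings
}
```

A call such as validate_change_or_fail_model (&pattern, 7) updates pattern to 7, whereas passing a rejected value trips the assertion at the call site rather than leaving a stale pattern behind.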

Could you take a look again?


juzhe.zhong@rivai.ai
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-07-17  8:22 UTC | newest]

Thread overview: 7+ messages
2023-07-14 12:30 [PATCH] RISC-V: Support non-SLP unordered reduction juzhe.zhong
2023-07-14 12:38 ` Kito Cheng
2023-07-14 12:47   ` 钟居哲
2023-07-14 12:51   ` 钟居哲
2023-07-16  2:18     ` Li, Pan2
2023-07-17  7:00 ` Kito Cheng
2023-07-17  8:22   ` juzhe.zhong
