* [PATCH V2] RISC-V: Support in-order floating-point reduction
@ 2023-07-20 8:51 Juzhe-Zhong
2023-07-20 8:59 ` Kito Cheng
From: Juzhe-Zhong @ 2023-07-20 8:51 UTC
To: gcc-patches; +Cc: kito.cheng, kito.cheng, jeffreyalaw, rdapp.gcc, Juzhe-Zhong
This patch depends on:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624995.html
Consider the following case:
float foo (float *__restrict a, int n)
{
float result = 1.0;
for (int i = 0; i < n; i++)
result += a[i];
return result;
}
Compiled **without** -ffast-math:
Before this patch:
<source>:4:21: missed: couldn't vectorize loop
<source>:1:7: missed: not vectorized: relevant phi not supported: result_14 = PHI <result_11(6), 1.0e+0(5)>
After this patch:
foo:
lui a5,%hi(.LC0)
flw fa0,%lo(.LC0)(a5)
ble a1,zero,.L4
.L3:
vsetvli a5,a1,e32,m1,ta,ma
vle32.v v1,0(a0)
slli a4,a5,2
sub a1,a1,a5
vfmv.s.f v2,fa0
add a0,a0,a4
vfredosum.vs v1,v1,v2 # FOLD_LEFT_PLUS
vfmv.f.s fa0,v1
bne a1,zero,.L3
ret
.L4:
ret
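
The reason the loop above needs an ordered reduction without -ffast-math can be sketched in plain C. The following is only an illustrative scalar model, not code from this patch: the function names and the LANES constant are hypothetical, and LANES merely stands in for the vector length that vsetvli would choose. It contrasts a strict left-to-right sum, a strip-mined sum that folds each chunk into the accumulator in element order (the behaviour vfredosum.vs is expected to provide), and a reassociated sum that keeps per-lane partial sums.

```c
#include <stddef.h>

/* Hypothetical chunk size; a stand-in for the vl returned by vsetvli.  */
#define LANES 2

/* Strict left-to-right sum: what the source loop means when
   floating-point reassociation is not permitted.  */
float
sum_in_order (const float *a, size_t n, float init)
{
  float acc = init;
  for (size_t i = 0; i < n; i++)
    acc += a[i];
  return acc;
}

/* Model of the strip-mined loop: each chunk of up to LANES elements is
   folded into the running scalar in element order.  The sequence of
   additions is identical to sum_in_order.  */
float
sum_strip_mined (const float *a, size_t n, float init)
{
  float acc = init;
  while (n > 0)
    {
      size_t vl = n < LANES ? n : LANES;
      for (size_t i = 0; i < vl; i++)
        acc += a[i];
      a += vl;
      n -= vl;
    }
  return acc;
}

/* Model of a reassociated reduction: per-lane partial sums are kept and
   only combined at the end.  This changes the order of the additions,
   so the rounded result may differ.  */
float
sum_reassociated (const float *a, size_t n, float init)
{
  float lane[LANES] = { 0 };
  for (size_t i = 0; i < n; i++)
    lane[i % LANES] += a[i];
  float acc = init;
  for (size_t i = 0; i < LANES; i++)
    acc += lane[i];
  return acc;
}
```

As a usage example, assuming IEEE-754 single precision with no excess precision: for the input {1e8f, 1.0f, -1e8f, 1.0f} and an initial value of 0, the first two functions agree, while the reassociated variant gives a different result because 1e8f + 1.0f rounds back to 1e8f. That difference is why the vectorizer can only use a FOLD_LEFT-style reduction here.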
gcc/ChangeLog:
* config/riscv/autovec.md (fold_left_plus_<mode>): New pattern.
(mask_len_fold_left_plus_<mode>): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add RVV_REDUCTION_TU_OP.
(enum reduction_type): New enum.
(expand_reduction): Add in-order reduction.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_reduction_insn): New function.
(expand_reduction): Add in-order reduction.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c: New test.
---
gcc/config/riscv/autovec.md | 39 ++++++++++++++
gcc/config/riscv/riscv-protos.h | 13 ++++-
gcc/config/riscv/riscv-v.cc | 53 +++++++++++++++----
.../riscv/rvv/autovec/reduc/reduc_strict-1.c | 28 ++++++++++
.../riscv/rvv/autovec/reduc/reduc_strict-2.c | 26 +++++++++
.../riscv/rvv/autovec/reduc/reduc_strict-3.c | 18 +++++++
.../riscv/rvv/autovec/reduc/reduc_strict-4.c | 24 +++++++++
.../riscv/rvv/autovec/reduc/reduc_strict-5.c | 28 ++++++++++
.../riscv/rvv/autovec/reduc/reduc_strict-6.c | 18 +++++++
.../riscv/rvv/autovec/reduc/reduc_strict-7.c | 21 ++++++++
.../rvv/autovec/reduc/reduc_strict_run-1.c | 29 ++++++++++
.../rvv/autovec/reduc/reduc_strict_run-2.c | 31 +++++++++++
12 files changed, 317 insertions(+), 11 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 00947207f3f..667a877d009 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1687,3 +1687,42 @@
riscv_vector::expand_reduction (SMIN, operands, f);
DONE;
})
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] Left-to-right reductions
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfredosum.vs
+;; -------------------------------------------------------------------------
+
+;; Unpredicated in-order FP reductions.
+(define_expand "fold_left_plus_<mode>"
+ [(match_operand:<VEL> 0 "register_operand")
+ (match_operand:<VEL> 1 "register_operand")
+ (match_operand:VF 2 "register_operand")]
+ "TARGET_VECTOR"
+{
+ riscv_vector::expand_reduction (PLUS, operands,
+ operands[1],
+ riscv_vector::reduction_type::FOLD_LEFT);
+ DONE;
+})
+
+;; Predicated in-order FP reductions.
+(define_expand "mask_len_fold_left_plus_<mode>"
+ [(match_operand:<VEL> 0 "register_operand")
+ (match_operand:<VEL> 1 "register_operand")
+ (match_operand:VF 2 "register_operand")
+ (match_operand:<VM> 3 "vector_mask_operand")
+ (match_operand 4 "autovec_length_operand")
+ (match_operand 5 "const_0_operand")]
+ "TARGET_VECTOR"
+{
+ if (rtx_equal_p (operands[4], const0_rtx))
+ emit_move_insn (operands[0], operands[1]);
+ else
+ riscv_vector::expand_reduction (PLUS, operands,
+ operands[1],
+ riscv_vector::reduction_type::MASK_LEN_FOLD_LEFT);
+ DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 16fb8dabca0..c9520f689e2 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -199,6 +199,7 @@ enum insn_type
RVV_GATHER_M_OP = 5,
RVV_SCATTER_M_OP = 4,
RVV_REDUCTION_OP = 3,
+ RVV_REDUCTION_TU_OP = RVV_REDUCTION_OP + 2,
};
enum vlmul_type
{
@@ -247,7 +248,7 @@ void emit_vlmax_merge_insn (unsigned, int, rtx *);
void emit_vlmax_cmp_insn (unsigned, rtx *);
void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
-void emit_scalar_move_insn (unsigned, rtx *);
+void emit_scalar_move_insn (unsigned, rtx *, rtx = 0);
void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
enum vlmul_type get_vlmul (machine_mode);
unsigned int get_ratio (machine_mode);
@@ -270,6 +271,13 @@ enum mask_policy
MASK_AGNOSTIC = 1,
MASK_ANY = 2,
};
+
+enum class reduction_type
+{
+ UNORDERED,
+ FOLD_LEFT,
+ MASK_LEN_FOLD_LEFT,
+};
enum tail_policy get_prefer_tail_policy ();
enum mask_policy get_prefer_mask_policy ();
rtx get_avl_type_rtx (enum avl_type);
@@ -282,7 +290,8 @@ bool has_vi_variant_p (rtx_code, rtx);
void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
void expand_cond_len_binop (rtx_code, rtx *);
-void expand_reduction (rtx_code, rtx *, rtx);
+void expand_reduction (rtx_code, rtx *, rtx,
+ reduction_type = reduction_type::UNORDERED);
#endif
bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
bool, void (*)(rtx *, rtx));
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 53088edf909..e338be151d3 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1023,11 +1023,11 @@ emit_nonvlmax_fp_tu_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
/* Emit vmv.s.x instruction. */
void
-emit_scalar_move_insn (unsigned icode, rtx *ops)
+emit_scalar_move_insn (unsigned icode, rtx *ops, rtx len)
{
machine_mode dest_mode = GET_MODE (ops[0]);
machine_mode mask_mode = get_mask_mode (dest_mode).require ();
- insn_expander<RVV_INSN_OPERANDS_MAX> e (riscv_vector::RVV_SCALAR_MOV_OP,
+ insn_expander<RVV_INSN_OPERANDS_MAX> e (RVV_SCALAR_MOV_OP,
/* HAS_DEST_P */ true,
/* FULLY_UNMASKED_P */ false,
/* USE_REAL_MERGE_P */ true,
@@ -1038,7 +1038,7 @@ emit_scalar_move_insn (unsigned icode, rtx *ops)
e.set_policy (TAIL_ANY);
e.set_policy (MASK_ANY);
- e.set_vl (CONST1_RTX (Pmode));
+ e.set_vl (len ? len : CONST1_RTX (Pmode));
e.emit_insn ((enum insn_code) icode, ops);
}
@@ -1196,6 +1196,26 @@ emit_vlmax_fp_reduction_insn (unsigned icode, int op_num, rtx *ops)
e.emit_insn ((enum insn_code) icode, ops);
}
+/* Emit reduction instruction. */
+static void
+emit_nonvlmax_fp_reduction_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
+{
+ machine_mode dest_mode = GET_MODE (ops[0]);
+ machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
+ insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
+ /* HAS_DEST_P */ true,
+ /* FULLY_UNMASKED_P */ false,
+ /* USE_REAL_MERGE_P */ true,
+ /* HAS_AVL_P */ true,
+ /* VLMAX_P */ false, dest_mode,
+ mask_mode);
+
+ e.set_policy (TAIL_ANY);
+ e.set_rounding_mode (FRM_DYN);
+ e.set_vl (vl);
+ e.emit_insn ((enum insn_code) icode, ops);
+}
+
/* Emit merge instruction. */
static machine_mode
@@ -3343,9 +3363,10 @@ expand_cond_len_ternop (unsigned icode, rtx *ops)
/* Expand reduction operations. */
void
-expand_reduction (rtx_code code, rtx *ops, rtx init)
+expand_reduction (rtx_code code, rtx *ops, rtx init, reduction_type type)
{
- machine_mode vmode = GET_MODE (ops[1]);
+ rtx vector = type == reduction_type::UNORDERED ? ops[1] : ops[2];
+ machine_mode vmode = GET_MODE (vector);
machine_mode m1_mode = get_m1_mode (vmode).require ();
machine_mode m1_mmode = get_mask_mode (m1_mode).require ();
@@ -3353,16 +3374,30 @@ expand_reduction (rtx_code code, rtx *ops, rtx init)
rtx m1_mask = gen_scalar_move_mask (m1_mmode);
rtx m1_undef = RVV_VUNDEF (m1_mode);
rtx scalar_move_ops[] = {m1_tmp, m1_mask, m1_undef, init};
- emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops);
+ rtx len = type == reduction_type::MASK_LEN_FOLD_LEFT ? ops[4] : NULL_RTX;
+ emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops,
+ len);
rtx m1_tmp2 = gen_reg_rtx (m1_mode);
- rtx reduc_ops[] = {m1_tmp2, ops[1], m1_tmp};
+ rtx reduc_ops[] = {m1_tmp2, vector, m1_tmp};
if (FLOAT_MODE_P (vmode) && code == PLUS)
{
insn_code icode
- = code_for_pred_reduc_plus (UNSPEC_UNORDERED, vmode, m1_mode);
- emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
+ = code_for_pred_reduc_plus (type == reduction_type::UNORDERED
+ ? UNSPEC_UNORDERED
+ : UNSPEC_ORDERED,
+ vmode, m1_mode);
+ if (type == reduction_type::MASK_LEN_FOLD_LEFT)
+ {
+ rtx mask = ops[3];
+ rtx mask_len_reduc_ops[]
+ = {m1_tmp2, mask, RVV_VUNDEF (m1_mode), vector, m1_tmp};
+ emit_nonvlmax_fp_reduction_insn (icode, RVV_REDUCTION_TU_OP,
+ mask_len_reduc_ops, len);
+ }
+ else
+ emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
}
else
{
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c
new file mode 100644
index 00000000000..c293e9ae746
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+
+#define NUM_ELEMS(TYPE) ((int)(5 * (256 / sizeof (TYPE)) + 3))
+
+#define DEF_REDUC_PLUS(TYPE) \
+ TYPE __attribute__ ((noinline, noclone)) \
+ reduc_plus_##TYPE (TYPE *a, TYPE *b) \
+ { \
+ TYPE r = 0, q = 3; \
+ for (int i = 0; i < NUM_ELEMS (TYPE); i++) \
+ { \
+ r += a[i]; \
+ q -= b[i]; \
+ } \
+ return r * q; \
+ }
+
+#define TEST_ALL(T) \
+ T (_Float16) \
+ T (float) \
+ T (double)
+
+TEST_ALL (DEF_REDUC_PLUS)
+
+/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c
new file mode 100644
index 00000000000..2e1e7ab674d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#define NUM_ELEMS(TYPE) ((int) (5 * (256 / sizeof (TYPE)) + 3))
+
+#define DEF_REDUC_PLUS(TYPE) \
+void __attribute__ ((noinline, noclone)) \
+reduc_plus_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)], \
+ TYPE *restrict r, int n) \
+{ \
+ for (int i = 0; i < n; i++) \
+ { \
+ r[i] = 0; \
+ for (int j = 0; j < NUM_ELEMS (TYPE); j++) \
+ r[i] += a[i][j]; \
+ } \
+}
+
+#define TEST_ALL(T) \
+ T (_Float16) \
+ T (float) \
+ T (double)
+
+TEST_ALL (DEF_REDUC_PLUS)
+
+/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c
new file mode 100644
index 00000000000..f559d40e60f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+double mat[100][2];
+
+double
+slp_reduc_plus (int n)
+{
+ double tmp = 0.0;
+ for (int i = 0; i < n; i++)
+ {
+ tmp = tmp + mat[i][0];
+ tmp = tmp + mat[i][1];
+ }
+ return tmp;
+}
+
+/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c
new file mode 100644
index 00000000000..428d371d9cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+double mat[100][8];
+
+double
+slp_reduc_plus (int n)
+{
+ double tmp = 0.0;
+ for (int i = 0; i < n; i++)
+ {
+ tmp = tmp + mat[i][0];
+ tmp = tmp + mat[i][1];
+ tmp = tmp + mat[i][2];
+ tmp = tmp + mat[i][3];
+ tmp = tmp + mat[i][4];
+ tmp = tmp + mat[i][5];
+ tmp = tmp + mat[i][6];
+ tmp = tmp + mat[i][7];
+ }
+ return tmp;
+}
+
+/* { dg-final { scan-assembler {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c
new file mode 100644
index 00000000000..24add2291f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+double mat[100][12];
+
+double
+slp_reduc_plus (int n)
+{
+ double tmp = 0.0;
+ for (int i = 0; i < n; i++)
+ {
+ tmp = tmp + mat[i][0];
+ tmp = tmp + mat[i][1];
+ tmp = tmp + mat[i][2];
+ tmp = tmp + mat[i][3];
+ tmp = tmp + mat[i][4];
+ tmp = tmp + mat[i][5];
+ tmp = tmp + mat[i][6];
+ tmp = tmp + mat[i][7];
+ tmp = tmp + mat[i][8];
+ tmp = tmp + mat[i][9];
+ tmp = tmp + mat[i][10];
+ tmp = tmp + mat[i][11];
+ }
+ return tmp;
+}
+
+/* { dg-final { scan-assembler {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c
new file mode 100644
index 00000000000..c1567b067ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+float
+double_reduc (float (*i)[16])
+{
+ float l = 0;
+
+#pragma GCC unroll 0
+ for (int a = 0; a < 8; a++)
+ for (int b = 0; b < 100; b++)
+ l += i[b][a];
+ return l;
+}
+
+/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
+/* { dg-final { scan-tree-dump "Detected double reduction" "vect" } } */
+/* { dg-final { scan-tree-dump-not "OUTER LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c
new file mode 100644
index 00000000000..f742a824bb2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+float
+double_reduc (float *i, float *j)
+{
+ float k = 0, l = 0;
+
+ for (int a = 0; a < 8; a++)
+ for (int b = 0; b < 100; b++)
+ {
+ k += i[b];
+ l += j[b];
+ }
+ return l * k;
+}
+
+/* { dg-final { scan-assembler-times {vle32\.v} 2 } } */
+/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 2 } } */
+/* { dg-final { scan-tree-dump "Detected double reduction" "vect" } } */
+/* { dg-final { scan-tree-dump-not "OUTER LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
new file mode 100644
index 00000000000..516be97e9eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
@@ -0,0 +1,29 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "reduc_strict-1.c"
+
+#define TEST_REDUC_PLUS(TYPE) \
+ { \
+ TYPE a[NUM_ELEMS (TYPE)]; \
+ TYPE b[NUM_ELEMS (TYPE)]; \
+ TYPE r = 0, q = 3; \
+ for (int i = 0; i < NUM_ELEMS (TYPE); i++) \
+ { \
+ a[i] = (i * 0.1) * (i & 1 ? 1 : -1); \
+ b[i] = (i * 0.3) * (i & 1 ? 1 : -1); \
+ r += a[i]; \
+ q -= b[i]; \
+ asm volatile ("" ::: "memory"); \
+ } \
+ TYPE res = reduc_plus_##TYPE (a, b); \
+ if (res != r * q) \
+ __builtin_abort (); \
+ }
+
+int __attribute__ ((optimize (1)))
+main ()
+{
+ TEST_ALL (TEST_REDUC_PLUS);
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c
new file mode 100644
index 00000000000..0a4238d96f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c
@@ -0,0 +1,31 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "reduc_strict-2.c"
+
+#define NROWS 5
+
+#define TEST_REDUC_PLUS(TYPE) \
+ { \
+ TYPE a[NROWS][NUM_ELEMS (TYPE)]; \
+ TYPE r[NROWS]; \
+ TYPE expected[NROWS] = {}; \
+ for (int i = 0; i < NROWS; ++i) \
+ for (int j = 0; j < NUM_ELEMS (TYPE); ++j) \
+ { \
+ a[i][j] = (i * 0.1 + j * 0.6) * (j & 1 ? 1 : -1); \
+ expected[i] += a[i][j]; \
+ asm volatile ("" ::: "memory"); \
+ } \
+ reduc_plus_##TYPE (a, r, NROWS); \
+ for (int i = 0; i < NROWS; ++i) \
+ if (r[i] != expected[i]) \
+ __builtin_abort (); \
+ }
+
+int __attribute__ ((optimize (1)))
+main ()
+{
+ TEST_ALL (TEST_REDUC_PLUS);
+ return 0;
+}
--
2.36.1
* Re: [PATCH V2] RISC-V: Support in-order floating-point reduction
2023-07-20 8:51 [PATCH V2] RISC-V: Support in-order floating-point reduction Juzhe-Zhong
@ 2023-07-20 8:59 ` Kito Cheng
2023-07-20 9:00 ` Robin Dapp
From: Kito Cheng @ 2023-07-20 8:59 UTC
To: Juzhe-Zhong; +Cc: gcc-patches, kito.cheng, jeffreyalaw, rdapp.gcc
LGTM, but I would like to make sure Robin is OK with it too.
On Thu, Jul 20, 2023 at 4:51 PM Juzhe-Zhong <juzhe.zhong@rivai.ai> wrote:
>
> This patch depends on:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624995.html
>
> Consider the following case:
> float foo (float *__restrict a, int n)
> {
> float result = 1.0;
> for (int i = 0; i < n; i++)
> result += a[i];
> return result;
> }
>
> Compiled **without** -ffast-math:
>
> Before this patch:
> <source>:4:21: missed: couldn't vectorize loop
> <source>:1:7: missed: not vectorized: relevant phi not supported: result_14 = PHI <result_11(6), 1.0e+0(5)>
>
> After this patch:
> foo:
> lui a5,%hi(.LC0)
> flw fa0,%lo(.LC0)(a5)
> ble a1,zero,.L4
> .L3:
> vsetvli a5,a1,e32,m1,ta,ma
> vle32.v v1,0(a0)
> slli a4,a5,2
> sub a1,a1,a5
> vfmv.s.f v2,fa0
> add a0,a0,a4
> vfredosum.vs v1,v1,v2 # FOLD_LEFT_PLUS
> vfmv.f.s fa0,v1
> bne a1,zero,.L3
> ret
> .L4:
> ret
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (fold_left_plus_<mode>): New pattern.
> (mask_len_fold_left_plus_<mode>): Ditto.
> * config/riscv/riscv-protos.h (enum insn_type): Add RVV_REDUCTION_TU_OP.
> (enum reduction_type): New enum.
> (expand_reduction): Add in-order reduction.
> * config/riscv/riscv-v.cc (emit_nonvlmax_fp_reduction_insn): New function.
> (expand_reduction): Add in-order reduction.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c: New test.
>
> ---
> gcc/config/riscv/autovec.md | 39 ++++++++++++++
> gcc/config/riscv/riscv-protos.h | 13 ++++-
> gcc/config/riscv/riscv-v.cc | 53 +++++++++++++++----
> .../riscv/rvv/autovec/reduc/reduc_strict-1.c | 28 ++++++++++
> .../riscv/rvv/autovec/reduc/reduc_strict-2.c | 26 +++++++++
> .../riscv/rvv/autovec/reduc/reduc_strict-3.c | 18 +++++++
> .../riscv/rvv/autovec/reduc/reduc_strict-4.c | 24 +++++++++
> .../riscv/rvv/autovec/reduc/reduc_strict-5.c | 28 ++++++++++
> .../riscv/rvv/autovec/reduc/reduc_strict-6.c | 18 +++++++
> .../riscv/rvv/autovec/reduc/reduc_strict-7.c | 21 ++++++++
> .../rvv/autovec/reduc/reduc_strict_run-1.c | 29 ++++++++++
> .../rvv/autovec/reduc/reduc_strict_run-2.c | 31 +++++++++++
> 12 files changed, 317 insertions(+), 11 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 00947207f3f..667a877d009 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1687,3 +1687,42 @@
> riscv_vector::expand_reduction (SMIN, operands, f);
> DONE;
> })
> +
> +;; -------------------------------------------------------------------------
> +;; ---- [FP] Left-to-right reductions
> +;; -------------------------------------------------------------------------
> +;; Includes:
> +;; - vfredosum.vs
> +;; -------------------------------------------------------------------------
> +
> +;; Unpredicated in-order FP reductions.
> +(define_expand "fold_left_plus_<mode>"
> + [(match_operand:<VEL> 0 "register_operand")
> + (match_operand:<VEL> 1 "register_operand")
> + (match_operand:VF 2 "register_operand")]
> + "TARGET_VECTOR"
> +{
> + riscv_vector::expand_reduction (PLUS, operands,
> + operands[1],
> + riscv_vector::reduction_type::FOLD_LEFT);
> + DONE;
> +})
> +
> +;; Predicated in-order FP reductions.
> +(define_expand "mask_len_fold_left_plus_<mode>"
> + [(match_operand:<VEL> 0 "register_operand")
> + (match_operand:<VEL> 1 "register_operand")
> + (match_operand:VF 2 "register_operand")
> + (match_operand:<VM> 3 "vector_mask_operand")
> + (match_operand 4 "autovec_length_operand")
> + (match_operand 5 "const_0_operand")]
> + "TARGET_VECTOR"
> +{
> + if (rtx_equal_p (operands[4], const0_rtx))
> + emit_move_insn (operands[0], operands[1]);
> + else
> + riscv_vector::expand_reduction (PLUS, operands,
> + operands[1],
> + riscv_vector::reduction_type::MASK_LEN_FOLD_LEFT);
> + DONE;
> +})
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 16fb8dabca0..c9520f689e2 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -199,6 +199,7 @@ enum insn_type
> RVV_GATHER_M_OP = 5,
> RVV_SCATTER_M_OP = 4,
> RVV_REDUCTION_OP = 3,
> + RVV_REDUCTION_TU_OP = RVV_REDUCTION_OP + 2,
> };
> enum vlmul_type
> {
> @@ -247,7 +248,7 @@ void emit_vlmax_merge_insn (unsigned, int, rtx *);
> void emit_vlmax_cmp_insn (unsigned, rtx *);
> void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
> void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
> -void emit_scalar_move_insn (unsigned, rtx *);
> +void emit_scalar_move_insn (unsigned, rtx *, rtx = 0);
> void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
> enum vlmul_type get_vlmul (machine_mode);
> unsigned int get_ratio (machine_mode);
> @@ -270,6 +271,13 @@ enum mask_policy
> MASK_AGNOSTIC = 1,
> MASK_ANY = 2,
> };
> +
> +enum class reduction_type
> +{
> + UNORDERED,
> + FOLD_LEFT,
> + MASK_LEN_FOLD_LEFT,
> +};
> enum tail_policy get_prefer_tail_policy ();
> enum mask_policy get_prefer_mask_policy ();
> rtx get_avl_type_rtx (enum avl_type);
> @@ -282,7 +290,8 @@ bool has_vi_variant_p (rtx_code, rtx);
> void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
> bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
> void expand_cond_len_binop (rtx_code, rtx *);
> -void expand_reduction (rtx_code, rtx *, rtx);
> +void expand_reduction (rtx_code, rtx *, rtx,
> + reduction_type = reduction_type::UNORDERED);
> #endif
> bool sew64_scalar_helper (rtx *, rtx *, rtx, machine_mode,
> bool, void (*)(rtx *, rtx));
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 53088edf909..e338be151d3 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -1023,11 +1023,11 @@ emit_nonvlmax_fp_tu_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
> /* Emit vmv.s.x instruction. */
>
> void
> -emit_scalar_move_insn (unsigned icode, rtx *ops)
> +emit_scalar_move_insn (unsigned icode, rtx *ops, rtx len)
> {
> machine_mode dest_mode = GET_MODE (ops[0]);
> machine_mode mask_mode = get_mask_mode (dest_mode).require ();
> - insn_expander<RVV_INSN_OPERANDS_MAX> e (riscv_vector::RVV_SCALAR_MOV_OP,
> + insn_expander<RVV_INSN_OPERANDS_MAX> e (RVV_SCALAR_MOV_OP,
> /* HAS_DEST_P */ true,
> /* FULLY_UNMASKED_P */ false,
> /* USE_REAL_MERGE_P */ true,
> @@ -1038,7 +1038,7 @@ emit_scalar_move_insn (unsigned icode, rtx *ops)
>
> e.set_policy (TAIL_ANY);
> e.set_policy (MASK_ANY);
> - e.set_vl (CONST1_RTX (Pmode));
> + e.set_vl (len ? len : CONST1_RTX (Pmode));
> e.emit_insn ((enum insn_code) icode, ops);
> }
>
> @@ -1196,6 +1196,26 @@ emit_vlmax_fp_reduction_insn (unsigned icode, int op_num, rtx *ops)
> e.emit_insn ((enum insn_code) icode, ops);
> }
>
> +/* Emit reduction instruction. */
> +static void
> +emit_nonvlmax_fp_reduction_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
> +{
> + machine_mode dest_mode = GET_MODE (ops[0]);
> + machine_mode mask_mode = get_mask_mode (GET_MODE (ops[1])).require ();
> + insn_expander<RVV_INSN_OPERANDS_MAX> e (op_num,
> + /* HAS_DEST_P */ true,
> + /* FULLY_UNMASKED_P */ false,
> + /* USE_REAL_MERGE_P */ true,
> + /* HAS_AVL_P */ true,
> + /* VLMAX_P */ false, dest_mode,
> + mask_mode);
> +
> + e.set_policy (TAIL_ANY);
> + e.set_rounding_mode (FRM_DYN);
> + e.set_vl (vl);
> + e.emit_insn ((enum insn_code) icode, ops);
> +}
> +
> /* Emit merge instruction. */
>
> static machine_mode
> @@ -3343,9 +3363,10 @@ expand_cond_len_ternop (unsigned icode, rtx *ops)
>
> /* Expand reduction operations. */
> void
> -expand_reduction (rtx_code code, rtx *ops, rtx init)
> +expand_reduction (rtx_code code, rtx *ops, rtx init, reduction_type type)
> {
> - machine_mode vmode = GET_MODE (ops[1]);
> + rtx vector = type == reduction_type::UNORDERED ? ops[1] : ops[2];
> + machine_mode vmode = GET_MODE (vector);
> machine_mode m1_mode = get_m1_mode (vmode).require ();
> machine_mode m1_mmode = get_mask_mode (m1_mode).require ();
>
> @@ -3353,16 +3374,30 @@ expand_reduction (rtx_code code, rtx *ops, rtx init)
> rtx m1_mask = gen_scalar_move_mask (m1_mmode);
> rtx m1_undef = RVV_VUNDEF (m1_mode);
> rtx scalar_move_ops[] = {m1_tmp, m1_mask, m1_undef, init};
> - emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops);
> + rtx len = type == reduction_type::MASK_LEN_FOLD_LEFT ? ops[4] : NULL_RTX;
> + emit_scalar_move_insn (code_for_pred_broadcast (m1_mode), scalar_move_ops,
> + len);
>
> rtx m1_tmp2 = gen_reg_rtx (m1_mode);
> - rtx reduc_ops[] = {m1_tmp2, ops[1], m1_tmp};
> + rtx reduc_ops[] = {m1_tmp2, vector, m1_tmp};
>
> if (FLOAT_MODE_P (vmode) && code == PLUS)
> {
> insn_code icode
> - = code_for_pred_reduc_plus (UNSPEC_UNORDERED, vmode, m1_mode);
> - emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
> + = code_for_pred_reduc_plus (type == reduction_type::UNORDERED
> + ? UNSPEC_UNORDERED
> + : UNSPEC_ORDERED,
> + vmode, m1_mode);
> + if (type == reduction_type::MASK_LEN_FOLD_LEFT)
> + {
> + rtx mask = ops[3];
> + rtx mask_len_reduc_ops[]
> + = {m1_tmp2, mask, RVV_VUNDEF (m1_mode), vector, m1_tmp};
> + emit_nonvlmax_fp_reduction_insn (icode, RVV_REDUCTION_TU_OP,
> + mask_len_reduc_ops, len);
> + }
> + else
> + emit_vlmax_fp_reduction_insn (icode, RVV_REDUCTION_OP, reduc_ops);
> }
> else
> {
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c
> new file mode 100644
> index 00000000000..c293e9ae746
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
> +
> +#include <stdint-gcc.h>
> +
> +#define NUM_ELEMS(TYPE) ((int)(5 * (256 / sizeof (TYPE)) + 3))
> +
> +#define DEF_REDUC_PLUS(TYPE) \
> + TYPE __attribute__ ((noinline, noclone)) \
> + reduc_plus_##TYPE (TYPE *a, TYPE *b) \
> + { \
> + TYPE r = 0, q = 3; \
> + for (int i = 0; i < NUM_ELEMS (TYPE); i++) \
> + { \
> + r += a[i]; \
> + q -= b[i]; \
> + } \
> + return r * q; \
> + }
> +
> +#define TEST_ALL(T) \
> + T (_Float16) \
> + T (float) \
> + T (double)
> +
> +TEST_ALL (DEF_REDUC_PLUS)
> +
> +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 6 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c
> new file mode 100644
> index 00000000000..2e1e7ab674d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-2.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
> +
> +#define NUM_ELEMS(TYPE) ((int) (5 * (256 / sizeof (TYPE)) + 3))
> +
> +#define DEF_REDUC_PLUS(TYPE) \
> +void __attribute__ ((noinline, noclone)) \
> +reduc_plus_##TYPE (TYPE (*restrict a)[NUM_ELEMS (TYPE)], \
> + TYPE *restrict r, int n) \
> +{ \
> + for (int i = 0; i < n; i++) \
> + { \
> + r[i] = 0; \
> + for (int j = 0; j < NUM_ELEMS (TYPE); j++) \
> + r[i] += a[i][j]; \
> + } \
> +}
> +
> +#define TEST_ALL(T) \
> + T (_Float16) \
> + T (float) \
> + T (double)
> +
> +TEST_ALL (DEF_REDUC_PLUS)
> +
> +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 3 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c
> new file mode 100644
> index 00000000000..f559d40e60f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-3.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
> +
> +double mat[100][2];
> +
> +double
> +slp_reduc_plus (int n)
> +{
> + double tmp = 0.0;
> + for (int i = 0; i < n; i++)
> + {
> + tmp = tmp + mat[i][0];
> + tmp = tmp + mat[i][1];
> + }
> + return tmp;
> +}
> +
> +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c
> new file mode 100644
> index 00000000000..428d371d9cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-4.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
> +
> +double mat[100][8];
> +
> +double
> +slp_reduc_plus (int n)
> +{
> + double tmp = 0.0;
> + for (int i = 0; i < n; i++)
> + {
> + tmp = tmp + mat[i][0];
> + tmp = tmp + mat[i][1];
> + tmp = tmp + mat[i][2];
> + tmp = tmp + mat[i][3];
> + tmp = tmp + mat[i][4];
> + tmp = tmp + mat[i][5];
> + tmp = tmp + mat[i][6];
> + tmp = tmp + mat[i][7];
> + }
> + return tmp;
> +}
> +
> +/* { dg-final { scan-assembler {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c
> new file mode 100644
> index 00000000000..24add2291f1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-5.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
> +
> +double mat[100][12];
> +
> +double
> +slp_reduc_plus (int n)
> +{
> + double tmp = 0.0;
> + for (int i = 0; i < n; i++)
> + {
> + tmp = tmp + mat[i][0];
> + tmp = tmp + mat[i][1];
> + tmp = tmp + mat[i][2];
> + tmp = tmp + mat[i][3];
> + tmp = tmp + mat[i][4];
> + tmp = tmp + mat[i][5];
> + tmp = tmp + mat[i][6];
> + tmp = tmp + mat[i][7];
> + tmp = tmp + mat[i][8];
> + tmp = tmp + mat[i][9];
> + tmp = tmp + mat[i][10];
> + tmp = tmp + mat[i][11];
> + }
> + return tmp;
> +}
> +
> +/* { dg-final { scan-assembler {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c
> new file mode 100644
> index 00000000000..c1567b067ba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-6.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-vect-details" } */
> +
> +float
> +double_reduc (float (*i)[16])
> +{
> + float l = 0;
> +
> +#pragma GCC unroll 0
> + for (int a = 0; a < 8; a++)
> + for (int b = 0; b < 100; b++)
> + l += i[b][a];
> + return l;
> +}
> +
> +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 1 } } */
> +/* { dg-final { scan-tree-dump "Detected double reduction" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "OUTER LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c
> new file mode 100644
> index 00000000000..f742a824bb2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict-7.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-vect-details" } */
> +
> +float
> +double_reduc (float *i, float *j)
> +{
> + float k = 0, l = 0;
> +
> + for (int a = 0; a < 8; a++)
> + for (int b = 0; b < 100; b++)
> + {
> + k += i[b];
> + l += j[b];
> + }
> + return l * k;
> +}
> +
> +/* { dg-final { scan-assembler-times {vle32\.v} 2 } } */
> +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 2 } } */
> +/* { dg-final { scan-tree-dump "Detected double reduction" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "OUTER LOOP VECTORIZED" "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
> new file mode 100644
> index 00000000000..516be97e9eb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
> +
> +#include "reduc_strict-1.c"
> +
> +#define TEST_REDUC_PLUS(TYPE) \
> + { \
> + TYPE a[NUM_ELEMS (TYPE)]; \
> + TYPE b[NUM_ELEMS (TYPE)]; \
> + TYPE r = 0, q = 3; \
> + for (int i = 0; i < NUM_ELEMS (TYPE); i++) \
> + { \
> + a[i] = (i * 0.1) * (i & 1 ? 1 : -1); \
> + b[i] = (i * 0.3) * (i & 1 ? 1 : -1); \
> + r += a[i]; \
> + q -= b[i]; \
> + asm volatile ("" ::: "memory"); \
> + } \
> + TYPE res = reduc_plus_##TYPE (a, b); \
> + if (res != r * q) \
> + __builtin_abort (); \
> + }
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> + TEST_ALL (TEST_REDUC_PLUS);
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c
> new file mode 100644
> index 00000000000..0a4238d96f3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
> +
> +#include "reduc_strict-2.c"
> +
> +#define NROWS 5
> +
> +#define TEST_REDUC_PLUS(TYPE) \
> + { \
> + TYPE a[NROWS][NUM_ELEMS (TYPE)]; \
> + TYPE r[NROWS]; \
> + TYPE expected[NROWS] = {}; \
> + for (int i = 0; i < NROWS; ++i) \
> + for (int j = 0; j < NUM_ELEMS (TYPE); ++j) \
> + { \
> + a[i][j] = (i * 0.1 + j * 0.6) * (j & 1 ? 1 : -1); \
> + expected[i] += a[i][j]; \
> + asm volatile ("" ::: "memory"); \
> + } \
> + reduc_plus_##TYPE (a, r, NROWS); \
> + for (int i = 0; i < NROWS; ++i) \
> + if (r[i] != expected[i]) \
> + __builtin_abort (); \
> + }
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> + TEST_ALL (TEST_REDUC_PLUS);
> + return 0;
> +}
> --
> 2.36.1
>
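A side note on why the run tests above can compare results with != instead of a tolerance: floating-point addition is not associative, so without -ffast-math the vectorized loop has to add the elements in the original order, which is what vfredosum.vs (the ordered sum reduction) provides. The standalone C sketch below is not part of the patch; the function names sum_in_order and sum_reassociated and the input values are made up for illustration. It contrasts a strict left-to-right sum with a reassociated sum built from two interleaved partial sums.

```c
#include <stddef.h>

/* Strict left-to-right sum.  This is the order a scalar loop must keep
   when -ffast-math is not given, and the order an in-order (fold-left)
   reduction preserves.  */
float
sum_in_order (const float *a, size_t n)
{
  float r = 0.0f;
  for (size_t i = 0; i < n; i++)
    r += a[i];
  return r;
}

/* Reassociated sum (illustrative only): even-indexed and odd-indexed
   elements are accumulated separately and combined at the end.  An
   unordered reduction is free to regroup additions in ways like this.  */
float
sum_reassociated (const float *a, size_t n)
{
  float even = 0.0f, odd = 0.0f;
  for (size_t i = 0; i < n; i++)
    {
      if (i & 1)
        odd += a[i];
      else
        even += a[i];
    }
  return even + odd;
}
```

For the input {1e30f, 1.0f, -1e30f, 1.0f}, sum_in_order returns 1.0f while sum_reassociated returns 2.0f: in the in-order case the first 1.0f is absorbed by rounding when it is added to 1e30f, whereas the regrouped version cancels the two large values first.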
* Re: [PATCH V2] RISC-V: Support in-order floating-point reduction
2023-07-20 8:59 ` Kito Cheng
@ 2023-07-20 9:00 ` Robin Dapp
2023-07-24 8:38 ` Lehua Ding
0 siblings, 1 reply; 4+ messages in thread
From: Robin Dapp @ 2023-07-20 9:00 UTC (permalink / raw)
To: Kito Cheng, Juzhe-Zhong; +Cc: rdapp.gcc, gcc-patches, kito.cheng, jeffreyalaw
> LGTM, but I would like to make sure Robin is OK too
Yes, LGTM as well.
Regards
Robin
* Re: [PATCH V2] RISC-V: Support in-order floating-point reduction
2023-07-20 9:00 ` Robin Dapp
@ 2023-07-24 8:38 ` Lehua Ding
0 siblings, 0 replies; 4+ messages in thread
From: Lehua Ding @ 2023-07-24 8:38 UTC (permalink / raw)
To: Kito Cheng, Robin Dapp, gcc-patches; +Cc: jeffreyalaw, Juzhe-Zhong
Committed to the trunk, thanks Kito and Robin.
Best,
Lehua