* [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization
@ 2023-08-24 2:19 Juzhe-Zhong
2023-08-24 9:13 ` Robin Dapp
From: Juzhe-Zhong @ 2023-08-24 2:19 UTC
To: gcc-patches; +Cc: kito.cheng, kito.cheng, jeffreyalaw, rdapp.gcc, Juzhe-Zhong
Consider the following case:

int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < 32; i++)
    if (a[i] < min_v)
      last = i;

  return last;
}

With --param=riscv-autovec-preference=fixed-vlmax --param=riscv-autovec-lmul=m8:

condition_reduction:
	vsetvli a4,zero,e32,m8,ta,ma
	li a5,32
	vmv.v.x v8,a1
	vl8re32.v v0,0(a0)
	vid.v v16
	vmslt.vv v0,v0,v8
	vsetvli zero,a5,e8,m2,ta,ma
	vcpop.m a5,v0
	beq a5,zero,.L2
	addi a5,a5,-1
	vsetvli a4,zero,e32,m8,ta,ma
	vcompress.vm v8,v16,v0
	vslidedown.vx v8,v8,a5
	vmv.x.s a0,v8
	ret
.L2:
	li a0,66
	ret
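
For readers less familiar with this RVV idiom, a rough scalar C model of what the fixed-VLMAX sequence above computes may help. This is an illustrative sketch, not code from the patch; the function name fold_extract_last_model and the N_MAX lane bound are hypothetical. vcpop.m counts the active mask bits, vcompress.vm packs the active elements in order, and vslidedown.vx by (cpop - 1) followed by vmv.x.s picks the last of them; when no bit is set, the default value (66 here) is returned instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on the number of lanes; N must not exceed it.  */
#define N_MAX 256

/* Scalar model of vcpop.m / beq / addi / vcompress.vm / vslidedown.vx /
   vmv.x.s.  VECT plays the role of the vid.v index vector, MASK the
   result of vmslt.vv.  */
static int
fold_extract_last_model (const int *vect, const bool *mask, size_t n,
                         int default_value)
{
  int compressed[N_MAX];
  size_t cpop = 0;

  /* vcpop.m + vcompress.vm: count the active lanes and pack them,
     preserving their order, into the low elements of COMPRESSED.  */
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      compressed[cpop++] = vect[i];

  /* beq a5,zero: no active lane, so return the default value.  */
  if (cpop == 0)
    return default_value;

  /* addi a5,a5,-1; vslidedown.vx; vmv.x.s: element cpop - 1 of the
     compressed vector is the last active lane.  */
  return compressed[cpop - 1];
}
```

In the example, vect would hold the indices 0..31 produced by vid.v and mask[i] would be a[i] < min_v.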

With --param=riscv-autovec-preference=scalable:

condition_reduction:
	csrr a6,vlenb
	mv a2,a0
	li a3,32
	li a0,66
	srli a6,a6,2
	vsetvli a4,zero,e32,m1,ta,ma
	vmv.v.x v4,a1
	vid.v v1
.L4:
	vsetvli a5,a3,e8,mf4,tu,mu
	vsetvli zero,a5,e32,m1,ta,ma    ----> redundant vsetvl
	vle32.v v0,0(a2)
	vsetvli a4,zero,e32,m1,ta,ma
	slli a1,a5,2
	vmv.v.x v2,a6
	vmslt.vv v0,v0,v4
	sub a3,a3,a5
	vmv1r.v v3,v1
	vadd.vv v1,v1,v2
	vsetvli zero,a5,e8,mf4,ta,ma
	vcpop.m a5,v0
	beq a5,zero,.L3
	addi a5,a5,-1
	vsetvli a4,zero,e32,m1,ta,ma
	vcompress.vm v2,v3,v0
	vslidedown.vx v2,v2,a5
	vmv.x.s a0,v2
.L3:
	sext.w a0,a0
	add a2,a2,a1
	bne a3,zero,.L4
	ret

The VLA code above contains a redundant vsetvli instruction, which is caused by the VSETVL pass.
That vsetvl issue is not addressed in this patch but will be fixed soon.
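
In the scalable version, the result of each iteration is fed back as the default value of the next LEN_FOLD_EXTRACT_LAST: a0 is only overwritten when vcpop.m is non-zero. The sketch below models that strip-mined loop in scalar C. It is illustrative only and rests on stated assumptions: vsetvli is taken to return vl = min (remaining, VLMAX), and the vid-based index vector starts at 0 and advances by VLMAX per iteration, as in the listing. The names condition_reduction_vla_model and vlmax are hypothetical.

```c
#include <stddef.h>

/* Hypothetical scalar model of the scalable (VLA) loop.
   Assumptions: vl = min (remaining, vlmax) on every iteration, the
   index vector advances by vlmax per iteration, and vlmax > 0.  */
static int
condition_reduction_vla_model (const int *a, int min_v, size_t n,
                               size_t vlmax)
{
  int last = 66;                        /* li a0,66: initial default.  */

  for (size_t base = 0; base < n;)
    {
      /* vsetvli a5,a3,...: elements handled in this iteration.  */
      size_t vl = n - base < vlmax ? n - base : vlmax;

      /* LEN_FOLD_EXTRACT_LAST with LAST as the default: scanning the vl
         lanes from the top down finds the same element as
         compressed[cpop - 1]; if no lane is active, LAST is kept.  */
      for (size_t j = vl; j-- > 0;)
        if (a[base + j] < min_v)        /* vmslt.vv */
          {
            last = (int) (base + j);    /* Index taken from vid.v.  */
            break;
          }

      base += vl;                       /* sub a3,a3,a5; add a2,a2,a1 */
    }

  return last;
}
```

With vlmax = 32 the loop runs once, matching the fixed-VLMAX listing; smaller values exercise the carried default across iterations.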
gcc/ChangeLog:
* config/riscv/autovec.md (len_fold_extract_last_<mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): Add RVV_CPOP.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.
---
gcc/config/riscv/autovec.md | 24 ++++
gcc/config/riscv/riscv-protos.h | 2 +
gcc/config/riscv/riscv-v.cc | 115 +++++++++++++++++-
gcc/config/riscv/vector.md | 2 +-
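gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c | 20 +++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c | 6 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c | 24 ++++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c | 6 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c | 7 ++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c | 6 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c | 6 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c | 26 ++++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c | 6 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c | 8 ++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c | 6 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c | 8 ++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c | 6 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c | 8 ++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c | 22 ++++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c | 4 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c | 22 ++++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c | 4 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c | 22 ++++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c | 4 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c | 4 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c | 23 ++++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c | 4 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c | 23 ++++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c | 4 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c | 25 ++++
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c | 4 +
gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c | 23 ++++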
32 files changed, 472 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index e1addc07036..9760fa4dde0 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2126,6 +2126,30 @@
DONE;
})
+;; -------------------------------------------------------------------------
+;; ---- [INT,FP] Extract active element
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vcompress.vm
+;; - vcpop.m
+;; - vslidedown.vx
+;; - vmv.x.s
+;; - vfmv.f.s
+;; -------------------------------------------------------------------------
+
+(define_expand "len_fold_extract_last_<mode>"
+ [(match_operand:<VEL> 0 "register_operand")
+ (match_operand:<VEL> 1 "register_operand")
+ (match_operand:<VM> 2 "register_operand")
+ (match_operand:V 3 "register_operand")
+ (match_operand 4 "autovec_length_operand")
+ (match_operand 5 "const_0_operand")]
+ "TARGET_VECTOR"
+ {
+ riscv_vector::expand_fold_extract_last (operands);
+ DONE;
+ })
+
;; -------------------------------------------------------------------------
;; ---- [INT] Average.
;; -------------------------------------------------------------------------
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 2c4405c9860..68ec1bdf62b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -207,6 +207,7 @@ enum insn_type
RVV_SCATTER_M_OP = 4,
RVV_REDUCTION_OP = 3,
RVV_REDUCTION_TU_OP = RVV_REDUCTION_OP + 2,
+ RVV_CPOP = 2,
};
enum vlmul_type
{
@@ -329,6 +330,7 @@ void expand_gather_scatter (rtx *, bool);
void expand_cond_len_ternop (unsigned, rtx *);
void prepare_ternary_operands (rtx *, bool = false);
void expand_lanes_load_store (rtx *, bool);
+void expand_fold_extract_last (rtx *);
/* Rounding mode bitfield for fixed point VXRM. */
enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 14eda581d00..7c9b9a4f50a 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -213,7 +213,7 @@ public:
{
/* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
the vsetvli to obtain the value of vlmax. */
- poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
+ poly_uint64 nunits = GET_MODE_NUNITS (m_mask_mode);
len = gen_int_mode (nunits, Pmode);
m_vlmax_p = false; /* It has became NONVLMAX now. */
}
@@ -848,6 +848,28 @@ emit_nonvlmax_slide_tu_insn (unsigned icode, rtx *ops, rtx avl)
e.emit_insn ((enum insn_code) icode, ops);
}
+/* This function emits a {NONVLMAX, TAIL_ANY, MASK_ANY} vsetvli
+ followed by a vslide insn (with real merge operand). */
+void
+emit_nonvlmax_slide_insn (unsigned icode, rtx *ops, rtx avl)
+{
+ machine_mode dest_mode = GET_MODE (ops[0]);
+ machine_mode mask_mode = get_mask_mode (dest_mode);
+ insn_expander<RVV_INSN_OPERANDS_MAX> e (RVV_SLIDE_OP,
+ /* HAS_DEST_P */ true,
+ /* FULLY_UNMASKED_P */ true,
+ /* USE_REAL_MERGE_P */ true,
+ /* HAS_AVL_P */ true,
+ /* VLMAX_P */ true,
+ dest_mode,
+ mask_mode);
+
+ e.set_policy (TAIL_ANY);
+ e.set_policy (MASK_ANY);
+ e.set_vl (avl);
+
+ e.emit_insn ((enum insn_code) icode, ops);
+}
/* This function emits merge instruction. */
void
@@ -1111,6 +1133,25 @@ emit_scalar_move_insn (unsigned icode, rtx *ops, rtx len)
e.emit_insn ((enum insn_code) icode, ops);
}
+/* Emit vcpop.m instruction. */
+
+static void
+emit_cpop_insn (unsigned icode, rtx *ops, rtx len)
+{
+ machine_mode dest_mode = GET_MODE (ops[0]);
+ machine_mode mask_mode = GET_MODE (ops[1]);
+ insn_expander<RVV_INSN_OPERANDS_MAX> e (RVV_CPOP,
+ /* HAS_DEST_P */ true,
+ /* FULLY_UNMASKED_P */ true,
+ /* USE_REAL_MERGE_P */ true,
+ /* HAS_AVL_P */ true,
+ /* VLMAX_P */ len ? false : true,
+ dest_mode, mask_mode);
+
+ e.set_vl (len);
+ e.emit_insn ((enum insn_code) icode, ops);
+}
+
/* Emit vmv.v.x instruction with vlmax. */
static void
@@ -1228,6 +1269,25 @@ emit_vlmax_compress_insn (unsigned icode, rtx *ops)
e.emit_insn ((enum insn_code) icode, ops);
}
+/* Emit compress instruction. */
+static void
+emit_nonvlmax_compress_insn (unsigned icode, rtx *ops, rtx avl)
+{
+ machine_mode dest_mode = GET_MODE (ops[0]);
+ machine_mode mask_mode = get_mask_mode (dest_mode);
+ insn_expander<RVV_INSN_OPERANDS_MAX> e (RVV_COMPRESS_OP,
+ /* HAS_DEST_P */ true,
+ /* FULLY_UNMASKED_P */ false,
+ /* USE_REAL_MERGE_P */ true,
+ /* HAS_AVL_P */ true,
+ /* VLMAX_P */ true, dest_mode,
+ mask_mode);
+
+ e.set_policy (TAIL_ANY);
+ e.set_vl (avl);
+ e.emit_insn ((enum insn_code) icode, ops);
+}
+
/* Emit reduction instruction. */
static void
emit_vlmax_reduction_insn (unsigned icode, int op_num, rtx *ops)
@@ -3816,4 +3876,57 @@ expand_lanes_load_store (rtx *ops, bool is_load)
}
}
+/* Expand LEN_FOLD_EXTRACT_LAST. */
+void
+expand_fold_extract_last (rtx *ops)
+{
+ rtx dst = ops[0];
+ rtx default_value = ops[1];
+ rtx mask = ops[2];
+ rtx anchor = gen_reg_rtx (Pmode);
+ rtx index = gen_reg_rtx (Pmode);
+ rtx vect = ops[3];
+ rtx else_label = gen_label_rtx ();
+ rtx end_label = gen_label_rtx ();
+ rtx len = ops[4];
+ poly_int64 value;
+ machine_mode mode = GET_MODE (vect);
+ machine_mode mask_mode = GET_MODE (mask);
+ rtx compress_vect = gen_reg_rtx (mode);
+ rtx slide_vect = gen_reg_rtx (mode);
+ insn_code icode;
+
+ if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+ len = NULL_RTX;
+
+ /* Count the set bits (active elements) in MASK.  */
+ rtx cpop_ops[] = {anchor, mask};
+ emit_cpop_insn (code_for_pred_popcount (mask_mode, Pmode), cpop_ops, len);
+
+ riscv_expand_conditional_branch (else_label, EQ, anchor, const0_rtx);
+ emit_insn (gen_rtx_SET (index, gen_rtx_PLUS (Pmode, anchor, constm1_rtx)));
+ /* Compress the vector. */
+ icode = code_for_pred_compress (mode);
+ rtx compress_ops[] = {compress_vect, RVV_VUNDEF (mode), vect, mask};
+ if (len)
+ emit_nonvlmax_compress_insn (icode, compress_ops, len);
+ else
+ emit_vlmax_compress_insn (icode, compress_ops);
+ /* Slide element INDEX down to position 0 of a new vector.  */
+ rtx slide_ops[] = {slide_vect, RVV_VUNDEF (mode), compress_vect, index};
+ icode = code_for_pred_slide (UNSPEC_VSLIDEDOWN, mode);
+ if (len)
+ emit_nonvlmax_slide_insn (icode, slide_ops, len);
+ else
+ emit_vlmax_slide_insn (icode, slide_ops);
+ /* Emit v(f)mv.[xf].s. */
+ emit_insn (gen_pred_extract_first (mode, dst, slide_vect));
+
+ emit_jump_insn (gen_jump (end_label));
+ emit_barrier ();
+ emit_label (else_label);
+ emit_move_insn (dst, default_value);
+ emit_label (end_label);
+}
+
} // namespace riscv_vector
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 6ceae25dbed..a442e0fdd3c 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -417,7 +417,7 @@
vialu,vshift,vicmp,vimul,vidiv,vsalu,\
vext,viwalu,viwmul,vicalu,vnshift,\
vimuladd,vimerge,vaalu,vsmul,vsshift,\
- vnclip,viminmax,viwmuladd,vmpop,vmffs,vmsfs,\
+ vnclip,viminmax,viwmuladd,vmffs,vmsfs,\
vmiota,vmidx,vfalu,vfmul,vfminmax,vfdiv,\
vfwalu,vfwmul,vfsqrt,vfrecp,vfsgnj,vfcmp,\
vfmerge,vfcvtitof,vfcvtftoi,vfwcvtitof,\
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c
new file mode 100644
index 00000000000..6c86f29e7d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#define N 32
+
+/* Simple condition reduction. */
+
+int __attribute__ ((noinline, noclone))
+condition_reduction (int *a, int min_v)
+{
+ int last = 66; /* High start value. */
+
+ for (int i = 0; i < N; i++)
+ if (a[i] < min_v)
+ last = i;
+
+ return last;
+}
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c
new file mode 100644
index 00000000000..c5fe5204763
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#include "extract_last-9.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c
new file mode 100644
index 00000000000..85547c8bd76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#define N 32
+
+#ifndef TYPE
+#define TYPE float
+#endif
+
+/* Non-integer data types. */
+
+TYPE __attribute__ ((noinline, noclone))
+condition_reduction (TYPE *a, TYPE min_v)
+{
+ TYPE last = 0;
+
+ for (int i = 0; i < N; i++)
+ if (a[i] < min_v)
+ last = a[i];
+
+ return last;
+}
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c
new file mode 100644
index 00000000000..c165cb33ce4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#include "extract_last-11.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c
new file mode 100644
index 00000000000..9a04af6c266
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#define TYPE double
+#include "extract_last-11.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c
new file mode 100644
index 00000000000..88f8a4c056a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#include "extract_last-13.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c
new file mode 100644
index 00000000000..b1eea0db0cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#include "extract_last-1.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c
new file mode 100644
index 00000000000..2c94ef58a47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#include <stdint-gcc.h>
+
+#if !defined(TYPE)
+#define TYPE uint32_t
+#endif
+
+#define N 254
+
+/* Non-simple condition reduction. */
+
+TYPE __attribute__ ((noinline, noclone))
+condition_reduction (TYPE *a, TYPE min_v)
+{
+ TYPE last = 65;
+
+ for (TYPE i = 0; i < N; i++)
+ if (a[i] < min_v)
+ last = a[i];
+
+ return last;
+}
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c
new file mode 100644
index 00000000000..a9ac667edd3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#include "extract_last-3.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c
new file mode 100644
index 00000000000..dc7fa639786
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#define TYPE uint8_t
+
+#include "extract_last-3.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c
new file mode 100644
index 00000000000..4e434a1813d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#include "extract_last-5.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c
new file mode 100644
index 00000000000..e75e9b21ed3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#define TYPE int16_t
+
+#include "extract_last-3.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c
new file mode 100644
index 00000000000..a37eb26f5a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#include "extract_last-7.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c
new file mode 100644
index 00000000000..c7ae0d747cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model -fdump-tree-optimized" } */
+
+#define TYPE uint64_t
+
+#include "extract_last-3.c"
+
+/* { dg-final { scan-tree-dump "\.LEN_FOLD_EXTRACT_LAST" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c
new file mode 100644
index 00000000000..c2083455cde
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c
@@ -0,0 +1,22 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model" } */
+
+#include "extract_last-1.c"
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+ int a[N] = {
+ 11, -12, 13, 14, 15, 16, 17, 18, 19, 20,
+ 1, 2, -3, 4, 5, 6, 7, -8, 9, 10,
+ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+ 31, 32
+ };
+
+ int ret = condition_reduction (a, 1);
+
+ if (ret != 17)
+ __builtin_abort ();
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c
new file mode 100644
index 00000000000..7ff435df4f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c
@@ -0,0 +1,4 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "extract_last_run-9.c"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c
new file mode 100644
index 00000000000..99af6b3287e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c
@@ -0,0 +1,22 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model" } */
+
+#include "extract_last-11.c"
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+ float a[N] = {
+ 11.5, 12.2, 13.22, 14.1, 15.2, 16.3, 17, 18.7, 19, 20,
+ 1, 2, 3.3, 4.3333, 5.5, 6.23, 7, 8.63, 9, 10.6,
+ 21, 22.12, 23.55, 24.76, 25, 26, 27.34, 28.765, 29, 30,
+ 31.111, 32.322
+ };
+
+ float ret = condition_reduction (a, 16.7);
+
+ if (ret != (float) 10.6)
+ __builtin_abort ();
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c
new file mode 100644
index 00000000000..43d1c765bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c
@@ -0,0 +1,4 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "extract_last_run-11.c"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c
new file mode 100644
index 00000000000..1e418964d68
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c
@@ -0,0 +1,22 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model" } */
+
+#include "extract_last-13.c"
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+ double a[N] = {
+ 11.5, 12.2, 13.22, 14.1, 15.2, 16.3, 17, 18.7, 19, 20,
+ 1, 2, 3.3, 4.3333, 5.5, 6.23, 7, 8.63, 9, 10.6,
+ 21, 22.12, 23.55, 24.76, 25, 26, 27.34, 28.765, 29, 30,
+ 31.111, 32.322
+ };
+
+ double ret = condition_reduction (a, 16.7);
+
+ if (ret != 10.6)
+ __builtin_abort ();
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c
new file mode 100644
index 00000000000..535b68f0f22
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c
@@ -0,0 +1,4 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "extract_last_run-13.c"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c
new file mode 100644
index 00000000000..ee53bf9ccab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c
@@ -0,0 +1,4 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "extract_last_run-1.c"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c
new file mode 100644
index 00000000000..ff2b0258dc8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model" } */
+
+#include "extract_last-3.c"
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+ TYPE a[N] = {
+ 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+ 31, 32
+ };
+ __builtin_memset (a + 32, 43, (N - 32) * sizeof (TYPE));
+
+ TYPE ret = condition_reduction (a, 16);
+
+ if (ret != 10)
+ __builtin_abort ();
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c
new file mode 100644
index 00000000000..4b5b6332adf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c
@@ -0,0 +1,4 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "extract_last_run-3.c"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c
new file mode 100644
index 00000000000..7b66b24261f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model" } */
+
+#include "extract_last-5.c"
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+ TYPE a[N] = {
+ 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+ 31, 32
+ };
+ __builtin_memset (a + 32, 43, (N - 32) * sizeof (TYPE));
+
+ TYPE ret = condition_reduction (a, 16);
+
+ if (ret != 10)
+ __builtin_abort ();
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c
new file mode 100644
index 00000000000..a52eac92487
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c
@@ -0,0 +1,4 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "extract_last_run-5.c"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c
new file mode 100644
index 00000000000..a1ac4a5dfb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model" } */
+
+#include "extract_last-7.c"
+
+extern void abort (void) __attribute__ ((noreturn));
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+ TYPE a[N] = {
+ 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+ 31, 32
+ };
+ __builtin_memset (a + 32, 43, (N - 32) * sizeof (TYPE));
+
+ TYPE ret = condition_reduction (a, 16);
+
+ if (ret != 10)
+ abort ();
+
+ return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c
new file mode 100644
index 00000000000..56858f99921
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c
@@ -0,0 +1,4 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "extract_last_run-7.c"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c
new file mode 100644
index 00000000000..67672bfb705
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=fixed-vlmax -fno-vect-cost-model" } */
+
+#include "extract_last-9.c"
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+ TYPE a[N] = {
+ 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
+ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
+ 31, 32
+ };
+ __builtin_memset (a + 32, 43, (N - 32) * sizeof (TYPE));
+
+ TYPE ret = condition_reduction (a, 16);
+
+ if (ret != 10)
+ __builtin_abort ();
+
+ return 0;
+}
--
2.36.3
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization
2023-08-24 2:19 [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization Juzhe-Zhong
@ 2023-08-24 9:13 ` Robin Dapp
2023-08-24 9:33 ` 钟居哲
2023-08-24 9:38 ` 钟居哲
0 siblings, 2 replies; 4+ messages in thread
From: Robin Dapp @ 2023-08-24 9:13 UTC (permalink / raw)
To: Juzhe-Zhong, gcc-patches; +Cc: rdapp.gcc, kito.cheng, kito.cheng, jeffreyalaw
Hi Juzhe,
> vcpop.m a5,v0
> beq a5,zero,.L3
> addi a5,a5,-1
> vsetvli a4,zero,e32,m1,ta,ma
> vcompress.vm v2,v3,v0
> vslidedown.vx v2,v2,a5
> vmv.x.s a0,v2
> .L3:
> sext.w a0,a0
Mhm, where is this sext coming from? I thought I had this covered with
the autovec-opt pattern, but apparently not. I'll take care of that; it's
unrelated to this patch.
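For context, a minimal C model of the two instructions involved, assuming RV64 (XLEN = 64) and SEW = 32. Per the RVV 1.0 specification, vmv.x.s sign-extends an SEW-wide element to XLEN bits, and sext.w (the pseudo-instruction for addiw rd, rs, 0) sign-extends the low 32 bits of a register. Applying sext.w to a value that vmv.x.s has already produced therefore cannot change it, which is presumably why it looks redundant here. The function names are hypothetical helpers, not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of sext.w on RV64: sign-extend the low 32 bits of a 64-bit
   register.  (GCC defines the uint32_t -> int32_t conversion as
   modulo 2^32, so this is well defined with GCC.)  */
static int64_t
model_sext_w (uint64_t reg)
{
  return (int64_t) (int32_t) (uint32_t) reg;
}

/* Model of vmv.x.s with SEW = 32 on RV64: element 0 of the source
   "vector register" VEC is copied to a scalar register and
   sign-extended to XLEN bits.  */
static int64_t
model_vmv_x_s_e32 (const int32_t *vec)
{
  assert (vec);
  return (int64_t) vec[0];
}
```

Feeding the result of model_vmv_x_s_e32 into model_sext_w returns the same value for every input, which is the property that would let the sext.w be dropped on the extraction path.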
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -213,7 +213,7 @@ public:
> {
> /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
> the vsetvli to obtain the value of vlmax. */
> - poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
> + poly_uint64 nunits = GET_MODE_NUNITS (m_mask_mode);
Why is that necessary? Just for the popcount, I presume?
Couldn't we instead add a new case for a scalar destination? I find
the code a bit misleading now, as we check m_dest_mode and then don't
use it.
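To illustrate the question, a hypothetical standalone model (none of these names exist in riscv-v.cc). For a scalar integer mode GET_MODE_NUNITS is 1, so when the destination of vcpop.m is a scalar register, deriving the VLMAX-based AVL from the destination mode would presumably yield the wrong element count; the mask mode is the one that carries it. The sketch keeps the destination mode for vector destinations and adds an explicit scalar-destination case, as suggested above. It also shows the constraint behind the vsetivli optimization: vsetivli encodes the AVL as a 5-bit unsigned immediate (0 to 31).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a machine mode; not a GCC type.  */
struct mode_info
{
  const char *name;
  unsigned nunits;  /* Element count; 1 for scalar modes, like GET_MODE_NUNITS.  */
  bool is_vector;
};

/* Element count used to derive VLMAX.  Vector destinations keep using
   the destination mode; a scalar destination (e.g. the vcpop.m result)
   gets its own case and uses the mask mode instead of nunits == 1.  */
static unsigned
vlmax_nunits (const struct mode_info *dest, const struct mode_info *mask)
{
  assert (mask->is_vector);
  if (!dest->is_vector)
    return mask->nunits;
  return dest->nunits;
}

/* vsetivli takes the AVL as a 5-bit unsigned immediate (0..31); a larger
   constant has to be materialized with li and passed to vsetvli.  */
static bool
can_use_vsetivli (unsigned avl)
{
  return avl <= 31;
}
```

With the 32-element mask from the fixed-vlmax example in the original post, the count is 32, which is outside the vsetivli range; that matches the li a5,32 / vsetvli pair emitted before vcpop.m there.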
>
> +/* Emit vcpop.m instruction. */
> +
> +static void
> +emit_cpop_insn (unsigned icode, rtx *ops, rtx len)
> +{
> + machine_mode dest_mode = GET_MODE (ops[0]);
> + machine_mode mask_mode = GET_MODE (ops[1]);
> + insn_expander<RVV_INSN_OPERANDS_MAX> e (RVV_CPOP,
> + /* HAS_DEST_P */ true,
> + /* FULLY_UNMASKED_P */ true,
> + /* USE_REAL_MERGE_P */ true,
> + /* HAS_AVL_P */ true,
> + /* VLMAX_P */ len ? false : true,
> + dest_mode, mask_mode);
> +
> + e.set_vl (len);
> + e.emit_insn ((enum insn_code) icode, ops);
> +}
The use_real_merge flag looked odd to me here because there is
nothing to merge. But in the end it only serves to omit the vundef
operand, so it's good for now. There are an increasing number of
opportunities to refactor riscv-v.cc, though ;)
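As a reference for what emit_cpop_insn produces, a scalar C model of an unmasked vcpop.m, assuming the RVV 1.0 mask layout (mask element i lives in bit i of the mask register). The instruction counts the set mask bits among the first vl elements and writes that count to a scalar register, which is why the destination is scalar and there is no vector to merge into. The cap of vl at 64 only keeps the mask in one uint64_t; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of an unmasked vcpop.m rd, vs2: count the set bits of the
   mask among elements 0 .. vl-1.  Bits at or above vl (the tail) are not
   counted.  Limited to vl <= 64 for this sketch.  */
static unsigned
model_vcpop_m (uint64_t mask, unsigned vl)
{
  assert (vl <= 64);
  unsigned count = 0;
  for (unsigned i = 0; i < vl; i++)
    count += (unsigned) ((mask >> i) & 1u);
  return count;
}
```

In the fixed-vlmax code from the original post, this count is what beq compares against zero and what addi turns into the vslidedown.vx offset.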
Note that my machine crashed when compiling extract_last-14.c
because it used up all my RAM. The vsetvl "refactor" phase 3 patch
helped, though, so we'd need to make this patch depend on that one.
The rest looks good to me. At first I was a bit wary of the
branching zero check after the popcount, but as we're outside of a
loop anyway, that's fine. We might want to use a conditional select
in the future, but that's not really important.
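The sequence under discussion can be modelled in scalar C, assuming the semantics visible in the generated code quoted above: vcpop.m counts the active mask bits, a zero count keeps the previous (default) value, and otherwise vcompress.vm packs the active elements to the front, vslidedown.vx by count - 1 brings the last one to element 0, and vmv.x.s reads it out. The first function mirrors the branching form in the patch; the second is one possible select-based variant along the lines of the conditional-select idea mentioned above. Both are illustrative sketches; all names, and the MAX_LEN cap, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LEN 64   /* Arbitrary cap for this sketch.  */

/* Plays the role of vcpop.m and vcompress.vm together: copy the active
   elements of VEC (nonzero MASK entries among the first LEN) to the
   front of PACKED and return how many there were.  */
static size_t
compress_active (const int32_t *vec, const uint8_t *mask, size_t len,
                 int32_t packed[MAX_LEN])
{
  size_t count = 0;
  assert (len <= MAX_LEN);
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      packed[count++] = vec[i];
  return count;
}

/* Branching form, as in the generated code: a zero count skips the
   extraction and keeps DEFAULT_VAL.  */
static int32_t
extract_last_branch (const int32_t *vec, const uint8_t *mask, size_t len,
                     int32_t default_val)
{
  int32_t packed[MAX_LEN];
  size_t count = compress_active (vec, mask, len, packed);
  if (count == 0)
    return default_val;
  /* vslidedown.vx by count - 1, then vmv.x.s reads element 0.  */
  return packed[count - 1];
}

/* Select form: always read a candidate (slot 0 when nothing is active,
   hence the zero-initialized PACKED) and choose between it and
   DEFAULT_VAL at the end.  */
static int32_t
extract_last_select (const int32_t *vec, const uint8_t *mask, size_t len,
                     int32_t default_val)
{
  int32_t packed[MAX_LEN] = { 0 };
  size_t count = compress_active (vec, mask, len, packed);
  int32_t candidate = packed[count - (count != 0)];
  return count != 0 ? candidate : default_val;
}
```

With VEC holding the indices 0, 1, 2, ... (as produced by vid.v) and MASK holding a[i] < min_v, both functions reproduce condition_reduction from the original post, including the default of 66 when no element matches.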
Regards
Robin
* Re: Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization
2023-08-24 9:13 ` Robin Dapp
@ 2023-08-24 9:33 ` 钟居哲
2023-08-24 9:38 ` 钟居哲
1 sibling, 0 replies; 4+ messages in thread
From: 钟居哲 @ 2023-08-24 9:33 UTC (permalink / raw)
To: rdapp.gcc, gcc-patches; +Cc: rdapp.gcc, kito.cheng, kito.cheng, Jeff Law
>> Why is that necessary? Just for the popcount I presume?
>> Can't we rather have a new case for a scalar destination? I find
>> the code a bit misleading now as we check m_dest_mode and then not
>> use it.
I will fix it in V2.
>> The rest looks good to me. Note that my machine crashed when
>> compiling the extract_last-14.c because it used up all my RAM.
>> The vsetvl "refactor" phase 3 patch helped, though.
>> We'd need to have this patch depend on the other one then.
Yes. The refactor patch fixes potential bugs. I will commit it tomorrow
if Kito has no more comments.
juzhe.zhong@rivai.ai
* Re: Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization
2023-08-24 9:13 ` Robin Dapp
2023-08-24 9:33 ` 钟居哲
@ 2023-08-24 9:38 ` 钟居哲
1 sibling, 0 replies; 4+ messages in thread
From: 钟居哲 @ 2023-08-24 9:38 UTC (permalink / raw)
To: rdapp.gcc, gcc-patches; +Cc: rdapp.gcc, kito.cheng, kito.cheng, Jeff Law
>> The use_real_merge just appeared odd to me here because there is
>> nothing to merge. But in the end it's just to omit the vundef operand
>> so good for now. There is an increasing number of opportunities to
>> refactor in riscv-v.cc, though ;)
How about renaming use_real_merge to use_dummy_merge, and adding the
undef merge operand only when it is true? That is, change:
  if (!m_use_real_merge_p)
    add_vundef_operand ();
into:
  if (m_use_dummy_merge_p)
    add_vundef_operand ();
That would avoid the confusion.
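A minimal sketch of why the rename suggested above is behavior-preserving, using a hypothetical cut-down operand builder (the real insn_expander in riscv-v.cc has many more members, and the operand names and order here are placeholders): inverting the flag's meaning at the single test site leaves the operand list unchanged, provided every caller also passes the inverted value.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_OPS 8

/* Hypothetical stand-in for the expander's operand list.  */
struct op_list
{
  const char *ops[MAX_OPS];
  int n;
};

static void
add_op (struct op_list *l, const char *op)
{
  assert (l->n < MAX_OPS);
  l->ops[l->n++] = op;
}

/* Current spelling: add the undef merge operand unless a real one is used.  */
static void
build_old (struct op_list *l, bool use_real_merge_p)
{
  add_op (l, "dest");
  if (!use_real_merge_p)
    add_op (l, "vundef");
  add_op (l, "src");
}

/* Proposed spelling: the flag names the case that adds the operand.  */
static void
build_new (struct op_list *l, bool use_dummy_merge_p)
{
  add_op (l, "dest");
  if (use_dummy_merge_p)
    add_op (l, "vundef");
  add_op (l, "src");
}

/* True if both lists hold the same operands in the same order.  */
static bool
same_ops (const struct op_list *a, const struct op_list *b)
{
  if (a->n != b->n)
    return false;
  for (int i = 0; i < a->n; i++)
    if (strcmp (a->ops[i], b->ops[i]) != 0)
      return false;
  return true;
}
```

Calling build_old with a flag value and build_new with its negation yields identical lists, so emit_cpop_insn would pass false for the new flag, which reads naturally since vcpop.m has nothing to merge.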
juzhe.zhong@rivai.ai