* [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
@ 2023-09-18 15:36 Lehua Ding
2023-09-18 21:18 ` Robin Dapp
0 siblings, 1 reply; 8+ messages in thread
From: Lehua Ding @ 2023-09-18 15:36 UTC (permalink / raw)
To: gcc-patches
Cc: juzhe.zhong, kito.cheng, rdapp.gcc, palmer, jeffreyalaw, lehua.ding
This patch support combining cond extend and reduce_sum to cond widen reduce_sum
like combine the following three insns:
(set (reg:RVVM2HI 149)
(const_vector:RVVM2HI repeat [
(const_int 0)
]))
(set (reg:RVVM2HI 138)
(if_then_else:RVVM2HI
(reg:RVVMF8BI 135)
(reg:RVVM2HI 148)
(reg:RVVM2HI 149)))
(set (reg:HI 150)
(unspec:HI [
(reg:RVVM2HI 138)
] UNSPEC_REDUC_SUM))
into one insn:
(set (reg:SI 147)
(unspec:SI [
(if_then_else:RVVM2SI
(reg:RVVMF16BI 135)
(sign_extend:RVVM2SI (reg:RVVM1HI 136))
(const_vector:RVVM2SI repeat [
(const_int 0)
]))
] UNSPEC_REDUC_SUM))
Consider the following C code:
int16_t foo (int8_t *restrict a, int8_t *restrict pred)
{
int16_t sum = 0;
for (int i = 0; i < 16; i += 1)
if (pred[i])
sum += a[i];
return sum;
}
assembly before this patch:
foo:
vsetivli zero,16,e16,m2,ta,ma
li a5,0
vmv.v.i v2,0
vsetvli zero,zero,e8,m1,ta,ma
vl1re8.v v0,0(a1)
vmsne.vi v0,v0,0
vsetvli zero,zero,e16,m2,ta,mu
vle8.v v4,0(a0),v0.t
vmv.s.x v1,a5
vsext.vf2 v2,v4,v0.t
vredsum.vs v2,v2,v1
vmv.x.s a0,v2
slliw a0,a0,16
sraiw a0,a0,16
ret
assembly after this patch:
foo:
li a5,0
vsetivli zero,16,e16,m1,ta,ma
vmv.s.x v3,a5
vsetivli zero,16,e8,m1,ta,ma
vl1re8.v v0,0(a1)
vmsne.vi v0,v0,0
vle8.v v2,0(a0),v0.t
vwredsum.vs v1,v2,v3,v0.t
vsetivli zero,0,e16,m1,ta,ma
vmv.x.s a0,v1
slliw a0,a0,16
sraiw a0,a0,16
ret
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_widen_reduc_plus_scal_<mode>):
New combine patterns.
* config/riscv/autovec.md (vcond_mask_<mode><vm>):
Split vcond_mask pattern into three patterns.
(vec_duplicate_const_0<mode>): Ditto.
(*vcond_mask_<mode><vm>): Ditto.
* config/riscv/predicates.md (vector_register_or_const_0_operand): New.
* config/riscv/riscv-protos.h (enum insn_type): Add REDUCE_OP_M.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c: New test.
---
gcc/config/riscv/autovec-opt.md | 48 +++++++++++++++
gcc/config/riscv/autovec.md | 59 ++++++++++++++++++-
gcc/config/riscv/predicates.md | 5 ++
gcc/config/riscv/riscv-protos.h | 1 +
.../rvv/autovec/cond/cond_widen_reduc-1.c | 30 ++++++++++
.../rvv/autovec/cond/cond_widen_reduc_run-1.c | 28 +++++++++
6 files changed, 170 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index b47bae16193..eefa4f28a0a 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1284,6 +1284,54 @@
}
[(set_attr "type" "vector")])
+;; Combine mask extend + vredsum to mask vwredsum[u]
+(define_insn_and_split "*cond_widen_reduc_plus_scal_<mode>"
+ [(set (match_operand:<V_DOUBLE_EXTEND_VEL> 0 "register_operand")
+ (unspec:<V_DOUBLE_EXTEND_VEL> [
+ (if_then_else:<V_DOUBLE_EXTEND>
+ (match_operand:<VM> 1 "register_operand")
+ (any_extend:<V_DOUBLE_EXTEND>
+ (match_operand:VI_QHS_NO_M8 2 "register_operand"))
+ (match_operand:<V_DOUBLE_EXTEND> 3 "vector_const_0_operand"))
+ ] UNSPEC_REDUC_SUM))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+{
+ rtx ops[] = {operands[0], operands[2], operands[1],
+ gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
+ riscv_vector::expand_reduction (<WREDUC_UNSPEC>,
+ riscv_vector::REDUCE_OP_M,
+ ops, CONST0_RTX (<V_DOUBLE_EXTEND_VEL>mode));
+ DONE;
+}
+[(set_attr "type" "vector")])
+
+;; Combine mask extend + vfredsum to mask vfwredusum
+(define_insn_and_split "*cond_widen_reduc_plus_scal_<mode>"
+ [(set (match_operand:<V_DOUBLE_EXTEND_VEL> 0 "register_operand")
+ (unspec:<V_DOUBLE_EXTEND_VEL> [
+ (if_then_else:<V_DOUBLE_EXTEND>
+ (match_operand:<VM> 1 "register_operand")
+ (float_extend:<V_DOUBLE_EXTEND>
+ (match_operand:VF_HS_NO_M8 2 "register_operand"))
+ (match_operand:<V_DOUBLE_EXTEND> 3 "vector_const_0_operand"))
+ ] UNSPEC_REDUC_SUM_UNORDERED))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+{
+ rtx ops[] = {operands[0], operands[2], operands[1],
+ gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
+ riscv_vector::expand_reduction (UNSPEC_WREDUC_SUM_UNORDERED,
+ riscv_vector::REDUCE_OP_M_FRM_DYN,
+ ops, CONST0_RTX (<V_DOUBLE_EXTEND_VEL>mode));
+ DONE;
+}
+[(set_attr "type" "vector")])
+
;; =============================================================================
;; Misc combine patterns
;; =============================================================================
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 493d5745485..20a71ad8ced 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -545,7 +545,64 @@
;; - vfmerge.vf
;; -------------------------------------------------------------------------
-(define_insn_and_split "vcond_mask_<mode><vm>"
+;; The purpose of splitting the original pattern into three patterns here is
+;; to combine the following three insns:
+;; (set (reg:RVVM2HI 149)
+;; (const_vector:RVVM2HI repeat [
+;; (const_int 0)
+;; ]))
+;; (set (reg:RVVM2HI 138)
+;; (if_then_else:RVVM2HI
+;; (reg:RVVMF8BI 135)
+;; (reg:RVVM2HI 148)
+;; (reg:RVVM2HI 149)))
+;; (set (reg:HI 150)
+;; (unspec:HI [
+;; (reg:RVVM2HI 138)
+;; ] UNSPEC_REDUC_SUM))
+;;
+;; into one insn:
+;;
+;; (set (reg:SI 147)
+;; (unspec:SI [
+;; (if_then_else:RVVM2SI
+;; (reg:RVVMF16BI 135)
+;; (sign_extend:RVVM2SI (reg:RVVM1HI 136))
+;; (const_vector:RVVM2SI repeat [
+;; (const_int 0)
+;; ]))
+;; ] UNSPEC_REDUC_SUM))
+
+(define_expand "vcond_mask_<mode><vm>"
+ [(set (match_operand:V_VLS 0 "register_operand")
+ (if_then_else:V_VLS
+ (match_operand:<VM> 3 "register_operand")
+ (match_operand:V_VLS 1 "nonmemory_operand")
+ (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
+ "TARGET_VECTOR"
+ {
+ if (satisfies_constraint_Wc0 (operands[2]))
+ {
+ rtx reg = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));
+ operands[2] = reg;
+ }
+ })
+
+(define_insn_and_split "vec_duplicate_const_0<mode>"
+ [(set (match_operand:V_VLS 0 "register_operand")
+ (match_operand:V_VLS 1 "vector_const_0_operand"))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+{
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+}
+ [(set_attr "type" "vector")])
+
+(define_insn_and_split "*vcond_mask_<mode><vm>"
[(set (match_operand:V_VLS 0 "register_operand")
(if_then_else:V_VLS
(match_operand:<VM> 3 "register_operand")
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 4bc7ff2c9d8..6abf9d97958 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -463,6 +463,11 @@
(ior (match_operand 0 "register_operand")
(match_code "const_vector")))
+(define_predicate "vector_register_or_const_0_operand"
+ (ior (match_operand 0 "register_operand")
+ (and (match_code "const_vector")
+ (match_test "satisfies_constraint_Wc0 (op)"))))
+
(define_predicate "vector_gs_scale_operand_16"
(and (match_code "const_int")
(match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5a2d218d67b..fd6107ccb5c 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -337,6 +337,7 @@ enum insn_type : unsigned int
/* For vreduce, no mask policy operand. */
REDUCE_OP = __NORMAL_OP_TA | BINARY_OP_P | VTYPE_MODE_FROM_OP1_P,
+ REDUCE_OP_M = __MASK_OP_TA | BINARY_OP_P | VTYPE_MODE_FROM_OP1_P,
REDUCE_OP_FRM_DYN = REDUCE_OP | FRM_DYN_P | VTYPE_MODE_FROM_OP1_P,
REDUCE_OP_M_FRM_DYN
= __MASK_OP_TA | BINARY_OP_P | FRM_DYN_P | VTYPE_MODE_FROM_OP1_P,
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c
new file mode 100644
index 00000000000..22a71048684
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh_zvl128b -mabi=lp64d --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math" } */
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2, N) \
+ __attribute__ ((noipa)) \
+ TYPE1 reduc_##TYPE1##_##TYPE2 (TYPE2 *restrict a, TYPE2 *restrict pred) \
+ { \
+ TYPE1 sum = 0; \
+ for (int i = 0; i < N; i += 1) \
+ if (pred[i]) \
+ sum += a[i]; \
+ return sum; \
+ }
+
+#define TEST_ALL(TEST) \
+ TEST (int16_t, int8_t, 16) \
+ TEST (int32_t, int16_t, 8) \
+ TEST (int64_t, int32_t, 4) \
+ TEST (uint16_t, uint8_t, 16) \
+ TEST (uint32_t, uint16_t, 8) \
+ TEST (uint64_t, uint32_t, 4) \
+ TEST (float, _Float16, 8) \
+ TEST (double, float, 4)
+
+TEST_ALL (TEST_TYPE)
+
+/* { dg-final { scan-assembler-times {\tvfwredusum\.vs\tv[0-9]+,v[0-9]+,v[0-9]+,v0\.t} 2 } } */
+/* { dg-final { scan-assembler-times {\tvwredsum\.vs\tv[0-9]+,v[0-9]+,v[0-9]+,v0\.t} 3 } } */
+/* { dg-final { scan-assembler-times {\tvwredsumu\.vs\tv[0-9]+,v[0-9]+,v[0-9]+,v0\.t} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
new file mode 100644
index 00000000000..fdb7e5249ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "cond_widen_reduc-1.c"
+
+#define RUN(TYPE1, TYPE2, N) \
+ { \
+ TYPE2 a[N]; \
+ TYPE2 pred[N]; \
+ TYPE1 r = 0; \
+ for (int i = 0; i < N; i++) \
+ { \
+ a[i] = (i * 0.1) * (i & 1 ? 1 : -1); \
+ pred[i] = i % 3; \
+ if (pred[i]) \
+ r += a[i]; \
+ asm volatile ("" ::: "memory"); \
+ } \
+ if (r != reduc_##TYPE1##_##TYPE2 (a, pred)) \
+ __builtin_abort (); \
+ }
+
+int __attribute__ ((optimize (1)))
+main ()
+{
+ TEST_ALL (RUN)
+ return 0;
+}
--
2.36.3
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
2023-09-18 15:36 [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum Lehua Ding
@ 2023-09-18 21:18 ` Robin Dapp
2023-09-19 2:50 ` Lehua Ding
0 siblings, 1 reply; 8+ messages in thread
From: Robin Dapp @ 2023-09-18 21:18 UTC (permalink / raw)
To: Lehua Ding, gcc-patches
Cc: rdapp.gcc, juzhe.zhong, kito.cheng, palmer, jeffreyalaw
Hi Lehua,
> +(define_expand "vcond_mask_<mode><vm>"
> + [(set (match_operand:V_VLS 0 "register_operand")
> + (if_then_else:V_VLS
> + (match_operand:<VM> 3 "register_operand")
> + (match_operand:V_VLS 1 "nonmemory_operand")
> + (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
> + "TARGET_VECTOR"
Would it hurt to allow any nonmemory operand here and just force the
"unsupported" constants into a register?
> + {
> + if (satisfies_constraint_Wc0 (operands[2]))
> + {
> + rtx reg = gen_reg_rtx (<MODE>mode);
> + emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));
Can't we emit a move_insn directly without going through the new pattern?
Or will that be optimized away in between? And the new pattern isn't
actually a duplicate but a move anyway so maybe a force_reg (operands[2]) would
already do? Probably initial values other than 0 don't work out of the box?
In any case it wouldn't hurt to describe the "design decisions" (i.e. why
we need one thing and not another) so it's easier to follow the patterns
in the future.
Regards
Robin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
2023-09-18 21:18 ` Robin Dapp
@ 2023-09-19 2:50 ` Lehua Ding
2023-09-19 13:48 ` Robin Dapp
0 siblings, 1 reply; 8+ messages in thread
From: Lehua Ding @ 2023-09-19 2:50 UTC (permalink / raw)
To: Robin Dapp, gcc-patches; +Cc: juzhe.zhong, kito.cheng, palmer, jeffreyalaw
Hi Robin,
>> +(define_expand "vcond_mask_<mode><vm>"
>> + [(set (match_operand:V_VLS 0 "register_operand")
>> + (if_then_else:V_VLS
>> + (match_operand:<VM> 3 "register_operand")
>> + (match_operand:V_VLS 1 "nonmemory_operand")
>> + (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
>> + "TARGET_VECTOR"
>
> Would it hurt to allow any nonmemory operand here and just force the
> "unsupported" constants into a register?
Are you talking about why operand 2 doesn't use nonmemory_operand
predicate? If so, I think this is because our vmerge.v[vxi]m insns only
supports that operand 1 is a scalar and operand 2 must be a vector register.
>
>> + {
>> + if (satisfies_constraint_Wc0 (operands[2]))
>> + {
>> + rtx reg = gen_reg_rtx (<MODE>mode);
>> + emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));
>
> Can't we emit a move_insn directly without going through the new pattern?
> Or will that be optimized away in between?
Currently, the move_insn of moving vector const 0 to a vector register
will produce the bellow rtl from vector.md, I hope keep the simple
pattern before split1 pass:
(set (reg:RVVM2HI 149)
(if_then_else:RVVM2HI (unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg:DI 146)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2HI repeat [
(const_int 0 [0])
])
(unspec:RVVM2HI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
> And the new pattern isn't
> actually a duplicate but a move anyway so maybe a force_reg (operands[2]) would
> already do? Probably initial values other than 0 don't work out of the box?
Yes, the name vec_duplicate_const_0 is missleading, maybe change it to
mov_vec_const_0 is better.
> In any case it wouldn't hurt to describe the "design decisions" (i.e. why
> we need one thing and not another) so it's easier to follow the patterns
> in the future.
Yes, I should have described the design in more detail. Before this
patch, if I wanted to implement the current combine optimization, I
would need to create a combine pattern that combines the following three
instructions:
(set (reg:RVVM2HI 149)
(if_then_else:RVVM2HI (unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg:DI 146)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2HI repeat [
(const_int 0 [0])
])
(unspec:RVVM2HI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
(set (reg:RVVM2HI 138 [ vect__ifc__38.17 ])
(if_then_else:RVVM2HI (reg:RVVMF8BI 135 [ mask__18.11 ])
(sign_extend:RVVM2HI (reg:RVVM1QI 136 [ vect__6.14 ]))
(reg:RVVM2HI 149)))
(set (reg:HI 151)
(unspec:HI [
(reg:RVVM2HI 138 [ vect__ifc__38.17 ])
] UNSPEC_REDUC_SUM))
into one insn:
(set (reg:SF 139 [ <retval> ])
(unspec:SF [
(if_then_else:RVVM2SF (reg:RVVMF16BI 135 [ mask__11.32 ])
(float_extend:RVVM2SF (reg:RVVM1HF 136 [ vect__7.35 ]))
(if_then_else:RVVM2SF (unspec:RVVMF16BI [
(const_vector:RVVMF16BI repeat [
(const_int 1 [0x1])
])
(reg:DI 143)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2SF repeat [
(const_double:SF 0.0 [0x0.0p+0])
])
(unspec:RVVM2SF [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
] UNSPEC_REDUC_SUM_UNORDERED))
But I think the combine pattern is ugly because the
move_vector_const_0_to_vector_registers has many extra operands we don't
care about. So I want to introduce a simple pattern of it. I have two
options here, one is to modify the generic mov pattern, changing
define_expand to define_insn_and_split to keep pattern simple. The
reason why I did not choose this one is that mov pattern uses a lot of
places, including reload pass. The impact is relatively big.
The other option (this patch chooses) is to split the vcond_mask pattern
for this particular case, allowing vec_const_0 to be passed into operand
2, and then generating it into a simple mov pattern. The purpose of
splitting the vcond_mask pattern into three patterns is to generate a
simple mov pattern. Simply allowing operand 2 of vcond_mask to allow vec
const 0 will invalidate other existing combine patterns because they
only allow the operand to be register. I wonder if these details explain
why?
--
Best,
Lehua
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
2023-09-19 2:50 ` Lehua Ding
@ 2023-09-19 13:48 ` Robin Dapp
2023-09-19 15:58 ` Lehua Ding
0 siblings, 1 reply; 8+ messages in thread
From: Robin Dapp @ 2023-09-19 13:48 UTC (permalink / raw)
To: Lehua Ding, gcc-patches
Cc: rdapp.gcc, juzhe.zhong, kito.cheng, palmer, jeffreyalaw
Hi Lehua,
>> Would it hurt to allow any nonmemory operand here and just force the
>> "unsupported" constants into a register?
>
> Are you talking about why operand 2 doesn't use nonmemory_operand
> predicate? If so, I think this is because our vmerge.v[vxi]m insns
> only supports that operand 1 is a scalar and operand 2 must be a
> vector register.
My question was rather:
Why doesn't something like
(define_insn_and_split "vcond_mask_<mode><vm>"
[(set (match_operand:V_VLS 0 "register_operand")
(if_then_else:V_VLS
(match_operand:<VM> 3 "register_operand")
(match_operand:V_VLS 1 "nonmemory_operand")
(match_operand:V_VLS 2 "nonmemory_operand")))]
"TARGET_VECTOR && can_create_pseudo_p ()"
"#"
"&& 1"
[(const_int 0)]
{
/* The order of vcond_mask is opposite to pred_merge. */
if (REG_P (operands[2]))
operands[2] = force_reg (<MODE>mode, operands[2]);
std::swap (operands[1], operands[2]);
riscv_vector::emit_vlmax_insn (code_for_pred_merge (<MODE>mode),
riscv_vector::MERGE_OP, operands);
DONE;
}
[(set_attr "type" "vector")]
)
suffice? You could disallow operands[2] != 0 if needed.
Regards
Robin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
2023-09-19 13:48 ` Robin Dapp
@ 2023-09-19 15:58 ` Lehua Ding
2023-09-19 22:02 ` Robin Dapp
0 siblings, 1 reply; 8+ messages in thread
From: Lehua Ding @ 2023-09-19 15:58 UTC (permalink / raw)
To: Robin Dapp, gcc-patches; +Cc: juzhe.zhong, kito.cheng, palmer, jeffreyalaw
Hi Robin,
>>> Would it hurt to allow any nonmemory operand here and just force the
>>> "unsupported" constants into a register?
>>
>> Are you talking about why operand 2 doesn't use nonmemory_operand
>> predicate? If so, I think this is because our vmerge.v[vxi]m insns
>> only supports that operand 1 is a scalar and operand 2 must be a
>> vector register.
>
> My question was rather:
>
> Why doesn't something like
>
> (define_insn_and_split "vcond_mask_<mode><vm>"
> [(set (match_operand:V_VLS 0 "register_operand")
> (if_then_else:V_VLS
> (match_operand:<VM> 3 "register_operand")
> (match_operand:V_VLS 1 "nonmemory_operand")
> (match_operand:V_VLS 2 "nonmemory_operand")))]
> "TARGET_VECTOR && can_create_pseudo_p ()"
> "#"
> "&& 1"
> [(const_int 0)]
> {
> /* The order of vcond_mask is opposite to pred_merge. */
> if (REG_P (operands[2]))
> operands[2] = force_reg (<MODE>mode, operands[2]);
> std::swap (operands[1], operands[2]);
> riscv_vector::emit_vlmax_insn (code_for_pred_merge (<MODE>mode),
> riscv_vector::MERGE_OP, operands);
> DONE;
> }
> [(set_attr "type" "vector")]
> )
>
> suffice? You could disallow operands[2] != 0 if needed.
I think I understand what you're saying. The reason for not doing this
is because simply allowing the operand 2 of vcond_mask to be vec_const_0
would cause the other combine patterns to fail because they require the
operand 2 of vcond_mask to be a register. Like the following existing
combine patterns (operand 2 as the instruction merge operand, is not
allowed to be non-register):
(define_insn_and_split "*cond_<optab><mode>"
[(set (match_operand:VF 0 "register_operand")
(if_then_else:VF
(match_operand:<VM> 1 "register_operand")
(any_float_unop:VF
(match_operand:VF 2 "register_operand"))
(match_operand:VF 3 "register_operand")))]
"TARGET_VECTOR && can_create_pseudo_p ()"
"#"
"&& 1"
[(const_int 0)]
{
insn_code icode = code_for_pred (<CODE>, <MODE>mode);
rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
riscv_vector::expand_cond_len_unop (icode, ops);
DONE;
}
[(set_attr "type" "vector")])
My current method is still to keep the operand 2 of vcond_mask as a
register, but the pattern of mov_vec_const_0 is simplified, so that the
corresponding combine pattern can be more simple. That's the only reason
I split the vcond_mask into three patterns.
--
Best,
Lehua
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
2023-09-19 15:58 ` Lehua Ding
@ 2023-09-19 22:02 ` Robin Dapp
2023-09-20 4:02 ` Lehua Ding
0 siblings, 1 reply; 8+ messages in thread
From: Robin Dapp @ 2023-09-19 22:02 UTC (permalink / raw)
To: Lehua Ding, gcc-patches
Cc: rdapp.gcc, juzhe.zhong, kito.cheng, palmer, jeffreyalaw
Hi Lehua,
thanks for the explanation.
> My current method is still to keep the operand 2 of vcond_mask as a
> register, but the pattern of mov_vec_const_0 is simplified, so that
> the corresponding combine pattern can be more simple. That's the only
> reason I split the vcond_mask into three patterns.
My "problem" with the separate split it that it really sticks out
and everybody seeing it would wonder why we need it. It's not that
bad of course but it appears as if we messed up somewhere else.
I checked and I don't see additional FAILs with the vmask pattern
that additionally allows a const0 operand (that is forced into a register)
and a force_reg in abs:VF.
Would you mind re-checking if we can avoid the extra
"vec_duplicate_const_0" by changing the other affected patterns
in a similar manner? I really didn't verify in-depth so if we needed
to add a force_reg to every pattern we might need to reconsider.
Still, I'd be unsure if I preferred the "vec_dup_const_0" over
additional force_regs ;)
Regards
Robin
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
2023-09-19 22:02 ` Robin Dapp
@ 2023-09-20 4:02 ` Lehua Ding
2023-09-20 8:03 ` Lehua Ding
0 siblings, 1 reply; 8+ messages in thread
From: Lehua Ding @ 2023-09-20 4:02 UTC (permalink / raw)
To: Robin Dapp, gcc-patches; +Cc: juzhe.zhong, kito.cheng, palmer, jeffreyalaw
On 2023/9/20 6:02, Robin Dapp wrote:
> Hi Lehua,
>
> thanks for the explanation.
>
>> My current method is still to keep the operand 2 of vcond_mask as a
>> register, but the pattern of mov_vec_const_0 is simplified, so that
>> the corresponding combine pattern can be more simple. That's the only
>> reason I split the vcond_mask into three patterns.
>
> My "problem" with the separate split it that it really sticks out
> and everybody seeing it would wonder why we need it. It's not that
> bad of course but it appears as if we messed up somewhere else.
Can not agree more.
> I checked and I don't see additional FAILs with the vmask pattern
> that additionally allows a const0 operand (that is forced into a register)
> and a force_reg in abs:VF.
>
> Would you mind re-checking if we can avoid the extra
> "vec_duplicate_const_0" by changing the other affected patterns
> in a similar manner? I really didn't verify in-depth so if we needed
> to add a force_reg to every pattern we might need to reconsider.
> Still, I'd be unsure if I preferred the "vec_dup_const_0" over
> additional force_regs ;)
I think that's a little weird, too. I prefer to add a single
define_insn_and_split move_const_0 pattern like following diff, How do
you feel about that?
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index f4dab9fceb8..c0ab96ae8ab 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -973,7 +973,11 @@ expand_const_vector (rtx target, rtx src)
rtx tmp = register_operand (target, mode) ? target : gen_reg_rtx
(mode);
/* Element in range -16 ~ 15 integer or 0.0 floating-point,
we use vmv.v.i instruction. */
- if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
+ /* For const int or float 0, we keep the simple pattern before split1
+ pass. */
+ if (can_create_pseudo_p () && satisfies_constraint_Wc0 (src))
+ emit_insn (gen_mov_vec_const_0 (mode, tmp, src));
+ else if (satisfies_constraint_vi (src))
{
rtx ops[] = {tmp, src};
emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f66ffebba24..b4973125d04 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1640,6 +1640,19 @@
"TARGET_VECTOR"
{})
+(define_insn_and_split "@mov_vec_const_0<mode>"
+ [(set (match_operand:V_VLS 0 "register_operand")
+ (match_operand:V_VLS 1 "vector_const_0_operand"))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+ {
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+ }
+ [(set_attr "type" "vector")])
+
;; vle.v/vse.v,vmv.v.v
(define_insn_and_split "*pred_mov<mode>"
[(set (match_operand:V_VLS 0 "nonimmediate_operand" "=vr,
vr, vd, m, vr, vr")
--
Best,
Lehua
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
2023-09-20 4:02 ` Lehua Ding
@ 2023-09-20 8:03 ` Lehua Ding
0 siblings, 0 replies; 8+ messages in thread
From: Lehua Ding @ 2023-09-20 8:03 UTC (permalink / raw)
To: Robin Dapp, gcc-patches; +Cc: juzhe.zhong, kito.cheng, palmer, jeffreyalaw
Hi Robin,
I have posted a V2 patch to implement the method I mentioned. I wonder
if that makes it a little easier to understand?
V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630985.html
On 2023/9/20 12:02, Lehua Ding wrote:
>
>
> On 2023/9/20 6:02, Robin Dapp wrote:
>> Hi Lehua,
>>
>> thanks for the explanation.
>>
>>> My current method is still to keep the operand 2 of vcond_mask as a
>>> register, but the pattern of mov_vec_const_0 is simplified, so that
>>> the corresponding combine pattern can be more simple. That's the only
>>> reason I split the vcond_mask into three patterns.
>>
>> My "problem" with the separate split it that it really sticks out
>> and everybody seeing it would wonder why we need it. It's not that
>> bad of course but it appears as if we messed up somewhere else.
>
> Can not agree more.
>
>> I checked and I don't see additional FAILs with the vmask pattern
>> that additionally allows a const0 operand (that is forced into a
>> register)
>> and a force_reg in abs:VF.
>>
>> Would you mind re-checking if we can avoid the extra
>> "vec_duplicate_const_0" by changing the other affected patterns
>> in a similar manner? I really didn't verify in-depth so if we needed
>> to add a force_reg to every pattern we might need to reconsider.
>> Still, I'd be unsure if I preferred the "vec_dup_const_0" over
>> additional force_regs ;)
>
> I think that's a little weird, too. I prefer to add a single
> define_insn_and_split move_const_0 pattern like following diff, How do
> you feel about that?
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index f4dab9fceb8..c0ab96ae8ab 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -973,7 +973,11 @@ expand_const_vector (rtx target, rtx src)
> rtx tmp = register_operand (target, mode) ? target : gen_reg_rtx
> (mode);
> /* Element in range -16 ~ 15 integer or 0.0 floating-point,
> we use vmv.v.i instruction. */
> - if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
> + /* For const int or float 0, we keep the simple pattern before
> split1
> + pass. */
> + if (can_create_pseudo_p () && satisfies_constraint_Wc0 (src))
> + emit_insn (gen_mov_vec_const_0 (mode, tmp, src));
> + else if (satisfies_constraint_vi (src))
> {
> rtx ops[] = {tmp, src};
> emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index f66ffebba24..b4973125d04 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -1640,6 +1640,19 @@
> "TARGET_VECTOR"
> {})
>
> +(define_insn_and_split "@mov_vec_const_0<mode>"
> + [(set (match_operand:V_VLS 0 "register_operand")
> + (match_operand:V_VLS 1 "vector_const_0_operand"))]
> + "TARGET_VECTOR && can_create_pseudo_p ()"
> + "#"
> + "&& 1"
> + [(const_int 0)]
> + {
> + emit_move_insn (operands[0], operands[1]);
> + DONE;
> + }
> + [(set_attr "type" "vector")])
> +
> ;; vle.v/vse.v,vmv.v.v
> (define_insn_and_split "*pred_mov<mode>"
> [(set (match_operand:V_VLS 0 "nonimmediate_operand" "=vr,
> vr, vd, m, vr, vr")
>
--
Best,
Lehua
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-09-20 8:03 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-18 15:36 [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum Lehua Ding
2023-09-18 21:18 ` Robin Dapp
2023-09-19 2:50 ` Lehua Ding
2023-09-19 13:48 ` Robin Dapp
2023-09-19 15:58 ` Lehua Ding
2023-09-19 22:02 ` Robin Dapp
2023-09-20 4:02 ` Lehua Ding
2023-09-20 8:03 ` Lehua Ding
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).