public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
@ 2023-09-18 15:36 Lehua Ding
  2023-09-18 21:18 ` Robin Dapp
  0 siblings, 1 reply; 8+ messages in thread
From: Lehua Ding @ 2023-09-18 15:36 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, rdapp.gcc, palmer, jeffreyalaw, lehua.ding

This patch support combining cond extend and reduce_sum to cond widen reduce_sum
like combine the following three insns:
  (set (reg:RVVM2HI 149)
       (const_vector:RVVM2HI repeat [
          (const_int 0)
       ]))
  (set (reg:RVVM2HI 138)
    (if_then_else:RVVM2HI
      (reg:RVVMF8BI 135)
      (reg:RVVM2HI 148)
      (reg:RVVM2HI 149)))
  (set (reg:HI 150)
    (unspec:HI [
      (reg:RVVM2HI 138)
    ] UNSPEC_REDUC_SUM))
into one insn:
  (set (reg:SI 147)
    (unspec:SI [
      (if_then_else:RVVM2SI
        (reg:RVVMF16BI 135)
        (sign_extend:RVVM2SI (reg:RVVM1HI 136))
        (const_vector:RVVM2SI repeat [
          (const_int 0)
        ]))
    ] UNSPEC_REDUC_SUM))

Consider the following C code:

int16_t foo (int8_t *restrict a, int8_t *restrict pred)
{
  int16_t sum = 0;
  for (int i = 0; i < 16; i += 1)
    if (pred[i])
      sum += a[i];
  return sum;
}

assembly before this patch:

foo:
        vsetivli        zero,16,e16,m2,ta,ma
        li      a5,0
        vmv.v.i v2,0
        vsetvli zero,zero,e8,m1,ta,ma
        vl1re8.v        v0,0(a1)
        vmsne.vi        v0,v0,0
        vsetvli zero,zero,e16,m2,ta,mu
        vle8.v  v4,0(a0),v0.t
        vmv.s.x v1,a5
        vsext.vf2       v2,v4,v0.t
        vredsum.vs      v2,v2,v1
        vmv.x.s a0,v2
        slliw   a0,a0,16
        sraiw   a0,a0,16
        ret

assembly after this patch:

foo:
	li	a5,0
	vsetivli	zero,16,e16,m1,ta,ma
	vmv.s.x	v3,a5
	vsetivli	zero,16,e8,m1,ta,ma
	vl1re8.v	v0,0(a1)
	vmsne.vi	v0,v0,0
	vle8.v	v2,0(a0),v0.t
	vwredsum.vs	v1,v2,v3,v0.t
	vsetivli	zero,0,e16,m1,ta,ma
	vmv.x.s	a0,v1
	slliw	a0,a0,16
	sraiw	a0,a0,16
	ret

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*cond_widen_reduc_plus_scal_<mode>):
	New combine patterns.
	* config/riscv/autovec.md (vcond_mask_<mode><vm>):
	Split vcond_mask pattern into three patterns.
	(vec_duplicate_const_0<mode>): Ditto.
	(*vcond_mask_<mode><vm>): Ditto.
	* config/riscv/predicates.md (vector_register_or_const_0_operand): New.
	* config/riscv/riscv-protos.h (enum insn_type): Add REDUCE_OP_M.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c: New test.
	* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c: New test.

---
 gcc/config/riscv/autovec-opt.md               | 48 +++++++++++++++
 gcc/config/riscv/autovec.md                   | 59 ++++++++++++++++++-
 gcc/config/riscv/predicates.md                |  5 ++
 gcc/config/riscv/riscv-protos.h               |  1 +
 .../rvv/autovec/cond/cond_widen_reduc-1.c     | 30 ++++++++++
 .../rvv/autovec/cond/cond_widen_reduc_run-1.c | 28 +++++++++
 6 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index b47bae16193..eefa4f28a0a 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1284,6 +1284,54 @@
 }
 [(set_attr "type" "vector")])

+;; Combine mask extend + vredsum to mask vwredsum[u]
+(define_insn_and_split "*cond_widen_reduc_plus_scal_<mode>"
+  [(set (match_operand:<V_DOUBLE_EXTEND_VEL> 0 "register_operand")
+        (unspec:<V_DOUBLE_EXTEND_VEL> [
+          (if_then_else:<V_DOUBLE_EXTEND>
+            (match_operand:<VM> 1 "register_operand")
+            (any_extend:<V_DOUBLE_EXTEND>
+              (match_operand:VI_QHS_NO_M8 2 "register_operand"))
+            (match_operand:<V_DOUBLE_EXTEND> 3 "vector_const_0_operand"))
+        ] UNSPEC_REDUC_SUM))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx ops[] = {operands[0], operands[2], operands[1],
+               gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
+  riscv_vector::expand_reduction (<WREDUC_UNSPEC>,
+                                  riscv_vector::REDUCE_OP_M,
+                                  ops, CONST0_RTX (<V_DOUBLE_EXTEND_VEL>mode));
+  DONE;
+}
+[(set_attr "type" "vector")])
+
+;; Combine mask extend + vfredsum to mask vfwredusum
+(define_insn_and_split "*cond_widen_reduc_plus_scal_<mode>"
+  [(set (match_operand:<V_DOUBLE_EXTEND_VEL> 0 "register_operand")
+        (unspec:<V_DOUBLE_EXTEND_VEL> [
+          (if_then_else:<V_DOUBLE_EXTEND>
+            (match_operand:<VM> 1 "register_operand")
+            (float_extend:<V_DOUBLE_EXTEND>
+              (match_operand:VF_HS_NO_M8 2 "register_operand"))
+            (match_operand:<V_DOUBLE_EXTEND> 3 "vector_const_0_operand"))
+        ] UNSPEC_REDUC_SUM_UNORDERED))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx ops[] = {operands[0], operands[2], operands[1],
+               gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
+  riscv_vector::expand_reduction (UNSPEC_WREDUC_SUM_UNORDERED,
+                                  riscv_vector::REDUCE_OP_M_FRM_DYN,
+                                  ops, CONST0_RTX (<V_DOUBLE_EXTEND_VEL>mode));
+  DONE;
+}
+[(set_attr "type" "vector")])
+
 ;; =============================================================================
 ;; Misc combine patterns
 ;; =============================================================================
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 493d5745485..20a71ad8ced 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -545,7 +545,64 @@
 ;; - vfmerge.vf
 ;; -------------------------------------------------------------------------

-(define_insn_and_split "vcond_mask_<mode><vm>"
+;; The purpose of splitting the original pattern into three patterns here is
+;; to combine the following three insns:
+;;   (set (reg:RVVM2HI 149)
+;;        (const_vector:RVVM2HI repeat [
+;;           (const_int 0)
+;;        ]))
+;;   (set (reg:RVVM2HI 138)
+;;     (if_then_else:RVVM2HI
+;;       (reg:RVVMF8BI 135)
+;;       (reg:RVVM2HI 148)
+;;       (reg:RVVM2HI 149)))
+;;   (set (reg:HI 150)
+;;     (unspec:HI [
+;;       (reg:RVVM2HI 138)
+;;     ] UNSPEC_REDUC_SUM))
+;;
+;; into one insn:
+;;
+;;   (set (reg:SI 147)
+;;     (unspec:SI [
+;;       (if_then_else:RVVM2SI
+;;         (reg:RVVMF16BI 135)
+;;         (sign_extend:RVVM2SI (reg:RVVM1HI 136))
+;;         (const_vector:RVVM2SI repeat [
+;;           (const_int 0)
+;;         ]))
+;;     ] UNSPEC_REDUC_SUM))
+
+(define_expand "vcond_mask_<mode><vm>"
+  [(set (match_operand:V_VLS 0 "register_operand")
+        (if_then_else:V_VLS
+          (match_operand:<VM> 3 "register_operand")
+          (match_operand:V_VLS 1 "nonmemory_operand")
+          (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
+  "TARGET_VECTOR"
+  {
+    if (satisfies_constraint_Wc0 (operands[2]))
+      {
+        rtx reg = gen_reg_rtx (<MODE>mode);
+        emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));
+        operands[2] = reg;
+      }
+  })
+
+(define_insn_and_split "vec_duplicate_const_0<mode>"
+  [(set (match_operand:V_VLS 0 "register_operand")
+        (match_operand:V_VLS 1 "vector_const_0_operand"))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  emit_move_insn (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vector")])
+
+(define_insn_and_split "*vcond_mask_<mode><vm>"
   [(set (match_operand:V_VLS 0 "register_operand")
         (if_then_else:V_VLS
           (match_operand:<VM> 3 "register_operand")
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 4bc7ff2c9d8..6abf9d97958 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -463,6 +463,11 @@
   (ior (match_operand 0 "register_operand")
        (match_code "const_vector")))

+(define_predicate "vector_register_or_const_0_operand"
+  (ior (match_operand 0 "register_operand")
+       (and (match_code "const_vector")
+            (match_test "satisfies_constraint_Wc0 (op)"))))
+
 (define_predicate "vector_gs_scale_operand_16"
   (and (match_code "const_int")
        (match_test "INTVAL (op) == 1 || INTVAL (op) == 2")))
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5a2d218d67b..fd6107ccb5c 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -337,6 +337,7 @@ enum insn_type : unsigned int

   /* For vreduce, no mask policy operand. */
   REDUCE_OP = __NORMAL_OP_TA | BINARY_OP_P | VTYPE_MODE_FROM_OP1_P,
+  REDUCE_OP_M = __MASK_OP_TA | BINARY_OP_P | VTYPE_MODE_FROM_OP1_P,
   REDUCE_OP_FRM_DYN = REDUCE_OP | FRM_DYN_P | VTYPE_MODE_FROM_OP1_P,
   REDUCE_OP_M_FRM_DYN
   = __MASK_OP_TA | BINARY_OP_P | FRM_DYN_P | VTYPE_MODE_FROM_OP1_P,
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c
new file mode 100644
index 00000000000..22a71048684
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh_zvl128b -mabi=lp64d --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math" } */
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2, N)                                             \
+  __attribute__ ((noipa))                                                      \
+  TYPE1 reduc_##TYPE1##_##TYPE2 (TYPE2 *restrict a, TYPE2 *restrict pred)      \
+  {                                                                            \
+    TYPE1 sum = 0;                                                             \
+    for (int i = 0; i < N; i += 1)                                             \
+      if (pred[i])                                                             \
+	sum += a[i];                                                           \
+    return sum;                                                                \
+  }
+
+#define TEST_ALL(TEST)                                                         \
+  TEST (int16_t, int8_t, 16)                                                   \
+  TEST (int32_t, int16_t, 8)                                                   \
+  TEST (int64_t, int32_t, 4)                                                   \
+  TEST (uint16_t, uint8_t, 16)                                                 \
+  TEST (uint32_t, uint16_t, 8)                                                 \
+  TEST (uint64_t, uint32_t, 4)                                                 \
+  TEST (float, _Float16, 8)                                                    \
+  TEST (double, float, 4)
+
+TEST_ALL (TEST_TYPE)
+
+/* { dg-final { scan-assembler-times {\tvfwredusum\.vs\tv[0-9]+,v[0-9]+,v[0-9]+,v0\.t} 2 } } */
+/* { dg-final { scan-assembler-times {\tvwredsum\.vs\tv[0-9]+,v[0-9]+,v[0-9]+,v0\.t} 3 } } */
+/* { dg-final { scan-assembler-times {\tvwredsumu\.vs\tv[0-9]+,v[0-9]+,v[0-9]+,v0\.t} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
new file mode 100644
index 00000000000..fdb7e5249ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c
@@ -0,0 +1,28 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "cond_widen_reduc-1.c"
+
+#define RUN(TYPE1, TYPE2, N)                                                   \
+  {                                                                            \
+    TYPE2 a[N];                                                                \
+    TYPE2 pred[N];                                                             \
+    TYPE1 r = 0;                                                               \
+    for (int i = 0; i < N; i++)                                                \
+      {                                                                        \
+	a[i] = (i * 0.1) * (i & 1 ? 1 : -1);                                   \
+	pred[i] = i % 3;                                                       \
+	if (pred[i])                                                           \
+	  r += a[i];                                                           \
+	asm volatile ("" ::: "memory");                                        \
+      }                                                                        \
+    if (r != reduc_##TYPE1##_##TYPE2 (a, pred))                                \
+      __builtin_abort ();                                                      \
+  }
+
+int __attribute__ ((optimize (1)))
+main ()
+{
+  TEST_ALL (RUN)
+  return 0;
+}
--
2.36.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
  2023-09-18 15:36 [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum Lehua Ding
@ 2023-09-18 21:18 ` Robin Dapp
  2023-09-19  2:50   ` Lehua Ding
  0 siblings, 1 reply; 8+ messages in thread
From: Robin Dapp @ 2023-09-18 21:18 UTC (permalink / raw)
  To: Lehua Ding, gcc-patches
  Cc: rdapp.gcc, juzhe.zhong, kito.cheng, palmer, jeffreyalaw

Hi Lehua,

> +(define_expand "vcond_mask_<mode><vm>"
> +  [(set (match_operand:V_VLS 0 "register_operand")
> +        (if_then_else:V_VLS
> +          (match_operand:<VM> 3 "register_operand")
> +          (match_operand:V_VLS 1 "nonmemory_operand")
> +          (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
> +  "TARGET_VECTOR"

Would it hurt to allow any nonmemory operand here and just force the
"unsupported" constants into a register?

> +  {
> +    if (satisfies_constraint_Wc0 (operands[2]))
> +      {
> +        rtx reg = gen_reg_rtx (<MODE>mode);
> +        emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));

Can't we emit a move_insn directly without going through the new pattern?
Or will that be optimized away in between?  And the new pattern isn't
actually a duplicate but a move anyway so maybe a force_reg (operands[2]) would
already do?  Probably initial values other than 0 don't work out of the box?

In any case it wouldn't hurt to describe the "design decisions" (i.e. why
we need one thing and not another) so it's easier to follow the patterns
in the future.

Regards
 Robin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
  2023-09-18 21:18 ` Robin Dapp
@ 2023-09-19  2:50   ` Lehua Ding
  2023-09-19 13:48     ` Robin Dapp
  0 siblings, 1 reply; 8+ messages in thread
From: Lehua Ding @ 2023-09-19  2:50 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches; +Cc: juzhe.zhong, kito.cheng, palmer, jeffreyalaw

Hi Robin,

>> +(define_expand "vcond_mask_<mode><vm>"
>> +  [(set (match_operand:V_VLS 0 "register_operand")
>> +        (if_then_else:V_VLS
>> +          (match_operand:<VM> 3 "register_operand")
>> +          (match_operand:V_VLS 1 "nonmemory_operand")
>> +          (match_operand:V_VLS 2 "vector_register_or_const_0_operand")))]
>> +  "TARGET_VECTOR"
> 
> Would it hurt to allow any nonmemory operand here and just force the
> "unsupported" constants into a register?

Are you talking about why operand 2 doesn't use nonmemory_operand 
predicate? If so, I think this is because our vmerge.v[vxi]m insns only 
supports that operand 1 is a scalar and operand 2 must be a vector register.

> 
>> +  {
>> +    if (satisfies_constraint_Wc0 (operands[2]))
>> +      {
>> +        rtx reg = gen_reg_rtx (<MODE>mode);
>> +        emit_insn (gen_vec_duplicate_const_0<mode> (reg, operands[2]));
> 
> Can't we emit a move_insn directly without going through the new pattern?
> Or will that be optimized away in between?  

Currently, the move_insn of moving vector const 0 to a vector register 
will produce the bellow rtl from vector.md, I hope keep the simple 
pattern before split1 pass:

  (set (reg:RVVM2HI 149)
          (if_then_else:RVVM2HI (unspec:RVVMF8BI [
                      (const_vector:RVVMF8BI repeat [
                              (const_int 1 [0x1])
                          ])
                      (reg:DI 146)
                      (const_int 2 [0x2]) repeated x2
                      (const_int 1 [0x1])
                      (reg:SI 66 vl)
                      (reg:SI 67 vtype)
                  ] UNSPEC_VPREDICATE)
              (const_vector:RVVM2HI repeat [
                      (const_int 0 [0])
                  ])
              (unspec:RVVM2HI [
                      (reg:SI 0 zero)
                  ] UNSPEC_VUNDEF)))

> And the new pattern isn't
> actually a duplicate but a move anyway so maybe a force_reg (operands[2]) would
> already do?  Probably initial values other than 0 don't work out of the box?

Yes, the name vec_duplicate_const_0 is missleading, maybe change it to 
mov_vec_const_0 is better.

> In any case it wouldn't hurt to describe the "design decisions" (i.e. why
> we need one thing and not another) so it's easier to follow the patterns
> in the future.
Yes, I should have described the design in more detail. Before this 
patch, if I wanted to implement the current combine optimization, I 
would need to create a combine pattern that combines the following three 
instructions:

        (set (reg:RVVM2HI 149)
          (if_then_else:RVVM2HI (unspec:RVVMF8BI [
                      (const_vector:RVVMF8BI repeat [
                              (const_int 1 [0x1])
                          ])
                      (reg:DI 146)
                      (const_int 2 [0x2]) repeated x2
                      (const_int 1 [0x1])
                      (reg:SI 66 vl)
                      (reg:SI 67 vtype)
                  ] UNSPEC_VPREDICATE)
              (const_vector:RVVM2HI repeat [
                      (const_int 0 [0])
                  ])
              (unspec:RVVM2HI [
                      (reg:SI 0 zero)
                  ] UNSPEC_VUNDEF)))

        (set (reg:RVVM2HI 138 [ vect__ifc__38.17 ])
          (if_then_else:RVVM2HI (reg:RVVMF8BI 135 [ mask__18.11 ])
              (sign_extend:RVVM2HI (reg:RVVM1QI 136 [ vect__6.14 ]))
              (reg:RVVM2HI 149)))

        (set (reg:HI 151)
          (unspec:HI [
                  (reg:RVVM2HI 138 [ vect__ifc__38.17 ])
              ] UNSPEC_REDUC_SUM))

into one insn:
    (set (reg:SF 139 [ <retval> ])
         (unspec:SF [
              (if_then_else:RVVM2SF (reg:RVVMF16BI 135 [ mask__11.32 ])
                  (float_extend:RVVM2SF (reg:RVVM1HF 136 [ vect__7.35 ]))
                  (if_then_else:RVVM2SF (unspec:RVVMF16BI [
                              (const_vector:RVVMF16BI repeat [
                                      (const_int 1 [0x1])
                                  ])
                              (reg:DI 143)
                              (const_int 2 [0x2]) repeated x2
                              (const_int 1 [0x1])
                              (reg:SI 66 vl)
                              (reg:SI 67 vtype)
                          ] UNSPEC_VPREDICATE)
                      (const_vector:RVVM2SF repeat [
                              (const_double:SF 0.0 [0x0.0p+0])
                          ])
                      (unspec:RVVM2SF [
                              (reg:SI 0 zero)
                          ] UNSPEC_VUNDEF)))
          ] UNSPEC_REDUC_SUM_UNORDERED))

But I think the combine pattern is ugly because the 
move_vector_const_0_to_vector_registers has many extra operands we don't 
care about. So I want to introduce a simple pattern of it. I have two 
options here, one is to modify the generic mov pattern, changing 
define_expand to define_insn_and_split to keep pattern simple. The 
reason why I did not choose this one is that mov pattern uses a lot of 
places, including reload pass. The impact is relatively big.

The other option (this patch chooses) is to split the vcond_mask pattern 
for this particular case, allowing vec_const_0 to be passed into operand 
2, and then generating it into a simple mov pattern. The purpose of 
splitting the vcond_mask pattern into three patterns is to generate a 
simple mov pattern. Simply allowing operand 2 of vcond_mask to allow vec 
const 0 will invalidate other existing combine patterns because they 
only allow the operand to be register. I wonder if these details explain 
why?

-- 
Best,
Lehua

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
  2023-09-19  2:50   ` Lehua Ding
@ 2023-09-19 13:48     ` Robin Dapp
  2023-09-19 15:58       ` Lehua Ding
  0 siblings, 1 reply; 8+ messages in thread
From: Robin Dapp @ 2023-09-19 13:48 UTC (permalink / raw)
  To: Lehua Ding, gcc-patches
  Cc: rdapp.gcc, juzhe.zhong, kito.cheng, palmer, jeffreyalaw

Hi Lehua,

>> Would it hurt to allow any nonmemory operand here and just force the
>> "unsupported" constants into a register?
> 
> Are you talking about why operand 2 doesn't use nonmemory_operand
> predicate? If so, I think this is because our vmerge.v[vxi]m insns
> only supports that operand 1 is a scalar and operand 2 must be a
> vector register.

My question was rather:

Why doesn't something like

(define_insn_and_split "vcond_mask_<mode><vm>"
  [(set (match_operand:V_VLS 0 "register_operand")
        (if_then_else:V_VLS
          (match_operand:<VM> 3 "register_operand")
          (match_operand:V_VLS 1 "nonmemory_operand")
          (match_operand:V_VLS 2 "nonmemory_operand")))]
  "TARGET_VECTOR && can_create_pseudo_p ()"
  "#"
  "&& 1"
  [(const_int 0)]
  {
    /* The order of vcond_mask is opposite to pred_merge.  */
    if (REG_P (operands[2]))
      operands[2] = force_reg (<MODE>mode, operands[2]);
    std::swap (operands[1], operands[2]);
    riscv_vector::emit_vlmax_insn (code_for_pred_merge (<MODE>mode),
                                   riscv_vector::MERGE_OP, operands);
    DONE;
  }
  [(set_attr "type" "vector")]
)

suffice?  You could disallow operands[2] != 0 if needed.

Regards
 Robin


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
  2023-09-19 13:48     ` Robin Dapp
@ 2023-09-19 15:58       ` Lehua Ding
  2023-09-19 22:02         ` Robin Dapp
  0 siblings, 1 reply; 8+ messages in thread
From: Lehua Ding @ 2023-09-19 15:58 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches; +Cc: juzhe.zhong, kito.cheng, palmer, jeffreyalaw

Hi Robin,

>>> Would it hurt to allow any nonmemory operand here and just force the
>>> "unsupported" constants into a register?
>>
>> Are you talking about why operand 2 doesn't use nonmemory_operand
>> predicate? If so, I think this is because our vmerge.v[vxi]m insns
>> only supports that operand 1 is a scalar and operand 2 must be a
>> vector register.
> 
> My question was rather:
> 
> Why doesn't something like
> 
> (define_insn_and_split "vcond_mask_<mode><vm>"
>    [(set (match_operand:V_VLS 0 "register_operand")
>          (if_then_else:V_VLS
>            (match_operand:<VM> 3 "register_operand")
>            (match_operand:V_VLS 1 "nonmemory_operand")
>            (match_operand:V_VLS 2 "nonmemory_operand")))]
>    "TARGET_VECTOR && can_create_pseudo_p ()"
>    "#"
>    "&& 1"
>    [(const_int 0)]
>    {
>      /* The order of vcond_mask is opposite to pred_merge.  */
>      if (REG_P (operands[2]))
>        operands[2] = force_reg (<MODE>mode, operands[2]);
>      std::swap (operands[1], operands[2]);
>      riscv_vector::emit_vlmax_insn (code_for_pred_merge (<MODE>mode),
>                                     riscv_vector::MERGE_OP, operands);
>      DONE;
>    }
>    [(set_attr "type" "vector")]
> )
> 
> suffice?  You could disallow operands[2] != 0 if needed.

I think I understand what you're saying. The reason for not doing this 
is because simply allowing the operand 2 of vcond_mask to be vec_const_0 
would cause the other combine patterns to fail because they require the 
operand 2 of vcond_mask to be a register. Like the following existing 
combine patterns (operand 2 as the instruction merge operand, is not 
allowed to be non-register):

(define_insn_and_split "*cond_<optab><mode>"
   [(set (match_operand:VF 0 "register_operand")
      (if_then_else:VF
        (match_operand:<VM> 1 "register_operand")
        (any_float_unop:VF
          (match_operand:VF 2 "register_operand"))
        (match_operand:VF 3 "register_operand")))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
   [(const_int 0)]
{
   insn_code icode = code_for_pred (<CODE>, <MODE>mode);
   rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
                gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
   riscv_vector::expand_cond_len_unop (icode, ops);
   DONE;
}
[(set_attr "type" "vector")])

My current method is still to keep the operand 2 of vcond_mask as a 
register, but the pattern of mov_vec_const_0 is simplified, so that the 
corresponding combine pattern can be more simple. That's the only reason 
I split the vcond_mask into three patterns.

-- 
Best,
Lehua

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
  2023-09-19 15:58       ` Lehua Ding
@ 2023-09-19 22:02         ` Robin Dapp
  2023-09-20  4:02           ` Lehua Ding
  0 siblings, 1 reply; 8+ messages in thread
From: Robin Dapp @ 2023-09-19 22:02 UTC (permalink / raw)
  To: Lehua Ding, gcc-patches
  Cc: rdapp.gcc, juzhe.zhong, kito.cheng, palmer, jeffreyalaw

Hi Lehua,

thanks for the explanation.

> My current method is still to keep the operand 2 of vcond_mask as a
> register, but the pattern of mov_vec_const_0 is simplified, so that
> the corresponding combine pattern can be more simple. That's the only
> reason I split the vcond_mask into three patterns.

My "problem" with the separate split it that it really sticks out
and everybody seeing it would wonder why we need it.  It's not that
bad of course but it appears as if we messed up somewhere else. 

I checked and I don't see additional FAILs with the vmask pattern
that additionally allows a const0 operand (that is forced into a register)
and a force_reg in abs:VF.

Would you mind re-checking if we can avoid the extra
"vec_duplicate_const_0" by changing the other affected patterns
in a similar manner?  I really didn't verify in-depth so if we needed
to add a force_reg to every pattern we might need to reconsider.
Still, I'd be unsure if I preferred the "vec_dup_const_0" over
additional force_regs ;)

Regards
 Robin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
  2023-09-19 22:02         ` Robin Dapp
@ 2023-09-20  4:02           ` Lehua Ding
  2023-09-20  8:03             ` Lehua Ding
  0 siblings, 1 reply; 8+ messages in thread
From: Lehua Ding @ 2023-09-20  4:02 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches; +Cc: juzhe.zhong, kito.cheng, palmer, jeffreyalaw



On 2023/9/20 6:02, Robin Dapp wrote:
> Hi Lehua,
> 
> thanks for the explanation.
> 
>> My current method is still to keep the operand 2 of vcond_mask as a
>> register, but the pattern of mov_vec_const_0 is simplified, so that
>> the corresponding combine pattern can be more simple. That's the only
>> reason I split the vcond_mask into three patterns.
> 
> My "problem" with the separate split it that it really sticks out
> and everybody seeing it would wonder why we need it.  It's not that
> bad of course but it appears as if we messed up somewhere else.

Can not agree more.

> I checked and I don't see additional FAILs with the vmask pattern
> that additionally allows a const0 operand (that is forced into a register)
> and a force_reg in abs:VF.
> 
> Would you mind re-checking if we can avoid the extra
> "vec_duplicate_const_0" by changing the other affected patterns
> in a similar manner?  I really didn't verify in-depth so if we needed
> to add a force_reg to every pattern we might need to reconsider.
> Still, I'd be unsure if I preferred the "vec_dup_const_0" over
> additional force_regs ;)

I think that's a little weird, too. I prefer to add a single 
define_insn_and_split move_const_0 pattern like following diff, How do 
you feel about that?

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index f4dab9fceb8..c0ab96ae8ab 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -973,7 +973,11 @@ expand_const_vector (rtx target, rtx src)
        rtx tmp = register_operand (target, mode) ? target : gen_reg_rtx 
(mode);
        /* Element in range -16 ~ 15 integer or 0.0 floating-point,
  	 we use vmv.v.i instruction.  */
-      if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
+      /* For const int or float 0, we keep the simple pattern before split1
+	 pass. */
+      if (can_create_pseudo_p () && satisfies_constraint_Wc0 (src))
+	emit_insn (gen_mov_vec_const_0 (mode, tmp, src));
+      else if (satisfies_constraint_vi (src))
  	{
  	  rtx ops[] = {tmp, src};
  	  emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f66ffebba24..b4973125d04 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1640,6 +1640,19 @@
    "TARGET_VECTOR"
    {})

+(define_insn_and_split "@mov_vec_const_0<mode>"
+  [(set (match_operand:V_VLS 0 "register_operand")
+        (match_operand:V_VLS 1 "vector_const_0_operand"))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    emit_move_insn (operands[0], operands[1]);
+    DONE;
+  }
+  [(set_attr "type" "vector")])
+
  ;; vle.v/vse.v,vmv.v.v
  (define_insn_and_split "*pred_mov<mode>"
    [(set (match_operand:V_VLS 0 "nonimmediate_operand"            "=vr, 
    vr,    vd,     m,    vr,    vr")

-- 
Best,
Lehua


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum
  2023-09-20  4:02           ` Lehua Ding
@ 2023-09-20  8:03             ` Lehua Ding
  0 siblings, 0 replies; 8+ messages in thread
From: Lehua Ding @ 2023-09-20  8:03 UTC (permalink / raw)
  To: Robin Dapp, gcc-patches; +Cc: juzhe.zhong, kito.cheng, palmer, jeffreyalaw

Hi Robin,

I have posted a V2 patch to implement the method I mentioned. I wonder 
if that makes it a little easier to understand?

V2 patch: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630985.html

On 2023/9/20 12:02, Lehua Ding wrote:
> 
> 
> On 2023/9/20 6:02, Robin Dapp wrote:
>> Hi Lehua,
>>
>> thanks for the explanation.
>>
>>> My current method is still to keep the operand 2 of vcond_mask as a
>>> register, but the pattern of mov_vec_const_0 is simplified, so that
>>> the corresponding combine pattern can be more simple. That's the only
>>> reason I split the vcond_mask into three patterns.
>>
>> My "problem" with the separate split it that it really sticks out
>> and everybody seeing it would wonder why we need it.  It's not that
>> bad of course but it appears as if we messed up somewhere else.
> 
> Can not agree more.
> 
>> I checked and I don't see additional FAILs with the vmask pattern
>> that additionally allows a const0 operand (that is forced into a 
>> register)
>> and a force_reg in abs:VF.
>>
>> Would you mind re-checking if we can avoid the extra
>> "vec_duplicate_const_0" by changing the other affected patterns
>> in a similar manner?  I really didn't verify in-depth so if we needed
>> to add a force_reg to every pattern we might need to reconsider.
>> Still, I'd be unsure if I preferred the "vec_dup_const_0" over
>> additional force_regs ;)
> 
> I think that's a little weird, too. I prefer to add a single 
> define_insn_and_split move_const_0 pattern like following diff, How do 
> you feel about that?
> 
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index f4dab9fceb8..c0ab96ae8ab 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -973,7 +973,11 @@ expand_const_vector (rtx target, rtx src)
>         rtx tmp = register_operand (target, mode) ? target : gen_reg_rtx 
> (mode);
>         /* Element in range -16 ~ 15 integer or 0.0 floating-point,
>        we use vmv.v.i instruction.  */
> -      if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
> +      /* For const int or float 0, we keep the simple pattern before 
> split1
> +     pass. */
> +      if (can_create_pseudo_p () && satisfies_constraint_Wc0 (src))
> +    emit_insn (gen_mov_vec_const_0 (mode, tmp, src));
> +      else if (satisfies_constraint_vi (src))
>       {
>         rtx ops[] = {tmp, src};
>         emit_vlmax_insn (code_for_pred_mov (mode), UNARY_OP, ops);
> 
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index f66ffebba24..b4973125d04 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -1640,6 +1640,19 @@
>     "TARGET_VECTOR"
>     {})
> 
> +(define_insn_and_split "@mov_vec_const_0<mode>"
> +  [(set (match_operand:V_VLS 0 "register_operand")
> +        (match_operand:V_VLS 1 "vector_const_0_operand"))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  {
> +    emit_move_insn (operands[0], operands[1]);
> +    DONE;
> +  }
> +  [(set_attr "type" "vector")])
> +
>   ;; vle.v/vse.v,vmv.v.v
>   (define_insn_and_split "*pred_mov<mode>"
>     [(set (match_operand:V_VLS 0 "nonimmediate_operand"            "=vr, 
>     vr,    vd,     m,    vr,    vr")
> 

-- 
Best,
Lehua

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-09-20  8:03 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-18 15:36 [PATCH] RISC-V: Support combine cond extend and reduce sum to cond widen reduce sum Lehua Ding
2023-09-18 21:18 ` Robin Dapp
2023-09-19  2:50   ` Lehua Ding
2023-09-19 13:48     ` Robin Dapp
2023-09-19 15:58       ` Lehua Ding
2023-09-19 22:02         ` Robin Dapp
2023-09-20  4:02           ` Lehua Ding
2023-09-20  8:03             ` Lehua Ding

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).