public inbox for gcc-patches@gcc.gnu.org
* [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
@ 2023-04-27 14:30 pan2.li
  2023-04-27 14:57 ` Kito Cheng
  2023-04-28  2:46 ` [PATCH v2] " pan2.li
  0 siblings, 2 replies; 11+ messages in thread
From: pan2.li @ 2023-04-27 14:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: juzhe.zhong, kito.cheng, pan2.li, yanzhang.wang

From: Pan Li <pan2.li@intel.com>

When some RVV integer compare operators act on the same vector
registers without a mask, they can be simplified to VMCLR.

This patch allows the ne, lt, ltu, gt and gtu operators to perform
this kind of simplification by adding one new define_split.

Given we have:
vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v24,0(a1)
vmslt.vv v8,v24,v24
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf8,ta,ma
vmclr.m v24                    <- optimized to vmclr.m
vsetvli zero,a5,e8,mf8,ta,ma
vsm.v   v24,0(a0)
ret

As above, we may have one instruction eliminated and require fewer
vector registers.
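
The rewrite is safe because each of these operators is always false
when both operands are the same value.  A minimal scalar sketch of the
identity (illustration only, not part of the patch; the unsigned
variants behave the same way):

  #include <assert.h>

  int main (void) {
    int x = 42;
    /* ne/lt/ltu/gt/gtu are always false on (x, x), so the whole
       mask folds to all zeros, i.e. vmclr.m.  */
    assert (!(x != x) && !(x < x) && !(x > x));
    /* eq/le/leu/ge/geu are always true on (x, x); they are not in
       the list above because they would fold to all ones instead.  */
    assert ((x == x) && (x <= x) && (x >= x));
    return 0;
  }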

gcc/ChangeLog:

	* config/riscv/predicates.md (comparison_simplify_to_clear_operator):
	  Add new predicate for the simplification operators.
	* config/riscv/vector.md: Add new define split to perform
	  the simplification.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: kito-cheng <kito.cheng@sifive.com>
---
 gcc/config/riscv/predicates.md                |   6 +
 gcc/config/riscv/vector.md                    |  34 ++
 .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++++++++++++++++++
 3 files changed, 331 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index e5adf06fa25..1626665825b 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -328,6 +328,12 @@ (define_predicate "ltge_operator"
 (define_predicate "comparison_except_ltge_operator"
   (match_code "eq,ne,le,leu,gt,gtu"))
 
+;; Some comparison operators with the same operands can be simplified
+;; to clear.  For example, op[0] = ne (op[1], op[1]) => op[0] = clr (op[0]).
+;; We group such comparison operators here.
+(define_predicate "comparison_simplify_to_clear_operator"
+  (match_code "ne,lt,ltu,gt,gtu"))
+
 (define_predicate "comparison_except_eqge_operator"
   (match_code "le,leu,gt,gtu,lt,ltu"))
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index b3d23441679..47b97dfe69d 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7689,3 +7689,37 @@ (define_insn "@pred_fault_load<mode>"
   "vle<sew>ff.v\t%0,%3%p1"
   [(set_attr "type" "vldff")
    (set_attr "mode" "<MODE>")])
+
+;; -----------------------------------------------------------------------------
+;; ---- Integer Compare Instructions Simplification
+;; -----------------------------------------------------------------------------
+;; Simplify to VMCLR.m Includes:
+;; - 1. VMSNE
+;; - 2. VMSLT
+;; - 3. VMSLTU
+;; - 4. VMSGT
+;; - 5. VMSGTU
+;; -----------------------------------------------------------------------------
+(define_split
+  [(set (match_operand:<VM> 0 "register_operand")
+	(if_then_else:<VM>
+	  (unspec:<VM>
+	    [(match_operand:<VM> 1 "vector_all_trues_mask_operand")
+	     (match_operand      6 "vector_length_operand")
+	     (match_operand      7 "const_int_operand")
+	     (match_operand      8 "const_int_operand")
+	     (reg:SI VL_REGNUM)
+	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+	  (match_operator:<VM>   3 "comparison_simplify_to_clear_operator"
+	    [(match_operand:VI   4 "register_operand")
+	     (match_operand:VI   5 "vector_arith_operand")])
+	  (match_operand:<VM>    2 "vector_merge_operand")))]
+  "TARGET_VECTOR && reload_completed && operands[4] == operands[5]"
+  [(const_int 0)]
+  {
+    emit_insn (gen_pred_mov (<VM>mode, operands[0], CONST1_RTX (<VM>mode),
+			     RVV_VUNDEF (<VM>mode), CONST0_RTX (<VM>mode),
+			     operands[6], operands[8]));
+    DONE;
+  }
+)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
new file mode 100644
index 00000000000..8954adad09d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
@@ -0,0 +1,291 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmseq_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsne_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsne_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsne_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsne_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsne_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsne_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsne_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmslt_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmslt_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmslt_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmslt_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmslt_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmslt_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsltu_case_0(vuint8m8_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsltu_case_1(vuint8m4_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsltu_case_2(vuint8m2_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsltu_case_3(vuint8m1_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsltu_case_4(vuint8mf2_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsltu_case_5(vuint8mf4_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsltu_case_6(vuint8mf8_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsle_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsle_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsle_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsle_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsle_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsle_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsle_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsleu_case_0(vuint8m8_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsleu_case_1(vuint8m4_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsleu_case_2(vuint8m2_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsleu_case_3(vuint8m1_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsleu_case_4(vuint8mf2_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsleu_case_5(vuint8mf4_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsleu_case_6(vuint8mf8_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsgt_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsgt_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsgt_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsgt_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsgt_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsgt_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsgt_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsgtu_case_0(vuint8m8_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsgtu_case_1(vuint8m4_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsgtu_case_2(vuint8m2_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsgtu_case_3(vuint8m1_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsgtu_case_4(vuint8mf2_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsgtu_case_5(vuint8mf4_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsgtu_case_6(vuint8mf8_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsge_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsge_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsge_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsge_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsge_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsge_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsge_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsgeu_case_0(vuint8m8_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsgeu_case_1(vuint8m4_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsgeu_case_2(vuint8m2_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsgeu_case_3(vuint8m1_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsgeu_case_4(vuint8mf2_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsgeu_case_5(vuint8mf4_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
+}
+
+/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
+/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
+/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
+/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
+/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
+/* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
-- 
2.34.1



* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
  2023-04-27 14:30 [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR pan2.li
@ 2023-04-27 14:57 ` Kito Cheng
  2023-04-27 15:00   ` Kito Cheng
  2023-04-28  2:46 ` [PATCH v2] " pan2.li
  1 sibling, 1 reply; 11+ messages in thread
From: Kito Cheng @ 2023-04-27 14:57 UTC (permalink / raw)
  To: pan2.li; +Cc: gcc-patches, juzhe.zhong, yanzhang.wang

> +(define_split
> +  [(set (match_operand:<VM> 0 "register_operand")
> +       (if_then_else:<VM>
> +         (unspec:<VM>
> +           [(match_operand:<VM> 1 "vector_all_trues_mask_operand")
> +            (match_operand      6 "vector_length_operand")
> +            (match_operand      7 "const_int_operand")
> +            (match_operand      8 "const_int_operand")
> +            (reg:SI VL_REGNUM)
> +            (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +         (match_operator:<VM>   3 "comparison_simplify_to_clear_operator"
> +           [(match_operand:VI   4 "register_operand")
> +            (match_operand:VI   5 "vector_arith_operand")])
> +         (match_operand:<VM>    2 "vector_merge_operand")))]
> +  "TARGET_VECTOR && reload_completed && operands[4] == operands[5]"

Could you try something like this? that should be more generic:

(define_split
 [(set (match_operand:VB 0 "register_operand")
       (if_then_else:VB
         (unspec:VB
           [(match_operand:VB 1 "vector_all_trues_mask_operand")
            (match_operand 4 "vector_length_operand")
            (match_operand 5 "const_int_operand")
            (match_operand 6 "const_int_operand")
            (reg:SI VL_REGNUM)
            (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
         (match_operand:VB 3 "vector_move_operand")
         (match_operand:VB 2 "vector_undef_operand")))]
 "TARGET_VECTOR && reload_completed"
 [(const_int 0)]
 {
   emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
                            RVV_VUNDEF (<MODE>mode), CONST0_RTX (<MODE>mode),
                            operands[4], operands[5]));
   DONE;
 }
)
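
The idea, roughly: once the compare has been folded away, any
predicated move of a constant mask into a mask register can be
re-emitted through pred_mov, which produces vmclr.m for an all-zeros
source (and should likewise handle all-ones).  An illustrative shape
of what the pattern matches (simplified sketch, not a GCC dump):

  (set (reg:VB mask_reg)
       (if_then_else:VB
         (unspec:VB [all-trues-mask vl tail-policy mask-policy
                     (reg:SI VL_REGNUM) (reg:SI VTYPE_REGNUM)]
                    UNSPEC_VPREDICATE)
         (const_vector:VB ...)  ;; e.g. all zeros from the folded compare
         (unspec:VB [...] UNSPEC_VUNDEF)))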


* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
  2023-04-27 14:57 ` Kito Cheng
@ 2023-04-27 15:00   ` Kito Cheng
  2023-04-28  2:06     ` Li, Pan2
  0 siblings, 1 reply; 11+ messages in thread
From: Kito Cheng @ 2023-04-27 15:00 UTC (permalink / raw)
  To: pan2.li; +Cc: gcc-patches, juzhe.zhong, yanzhang.wang

> Could you try something like this? that should be more generic:
>
> (define_split
>  [(set (match_operand:VB 0 "register_operand")
>        (if_then_else:VB
>          (unspec:VB
>            [(match_operand:VB 1 "vector_all_trues_mask_operand")
>             (match_operand 4 "vector_length_operand")
>             (match_operand 5 "const_int_operand")
>             (match_operand 6 "const_int_operand")
>             (reg:SI VL_REGNUM)
>             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>          (match_operand:VB 3 "vector_move_operand")
>          (match_operand:VB 2 "vector_undef_operand")))]
>  "TARGET_VECTOR && reload_completed"

Removing the reload_completed should work well, but you might need more
testing; I didn't run a full test on this change :P

>  [(const_int 0)]
>  {
>    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
>                             RVV_VUNDEF (<MODE>mode), CONST0_RTX (<MODE>mode),
>                             operands[4], operands[5]));
>    DONE;
>  }
> )


* RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
  2023-04-27 15:00   ` Kito Cheng
@ 2023-04-28  2:06     ` Li, Pan2
  2023-04-28  6:35       ` Kito Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Li, Pan2 @ 2023-04-28  2:06 UTC (permalink / raw)
  To: Kito Cheng; +Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang

Thanks Kito for the better approach. It works well with the prepared test cases, but I have one question about the semantics of vector_move_operand.

The predicate vector_move_operand is composed of (non-imm || (const vector && (reload_completed ? constraint_vi (op) : constraint_wc0 (op)))).
I may not quite understand why we group them together and name it vector_move.
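
For reference, a rough reconstruction of the predicate from the
paraphrase above (assumed shape and constraint names, not quoted
verbatim from predicates.md):

  (define_predicate "vector_move_operand"
    (ior (match_operand 0 "nonimmediate_operand")
         (and (match_code "const_vector")
              (match_test "reload_completed
                           ? satisfies_constraint_vi (op)
                           : satisfies_constraint_Wc0 (op)"))))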

Another difference is that it acts in the combine pass, which is more generic than PATCH v1 (which acts in the split2 pass).

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@sifive.com> 
Sent: Thursday, April 27, 2023 11:00 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

> Could you try something like this? that should be more generic:
>
> (define_split
>  [(set (match_operand:VB 0 "register_operand")
>        (if_then_else:VB
>          (unspec:VB
>            [(match_operand:VB 1 "vector_all_trues_mask_operand")
>             (match_operand 4 "vector_length_operand")
>             (match_operand 5 "const_int_operand")
>             (match_operand 6 "const_int_operand")
>             (reg:SI VL_REGNUM)
>             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>          (match_operand:VB 3 "vector_move_operand")
>          (match_operand:VB 2 "vector_undef_operand")))]  
> "TARGET_VECTOR && reload_completed"

Removing the reload_completed should work well, but you might need more testing; I didn't run a full test on this change :P

>  [(const_int 0)]
>  {
>    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
>                             RVV_VUNDEF (<MODE>mode), CONST0_RTX (<MODE>mode),
>                             operands[4], operands[5]));
>    DONE;
>  }
> )


* [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
  2023-04-27 14:30 [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR pan2.li
  2023-04-27 14:57 ` Kito Cheng
@ 2023-04-28  2:46 ` pan2.li
  2023-04-28  6:40   ` Kito Cheng
  1 sibling, 1 reply; 11+ messages in thread
From: pan2.li @ 2023-04-28  2:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: juzhe.zhong, kito.cheng, pan2.li, yanzhang.wang

From: Pan Li <pan2.li@intel.com>

When some RVV integer compare operators act on the same vector
registers without a mask, they can be simplified to VMCLR.

This patch allows the ne, lt, ltu, gt and gtu operators to perform
this kind of simplification by adding one new define_split.

Given we have:
vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v24,0(a1)
vmslt.vv v8,v24,v24
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf8,ta,ma
vmclr.m v24                    <- optimized to vmclr.m
vsetvli zero,a5,e8,mf8,ta,ma
vsm.v   v24,0(a0)
ret

As above, we may have one instruction eliminated and require fewer
vector registers.

gcc/ChangeLog:

	* config/riscv/vector.md: Add new define split to perform
	  the simplification.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: kito-cheng <kito.cheng@sifive.com>
---
 gcc/config/riscv/vector.md                    |  32 ++
 .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++++++++++++++++++
 2 files changed, 323 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index b3d23441679..1642822d098 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7689,3 +7689,35 @@ (define_insn "@pred_fault_load<mode>"
   "vle<sew>ff.v\t%0,%3%p1"
   [(set_attr "type" "vldff")
    (set_attr "mode" "<MODE>")])
+
+;; -----------------------------------------------------------------------------
+;; ---- Integer Compare Instructions Simplification
+;; -----------------------------------------------------------------------------
+;; Simplify to VMCLR.m Includes:
+;; - 1.  VMSNE
+;; - 2.  VMSLT
+;; - 3.  VMSLTU
+;; - 4.  VMSGT
+;; - 5.  VMSGTU
+;; -----------------------------------------------------------------------------
+(define_split
+  [(set (match_operand:VB      0 "register_operand")
+	(if_then_else:VB
+	  (unspec:VB
+	    [(match_operand:VB 1 "vector_all_trues_mask_operand")
+	     (match_operand    4 "vector_length_operand")
+	     (match_operand    5 "const_int_operand")
+	     (match_operand    6 "const_int_operand")
+	     (reg:SI VL_REGNUM)
+	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+	  (match_operand:VB    3 "vector_move_operand")
+	  (match_operand:VB    2 "vector_undef_operand")))]
+  "TARGET_VECTOR"
+  [(const_int 0)]
+  {
+    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
+			     RVV_VUNDEF (<MODE>mode), operands[3],
+			     operands[4], operands[5]));
+    DONE;
+  }
+)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
new file mode 100644
index 00000000000..8954adad09d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
@@ -0,0 +1,291 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmseq_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmseq_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsne_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsne_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsne_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsne_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsne_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsne_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsne_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmsne_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmslt_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmslt_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmslt_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmslt_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmslt_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmslt_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmslt_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsltu_case_0(vuint8m8_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsltu_case_1(vuint8m4_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsltu_case_2(vuint8m2_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsltu_case_3(vuint8m1_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsltu_case_4(vuint8mf2_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsltu_case_5(vuint8mf4_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsltu_case_6(vuint8mf8_t v1, size_t vl) {
+  return __riscv_vmsltu_vv_u8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsle_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsle_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsle_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsle_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsle_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsle_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsle_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmsle_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsleu_case_0(vuint8m8_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsleu_case_1(vuint8m4_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsleu_case_2(vuint8m2_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsleu_case_3(vuint8m1_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsleu_case_4(vuint8mf2_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsleu_case_5(vuint8mf4_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsleu_case_6(vuint8mf8_t v1, size_t vl) {
+  return __riscv_vmsleu_vv_u8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsgt_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsgt_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsgt_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsgt_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsgt_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsgt_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsgt_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmsgt_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsgtu_case_0(vuint8m8_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsgtu_case_1(vuint8m4_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsgtu_case_2(vuint8m2_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsgtu_case_3(vuint8m1_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsgtu_case_4(vuint8mf2_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsgtu_case_5(vuint8mf4_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsgtu_case_6(vuint8mf8_t v1, size_t vl) {
+  return __riscv_vmsgtu_vv_u8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsge_case_0(vint8m8_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsge_case_1(vint8m4_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsge_case_2(vint8m2_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsge_case_3(vint8m1_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsge_case_4(vint8mf2_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsge_case_5(vint8mf4_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsge_case_6(vint8mf8_t v1, size_t vl) {
+  return __riscv_vmsge_vv_i8mf8_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmsgeu_case_0(vuint8m8_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8m8_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmsgeu_case_1(vuint8m4_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8m4_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmsgeu_case_2(vuint8m2_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8m2_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmsgeu_case_3(vuint8m1_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8m1_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmsgeu_case_4(vuint8mf2_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8mf2_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmsgeu_case_5(vuint8mf4_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8mf4_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t v1, size_t vl) {
+  return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
+}
+
+/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
+/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
+/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
+/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
+/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
+/* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
-- 
2.34.1



* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
  2023-04-28  2:06     ` Li, Pan2
@ 2023-04-28  6:35       ` Kito Cheng
  0 siblings, 0 replies; 11+ messages in thread
From: Kito Cheng @ 2023-04-28  6:35 UTC (permalink / raw)
  To: Li, Pan2; +Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang

> The predicate vector_move_operand is composed of (non-imm || (const vector && (reload_completed ? constraint_vi (op) : constraint_wc0 (op)))).
> I may not quite understand why we group them together and name it vector_move.

I forget the detailed reason for that, but vaguely remember it is
for optimization; we may need Ju-Zhe to come back and tell us the reason :P

On Fri, Apr 28, 2023 at 10:06 AM Li, Pan2 <pan2.li@intel.com> wrote:
>
> Thanks Kito for the better approach. It works well with the prepared test cases, but I have one question about the semantics of vector_move_operand.
>
> The predicate vector_move_operand is composed of (non-imm || (const vector && (reload_completed ? constraint_vi (op) : constraint_wc0 (op)))).
> I may not quite understand why we group them together and name it vector_move.
>
> Another difference is that it acts in the combine pass, which is more generic than PATCH v1 (which acts in the split2 pass).
>
> Pan
>
> -----Original Message-----
> From: Kito Cheng <kito.cheng@sifive.com>
> Sent: Thursday, April 27, 2023 11:00 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
>
> > Could you try something like this? that should be more generic:
> >
> > (define_split
> >  [(set (match_operand:VB 0 "register_operand")
> >        (if_then_else:VB
> >          (unspec:VB
> >            [(match_operand:VB 1 "vector_all_trues_mask_operand")
> >             (match_operand 4 "vector_length_operand")
> >             (match_operand 5 "const_int_operand")
> >             (match_operand 6 "const_int_operand")
> >             (reg:SI VL_REGNUM)
> >             (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> >          (match_operand:VB 3 "vector_move_operand")
> >          (match_operand:VB 2 "vector_undef_operand")))]
> > "TARGET_VECTOR && reload_completed"
>
> Removing the reload_completed should work well, but you might need more testing; I didn't run a full test on this change :P
>
> >  [(const_int 0)]
> >  {
> >    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
> >                             RVV_VUNDEF (<MODE>mode), CONST0_RTX (<MODE>mode),
> >                             operands[4], operands[5]));
> >    DONE;
> >  }
> > )


* Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
  2023-04-28  2:46 ` [PATCH v2] " pan2.li
@ 2023-04-28  6:40   ` Kito Cheng
  2023-04-28  6:45     ` Li, Pan2
  0 siblings, 1 reply; 11+ messages in thread
From: Kito Cheng @ 2023-04-28  6:40 UTC (permalink / raw)
  To: pan2.li; +Cc: gcc-patches, juzhe.zhong, yanzhang.wang

LGTM

I thought it could optimize __riscv_vmseq_vv_i8m8_b1(v1, v1, vl)
too, but I don't know why it's not evaluated

        (eq:VNx128BI (reg/v:VNx128QI 137 [ v1 ])
           (reg/v:VNx128QI 137 [ v1 ]))

to true; anyway, I guess it should be your next step to investigate :)

On Fri, Apr 28, 2023 at 10:46 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> When some RVV integer compare operators act on the same vector
> registers without a mask, they can be simplified to VMCLR.
>
> This patch allows the ne, lt, ltu, gt and gtu operators to perform
> this kind of simplification by adding one new define_split.
>
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
>   return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> }
>
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v24,0(a1)
> vmslt.vv v8,v24,v24
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
>
> After this patch:
> vsetvli zero,a2,e8,mf8,ta,ma
> vmclr.m v24                    <- optimized to vmclr.m
> vsetvli zero,a5,e8,mf8,ta,ma
> vsm.v   v24,0(a0)
> ret
>
> As above, we may have one instruction eliminated and require fewer
> vector registers.
>
> gcc/ChangeLog:
>
>         * config/riscv/vector.md: Add new define split to perform
>           the simplification.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> Co-authored-by: kito-cheng <kito.cheng@sifive.com>
> ---
>  gcc/config/riscv/vector.md                    |  32 ++
>  .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++++++++++++++++++
>  2 files changed, 323 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index b3d23441679..1642822d098 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7689,3 +7689,35 @@ (define_insn "@pred_fault_load<mode>"
>    "vle<sew>ff.v\t%0,%3%p1"
>    [(set_attr "type" "vldff")
>     (set_attr "mode" "<MODE>")])
> +
> +;; -----------------------------------------------------------------------------
> +;; ---- Integer Compare Instructions Simplification
> +;; -----------------------------------------------------------------------------
> +;; Simplify to VMCLR.m Includes:
> +;; - 1.  VMSNE
> +;; - 2.  VMSLT
> +;; - 3.  VMSLTU
> +;; - 4.  VMSGT
> +;; - 5.  VMSGTU
> +;; -----------------------------------------------------------------------------
> +(define_split
> +  [(set (match_operand:VB      0 "register_operand")
> +       (if_then_else:VB
> +         (unspec:VB
> +           [(match_operand:VB 1 "vector_all_trues_mask_operand")
> +            (match_operand    4 "vector_length_operand")
> +            (match_operand    5 "const_int_operand")
> +            (match_operand    6 "const_int_operand")
> +            (reg:SI VL_REGNUM)
> +            (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +         (match_operand:VB    3 "vector_move_operand")
> +         (match_operand:VB    2 "vector_undef_operand")))]
> +  "TARGET_VECTOR"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
> +                            RVV_VUNDEF (<MODE>mode), operands[3],
> +                            operands[4], operands[5]));
> +    DONE;
> +  }
> +)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> new file mode 100644
> index 00000000000..8954adad09d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> @@ -0,0 +1,291 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmseq_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsne_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsne_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsne_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsne_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsne_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsne_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsne_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmslt_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmslt_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmslt_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmslt_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmslt_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmslt_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsltu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsltu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsltu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsltu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsltu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsltu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsltu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsle_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsle_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsle_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsle_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsle_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsle_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsle_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsleu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsleu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsleu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsleu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsleu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsleu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsleu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgt_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgt_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgt_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgt_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgt_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgt_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgt_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgtu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgtu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgtu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgtu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgtu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgtu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgtu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsge_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsge_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsge_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsge_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsge_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsge_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsge_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgeu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgeu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgeu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgeu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgeu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgeu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
> --
> 2.34.1
>


* RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
  2023-04-28  6:40   ` Kito Cheng
@ 2023-04-28  6:45     ` Li, Pan2
  2023-04-28  9:14       ` Li, Pan2
  0 siblings, 1 reply; 11+ messages in thread
From: Li, Pan2 @ 2023-04-28  6:45 UTC (permalink / raw)
  To: Kito Cheng; +Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang

Thanks, Kito.

Yes, you are right. I am investigating this right now on the simplify-rtx side, given we had one similar case for VMORN previously.
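
A hypothetical sketch of the kind of fold being looked for, in the
style of simplify-rtx (names and placement are assumptions, not the
actual GCC code):

  /* Integer self-comparisons have known results.  */
  if (INTEGRAL_MODE_P (GET_MODE (op0))
      && rtx_equal_p (op0, op1)
      && !side_effects_p (op0))
    {
      if (code == NE || code == LT || code == LTU
          || code == GT || code == GTU)
        return CONST0_RTX (mode);   /* Always false -> vmclr.m.  */
      else
        return CONSTM1_RTX (mode);  /* Always true  -> vmset.m.  */
    }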

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@sifive.com> 
Sent: Friday, April 28, 2023 2:41 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

LGTM

I thought it could optimize __riscv_vmseq_vv_i8m8_b1(v1, v1, vl) too, but I don't know why it's not evaluated

        (eq:VNx128BI (reg/v:VNx128QI 137 [ v1 ])
           (reg/v:VNx128QI 137 [ v1 ]))

to true; anyway, I guess it should be your next step to investigate :)

On Fri, Apr 28, 2023 at 10:46 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> When some RVV integer compare operators act on the same vector
> registers without a mask, they can be simplified to VMCLR.
>
> This patch allows the ne, lt, ltu, gt and gtu operators to perform
> this kind of simplification by adding one new define_split.
>
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
>   return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl); }
>
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v24,0(a1)
> vmslt.vv v8,v24,v24
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
>
> After this patch:
> vsetvli zero,a2,e8,mf8,ta,ma
> vmclr.m v24                    <- optimized to vmclr.m
> vsetvli zero,a5,e8,mf8,ta,ma
> vsm.v   v24,0(a0)
> ret
>
> As above, we may have one instruction eliminated and require fewer
> vector registers.
>
> gcc/ChangeLog:
>
>         * config/riscv/vector.md: Add new define split to perform
>           the simplification.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> Co-authored-by: kito-cheng <kito.cheng@sifive.com>
> ---
>  gcc/config/riscv/vector.md                    |  32 ++
>  .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++++++++++++++++++
>  2 files changed, 323 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md 
> index b3d23441679..1642822d098 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7689,3 +7689,35 @@ (define_insn "@pred_fault_load<mode>"
>    "vle<sew>ff.v\t%0,%3%p1"
>    [(set_attr "type" "vldff")
>     (set_attr "mode" "<MODE>")])
> +
> +;; -----------------------------------------------------------------------------
> +;; ---- Integer Compare Instructions Simplification
> +;; -----------------------------------------------------------------------------
> +;; Simplify to VMCLR.m Includes:
> +;; - 1.  VMSNE
> +;; - 2.  VMSLT
> +;; - 3.  VMSLTU
> +;; - 4.  VMSGT
> +;; - 5.  VMSGTU
> +;; -----------------------------------------------------------------------------
> +(define_split
> +  [(set (match_operand:VB      0 "register_operand")
> +       (if_then_else:VB
> +         (unspec:VB
> +           [(match_operand:VB 1 "vector_all_trues_mask_operand")
> +            (match_operand    4 "vector_length_operand")
> +            (match_operand    5 "const_int_operand")
> +            (match_operand    6 "const_int_operand")
> +            (reg:SI VL_REGNUM)
> +            (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +         (match_operand:VB    3 "vector_move_operand")
> +         (match_operand:VB    2 "vector_undef_operand")))]
> +  "TARGET_VECTOR"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
> +                            RVV_VUNDEF (<MODE>mode), operands[3],
> +                            operands[4], operands[5]));
> +    DONE;
> +  }
> +)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> new file mode 100644
> index 00000000000..8954adad09d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> @@ -0,0 +1,291 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmseq_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsne_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsne_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsne_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsne_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsne_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsne_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsne_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmslt_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmslt_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmslt_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmslt_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmslt_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmslt_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsltu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsltu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsltu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsltu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsltu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsltu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsltu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsle_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsle_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsle_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsle_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsle_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsle_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsle_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsleu_case_0(vuint8m8_t v1, size_t 
> +vl) {
> +  return __riscv_vmsleu_vv_u8m8_b1(v1, v1, vl); }
> +
> +vbool2_t test_shortcut_for_riscv_vmsleu_case_1(vuint8m4_t v1, size_t 
> +vl) {
> +  return __riscv_vmsleu_vv_u8m4_b2(v1, v1, vl); }
> +
> +vbool4_t test_shortcut_for_riscv_vmsleu_case_2(vuint8m2_t v1, size_t 
> +vl) {
> +  return __riscv_vmsleu_vv_u8m2_b4(v1, v1, vl); }
> +
> +vbool8_t test_shortcut_for_riscv_vmsleu_case_3(vuint8m1_t v1, size_t 
> +vl) {
> +  return __riscv_vmsleu_vv_u8m1_b8(v1, v1, vl); }
> +
> +vbool16_t test_shortcut_for_riscv_vmsleu_case_4(vuint8mf2_t v1, 
> +size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf2_b16(v1, v1, vl); }
> +
> +vbool32_t test_shortcut_for_riscv_vmsleu_case_5(vuint8mf4_t v1, 
> +size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf4_b32(v1, v1, vl); }
> +
> +vbool64_t test_shortcut_for_riscv_vmsleu_case_6(vuint8mf8_t v1, 
> +size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf8_b64(v1, v1, vl); }
> +
> +vbool1_t test_shortcut_for_riscv_vmsgt_case_0(vint8m8_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgt_vv_i8m8_b1(v1, v1, vl); }
> +
> +vbool2_t test_shortcut_for_riscv_vmsgt_case_1(vint8m4_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgt_vv_i8m4_b2(v1, v1, vl); }
> +
> +vbool4_t test_shortcut_for_riscv_vmsgt_case_2(vint8m2_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgt_vv_i8m2_b4(v1, v1, vl); }
> +
> +vbool8_t test_shortcut_for_riscv_vmsgt_case_3(vint8m1_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgt_vv_i8m1_b8(v1, v1, vl); }
> +
> +vbool16_t test_shortcut_for_riscv_vmsgt_case_4(vint8mf2_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgt_vv_i8mf2_b16(v1, v1, vl); }
> +
> +vbool32_t test_shortcut_for_riscv_vmsgt_case_5(vint8mf4_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgt_vv_i8mf4_b32(v1, v1, vl); }
> +
> +vbool64_t test_shortcut_for_riscv_vmsgt_case_6(vint8mf8_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgt_vv_i8mf8_b64(v1, v1, vl); }
> +
> +vbool1_t test_shortcut_for_riscv_vmsgtu_case_0(vuint8m8_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgtu_vv_u8m8_b1(v1, v1, vl); }
> +
> +vbool2_t test_shortcut_for_riscv_vmsgtu_case_1(vuint8m4_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgtu_vv_u8m4_b2(v1, v1, vl); }
> +
> +vbool4_t test_shortcut_for_riscv_vmsgtu_case_2(vuint8m2_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgtu_vv_u8m2_b4(v1, v1, vl); }
> +
> +vbool8_t test_shortcut_for_riscv_vmsgtu_case_3(vuint8m1_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgtu_vv_u8m1_b8(v1, v1, vl); }
> +
> +vbool16_t test_shortcut_for_riscv_vmsgtu_case_4(vuint8mf2_t v1, 
> +size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf2_b16(v1, v1, vl); }
> +
> +vbool32_t test_shortcut_for_riscv_vmsgtu_case_5(vuint8mf4_t v1, 
> +size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf4_b32(v1, v1, vl); }
> +
> +vbool64_t test_shortcut_for_riscv_vmsgtu_case_6(vuint8mf8_t v1, 
> +size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf8_b64(v1, v1, vl); }
> +
> +vbool1_t test_shortcut_for_riscv_vmsge_case_0(vint8m8_t v1, size_t 
> +vl) {
> +  return __riscv_vmsge_vv_i8m8_b1(v1, v1, vl); }
> +
> +vbool2_t test_shortcut_for_riscv_vmsge_case_1(vint8m4_t v1, size_t 
> +vl) {
> +  return __riscv_vmsge_vv_i8m4_b2(v1, v1, vl); }
> +
> +vbool4_t test_shortcut_for_riscv_vmsge_case_2(vint8m2_t v1, size_t 
> +vl) {
> +  return __riscv_vmsge_vv_i8m2_b4(v1, v1, vl); }
> +
> +vbool8_t test_shortcut_for_riscv_vmsge_case_3(vint8m1_t v1, size_t 
> +vl) {
> +  return __riscv_vmsge_vv_i8m1_b8(v1, v1, vl); }
> +
> +vbool16_t test_shortcut_for_riscv_vmsge_case_4(vint8mf2_t v1, size_t 
> +vl) {
> +  return __riscv_vmsge_vv_i8mf2_b16(v1, v1, vl); }
> +
> +vbool32_t test_shortcut_for_riscv_vmsge_case_5(vint8mf4_t v1, size_t 
> +vl) {
> +  return __riscv_vmsge_vv_i8mf4_b32(v1, v1, vl); }
> +
> +vbool64_t test_shortcut_for_riscv_vmsge_case_6(vint8mf8_t v1, size_t 
> +vl) {
> +  return __riscv_vmsge_vv_i8mf8_b64(v1, v1, vl); }
> +
> +vbool1_t test_shortcut_for_riscv_vmsgeu_case_0(vuint8m8_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgeu_vv_u8m8_b1(v1, v1, vl); }
> +
> +vbool2_t test_shortcut_for_riscv_vmsgeu_case_1(vuint8m4_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgeu_vv_u8m4_b2(v1, v1, vl); }
> +
> +vbool4_t test_shortcut_for_riscv_vmsgeu_case_2(vuint8m2_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgeu_vv_u8m2_b4(v1, v1, vl); }
> +
> +vbool8_t test_shortcut_for_riscv_vmsgeu_case_3(vuint8m1_t v1, size_t 
> +vl) {
> +  return __riscv_vmsgeu_vv_u8m1_b8(v1, v1, vl); }
> +
> +vbool16_t test_shortcut_for_riscv_vmsgeu_case_4(vuint8mf2_t v1, 
> +size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf2_b16(v1, v1, vl); }
> +
> +vbool32_t test_shortcut_for_riscv_vmsgeu_case_5(vuint8mf4_t v1, 
> +size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf4_b32(v1, v1, vl); }
> +
> +vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t v1, 
> +size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl); }
> +
> +/* { dg-final { scan-assembler-times 
> +{vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times 
> +{vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times 
> +{vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times 
> +{vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times 
> +{vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
  2023-04-28  6:45     ` Li, Pan2
@ 2023-04-28  9:14       ` Li, Pan2
  2023-04-28 12:36         ` Kito Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Li, Pan2 @ 2023-04-28  9:14 UTC (permalink / raw)
  To: Kito Cheng; +Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang

Passed both the x86 bootstrap and the regression tests.

Pan

-----Original Message-----
From: Li, Pan2 
Sent: Friday, April 28, 2023 2:45 PM
To: Kito Cheng <kito.cheng@sifive.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

Thanks, Kito.

Yes, you are right. I am investigating this right now, starting from the RTL simplification code, given we handled one similar case, VMORN, previously.
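
For reference, the VMORN case has the same shape. A minimal sketch of that shortcut, assuming the __riscv_vmorn_mm_b1 mask intrinsic (illustrative only; the present patch does not touch the mask-logical patterns):

#include "riscv_vector.h"

/* vmorn(m1, m1) computes m1 | ~m1, i.e. all ones, so it can be
   reduced to a single vmset.m.  */
vbool1_t test_shortcut_for_riscv_vmorn_case_0(vbool1_t m1, size_t vl) {
  return __riscv_vmorn_mm_b1(m1, m1, vl);
}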

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@sifive.com>
Sent: Friday, April 28, 2023 2:41 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

LGTM

I thought it could optimize __riscv_vmseq_vv_i8m8_b1(v1, v1, vl) too, but I don't know why

        (eq:VNx128BI (reg/v:VNx128QI 137 [ v1 ])
           (reg/v:VNx128QI 137 [ v1 ]))

is not evaluated to true; anyway, I guess that should be your next step to investigate :)
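
A minimal reproducer for that missing fold, in case it helps the investigation (the vmset.m outcome is the expected follow-up optimization, not something the current patch emits):

/* vmseq(v1, v1) is always true; once (eq x x) folds to all ones,
   this should become a single vmset.m instead of a vmseq.vv.  */
vbool1_t test_vmseq_same_operand(vint8m8_t v1, size_t vl) {
  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
}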

On Fri, Apr 28, 2023 at 10:46 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> When some RVV integer compare operators act on the same vector
> registers without a mask, they can be simplified to VMCLR.
>
> This patch allows the ne, lt, ltu, gt and gtu operators to perform
> this simplification by adding one new define_split.
>
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
>   return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> }
>
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v24,0(a1)
> vmslt.vv v8,v24,v24
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
>
> After this patch:
> vsetvli zero,a2,e8,mf8,ta,ma
> vmclr.m v24                    <- optimized to vmclr.m
> vsetvli zero,a5,e8,mf8,ta,ma
> vsm.v   v24,0(a0)
> ret
>
> As above, one instruction is eliminated and fewer vector registers
> are required.
>
> gcc/ChangeLog:
>
>         * config/riscv/vector.md: Add new define split to perform
>           the simplification.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> Co-authored-by: kito-cheng <kito.cheng@sifive.com>
> ---
>  gcc/config/riscv/vector.md                    |  32 ++
>  .../rvv/base/integer_compare_insn_shortcut.c  | 291 ++++++++++++++++++
>  2 files changed, 323 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md 
> index b3d23441679..1642822d098 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7689,3 +7689,35 @@ (define_insn "@pred_fault_load<mode>"
>    "vle<sew>ff.v\t%0,%3%p1"
>    [(set_attr "type" "vldff")
>     (set_attr "mode" "<MODE>")])
> +
> +;; -----------------------------------------------------------------------------
> +;; ---- Integer Compare Instructions Simplification
> +;; -----------------------------------------------------------------------------
> +;; Simplify to VMCLR.m Includes:
> +;; - 1.  VMSNE
> +;; - 2.  VMSLT
> +;; - 3.  VMSLTU
> +;; - 4.  VMSGT
> +;; - 5.  VMSGTU
> +;; -----------------------------------------------------------------------------
> +(define_split
> +  [(set (match_operand:VB      0 "register_operand")
> +       (if_then_else:VB
> +         (unspec:VB
> +           [(match_operand:VB 1 "vector_all_trues_mask_operand")
> +            (match_operand    4 "vector_length_operand")
> +            (match_operand    5 "const_int_operand")
> +            (match_operand    6 "const_int_operand")
> +            (reg:SI VL_REGNUM)
> +            (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> +         (match_operand:VB    3 "vector_move_operand")
> +         (match_operand:VB    2 "vector_undef_operand")))]
> +  "TARGET_VECTOR"
> +  [(const_int 0)]
> +  {
> +    emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
> +                            RVV_VUNDEF (<MODE>mode), operands[3],
> +                            operands[4], operands[5]));
> +    DONE;
> +  }
> +)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> new file mode 100644
> index 00000000000..8954adad09d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> @@ -0,0 +1,291 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmseq_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsne_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsne_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsne_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsne_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsne_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsne_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsne_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmslt_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmslt_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmslt_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmslt_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmslt_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmslt_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsltu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsltu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsltu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsltu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsltu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsltu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsltu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsle_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsle_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsle_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsle_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsle_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsle_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsle_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsleu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsleu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsleu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsleu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsleu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsleu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsleu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgt_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgt_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgt_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgt_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgt_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgt_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgt_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgtu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgtu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgtu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgtu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgtu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgtu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgtu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsge_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsge_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsge_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsge_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsge_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsge_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsge_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgeu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgeu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgeu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgeu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgeu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgeu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
  2023-04-28  9:14       ` Li, Pan2
@ 2023-04-28 12:36         ` Kito Cheng
  2023-04-28 13:04           ` Li, Pan2
  0 siblings, 1 reply; 11+ messages in thread
From: Kito Cheng @ 2023-04-28 12:36 UTC (permalink / raw)
  To: Li, Pan2; +Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang

pushed, thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
  2023-04-28 12:36         ` Kito Cheng
@ 2023-04-28 13:04           ` Li, Pan2
  0 siblings, 0 replies; 11+ messages in thread
From: Li, Pan2 @ 2023-04-28 13:04 UTC (permalink / raw)
  To: Kito Cheng; +Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang

Cool, thank you!

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@sifive.com> 
Sent: Friday, April 28, 2023 8:37 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR

pushed, thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-04-28 13:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-27 14:30 [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR pan2.li
2023-04-27 14:57 ` Kito Cheng
2023-04-27 15:00   ` Kito Cheng
2023-04-28  2:06     ` Li, Pan2
2023-04-28  6:35       ` Kito Cheng
2023-04-28  2:46 ` [PATCH v2] " pan2.li
2023-04-28  6:40   ` Kito Cheng
2023-04-28  6:45     ` Li, Pan2
2023-04-28  9:14       ` Li, Pan2
2023-04-28 12:36         ` Kito Cheng
2023-04-28 13:04           ` Li, Pan2

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).