From: "Li, Pan2" <pan2.li@intel.com>
To: Kito Cheng <kito.cheng@sifive.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
"juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>,
"Wang, Yanzhang" <yanzhang.wang@intel.com>
Subject: RE: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
Date: Fri, 28 Apr 2023 06:45:18 +0000
Message-ID: <MW5PR11MB59080B4B19C4E3433D4A72F8A96B9@MW5PR11MB5908.namprd11.prod.outlook.com>
In-Reply-To: <CALLt3Th6V_Szp5rUsrKS1vKtO1x2HRwvj7kRVGJeCpYM_bhxXQ@mail.gmail.com>
Thanks, Kito.
Yes, you are right. I am investigating this right now on the simplify-rtx side, given that we had a similar case with VMORN previously.
Pan
-----Original Message-----
From: Kito Cheng <kito.cheng@sifive.com>
Sent: Friday, April 28, 2023 2:41 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMCLR
LGTM
I thought it could optimize __riscv_vmseq_vv_i8m8_b1(v1, v1, vl) too, but I don't know why
(eq:VNx128BI (reg/v:VNx128QI 137 [ v1 ])
    (reg/v:VNx128QI 137 [ v1 ]))
is not evaluated to true; anyway, I guess that should be your next step to investigate :)
On Fri, Apr 28, 2023 at 10:46 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> When some RVV integer compare operators act on the same vector
> register without a mask, they can be simplified to VMCLR.
>
> This patch allows the ne, lt, ltu, gt, and gtu operators to perform
> this kind of simplification by adding one new define_split.
>
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
> return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl); }
>
> Before this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vl8re8.v v24,0(a1)
> vmslt.vv v8,v24,v24
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v v8,0(a0)
> ret
>
> After this patch:
> vsetvli zero,a2,e8,mf8,ta,ma
> vmclr.m v24 <- optimized to vmclr.m
> vsetvli zero,a5,e8,mf8,ta,ma
> vsm.v v24,0(a0)
> ret
>
> As above, one instruction is eliminated and fewer vector registers
> are required.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md: Add new define split to perform
> the simplification.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> Co-authored-by: kito-cheng <kito.cheng@sifive.com>
> ---
> gcc/config/riscv/vector.md                          |  32 ++
> .../rvv/base/integer_compare_insn_shortcut.c        | 291 ++++++++++++++++++
> 2 files changed, 323 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index b3d23441679..1642822d098 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -7689,3 +7689,35 @@ (define_insn "@pred_fault_load<mode>"
> "vle<sew>ff.v\t%0,%3%p1"
> [(set_attr "type" "vldff")
> (set_attr "mode" "<MODE>")])
> +
> +;; -----------------------------------------------------------------------------
> +;; ---- Integer Compare Instructions Simplification
> +;; -----------------------------------------------------------------------------
> +;; Simplify to VMCLR.m Includes:
> +;; - 1. VMSNE
> +;; - 2. VMSLT
> +;; - 3. VMSLTU
> +;; - 4. VMSGT
> +;; - 5. VMSGTU
> +;; -----------------------------------------------------------------------------
> +(define_split
> + [(set (match_operand:VB 0 "register_operand")
> + (if_then_else:VB
> + (unspec:VB
> + [(match_operand:VB 1 "vector_all_trues_mask_operand")
> + (match_operand 4 "vector_length_operand")
> + (match_operand 5 "const_int_operand")
> + (match_operand 6 "const_int_operand")
> + (reg:SI VL_REGNUM)
> + (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> + (match_operand:VB 3 "vector_move_operand")
> + (match_operand:VB 2 "vector_undef_operand")))]
> + "TARGET_VECTOR"
> + [(const_int 0)]
> + {
> + emit_insn (gen_pred_mov (<MODE>mode, operands[0], CONST1_RTX (<MODE>mode),
> + RVV_VUNDEF (<MODE>mode), operands[3],
> + operands[4], operands[5]));
> + DONE;
> + }
> +)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> new file mode 100644
> index 00000000000..8954adad09d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
> @@ -0,0 +1,291 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
> +
> +#include "riscv_vector.h"
> +
> +vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmseq_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmseq_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmseq_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmseq_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmseq_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmseq_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmseq_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsne_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsne_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsne_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsne_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsne_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsne_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsne_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsne_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmslt_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmslt_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmslt_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmslt_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmslt_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmslt_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmslt_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmslt_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsltu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsltu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsltu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsltu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsltu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsltu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsltu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsltu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsle_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsle_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsle_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsle_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsle_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsle_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsle_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsle_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsleu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsleu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsleu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsleu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsleu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsleu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsleu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsleu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgt_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgt_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgt_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgt_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgt_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgt_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgt_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgt_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgtu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgtu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgtu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgtu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgtu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgtu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgtu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgtu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsge_case_0(vint8m8_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsge_case_1(vint8m4_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsge_case_2(vint8m2_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsge_case_3(vint8m1_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsge_case_4(vint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsge_case_5(vint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsge_case_6(vint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsge_vv_i8mf8_b64(v1, v1, vl);
> +}
> +
> +vbool1_t test_shortcut_for_riscv_vmsgeu_case_0(vuint8m8_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m8_b1(v1, v1, vl);
> +}
> +
> +vbool2_t test_shortcut_for_riscv_vmsgeu_case_1(vuint8m4_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m4_b2(v1, v1, vl);
> +}
> +
> +vbool4_t test_shortcut_for_riscv_vmsgeu_case_2(vuint8m2_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m2_b4(v1, v1, vl);
> +}
> +
> +vbool8_t test_shortcut_for_riscv_vmsgeu_case_3(vuint8m1_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8m1_b8(v1, v1, vl);
> +}
> +
> +vbool16_t test_shortcut_for_riscv_vmsgeu_case_4(vuint8mf2_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf2_b16(v1, v1, vl);
> +}
> +
> +vbool32_t test_shortcut_for_riscv_vmsgeu_case_5(vuint8mf4_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf4_b32(v1, v1, vl);
> +}
> +
> +vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t v1, size_t vl) {
> +  return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
> +}
> +
> +/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
> +/* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
> --
> 2.34.1
>
Thread overview: 11+ messages
2023-04-27 14:30 [PATCH] " pan2.li
2023-04-27 14:57 ` Kito Cheng
2023-04-27 15:00 ` Kito Cheng
2023-04-28 2:06 ` Li, Pan2
2023-04-28 6:35 ` Kito Cheng
2023-04-28 2:46 ` [PATCH v2] " pan2.li
2023-04-28 6:40 ` Kito Cheng
2023-04-28 6:45 ` Li, Pan2 [this message]
2023-04-28 9:14 ` Li, Pan2
2023-04-28 12:36 ` Kito Cheng
2023-04-28 13:04 ` Li, Pan2