public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
@ 2023-04-28 15:21 pan2.li
  2023-04-28 21:47 ` Jeff Law
  2023-04-29 13:32 ` [PATCH v2] " pan2.li
  0 siblings, 2 replies; 21+ messages in thread
From: pan2.li @ 2023-04-28 15:21 UTC (permalink / raw)
  To: gcc-patches; +Cc: juzhe.zhong, kito.cheng, pan2.li, yanzhang.wang

From: Pan Li <pan2.li@intel.com>

When some RVV integer compare operators act on the same vector registers
without mask. They can be simplified to VMSET.

This PATCH allows the eq, le, leu, ge, geu to perform such kind of the
simplification by adding one macro in riscv for simplify rtx.

Given we have:
vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
{
  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v8,0(a1)
vmseq.vv v8,v8,v8
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,m8,ta,ma
vmset.m v1                  <- optimized to vmset.m
vsetvli a5,zero,e8,m8,ta,ma
vsm.v   v1,0(a0)
ret

As above, we may have one instruction eliminated and require less vector
registers.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
	  consumed by simplify_rtx.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
	  Adjust test check condition.
---
 gcc/config/riscv/riscv.h                                    | 5 +++++
 .../riscv/rvv/base/integer_compare_insn_shortcut.c          | 6 +-----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 13038a39e5c..4473115d3a9 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1096,4 +1096,9 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
 #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
   ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
 
+/* Like s390, riscv also defined this macro for the vector comparision.  Then
+   the simplify-rtx relational_result will canonicalize the result to the
+   CONST1_RTX for the simplification.  */
+#define VECTOR_STORE_FLAG_VALUE(MODE) CONSTM1_RTX (GET_MODE_INNER (MODE))
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
index 8954adad09d..1bca8467a16 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
@@ -283,9 +283,5 @@ vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t v1, size_t vl) {
   return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
 }
 
-/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
 /* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
+/* { dg-final { scan-assembler-times {vmset\.m\sv[0-9]} 35 } } */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-28 15:21 [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET pan2.li
@ 2023-04-28 21:47 ` Jeff Law
  2023-04-29  2:55   ` Li, Pan2
  2023-04-29 13:32 ` [PATCH v2] " pan2.li
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff Law @ 2023-04-28 21:47 UTC (permalink / raw)
  To: pan2.li, gcc-patches; +Cc: juzhe.zhong, kito.cheng, yanzhang.wang



On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li <pan2.li@intel.com>
> 
> When some RVV integer compare operators act on the same vector registers
> without mask. They can be simplified to VMSET.
> 
> This PATCH allows the eq, le, leu, ge, geu to perform such kind of the
> simplification by adding one macro in riscv for simplify rtx.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
> {
>    return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
> }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1                  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As above, we may have one instruction eliminated and require less vector
> registers.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
> 	  consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
> 	  Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My 
understanding is we have one output bit per input element in the 
comparison.  So unless the number of elements matches the bit width of 
the mask register, this isn't going to work.

Am I missing something?

Jeff



^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-28 21:47 ` Jeff Law
@ 2023-04-29  2:55   ` Li, Pan2
  2023-04-29 13:35     ` Li, Pan2
  2023-04-29 15:05     ` Jeff Law
  0 siblings, 2 replies; 21+ messages in thread
From: Li, Pan2 @ 2023-04-29  2:55 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: juzhe.zhong, kito.cheng, Wang, Yanzhang

Thanks Jeff for comments.

It makes sense to me. For the EQ operator we should have CONSTM1. Does this mean s390 parts has similar issue here? Then for instructions like VMSEQ, we need to adjust the simplify_rtx up to a point.

Please help to correct me if any mistake. Thank you again.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Saturday, April 29, 2023 5:48 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li <pan2.li@intel.com>
> 
> When some RVV integer compare operators act on the same vector 
> registers without mask. They can be simplified to VMSET.
> 
> This PATCH allows the eq, le, leu, ge, geu to perform such kind of the 
> simplification by adding one macro in riscv for simplify rtx.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) 
> {
>    return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl); }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1                  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As above, we may have one instruction eliminated and require less 
> vector registers.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
> 	  consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
> 	  Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My understanding is we have one output bit per input element in the comparison.  So unless the number of elements matches the bit width of the mask register, this isn't going to work.

Am I missing something?

Jeff



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-28 15:21 [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET pan2.li
  2023-04-28 21:47 ` Jeff Law
@ 2023-04-29 13:32 ` pan2.li
  1 sibling, 0 replies; 21+ messages in thread
From: pan2.li @ 2023-04-29 13:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: juzhe.zhong, kito.cheng, pan2.li, yanzhang.wang, jeffreyalaw

From: Pan Li <pan2.li@intel.com>

When some RVV integer compare operators act on the same vector registers
without mask. They can be simplified to VMSET.

This PATCH allow the eq, le, leu, ge, geu to perform such kind of the
simplification by adding vector bool support in relational_result of
the simplify rtx.

Given we have:
vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl)
{
  return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl);
}

Before this patch:
vsetvli  zero,a2,e8,m8,ta,ma
vl8re8.v v8,0(a1)
vmseq.vv v8,v8,v8
vsetvli  a5,zero,e8,m8,ta,ma
vsm.v    v8,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,m8,ta,ma
vmset.m v1                  <- optimized to vmset.m
vsetvli a5,zero,e8,m8,ta,ma
vsm.v   v1,0(a0)
ret

As above, we may have one instruction eliminated and require less vector
registers.

gcc/ChangeLog:

	* machmode.h (VECTOR_BOOL_MODE_P): Add new predication macro.
	* simplify-rtx.cc (relational_result): Add vector bool support.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
	  Adjust test check condition.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/machmode.h                                              | 4 ++++
 gcc/simplify-rtx.cc                                         | 4 ++++
 .../riscv/rvv/base/integer_compare_insn_shortcut.c          | 6 +-----
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/machmode.h b/gcc/machmode.h
index f1865c1ef42..5fbece0042f 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM	\
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
 
+/* Nonzero if MODE is a vector bool mode.  */
+#define VECTOR_BOOL_MODE_P(MODE)			\
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)
+
 /* Nonzero if MODE is a scalar integral mode.  */
 #define SCALAR_INT_MODE_P(MODE)			\
   (GET_MODE_CLASS (MODE) == MODE_INT		\
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index d4aeebc7a5f..12aba4c4b05 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2535,6 +2535,10 @@ relational_result (machine_mode mode, machine_mode cmp_mode, rtx res)
     {
       if (res == const0_rtx)
 	return CONST0_RTX (mode);
+
+      if (VECTOR_BOOL_MODE_P (mode) && res == const1_rtx)
+	return CONSTM1_RTX (mode);
+
 #ifdef VECTOR_STORE_FLAG_VALUE
       rtx val = VECTOR_STORE_FLAG_VALUE (mode);
       if (val == NULL_RTX)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
index 8954adad09d..1bca8467a16 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c
@@ -283,9 +283,5 @@ vbool64_t test_shortcut_for_riscv_vmsgeu_case_6(vuint8mf8_t v1, size_t vl) {
   return __riscv_vmsgeu_vv_u8mf8_b64(v1, v1, vl);
 }
 
-/* { dg-final { scan-assembler-times {vmseq\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsle\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsleu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsge\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
-/* { dg-final { scan-assembler-times {vmsgeu\.vv\sv[0-9],\s*v[0-9],\s*v[0-9]} 7 } } */
 /* { dg-final { scan-assembler-times {vmclr\.m\sv[0-9]} 35 } } */
+/* { dg-final { scan-assembler-times {vmset\.m\sv[0-9]} 35 } } */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-29  2:55   ` Li, Pan2
@ 2023-04-29 13:35     ` Li, Pan2
  2023-04-29 15:05     ` Jeff Law
  1 sibling, 0 replies; 21+ messages in thread
From: Li, Pan2 @ 2023-04-29 13:35 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: juzhe.zhong, kito.cheng, Wang, Yanzhang

Hi Jeff

Just have a try in simplify_rtx for this optimization in PATCH v2. Could you please help to share any idea about this when you free? Thank you!

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/617117.html

Pan

-----Original Message-----
From: Li, Pan2 
Sent: Saturday, April 29, 2023 10:55 AM
To: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Thanks Jeff for comments.

It makes sense to me. For the EQ operator we should have CONSTM1. Does this mean s390 parts has similar issue here? Then for instructions like VMSEQ, we need to adjust the simplify_rtx up to a point.

Please help to correct me if any mistake. Thank you again.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Saturday, April 29, 2023 5:48 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/28/23 09:21, Pan Li via Gcc-patches wrote:
> From: Pan Li <pan2.li@intel.com>
> 
> When some RVV integer compare operators act on the same vector 
> registers without mask. They can be simplified to VMSET.
> 
> This PATCH allows the eq, le, leu, ge, geu to perform such kind of the 
> simplification by adding one macro in riscv for simplify rtx.
> 
> Given we have:
> vbool1_t test_shortcut_for_riscv_vmseq_case_0(vint8m8_t v1, size_t vl) 
> {
>    return __riscv_vmseq_vv_i8m8_b1(v1, v1, vl); }
> 
> Before this patch:
> vsetvli  zero,a2,e8,m8,ta,ma
> vl8re8.v v8,0(a1)
> vmseq.vv v8,v8,v8
> vsetvli  a5,zero,e8,m8,ta,ma
> vsm.v    v8,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,m8,ta,ma
> vmset.m v1                  <- optimized to vmset.m
> vsetvli a5,zero,e8,m8,ta,ma
> vsm.v   v1,0(a0)
> ret
> 
> As above, we may have one instruction eliminated and require less 
> vector registers.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv.h (VECTOR_STORE_FLAG_VALUE): Add new macro
> 	  consumed by simplify_rtx.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/base/integer_compare_insn_shortcut.c:
> 	  Adjust test check condition.
I'm not sure this is 100% correct.

What happens to the high bits in the resultant mask register?  My understanding is we have one output bit per input element in the comparison.  So unless the number of elements matches the bit width of the mask register, this isn't going to work.

Am I missing something?

Jeff



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-29  2:55   ` Li, Pan2
  2023-04-29 13:35     ` Li, Pan2
@ 2023-04-29 15:05     ` Jeff Law
  2023-04-29 17:21       ` Andrew Waterman
  2023-04-30  1:40       ` Kito Cheng
  1 sibling, 2 replies; 21+ messages in thread
From: Jeff Law @ 2023-04-29 15:05 UTC (permalink / raw)
  To: Li, Pan2, gcc-patches; +Cc: juzhe.zhong, kito.cheng, Wang, Yanzhang



On 4/28/23 20:55, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
> It makes sense to me. For the EQ operator we should have CONSTM1. 
That's not the way I interpret the RVV documentation.  Of course it's 
not terribly clear.    I guess one could do some experiments with qemu 
or try to dig into the sail code and figure out the intent from those.



Does this mean s390 parts has similar issue here? Then for instructions 
like VMSEQ, we need to adjust the simplify_rtx up to a point.
You'd have to refer to the s390 instruction set reference to understand 
precisely how the vector compares work.

But as it stands this really isn't a simplify-rtx question, but a 
question of the semantics of risc-v.   What happens with the high bits 
in the destination mask register is critical -- and if risc-v doesn't 
set them to all ones in this case, then that would mean that defining 
that macro is simply wrong for risc-v.

jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-29 15:05     ` Jeff Law
@ 2023-04-29 17:21       ` Andrew Waterman
  2023-04-29 17:28         ` Palmer Dabbelt
  2023-04-29 17:49         ` Jeff Law
  2023-04-30  1:40       ` Kito Cheng
  1 sibling, 2 replies; 21+ messages in thread
From: Andrew Waterman @ 2023-04-29 17:21 UTC (permalink / raw)
  To: Jeff Law; +Cc: Li, Pan2, Wang, Yanzhang, gcc-patches, juzhe.zhong, kito.cheng

[-- Attachment #1: Type: text/plain, Size: 1375 bytes --]

On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.    I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts has similar issue here? Then for instructions
> like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to understand
> precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't
> set them to all ones in this case, then that would mean that defining
> that macro is simply wrong for risc-v.

The relevant statement in the spec is that "the tail elements are always
updated with a tail-agnostic policy".  The vmset.m instruction will cause
mask register bits [0, vl-1] to be set to 1; elements [vl, VLMAX-1] will
either be undisturbed or set to 1, i.e., effectively unspecified.

>
> jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-29 17:21       ` Andrew Waterman
@ 2023-04-29 17:28         ` Palmer Dabbelt
  2023-04-29 17:46           ` Jeff Law
  2023-04-29 17:49         ` Jeff Law
  1 sibling, 1 reply; 21+ messages in thread
From: Palmer Dabbelt @ 2023-04-29 17:28 UTC (permalink / raw)
  To: gcc-patches
  Cc: jeffreyalaw, pan2.li, yanzhang.wang, gcc-patches, juzhe.zhong,
	kito.cheng

On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>>
>>
>>
>> On 4/28/23 20:55, Li, Pan2 wrote:
>> > Thanks Jeff for comments.
>> >
>> > It makes sense to me. For the EQ operator we should have CONSTM1.
>> That's not the way I interpret the RVV documentation.  Of course it's
>> not terribly clear.    I guess one could do some experiments with qemu
>> or try to dig into the sail code and figure out the intent from those.

QEMU specifically takes advantage of the behavior Andrew is pointing out 
it the spec, and will soon do so more aggressively (assuming the patches 
Daniel just sent out get merged).

>> Does this mean s390 parts has similar issue here? Then for instructions
>> like VMSEQ, we need to adjust the simplify_rtx up to a point.
>> You'd have to refer to the s390 instruction set reference to understand
>> precisely how the vector compares work.
>>
>> But as it stands this really isn't a simplify-rtx question, but a
>> question of the semantics of risc-v.   What happens with the high bits
>> in the destination mask register is critical -- and if risc-v doesn't
>> set them to all ones in this case, then that would mean that defining
>> that macro is simply wrong for risc-v.
>
> The relevant statement in the spec is that "the tail elements are always
> updated with a tail-agnostic policy".  The vmset.m instruction will cause
> mask register bits [0, vl-1] to be set to 1; elements [vl, VLMAX-1] will
> either be undisturbed or set to 1, i.e., effectively unspecified.
>
>>
>> jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-29 17:28         ` Palmer Dabbelt
@ 2023-04-29 17:46           ` Jeff Law
  2023-04-29 17:48             ` Palmer Dabbelt
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Law @ 2023-04-29 17:46 UTC (permalink / raw)
  To: Palmer Dabbelt, gcc-patches
  Cc: pan2.li, yanzhang.wang, juzhe.zhong, kito.cheng



On 4/29/23 11:28, Palmer Dabbelt wrote:
> On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>> On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
>> gcc-patches@gcc.gnu.org> wrote:
>>>
>>>
>>>
>>> On 4/28/23 20:55, Li, Pan2 wrote:
>>> > Thanks Jeff for comments.
>>> >
>>> > It makes sense to me. For the EQ operator we should have CONSTM1.
>>> That's not the way I interpret the RVV documentation.  Of course it's
>>> not terribly clear.    I guess one could do some experiments with qemu
>>> or try to dig into the sail code and figure out the intent from those.
> 
> QEMU specifically takes advantage of the behavior Andrew is pointing out 
> it the spec, and will soon do so more aggressively (assuming the patches 
> Daniel just sent out get merged).
Yea.  And taking advantage of that behavior is definitely a performance 
issue for QEMU.  There's still work to do though.  QEMU on vector code 
is running crazy slow.

jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-29 17:46           ` Jeff Law
@ 2023-04-29 17:48             ` Palmer Dabbelt
  2023-04-29 17:52               ` Jeff Law
  0 siblings, 1 reply; 21+ messages in thread
From: Palmer Dabbelt @ 2023-04-29 17:48 UTC (permalink / raw)
  To: jeffreyalaw; +Cc: gcc-patches, pan2.li, yanzhang.wang, juzhe.zhong, kito.cheng

On Sat, 29 Apr 2023 10:46:37 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 4/29/23 11:28, Palmer Dabbelt wrote:
>> On Sat, 29 Apr 2023 10:21:53 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>>> On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
>>> gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>>
>>>>
>>>> On 4/28/23 20:55, Li, Pan2 wrote:
>>>> > Thanks Jeff for comments.
>>>> >
>>>> > It makes sense to me. For the EQ operator we should have CONSTM1.
>>>> That's not the way I interpret the RVV documentation.  Of course it's
>>>> not terribly clear.    I guess one could do some experiments with qemu
>>>> or try to dig into the sail code and figure out the intent from those.
>>
>> QEMU specifically takes advantage of the behavior Andrew is pointing out
>> it the spec, and will soon do so more aggressively (assuming the patches
>> Daniel just sent out get merged).
> Yea.  And taking advantage of that behavior is definitely a performance
> issue for QEMU.  There's still work to do though.  QEMU on vector code
> is running crazy slow.

I guess we're kind of off the rails for a GCC patch, but that's 
definately true.  Across the board RVV is going to just need a lot of 
work, it's very different than SVE or AVX.

Unfortunately QEMU performance isn't really a priority on our end, but 
it's great to see folks digging into it.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-29 17:21       ` Andrew Waterman
  2023-04-29 17:28         ` Palmer Dabbelt
@ 2023-04-29 17:49         ` Jeff Law
  1 sibling, 0 replies; 21+ messages in thread
From: Jeff Law @ 2023-04-29 17:49 UTC (permalink / raw)
  To: Andrew Waterman
  Cc: Li, Pan2, Wang, Yanzhang, gcc-patches, juzhe.zhong, kito.cheng



On 4/29/23 11:21, Andrew Waterman wrote:

> 
> The relevant statement in the spec is that "the tail elements are always 
> updated with a tail-agnostic policy".  The vmset.m instruction will 
> cause mask register bits [0, vl-1] to be set to 1; elements [vl, 
> VLMAX-1] will either be undisturbed or set to 1, i.e., effectively 
> unspecified.
Makes sense.  Just have to stitch together bits from different locations 
in the manual.

The net being that I can't think we can define that macro for RISC-V in 
the way that Pan wants, the semantics just don't line up correctly.

jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-29 17:48             ` Palmer Dabbelt
@ 2023-04-29 17:52               ` Jeff Law
  2023-04-29 18:15                 ` Palmer Dabbelt
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff Law @ 2023-04-29 17:52 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: gcc-patches, pan2.li, yanzhang.wang, juzhe.zhong, kito.cheng



On 4/29/23 11:48, Palmer Dabbelt wrote:

>> Yea.  And taking advantage of that behavior is definitely a performance
>> issue for QEMU.  There's still work to do though.  QEMU on vector code
>> is running crazy slow.
> 
> I guess we're kind of off the rails for a GCC patch, but that's 
> definately true.  Across the board RVV is going to just need a lot of 
> work, it's very different than SVE or AVX.
> 
> Unfortunately QEMU performance isn't really a priority on our end, but 
> it's great to see folks digging into it.
Well, when a user mode SPEC run goes from ~15 minutes to multiple hours 
for a single input workload within specint it becomes a development 
problem.  Daniel is loosely affiliated with my group in Ventana, so I 
can bug him with this kind of stuff.

jeff



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-29 17:52               ` Jeff Law
@ 2023-04-29 18:15                 ` Palmer Dabbelt
  0 siblings, 0 replies; 21+ messages in thread
From: Palmer Dabbelt @ 2023-04-29 18:15 UTC (permalink / raw)
  To: jeffreyalaw; +Cc: gcc-patches, pan2.li, yanzhang.wang, juzhe.zhong, kito.cheng

On Sat, 29 Apr 2023 10:52:50 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 4/29/23 11:48, Palmer Dabbelt wrote:
>
>>> Yea.  And taking advantage of that behavior is definitely a performance
>>> issue for QEMU.  There's still work to do though.  QEMU on vector code
>>> is running crazy slow.
>>
>> I guess we're kind of off the rails for a GCC patch, but that's
>> definately true.  Across the board RVV is going to just need a lot of
>> work, it's very different than SVE or AVX.
>>
>> Unfortunately QEMU performance isn't really a priority on our end, but
>> it's great to see folks digging into it.
> Well, when a user mode SPEC run goes from ~15 minutes to multiple hours
> for a single input workload within specint it becomes a development
> problem.  Daniel is loosely affiliated with my group in Ventana, so I
> can bug him with this kind of stuff.

We've got another team actually doing the mechanics of the SPEC runs, we 
just do the compiler.  So while I guess it is a problem, it's not my 
problem ;)

Maybe not the best way to go about things, but there's only so much that 
can be done...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-29 15:05     ` Jeff Law
  2023-04-29 17:21       ` Andrew Waterman
@ 2023-04-30  1:40       ` Kito Cheng
  2023-04-30 14:21         ` Li, Pan2
  2023-05-02 16:28         ` Jeff Law
  1 sibling, 2 replies; 21+ messages in thread
From: Kito Cheng @ 2023-04-30  1:40 UTC (permalink / raw)
  To: Jeff Law
  Cc: Li, Pan2, gcc-patches, juzhe.zhong, Wang, Yanzhang, Andrew Waterman

Hi Jeff:

The RTL pattern already models tail element and vector length well,
so I don't feel the first version of Pan's patch has any problem?

Input RTL pattern:

#(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
#        (if_then_else:VNx2BI (unspec:VNx2BI [
#                    (const_vector:VNx2BI repeat [
#                            (const_int 1 [0x1])
#                        ])  # all-1 mask
#                    (reg:DI 143)  # AVL reg, or vector length
#                    (const_int 2 [0x2]) # mask policy
#                    (const_int 0 [0])   # avl type
#                    (reg:SI 66 vl)
#                    (reg:SI 67 vtype)
#                ] UNSPEC_VPREDICATE)
#            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
#                (reg/v:VNx2QI 137 [ v1 ]))
#            (unspec:VNx2BI [
#                    (reg:SI 0 zero)
#                ] UNSPEC_VUNDEF))) # maskoff and tail operand
#     (expr_list:REG_DEAD (reg:DI 143)
#        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
#            (nil))))

And the split pattern, only did on tail/maskoff element with undefined value:

(define_split
 [(set (match_operand:VB      0 "register_operand")
       (if_then_else:VB
         (unspec:VB
           [(match_operand:VB 1 "vector_all_trues_mask_operand")
            (match_operand    4 "vector_length_operand")
            (match_operand    5 "const_int_operand")
            (match_operand    6 "const_int_operand")
            (reg:SI VL_REGNUM)
            (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
         (match_operand:VB    3 "vector_move_operand")
         (match_operand:VB    2 "vector_undef_operand")))] # maskoff
and tail operand, only match undef value

Then it turns into vmset, and also discard mask policy operand (since
maskoff is undef means don't care IMO):

(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
       (if_then_else:VNx2BI (unspec:VNx2BI [
                   (const_vector:VNx2BI repeat [
                           (const_int 1 [0x1])
                       ])  # all-1 mask
                   (reg:DI 143) # AVL reg, or vector length
                   (const_int 2 [0x2]) # mask policy
                   (reg:SI 66 vl)
                   (reg:SI 67 vtype)
               ] UNSPEC_VPREDICATE)
           (const_vector:VNx2BI repeat [
                   (const_int 1 [0x1])
               ])    # all-1
           (unspec:VNx2BI [
                   (reg:SI 0 zero)
               ] UNSPEC_VUNDEF))) # still vundef
    (expr_list:REG_DEAD (reg:DI 143)
       (nil)))



On Sat, Apr 29, 2023 at 11:05 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.    I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts has similar issue here? Then for instructions
> like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to understand
> precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't
> set them to all ones in this case, then that would mean that defining
> that macro is simply wrong for risc-v.
>
> jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-30  1:40       ` Kito Cheng
@ 2023-04-30 14:21         ` Li, Pan2
  2023-05-02 16:28         ` Jeff Law
  1 sibling, 0 replies; 21+ messages in thread
From: Li, Pan2 @ 2023-04-30 14:21 UTC (permalink / raw)
  To: Kito Cheng, Jeff Law
  Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang, Andrew Waterman

Thanks all for comments. Summary what I have learned from the mail thread as below. Please feel free to correct me if any mistake.

1. The RVV VMSET has tail policy and the high bits of target register can be overridden to 1 or retain the value they held according to the ISA.
2. The semantics of tail policy is different with s390 according the macro comment " /* The truth element value for vector comparisons.  Our instructions always generate -1 in that case.  */ ".
3. We still have a lot of work to do for the RISC-V besides compiler.
4. The RTL pattern of PATCH v1 models tail policy and vector length as well.

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@sifive.com> 
Sent: Sunday, April 30, 2023 9:40 AM
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Hi Jeff:

The RTL pattern already models tail element and vector length well, so I don't feel the first version of Pan's patch has any problem?

Input RTL pattern:

#(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
#        (if_then_else:VNx2BI (unspec:VNx2BI [
#                    (const_vector:VNx2BI repeat [
#                            (const_int 1 [0x1])
#                        ])  # all-1 mask
#                    (reg:DI 143)  # AVL reg, or vector length
#                    (const_int 2 [0x2]) # mask policy
#                    (const_int 0 [0])   # avl type
#                    (reg:SI 66 vl)
#                    (reg:SI 67 vtype)
#                ] UNSPEC_VPREDICATE)
#            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
#                (reg/v:VNx2QI 137 [ v1 ]))
#            (unspec:VNx2BI [
#                    (reg:SI 0 zero)
#                ] UNSPEC_VUNDEF))) # maskoff and tail operand
#     (expr_list:REG_DEAD (reg:DI 143)
#        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
#            (nil))))

And the split pattern, only did on tail/maskoff element with undefined value:

(define_split
 [(set (match_operand:VB      0 "register_operand")
       (if_then_else:VB
         (unspec:VB
           [(match_operand:VB 1 "vector_all_trues_mask_operand")
            (match_operand    4 "vector_length_operand")
            (match_operand    5 "const_int_operand")
            (match_operand    6 "const_int_operand")
            (reg:SI VL_REGNUM)
            (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
         (match_operand:VB    3 "vector_move_operand")
         (match_operand:VB    2 "vector_undef_operand")))] # maskoff
and tail operand, only match undef value

Then it turns into vmset, and also discard mask policy operand (since maskoff is undef means don't care IMO):

(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
       (if_then_else:VNx2BI (unspec:VNx2BI [
                   (const_vector:VNx2BI repeat [
                           (const_int 1 [0x1])
                       ])  # all-1 mask
                   (reg:DI 143) # AVL reg, or vector length
                   (const_int 2 [0x2]) # mask policy
                   (reg:SI 66 vl)
                   (reg:SI 67 vtype)
               ] UNSPEC_VPREDICATE)
           (const_vector:VNx2BI repeat [
                   (const_int 1 [0x1])
               ])    # all-1
           (unspec:VNx2BI [
                   (reg:SI 0 zero)
               ] UNSPEC_VUNDEF))) # still vundef
    (expr_list:REG_DEAD (reg:DI 143)
       (nil)))



On Sat, Apr 29, 2023 at 11:05 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.    I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts has similar issue here? Then for 
> instructions like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to 
> understand precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't 
> set them to all ones in this case, then that would mean that defining 
> that macro is simply wrong for risc-v.
>
> jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-04-30  1:40       ` Kito Cheng
  2023-04-30 14:21         ` Li, Pan2
@ 2023-05-02 16:28         ` Jeff Law
  2023-05-03 11:17           ` Li, Pan2
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff Law @ 2023-05-02 16:28 UTC (permalink / raw)
  To: Kito Cheng
  Cc: Li, Pan2, gcc-patches, juzhe.zhong, Wang, Yanzhang, Andrew Waterman



On 4/29/23 19:40, Kito Cheng wrote:
> Hi Jeff:
> 
> The RTL pattern already models tail element and vector length well,
> so I don't feel the first version of Pan's patch has any problem?
> 
> Input RTL pattern:
> 
> #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> #        (if_then_else:VNx2BI (unspec:VNx2BI [
> #                    (const_vector:VNx2BI repeat [
> #                            (const_int 1 [0x1])
> #                        ])  # all-1 mask
> #                    (reg:DI 143)  # AVL reg, or vector length
> #                    (const_int 2 [0x2]) # mask policy
> #                    (const_int 0 [0])   # avl type
> #                    (reg:SI 66 vl)
> #                    (reg:SI 67 vtype)
> #                ] UNSPEC_VPREDICATE)
> #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> #                (reg/v:VNx2QI 137 [ v1 ]))
> #            (unspec:VNx2BI [
> #                    (reg:SI 0 zero)
> #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> #     (expr_list:REG_DEAD (reg:DI 143)
> #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> #            (nil))))
> 
> And the split pattern, only did on tail/maskoff element with undefined value:
> 
> (define_split
>   [(set (match_operand:VB      0 "register_operand")
>         (if_then_else:VB
>           (unspec:VB
>             [(match_operand:VB 1 "vector_all_trues_mask_operand")
>              (match_operand    4 "vector_length_operand")
>              (match_operand    5 "const_int_operand")
>              (match_operand    6 "const_int_operand")
>              (reg:SI VL_REGNUM)
>              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>           (match_operand:VB    3 "vector_move_operand")
>           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> and tail operand, only match undef value
> 
> Then it turns into vmset, and also discard mask policy operand (since
> maskoff is undef means don't care IMO):
> 
> (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
>         (if_then_else:VNx2BI (unspec:VNx2BI [
>                     (const_vector:VNx2BI repeat [
>                             (const_int 1 [0x1])
>                         ])  # all-1 mask
>                     (reg:DI 143) # AVL reg, or vector length
>                     (const_int 2 [0x2]) # mask policy
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (const_vector:VNx2BI repeat [
>                     (const_int 1 [0x1])
>                 ])    # all-1
>             (unspec:VNx2BI [
>                     (reg:SI 0 zero)
>                 ] UNSPEC_VUNDEF))) # still vundef
>      (expr_list:REG_DEAD (reg:DI 143)
>         (nil)))
Right.  My concern is that when we call relational_result it's going to 
return -1 (as a vector of bools) which bubbles up through the call 
chain.   If that doesn't match the actual register state after the 
instruction (irrespective of the tail policy), then we have the 
potential to generate incorrect code.

For example, if there's a subsequent instruction that tried to set a 
vector register to -1, it could just copy from the destination of the 
vmset to the new target.  But if the vmset didn't set all the bits to 1, 
then the code is wrong.

With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.

Jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-05-02 16:28         ` Jeff Law
@ 2023-05-03 11:17           ` Li, Pan2
  2023-05-05 12:30             ` Li, Pan2
  0 siblings, 1 reply; 21+ messages in thread
From: Li, Pan2 @ 2023-05-03 11:17 UTC (permalink / raw)
  To: Jeff Law, Kito Cheng
  Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang, Andrew Waterman

Thanks all for comments, will work with kito to make it happen.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Wednesday, May 3, 2023 12:28 AM
To: Kito Cheng <kito.cheng@sifive.com>
Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/29/23 19:40, Kito Cheng wrote:
> Hi Jeff:
> 
> The RTL pattern already models tail element and vector length well, so 
> I don't feel the first version of Pan's patch has any problem?
> 
> Input RTL pattern:
> 
> #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> #        (if_then_else:VNx2BI (unspec:VNx2BI [
> #                    (const_vector:VNx2BI repeat [
> #                            (const_int 1 [0x1])
> #                        ])  # all-1 mask
> #                    (reg:DI 143)  # AVL reg, or vector length
> #                    (const_int 2 [0x2]) # mask policy
> #                    (const_int 0 [0])   # avl type
> #                    (reg:SI 66 vl)
> #                    (reg:SI 67 vtype)
> #                ] UNSPEC_VPREDICATE)
> #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> #                (reg/v:VNx2QI 137 [ v1 ]))
> #            (unspec:VNx2BI [
> #                    (reg:SI 0 zero)
> #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> #     (expr_list:REG_DEAD (reg:DI 143)
> #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> #            (nil))))
> 
> And the split pattern, only did on tail/maskoff element with undefined value:
> 
> (define_split
>   [(set (match_operand:VB      0 "register_operand")
>         (if_then_else:VB
>           (unspec:VB
>             [(match_operand:VB 1 "vector_all_trues_mask_operand")
>              (match_operand    4 "vector_length_operand")
>              (match_operand    5 "const_int_operand")
>              (match_operand    6 "const_int_operand")
>              (reg:SI VL_REGNUM)
>              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>           (match_operand:VB    3 "vector_move_operand")
>           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> and tail operand, only match undef value
> 
> Then it turns into vmset, and also discard mask policy operand (since 
> maskoff is undef means don't care IMO):
> 
> (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
>         (if_then_else:VNx2BI (unspec:VNx2BI [
>                     (const_vector:VNx2BI repeat [
>                             (const_int 1 [0x1])
>                         ])  # all-1 mask
>                     (reg:DI 143) # AVL reg, or vector length
>                     (const_int 2 [0x2]) # mask policy
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (const_vector:VNx2BI repeat [
>                     (const_int 1 [0x1])
>                 ])    # all-1
>             (unspec:VNx2BI [
>                     (reg:SI 0 zero)
>                 ] UNSPEC_VUNDEF))) # still vundef
>      (expr_list:REG_DEAD (reg:DI 143)
>         (nil)))
Right.  My concern is that when we call relational_result it's going to return -1 (as a vector of bools) which bubbles up through the call 
chain.   If that doesn't match the actual register state after the 
instruction (irrespective of the tail policy), then we have the potential to generate incorrect code.

For example, if there's a subsequent instruction that tried to set a vector register to -1, it could just copy from the destination of the vmset to the new target.  But if the vmset didn't set all the bits to 1, then the code is wrong.

With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.

Jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-05-03 11:17           ` Li, Pan2
@ 2023-05-05 12:30             ` Li, Pan2
  2023-05-05 12:37               ` Kito Cheng
  0 siblings, 1 reply; 21+ messages in thread
From: Li, Pan2 @ 2023-05-05 12:30 UTC (permalink / raw)
  To: Kito Cheng; +Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang

Hi kito,

Could you please help to share any suggestion about the PATCH? Comparing the V1 and V2.

Pan


-----Original Message-----
From: Li, Pan2 
Sent: Wednesday, May 3, 2023 7:18 PM
To: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@sifive.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

Thanks all for comments, will work with kito to make it happen.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Wednesday, May 3, 2023 12:28 AM
To: Kito Cheng <kito.cheng@sifive.com>
Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET



On 4/29/23 19:40, Kito Cheng wrote:
> Hi Jeff:
> 
> The RTL pattern already models tail element and vector length well, so 
> I don't feel the first version of Pan's patch has any problem?
> 
> Input RTL pattern:
> 
> #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> #        (if_then_else:VNx2BI (unspec:VNx2BI [
> #                    (const_vector:VNx2BI repeat [
> #                            (const_int 1 [0x1])
> #                        ])  # all-1 mask
> #                    (reg:DI 143)  # AVL reg, or vector length
> #                    (const_int 2 [0x2]) # mask policy
> #                    (const_int 0 [0])   # avl type
> #                    (reg:SI 66 vl)
> #                    (reg:SI 67 vtype)
> #                ] UNSPEC_VPREDICATE)
> #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> #                (reg/v:VNx2QI 137 [ v1 ]))
> #            (unspec:VNx2BI [
> #                    (reg:SI 0 zero)
> #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> #     (expr_list:REG_DEAD (reg:DI 143)
> #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> #            (nil))))
> 
> And the split pattern, only did on tail/maskoff element with undefined value:
> 
> (define_split
>   [(set (match_operand:VB      0 "register_operand")
>         (if_then_else:VB
>           (unspec:VB
>             [(match_operand:VB 1 "vector_all_trues_mask_operand")
>              (match_operand    4 "vector_length_operand")
>              (match_operand    5 "const_int_operand")
>              (match_operand    6 "const_int_operand")
>              (reg:SI VL_REGNUM)
>              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
>           (match_operand:VB    3 "vector_move_operand")
>           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> and tail operand, only match undef value
> 
> Then it turns into vmset, and also discard mask policy operand (since 
> maskoff is undef means don't care IMO):
> 
> (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
>         (if_then_else:VNx2BI (unspec:VNx2BI [
>                     (const_vector:VNx2BI repeat [
>                             (const_int 1 [0x1])
>                         ])  # all-1 mask
>                     (reg:DI 143) # AVL reg, or vector length
>                     (const_int 2 [0x2]) # mask policy
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (const_vector:VNx2BI repeat [
>                     (const_int 1 [0x1])
>                 ])    # all-1
>             (unspec:VNx2BI [
>                     (reg:SI 0 zero)
>                 ] UNSPEC_VUNDEF))) # still vundef
>      (expr_list:REG_DEAD (reg:DI 143)
>         (nil)))
Right.  My concern is that when we call relational_result it's going to return -1 (as a vector of bools) which bubbles up through the call 
chain.   If that doesn't match the actual register state after the 
instruction (irrespective of the tail policy), then we have the potential to generate incorrect code.

For example, if there's a subsequent instruction that tried to set a vector register to -1, it could just copy from the destination of the vmset to the new target.  But if the vmset didn't set all the bits to 1, then the code is wrong.

With all the UNSPECs in place, this may not be a problem in practice. 
Unsure.  I'm willing to defer to you on this Kito.

Jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-05-05 12:30             ` Li, Pan2
@ 2023-05-05 12:37               ` Kito Cheng
  2023-05-05 12:45                 ` Li, Pan2
  0 siblings, 1 reply; 21+ messages in thread
From: Kito Cheng @ 2023-05-05 12:37 UTC (permalink / raw)
  To: Li, Pan2; +Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang

I will take V1 and commit to trunk after my local test is done :)

On Fri, May 5, 2023 at 8:30 PM Li, Pan2 <pan2.li@intel.com> wrote:
>
> Hi kito,
>
> Could you please help to share any suggestion about the PATCH? Comparing the V1 and V2.
>
> Pan
>
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Wednesday, May 3, 2023 7:18 PM
> To: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@sifive.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
> Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
> Thanks all for comments, will work with kito to make it happen.
>
> Pan
>
> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Wednesday, May 3, 2023 12:28 AM
> To: Kito Cheng <kito.cheng@sifive.com>
> Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
>
>
> On 4/29/23 19:40, Kito Cheng wrote:
> > Hi Jeff:
> >
> > The RTL pattern already models tail element and vector length well, so
> > I don't feel the first version of Pan's patch has any problem?
> >
> > Input RTL pattern:
> >
> > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > #        (if_then_else:VNx2BI (unspec:VNx2BI [
> > #                    (const_vector:VNx2BI repeat [
> > #                            (const_int 1 [0x1])
> > #                        ])  # all-1 mask
> > #                    (reg:DI 143)  # AVL reg, or vector length
> > #                    (const_int 2 [0x2]) # mask policy
> > #                    (const_int 0 [0])   # avl type
> > #                    (reg:SI 66 vl)
> > #                    (reg:SI 67 vtype)
> > #                ] UNSPEC_VPREDICATE)
> > #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > #                (reg/v:VNx2QI 137 [ v1 ]))
> > #            (unspec:VNx2BI [
> > #                    (reg:SI 0 zero)
> > #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> > #     (expr_list:REG_DEAD (reg:DI 143)
> > #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > #            (nil))))
> >
> > And the split pattern, only did on tail/maskoff element with undefined value:
> >
> > (define_split
> >   [(set (match_operand:VB      0 "register_operand")
> >         (if_then_else:VB
> >           (unspec:VB
> >             [(match_operand:VB 1 "vector_all_trues_mask_operand")
> >              (match_operand    4 "vector_length_operand")
> >              (match_operand    5 "const_int_operand")
> >              (match_operand    6 "const_int_operand")
> >              (reg:SI VL_REGNUM)
> >              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> >           (match_operand:VB    3 "vector_move_operand")
> >           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> > and tail operand, only match undef value
> >
> > Then it turns into vmset, and also discard mask policy operand (since
> > maskoff is undef means don't care IMO):
> >
> > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> >         (if_then_else:VNx2BI (unspec:VNx2BI [
> >                     (const_vector:VNx2BI repeat [
> >                             (const_int 1 [0x1])
> >                         ])  # all-1 mask
> >                     (reg:DI 143) # AVL reg, or vector length
> >                     (const_int 2 [0x2]) # mask policy
> >                     (reg:SI 66 vl)
> >                     (reg:SI 67 vtype)
> >                 ] UNSPEC_VPREDICATE)
> >             (const_vector:VNx2BI repeat [
> >                     (const_int 1 [0x1])
> >                 ])    # all-1
> >             (unspec:VNx2BI [
> >                     (reg:SI 0 zero)
> >                 ] UNSPEC_VUNDEF))) # still vundef
> >      (expr_list:REG_DEAD (reg:DI 143)
> >         (nil)))
> Right.  My concern is that when we call relational_result it's going to return -1 (as a vector of bools) which bubbles up through the call
> chain.   If that doesn't match the actual register state after the
> instruction (irrespective of the tail policy), then we have the potential to generate incorrect code.
>
> For example, if there's a subsequent instruction that tried to set a vector register to -1, it could just copy from the destination of the vmset to the new target.  But if the vmset didn't set all the bits to 1, then the code is wrong.
>
> With all the UNSPECs in place, this may not be a problem in practice.
> Unsure.  I'm willing to defer to you on this Kito.
>
> Jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-05-05 12:37               ` Kito Cheng
@ 2023-05-05 12:45                 ` Li, Pan2
  2023-05-05 14:51                   ` Kito Cheng
  0 siblings, 1 reply; 21+ messages in thread
From: Li, Pan2 @ 2023-05-05 12:45 UTC (permalink / raw)
  To: Kito Cheng; +Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang

Ok, sounds good. Thank you!

Pan

-----Original Message-----
From: Kito Cheng <kito.cheng@sifive.com> 
Sent: Friday, May 5, 2023 8:37 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

I will take V1 and commit to trunk after my local test is done :)

On Fri, May 5, 2023 at 8:30 PM Li, Pan2 <pan2.li@intel.com> wrote:
>
> Hi kito,
>
> Could you please help to share any suggestion about the PATCH? Comparing the V1 and V2.
>
> Pan
>
>
> -----Original Message-----
> From: Li, Pan2
> Sent: Wednesday, May 3, 2023 7:18 PM
> To: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng 
> <kito.cheng@sifive.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang 
> <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
> Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify 
> to VMSET
>
> Thanks all for comments, will work with kito to make it happen.
>
> Pan
>
> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Wednesday, May 3, 2023 12:28 AM
> To: Kito Cheng <kito.cheng@sifive.com>
> Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; 
> juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew 
> Waterman <andrew@sifive.com>
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify 
> to VMSET
>
>
>
> On 4/29/23 19:40, Kito Cheng wrote:
> > Hi Jeff:
> >
> > The RTL pattern already models tail element and vector length well, 
> > so I don't feel the first version of Pan's patch has any problem?
> >
> > Input RTL pattern:
> >
> > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > #        (if_then_else:VNx2BI (unspec:VNx2BI [
> > #                    (const_vector:VNx2BI repeat [
> > #                            (const_int 1 [0x1])
> > #                        ])  # all-1 mask
> > #                    (reg:DI 143)  # AVL reg, or vector length
> > #                    (const_int 2 [0x2]) # mask policy
> > #                    (const_int 0 [0])   # avl type
> > #                    (reg:SI 66 vl)
> > #                    (reg:SI 67 vtype)
> > #                ] UNSPEC_VPREDICATE)
> > #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > #                (reg/v:VNx2QI 137 [ v1 ]))
> > #            (unspec:VNx2BI [
> > #                    (reg:SI 0 zero)
> > #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> > #     (expr_list:REG_DEAD (reg:DI 143)
> > #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > #            (nil))))
> >
> > And the split pattern, only did on tail/maskoff element with undefined value:
> >
> > (define_split
> >   [(set (match_operand:VB      0 "register_operand")
> >         (if_then_else:VB
> >           (unspec:VB
> >             [(match_operand:VB 1 "vector_all_trues_mask_operand")
> >              (match_operand    4 "vector_length_operand")
> >              (match_operand    5 "const_int_operand")
> >              (match_operand    6 "const_int_operand")
> >              (reg:SI VL_REGNUM)
> >              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> >           (match_operand:VB    3 "vector_move_operand")
> >           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> > and tail operand, only match undef value
> >
> > Then it turns into vmset, and also discard mask policy operand 
> > (since maskoff is undef means don't care IMO):
> >
> > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> >         (if_then_else:VNx2BI (unspec:VNx2BI [
> >                     (const_vector:VNx2BI repeat [
> >                             (const_int 1 [0x1])
> >                         ])  # all-1 mask
> >                     (reg:DI 143) # AVL reg, or vector length
> >                     (const_int 2 [0x2]) # mask policy
> >                     (reg:SI 66 vl)
> >                     (reg:SI 67 vtype)
> >                 ] UNSPEC_VPREDICATE)
> >             (const_vector:VNx2BI repeat [
> >                     (const_int 1 [0x1])
> >                 ])    # all-1
> >             (unspec:VNx2BI [
> >                     (reg:SI 0 zero)
> >                 ] UNSPEC_VUNDEF))) # still vundef
> >      (expr_list:REG_DEAD (reg:DI 143)
> >         (nil)))
> Right.  My concern is that when we call relational_result it's going to return -1 (as a vector of bools) which bubbles up through the call
> chain.   If that doesn't match the actual register state after the
> instruction (irrespective of the tail policy), then we have the potential to generate incorrect code.
>
> For example, if there's a subsequent instruction that tried to set a vector register to -1, it could just copy from the destination of the vmset to the new target.  But if the vmset didn't set all the bits to 1, then the code is wrong.
>
> With all the UNSPECs in place, this may not be a problem in practice.
> Unsure.  I'm willing to defer to you on this Kito.
>
> Jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
  2023-05-05 12:45                 ` Li, Pan2
@ 2023-05-05 14:51                   ` Kito Cheng
  0 siblings, 0 replies; 21+ messages in thread
From: Kito Cheng @ 2023-05-05 14:51 UTC (permalink / raw)
  To: Li, Pan2; +Cc: Kito Cheng, gcc-patches, juzhe.zhong, Wang, Yanzhang

pushed v1 to trunk

On Fri, May 5, 2023 at 8:46 PM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Ok, sounds good. Thank you!
>
> Pan
>
> -----Original Message-----
> From: Kito Cheng <kito.cheng@sifive.com>
> Sent: Friday, May 5, 2023 8:37 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>
> Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET
>
> I will take V1 and commit to trunk after my local test is done :)
>
> On Fri, May 5, 2023 at 8:30 PM Li, Pan2 <pan2.li@intel.com> wrote:
> >
> > Hi kito,
> >
> > Could you please help to share any suggestion about the PATCH? Comparing the V1 and V2.
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Li, Pan2
> > Sent: Wednesday, May 3, 2023 7:18 PM
> > To: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng
> > <kito.cheng@sifive.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang
> > <yanzhang.wang@intel.com>; Andrew Waterman <andrew@sifive.com>
> > Subject: RE: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify
> > to VMSET
> >
> > Thanks all for comments, will work with kito to make it happen.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Jeff Law <jeffreyalaw@gmail.com>
> > Sent: Wednesday, May 3, 2023 12:28 AM
> > To: Kito Cheng <kito.cheng@sifive.com>
> > Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org;
> > juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; Andrew
> > Waterman <andrew@sifive.com>
> > Subject: Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify
> > to VMSET
> >
> >
> >
> > On 4/29/23 19:40, Kito Cheng wrote:
> > > Hi Jeff:
> > >
> > > The RTL pattern already models tail element and vector length well,
> > > so I don't feel the first version of Pan's patch has any problem?
> > >
> > > Input RTL pattern:
> > >
> > > #(insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > > #        (if_then_else:VNx2BI (unspec:VNx2BI [
> > > #                    (const_vector:VNx2BI repeat [
> > > #                            (const_int 1 [0x1])
> > > #                        ])  # all-1 mask
> > > #                    (reg:DI 143)  # AVL reg, or vector length
> > > #                    (const_int 2 [0x2]) # mask policy
> > > #                    (const_int 0 [0])   # avl type
> > > #                    (reg:SI 66 vl)
> > > #                    (reg:SI 67 vtype)
> > > #                ] UNSPEC_VPREDICATE)
> > > #            (geu:VNx2BI (reg/v:VNx2QI 137 [ v1 ])
> > > #                (reg/v:VNx2QI 137 [ v1 ]))
> > > #            (unspec:VNx2BI [
> > > #                    (reg:SI 0 zero)
> > > #                ] UNSPEC_VUNDEF))) # maskoff and tail operand
> > > #     (expr_list:REG_DEAD (reg:DI 143)
> > > #        (expr_list:REG_DEAD (reg/v:VNx2QI 137 [ v1 ])
> > > #            (nil))))
> > >
> > > And the split pattern, only did on tail/maskoff element with undefined value:
> > >
> > > (define_split
> > >   [(set (match_operand:VB      0 "register_operand")
> > >         (if_then_else:VB
> > >           (unspec:VB
> > >             [(match_operand:VB 1 "vector_all_trues_mask_operand")
> > >              (match_operand    4 "vector_length_operand")
> > >              (match_operand    5 "const_int_operand")
> > >              (match_operand    6 "const_int_operand")
> > >              (reg:SI VL_REGNUM)
> > >              (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> > >           (match_operand:VB    3 "vector_move_operand")
> > >           (match_operand:VB    2 "vector_undef_operand")))] # maskoff
> > > and tail operand, only match undef value
> > >
> > > Then it turns into vmset, and also discard mask policy operand
> > > (since maskoff is undef means don't care IMO):
> > >
> > > (insn 10 7 12 2 (set (reg:VNx2BI 134 [ _1 ])
> > >         (if_then_else:VNx2BI (unspec:VNx2BI [
> > >                     (const_vector:VNx2BI repeat [
> > >                             (const_int 1 [0x1])
> > >                         ])  # all-1 mask
> > >                     (reg:DI 143) # AVL reg, or vector length
> > >                     (const_int 2 [0x2]) # mask policy
> > >                     (reg:SI 66 vl)
> > >                     (reg:SI 67 vtype)
> > >                 ] UNSPEC_VPREDICATE)
> > >             (const_vector:VNx2BI repeat [
> > >                     (const_int 1 [0x1])
> > >                 ])    # all-1
> > >             (unspec:VNx2BI [
> > >                     (reg:SI 0 zero)
> > >                 ] UNSPEC_VUNDEF))) # still vundef
> > >      (expr_list:REG_DEAD (reg:DI 143)
> > >         (nil)))
> > Right.  My concern is that when we call relational_result it's going to return -1 (as a vector of bools) which bubbles up through the call
> > chain.   If that doesn't match the actual register state after the
> > instruction (irrespective of the tail policy), then we have the potential to generate incorrect code.
> >
> > For example, if there's a subsequent instruction that tried to set a vector register to -1, it could just copy from the destination of the vmset to the new target.  But if the vmset didn't set all the bits to 1, then the code is wrong.
> >
> > With all the UNSPECs in place, this may not be a problem in practice.
> > Unsure.  I'm willing to defer to you on this Kito.
> >
> > Jeff

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-05-05 14:52 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-28 15:21 [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET pan2.li
2023-04-28 21:47 ` Jeff Law
2023-04-29  2:55   ` Li, Pan2
2023-04-29 13:35     ` Li, Pan2
2023-04-29 15:05     ` Jeff Law
2023-04-29 17:21       ` Andrew Waterman
2023-04-29 17:28         ` Palmer Dabbelt
2023-04-29 17:46           ` Jeff Law
2023-04-29 17:48             ` Palmer Dabbelt
2023-04-29 17:52               ` Jeff Law
2023-04-29 18:15                 ` Palmer Dabbelt
2023-04-29 17:49         ` Jeff Law
2023-04-30  1:40       ` Kito Cheng
2023-04-30 14:21         ` Li, Pan2
2023-05-02 16:28         ` Jeff Law
2023-05-03 11:17           ` Li, Pan2
2023-05-05 12:30             ` Li, Pan2
2023-05-05 12:37               ` Kito Cheng
2023-05-05 12:45                 ` Li, Pan2
2023-05-05 14:51                   ` Kito Cheng
2023-04-29 13:32 ` [PATCH v2] " pan2.li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).