[PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

public inbox for gcc-patches@gcc.gnu.org

* [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
@ 2023-04-17 14:50 pan2.li
  2023-04-18  1:30 ` Li, Pan2
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: pan2.li @ 2023-04-17 14:50 UTC
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, rguenther, pan2.li, yanzhang.wang,
	richard.sandiford

From: Pan Li <pan2.li@intel.com>

This patch adds an optimization for the vector IOR(V1, NOT V1). Assume
we have the sample code below.

vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
{
  return __riscv_vmorn_mm_b32(v1, v1, vl);
}

Before this patch:
vsetvli  a5,zero,e8,mf4,ta,ma
vlm.v    v24,0(a1)
vsetvli  zero,a2,e8,mf4,ta,ma
vmorn.mm v24,v24,v24
vsetvli  a5,zero,e8,mf4,ta,ma
vsm.v    v24,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf4,ta,ma
vmset.m v24
vsetvli a5,zero,e8,mf4,ta,ma
vsm.v   v24,0(a0)
ret

Or, from the RTL perspective,
from:
(ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
to:
(const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])

A similar optimization for VMANDN is already enabled. For this kind of
optimization, VMORN and VMANDN should differ only in the operator. This
patch allows the IOR(V1, NOT V1) simplification for VECTOR_BOOL modes in
addition to the existing SCALAR_INT modes.

gcc/ChangeLog:

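	* machmode.h (VECTOR_BOOL_MODE_P): New macro.
	* simplify-rtx.cc (valid_mode_for_ior_simplification_p): New function.
	(simplify_context::simplify_binary_operation_1): Use it, allowing the
	IOR(V1, NOT V1) simplification for vector bool modes.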

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: Adjust the expected
	vmorn.mm and vmset.m counts.
	* gcc.target/riscv/simplify_ior_optimization.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/machmode.h                                |  4 ++
 gcc/simplify-rtx.cc                           | 10 +++-
 .../riscv/rvv/base/mask_insn_shortcut.c       |  3 +-
 .../riscv/simplify_ior_optimization.c         | 50 +++++++++++++++++++
 4 files changed, 63 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c

diff --git a/gcc/machmode.h b/gcc/machmode.h
index f1865c1ef42..771bae89cb7 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM	\
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
 
+/* Nonzero if MODE is a vector bool mode.  */
+#define VECTOR_BOOL_MODE_P(MODE)			\
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)		\
+
 /* Nonzero if MODE is a scalar integral mode.  */
 #define SCALAR_INT_MODE_P(MODE)			\
   (GET_MODE_CLASS (MODE) == MODE_INT		\
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index ee75079917f..eff27b835bf 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -57,6 +57,12 @@ neg_poly_int_rtx (machine_mode mode, const_rtx i)
   return immed_wide_int_const (-wi::to_poly_wide (i, mode), mode);
 }
 
+static bool
+valid_mode_for_ior_simplification_p (machine_mode mode)
+{
+  return SCALAR_INT_MODE_P (mode) || VECTOR_BOOL_MODE_P (mode);
+}
+
 /* Test whether expression, X, is an immediate constant that represents
    the most significant bit of machine mode MODE.  */
 
@@ -3332,8 +3338,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
       if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
 	   || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
 	  && ! side_effects_p (op0)
-	  && SCALAR_INT_MODE_P (mode))
-	return constm1_rtx;
+	  && valid_mode_for_ior_simplification_p (mode))
+	return CONST1_RTX (mode);
 
       /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
       if (CONST_INT_P (op1)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
index 83cc4a1b5a5..57d0241675a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -233,9 +233,8 @@ vbool64_t test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) {
 /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
-/* { dg-final { scan-assembler-times {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
 /* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */
-/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */
+/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */
 /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
 /* { dg-final { scan-assembler-times {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
new file mode 100644
index 00000000000..ec3bd0baf03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
+
+#include <stdint.h>
+
+uint8_t test_simplify_ior_scalar_case_0 (uint8_t a)
+{
+  return a | ~a;
+}
+
+uint16_t test_simplify_ior_scalar_case_1 (uint16_t a)
+{
+  return a | ~a;
+}
+
+uint32_t test_simplify_ior_scalar_case_2 (uint32_t a)
+{
+  return a | ~a;
+}
+
+uint64_t test_simplify_ior_scalar_case_3 (uint64_t a)
+{
+  return a | ~a;
+}
+
+int8_t test_simplify_ior_scalar_case_4 (int8_t a)
+{
+  return a | ~a;
+}
+
+int16_t test_simplify_ior_scalar_case_5 (int16_t a)
+{
+  return a | ~a;
+}
+
+int32_t test_simplify_ior_scalar_case_6 (int32_t a)
+{
+  return a | ~a;
+}
+
+int64_t test_simplify_ior_scalar_case_7 (int64_t a)
+{
+  return a | ~a;
+}
+
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } */
+/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */
+/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */
-- 
2.34.1



* RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
  2023-04-17 14:50 [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion pan2.li
@ 2023-04-18  1:30 ` Li, Pan2
  2023-04-18  7:59   ` Richard Biener
  2023-04-18  9:08 ` [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization pan2.li
  2023-04-19  9:18 ` [PATCH v3] RISC-V: Align IOR optimization MODE_CLASS condition to AND pan2.li
  2 siblings, 1 reply; 16+ messages in thread
From: Li, Pan2 @ 2023-04-18  1:30 UTC
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, rguenther, Wang, Yanzhang, richard.sandiford

Passed the X86 bootstrap and regression tests.

Pan

-----Original Message-----
From: Li, Pan2 <pan2.li@intel.com> 
Sent: Monday, April 17, 2023 10:50 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang <yanzhang.wang@intel.com>; richard.sandiford@arm.com
Subject: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

From: Pan Li <pan2.li@intel.com>

This patch add the optimization for the vector IOR(V1, NOT V1). Assume we have below sample code.

vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl) {
  return __riscv_vmorn_mm_b32(v1, v1, vl); }

Before this patch:
vsetvli  a5,zero,e8,mf4,ta,ma
vlm.v    v24,0(a1)
vsetvli  zero,a2,e8,mf4,ta,ma
vmorn.mm v24,v24,v24
vsetvli  a5,zero,e8,mf4,ta,ma
vsm.v    v24,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf4,ta,ma
vmset.m v24
vsetvli a5,zero,e8,mf4,ta,ma
vsm.v   v24,0(a0)
ret

Or in RTL's perspective,
from:
(ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
to:
(const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])

The similar optimization like VMANDN has enabled already. There should be no difference execpt the operator when compare the VMORN and VMANDN for such kind of optimization. The patch allows the VECTOR_BOOL IOR(V1, NOT V1) simplification besides the existing SCALAR_INT mode.

gcc/ChangeLog:

	* machmode.h (VECTOR_BOOL_MODE_P):
	* simplify-rtx.cc (valid_mode_for_ior_simplification_p):
	(simplify_context::simplify_binary_operation_1):

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c:
	* gcc.target/riscv/simplify_ior_optimization.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/machmode.h                                |  4 ++
 gcc/simplify-rtx.cc                           | 10 +++-
 .../riscv/rvv/base/mask_insn_shortcut.c       |  3 +-
 .../riscv/simplify_ior_optimization.c         | 50 +++++++++++++++++++
 4 files changed, 63 insertions(+), 4 deletions(-)  create mode 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c

diff --git a/gcc/machmode.h b/gcc/machmode.h index f1865c1ef42..771bae89cb7 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM	\
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
 
+/* Nonzero if MODE is a vector bool mode.  */
+#define VECTOR_BOOL_MODE_P(MODE)			\
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)		\
+
 /* Nonzero if MODE is a scalar integral mode.  */
 #define SCALAR_INT_MODE_P(MODE)			\
   (GET_MODE_CLASS (MODE) == MODE_INT		\
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index ee75079917f..eff27b835bf 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -57,6 +57,12 @@ neg_poly_int_rtx (machine_mode mode, const_rtx i)
   return immed_wide_int_const (-wi::to_poly_wide (i, mode), mode);  }
 
+static bool
+valid_mode_for_ior_simplification_p (machine_mode mode) {
+  return SCALAR_INT_MODE_P (mode) || VECTOR_BOOL_MODE_P (mode); }
+
 /* Test whether expression, X, is an immediate constant that represents
    the most significant bit of machine mode MODE.  */
 
@@ -3332,8 +3338,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
       if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
 	   || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
 	  && ! side_effects_p (op0)
-	  && SCALAR_INT_MODE_P (mode))
-	return constm1_rtx;
+	  && valid_mode_for_ior_simplification_p (mode))
+	return CONST1_RTX (mode);
 
       /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
       if (CONST_INT_P (op1)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
index 83cc4a1b5a5..57d0241675a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -233,9 +233,8 @@ vbool64_t test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) {
 /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
-/* { dg-final { scan-assembler-times {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
 /* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */
-/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */
+/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */
 /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
 /* { dg-final { scan-assembler-times {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
new file mode 100644
index 00000000000..ec3bd0baf03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
+
+#include <stdint.h>
+
+uint8_t test_simplify_ior_scalar_case_0 (uint8_t a) {
+  return a | ~a;
+}
+
+uint16_t test_simplify_ior_scalar_case_1 (uint16_t a) {
+  return a | ~a;
+}
+
+uint32_t test_simplify_ior_scalar_case_2 (uint32_t a) {
+  return a | ~a;
+}
+
+uint64_t test_simplify_ior_scalar_case_3 (uint64_t a) {
+  return a | ~a;
+}
+
+int8_t test_simplify_ior_scalar_case_4 (int8_t a) {
+  return a | ~a;
+}
+
+int16_t test_simplify_ior_scalar_case_5 (int16_t a) {
+  return a | ~a;
+}
+
+int32_t test_simplify_ior_scalar_case_6 (int32_t a) {
+  return a | ~a;
+}
+
+int64_t test_simplify_ior_scalar_case_7 (int64_t a) {
+  return a | ~a;
+}
+
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } */
+/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */
+/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */
--
2.34.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
  2023-04-18  1:30 ` Li, Pan2
@ 2023-04-18  7:59   ` Richard Biener
  2023-04-18  8:00     ` Richard Biener
  2023-04-18  8:08     ` Li, Pan2
  0 siblings, 2 replies; 16+ messages in thread
From: Richard Biener @ 2023-04-18  7:59 UTC
  To: Li, Pan2
  Cc: gcc-patches, juzhe.zhong, kito.cheng, rguenther, Wang, Yanzhang,
	richard.sandiford

On Tue, Apr 18, 2023 at 3:31 AM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Passed the X86 bootstrap and regression tests.
>
> Pan
>
> @@ -3332,8 +3338,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>        if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
>            || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
>           && ! side_effects_p (op0)
> -         && SCALAR_INT_MODE_P (mode))
> -       return constm1_rtx;
> +         && valid_mode_for_ior_simplification_p (mode))

for simple predicates like this please do not split them out, it makes
understanding the code more difficult.

> +       return CONST1_RTX (mode);

shouldn't this be CONSTM1_RTX (mode)?  Why is this only valid for VECTOR_BOOL
and not also for VECTOR_INT?  You're citing AND and that does

      /* A & (~A) -> 0 */
      if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
           || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
          && ! side_effects_p (op0)
          && GET_MODE_CLASS (mode) != MODE_CC)
        return CONST0_RTX (mode);

so why differ and not use the same GET_MODE_CLASS (mode) != MODE_CC condition?

Richard.


* Re: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
  2023-04-18  7:59   ` Richard Biener
@ 2023-04-18  8:00     ` Richard Biener
  2023-04-18  8:20       ` Li, Pan2
  2023-04-18  8:08     ` Li, Pan2
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Biener @ 2023-04-18  8:00 UTC
  To: Li, Pan2
  Cc: gcc-patches, juzhe.zhong, kito.cheng, rguenther, Wang, Yanzhang,
	richard.sandiford

On Tue, Apr 18, 2023 at 9:59 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Apr 18, 2023 at 3:31 AM Li, Pan2 via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl) {
> >   return __riscv_vmorn_mm_b32(v1, v1, vl); }

Btw, this also shows you might want to consider inlining the intrinsics in
target specific folding or implement them with generic vector operations
in the headers.


* RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
  2023-04-18  7:59   ` Richard Biener
  2023-04-18  8:00     ` Richard Biener
@ 2023-04-18  8:08     ` Li, Pan2
  1 sibling, 0 replies; 16+ messages in thread
From: Li, Pan2 @ 2023-04-18  8:08 UTC
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, kito.cheng, rguenther, Wang, Yanzhang,
	richard.sandiford

Thanks, Richard, for the comments. CIL, and I will try the suggestions.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Tuesday, April 18, 2023 4:00 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Wang, Yanzhang <yanzhang.wang@intel.com>; richard.sandiford@arm.com
Subject: Re: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

On Tue, Apr 18, 2023 at 3:31 AM Li, Pan2 via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> Passed the X86 bootstrap and regression tests.
>
> Pan
>
> -----Original Message-----
> From: Li, Pan2 <pan2.li@intel.com>
> Sent: Monday, April 17, 2023 10:50 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; 
> Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang 
> <yanzhang.wang@intel.com>; richard.sandiford@arm.com
> Subject: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch add the optimization for the vector IOR(V1, NOT V1). Assume we have below sample code.
>
> vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl) {
>   return __riscv_vmorn_mm_b32(v1, v1, vl); }
>
> Before this patch:
> vsetvli  a5,zero,e8,mf4,ta,ma
> vlm.v    v24,0(a1)
> vsetvli  zero,a2,e8,mf4,ta,ma
> vmorn.mm v24,v24,v24
> vsetvli  a5,zero,e8,mf4,ta,ma
> vsm.v    v24,0(a0)
> ret
>
> After this patch:
> vsetvli zero,a2,e8,mf4,ta,ma
> vmset.m v24
> vsetvli a5,zero,e8,mf4,ta,ma
> vsm.v   v24,0(a0)
> ret
>
> Or in RTL's perspective,
> from:
> (ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ 
> v1 ])))
> to:
> (const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])
>
> The similar optimization like VMANDN has enabled already. There should be no difference execpt the operator when compare the VMORN and VMANDN for such kind of optimization. The patch allows the VECTOR_BOOL IOR(V1, NOT V1) simplification besides the existing SCALAR_INT mode.
>
> gcc/ChangeLog:
>
>         * machmode.h (VECTOR_BOOL_MODE_P):
>         * simplify-rtx.cc (valid_mode_for_ior_simplification_p):
>         (simplify_context::simplify_binary_operation_1):
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/base/mask_insn_shortcut.c:
>         * gcc.target/riscv/simplify_ior_optimization.c: New test.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/machmode.h                                |  4 ++
>  gcc/simplify-rtx.cc                           | 10 +++-
>  .../riscv/rvv/base/mask_insn_shortcut.c       |  3 +-
>  .../riscv/simplify_ior_optimization.c         | 50 +++++++++++++++++++
>  4 files changed, 63 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
>
> diff --git a/gcc/machmode.h b/gcc/machmode.h
> index f1865c1ef42..771bae89cb7 100644
> --- a/gcc/machmode.h
> +++ b/gcc/machmode.h
> @@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
>     || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM       \
>     || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
>
> +/* Nonzero if MODE is a vector bool mode.  */
> +#define VECTOR_BOOL_MODE_P(MODE)                       \
> +  (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)          \
> +
>  /* Nonzero if MODE is a scalar integral mode.  */
>  #define SCALAR_INT_MODE_P(MODE)                        \
>    (GET_MODE_CLASS (MODE) == MODE_INT           \
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index ee75079917f..eff27b835bf 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -57,6 +57,12 @@ neg_poly_int_rtx (machine_mode mode, const_rtx i)
>    return immed_wide_int_const (-wi::to_poly_wide (i, mode), mode);
>  }
>
> +static bool
> +valid_mode_for_ior_simplification_p (machine_mode mode)
> +{
> +  return SCALAR_INT_MODE_P (mode) || VECTOR_BOOL_MODE_P (mode);
> +}
> +
>  /* Test whether expression, X, is an immediate constant that represents
>     the most significant bit of machine mode MODE.  */
>
> @@ -3332,8 +3338,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>        if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
>            || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
>           && ! side_effects_p (op0)
> -         && SCALAR_INT_MODE_P (mode))
> -       return constm1_rtx;
> +         && valid_mode_for_ior_simplification_p (mode))

for simple predicates like this please do not split them out, it makes understanding the code more difficult.
[pan]: Sure, will update this part.

> +       return CONST1_RTX (mode);

shouldn't this be CONSTM1_RTX (mode)?  Why is this only valid for VECTOR_BOOL and not also for VECTOR_INT?  You're citing AND and that does
[pan]: I will try CONSTM1_RTX. I was not sure whether there was a specific reason for IOR to differ from AND, so I only added VECTOR_BOOL, which the test covers. I will update it to follow the same approach as AND.

      /* A & (~A) -> 0 */
      if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
           || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
          && ! side_effects_p (op0)
          && GET_MODE_CLASS (mode) != MODE_CC)
        return CONST0_RTX (mode);

so why differ and not use the same GET_MODE_CLASS (mode) != MODE_CC condition?

Richard.
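Richard's point about CONSTM1_RTX can be checked with plain integers: for any element width, A | ~A sets every bit, which is -1 in two's complement rather than 1, while A & ~A is 0. The standalone C sketch below (not GCC code; the function name is made up for illustration) verifies both identities exhaustively for 8-bit and 16-bit elements. Since 1 and all-ones differ for every multi-bit element, a per-element 1 can only coincide with the correct result when an element is a single bit.

```c
#include <stdint.h>

/* Exhaustively check, for every 8-bit and 16-bit element value, that
   A | ~A is all ones (-1 when read as signed) and A & ~A is zero.
   Returns the number of counterexamples (0 means both identities hold).  */
static int
count_ior_and_not_failures (void)
{
  int failures = 0;

  for (int v = 0; v <= UINT8_MAX; v++)
    {
      uint8_t a = (uint8_t) v;
      /* The operands promote to int, so truncate back to the element
         width before comparing.  */
      int8_t ior = (int8_t) (uint8_t) (a | ~a);
      uint8_t and_not = (uint8_t) (a & ~a);
      if (ior != -1 || and_not != 0)
        failures++;
    }

  for (int v = 0; v <= UINT16_MAX; v++)
    {
      uint16_t a = (uint16_t) v;
      int16_t ior = (int16_t) (uint16_t) (a | ~a);
      uint16_t and_not = (uint16_t) (a & ~a);
      if (ior != -1 || and_not != 0)
        failures++;
    }

  return failures;
}
```

Calling `count_ior_and_not_failures ()` returns 0; the same reasoning is why the AND rule returns CONST0_RTX (mode) and the IOR rule should return the all-ones constant.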

>
>        /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
>        if (CONST_INT_P (op1)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> index 83cc4a1b5a5..57d0241675a 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> @@ -233,9 +233,8 @@ vbool64_t test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) {
>  /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
>  /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
>  /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> -/* { dg-final { scan-assembler-times {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
>  /* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
>  /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */
> -/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */
> +/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */
>  /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
>  /* { dg-final { scan-assembler-times {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> new file mode 100644
> index 00000000000..ec3bd0baf03
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> @@ -0,0 +1,50 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
> +
> +#include <stdint.h>
> +
> +uint8_t test_simplify_ior_scalar_case_0 (uint8_t a)
> +{
> +  return a | ~a;
> +}
> +
> +uint16_t test_simplify_ior_scalar_case_1 (uint16_t a)
> +{
> +  return a | ~a;
> +}
> +
> +uint32_t test_simplify_ior_scalar_case_2 (uint32_t a)
> +{
> +  return a | ~a;
> +}
> +
> +uint64_t test_simplify_ior_scalar_case_3 (uint64_t a)
> +{
> +  return a | ~a;
> +}
> +
> +int8_t test_simplify_ior_scalar_case_4 (int8_t a)
> +{
> +  return a | ~a;
> +}
> +
> +int16_t test_simplify_ior_scalar_case_5 (int16_t a)
> +{
> +  return a | ~a;
> +}
> +
> +int32_t test_simplify_ior_scalar_case_6 (int32_t a)
> +{
> +  return a | ~a;
> +}
> +
> +int64_t test_simplify_ior_scalar_case_7 (int64_t a)
> +{
> +  return a | ~a;
> +}
> +
> +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */
> +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */
> +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } */
> +/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */
> +/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */
> --
> 2.34.1
>


* RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
  2023-04-18  8:00     ` Richard Biener
@ 2023-04-18  8:20       ` Li, Pan2
  2023-04-18  9:11         ` Li, Pan2
  0 siblings, 1 reply; 16+ messages in thread
From: Li, Pan2 @ 2023-04-18  8:20 UTC
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, kito.cheng, rguenther, Wang, Yanzhang,
	richard.sandiford

I will look into the IOR simplification code for this optimization. Most likely I will try to implement the intrinsics with generic vector operations.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Tuesday, April 18, 2023 4:01 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Wang, Yanzhang <yanzhang.wang@intel.com>; richard.sandiford@arm.com
Subject: Re: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

On Tue, Apr 18, 2023 at 9:59 AM Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Apr 18, 2023 at 3:31 AM Li, Pan2 via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Passed the X86 bootstrap and regression tests.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Li, Pan2 <pan2.li@intel.com>
> > Sent: Monday, April 17, 2023 10:50 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; 
> > Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang 
> > <yanzhang.wang@intel.com>; richard.sandiford@arm.com
> > Subject: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch adds the optimization for the vector IOR(V1, NOT V1). Assume we have the sample code below.
> >
> > vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
> > {
> >   return __riscv_vmorn_mm_b32(v1, v1, vl);
> > }

Btw, this also shows that you might want to consider inlining the intrinsics via target-specific folding, or implementing them with generic vector operations in the headers.
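As a rough illustration of the second suggestion: an "OR-NOT" written with generic vector operators exposes a plain IOR/NOT to the middle end, which could then fold ornot (x, x) without a target-specific rule. The sketch below assumes GCC's `vector_size` extension on a fixed 16-byte type; the type and function names are hypothetical. RVV types such as vbool32_t are sizeless, so riscv_vector.h could not be written exactly this way.

```c
#include <stdint.h>

/* Hypothetical fixed-length vector type (GCC vector_size extension).  */
typedef uint8_t v16u8 __attribute__ ((vector_size (16)));

/* "OR-NOT" expressed with generic operators instead of a builtin, so the
   compiler sees an ordinary element-wise IOR of A with NOT B.  */
static inline v16u8
ornot_generic (v16u8 a, v16u8 b)
{
  return a | ~b;
}

/* The "AND-NOT" counterpart, matching the VMANDN case.  */
static inline v16u8
andnot_generic (v16u8 a, v16u8 b)
{
  return a & ~b;
}
```

With both operands equal, every element of `ornot_generic (x, x)` is 0xff and every element of `andnot_generic (x, x)` is 0; whether a given compiler folds these calls to constants is something to confirm by inspecting its output.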



* [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization
  2023-04-17 14:50 [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion pan2.li
  2023-04-18  1:30 ` Li, Pan2
@ 2023-04-18  9:08 ` pan2.li
  2023-04-19  6:40   ` Richard Biener
  2023-04-19  9:18 ` [PATCH v3] RISC-V: Align IOR optimization MODE_CLASS condition to AND pan2.li
  2 siblings, 1 reply; 16+ messages in thread
From: pan2.li @ 2023-04-18  9:08 UTC
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, rguenther, pan2.li, richard.sandiford,
	yanzhang.wang

From: Pan Li <pan2.li@intel.com>

This patch adds the optimization for the vector IOR(V1, NOT V1). Assume
we have the sample code below.

vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
{
  return __riscv_vmorn_mm_b32(v1, v1, vl);
}

Before this patch:
vsetvli  a5,zero,e8,mf4,ta,ma
vlm.v    v24,0(a1)
vsetvli  zero,a2,e8,mf4,ta,ma
vmorn.mm v24,v24,v24
vsetvli  a5,zero,e8,mf4,ta,ma
vsm.v    v24,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf4,ta,ma
vmset.m v24
vsetvli a5,zero,e8,mf4,ta,ma
vsm.v   v24,0(a0)
ret

Or, from the RTL perspective,
from:
(ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
to:
(const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])
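A lane-by-lane model of the mask instructions shows why the fold is safe. In the RVV 1.0 specification, `vmorn.mm vd, vs2, vs1` computes `vd[i] = vs2[i] || !vs1[i]` and `vmandn.mm` computes `vd[i] = vs2[i] && !vs1[i]` for the active elements, so with both sources equal every active element becomes 1 (what vmset.m produces) or 0 (what vmclr.m produces). The C sketch below models a mask as a `bool` array of `vl` elements; the helper names are hypothetical and tail elements are ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lane model of vmorn.mm: vd[i] = vs2[i] || !vs1[i].  */
static void
model_vmorn (bool *vd, const bool *vs2, const bool *vs1, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs2[i] || !vs1[i];
}

/* Hypothetical lane model of vmandn.mm: vd[i] = vs2[i] && !vs1[i].  */
static void
model_vmandn (bool *vd, const bool *vs2, const bool *vs1, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs2[i] && !vs1[i];
}

/* True if the first VL elements of V all equal VALUE, i.e. V looks like
   the result of vmset.m (VALUE == true) or vmclr.m (VALUE == false).  */
static bool
model_all_equal (const bool *v, bool value, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    if (v[i] != value)
      return false;
  return true;
}
```

For any input mask `v1`, `model_vmorn (r, v1, v1, vl)` leaves `r` all true and `model_vmandn (r, v1, v1, vl)` leaves it all false, which is the vmset.m / vmclr.m shortcut the tests scan for.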

A similar optimization for VMANDN is already enabled; apart from the
operator, VMORN should be handled the same way. This version aligns the
IOR (A, NOT A) condition with the existing AND (A, NOT A) one: instead of
being limited to SCALAR_INT modes, the fold now applies to every mode
except MODE_CC (including VECTOR_BOOL) and returns CONSTM1_RTX (mode).
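The diff below changes the guard to `GET_MODE_CLASS (mode) != MODE_CC` and returns `CONSTM1_RTX (mode)`, mirroring the AND (A, NOT A) rule. The toy C model that follows sketches the shape of both folds under stated assumptions: the types, enum values, and function names are hypothetical and are not GCC's rtx API, and the equality check only compares registers rather than doing a full `rtx_equal_p`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of an RTL expression; this is not GCC's rtx.  */
enum toy_code { TOY_REG, TOY_NOT, TOY_AND, TOY_IOR };
enum toy_mode_class
{
  TOY_MODE_INT,
  TOY_MODE_VECTOR_INT,
  TOY_MODE_VECTOR_BOOL,
  TOY_MODE_CC
};

struct toy_expr
{
  enum toy_code code;
  int regno;                    /* Used by TOY_REG only.  */
  bool side_effects;
  const struct toy_expr *op0;   /* Operand of TOY_NOT.  */
};

/* Simplified stand-in for rtx_equal_p: only compares registers.  */
static bool
toy_equal_p (const struct toy_expr *a, const struct toy_expr *b)
{
  return a->code == TOY_REG && b->code == TOY_REG && a->regno == b->regno;
}

/* True if one operand is (not X) and the other is X, in either order.  */
static bool
toy_x_with_not_x_p (const struct toy_expr *op0, const struct toy_expr *op1)
{
  return ((op0->code == TOY_NOT && toy_equal_p (op0->op0, op1))
          || (op1->code == TOY_NOT && toy_equal_p (op1->op0, op0)));
}

/* Try to fold CODE (OP0, OP1), where CODE is TOY_AND or TOY_IOR, in a mode
   of class MCLASS.  On success store the folded element value (0 for
   all-zeros, -1 for all-ones) in *RESULT and return true.  */
static bool
toy_fold_x_op_not_x (enum toy_code code, const struct toy_expr *op0,
                     const struct toy_expr *op1, enum toy_mode_class mclass,
                     long *result)
{
  if ((code != TOY_AND && code != TOY_IOR)
      || !toy_x_with_not_x_p (op0, op1)
      || op0->side_effects          /* Mirrors ! side_effects_p (op0).  */
      || mclass == TOY_MODE_CC)     /* Same guard for AND and IOR.  */
    return false;

  /* A & ~A -> 0 (CONST0_RTX); A | ~A -> all ones (CONSTM1_RTX).  */
  *result = (code == TOY_AND) ? 0 : -1;
  return true;
}
```

The point of sharing one guard is that the only modes excluded are condition-code modes, whose values are not ordinary bit patterns; scalar integer, vector integer, and vector boolean modes all take the fold.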

gcc/ChangeLog:

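	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
	Fold IOR (A, NOT A) to CONSTM1_RTX (mode) for any mode except
	MODE_CC, matching AND (A, NOT A).

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: Drop the vmorn.mm
	count and expect 14 vmset.m instead of 7.
	* gcc.target/riscv/simplify_ior_optimization.c: New test.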

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/simplify-rtx.cc                           |  4 +-
 .../riscv/rvv/base/mask_insn_shortcut.c       |  3 +-
 .../riscv/simplify_ior_optimization.c         | 50 +++++++++++++++++++
 3 files changed, 53 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index ee75079917f..3bc9b2f55ea 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3332,8 +3332,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
       if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
 	   || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
 	  && ! side_effects_p (op0)
-	  && SCALAR_INT_MODE_P (mode))
-	return constm1_rtx;
+	  && GET_MODE_CLASS (mode) != MODE_CC)
+	return CONSTM1_RTX (mode);
 
       /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
       if (CONST_INT_P (op1)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
index 83cc4a1b5a5..57d0241675a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -233,9 +233,8 @@ vbool64_t test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) {
 /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
-/* { dg-final { scan-assembler-times {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
 /* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */
-/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */
+/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */
 /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
 /* { dg-final { scan-assembler-times {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
new file mode 100644
index 00000000000..ec3bd0baf03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
+
+#include <stdint.h>
+
+uint8_t test_simplify_ior_scalar_case_0 (uint8_t a)
+{
+  return a | ~a;
+}
+
+uint16_t test_simplify_ior_scalar_case_1 (uint16_t a)
+{
+  return a | ~a;
+}
+
+uint32_t test_simplify_ior_scalar_case_2 (uint32_t a)
+{
+  return a | ~a;
+}
+
+uint64_t test_simplify_ior_scalar_case_3 (uint64_t a)
+{
+  return a | ~a;
+}
+
+int8_t test_simplify_ior_scalar_case_4 (int8_t a)
+{
+  return a | ~a;
+}
+
+int16_t test_simplify_ior_scalar_case_5 (int16_t a)
+{
+  return a | ~a;
+}
+
+int32_t test_simplify_ior_scalar_case_6 (int32_t a)
+{
+  return a | ~a;
+}
+
+int64_t test_simplify_ior_scalar_case_7 (int64_t a)
+{
+  return a | ~a;
+}
+
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } */
+/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */
+/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */
-- 
2.34.1
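The scan counts in simplify_ior_optimization.c can be cross-checked by hand. Under the RISC-V LP64 calling convention, integer return values narrower than 64 bits are widened according to the sign of their type up to 32 bits and then sign-extended to 64 bits, so a uint32_t 0xffffffff comes back as -1, while uint8_t and uint16_t results are zero-extended to 255 and 65535. A value that fits a signed 12-bit immediate needs a single `li`; 65535 does not, and splitting it into a 4096-aligned upper part plus a signed 12-bit lower part gives 65536 and -1, which matches the test's `li a0,65536` expectation. The sketch below is a simplified model of that reasoning with hypothetical helper names, not the RISC-V backend's constant-synthesis code, which handles many more cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* 64-bit register value of an all-ones (A | ~A) result of BITS width,
   following the LP64 rule: widen by signedness up to 32 bits, then
   sign-extend to 64 bits.  */
static int64_t
lp64_return_all_ones (unsigned bits, bool is_signed)
{
  if (is_signed || bits >= 32)
    return -1;                                     /* Sign-extended all ones.  */
  return (int64_t) ((UINT64_C (1) << bits) - 1);   /* Zero-extended.  */
}

/* True if V fits the signed 12-bit immediate of a single li/addi.  */
static bool
fits_simm12 (int64_t v)
{
  return v >= -2048 && v <= 2047;
}

/* Split V into a 4096-aligned upper part and a signed 12-bit lower part,
   rounding so that the lower part stays in [-2048, 2047].  */
static void
split_hi_lo (int64_t v, int64_t *hi, int64_t *lo)
{
  *hi = (v + 0x800) & ~(int64_t) 0xfff;
  *lo = v - *hi;
}

/* For the eight test functions (u8..u64, s8..s64), count which constant
   the first li loads: -1, 255, or the upper part 65536.  */
static void
tally_expected_li (int *minus_one, int *n255, int *n65536)
{
  static const unsigned widths[] = { 8, 16, 32, 64 };

  *minus_one = *n255 = *n65536 = 0;
  for (int s = 0; s < 2; s++)
    for (int w = 0; w < 4; w++)
      {
        int64_t first = lp64_return_all_ones (widths[w], s == 1);
        if (!fits_simm12 (first))
          {
            int64_t hi, lo;
            split_hi_lo (first, &hi, &lo);
            first = hi;          /* li of the upper part, then add LO.  */
          }
        if (first == -1)
          (*minus_one)++;
        else if (first == 255)
          (*n255)++;
        else if (first == 65536)
          (*n65536)++;
      }
}
```

`tally_expected_li` yields 6, 1 and 1, the same counts as the three `scan-assembler-times` directives in the new test.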



* RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
  2023-04-18  8:20       ` Li, Pan2
@ 2023-04-18  9:11         ` Li, Pan2
  2023-04-19  6:03           ` Li, Pan2
  0 siblings, 1 reply; 16+ messages in thread
From: Li, Pan2 @ 2023-04-18  9:11 UTC
  To: Li, Pan2, Richard Biener
  Cc: gcc-patches, juzhe.zhong, kito.cheng, rguenther, Wang, Yanzhang,
	richard.sandiford

PATCH v2 has been posted here: https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615937.html.

I am running the bootstrap/regression tests and will keep you posted.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Li, Pan2 via Gcc-patches
Sent: Tuesday, April 18, 2023 4:20 PM
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Wang, Yanzhang <yanzhang.wang@intel.com>; richard.sandiford@arm.com
Subject: RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

I will look into the IOR simplification code for this optimization. Most likely I will try to implement the intrinsics with generic vector operations.

Pan



* RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
  2023-04-18  9:11         ` Li, Pan2
@ 2023-04-19  6:03           ` Li, Pan2
  0 siblings, 0 replies; 16+ messages in thread
From: Li, Pan2 @ 2023-04-19  6:03 UTC
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, kito.cheng, rguenther, Wang, Yanzhang,
	richard.sandiford

Passed the X86 bootstrap and regression tests.

Pan

-----Original Message-----
From: Li, Pan2 <pan2.li@intel.com> 
Sent: Tuesday, April 18, 2023 5:12 PM
To: Li, Pan2 <pan2.li@intel.com>; Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Wang, Yanzhang <yanzhang.wang@intel.com>; richard.sandiford@arm.com
Subject: RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

PATCH v2 has been posted here: https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615937.html.

I am running the bootstrap/regression tests and will keep you posted.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Li, Pan2 via Gcc-patches
Sent: Tuesday, April 18, 2023 4:20 PM
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Wang, Yanzhang <yanzhang.wang@intel.com>; richard.sandiford@arm.com
Subject: RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

I will look into the IOR simplification code for this optimization. Most likely I will try to implement the intrinsics with generic vector operations.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, April 18, 2023 4:01 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; Wang, Yanzhang <yanzhang.wang@intel.com>; richard.sandiford@arm.com
Subject: Re: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

On Tue, Apr 18, 2023 at 9:59 AM Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Apr 18, 2023 at 3:31 AM Li, Pan2 via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Passed the X86 bootstrap and regression tests.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Li, Pan2 <pan2.li@intel.com>
> > Sent: Monday, April 17, 2023 10:50 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; rguenther@suse.de; 
> > Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang 
> > <yanzhang.wang@intel.com>; richard.sandiford@arm.com
> > Subject: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
> >
> > From: Pan Li <pan2.li@intel.com>
> >
> > This patch adds the optimization for vector IOR(V1, NOT V1). Assume we have the sample code below.
> >
> > vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
> > {
> >   return __riscv_vmorn_mm_b32(v1, v1, vl);
> > }

Btw, this also shows you might want to consider inlining the intrinsics in target specific folding or implement them with generic vector operations in the headers.
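A minimal sketch of the "generic vector operations" suggestion, assuming a fixed-length GCC vector-extension type as a stand-in for the sizeless RVV mask types (`mask128_t`, `generic_vmorn` and `mask_all_set` are hypothetical names, not part of the RVV intrinsic headers). Because `vmorn` becomes an ordinary `|`/`~` expression, generic folding can in principle see that `generic_vmorn (v, v)` is all-ones without any target-specific code.

```c
#include <stdint.h>

/* Hypothetical fixed-length stand-in for a mask register: 16 bytes,
   i.e. 128 mask bits.  Real RVV mask types (vbool32_t, ...) are
   sizeless, so this typedef is only an illustration.  */
typedef uint8_t mask128_t __attribute__ ((vector_size (16)));

/* Hypothetical generic-vector replacement for vmorn.mm
   (vd = vs2 | ~vs1), written with plain C operators.  */
static inline mask128_t
generic_vmorn (mask128_t vs2, mask128_t vs1)
{
  return vs2 | ~vs1;
}

/* Return 1 if every byte (and hence every mask bit) of M is set.  */
static inline int
mask_all_set (mask128_t m)
{
  for (int i = 0; i < 16; i++)
    if (m[i] != 0xff)
      return 0;
  return 1;
}
```

For example, `generic_vmorn (v, v)` yields an all-set mask for any `v`, which is the same result the RTL simplification produces for `vmorn.mm v24,v24,v24`.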

> >
> > Before this patch:
> > vsetvli  a5,zero,e8,mf4,ta,ma
> > vlm.v    v24,0(a1)
> > vsetvli  zero,a2,e8,mf4,ta,ma
> > vmorn.mm v24,v24,v24
> > vsetvli  a5,zero,e8,mf4,ta,ma
> > vsm.v    v24,0(a0)
> > ret
> >
> > After this patch:
> > vsetvli zero,a2,e8,mf4,ta,ma
> > vmset.m v24
> > vsetvli a5,zero,e8,mf4,ta,ma
> > vsm.v   v24,0(a0)
> > ret
> >
> > Or, from the RTL perspective,
> > from:
> > (ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
> > to:
> > (const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])
> >
> > A similar optimization for VMANDN is already enabled. Apart from the operator, there should be no difference between VMORN and VMANDN for this kind of optimization. The patch allows the VECTOR_BOOL IOR(V1, NOT V1) simplification in addition to the existing SCALAR_INT mode.
> >
> > gcc/ChangeLog:
> >
> >         * machmode.h (VECTOR_BOOL_MODE_P):
> >         * simplify-rtx.cc (valid_mode_for_ior_simplification_p):
> >         (simplify_context::simplify_binary_operation_1):
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/rvv/base/mask_insn_shortcut.c:
> >         * gcc.target/riscv/simplify_ior_optimization.c: New test.
> >
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/machmode.h                                |  4 ++
> >  gcc/simplify-rtx.cc                           | 10 +++-
> >  .../riscv/rvv/base/mask_insn_shortcut.c       |  3 +-
> >  .../riscv/simplify_ior_optimization.c         | 50 +++++++++++++++++++
> >  4 files changed, 63 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> >
> > diff --git a/gcc/machmode.h b/gcc/machmode.h
> > index f1865c1ef42..771bae89cb7 100644
> > --- a/gcc/machmode.h
> > +++ b/gcc/machmode.h
> > @@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
> >     || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM       \
> >     || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
> >
> > +/* Nonzero if MODE is a vector bool mode.  */
> > +#define VECTOR_BOOL_MODE_P(MODE)                       \
> > +  (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)          \
> > +
> >  /* Nonzero if MODE is a scalar integral mode.  */
> >  #define SCALAR_INT_MODE_P(MODE)                        \
> >    (GET_MODE_CLASS (MODE) == MODE_INT           \
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > index ee75079917f..eff27b835bf 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -57,6 +57,12 @@ neg_poly_int_rtx (machine_mode mode, const_rtx i)
> >    return immed_wide_int_const (-wi::to_poly_wide (i, mode), mode);
> >  }
> >
> > +static bool
> > +valid_mode_for_ior_simplification_p (machine_mode mode)
> > +{
> > +  return SCALAR_INT_MODE_P (mode) || VECTOR_BOOL_MODE_P (mode);
> > +}
> > +
> >  /* Test whether expression, X, is an immediate constant that represents
> >     the most significant bit of machine mode MODE.  */
> >
> > @@ -3332,8 +3338,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> >        if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
> >            || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
> >           && ! side_effects_p (op0)
> > -         && SCALAR_INT_MODE_P (mode))
> > -       return constm1_rtx;
> > +         && valid_mode_for_ior_simplification_p (mode))
>
> for simple predicates like this please do not split them out, it makes 
> understanding the code more difficult.
>
> > +       return CONST1_RTX (mode);
>
> shouldn't this be CONSTM1_RTX (mode)?  Why is this only valid for 
> VECTOR_BOOL and not also for VECTOR_INT?  You're citing AND and that does
>
>       /* A & (~A) -> 0 */
>       if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
>            || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
>           && ! side_effects_p (op0)
>           && GET_MODE_CLASS (mode) != MODE_CC)
>         return CONST0_RTX (mode);
>
> so why differ and not use the same GET_MODE_CLASS (mode) != MODE_CC condition?
>
> Richard.
>
> >
> >        /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
> >        if (CONST_INT_P (op1)
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > index 83cc4a1b5a5..57d0241675a 100644
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > @@ -233,9 +233,8 @@ vbool64_t test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) {
> >  /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> >  /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> >  /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> > -/* { dg-final { scan-assembler-times {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
> >  /* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> >  /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */
> > -/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */
> > +/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */
> >  /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
> >  /* { dg-final { scan-assembler-times {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > new file mode 100644
> > index 00000000000..ec3bd0baf03
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > @@ -0,0 +1,50 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
> > +
> > +#include <stdint.h>
> > +
> > +uint8_t test_simplify_ior_scalar_case_0 (uint8_t a)
> > +{
> > +  return a | ~a;
> > +}
> > +
> > +uint16_t test_simplify_ior_scalar_case_1 (uint16_t a)
> > +{
> > +  return a | ~a;
> > +}
> > +
> > +uint32_t test_simplify_ior_scalar_case_2 (uint32_t a)
> > +{
> > +  return a | ~a;
> > +}
> > +
> > +uint64_t test_simplify_ior_scalar_case_3 (uint64_t a)
> > +{
> > +  return a | ~a;
> > +}
> > +
> > +int8_t test_simplify_ior_scalar_case_4 (int8_t a)
> > +{
> > +  return a | ~a;
> > +}
> > +
> > +int16_t test_simplify_ior_scalar_case_5 (int16_t a)
> > +{
> > +  return a | ~a;
> > +}
> > +
> > +int32_t test_simplify_ior_scalar_case_6 (int32_t a)
> > +{
> > +  return a | ~a;
> > +}
> > +
> > +int64_t test_simplify_ior_scalar_case_7 (int64_t a)
> > +{
> > +  return a | ~a;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } */
> > +/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */
> > +/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization
  2023-04-18  9:08 ` [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization pan2.li
@ 2023-04-19  6:40   ` Richard Biener
  2023-04-19  6:46     ` Li, Pan2
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2023-04-19  6:40 UTC (permalink / raw)
  To: pan2.li
  Cc: gcc-patches, juzhe.zhong, kito.cheng, richard.sandiford, yanzhang.wang

On Tue, 18 Apr 2023, pan2.li@intel.com wrote:

> From: Pan Li <pan2.li@intel.com>
> 
> This patch adds the optimization for vector IOR(V1, NOT V1). Assume
> we have the sample code below.
> 
> vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
> {
>   return __riscv_vmorn_mm_b32(v1, v1, vl);
> }
> 
> Before this patch:
> vsetvli  a5,zero,e8,mf4,ta,ma
> vlm.v    v24,0(a1)
> vsetvli  zero,a2,e8,mf4,ta,ma
> vmorn.mm v24,v24,v24
> vsetvli  a5,zero,e8,mf4,ta,ma
> vsm.v    v24,0(a0)
> ret
> 
> After this patch:
> vsetvli zero,a2,e8,mf4,ta,ma
> vmset.m v24
> vsetvli a5,zero,e8,mf4,ta,ma
> vsm.v   v24,0(a0)
> ret
> 
> Or, from the RTL perspective,
> from:
> (ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
> to:
> (const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])
> 
> A similar optimization for VMANDN is already enabled. Apart from the
> operator, there should be no difference between VMORN and VMANDN for
> this kind of optimization. The patch extends the IOR(V1, NOT V1)
> simplification from SCALAR_INT modes to every mode except MODE_CC,
> which covers VECTOR_BOOL.
> 
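The identity behind this rule holds independently for each bit, so nothing about it is specific to SCALAR_INT modes: a mask vector simply has one-bit elements, and the all-ones result is the all-true mask shown in the RTL above. The sketch below checks this exhaustively in plain C; the function names and the packed-mask layout are illustrative assumptions, not GCC internals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exhaustively check the two identities behind the RTL rules for every
   8-bit value: A | ~A is all-ones (the IOR rule, CONSTM1) and A & ~A is
   zero (the existing AND rule, CONST0).  Integer promotion makes ~a an
   int, so results are truncated back to 8 bits before comparing.  */
bool
check_scalar_identities (void)
{
  for (unsigned v = 0; v <= 0xff; v++)
    {
      uint8_t a = (uint8_t) v;
      if ((uint8_t) (a | ~a) != 0xff)
        return false;
      if ((uint8_t) (a & ~a) != 0)
        return false;
    }
  return true;
}

/* Model a mask of N one-bit elements packed into the low N bits of a
   64-bit word (a hypothetical layout, for illustration only).  ORing
   the mask with its complement sets every live element to 1, i.e. the
   all-true mask.  */
uint64_t
mask_ior_not_self (uint64_t m, unsigned n)
{
  uint64_t live = n >= 64 ? UINT64_MAX : ((UINT64_C (1) << n) - 1);
  return (m | ~m) & live;
}
```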
> gcc/ChangeLog:
> 
> 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):

This needs some text

> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c:

Likewise.

OK with that fixed.

> 	* gcc.target/riscv/simplify_ior_optimization.c: New test.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/simplify-rtx.cc                           |  4 +-
>  .../riscv/rvv/base/mask_insn_shortcut.c       |  3 +-
>  .../riscv/simplify_ior_optimization.c         | 50 +++++++++++++++++++
>  3 files changed, 53 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> 
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index ee75079917f..3bc9b2f55ea 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -3332,8 +3332,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>        if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
>  	   || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
>  	  && ! side_effects_p (op0)
> -	  && SCALAR_INT_MODE_P (mode))
> -	return constm1_rtx;
> +	  && GET_MODE_CLASS (mode) != MODE_CC)
> +	return CONSTM1_RTX (mode);
>  
>        /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
>        if (CONST_INT_P (op1)
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> index 83cc4a1b5a5..57d0241675a 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> @@ -233,9 +233,8 @@ vbool64_t test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) {
>  /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
>  /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
>  /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> -/* { dg-final { scan-assembler-times {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
>  /* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
>  /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */
> -/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */
> +/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */
>  /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
>  /* { dg-final { scan-assembler-times {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> new file mode 100644
> index 00000000000..ec3bd0baf03
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> @@ -0,0 +1,50 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
> +
> +#include <stdint.h>
> +
> +uint8_t test_simplify_ior_scalar_case_0 (uint8_t a)
> +{
> +  return a | ~a;
> +}
> +
> +uint16_t test_simplify_ior_scalar_case_1 (uint16_t a)
> +{
> +  return a | ~a;
> +}
> +
> +uint32_t test_simplify_ior_scalar_case_2 (uint32_t a)
> +{
> +  return a | ~a;
> +}
> +
> +uint64_t test_simplify_ior_scalar_case_3 (uint64_t a)
> +{
> +  return a | ~a;
> +}
> +
> +int8_t test_simplify_ior_scalar_case_4 (int8_t a)
> +{
> +  return a | ~a;
> +}
> +
> +int16_t test_simplify_ior_scalar_case_5 (int16_t a)
> +{
> +  return a | ~a;
> +}
> +
> +int32_t test_simplify_ior_scalar_case_6 (int32_t a)
> +{
> +  return a | ~a;
> +}
> +
> +int64_t test_simplify_ior_scalar_case_7 (int64_t a)
> +{
> +  return a | ~a;
> +}
> +
> +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */
> +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */
> +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } */
> +/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */
> +/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
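Why the new test expects six `li ...,-1`, one `li ...,255` and one `li ...,65536`: C's integer promotions make `a | ~a` equal to -1 for every type, and converting back to the return type gives 255 for `uint8_t` and 65535 for `uint16_t`. The four signed types, `uint64_t`, and `uint32_t` (whose return value the RV64 psABI keeps sign-extended in the register) all end up as -1 in a 64-bit register. 65535 does not fit a 12-bit immediate, so it is presumably materialized as `li a0,65536` followed by an add of -1; that detail is an assumption about GCC's output, not something stated in the thread. A host-side check of the C-level values, using hypothetical copies (`ior_u8` and friends) of four of the test functions:

```c
#include <stdint.h>

/* Host-side copies of four of the testcase functions.  */
uint8_t  ior_u8  (uint8_t a)  { return a | ~a; }   /* -1 -> 255 */
uint16_t ior_u16 (uint16_t a) { return a | ~a; }   /* -1 -> 65535 = 65536 - 1 */
uint32_t ior_u32 (uint32_t a) { return a | ~a; }   /* 0xffffffff */
int16_t  ior_s16 (int16_t a)  { return a | ~a; }   /* -1 */

/* Model of how RV64 holds a 32-bit return value: sign-extended to
   64 bits, even for unsigned types.  The int32_t conversion is
   implementation-defined in ISO C; GCC and Clang wrap modulo 2^32.  */
int64_t
rv64_reg_from_u32 (uint32_t v)
{
  return (int64_t) (int32_t) v;
}
```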

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization
  2023-04-19  6:40   ` Richard Biener
@ 2023-04-19  6:46     ` Li, Pan2
  2023-04-19  8:47       ` Li, Pan2
  0 siblings, 1 reply; 16+ messages in thread
From: Li, Pan2 @ 2023-04-19  6:46 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, kito.cheng, richard.sandiford, Wang, Yanzhang

Oh, I see. The message needs to be regenerated. Thank you for pointing that out; I will update it ASAP.

Pan


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization
  2023-04-19  6:46     ` Li, Pan2
@ 2023-04-19  8:47       ` Li, Pan2
  2023-04-19  8:51         ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Li, Pan2 @ 2023-04-19  8:47 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, kito.cheng, richard.sandiford, Wang, Yanzhang

Hi Richard,

Do you have any idea about this? I used git gcc-commit-mklog, and it generates the output below. There is no text after the colon. I am not sure whether I need to add something myself.

gcc/ChangeLog:

	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):     <=== no text here.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c:                                <=== no text here.
	* gcc.target/riscv/simplify_ior_optimization.c: New test.

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# Your branch is up to date with 'origin/master'.
#
# Changes to be committed:
#	modified:   gcc/simplify-rtx.cc
#	modified:   gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
#	new file:   gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c

Pan


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization
  2023-04-19  8:47       ` Li, Pan2
@ 2023-04-19  8:51         ` Richard Biener
  2023-04-19  9:20           ` Li, Pan2
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2023-04-19  8:51 UTC (permalink / raw)
  To: Li, Pan2
  Cc: gcc-patches, juzhe.zhong, kito.cheng, richard.sandiford, Wang, Yanzhang

On Wed, 19 Apr 2023, Li, Pan2 wrote:

> Hi Richard,
> 
> Do you have any idea about this? I used git gcc-commit-mklog, and it
> generates the output below. There is no text after the colon. I am not
> sure whether I need to add something myself.

Well, you need to add a description of your change!

> gcc/ChangeLog:
> 
> 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):     <=== no text here.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c:                                <=== no text here.
> 	* gcc.target/riscv/simplify_ior_optimization.c: New test.
> 
> # Please enter the commit message for your changes. Lines starting
> # with '#' will be ignored, and an empty message aborts the commit.
> #
> # On branch master
> # Your branch is up to date with 'origin/master'.
> #
> # Changes to be committed:
> #	modified:   gcc/simplify-rtx.cc
> #	modified:   gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> #	new file:   gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> 
> Pan
> 
> -----Original Message-----
> From: Li, Pan2 
> Sent: Wednesday, April 19, 2023 2:47 PM
> To: Richard Biener <rguenther@suse.de>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; richard.sandiford@arm.com; Wang, Yanzhang <yanzhang.wang@intel.com>
> Subject: RE: [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization
> 
> Oh, I see. The message needs to be regenerated. Thank you for pointing that out; I will update it ASAP.
> 
> Pan
> 
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, April 19, 2023 2:40 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; richard.sandiford@arm.com; Wang, Yanzhang <yanzhang.wang@intel.com>
> Subject: Re: [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization
> 
> On Tue, 18 Apr 2023, pan2.li@intel.com wrote:
> 
> > From: Pan Li <pan2.li@intel.com>
> > 
> > This patch adds the optimization for vector IOR(V1, NOT V1). Assume
> > we have the sample code below.
> > 
> > vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
> > {
> >   return __riscv_vmorn_mm_b32(v1, v1, vl);
> > }
> > 
> > Before this patch:
> > vsetvli  a5,zero,e8,mf4,ta,ma
> > vlm.v    v24,0(a1)
> > vsetvli  zero,a2,e8,mf4,ta,ma
> > vmorn.mm v24,v24,v24
> > vsetvli  a5,zero,e8,mf4,ta,ma
> > vsm.v    v24,0(a0)
> > ret
> > 
> > After this patch:
> > vsetvli zero,a2,e8,mf4,ta,ma
> > vmset.m v24
> > vsetvli a5,zero,e8,mf4,ta,ma
> > vsm.v   v24,0(a0)
> > ret
> > 
> > Or, from the RTL perspective,
> > from:
> > (ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
> > to:
> > (const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])
> > 
> > A similar optimization for VMANDN is already enabled. Apart from the
> > operator, there should be no difference between VMORN and VMANDN for
> > this kind of optimization. The patch extends the IOR(V1, NOT V1)
> > simplification from SCALAR_INT modes to every mode except MODE_CC,
> > which covers VECTOR_BOOL.
> > 
> > gcc/ChangeLog:
> > 
> > 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
> 
> This needs some text
> 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c:
> 
> Likewise.
> 
> OK with that fixed.
> 
> > 	* gcc.target/riscv/simplify_ior_optimization.c: New test.
> > 
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/simplify-rtx.cc                           |  4 +-
> >  .../riscv/rvv/base/mask_insn_shortcut.c       |  3 +-
> >  .../riscv/simplify_ior_optimization.c         | 50 +++++++++++++++++++
> >  3 files changed, 53 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > 
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > index ee75079917f..3bc9b2f55ea 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -3332,8 +3332,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> >        if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
> >  	   || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
> >  	  && ! side_effects_p (op0)
> > -	  && SCALAR_INT_MODE_P (mode))
> > -	return constm1_rtx;
> > +	  && GET_MODE_CLASS (mode) != MODE_CC)
> > +	return CONSTM1_RTX (mode);
> >  
> >        /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
> >        if (CONST_INT_P (op1)
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > index 83cc4a1b5a5..57d0241675a 100644
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > @@ -233,9 +233,8 @@ vbool64_t test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) {
> >  /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> >  /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> >  /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> > -/* { dg-final { scan-assembler-times {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
> >  /* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> >  /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */
> > -/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */
> > +/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */
> >  /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
> >  /* { dg-final { scan-assembler-times {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > new file mode 100644
> > index 00000000000..ec3bd0baf03
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > @@ -0,0 +1,50 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
> > +
> > +#include <stdint.h>
> > +
> > +uint8_t test_simplify_ior_scalar_case_0 (uint8_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +uint16_t test_simplify_ior_scalar_case_1 (uint16_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +uint32_t test_simplify_ior_scalar_case_2 (uint32_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +uint64_t test_simplify_ior_scalar_case_3 (uint64_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int8_t test_simplify_ior_scalar_case_4 (int8_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int16_t test_simplify_ior_scalar_case_5 (int16_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int32_t test_simplify_ior_scalar_case_6 (int32_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int64_t test_simplify_ior_scalar_case_7 (int64_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } */
> > +/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */
> > +/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */
> > 
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


* [PATCH v3] RISC-V: Align IOR optimization MODE_CLASS condition to AND.
  2023-04-17 14:50 [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion pan2.li
  2023-04-18  1:30 ` Li, Pan2
  2023-04-18  9:08 ` [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization pan2.li
@ 2023-04-19  9:18 ` pan2.li
  2023-04-19 15:19   ` Kito Cheng
  2 siblings, 1 reply; 16+ messages in thread
From: pan2.li @ 2023-04-19  9:18 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, kito.cheng, rguenther, pan2.li, richard.sandiford,
	yanzhang.wang

From: Pan Li <pan2.li@intel.com>

This patch aligns the MODE_CLASS condition of the IOR simplification with
that of AND, so that mode classes other than SCALAR_INT can also perform the
optimization A | (~A) -> -1, just as AND already does. For example, consider
the sample code below.

vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
{
  return __riscv_vmorn_mm_b32(v1, v1, vl);
}

Before this patch:
vsetvli  a5,zero,e8,mf4,ta,ma
vlm.v    v24,0(a1)
vsetvli  zero,a2,e8,mf4,ta,ma
vmorn.mm v24,v24,v24
vsetvli  a5,zero,e8,mf4,ta,ma
vsm.v    v24,0(a0)
ret

After this patch:
vsetvli zero,a2,e8,mf4,ta,ma
vmset.m v24
vsetvli a5,zero,e8,mf4,ta,ma
vsm.v   v24,0(a0)
ret

Or, from the RTL perspective,
from:
(ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
to:
(const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])

A similar optimization for VMANDN is already enabled. For this kind of
simplification, VMORN and VMANDN should differ only in the operator, so
the patch aligns the MODE_CLASS condition of the IOR simplification with
that of the AND operator.

gcc/ChangeLog:

	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
	  Align IOR (A | (~A) -> -1) optimization MODE_CLASS condition to AND.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: Update check
	  condition.
	* gcc.target/riscv/simplify_ior_optimization.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/simplify-rtx.cc                           |  4 +-
 .../riscv/rvv/base/mask_insn_shortcut.c       |  3 +-
 .../riscv/simplify_ior_optimization.c         | 50 +++++++++++++++++++
 3 files changed, 53 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index c57ff3320ee..d4aeebc7a5f 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -3370,8 +3370,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
       if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
 	   || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
 	  && ! side_effects_p (op0)
-	  && SCALAR_INT_MODE_P (mode))
-	return constm1_rtx;
+	  && GET_MODE_CLASS (mode) != MODE_CC)
+	return CONSTM1_RTX (mode);
 
       /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
       if (CONST_INT_P (op1)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
index 83cc4a1b5a5..57d0241675a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -233,9 +233,8 @@ vbool64_t test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) {
 /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
-/* { dg-final { scan-assembler-times {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
 /* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
 /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */
-/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */
+/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */
 /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
 /* { dg-final { scan-assembler-times {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
new file mode 100644
index 00000000000..ec3bd0baf03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
+
+#include <stdint.h>
+
+uint8_t test_simplify_ior_scalar_case_0 (uint8_t a)
+{
+  return a | ~a;
+}
+
+uint16_t test_simplify_ior_scalar_case_1 (uint16_t a)
+{
+  return a | ~a;
+}
+
+uint32_t test_simplify_ior_scalar_case_2 (uint32_t a)
+{
+  return a | ~a;
+}
+
+uint64_t test_simplify_ior_scalar_case_3 (uint64_t a)
+{
+  return a | ~a;
+}
+
+int8_t test_simplify_ior_scalar_case_4 (int8_t a)
+{
+  return a | ~a;
+}
+
+int16_t test_simplify_ior_scalar_case_5 (int16_t a)
+{
+  return a | ~a;
+}
+
+int32_t test_simplify_ior_scalar_case_6 (int32_t a)
+{
+  return a | ~a;
+}
+
+int64_t test_simplify_ior_scalar_case_7 (int64_t a)
+{
+  return a | ~a;
+}
+
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */
+/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } */
+/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */
+/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */
-- 
2.34.1



* RE: [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization
  2023-04-19  8:51         ` Richard Biener
@ 2023-04-19  9:20           ` Li, Pan2
  0 siblings, 0 replies; 16+ messages in thread
From: Li, Pan2 @ 2023-04-19  9:20 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, kito.cheng, richard.sandiford, Wang, Yanzhang

Thank you for the information. I have posted the updated v3 version here:

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616154.html

Pan

-----Original Message-----
From: Richard Biener <rguenther@suse.de> 
Sent: Wednesday, April 19, 2023 4:52 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@sifive.com; richard.sandiford@arm.com; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: RE: [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization

On Wed, 19 Apr 2023, Li, Pan2 wrote:

> Hi Richard,
> 
> Do you have any idea about this? I used git gcc-commit-mklog, and it
> generates something like the output below. There is no text after the
> colon. I am not sure whether I need to add something myself.

Well, you need to add a description of your change!

> gcc/ChangeLog:
> 
> 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):    <=== no text here.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c:    <=== no text here.
> 	* gcc.target/riscv/simplify_ior_optimization.c: New test.
> 
> # Please enter the commit message for your changes. Lines starting
> # with '#' will be ignored, and an empty message aborts the commit.
> #
> # On branch master
> # Your branch is up to date with 'origin/master'.
> #
> # Changes to be committed:
> #	modified:   gcc/simplify-rtx.cc
> #	modified:   gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> #	new file:   gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> 
> Pan
> 
> -----Original Message-----
> From: Li, Pan2
> Sent: Wednesday, April 19, 2023 2:47 PM
> To: Richard Biener <rguenther@suse.de>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; 
> kito.cheng@sifive.com; richard.sandiford@arm.com; Wang, Yanzhang 
> <yanzhang.wang@intel.com>
> Subject: RE: [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) 
> optimization
> 
> Oh, I see. The message needs to be regenerated. Thank you for pointing that out; I will update it ASAP.
> 
> Pan
> 
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, April 19, 2023 2:40 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; 
> kito.cheng@sifive.com; richard.sandiford@arm.com; Wang, Yanzhang 
> <yanzhang.wang@intel.com>
> Subject: Re: [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) 
> optimization
> 
> On Tue, 18 Apr 2023, pan2.li@intel.com wrote:
> 
> > From: Pan Li <pan2.li@intel.com>
> > 
> > This patch adds the optimization for the vector IOR(V1, NOT V1).
> > Assume we have the sample code below.
> > 
> > vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl)
> > {
> >   return __riscv_vmorn_mm_b32(v1, v1, vl);
> > }
> > 
> > Before this patch:
> > vsetvli  a5,zero,e8,mf4,ta,ma
> > vlm.v    v24,0(a1)
> > vsetvli  zero,a2,e8,mf4,ta,ma
> > vmorn.mm v24,v24,v24
> > vsetvli  a5,zero,e8,mf4,ta,ma
> > vsm.v    v24,0(a0)
> > ret
> > 
> > After this patch:
> > vsetvli zero,a2,e8,mf4,ta,ma
> > vmset.m v24
> > vsetvli a5,zero,e8,mf4,ta,ma
> > vsm.v   v24,0(a0)
> > ret
> > 
> > Or in RTL's perspective,
> > from:
> > (ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 [ v1 ])))
> > to:
> > (const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])
> > 
> > A similar optimization for VMANDN is already enabled. There
> > should be no difference except the operator when comparing VMORN
> > and VMANDN for this kind of optimization. The patch allows the
> > VECTOR_BOOL IOR(V1, NOT V1) simplification besides the existing SCALAR_INT mode.
> > 
> > gcc/ChangeLog:
> > 
> > 	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
> 
> This needs some text
> 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* gcc.target/riscv/rvv/base/mask_insn_shortcut.c:
> 
> Likewise.
> 
> OK with that fixed.
> 
> > 	* gcc.target/riscv/simplify_ior_optimization.c: New test.
> > 
> > Signed-off-by: Pan Li <pan2.li@intel.com>
> > ---
> >  gcc/simplify-rtx.cc                           |  4 +-
> >  .../riscv/rvv/base/mask_insn_shortcut.c       |  3 +-
> >  .../riscv/simplify_ior_optimization.c         | 50 +++++++++++++++++++
> >  3 files changed, 53 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > 
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> > index ee75079917f..3bc9b2f55ea 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -3332,8 +3332,8 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
> >        if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
> >  	   || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
> >  	  && ! side_effects_p (op0)
> > -	  && SCALAR_INT_MODE_P (mode))
> > -	return constm1_rtx;
> > +	  && GET_MODE_CLASS (mode) != MODE_CC)
> > +	return CONSTM1_RTX (mode);
> >  
> >        /* (ior A C) is C if all bits of A that might be nonzero are on in C.  */
> >        if (CONST_INT_P (op1)
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > index 83cc4a1b5a5..57d0241675a 100644
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > @@ -233,9 +233,8 @@ vbool64_t test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) {
> >  /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> >  /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> >  /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> > -/* { dg-final { scan-assembler-times {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
> >  /* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> >  /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */
> > -/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */
> > +/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */
> >  /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
> >  /* { dg-final { scan-assembler-times {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */
> > diff --git a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > new file mode 100644
> > index 00000000000..ec3bd0baf03
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > @@ -0,0 +1,50 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
> > +
> > +#include <stdint.h>
> > +
> > +uint8_t test_simplify_ior_scalar_case_0 (uint8_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +uint16_t test_simplify_ior_scalar_case_1 (uint16_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +uint32_t test_simplify_ior_scalar_case_2 (uint32_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +uint64_t test_simplify_ior_scalar_case_3 (uint64_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int8_t test_simplify_ior_scalar_case_4 (int8_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int16_t test_simplify_ior_scalar_case_5 (int16_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int32_t test_simplify_ior_scalar_case_6 (int32_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int64_t test_simplify_ior_scalar_case_7 (int64_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } */
> > +/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */
> > +/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */
> > 
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 
> Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, 
> Boudien Moerman; HRB 36809 (AG Nuernberg)
> 

--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)


* Re: [PATCH v3] RISC-V: Align IOR optimization MODE_CLASS condition to AND.
  2023-04-19  9:18 ` [PATCH v3] RISC-V: Align IOR optimization MODE_CLASS condition to AND pan2.li
@ 2023-04-19 15:19   ` Kito Cheng
  0 siblings, 0 replies; 16+ messages in thread
From: Kito Cheng @ 2023-04-19 15:19 UTC (permalink / raw)
  To: pan2.li
  Cc: gcc-patches, juzhe.zhong, kito.cheng, rguenther,
	richard.sandiford, yanzhang.wang

Committed, thanks :)



end of thread, other threads:[~2023-04-19 15:19 UTC | newest]

Thread overview: 16+ messages
2023-04-17 14:50 [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion pan2.li
2023-04-18  1:30 ` Li, Pan2
2023-04-18  7:59   ` Richard Biener
2023-04-18  8:00     ` Richard Biener
2023-04-18  8:20       ` Li, Pan2
2023-04-18  9:11         ` Li, Pan2
2023-04-19  6:03           ` Li, Pan2
2023-04-18  8:08     ` Li, Pan2
2023-04-18  9:08 ` [PATCH v2] RISC-V: Allow Vector IOR(V1, NOT V1) optimization pan2.li
2023-04-19  6:40   ` Richard Biener
2023-04-19  6:46     ` Li, Pan2
2023-04-19  8:47       ` Li, Pan2
2023-04-19  8:51         ` Richard Biener
2023-04-19  9:20           ` Li, Pan2
2023-04-19  9:18 ` [PATCH v3] RISC-V: Align IOR optimization MODE_CLASS condition to AND pan2.li
2023-04-19 15:19   ` Kito Cheng
