[PATCH 1/7] arm: Auto-vectorization for MVE: vand

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 1/7] arm: Auto-vectorization for MVE: vand
@ 2020-11-25 13:54 Christophe Lyon
  2020-11-25 13:54 ` [PATCH 2/7] arm: Auto-vectorization for MVE: vorr Christophe Lyon
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Christophe Lyon @ 2020-11-25 13:54 UTC
  To: gcc-patches

This patch enables MVE vandq instructions for auto-vectorization.  MVE
vandq insns in mve.md are modified to use 'and' instead of an unspec
expression to support and<mode>3.  The and<mode>3 expander is added to
vec-common.md.
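
As a rough illustration (not part of the patch), the kind of loop this
enables can be modelled in plain C.  The function names below are made
up for the example; the loop shape follows the new mve-vand.c test.
Since AND is purely bitwise, the signed and unsigned variants give
identical bits, which is why a single insn can serve both intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: element-wise AND over four 32-bit lanes, the
   loop shape the vectorizer is expected to map onto a q-register
   vand (plus loads and stores).  */
void
and_s32x4 (int32_t *restrict dest, const int32_t *a, const int32_t *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] & b[i];
}

void
and_u32x4 (uint32_t *restrict dest, const uint32_t *a, const uint32_t *b)
{
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] & b[i];
}

/* Hypothetical helper: returns 1 when both loops yield the same bytes
   for the same input bit patterns.  */
int
and_is_sign_agnostic (void)
{
  const uint32_t ua[4] = { 0xffffffffu, 0x80000000u, 0x12345678u, 0u };
  const uint32_t ub[4] = { 0x0f0f0f0fu, 0xc0000001u, 0xff00ff00u, 0xffffffffu };
  int32_t sa[4], sb[4], sd[4];
  uint32_t ud[4];

  /* Reinterpret the same bits as signed lanes.  */
  memcpy (sa, ua, sizeof ua);
  memcpy (sb, ub, sizeof ub);
  and_s32x4 (sd, sa, sb);
  and_u32x4 (ud, ua, ub);
  assert (ud[2] == 0x12005600u);
  return memcmp (sd, ud, sizeof ud) == 0;
}
```

Comparing the raw bytes with memcmp, rather than the values, keeps the
check independent of how the lanes are interpreted.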

2020-11-12  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
	(VANDQ): Remove.
	* config/arm/mve.md (mve_vandq_u<mode>): New entry for vand
	instruction using expression and.
	(mve_vandq_s<mode>): New expander.
	* config/arm/neon.md (and<mode>3): Renamed into and<mode>3_neon.
	* config/arm/unspecs.md (VANDQ_S, VANDQ_U): Remove.
	* config/arm/vec-common.md (and<mode>3): New expander.

	gcc/testsuite/
	* gcc.target/arm/simd/mve-vand.c: New test.
---
 gcc/config/arm/iterators.md                  |  4 +---
 gcc/config/arm/mve.md                        | 20 ++++++++++++----
 gcc/config/arm/neon.md                       |  2 +-
 gcc/config/arm/unspecs.md                    |  2 --
 gcc/config/arm/vec-common.md                 | 15 ++++++++++++
 gcc/testsuite/gcc.target/arm/simd/mve-vand.c | 34 ++++++++++++++++++++++++++++
 6 files changed, 66 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vand.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 592af35..72039e4 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1232,8 +1232,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 		       (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")
 		       (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s")
 		       (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u")
-		       (VADDVQ_P_S "s")	(VADDVQ_P_U "u") (VANDQ_S "s")
-		       (VANDQ_U "u") (VBICQ_S "s") (VBICQ_U "u")
+		       (VADDVQ_P_S "s")	(VADDVQ_P_U "u") (VBICQ_S "s") (VBICQ_U "u")
 		       (VBRSRQ_N_S "s") (VBRSRQ_N_U "u") (VCADDQ_ROT270_S "s")
 		       (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
 		       (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
@@ -1501,7 +1500,6 @@ (define_int_iterator VABDQ [VABDQ_S VABDQ_U])
 (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U])
 (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U])
 (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S])
-(define_int_iterator VANDQ [VANDQ_U VANDQ_S])
 (define_int_iterator VBICQ [VBICQ_S VBICQ_U])
 (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S])
 (define_int_iterator VCADDQ_ROT270 [VCADDQ_ROT270_S VCADDQ_ROT270_U])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index ecbaaa9..975eb7d 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -894,17 +894,27 @@ (define_insn "mve_vaddvq_p_<supf><mode>"
 ;;
 ;; [vandq_u, vandq_s])
 ;;
-(define_insn "mve_vandq_<supf><mode>"
+;; signed and unsigned versions are the same: define the unsigned
+;; insn, and use an expander for the signed one as we still reference
+;; both names from arm_mve.h.
+(define_insn "mve_vandq_u<mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VANDQ))
+	(and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vand %q0, %q1, %q2"
+  "vand\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
+(define_expand "mve_vandq_s<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand")
+	(and:MVE_2 (match_operand:MVE_2 1 "s_register_operand")
+		       (match_operand:MVE_2 2 "s_register_operand")))
+  ]
+  "TARGET_HAVE_MVE"
+)
 
 ;;
 ;; [vbicq_s, vbicq_u])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2d76769..dc4707d 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -712,7 +712,7 @@ (define_insn "ior<mode>3"
 ;; corresponds to the canonical form the middle-end expects to use for
 ;; immediate bitwise-ANDs.
 
-(define_insn "and<mode>3"
+(define_insn "and<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w,w")
 	(and:VDQ (match_operand:VDQ 1 "s_register_operand" "w,0")
 		 (match_operand:VDQ 2 "neon_inv_logic_op2" "w,DL")))]
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index a3844e9..e8bf68e 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -601,7 +601,6 @@ (define_c_enum "unspec" [
   VADDQ_N_S
   VADDVAQ_S
   VADDVQ_P_S
-  VANDQ_S
   VBICQ_S
   VBRSRQ_N_S
   VCADDQ_ROT270_S
@@ -648,7 +647,6 @@ (define_c_enum "unspec" [
   VADDQ_N_U
   VADDVAQ_U
   VADDVQ_P_U
-  VANDQ_U
   VBICQ_U
   VBRSRQ_N_U
   VCADDQ_ROT270_U
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 250e503..3dd694c 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -172,3 +172,18 @@ (define_expand "vec_set<mode>"
 					       GEN_INT (elem), operands[0]));
   DONE;
 })
+
+(define_expand "and<mode>3"
+  [(set (match_operand:VNIM1 0 "s_register_operand" "")
+	(and:VNIM1 (match_operand:VNIM1 1 "s_register_operand" "")
+		   (match_operand:VNIM1 2 "neon_inv_logic_op2" "")))]
+  "TARGET_NEON
+   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))"
+)
+
+(define_expand "and<mode>3"
+  [(set (match_operand:VNINOTM1 0 "s_register_operand" "")
+	(and:VNINOTM1 (match_operand:VNINOTM1 1 "s_register_operand" "")
+		      (match_operand:VNINOTM1 2 "neon_inv_logic_op2" "")))]
+  "TARGET_NEON"
+)
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vand.c b/gcc/testsuite/gcc.target/arm/simd/mve-vand.c
new file mode 100644
index 0000000..2e30cd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vand.c
@@ -0,0 +1,34 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)				\
+  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
+    int i;								\
+    for (i=0; i<NB; i++) {						\
+      dest[i] = a[i] OP b[i];						\
+    }									\
+}
+
+/* 64-bit vectors.  */
+FUNC(s, int, 32, 2, &, vand)
+FUNC(u, uint, 32, 2, &, vand)
+FUNC(s, int, 16, 4, &, vand)
+FUNC(u, uint, 16, 4, &, vand)
+FUNC(s, int, 8, 8, &, vand)
+FUNC(u, uint, 8, 8, &, vand)
+
+/* 128-bit vectors.  */
+FUNC(s, int, 32, 4, &, vand)
+FUNC(u, uint, 32, 4, &, vand)
+FUNC(s, int, 16, 8, &, vand)
+FUNC(u, uint, 16, 8, &, vand)
+FUNC(s, int, 8, 16, &, vand)
+FUNC(u, uint, 8, 16, &, vand)
+
+/* MVE has only 128-bit vectors, so we can vectorize only half of the
+   functions above.  */
+/* { dg-final { scan-assembler-times {vand\tq[0-9]+, q[0-9]+, q[0-9]+} 6 } } */
-- 
2.7.4



* [PATCH 2/7] arm: Auto-vectorization for MVE: vorr
  2020-11-25 13:54 [PATCH 1/7] arm: Auto-vectorization for MVE: vand Christophe Lyon
@ 2020-11-25 13:54 ` Christophe Lyon
  2020-11-25 13:54 ` [PATCH 3/7] arm: Auto-vectorization for MVE: veor Christophe Lyon
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Christophe Lyon @ 2020-11-25 13:54 UTC
  To: gcc-patches

This patch enables MVE vorrq instructions for auto-vectorization.  MVE
vorrq insns in mve.md are modified to use 'ior' instead of an unspec
expression to support ior<mode>3.  The ior<mode>3 expander is added to
vec-common.md.

2020-11-13  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/iterators.md (supf): Remove VORRQ_S and VORRQ_U.
	(VORRQ): Remove.
	* config/arm/mve.md (mve_vorrq_s<mode>): New entry for vorr
	instruction using expression ior.
	(mve_vorrq_u<mode>): New expander.
	* config/arm/neon.md (ior<mode>3): Renamed into ior<mode>3_neon.
	* config/arm/unspecs.md (VORRQ_S, VORRQ_U): Remove.
	* config/arm/vec-common.md (ior<mode>3): New expander.

	gcc/testsuite/
	* gcc.target/arm/simd/mve-vorr.c: Add vorr tests.
---
 gcc/config/arm/iterators.md                  |  5 ++--
 gcc/config/arm/mve.md                        | 17 ++++++++++----
 gcc/config/arm/neon.md                       |  2 +-
 gcc/config/arm/unspecs.md                    |  2 --
 gcc/config/arm/vec-common.md                 | 15 ++++++++++++
 gcc/testsuite/gcc.target/arm/simd/mve-vorr.c | 34 ++++++++++++++++++++++++++++
 6 files changed, 64 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vorr.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 72039e4..5fcb7af 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1247,8 +1247,8 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 		       (VMULLBQ_INT_S "s") (VMULLBQ_INT_U "u") (VQADDQ_S "s")
 		       (VMULLTQ_INT_S "s") (VMULLTQ_INT_U "u") (VQADDQ_U "u")
 		       (VMULQ_N_S "s") (VMULQ_N_U "u") (VMULQ_S "s")
-		       (VMULQ_U "u") (VORNQ_S "s") (VORNQ_U "u") (VORRQ_S "s")
-		       (VORRQ_U "u") (VQADDQ_N_S "s") (VQADDQ_N_U "u")
+		       (VMULQ_U "u") (VORNQ_S "s") (VORNQ_U "u")
+		       (VQADDQ_N_S "s") (VQADDQ_N_U "u")
 		       (VQRSHLQ_N_S "s") (VQRSHLQ_N_U "u") (VQRSHLQ_S "s")
 		       (VQRSHLQ_U "u") (VQSHLQ_N_S "s")	(VQSHLQ_N_U "u")
 		       (VQSHLQ_R_S "s") (VQSHLQ_R_U "u") (VQSHLQ_S "s")
@@ -1523,7 +1523,6 @@ (define_int_iterator VMULLTQ_INT [VMULLTQ_INT_U VMULLTQ_INT_S])
 (define_int_iterator VMULQ [VMULQ_U VMULQ_S])
 (define_int_iterator VMULQ_N [VMULQ_N_U VMULQ_N_S])
 (define_int_iterator VORNQ [VORNQ_U VORNQ_S])
-(define_int_iterator VORRQ [VORRQ_S VORRQ_U])
 (define_int_iterator VQADDQ [VQADDQ_U VQADDQ_S])
 (define_int_iterator VQADDQ_N [VQADDQ_N_S VQADDQ_N_U])
 (define_int_iterator VQRSHLQ [VQRSHLQ_S VQRSHLQ_U])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 975eb7d..0f04044 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1610,17 +1610,24 @@ (define_insn "mve_vornq_<supf><mode>"
 ;;
 ;; [vorrq_s, vorrq_u])
 ;;
-(define_insn "mve_vorrq_<supf><mode>"
+(define_insn "mve_vorrq_s<mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VORRQ))
+	(ior:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vorr %q0, %q1, %q2"
+  "vorr\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
+(define_expand "mve_vorrq_u<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand")
+	(ior:MVE_2 (match_operand:MVE_2 1 "s_register_operand")
+		       (match_operand:MVE_2 2 "s_register_operand")))
+  ]
+  "TARGET_HAVE_MVE"
+)
 
 ;;
 ;; [vqaddq_n_s, vqaddq_n_u])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index dc4707d..669c34d 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -690,7 +690,7 @@ (define_insn "neon_vcvt<NEON_VCVT:nvrint_variant><su_optab><VCVTF:mode><v_cmp_re
    (set_attr "predicable" "no")]
 )
 
-(define_insn "ior<mode>3"
+(define_insn "ior<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w,w")
 	(ior:VDQ (match_operand:VDQ 1 "s_register_operand" "w,0")
 		 (match_operand:VDQ 2 "neon_logic_op2" "w,Dl")))]
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index e8bf68e..f111ad8 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -624,7 +624,6 @@ (define_c_enum "unspec" [
   VMULQ_S
   VMULQ_N_S
   VORNQ_S
-  VORRQ_S
   VQADDQ_S
   VQADDQ_N_S
   VQRSHLQ_S
@@ -670,7 +669,6 @@ (define_c_enum "unspec" [
   VMULQ_U
   VMULQ_N_U
   VORNQ_U
-  VORRQ_U
   VQADDQ_U
   VQADDQ_N_U
   VQRSHLQ_U
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 3dd694c..413fb07 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -187,3 +187,18 @@ (define_expand "and<mode>3"
 		      (match_operand:VNINOTM1 2 "neon_inv_logic_op2" "")))]
   "TARGET_NEON"
 )
+
+(define_expand "ior<mode>3"
+  [(set (match_operand:VNIM1 0 "s_register_operand" "")
+	(ior:VNIM1 (match_operand:VNIM1 1 "s_register_operand" "")
+		   (match_operand:VNIM1 2 "neon_logic_op2" "")))]
+  "TARGET_NEON
+   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))"
+)
+
+(define_expand "ior<mode>3"
+  [(set (match_operand:VNINOTM1 0 "s_register_operand" "")
+	(ior:VNINOTM1 (match_operand:VNINOTM1 1 "s_register_operand" "")
+		      (match_operand:VNINOTM1 2 "neon_logic_op2" "")))]
+  "TARGET_NEON"
+)
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vorr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vorr.c
new file mode 100644
index 0000000..c7b59bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vorr.c
@@ -0,0 +1,34 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)				\
+  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
+    int i;								\
+    for (i=0; i<NB; i++) {						\
+      dest[i] = a[i] OP b[i];						\
+    }									\
+}
+
+/* 64-bit vectors.  */
+FUNC(s, int, 32, 2, |, vorr)
+FUNC(u, uint, 32, 2, |, vorr)
+FUNC(s, int, 16, 4, |, vorr)
+FUNC(u, uint, 16, 4, |, vorr)
+FUNC(s, int, 8, 8, |, vorr)
+FUNC(u, uint, 8, 8, |, vorr)
+
+/* 128-bit vectors.  */
+FUNC(s, int, 32, 4, |, vorr)
+FUNC(u, uint, 32, 4, |, vorr)
+FUNC(s, int, 16, 8, |, vorr)
+FUNC(u, uint, 16, 8, |, vorr)
+FUNC(s, int, 8, 16, |, vorr)
+FUNC(u, uint, 8, 16, |, vorr)
+
+/* MVE has only 128-bit vectors, so we can vectorize only half of the
+   functions above.  */
+/* { dg-final { scan-assembler-times {vorr\tq[0-9]+, q[0-9]+, q[0-9]+} 6 } } */
-- 
2.7.4



* [PATCH 3/7] arm: Auto-vectorization for MVE: veor
  2020-11-25 13:54 [PATCH 1/7] arm: Auto-vectorization for MVE: vand Christophe Lyon
  2020-11-25 13:54 ` [PATCH 2/7] arm: Auto-vectorization for MVE: vorr Christophe Lyon
@ 2020-11-25 13:54 ` Christophe Lyon
  2020-11-26 10:46   ` Andre Vieira (lists)
  2020-11-25 13:54 ` [PATCH 4/7] arm: Auto-vectorization for MVE: vshl Christophe Lyon
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Christophe Lyon @ 2020-11-25 13:54 UTC
  To: gcc-patches

This patch enables MVE veorq instructions for auto-vectorization.  MVE
veorq insns in mve.md are modified to use 'xor' instead of an unspec
expression to support xor<mode>3.  The xor<mode>3 expander is added to
vec-common.md.

2020-11-12  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/iterators.md (supf): Remove VEORQ_S and VEORQ_U.
	(VEORQ): Remove.
	* config/arm/mve.md (mve_veorq_u<mode>): New entry for veor
	instruction using expression xor.
	(mve_veorq_s<mode>): New expander.
	* config/arm/neon.md (xor<mode>3): Renamed into xor<mode>3_neon.
	* config/arm/unspecs.md (VEORQ_S, VEORQ_U): Remove.
	* config/arm/vec-common.md (xor<mode>3): New expander.

	gcc/testsuite/
	* gcc.target/arm/simd/mve-veor.c: Add tests for veor.
---
 gcc/config/arm/iterators.md                  |  3 +--
 gcc/config/arm/mve.md                        | 17 ++++++++++----
 gcc/config/arm/neon.md                       |  2 +-
 gcc/config/arm/unspecs.md                    |  2 --
 gcc/config/arm/vec-common.md                 | 15 ++++++++++++
 gcc/testsuite/gcc.target/arm/simd/mve-veor.c | 34 ++++++++++++++++++++++++++++
 6 files changed, 63 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-veor.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 5fcb7af..0195275 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1237,7 +1237,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 		       (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
 		       (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
 		       (VCMPEQQ_N_S "s") (VCMPEQQ_N_U "u") (VCMPNEQ_N_S "s")
-		       (VCMPNEQ_N_U "u") (VEORQ_S "s") (VEORQ_U "u")
+		       (VCMPNEQ_N_U "u")
 		       (VHADDQ_N_S "s") (VHADDQ_N_U "u") (VHADDQ_S "s")
 		       (VHADDQ_U "u") (VHSUBQ_N_S "s")	(VHSUBQ_N_U "u")
 		       (VHSUBQ_S "s") (VMAXQ_S "s") (VMAXQ_U "u") (VHSUBQ_U "u")
@@ -1507,7 +1507,6 @@ (define_int_iterator VCADDQ_ROT90 [VCADDQ_ROT90_U VCADDQ_ROT90_S])
 (define_int_iterator VCMPEQQ [VCMPEQQ_U VCMPEQQ_S])
 (define_int_iterator VCMPEQQ_N [VCMPEQQ_N_S VCMPEQQ_N_U])
 (define_int_iterator VCMPNEQ_N [VCMPNEQ_N_U VCMPNEQ_N_S])
-(define_int_iterator VEORQ [VEORQ_U VEORQ_S])
 (define_int_iterator VHADDQ [VHADDQ_S VHADDQ_U])
 (define_int_iterator VHADDQ_N [VHADDQ_N_U VHADDQ_N_S])
 (define_int_iterator VHSUBQ [VHSUBQ_S VHSUBQ_U])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 0f04044..a5f5d75 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1204,17 +1204,24 @@ (define_insn "mve_vcmpneq_n_<supf><mode>"
 ;;
 ;; [veorq_u, veorq_s])
 ;;
-(define_insn "mve_veorq_<supf><mode>"
+(define_insn "mve_veorq_u<mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")]
-	 VEORQ))
+	(xor:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		       (match_operand:MVE_2 2 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "veor %q0, %q1, %q2"
+  "veor\t%q0, %q1, %q2"
   [(set_attr "type" "mve_move")
 ])
+(define_expand "mve_veorq_s<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand")
+	(xor:MVE_2 (match_operand:MVE_2 1 "s_register_operand")
+		       (match_operand:MVE_2 2 "s_register_operand")))
+  ]
+  "TARGET_HAVE_MVE"
+)
 
 ;;
 ;; [vhaddq_n_u, vhaddq_n_s])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 669c34d..e1263b0 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -747,7 +747,7 @@ (define_insn "bic<mode>3_neon"
   [(set_attr "type" "neon_logic<q>")]
 )
 
-(define_insn "xor<mode>3"
+(define_insn "xor<mode>3_neon"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
 	(xor:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
 		 (match_operand:VDQ 2 "s_register_operand" "w")))]
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index f111ad8..78313ea 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -608,7 +608,6 @@ (define_c_enum "unspec" [
   VCMPEQQ_S
   VCMPEQQ_N_S
   VCMPNEQ_N_S
-  VEORQ_S
   VHADDQ_S
   VHADDQ_N_S
   VHSUBQ_S
@@ -653,7 +652,6 @@ (define_c_enum "unspec" [
   VCMPEQQ_U
   VCMPEQQ_N_U
   VCMPNEQ_N_U
-  VEORQ_U
   VHADDQ_U
   VHADDQ_N_U
   VHSUBQ_U
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 413fb07..687134a 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -202,3 +202,18 @@ (define_expand "ior<mode>3"
 		      (match_operand:VNINOTM1 2 "neon_logic_op2" "")))]
   "TARGET_NEON"
 )
+
+(define_expand "xor<mode>3"
+  [(set (match_operand:VNIM1 0 "s_register_operand" "")
+	(xor:VNIM1 (match_operand:VNIM1 1 "s_register_operand" "")
+		   (match_operand:VNIM1 2 "s_register_operand" "")))]
+  "TARGET_NEON
+   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))"
+)
+
+(define_expand "xor<mode>3"
+  [(set (match_operand:VNINOTM1 0 "s_register_operand" "")
+	(xor:VNINOTM1 (match_operand:VNINOTM1 1 "s_register_operand" "")
+		      (match_operand:VNINOTM1 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+)
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-veor.c b/gcc/testsuite/gcc.target/arm/simd/mve-veor.c
new file mode 100644
index 0000000..5c534cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-veor.c
@@ -0,0 +1,34 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)				\
+  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
+    int i;								\
+    for (i=0; i<NB; i++) {						\
+      dest[i] = a[i] OP b[i];						\
+    }									\
+}
+
+/* 64-bit vectors.  */
+FUNC(s, int, 32, 2, ^, veor)
+FUNC(u, uint, 32, 2, ^, veor)
+FUNC(s, int, 16, 4, ^, veor)
+FUNC(u, uint, 16, 4, ^, veor)
+FUNC(s, int, 8, 8, ^, veor)
+FUNC(u, uint, 8, 8, ^, veor)
+
+/* 128-bit vectors.  */
+FUNC(s, int, 32, 4, ^, veor)
+FUNC(u, uint, 32, 4, ^, veor)
+FUNC(s, int, 16, 8, ^, veor)
+FUNC(u, uint, 16, 8, ^, veor)
+FUNC(s, int, 8, 16, ^, veor)
+FUNC(u, uint, 8, 16, ^, veor)
+
+/* MVE has only 128-bit vectors, so we can vectorize only half of the
+   functions above.  */
+/* { dg-final { scan-assembler-times {veor\tq[0-9]+, q[0-9]+, q[0-9]+} 6 } } */
-- 
2.7.4



* [PATCH 4/7] arm: Auto-vectorization for MVE: vshl
  2020-11-25 13:54 [PATCH 1/7] arm: Auto-vectorization for MVE: vand Christophe Lyon
  2020-11-25 13:54 ` [PATCH 2/7] arm: Auto-vectorization for MVE: vorr Christophe Lyon
  2020-11-25 13:54 ` [PATCH 3/7] arm: Auto-vectorization for MVE: veor Christophe Lyon
@ 2020-11-25 13:54 ` Christophe Lyon
  2020-11-25 13:54 ` [PATCH 5/7] arm: Auto-vectorization for MVE: vshr Christophe Lyon
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Christophe Lyon @ 2020-11-25 13:54 UTC
  To: gcc-patches

This patch enables MVE vshlq instructions for auto-vectorization.  A
new MVE pattern is introduced that takes a vector of constants as
second operand, all constants being equal.

The existing mve_vshlq_n_<supf><mode> is kept, as it takes a single
immediate as second operand, and is used by arm_mve.h.

The vashl<mode>3 expander is added to vec-common.md.
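
To illustrate the two points above (a sketch, not code from the
patch): the shift amount reaches the new pattern as a constant vector
whose elements are all equal, and a left shift produces the same bits
regardless of signedness.  The helpers below are hypothetical C
models, not GCC internals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a "vector of equal constants": returns the
   common shift count, or -1 if the elements differ.  */
int
uniform_shift_count (const int *counts, int n)
{
  for (int i = 1; i < n; i++)
    if (counts[i] != counts[0])
      return -1;
  return n > 0 ? counts[0] : -1;
}

/* Left shift of one 16-bit lane, treating the lane as unsigned.  */
uint16_t
shl_u16 (uint16_t bits, unsigned count)
{
  return (uint16_t) ((uint32_t) bits << count);
}

/* Same lane interpreted as signed: sign-extend, shift, truncate.
   The extension bits stay above bit 15 and are discarded.  */
uint16_t
shl_s16 (uint16_t bits, unsigned count)
{
  int16_t lane;
  memcpy (&lane, &bits, sizeof lane);
  return (uint16_t) ((uint32_t) (int32_t) lane << count);
}

/* Returns 1 when both interpretations give the same lane bits.  */
int
shl_signedness_irrelevant (uint16_t bits, unsigned count)
{
  assert (count < 16);
  return shl_u16 (bits, count) == shl_s16 (bits, count);
}
```

The signed variant goes through unsigned arithmetic on purpose, so the
model does not rely on left-shifting a negative value in C.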

2020-11-12  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/mve.md (mve_vshlq_n_<mode>_imm): New entry.
	* config/arm/neon.md (vashl<mode>3): Rename into vashl<mode>3_neon.
	* config/arm/vec-common.md (vashl<mode>3): New expander.

	gcc/testsuite/
	* gcc.target/arm/simd/mve-vshl.c: Add tests for vshl.
---
 gcc/config/arm/mve.md                        | 19 +++++++++++++++
 gcc/config/arm/neon.md                       |  2 +-
 gcc/config/arm/vec-common.md                 |  7 ++++++
 gcc/testsuite/gcc.target/arm/simd/mve-vshl.c | 35 ++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vshl.c

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index a5f5d75..ce82258 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1924,6 +1924,7 @@ (define_insn "mve_vrshrq_n_<supf><mode>"
 ;;
 ;; [vshlq_n_u, vshlq_n_s])
 ;;
+;; Version that takes an immediate as operand 2.
 (define_insn "mve_vshlq_n_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
@@ -1936,6 +1937,24 @@ (define_insn "mve_vshlq_n_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
+;; Version with a vector of immediates as operand 2.
+;; We only emit signed ('s') versions, since it makes no difference.
+(define_insn "mve_vshlq_n_<mode>_imm"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(ashift:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+		      (match_operand:MVE_2 2 "imm_for_neon_lshift_operand" "i")))
+  ]
+  "TARGET_HAVE_MVE"
+{
+  return neon_output_shift_immediate ("vshl", 's', &operands[2],
+				     <MODE>mode,
+				     VALID_NEON_QREG_MODE (<MODE>mode),
+				     true);
+}
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vshlq_r_s, vshlq_r_u])
 ;;
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index e1263b0..cb7646e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -870,7 +870,7 @@ (define_insn "*smax<mode>3_neon"
 ; generic vectorizer code.  It ends up creating a V2DI constructor with
 ; SImode elements.
 
-(define_insn "vashl<mode>3"
+(define_insn "vashl<mode>3_neon"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w,w")
 	(ashift:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w,w")
 		      (match_operand:VDQIW 2 "imm_lshift_or_reg_neon" "w,Dm")))]
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 687134a..4d04b0f 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -217,3 +217,10 @@ (define_expand "xor<mode>3"
 		      (match_operand:VNINOTM1 2 "s_register_operand" "")))]
   "TARGET_NEON"
 )
+
+(define_expand "vashl<mode>3"
+  [(set (match_operand:VDQIW 0 "s_register_operand" "")
+	(ashift:VDQIW (match_operand:VDQIW 1 "s_register_operand" "")
+		      (match_operand:VDQIW 2 "imm_lshift_or_reg_neon" "")))]
+  "TARGET_NEON || TARGET_HAVE_MVE"
+)
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshl.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshl.c
new file mode 100644
index 0000000..4ccc9a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshl.c
@@ -0,0 +1,35 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)				\
+  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a) { \
+    int i;								\
+    for (i=0; i<NB; i++) {						\
+      dest[i] = a[i] OP 5;						\
+    }									\
+}
+
+/* 64-bit vectors.  */
+FUNC(s, int, 32, 2, <<, vshl)
+FUNC(u, uint, 32, 2, <<, vshl)
+FUNC(s, int, 16, 4, <<, vshl)
+FUNC(u, uint, 16, 4, <<, vshl)
+FUNC(s, int, 8, 8, <<, vshl)
+FUNC(u, uint, 8, 8, <<, vshl)
+
+/* 128-bit vectors.  */
+FUNC(s, int, 32, 4, <<, vshl)
+FUNC(u, uint, 32, 4, <<, vshl)
+FUNC(s, int, 16, 8, <<, vshl)
+FUNC(u, uint, 16, 8, <<, vshl)
+FUNC(s, int, 8, 16, <<, vshl)
+FUNC(u, uint, 8, 16, <<, vshl)
+
+/* MVE has only 128-bit vectors, so we can vectorize only half of the
+   functions above.  */
+/* We only emit vshl.s, which is equivalent to vshl.u anyway.  */
+/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 6 } } */
-- 
2.7.4



* [PATCH 5/7] arm: Auto-vectorization for MVE: vshr
  2020-11-25 13:54 [PATCH 1/7] arm: Auto-vectorization for MVE: vand Christophe Lyon
                   ` (2 preceding siblings ...)
  2020-11-25 13:54 ` [PATCH 4/7] arm: Auto-vectorization for MVE: vshl Christophe Lyon
@ 2020-11-25 13:54 ` Christophe Lyon
  2020-11-25 13:54 ` [PATCH 6/7] arm: Auto-vectorization for MVE: vmvn Christophe Lyon
  2020-11-25 17:17 ` [PATCH 1/7] arm: Auto-vectorization for MVE: vand Andre Simoes Dias Vieira
  5 siblings, 0 replies; 12+ messages in thread
From: Christophe Lyon @ 2020-11-25 13:54 UTC
  To: gcc-patches

This patch enables MVE vshr instructions for auto-vectorization.  New
MVE patterns are introduced that take a vector of constants as second
operand, all constants being equal.

The existing mve_vshrq_n_<supf><mode> is kept, as it takes a single
immediate as second operand, and is used by arm_mve.h.

The vashr<mode>3 and vlshr<mode>3 expanders are moved from neon.md to
vec-common.md and updated to rely on the normal expansion scheme to
generate shifts by immediate.  This is needed because MVE has only a
subset of the modes available for Neon, and the expanders would
otherwise try to access gen_mve_vshrq_n_s<mode>_imm with unsupported
<mode> values.
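
Unlike the left shift, right shifts depend on signedness, which is
presumably why separate ashiftrt and lshiftrt patterns are needed
here.  The sketch below is a plain C model with made-up helper names;
the last function mimics the Neon register path of the expander
(negate the count, then shift left), assuming the register form of
Neon vshl shifts right when the per-lane count is negative.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift: zeros are shifted in (models lshiftrt).  */
uint32_t
lsr32 (uint32_t x, unsigned count)
{
  return x >> count;
}

/* Arithmetic right shift written portably: copies of the sign bit
   are shifted in (models ashiftrt).  */
int32_t
asr32 (int32_t x, unsigned count)
{
  return x < 0 ? ~(~x >> count) : x >> count;
}

/* Model of a signed shift by a signed count: non-negative counts
   shift left, negative counts shift right arithmetically.  */
int32_t
shl_signed_count (int32_t x, int count)
{
  assert (count > -32 && count < 32);
  if (count < 0)
    return asr32 (x, (unsigned) -count);
  return (int32_t) ((uint32_t) x << count);
}

/* Right shift expressed as "negate the count, then shift left".  */
int32_t
asr_via_negated_count (int32_t x, int count)
{
  return shl_signed_count (x, -count);
}
```

For the same input bits 0xfffffff0 and a count of 2, lsr32 and asr32
give different results, so one pattern could not cover both.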

2020-11-20  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/mve.md (mve_vshrq_n_s<mode>_imm): New entry.
	(mve_vshrq_n_u<mode>_imm): Likewise.
	* config/arm/neon.md (vashr<mode>3, vlshr<mode>3): Move to ...
	* config/arm/vec-common.md: ... here.

	gcc/testsuite/
	* gcc.target/arm/simd/mve-vshr.c: Add tests for vshr.
---
 gcc/config/arm/mve.md                        | 34 +++++++++++++++++++++++
 gcc/config/arm/neon.md                       | 34 -----------------------
 gcc/config/arm/vec-common.md                 | 40 +++++++++++++++++++++++++++-
 gcc/testsuite/gcc.target/arm/simd/mve-vshr.c | 35 ++++++++++++++++++++++++
 4 files changed, 108 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vshr.c

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index ce82258..55b2991 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -759,6 +759,7 @@ (define_insn "mve_vcreateq_<supf><mode>"
 ;;
 ;; [vshrq_n_s, vshrq_n_u])
 ;;
+;; Version that takes an immediate as operand 2.
 (define_insn "mve_vshrq_n_<supf><mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
@@ -771,6 +772,39 @@ (define_insn "mve_vshrq_n_<supf><mode>"
   [(set_attr "type" "mve_move")
 ])
 
+;; Versions that take constant vectors as operand 2 (with all elements
+;; equal).
+(define_insn "mve_vshrq_n_s<mode>_imm"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(ashiftrt:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+			(match_operand:MVE_2 2 "imm_for_neon_rshift_operand" "i")))
+  ]
+  "TARGET_HAVE_MVE"
+  {
+    return neon_output_shift_immediate ("vshr", 's', &operands[2],
+					<MODE>mode,
+					VALID_NEON_QREG_MODE (<MODE>mode),
+					true);
+  }
+  [(set_attr "type" "mve_move")
+])
+(define_insn "mve_vshrq_n_u<mode>_imm"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
+	(lshiftrt:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
+			(match_operand:MVE_2 2 "imm_for_neon_rshift_operand" "i")))
+  ]
+  "TARGET_HAVE_MVE"
+  {
+    return neon_output_shift_immediate ("vshr", 'u', &operands[2],
+					<MODE>mode,
+					VALID_NEON_QREG_MODE (<MODE>mode),
+					true);
+  }
+  [(set_attr "type" "mve_move")
+])
+
 ;;
 ;; [vcvtq_n_from_f_s, vcvtq_n_from_f_u])
 ;;
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index cb7646e..5090673 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -943,40 +943,6 @@ (define_insn "ashl<mode>3_unsigned"
   [(set_attr "type" "neon_shift_reg<q>")]
 )
 
-(define_expand "vashr<mode>3"
-  [(set (match_operand:VDQIW 0 "s_register_operand")
-	(ashiftrt:VDQIW (match_operand:VDQIW 1 "s_register_operand")
-			(match_operand:VDQIW 2 "imm_rshift_or_reg_neon")))]
-  "TARGET_NEON"
-{
-  if (s_register_operand (operands[2], <MODE>mode))
-    {
-      rtx neg = gen_reg_rtx (<MODE>mode);
-      emit_insn (gen_neg<mode>2 (neg, operands[2]));
-      emit_insn (gen_ashl<mode>3_signed (operands[0], operands[1], neg));
-    }
-  else
-    emit_insn (gen_vashr<mode>3_imm (operands[0], operands[1], operands[2]));
-  DONE;
-})
-
-(define_expand "vlshr<mode>3"
-  [(set (match_operand:VDQIW 0 "s_register_operand")
-	(lshiftrt:VDQIW (match_operand:VDQIW 1 "s_register_operand")
-			(match_operand:VDQIW 2 "imm_rshift_or_reg_neon")))]
-  "TARGET_NEON"
-{
-  if (s_register_operand (operands[2], <MODE>mode))
-    {
-      rtx neg = gen_reg_rtx (<MODE>mode);
-      emit_insn (gen_neg<mode>2 (neg, operands[2]));
-      emit_insn (gen_ashl<mode>3_unsigned (operands[0], operands[1], neg));
-    }
-  else
-    emit_insn (gen_vlshr<mode>3_imm (operands[0], operands[1], operands[2]));
-  DONE;
-})
-
 ;; 64-bit shifts
 
 ;; This pattern loads a 32-bit shift count into a 64-bit NEON register,
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 4d04b0f..915dcb0 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -223,4 +223,42 @@ (define_expand "vashl<mode>3"
 	(ashift:VDQIW (match_operand:VDQIW 1 "s_register_operand" "")
 		      (match_operand:VDQIW 2 "imm_lshift_or_reg_neon" "")))]
   "TARGET_NEON || TARGET_HAVE_MVE"
-)
\ No newline at end of file
+)
+
+;; When operand 2 is an immediate, use the normal expansion to match
+;; gen_vashr<mode>3_imm for Neon and gen_mve_vshrq_n_s<mode>_imm for
+;; MVE.
+(define_expand "vashr<mode>3"
+  [(set (match_operand:VDQIW 0 "s_register_operand")
+	(ashiftrt:VDQIW (match_operand:VDQIW 1 "s_register_operand")
+			(match_operand:VDQIW 2 "imm_rshift_or_reg_neon")))]
+  "TARGET_NEON || TARGET_HAVE_MVE"
+{
+  if (TARGET_NEON
+      && s_register_operand (operands[2], <MODE>mode))
+    {
+      rtx neg = gen_reg_rtx (<MODE>mode);
+      emit_insn (gen_neg<mode>2 (neg, operands[2]));
+      emit_insn (gen_ashl<mode>3_signed (operands[0], operands[1], neg));
+      DONE;
+    }
+})
+
+;; When operand 2 is an immediate, use the normal expansion to match
+;; gen_vlshr<mode>3_imm for Neon and gen_mve_vshrq_n_u<mode>_imm for
+;; MVE.
+(define_expand "vlshr<mode>3"
+  [(set (match_operand:VDQIW 0 "s_register_operand")
+	(lshiftrt:VDQIW (match_operand:VDQIW 1 "s_register_operand")
+			(match_operand:VDQIW 2 "imm_rshift_or_reg_neon")))]
+  "TARGET_NEON || TARGET_HAVE_MVE"
+{
+  if (TARGET_NEON
+      && s_register_operand (operands[2], <MODE>mode))
+    {
+      rtx neg = gen_reg_rtx (<MODE>mode);
+      emit_insn (gen_neg<mode>2 (neg, operands[2]));
+      emit_insn (gen_ashl<mode>3_unsigned (operands[0], operands[1], neg));
+      DONE;
+    }
+})
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
new file mode 100644
index 0000000..1b53b6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
@@ -0,0 +1,35 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)				\
+  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a) { \
+    int i;								\
+    for (i=0; i<NB; i++) {						\
+      dest[i] = a[i] OP 5;						\
+    }									\
+}
+
+/* 64-bit vectors.  */
+FUNC(s, int, 32, 2, >>, vshr)
+FUNC(u, uint, 32, 2, >>, vshr)
+FUNC(s, int, 16, 4, >>, vshr)
+FUNC(u, uint, 16, 4, >>, vshr)
+FUNC(s, int, 8, 8, >>, vshr)
+FUNC(u, uint, 8, 8, >>, vshr)
+
+/* 128-bit vectors.  */
+FUNC(s, int, 32, 4, >>, vshr)
+FUNC(u, uint, 32, 4, >>, vshr)
+FUNC(s, int, 16, 8, >>, vshr)
+FUNC(u, uint, 16, 8, >>, vshr)
+FUNC(s, int, 8, 16, >>, vshr)
+FUNC(u, uint, 8, 16, >>, vshr)
+
+/* MVE has only 128-bit vectors, so we can vectorize only half of the
+   functions above.  */
+/* { dg-final { scan-assembler-times {vshr.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
+/* { dg-final { scan-assembler-times {vshr.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
-- 
2.7.4



* [PATCH 6/7] arm: Auto-vectorization for MVE: vmvn
  2020-11-25 13:54 [PATCH 1/7] arm: Auto-vectorization for MVE: vand Christophe Lyon
                   ` (3 preceding siblings ...)
  2020-11-25 13:54 ` [PATCH 5/7] arm: Auto-vectorization for MVE: vshr Christophe Lyon
@ 2020-11-25 13:54 ` Christophe Lyon
  2020-11-25 17:17 ` [PATCH 1/7] arm: Auto-vectorization for MVE: vand Andre Simoes Dias Vieira
  5 siblings, 0 replies; 12+ messages in thread
From: Christophe Lyon @ 2020-11-25 13:54 UTC (permalink / raw)
  To: gcc-patches

This patch enables MVE vmvnq instructions for auto-vectorization.  MVE
vmvnq insns in mve.md are modified to use 'not' instead of unspec
expression to support one_cmpl<mode>2.  The one_cmpl<mode>2 expander
is added to vec-common.md.

2020-11-12  Christophe Lyon  <christophe.lyon@linaro.org>

	gcc/
	* config/arm/iterators.md (VDQNOTM5): New mode iterator.
	(supf): Remove VMVNQ_S and VMVNQ_U.
	(VMVNQ): Remove.
	* config/arm/mve.md (mve_vmvnq_u<mode>): New entry for vmvn
	instruction using expression not.
	(mve_vmvnq_s<mode>): New expander.
	* config/arm/neon.md (one_cmpl<mode>2): Renamed into
	one_cmpl<mode>2_insn.
	* config/arm/unspecs.md (VMVNQ_S, VMVNQ_U): Remove.
	* config/arm/vec-common.md (one_cmpl<mode>2): New expander.

	gcc/testsuite/
	* gcc.target/arm/simd/mve-vmvn.c: Add tests for vmvn.
---
 gcc/config/arm/iterators.md                  |  6 ++++--
 gcc/config/arm/mve.md                        | 14 +++++++++----
 gcc/config/arm/neon.md                       |  4 ++--
 gcc/config/arm/unspecs.md                    |  2 --
 gcc/config/arm/vec-common.md                 | 12 +++++++++++
 gcc/testsuite/gcc.target/arm/simd/mve-vmvn.c | 31 ++++++++++++++++++++++++++++
 6 files changed, 59 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vmvn.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 0195275..75af7aa 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -72,6 +72,9 @@ (define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
 ;; Integer and float modes supported by Neon and IWMMXT but not MVE.
 (define_mode_iterator VNINOTM1 [V2SI V4HI V8QI V2SF])
 
+;; Integer modes supported by Neon but not part of MVE_5
+(define_mode_iterator VDQNOTM5 [V8QI V16QI V4HI V2SI V2DI])
+
 ;; Integer and float modes supported by Neon and IWMMXT, except V2DI.
 (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
@@ -1216,7 +1219,7 @@ (define_int_attr mmla_sfx [(UNSPEC_MATMUL_S "s8") (UNSPEC_MATMUL_U "u8")
 (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
 		       (VREV16Q_U "u") (VMVNQ_N_S "s") (VMVNQ_N_U "u")
 		       (VCVTAQ_U "u") (VCVTAQ_S "s") (VREV64Q_S "s")
-		       (VREV64Q_U "u") (VMVNQ_S "s") (VMVNQ_U "u")
+		       (VREV64Q_U "u")
 		       (VDUPQ_N_U "u") (VDUPQ_N_S"s") (VADDVQ_S "s")
 		       (VADDVQ_U "u") (VADDVQ_S "s") (VADDVQ_U "u")
 		       (VMOVLTQ_U "u") (VMOVLTQ_S "s") (VMOVLBQ_S "s")
@@ -1476,7 +1479,6 @@ (define_int_iterator VREV64Q [VREV64Q_S VREV64Q_U])
 (define_int_iterator VCVTQ_FROM_F [VCVTQ_FROM_F_S VCVTQ_FROM_F_U])
 (define_int_iterator VREV16Q [VREV16Q_U VREV16Q_S])
 (define_int_iterator VCVTAQ [VCVTAQ_U VCVTAQ_S])
-(define_int_iterator VMVNQ [VMVNQ_U VMVNQ_S])
 (define_int_iterator VDUPQ_N [VDUPQ_N_U VDUPQ_N_S])
 (define_int_iterator VCLZQ [VCLZQ_U VCLZQ_S])
 (define_int_iterator VADDVQ [VADDVQ_U VADDVQ_S])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 55b2991..1d18de5 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -433,16 +433,22 @@ (define_insn "mve_vnegq_s<mode>"
 ;;
 ;; [vmvnq_u, vmvnq_s])
 ;;
-(define_insn "mve_vmvnq_<supf><mode>"
+(define_insn "mve_vmvnq_u<mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")]
-	 VMVNQ))
+	(not:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
-  "vmvn %q0, %q1"
+  "vmvn\t%q0, %q1"
   [(set_attr "type" "mve_move")
 ])
+(define_expand "mve_vmvnq_s<mode>"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand")
+	(not:MVE_2 (match_operand:MVE_2 1 "s_register_operand")))
+  ]
+  "TARGET_HAVE_MVE"
+)
 
 ;;
 ;; [vdupq_n_u, vdupq_n_s])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 5090673..ae70fa8 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -756,7 +756,7 @@ (define_insn "xor<mode>3_neon"
   [(set_attr "type" "neon_logic<q>")]
 )
 
-(define_insn "one_cmpl<mode>2"
+(define_insn "one_cmpl<mode>2_insn"
   [(set (match_operand:VDQ 0 "s_register_operand" "=w")
         (not:VDQ (match_operand:VDQ 1 "s_register_operand" "w")))]
   "TARGET_NEON"
@@ -3206,7 +3206,7 @@ (define_expand "neon_vmvn<mode>"
    (match_operand:VDQIW 1 "s_register_operand")]
   "TARGET_NEON"
 {
-  emit_insn (gen_one_cmpl<mode>2 (operands[0], operands[1]));
+  emit_insn (gen_one_cmpl<mode>2_insn (operands[0], operands[1]));
   DONE;
 })
 
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 78313ea..a2866c1 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -550,8 +550,6 @@ (define_c_enum "unspec" [
   VREV64Q_U
   VQABSQ_S
   VNEGQ_S
-  VMVNQ_S
-  VMVNQ_U
   VDUPQ_N_U
   VDUPQ_N_S
   VCLZQ_U
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 915dcb0..5c6e847 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -262,3 +262,15 @@ (define_expand "vlshr<mode>3"
       DONE;
     }
 })
+
+(define_expand "one_cmpl<mode>2"
+  [(set (match_operand:MVE_5 0 "s_register_operand")
+	(not:MVE_5 (match_operand:MVE_5 1 "s_register_operand")))]
+  "TARGET_NEON || TARGET_HAVE_MVE"
+)
+
+(define_expand "one_cmpl<mode>2"
+  [(set (match_operand:VDQNOTM5 0 "s_register_operand")
+	(not:VDQNOTM5 (match_operand:VDQNOTM5 1 "s_register_operand")))]
+  "TARGET_NEON"
+)
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vmvn.c b/gcc/testsuite/gcc.target/arm/simd/mve-vmvn.c
new file mode 100644
index 0000000..612c179
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vmvn.c
@@ -0,0 +1,31 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)				\
+  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a) { \
+    int i;								\
+    for (i=0; i<NB; i++) {						\
+      dest[i] = OP a[i];						\
+    }									\
+}
+
+/* vmvnq supports only 16-bit and 32-bit elements.  */
+/* 64-bit vectors.  */
+FUNC(s, int, 32, 2, ~, vmvn)
+FUNC(u, uint, 32, 2, ~, vmvn)
+FUNC(s, int, 16, 4, ~, vmvn)
+FUNC(u, uint, 16, 4, ~, vmvn)
+
+/* 128-bit vectors.  */
+FUNC(s, int, 32, 4, ~, vmvn)
+FUNC(u, uint, 32, 4, ~, vmvn)
+FUNC(s, int, 16, 8, ~, vmvn)
+FUNC(u, uint, 16, 8, ~, vmvn)
+
+/* MVE has only 128-bit vectors, so we can vectorize only half of the
+   functions above.  */
+/* { dg-final { scan-assembler-times {vmvn\tq[0-9]+, q[0-9]+} 4 } } */
-- 
2.7.4



* Re: [PATCH 1/7] arm: Auto-vectorization for MVE: vand
  2020-11-25 13:54 [PATCH 1/7] arm: Auto-vectorization for MVE: vand Christophe Lyon
                   ` (4 preceding siblings ...)
  2020-11-25 13:54 ` [PATCH 6/7] arm: Auto-vectorization for MVE: vmvn Christophe Lyon
@ 2020-11-25 17:17 ` Andre Simoes Dias Vieira
  2020-11-26 15:31   ` Christophe Lyon
  5 siblings, 1 reply; 12+ messages in thread
From: Andre Simoes Dias Vieira @ 2020-11-25 17:17 UTC (permalink / raw)
  To: gcc-patches, Christophe Lyon

Hi Christophe,

Thanks for these! See some inline comments.

On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote:
> This patch enables MVE vandq instructions for auto-vectorization.  MVE
> vandq insns in mve.md are modified to use 'and' instead of unspec
> expression to support and<mode>3.  The and<mode>3 expander is added to
> vec-common.md
>
> 2020-11-12  Christophe Lyon  <christophe.lyon@linaro.org>
>
> 	gcc/
> 	* config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
> 	(VANDQ): Remove.
> 	* config/arm/mve.md (mve_vandq_u<mode>): New entry for vand
> 	instruction using expression and.
> 	(mve_vandq_s<mode>): New expander.
> 	* config/arm/neon.md (and<mode>3): Renamed into and<mode>3_neon.
> 	* config/arm/unspecs.md (VANDQ_S, VANDQ_U): Remove.
> 	* config/arm/vec-common.md (and<mode>3): New expander.
>
> 	gcc/testsuite/
> 	* gcc.target/arm/simd/mve-vand.c: New test.
> ---
>   gcc/config/arm/iterators.md                  |  4 +---
>   gcc/config/arm/mve.md                        | 20 ++++++++++++----
>   gcc/config/arm/neon.md                       |  2 +-
>   gcc/config/arm/unspecs.md                    |  2 --
>   gcc/config/arm/vec-common.md                 | 15 ++++++++++++
>   gcc/testsuite/gcc.target/arm/simd/mve-vand.c | 34 ++++++++++++++++++++++++++++
>   6 files changed, 66 insertions(+), 11 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vand.c
>
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 592af35..72039e4 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -1232,8 +1232,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
>   		       (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")
>   		       (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s")
>   		       (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u")
> -		       (VADDVQ_P_S "s")	(VADDVQ_P_U "u") (VANDQ_S "s")
> -		       (VANDQ_U "u") (VBICQ_S "s") (VBICQ_U "u")
> +		       (VADDVQ_P_S "s")	(VADDVQ_P_U "u") (VBICQ_S "s") (VBICQ_U "u")
>   		       (VBRSRQ_N_S "s") (VBRSRQ_N_U "u") (VCADDQ_ROT270_S "s")
>   		       (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
>   		       (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
> @@ -1501,7 +1500,6 @@ (define_int_iterator VABDQ [VABDQ_S VABDQ_U])
>   (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U])
>   (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U])
>   (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S])
> -(define_int_iterator VANDQ [VANDQ_U VANDQ_S])
>   (define_int_iterator VBICQ [VBICQ_S VBICQ_U])
>   (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S])
>   (define_int_iterator VCADDQ_ROT270 [VCADDQ_ROT270_S VCADDQ_ROT270_U])
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index ecbaaa9..975eb7d 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -894,17 +894,27 @@ (define_insn "mve_vaddvq_p_<supf><mode>"
>   ;;
>   ;; [vandq_u, vandq_s])
>   ;;
> -(define_insn "mve_vandq_<supf><mode>"
> +;; signed and unsigned versions are the same: define the unsigned
> +;; insn, and use an expander for the signed one as we still reference
> +;; both names from arm_mve.h.
> +(define_insn "mve_vandq_u<mode>"
>     [
>      (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VANDQ))
> +	(and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> +		       (match_operand:MVE_2 2 "s_register_operand" "w")))
The predicate on the second operand is more restrictive than the one in
the expander, 'neon_inv_logic_op2'. This still works with immediates (at
least for the integer case I checked), where it generates a loop such as:

         vldrw.32        q3, [r0]
         vldr.64 d4, .L8
         vldr.64 d5, .L8+8
         vand    q3, q3, q2
         vstrw.32        q3, [r2]

MVE does support vand with immediates, just like Neon. I suspect you
could just copy the way Neon handles these; it is possibly worth the
little extra effort. You can use dest[i] = a[i] & ~1 as a testcase.
If you don't, it might still be worth expanding the test to make sure
other immediate/type combinations don't trigger an ICE?

I'm not sure I understand why it loads the constant in two 64-bit chunks
rather than doing a single load, or using something like a vmov or vbic
immediate. Anyhow, that's a worry for another day, I guess.
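
A minimal sketch of what such an immediate testcase could look like,
assuming it follows the FUNC macro style of mve-vand.c. The FUNC_IMM
macro and the test_*_imm function names are hypothetical and not part of
the patch. No scan-assembler directive is included, since the expected
instruction depends on how immediates end up being handled.

```c
#include <stdint.h>

/* Hypothetical variant of the FUNC macro from mve-vand.c: the second
   operand is the immediate IMM instead of b[i].  */
#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, IMM, NAME)			\
  void test_ ## NAME ## _ ## SIGN ## BITS ## x ## NB ## _imm (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a) { \
    int i;								\
    for (i=0; i<NB; i++) {						\
      dest[i] = a[i] OP (IMM);						\
    }									\
  }

/* 128-bit vectors only, since MVE has no 64-bit vectors.  */
FUNC_IMM(s, int, 32, 4, &, ~1, vand)
FUNC_IMM(u, uint, 32, 4, &, ~1, vand)
FUNC_IMM(s, int, 16, 8, &, ~1, vand)
FUNC_IMM(u, uint, 16, 8, &, ~1, vand)
FUNC_IMM(s, int, 8, 16, &, ~1, vand)
FUNC_IMM(u, uint, 8, 16, &, ~1, vand)
```

Covering all three element sizes, signed and unsigned, is meant to
exercise the immediate/type combinations mentioned above.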
>     ]
>     "TARGET_HAVE_MVE"
> -  "vand %q0, %q1, %q2"
> +  "vand\t%q0, %q1, %q2"
>     [(set_attr "type" "mve_move")
>   ])
> +(define_expand "mve_vandq_s<mode>"
> +  [
> +   (set (match_operand:MVE_2 0 "s_register_operand")
> +	(and:MVE_2 (match_operand:MVE_2 1 "s_register_operand")
> +		       (match_operand:MVE_2 2 "s_register_operand")))
> +  ]
> +  "TARGET_HAVE_MVE"
> +)
>   
>   ;;
>   ;; [vbicq_s, vbicq_u])
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 2d76769..dc4707d 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -712,7 +712,7 @@ (define_insn "ior<mode>3"
>   ;; corresponds to the canonical form the middle-end expects to use for
>   ;; immediate bitwise-ANDs.
>   
> -(define_insn "and<mode>3"
> +(define_insn "and<mode>3_neon"
>     [(set (match_operand:VDQ 0 "s_register_operand" "=w,w")
>   	(and:VDQ (match_operand:VDQ 1 "s_register_operand" "w,0")
>   		 (match_operand:VDQ 2 "neon_inv_logic_op2" "w,DL")))]
> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> index a3844e9..e8bf68e 100644
> --- a/gcc/config/arm/unspecs.md
> +++ b/gcc/config/arm/unspecs.md
> @@ -601,7 +601,6 @@ (define_c_enum "unspec" [
>     VADDQ_N_S
>     VADDVAQ_S
>     VADDVQ_P_S
> -  VANDQ_S
>     VBICQ_S
>     VBRSRQ_N_S
>     VCADDQ_ROT270_S
> @@ -648,7 +647,6 @@ (define_c_enum "unspec" [
>     VADDQ_N_U
>     VADDVAQ_U
>     VADDVQ_P_U
> -  VANDQ_U
>     VBICQ_U
>     VBRSRQ_N_U
>     VCADDQ_ROT270_U
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index 250e503..3dd694c 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -172,3 +172,18 @@ (define_expand "vec_set<mode>"
>   					       GEN_INT (elem), operands[0]));
>     DONE;
>   })
> +
> +(define_expand "and<mode>3"
> +  [(set (match_operand:VNIM1 0 "s_register_operand" "")
> +	(and:VNIM1 (match_operand:VNIM1 1 "s_register_operand" "")
> +		   (match_operand:VNIM1 2 "neon_inv_logic_op2" "")))]
> +  "TARGET_NEON
> +   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))"
> +)
> +
> +(define_expand "and<mode>3"
> +  [(set (match_operand:VNINOTM1 0 "s_register_operand" "")
> +	(and:VNINOTM1 (match_operand:VNINOTM1 1 "s_register_operand" "")
> +		      (match_operand:VNINOTM1 2 "neon_inv_logic_op2" "")))]
> +  "TARGET_NEON"
> +)
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vand.c b/gcc/testsuite/gcc.target/arm/simd/mve-vand.c
> new file mode 100644
> index 0000000..2e30cd0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vand.c
> @@ -0,0 +1,34 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include <stdint.h>
> +
> +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)				\
> +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> +    int i;								\
> +    for (i=0; i<NB; i++) {						\
> +      dest[i] = a[i] OP b[i];						\
> +    }									\
> +}
> +
> +/* 64-bit vectors.  */
> +FUNC(s, int, 32, 2, &, vand)
> +FUNC(u, uint, 32, 2, &, vand)
> +FUNC(s, int, 16, 4, &, vand)
> +FUNC(u, uint, 16, 4, &, vand)
> +FUNC(s, int, 8, 8, &, vand)
> +FUNC(u, uint, 8, 8, &, vand)
> +
> +/* 128-bit vectors.  */
> +FUNC(s, int, 32, 4, &, vand)
> +FUNC(u, uint, 32, 4, &, vand)
> +FUNC(s, int, 16, 8, &, vand)
> +FUNC(u, uint, 16, 8, &, vand)
> +FUNC(s, int, 8, 16, &, vand)
> +FUNC(u, uint, 8, 16, &, vand)
> +
> +/* MVE has only 128-bit vectors, so we can vectorize only half of the
> +   functions above.  */
> +/* { dg-final { scan-assembler-times {vand\tq[0-9]+, q[0-9]+, q[0-9]+} 6 } } */


* Re: [PATCH 3/7] arm: Auto-vectorization for MVE: veor
  2020-11-25 13:54 ` [PATCH 3/7] arm: Auto-vectorization for MVE: veor Christophe Lyon
@ 2020-11-26 10:46   ` Andre Vieira (lists)
  0 siblings, 0 replies; 12+ messages in thread
From: Andre Vieira (lists) @ 2020-11-26 10:46 UTC (permalink / raw)
  To: gcc-patches, Christophe Lyon

LGTM, but please wait for maintainer review.

On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote:
> This patch enables MVE veorq instructions for auto-vectorization.  MVE
> veorq insns in mve.md are modified to use xor instead of unspec
> expression to support xor<mode>3.  The xor<mode>3 expander is added to
> vec-common.md
>
> 2020-11-12  Christophe Lyon  <christophe.lyon@linaro.org>
>
> 	gcc/
> 	* config/arm/iterators.md (supf): Remove VEORQ_S and VEORQ_U.
> 	(VEORQ): Remove.
> 	* config/arm/mve.md (mve_veorq_u<mode>): New entry for veor
> 	instruction using expression xor.
> 	(mve_veorq_s<mode>): New expander.
> 	* config/arm/neon.md (xor<mode>3): Renamed into xor<mode>3_neon.
> 	* config/arm/unspecs.md (VEORQ_S, VEORQ_U): Remove.
> 	* config/arm/vec-common.md (xor<mode>3): New expander.
>
> 	gcc/testsuite/
> 	* gcc.target/arm/simd/mve-veor.c: Add tests for veor.
> ---
>   gcc/config/arm/iterators.md                  |  3 +--
>   gcc/config/arm/mve.md                        | 17 ++++++++++----
>   gcc/config/arm/neon.md                       |  2 +-
>   gcc/config/arm/unspecs.md                    |  2 --
>   gcc/config/arm/vec-common.md                 | 15 ++++++++++++
>   gcc/testsuite/gcc.target/arm/simd/mve-veor.c | 34 ++++++++++++++++++++++++++++
>   6 files changed, 63 insertions(+), 10 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-veor.c
>
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 5fcb7af..0195275 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -1237,7 +1237,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
>   		       (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
>   		       (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
>   		       (VCMPEQQ_N_S "s") (VCMPEQQ_N_U "u") (VCMPNEQ_N_S "s")
> -		       (VCMPNEQ_N_U "u") (VEORQ_S "s") (VEORQ_U "u")
> +		       (VCMPNEQ_N_U "u")
>   		       (VHADDQ_N_S "s") (VHADDQ_N_U "u") (VHADDQ_S "s")
>   		       (VHADDQ_U "u") (VHSUBQ_N_S "s")	(VHSUBQ_N_U "u")
>   		       (VHSUBQ_S "s") (VMAXQ_S "s") (VMAXQ_U "u") (VHSUBQ_U "u")
> @@ -1507,7 +1507,6 @@ (define_int_iterator VCADDQ_ROT90 [VCADDQ_ROT90_U VCADDQ_ROT90_S])
>   (define_int_iterator VCMPEQQ [VCMPEQQ_U VCMPEQQ_S])
>   (define_int_iterator VCMPEQQ_N [VCMPEQQ_N_S VCMPEQQ_N_U])
>   (define_int_iterator VCMPNEQ_N [VCMPNEQ_N_U VCMPNEQ_N_S])
> -(define_int_iterator VEORQ [VEORQ_U VEORQ_S])
>   (define_int_iterator VHADDQ [VHADDQ_S VHADDQ_U])
>   (define_int_iterator VHADDQ_N [VHADDQ_N_U VHADDQ_N_S])
>   (define_int_iterator VHSUBQ [VHSUBQ_S VHSUBQ_U])
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 0f04044..a5f5d75 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1204,17 +1204,24 @@ (define_insn "mve_vcmpneq_n_<supf><mode>"
>   ;;
>   ;; [veorq_u, veorq_s])
>   ;;
> -(define_insn "mve_veorq_<supf><mode>"
> +(define_insn "mve_veorq_u<mode>"
>     [
>      (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> -	(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> -		       (match_operand:MVE_2 2 "s_register_operand" "w")]
> -	 VEORQ))
> +	(xor:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> +		       (match_operand:MVE_2 2 "s_register_operand" "w")))
>     ]
>     "TARGET_HAVE_MVE"
> -  "veor %q0, %q1, %q2"
> +  "veor\t%q0, %q1, %q2"
>     [(set_attr "type" "mve_move")
>   ])
> +(define_expand "mve_veorq_s<mode>"
> +  [
> +   (set (match_operand:MVE_2 0 "s_register_operand")
> +	(xor:MVE_2 (match_operand:MVE_2 1 "s_register_operand")
> +		       (match_operand:MVE_2 2 "s_register_operand")))
> +  ]
> +  "TARGET_HAVE_MVE"
> +)
>   
>   ;;
>   ;; [vhaddq_n_u, vhaddq_n_s])
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 669c34d..e1263b0 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -747,7 +747,7 @@ (define_insn "bic<mode>3_neon"
>     [(set_attr "type" "neon_logic<q>")]
>   )
>   
> -(define_insn "xor<mode>3"
> +(define_insn "xor<mode>3_neon"
>     [(set (match_operand:VDQ 0 "s_register_operand" "=w")
>   	(xor:VDQ (match_operand:VDQ 1 "s_register_operand" "w")
>   		 (match_operand:VDQ 2 "s_register_operand" "w")))]
> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> index f111ad8..78313ea 100644
> --- a/gcc/config/arm/unspecs.md
> +++ b/gcc/config/arm/unspecs.md
> @@ -608,7 +608,6 @@ (define_c_enum "unspec" [
>     VCMPEQQ_S
>     VCMPEQQ_N_S
>     VCMPNEQ_N_S
> -  VEORQ_S
>     VHADDQ_S
>     VHADDQ_N_S
>     VHSUBQ_S
> @@ -653,7 +652,6 @@ (define_c_enum "unspec" [
>     VCMPEQQ_U
>     VCMPEQQ_N_U
>     VCMPNEQ_N_U
> -  VEORQ_U
>     VHADDQ_U
>     VHADDQ_N_U
>     VHSUBQ_U
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index 413fb07..687134a 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -202,3 +202,18 @@ (define_expand "ior<mode>3"
>   		      (match_operand:VNINOTM1 2 "neon_logic_op2" "")))]
>     "TARGET_NEON"
>   )
> +
> +(define_expand "xor<mode>3"
> +  [(set (match_operand:VNIM1 0 "s_register_operand" "")
> +	(xor:VNIM1 (match_operand:VNIM1 1 "s_register_operand" "")
> +		   (match_operand:VNIM1 2 "s_register_operand" "")))]
> +  "TARGET_NEON
> +   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))"
> +)
> +
> +(define_expand "xor<mode>3"
> +  [(set (match_operand:VNINOTM1 0 "s_register_operand" "")
> +	(xor:VNINOTM1 (match_operand:VNINOTM1 1 "s_register_operand" "")
> +		      (match_operand:VNINOTM1 2 "s_register_operand" "")))]
> +  "TARGET_NEON"
> +)
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-veor.c b/gcc/testsuite/gcc.target/arm/simd/mve-veor.c
> new file mode 100644
> index 0000000..5c534cc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-veor.c
> @@ -0,0 +1,34 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include <stdint.h>
> +
> +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)				\
> +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> +    int i;								\
> +    for (i=0; i<NB; i++) {						\
> +      dest[i] = a[i] OP b[i];						\
> +    }									\
> +}
> +
> +/* 64-bit vectors.  */
> +FUNC(s, int, 32, 2, ^, veor)
> +FUNC(u, uint, 32, 2, ^, veor)
> +FUNC(s, int, 16, 4, ^, veor)
> +FUNC(u, uint, 16, 4, ^, veor)
> +FUNC(s, int, 8, 8, ^, veor)
> +FUNC(u, uint, 8, 8, ^, veor)
> +
> +/* 128-bit vectors.  */
> +FUNC(s, int, 32, 4, ^, veor)
> +FUNC(u, uint, 32, 4, ^, veor)
> +FUNC(s, int, 16, 8, ^, veor)
> +FUNC(u, uint, 16, 8, ^, veor)
> +FUNC(s, int, 8, 16, ^, veor)
> +FUNC(u, uint, 8, 16, ^, veor)
> +
> +/* MVE has only 128-bit vectors, so we can vectorize only half of the
> +   functions above.  */
> +/* { dg-final { scan-assembler-times {veor\tq[0-9]+, q[0-9]+, q[0-9]+} 6 } } */


* Re: [PATCH 1/7] arm: Auto-vectorization for MVE: vand
  2020-11-25 17:17 ` [PATCH 1/7] arm: Auto-vectorization for MVE: vand Andre Simoes Dias Vieira
@ 2020-11-26 15:31   ` Christophe Lyon
  2020-11-27 14:13     ` Andre Vieira (lists)
  0 siblings, 1 reply; 12+ messages in thread
From: Christophe Lyon @ 2020-11-26 15:31 UTC (permalink / raw)
  To: Andre Simoes Dias Vieira; +Cc: gcc Patches

[-- Attachment #1: Type: text/plain, Size: 9667 bytes --]

Hi Andre,

Thanks for the quick feedback.

On Wed, 25 Nov 2020 at 18:17, Andre Simoes Dias Vieira
<andre.simoesdiasvieira@arm.com> wrote:
>
> Hi Christophe,
>
> Thanks for these! See some inline comments.
>
> On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote:
> > This patch enables MVE vandq instructions for auto-vectorization.  MVE
> > vandq insns in mve.md are modified to use 'and' instead of unspec
> > expression to support and<mode>3.  The and<mode>3 expander is added to
> > vec-common.md
> >
> > 2020-11-12  Christophe Lyon  <christophe.lyon@linaro.org>
> >
> >       gcc/
> >       * config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
> >       (VANDQ): Remove.
> >       * config/arm/mve.md (mve_vandq_u<mode>): New entry for vand
> >       instruction using expression and.
> >       (mve_vandq_s<mode>): New expander.
> >       * config/arm/neon.md (and<mode>3): Renamed into and<mode>3_neon.
> >       * config/arm/unspecs.md (VANDQ_S, VANDQ_U): Remove.
> >       * config/arm/vec-common.md (and<mode>3): New expander.
> >
> >       gcc/testsuite/
> >       * gcc.target/arm/simd/mve-vand.c: New test.
> > ---
> >   gcc/config/arm/iterators.md                  |  4 +---
> >   gcc/config/arm/mve.md                        | 20 ++++++++++++----
> >   gcc/config/arm/neon.md                       |  2 +-
> >   gcc/config/arm/unspecs.md                    |  2 --
> >   gcc/config/arm/vec-common.md                 | 15 ++++++++++++
> >   gcc/testsuite/gcc.target/arm/simd/mve-vand.c | 34 ++++++++++++++++++++++++++++
> >   6 files changed, 66 insertions(+), 11 deletions(-)
> >   create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vand.c
> >
> > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > index 592af35..72039e4 100644
> > --- a/gcc/config/arm/iterators.md
> > +++ b/gcc/config/arm/iterators.md
> > @@ -1232,8 +1232,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
> >                      (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")
> >                      (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s")
> >                      (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u")
> > -                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VANDQ_S "s")
> > -                    (VANDQ_U "u") (VBICQ_S "s") (VBICQ_U "u")
> > +                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBICQ_S "s") (VBICQ_U "u")
> >                      (VBRSRQ_N_S "s") (VBRSRQ_N_U "u") (VCADDQ_ROT270_S "s")
> >                      (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
> >                      (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
> > @@ -1501,7 +1500,6 @@ (define_int_iterator VABDQ [VABDQ_S VABDQ_U])
> >   (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U])
> >   (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U])
> >   (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S])
> > -(define_int_iterator VANDQ [VANDQ_U VANDQ_S])
> >   (define_int_iterator VBICQ [VBICQ_S VBICQ_U])
> >   (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S])
> >   (define_int_iterator VCADDQ_ROT270 [VCADDQ_ROT270_S VCADDQ_ROT270_U])
> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > index ecbaaa9..975eb7d 100644
> > --- a/gcc/config/arm/mve.md
> > +++ b/gcc/config/arm/mve.md
> > @@ -894,17 +894,27 @@ (define_insn "mve_vaddvq_p_<supf><mode>"
> >   ;;
> >   ;; [vandq_u, vandq_s])
> >   ;;
> > -(define_insn "mve_vandq_<supf><mode>"
> > +;; signed and unsigned versions are the same: define the unsigned
> > +;; insn, and use an expander for the signed one as we still reference
> > +;; both names from arm_mve.h.
> > +(define_insn "mve_vandq_u<mode>"
> >     [
> >      (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> > -     (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> > -                    (match_operand:MVE_2 2 "s_register_operand" "w")]
> > -      VANDQ))
> > +     (and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> > +                    (match_operand:MVE_2 2 "s_register_operand" "w")))
> > The predicate on the second operand is more restrictive than the one in
> > the expander, 'neon_inv_logic_op2'. This still works with immediates (at
> > least for the integer case I checked), where it generates a loop such as:
>
Right, thanks for catching it.

>          vldrw.32        q3, [r0]
>          vldr.64 d4, .L8
>          vldr.64 d5, .L8+8
>          vand    q3, q3, q2
>          vstrw.32        q3, [r2]
>
> MVE does support vand with immediates, just like Neon. I suspect you
> could just copy the way Neon handles these; it is possibly worth the
> little extra effort. You can use dest[i] = a[i] & ~1 as a testcase.
> If you don't, it might still be worth expanding the test to make sure
> other immediate/type combinations don't trigger an ICE?
>
> I'm not sure I understand why it loads the constant in two 64-bit chunks
> rather than doing a single load, or using something like a vmov or vbic
> immediate. Anyhow, that's a worry for another day, I guess.

Do you mean something like the attached (on top of this patch)?
I dislike the code duplication in mve_vandq_u<mode> which would
become a copy of and<mode>3_neon.

The other concern is that it's not exercised by testcase: as you noted
the compiler uses a pair of loads to prepare the second operand.

But indeed that's probably a separate problem.
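
For reference, a minimal sketch of the `dest[i] = a[i] & ~1` case suggested above, written out without the FUNC_IMM macro. The function names and the check_vandimm helper are made up for illustration and are not part of the patch. The second function spells the same operation as a bit-clear of the inverted constant, which is how I read the intent of the "inverse" logic-immediate predicate; whether the compiler actually picks an immediate form for either loop is exactly the open question here.

```c
#include <assert.h>
#include <stdint.h>

#define NB 4  /* four 32-bit lanes, i.e. one 128-bit vector */

/* AND with the constant ~1, as suggested in the review.  */
void
test_vandimm_u32x4 (uint32_t *__restrict__ dest, const uint32_t *a)
{
  for (int i = 0; i < NB; i++)
    dest[i] = a[i] & ~(uint32_t) 1;
}

/* The same operation written as "clear the bits set in INV",
   where INV is the bitwise inverse of the AND mask.  */
void
test_vbicimm_u32x4 (uint32_t *__restrict__ dest, const uint32_t *a)
{
  const uint32_t mask = ~(uint32_t) 1;
  const uint32_t inv = ~mask;  /* == 1 */
  for (int i = 0; i < NB; i++)
    dest[i] = a[i] & ~inv;
}

/* Hypothetical helper: checks at run time that both forms agree
   and that bit 0 is cleared in every lane.  Returns 1 on success.  */
int
check_vandimm (void)
{
  const uint32_t a[NB] = { 0, 1, 0xffffffffu, 0x12345677u };
  uint32_t d_and[NB], d_bic[NB];

  test_vandimm_u32x4 (d_and, a);
  test_vbicimm_u32x4 (d_bic, a);
  for (int i = 0; i < NB; i++)
    {
      assert (d_and[i] == d_bic[i]);
      assert ((d_and[i] & 1) == 0);
    }
  return 1;
}
```

This only checks the arithmetic; it says nothing about which instruction is emitted, so a scan-assembler pattern would still be needed in the real test.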

I guess your comments apply to patch 2 (vorr)?

Thanks,

Christophe


> >     ]
> >     "TARGET_HAVE_MVE"
> > -  "vand %q0, %q1, %q2"
> > +  "vand\t%q0, %q1, %q2"
> >     [(set_attr "type" "mve_move")
> >   ])
> > +(define_expand "mve_vandq_s<mode>"
> > +  [
> > +   (set (match_operand:MVE_2 0 "s_register_operand")
> > +     (and:MVE_2 (match_operand:MVE_2 1 "s_register_operand")
> > +                    (match_operand:MVE_2 2 "s_register_operand")))
> > +  ]
> > +  "TARGET_HAVE_MVE"
> > +)
> >
> >   ;;
> >   ;; [vbicq_s, vbicq_u])
> > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> > index 2d76769..dc4707d 100644
> > --- a/gcc/config/arm/neon.md
> > +++ b/gcc/config/arm/neon.md
> > @@ -712,7 +712,7 @@ (define_insn "ior<mode>3"
> >   ;; corresponds to the canonical form the middle-end expects to use for
> >   ;; immediate bitwise-ANDs.
> >
> > -(define_insn "and<mode>3"
> > +(define_insn "and<mode>3_neon"
> >     [(set (match_operand:VDQ 0 "s_register_operand" "=w,w")
> >       (and:VDQ (match_operand:VDQ 1 "s_register_operand" "w,0")
> >                (match_operand:VDQ 2 "neon_inv_logic_op2" "w,DL")))]
> > diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
> > index a3844e9..e8bf68e 100644
> > --- a/gcc/config/arm/unspecs.md
> > +++ b/gcc/config/arm/unspecs.md
> > @@ -601,7 +601,6 @@ (define_c_enum "unspec" [
> >     VADDQ_N_S
> >     VADDVAQ_S
> >     VADDVQ_P_S
> > -  VANDQ_S
> >     VBICQ_S
> >     VBRSRQ_N_S
> >     VCADDQ_ROT270_S
> > @@ -648,7 +647,6 @@ (define_c_enum "unspec" [
> >     VADDQ_N_U
> >     VADDVAQ_U
> >     VADDVQ_P_U
> > -  VANDQ_U
> >     VBICQ_U
> >     VBRSRQ_N_U
> >     VCADDQ_ROT270_U
> > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> > index 250e503..3dd694c 100644
> > --- a/gcc/config/arm/vec-common.md
> > +++ b/gcc/config/arm/vec-common.md
> > @@ -172,3 +172,18 @@ (define_expand "vec_set<mode>"
> >                                              GEN_INT (elem), operands[0]));
> >     DONE;
> >   })
> > +
> > +(define_expand "and<mode>3"
> > +  [(set (match_operand:VNIM1 0 "s_register_operand" "")
> > +     (and:VNIM1 (match_operand:VNIM1 1 "s_register_operand" "")
> > +                (match_operand:VNIM1 2 "neon_inv_logic_op2" "")))]
> > +  "TARGET_NEON
> > +   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))"
> > +)
> > +
> > +(define_expand "and<mode>3"
> > +  [(set (match_operand:VNINOTM1 0 "s_register_operand" "")
> > +     (and:VNINOTM1 (match_operand:VNINOTM1 1 "s_register_operand" "")
> > +                   (match_operand:VNINOTM1 2 "neon_inv_logic_op2" "")))]
> > +  "TARGET_NEON"
> > +)
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vand.c b/gcc/testsuite/gcc.target/arm/simd/mve-vand.c
> > new file mode 100644
> > index 0000000..2e30cd0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vand.c
> > @@ -0,0 +1,34 @@
> > +/* { dg-do assemble } */
> > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> > +/* { dg-add-options arm_v8_1m_mve } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +#include <stdint.h>
> > +
> > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)                         \
> > +  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> > +    int i;                                                           \
> > +    for (i=0; i<NB; i++) {                                           \
> > +      dest[i] = a[i] OP b[i];                                                \
> > +    }                                                                        \
> > +}
> > +
> > +/* 64-bit vectors.  */
> > +FUNC(s, int, 32, 2, &, vand)
> > +FUNC(u, uint, 32, 2, &, vand)
> > +FUNC(s, int, 16, 4, &, vand)
> > +FUNC(u, uint, 16, 4, &, vand)
> > +FUNC(s, int, 8, 8, &, vand)
> > +FUNC(u, uint, 8, 8, &, vand)
> > +
> > +/* 128-bit vectors.  */
> > +FUNC(s, int, 32, 4, &, vand)
> > +FUNC(u, uint, 32, 4, &, vand)
> > +FUNC(s, int, 16, 8, &, vand)
> > +FUNC(u, uint, 16, 8, &, vand)
> > +FUNC(s, int, 8, 16, &, vand)
> > +FUNC(u, uint, 8, 16, &, vand)
> > +
> > +/* MVE has only 128-bit vectors, so we can vectorize only half of the
> > +   functions above.  */
> > +/* { dg-final { scan-assembler-times {vand\tq[0-9]+, q[0-9]+, q[0-9]+} 6 } } */

[-- Attachment #2: mve-vand-imm.patch.txt --]
[-- Type: text/plain, Size: 3286 bytes --]

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 15b37f0..d6fe3de 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -939,19 +939,27 @@ (define_insn "mve_vaddvq_p_<supf><mode>"
 ;; both names from arm_mve.h.
 (define_insn "mve_vandq_u<mode>"
   [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-	(and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
-		       (match_operand:MVE_2 2 "s_register_operand" "w")))
+   (set (match_operand:MVE_2 0 "s_register_operand" "=w,w")
+	(and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w,0")
+		   (match_operand:MVE_2 2 "neon_inv_logic_op2" "w,DL")))
   ]
   "TARGET_HAVE_MVE"
-  "vand\t%q0, %q1, %q2"
+  {
+  switch (which_alternative)
+    {
+    case 0: return "vand\t%q0, %q1, %q2";
+    case 1: return neon_output_logic_immediate ("vand", &operands[2],
+		     <MODE>mode, 1, VALID_NEON_QREG_MODE (<MODE>mode));
+    default: gcc_unreachable ();
+  }
+  }
   [(set_attr "type" "mve_move")
 ])
 (define_expand "mve_vandq_s<mode>"
   [
    (set (match_operand:MVE_2 0 "s_register_operand")
 	(and:MVE_2 (match_operand:MVE_2 1 "s_register_operand")
-		       (match_operand:MVE_2 2 "s_register_operand")))
+		   (match_operand:MVE_2 2 "neon_inv_logic_op2")))
   ]
   "TARGET_HAVE_MVE"
 )
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 2144520..5f58f7c 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -107,7 +107,7 @@ (define_predicate "vpr_register_operand"
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
-  return (TARGET_NEON
+  return ((TARGET_NEON || TARGET_HAVE_MVE)
           && neon_immediate_valid_for_logic (op, mode, 1, NULL, NULL));
 })
 
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vand.c b/gcc/testsuite/gcc.target/arm/simd/mve-vand.c
index 2e30cd0..341e9aa0 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vand.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vand.c
@@ -13,6 +13,14 @@
     }									\
 }
 
+#define FUNC_IMM(SIGN, TYPE, BITS, NB, OP, NAME)				\
+  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
+    int i;								\
+    for (i=0; i<NB; i++) {						\
+      dest[i] = a[i] OP 1;						\
+    }									\
+}
+
 /* 64-bit vectors.  */
 FUNC(s, int, 32, 2, &, vand)
 FUNC(u, uint, 32, 2, &, vand)
@@ -29,6 +37,22 @@ FUNC(u, uint, 16, 8, &, vand)
 FUNC(s, int, 8, 16, &, vand)
 FUNC(u, uint, 8, 16, &, vand)
 
+/* 64-bit vectors.  */
+FUNC_IMM(s, int, 32, 2, &, vandimm)
+FUNC_IMM(u, uint, 32, 2, &, vandimm)
+FUNC_IMM(s, int, 16, 4, &, vandimm)
+FUNC_IMM(u, uint, 16, 4, &, vandimm)
+FUNC_IMM(s, int, 8, 8, &, vandimm)
+FUNC_IMM(u, uint, 8, 8, &, vandimm)
+
+/* 128-bit vectors.  */
+FUNC_IMM(s, int, 32, 4, &, vandimm)
+FUNC_IMM(u, uint, 32, 4, &, vandimm)
+FUNC_IMM(s, int, 16, 8, &, vandimm)
+FUNC_IMM(u, uint, 16, 8, &, vandimm)
+FUNC_IMM(s, int, 8, 16, &, vandimm)
+FUNC_IMM(u, uint, 8, 16, &, vandimm)
+
 /* MVE has only 128-bit vectors, so we can vectorize only half of the
    functions above.  */
-/* { dg-final { scan-assembler-times {vand\tq[0-9]+, q[0-9]+, q[0-9]+} 6 } } */
+/* { dg-final { scan-assembler-times {vand\tq[0-9]+, q[0-9]+, q[0-9]+} 12 } } */


* Re: [PATCH 1/7] arm: Auto-vectorization for MVE: vand
  2020-11-26 15:31   ` Christophe Lyon
@ 2020-11-27 14:13     ` Andre Vieira (lists)
  2020-11-27 15:29       ` Christophe Lyon
  0 siblings, 1 reply; 12+ messages in thread
From: Andre Vieira (lists) @ 2020-11-27 14:13 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc Patches

Hi Christophe,

On 26/11/2020 15:31, Christophe Lyon wrote:
> Hi Andre,
>
> Thanks for the quick feedback.
>
> On Wed, 25 Nov 2020 at 18:17, Andre Simoes Dias Vieira
> <andre.simoesdiasvieira@arm.com> wrote:
>> Hi Christophe,
>>
>> Thanks for these! See some inline comments.
>>
>> On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote:
>>> This patch enables MVE vandq instructions for auto-vectorization.  MVE
>>> vandq insns in mve.md are modified to use 'and' instead of unspec
>>> expression to support and<mode>3.  The and<mode>3 expander is added to
>>> vec-common.md
>>>
>>> 2020-11-12  Christophe Lyon  <christophe.lyon@linaro.org>
>>>
>>>        gcc/
>>>        * config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
>>>        (VANDQ): Remove.
>>>        * config/arm/mve.md (mve_vandq_u<mode>): New entry for vand
>>>        instruction using expression and.
>>>        (mve_vandq_s<mode>): New expander.
>>>        * config/arm/neon.md (and<mode>3): Renamed into and<mode>3_neon.
>>>        * config/arm/unspecs.md (VANDQ_S, VANDQ_U): Remove.
>>>        * config/arm/vec-common.md (and<mode>3): New expander.
>>>
>>>        gcc/testsuite/
>>>        * gcc.target/arm/simd/mve-vand.c: New test.
>>> ---
>>>    gcc/config/arm/iterators.md                  |  4 +---
>>>    gcc/config/arm/mve.md                        | 20 ++++++++++++----
>>>    gcc/config/arm/neon.md                       |  2 +-
>>>    gcc/config/arm/unspecs.md                    |  2 --
>>>    gcc/config/arm/vec-common.md                 | 15 ++++++++++++
>>>    gcc/testsuite/gcc.target/arm/simd/mve-vand.c | 34 ++++++++++++++++++++++++++++
>>>    6 files changed, 66 insertions(+), 11 deletions(-)
>>>    create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vand.c
>>>
>>> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
>>> index 592af35..72039e4 100644
>>> --- a/gcc/config/arm/iterators.md
>>> +++ b/gcc/config/arm/iterators.md
>>> @@ -1232,8 +1232,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
>>>                       (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")
>>>                       (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s")
>>>                       (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u")
>>> -                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VANDQ_S "s")
>>> -                    (VANDQ_U "u") (VBICQ_S "s") (VBICQ_U "u")
>>> +                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBICQ_S "s") (VBICQ_U "u")
>>>                       (VBRSRQ_N_S "s") (VBRSRQ_N_U "u") (VCADDQ_ROT270_S "s")
>>>                       (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
>>>                       (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
>>> @@ -1501,7 +1500,6 @@ (define_int_iterator VABDQ [VABDQ_S VABDQ_U])
>>>    (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U])
>>>    (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U])
>>>    (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S])
>>> -(define_int_iterator VANDQ [VANDQ_U VANDQ_S])
>>>    (define_int_iterator VBICQ [VBICQ_S VBICQ_U])
>>>    (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S])
>>>    (define_int_iterator VCADDQ_ROT270 [VCADDQ_ROT270_S VCADDQ_ROT270_U])
>>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>>> index ecbaaa9..975eb7d 100644
>>> --- a/gcc/config/arm/mve.md
>>> +++ b/gcc/config/arm/mve.md
>>> @@ -894,17 +894,27 @@ (define_insn "mve_vaddvq_p_<supf><mode>"
>>>    ;;
>>>    ;; [vandq_u, vandq_s])
>>>    ;;
>>> -(define_insn "mve_vandq_<supf><mode>"
>>> +;; signed and unsigned versions are the same: define the unsigned
>>> +;; insn, and use an expander for the signed one as we still reference
>>> +;; both names from arm_mve.h.
>>> +(define_insn "mve_vandq_u<mode>"
>>>      [
>>>       (set (match_operand:MVE_2 0 "s_register_operand" "=w")
>>> -     (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
>>> -                    (match_operand:MVE_2 2 "s_register_operand" "w")]
>>> -      VANDQ))
>>> +     (and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
>>> +                    (match_operand:MVE_2 2 "s_register_operand" "w")))
>> The predicate on the second operand is more restrictive than the one in
>> expand 'neon_inv_logic_op2'. This should still work with immediates, or
>> well I checked for integers, it generates a loop as such:
>>
> Right, thanks for catching it.
>
>>           vldrw.32        q3, [r0]
>>           vldr.64 d4, .L8
>>           vldr.64 d5, .L8+8
>>           vand    q3, q3, q2
>>           vstrw.32        q3, [r2]
>>
>> MVE does support vand's with immediates, just like NEON, I suspect you
>> could just copy the way Neon handles these, possibly worth the little
>> extra effort. You can use dest[i] = a[i] & ~1 as a testcase.
>> If you don't it might still be worth expanding the test to make sure
>> other immediates-types combinations don't trigger an ICE?
>>
>> I'm not sure I understand why it loads it in two 64-bit chunks and not
>> do a single load or not just do something like a vmov or vbic immediate.
>> Anyhow that's a worry for another day I guess..
> Do you mean something like the attached (on top of this patch)?
> I dislike the code duplication in mve_vandq_u<mode> which would
> become a copy of and<mode>3_neon.
Hi Christophe,

Yeah, that's what I meant. I agree about the code duplication. The reason
we still use separate ones is the difference in supported modes. Maybe
the right way around that would be to redefine VDQ as:

(define_mode_iterator VDQ [(V8QI "TARGET_HAVE_NEON") V16QI
                           (V4HI "TARGET_HAVE_NEON") V8HI
                           (V2SI "TARGET_HAVE_NEON") V4SI
                           (V4HF "TARGET_HAVE_NEON") V8HF
                           (V2SF "TARGET_HAVE_NEON") V4SF V2DI])

And have a single define_expand and insn for both vector extensions.
We would also need to do something about the type attribute in case we
want different scheduling types for the two. Right now we don't do any
scheduling for MVE, and I don't know whether these can be
conditionalized on target features; something to look at.
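
To make the effect of such an iterator concrete, here is a rough C model of it. This is not GCC code: the struct, the table and the function names are made up for illustration. It only mirrors the idea above, that the 64-bit modes carry a NEON-only condition while the 128-bit modes are unconditional, so a single pattern would cover both extensions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one iterator entry: a mode name plus a flag
   saying whether the entry carries the NEON-only condition.  */
struct mode_entry
{
  const char *name;
  bool neon_only;
};

/* Same order as the proposed VDQ definition.  */
static const struct mode_entry vdq_modes[] = {
  { "V8QI", true }, { "V16QI", false },
  { "V4HI", true }, { "V8HI", false },
  { "V2SI", true }, { "V4SI", false },
  { "V4HF", true }, { "V8HF", false },
  { "V2SF", true }, { "V4SF", false },
  { "V2DI", false },
};

/* An entry is usable if it is unconditional or NEON is available.  */
bool
mode_enabled (const struct mode_entry *m, bool have_neon)
{
  return !m->neon_only || have_neon;
}

/* Count how many modes the shared pattern would be instantiated for.  */
size_t
count_enabled_modes (bool have_neon)
{
  size_t n = 0;
  for (size_t i = 0; i < sizeof vdq_modes / sizeof vdq_modes[0]; i++)
    if (mode_enabled (&vdq_modes[i], have_neon))
      n++;
  return n;
}
```

In this model a NEON target sees all 11 modes and an MVE-only target sees the six 128-bit ones. The pattern's own condition would still have to require one of the two extensions.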

>
> The other concern is that it's not exercised by testcase: as you noted
> the compiler uses a pair of loads to prepare the second operand.
>
> But indeed that's probably a separate problem.
>
> I guess your comments apply to patch 2 (vorr)?

Yeah, forgot to reply to that one, but yes! I still need to have a look 
at 4-7.

Kind regards,
Andre



* Re: [PATCH 1/7] arm: Auto-vectorization for MVE: vand
  2020-11-27 14:13     ` Andre Vieira (lists)
@ 2020-11-27 15:29       ` Christophe Lyon
  2020-11-30 10:39         ` Christophe Lyon
  0 siblings, 1 reply; 12+ messages in thread
From: Christophe Lyon @ 2020-11-27 15:29 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc Patches

On Fri, 27 Nov 2020 at 15:13, Andre Vieira (lists)
<andre.simoesdiasvieira@arm.com> wrote:
>
> Hi Christophe,
>
> On 26/11/2020 15:31, Christophe Lyon wrote:
> > Hi Andre,
> >
> > Thanks for the quick feedback.
> >
> > On Wed, 25 Nov 2020 at 18:17, Andre Simoes Dias Vieira
> > <andre.simoesdiasvieira@arm.com> wrote:
> >> Hi Christophe,
> >>
> >> Thanks for these! See some inline comments.
> >>
> >> On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote:
> >>> This patch enables MVE vandq instructions for auto-vectorization.  MVE
> >>> vandq insns in mve.md are modified to use 'and' instead of unspec
> >>> expression to support and<mode>3.  The and<mode>3 expander is added to
> >>> vec-common.md
> >>>
> >>> 2020-11-12  Christophe Lyon  <christophe.lyon@linaro.org>
> >>>
> >>>        gcc/
> >>>        * config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
> >>>        (VANDQ): Remove.
> >>>        * config/arm/mve.md (mve_vandq_u<mode>): New entry for vand
> >>>        instruction using expression and.
> >>>        (mve_vandq_s<mode>): New expander.
> >>>        * config/arm/neon.md (and<mode>3): Renamed into and<mode>3_neon.
> >>>        * config/arm/unspecs.md (VANDQ_S, VANDQ_U): Remove.
> >>>        * config/arm/vec-common.md (and<mode>3): New expander.
> >>>
> >>>        gcc/testsuite/
> >>>        * gcc.target/arm/simd/mve-vand.c: New test.
> >>> ---
> >>>    gcc/config/arm/iterators.md                  |  4 +---
> >>>    gcc/config/arm/mve.md                        | 20 ++++++++++++----
> >>>    gcc/config/arm/neon.md                       |  2 +-
> >>>    gcc/config/arm/unspecs.md                    |  2 --
> >>>    gcc/config/arm/vec-common.md                 | 15 ++++++++++++
> >>>    gcc/testsuite/gcc.target/arm/simd/mve-vand.c | 34 ++++++++++++++++++++++++++++
> >>>    6 files changed, 66 insertions(+), 11 deletions(-)
> >>>    create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vand.c
> >>>
> >>> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> >>> index 592af35..72039e4 100644
> >>> --- a/gcc/config/arm/iterators.md
> >>> +++ b/gcc/config/arm/iterators.md
> >>> @@ -1232,8 +1232,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
> >>>                       (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")
> >>>                       (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s")
> >>>                       (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u")
> >>> -                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VANDQ_S "s")
> >>> -                    (VANDQ_U "u") (VBICQ_S "s") (VBICQ_U "u")
> >>> +                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBICQ_S "s") (VBICQ_U "u")
> >>>                       (VBRSRQ_N_S "s") (VBRSRQ_N_U "u") (VCADDQ_ROT270_S "s")
> >>>                       (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
> >>>                       (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
> >>> @@ -1501,7 +1500,6 @@ (define_int_iterator VABDQ [VABDQ_S VABDQ_U])
> >>>    (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U])
> >>>    (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U])
> >>>    (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S])
> >>> -(define_int_iterator VANDQ [VANDQ_U VANDQ_S])
> >>>    (define_int_iterator VBICQ [VBICQ_S VBICQ_U])
> >>>    (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S])
> >>>    (define_int_iterator VCADDQ_ROT270 [VCADDQ_ROT270_S VCADDQ_ROT270_U])
> >>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> >>> index ecbaaa9..975eb7d 100644
> >>> --- a/gcc/config/arm/mve.md
> >>> +++ b/gcc/config/arm/mve.md
> >>> @@ -894,17 +894,27 @@ (define_insn "mve_vaddvq_p_<supf><mode>"
> >>>    ;;
> >>>    ;; [vandq_u, vandq_s])
> >>>    ;;
> >>> -(define_insn "mve_vandq_<supf><mode>"
> >>> +;; signed and unsigned versions are the same: define the unsigned
> >>> +;; insn, and use an expander for the signed one as we still reference
> >>> +;; both names from arm_mve.h.
> >>> +(define_insn "mve_vandq_u<mode>"
> >>>      [
> >>>       (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> >>> -     (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> >>> -                    (match_operand:MVE_2 2 "s_register_operand" "w")]
> >>> -      VANDQ))
> >>> +     (and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> >>> +                    (match_operand:MVE_2 2 "s_register_operand" "w")))
> >> The predicate on the second operand is more restrictive than the one in
> >> expand 'neon_inv_logic_op2'. This should still work with immediates, or
> >> well I checked for integers, it generates a loop as such:
> >>
> > Right, thanks for catching it.
> >
> >>           vldrw.32        q3, [r0]
> >>           vldr.64 d4, .L8
> >>           vldr.64 d5, .L8+8
> >>           vand    q3, q3, q2
> >>           vstrw.32        q3, [r2]
> >>
> >> MVE does support vand's with immediates, just like NEON, I suspect you
> >> could just copy the way Neon handles these, possibly worth the little
> >> extra effort. You can use dest[i] = a[i] & ~1 as a testcase.
> >> If you don't it might still be worth expanding the test to make sure
> >> other immediates-types combinations don't trigger an ICE?
> >>
> >> I'm not sure I understand why it loads it in two 64-bit chunks and not
> >> do a single load or not just do something like a vmov or vbic immediate.
> >> Anyhow that's a worry for another day I guess..
> > Do you mean something like the attached (on top of this patch)?
> > I dislike the code duplication in mve_vandq_u<mode> which would
> > become a copy of and<mode>3_neon.
> Hi Christophe,
>
> Yeah, that's what I meant. I agree about the code duplication. The reason
> we still use separate ones is the difference in supported modes. Maybe
> the right way around that would be to redefine VDQ as:
>
> (define_mode_iterator VDQ [(V8QI "TARGET_HAVE_NEON") V16QI
>                            (V4HI "TARGET_HAVE_NEON") V8HI
>                            (V2SI "TARGET_HAVE_NEON") V4SI
>                            (V4HF "TARGET_HAVE_NEON") V8HF
>                            (V2SF "TARGET_HAVE_NEON") V4SF V2DI])
>
> And have a single define_expand and insn for both vector extensions.

Indeed, I can try that.
I have also noticed the VNIM1 / VNINOTM1 pair.

> We would also need to do something about the type attribute in case we
> want different scheduling types for the two. Right now we don't do any
> scheduling for MVE, and I don't know whether these can be
> conditionalized on target features; something to look at.
>
> >
> > The other concern is that it's not exercised by testcase: as you noted
> > the compiler uses a pair of loads to prepare the second operand.
> >
> > But indeed that's probably a separate problem.
> >
> > I guess your comments apply to patch 2 (vorr)?
>
> Yeah, forgot to reply to that one, but yes! I still need to have a look
> at 4-7.

Ok thanks, I have a WIP fix for #7 (vmvn) anyway.

> Kind regards,
> Andre
>


* Re: [PATCH 1/7] arm: Auto-vectorization for MVE: vand
  2020-11-27 15:29       ` Christophe Lyon
@ 2020-11-30 10:39         ` Christophe Lyon
  0 siblings, 0 replies; 12+ messages in thread
From: Christophe Lyon @ 2020-11-30 10:39 UTC (permalink / raw)
  To: Andre Vieira (lists); +Cc: gcc Patches

On Fri, 27 Nov 2020 at 16:29, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
>
> On Fri, 27 Nov 2020 at 15:13, Andre Vieira (lists)
> <andre.simoesdiasvieira@arm.com> wrote:
> >
> > Hi Christophe,
> >
> > On 26/11/2020 15:31, Christophe Lyon wrote:
> > > Hi Andre,
> > >
> > > Thanks for the quick feedback.
> > >
> > > On Wed, 25 Nov 2020 at 18:17, Andre Simoes Dias Vieira
> > > <andre.simoesdiasvieira@arm.com> wrote:
> > >> Hi Christophe,
> > >>
> > >> Thanks for these! See some inline comments.
> > >>
> > >> On 25/11/2020 13:54, Christophe Lyon via Gcc-patches wrote:
> > >>> This patch enables MVE vandq instructions for auto-vectorization.  MVE
> > >>> vandq insns in mve.md are modified to use 'and' instead of unspec
> > >>> expression to support and<mode>3.  The and<mode>3 expander is added to
> > >>> vec-common.md
> > >>>
> > >>> 2020-11-12  Christophe Lyon  <christophe.lyon@linaro.org>
> > >>>
> > >>>        gcc/
> > >>>        * config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
> > >>>        (VANDQ): Remove.
> > >>>        * config/arm/mve.md (mve_vandq_u<mode>): New entry for vand
> > >>>        instruction using expression and.
> > >>>        (mve_vandq_s<mode>): New expander.
> > >>>        * config/arm/neon.md (and<mode>3): Renamed into and<mode>3_neon.
> > >>>        * config/arm/unspecs.md (VANDQ_S, VANDQ_U): Remove.
> > >>>        * config/arm/vec-common.md (and<mode>3): New expander.
> > >>>
> > >>>        gcc/testsuite/
> > >>>        * gcc.target/arm/simd/mve-vand.c: New test.
> > >>> ---
> > >>>    gcc/config/arm/iterators.md                  |  4 +---
> > >>>    gcc/config/arm/mve.md                        | 20 ++++++++++++----
> > >>>    gcc/config/arm/neon.md                       |  2 +-
> > >>>    gcc/config/arm/unspecs.md                    |  2 --
> > >>>    gcc/config/arm/vec-common.md                 | 15 ++++++++++++
> > >>>    gcc/testsuite/gcc.target/arm/simd/mve-vand.c | 34 ++++++++++++++++++++++++++++
> > >>>    6 files changed, 66 insertions(+), 11 deletions(-)
> > >>>    create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vand.c
> > >>>
> > >>> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > >>> index 592af35..72039e4 100644
> > >>> --- a/gcc/config/arm/iterators.md
> > >>> +++ b/gcc/config/arm/iterators.md
> > >>> @@ -1232,8 +1232,7 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
> > >>>                       (VADDLVQ_P_U "u") (VCMPNEQ_U "u") (VCMPNEQ_S "s")
> > >>>                       (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s")
> > >>>                       (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u")
> > >>> -                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VANDQ_S "s")
> > >>> -                    (VANDQ_U "u") (VBICQ_S "s") (VBICQ_U "u")
> > >>> +                    (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBICQ_S "s") (VBICQ_U "u")
> > >>>                       (VBRSRQ_N_S "s") (VBRSRQ_N_U "u") (VCADDQ_ROT270_S "s")
> > >>>                       (VCADDQ_ROT270_U "u") (VCADDQ_ROT90_S "s")
> > >>>                       (VCMPEQQ_S "s") (VCMPEQQ_U "u") (VCADDQ_ROT90_U "u")
> > >>> @@ -1501,7 +1500,6 @@ (define_int_iterator VABDQ [VABDQ_S VABDQ_U])
> > >>>    (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U])
> > >>>    (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U])
> > >>>    (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S])
> > >>> -(define_int_iterator VANDQ [VANDQ_U VANDQ_S])
> > >>>    (define_int_iterator VBICQ [VBICQ_S VBICQ_U])
> > >>>    (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S])
> > >>>    (define_int_iterator VCADDQ_ROT270 [VCADDQ_ROT270_S VCADDQ_ROT270_U])
> > >>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> > >>> index ecbaaa9..975eb7d 100644
> > >>> --- a/gcc/config/arm/mve.md
> > >>> +++ b/gcc/config/arm/mve.md
> > >>> @@ -894,17 +894,27 @@ (define_insn "mve_vaddvq_p_<supf><mode>"
> > >>>    ;;
> > >>>    ;; [vandq_u, vandq_s])
> > >>>    ;;
> > >>> -(define_insn "mve_vandq_<supf><mode>"
> > >>> +;; signed and unsigned versions are the same: define the unsigned
> > >>> +;; insn, and use an expander for the signed one as we still reference
> > >>> +;; both names from arm_mve.h.
> > >>> +(define_insn "mve_vandq_u<mode>"
> > >>>      [
> > >>>       (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> > >>> -     (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> > >>> -                    (match_operand:MVE_2 2 "s_register_operand" "w")]
> > >>> -      VANDQ))
> > >>> +     (and:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")
> > >>> +                    (match_operand:MVE_2 2 "s_register_operand" "w")))
> > >> The predicate on the second operand is more restrictive than the one in
> > >> expand 'neon_inv_logic_op2'. This should still work with immediates, or
> > >> well I checked for integers, it generates a loop as such:
> > >>
> > > Right, thanks for catching it.
> > >
> > >>           vldrw.32        q3, [r0]
> > >>           vldr.64 d4, .L8
> > >>           vldr.64 d5, .L8+8
> > >>           vand    q3, q3, q2
> > >>           vstrw.32        q3, [r2]
> > >>
> > >> MVE does support vand's with immediates, just like NEON, I suspect you
> > >> could just copy the way Neon handles these, possibly worth the little
> > >> extra effort. You can use dest[i] = a[i] & ~1 as a testcase.
> > >> If you don't it might still be worth expanding the test to make sure
> > >> other immediates-types combinations don't trigger an ICE?
> > >>
> > >> I'm not sure I understand why it loads it in two 64-bit chunks and not
> > >> do a single load or not just do something like a vmov or vbic immediate.
> > >> Anyhow that's a worry for another day I guess..
> > > Do you mean something like the attached (on top of this patch)?
> > > I dislike the code duplication in mve_vandq_u<mode> which would
> > > become a copy of and<mode>3_neon.
> > Hi Christophe,
> >
> > Yeah, that's what I meant. I agree about the code duplication. The reason
> > we still use separate ones is the difference in supported modes. Maybe
> > the right way around that would be to redefine VDQ as:
> >
> > (define_mode_iterator VDQ [(V8QI "TARGET_HAVE_NEON") V16QI
> >                            (V4HI "TARGET_HAVE_NEON") V8HI
> >                            (V2SI "TARGET_HAVE_NEON") V4SI
> >                            (V4HF "TARGET_HAVE_NEON") V8HF
> >                            (V2SF "TARGET_HAVE_NEON") V4SF V2DI])
> >
> > And have a single define_expand and insn for both vector extensions.
>
> Indeed, I can try that.
> I have also noticed the VNIM1 / VNINOTM1 pair.
>
> > We would also need to do something about the type attribute in case we
> > want different scheduling types for the two. Right now we don't do any
> > scheduling for MVE, and I don't know whether these can be
> > conditionalized on target features; something to look at.
> >
> > >
> > > The other concern is that it's not exercised by testcase: as you noted
> > > the compiler uses a pair of loads to prepare the second operand.
> > >
> > > But indeed that's probably a separate problem.
> > >
> > > I guess your comments apply to patch 2 (vorr)?
> >
> > Yeah, forgot to reply to that one, but yes! I still need to have a look
> > at 4-7.
>
> Ok thanks, I have a WIP fix for #7 (vmvn) anyway.

And I never sent #7 because I knew it wasn't ready yet :-)

>
> > Kind regards,
> > Andre
> >


Thread overview: 12+ messages
2020-11-25 13:54 [PATCH 1/7] arm: Auto-vectorization for MVE: vand Christophe Lyon
2020-11-25 13:54 ` [PATCH 2/7] arm: Auto-vectorization for MVE: vorr Christophe Lyon
2020-11-25 13:54 ` [PATCH 3/7] arm: Auto-vectorization for MVE: veor Christophe Lyon
2020-11-26 10:46   ` Andre Vieira (lists)
2020-11-25 13:54 ` [PATCH 4/7] arm: Auto-vectorization for MVE: vshl Christophe Lyon
2020-11-25 13:54 ` [PATCH 5/7] arm: Auto-vectorization for MVE: vshr Christophe Lyon
2020-11-25 13:54 ` [PATCH 6/7] arm: Auto-vectorization for MVE: vmvn Christophe Lyon
2020-11-25 17:17 ` [PATCH 1/7] arm: Auto-vectorization for MVE: vand Andre Simoes Dias Vieira
2020-11-26 15:31   ` Christophe Lyon
2020-11-27 14:13     ` Andre Vieira (lists)
2020-11-27 15:29       ` Christophe Lyon
2020-11-30 10:39         ` Christophe Lyon
