[PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457
@ 2021-07-15 16:39 Tamar Christina
  2021-07-15 16:39 ` [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs Tamar Christina
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Tamar Christina @ 2021-07-15 16:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 2801 bytes --]

Hi All,

These testcases accidentally contain the wrong signs for the expected values
for the scalar code.  The vector code however is correct.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Committed as a trivial fix.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

	PR middle-end/101457
	* gcc.dg/vect/vect-reduc-dot-17.c: Fix signs of scalar code.
	* gcc.dg/vect/vect-reduc-dot-18.c: Likewise.
	* gcc.dg/vect/vect-reduc-dot-22.c: Likewise.
	* gcc.dg/vect/vect-reduc-dot-9.c: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
index aa269c4d657f65e07e36df7f3fd0098cf3aaf4d0..38f86fe458adcc7ebbbae22f5cc1e720928f2d48 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
     {
       a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
index 2b1cc0411c3256ccd876d8b4da18ce4881dc0af9..2e86ebe3c6c6a0da9ac242868592f30028ed2155 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
     {
       a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
index febeb19784c6aaca72dc0871af0d32cc91fa6ea2..0bde43a6cb855ce5edd9015ebf34ca226353d77e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
@@ -37,7 +37,7 @@ main (void)
 
   SIGNEDNESS_3 char a[N];
   SIGNEDNESS_4 short b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_1 long expected = 0x12345;
   for (int i = 0; i < N; ++i)
     {
       a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
index cbbeedec3bfd0810a8ce8036e6670585d9334924..d1049c96bf1febfc8933622e292b44cc8dd129cc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
     {
       a[i] = BASE + i * 5;


-- 

[-- Attachment #2: rb14658.patch --]
[-- Type: text/x-diff, Size: 2265 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
index aa269c4d657f65e07e36df7f3fd0098cf3aaf4d0..38f86fe458adcc7ebbbae22f5cc1e720928f2d48 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
     {
       a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
index 2b1cc0411c3256ccd876d8b4da18ce4881dc0af9..2e86ebe3c6c6a0da9ac242868592f30028ed2155 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
     {
       a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
index febeb19784c6aaca72dc0871af0d32cc91fa6ea2..0bde43a6cb855ce5edd9015ebf34ca226353d77e 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
@@ -37,7 +37,7 @@ main (void)
 
   SIGNEDNESS_3 char a[N];
   SIGNEDNESS_4 short b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_1 long expected = 0x12345;
   for (int i = 0; i < N; ++i)
     {
       a[i] = BASE + i * 5;
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
index cbbeedec3bfd0810a8ce8036e6670585d9334924..d1049c96bf1febfc8933622e292b44cc8dd129cc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
@@ -35,8 +35,9 @@ main (void)
 {
   check_vect ();
 
-  SIGNEDNESS_3 char a[N], b[N];
-  int expected = 0x12345;
+  SIGNEDNESS_3 char a[N];
+  SIGNEDNESS_4 char b[N];
+  SIGNEDNESS_1 int expected = 0x12345;
   for (int i = 0; i < N; ++i)
     {
       a[i] = BASE + i * 5;


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs
  2021-07-15 16:39 [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 Tamar Christina
@ 2021-07-15 16:39 ` Tamar Christina
  2021-07-15 19:34   ` Richard Sandiford
  2021-07-15 16:40 ` [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64 Tamar Christina
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Tamar Christina @ 2021-07-15 16:39 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 4050 bytes --]

Hi All,

There's a slight mismatch between the vectorizer optabs and the intrinsics
patterns for NEON.  The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

This means we need different patterns here.  This adds a separate usdot
vectorizer pattern which just shuffles around the RTL params.

There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
patterns which is not corrected here.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (usdot_prod<vsi2qi>): Rename to...
	(aarch64_usdot<vsi2qi>): ..This
	(usdot_prod<vsi2qi>): New.
	* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Use
	aarch64_usdot<vsi2qi>.
	* config/aarch64/aarch64-simd-builtins.def: Likewise.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 063f503ebd96657f017dfaa067cb231991376bda..ac5d4fc7ff1e61d404e66193b629986382ee4ffd 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -374,11 +374,10 @@
   BUILTIN_VSDQ_I_DI (BINOP, srshl, 0, NONE)
   BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0, NONE)
 
-  /* Implemented by <sur><dotprod>_prod<dot_mode>.  */
+  /* Implemented by aarch64_<sur><dotprod>{_lane}{q}<dot_mode>.  */
   BUILTIN_VB (TERNOP, sdot, 0, NONE)
   BUILTIN_VB (TERNOPU, udot, 0, NONE)
-  BUILTIN_VB (TERNOP_SSUS, usdot_prod, 10, NONE)
-  /* Implemented by aarch64_<sur><dotprod>_lane{q}<dot_mode>.  */
+  BUILTIN_VB (TERNOP_SSUS, usdot, 0, NONE)
   BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
   BUILTIN_VB (QUADOPU_LANE, udot_lane, 0, NONE)
   BUILTIN_VB (QUADOP_LANE, sdot_laneq, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 74890989cb3045798bf8d0241467eaaf72238297..7397f1ec5ca0cb9e3cdd5c46772f604e640666e4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -601,7 +601,7 @@ (define_insn "aarch64_<sur>dot<vsi2qi>"
 
 ;; These instructions map to the __builtins for the armv8.6a I8MM usdot
 ;; (vector) Dot Product operation.
-(define_insn "usdot_prod<vsi2qi>"
+(define_insn "aarch64_usdot<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=w")
 	(plus:VS
 	  (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
@@ -648,6 +648,17 @@ (define_expand "<sur>dot_prod<vsi2qi>"
   DONE;
 })
 
+;; Auto-vectorizer pattern for usdot.  The operand[3] and operand[0] are the
+;; RMW parameters that when it comes to the vectorizer.
+(define_expand "usdot_prod<vsi2qi>"
+  [(set (match_operand:VS 0 "register_operand")
+	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
+			    (match_operand:<VSI2QI> 2 "register_operand")]
+		 UNSPEC_USDOT)
+		 (match_operand:VS 3 "register_operand")))]
+  "TARGET_I8MM"
+)
+
 ;; These instructions map to the __builtins for the Dot Product
 ;; indexed operations.
 (define_insn "aarch64_<sur>dot_lane<vsi2qi>"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
 }
 
 __extension__ extern __inline int32x2_t


-- 

[-- Attachment #2: rb14659.patch --]
[-- Type: text/x-diff, Size: 3125 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 063f503ebd96657f017dfaa067cb231991376bda..ac5d4fc7ff1e61d404e66193b629986382ee4ffd 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -374,11 +374,10 @@
   BUILTIN_VSDQ_I_DI (BINOP, srshl, 0, NONE)
   BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0, NONE)
 
-  /* Implemented by <sur><dotprod>_prod<dot_mode>.  */
+  /* Implemented by aarch64_<sur><dotprod>{_lane}{q}<dot_mode>.  */
   BUILTIN_VB (TERNOP, sdot, 0, NONE)
   BUILTIN_VB (TERNOPU, udot, 0, NONE)
-  BUILTIN_VB (TERNOP_SSUS, usdot_prod, 10, NONE)
-  /* Implemented by aarch64_<sur><dotprod>_lane{q}<dot_mode>.  */
+  BUILTIN_VB (TERNOP_SSUS, usdot, 0, NONE)
   BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
   BUILTIN_VB (QUADOPU_LANE, udot_lane, 0, NONE)
   BUILTIN_VB (QUADOP_LANE, sdot_laneq, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 74890989cb3045798bf8d0241467eaaf72238297..7397f1ec5ca0cb9e3cdd5c46772f604e640666e4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -601,7 +601,7 @@ (define_insn "aarch64_<sur>dot<vsi2qi>"
 
 ;; These instructions map to the __builtins for the armv8.6a I8MM usdot
 ;; (vector) Dot Product operation.
-(define_insn "usdot_prod<vsi2qi>"
+(define_insn "aarch64_usdot<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=w")
 	(plus:VS
 	  (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
@@ -648,6 +648,17 @@ (define_expand "<sur>dot_prod<vsi2qi>"
   DONE;
 })
 
+;; Auto-vectorizer pattern for usdot.  The operand[3] and operand[0] are the
+;; RMW parameters that when it comes to the vectorizer.
+(define_expand "usdot_prod<vsi2qi>"
+  [(set (match_operand:VS 0 "register_operand")
+	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
+			    (match_operand:<VSI2QI> 2 "register_operand")]
+		 UNSPEC_USDOT)
+		 (match_operand:VS 3 "register_operand")))]
+  "TARGET_I8MM"
+)
+
 ;; These instructions map to the __builtins for the Dot Product
 ;; indexed operations.
 (define_insn "aarch64_<sur>dot_lane<vsi2qi>"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
 }
 
 __extension__ extern __inline int32x2_t


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.
  2021-07-15 16:39 [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 Tamar Christina
  2021-07-15 16:39 ` [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs Tamar Christina
@ 2021-07-15 16:40 ` Tamar Christina
  2021-07-15 19:44   ` Richard Sandiford
  2021-07-15 16:40 ` [PATCH 4/4][AArch32]: correct dot-product RTL patterns Tamar Christina
  2021-07-16  2:20 ` [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 H.J. Lu
  3 siblings, 1 reply; 16+ messages in thread
From: Tamar Christina @ 2021-07-15 16:40 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov,
	richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 1976 bytes --]

Hi All,

The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where intrinsics expects them.

The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.

However operand[3] is not expected to be written to so the current pattern is
bogus.

Instead we use the expand to shuffle around the RTL.

The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? and active branches after some stew?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Correct
	RTL.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7397f1ec5ca0cb9e3cdd5c46772f604e640666e4..51789f954affd9fa88e2bc1bcc3dacf64ccb5bde 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -635,18 +635,12 @@ (define_insn "aarch64_usdot<vsi2qi>"
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
 (define_expand "<sur>dot_prod<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand")
-	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
+	(plus:VS (match_operand:VS 3 "register_operand")
+		 (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
 			    (match_operand:<VSI2QI> 2 "register_operand")]
-		 DOTPROD)
-		(match_operand:VS 3 "register_operand")))]
+		 DOTPROD)))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-    gen_aarch64_<sur>dot<vsi2qi> (operands[3], operands[3], operands[1],
-				    operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+)
 
 ;; Auto-vectorizer pattern for usdot.  The operand[3] and operand[0] are the
 ;; RMW parameters that when it comes to the vectorizer.


-- 

[-- Attachment #2: rb14660.patch --]
[-- Type: text/x-diff, Size: 1149 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7397f1ec5ca0cb9e3cdd5c46772f604e640666e4..51789f954affd9fa88e2bc1bcc3dacf64ccb5bde 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -635,18 +635,12 @@ (define_insn "aarch64_usdot<vsi2qi>"
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
 (define_expand "<sur>dot_prod<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand")
-	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
+	(plus:VS (match_operand:VS 3 "register_operand")
+		 (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
 			    (match_operand:<VSI2QI> 2 "register_operand")]
-		 DOTPROD)
-		(match_operand:VS 3 "register_operand")))]
+		 DOTPROD)))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-    gen_aarch64_<sur>dot<vsi2qi> (operands[3], operands[3], operands[1],
-				    operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+)
 
 ;; Auto-vectorizer pattern for usdot.  The operand[3] and operand[0] are the
 ;; RMW parameters that when it comes to the vectorizer.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 4/4][AArch32]: correct dot-product RTL patterns.
  2021-07-15 16:39 [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 Tamar Christina
  2021-07-15 16:39 ` [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs Tamar Christina
  2021-07-15 16:40 ` [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64 Tamar Christina
@ 2021-07-15 16:40 ` Tamar Christina
  2021-07-16  2:20 ` [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 H.J. Lu
  3 siblings, 0 replies; 16+ messages in thread
From: Tamar Christina @ 2021-07-15 16:40 UTC (permalink / raw)
  To: gcc-patches
  Cc: nd, Ramana.Radhakrishnan, Richard.Earnshaw, nickc, Kyrylo.Tkachov

[-- Attachment #1: Type: text/plain, Size: 2007 bytes --]

Hi All,

The previous fix for this problem was wrong due to a subtle difference between
where NEON expects the RMW values and where intrinsics expects them.

The insn pattern is modeled after the intrinsics and so needs an expand for
the vectorizer optab to switch the RTL.

However operand[3] is not expected to be written to so the current pattern is
bogus.

Instead we use the expand to shuffle around the RTL.

The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.

arm-none-linux-gnueabihf build is currently broken, the best I could do is
verify on arm-none-eabi but the tests are all marked UNSUPPORTED, but the ICE
is gone for the backend test.

Ok for master? and active branches after some stew?

Thanks,
Tamar

gcc/ChangeLog:

	* config/arm/neon.md (<sup>dot_prod<vsi2qi>): Correct RTL.

--- inline copy of patch -- 
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 8b0a396947cc8e7345f178b926128d7224fb218a..876577fc20daee30ecdf03942c0d81c15bf8fe9a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2954,20 +2954,14 @@ (define_insn "neon_<sup>dot_lane<vsi2qi>"
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
 (define_expand "<sup>dot_prod<vsi2qi>"
   [(set (match_operand:VCVTI 0 "register_operand")
-	(plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
+	(plus:VCVTI (match_operand:VCVTI 3 "register_operand")
+		    (unspec:VCVTI [(match_operand:<VSI2QI> 1
 							"register_operand")
 				   (match_operand:<VSI2QI> 2
 							"register_operand")]
-		     DOTPROD)
-		    (match_operand:VCVTI 3 "register_operand")))]
+		     DOTPROD)))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-    gen_neon_<sup>dot<vsi2qi> (operands[3], operands[3], operands[1],
-				 operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+)
 
 ;; Auto-vectorizer pattern for usdot
 (define_expand "usdot_prod<vsi2qi>"


-- 

[-- Attachment #2: rb14661.patch --]
[-- Type: text/x-diff, Size: 1074 bytes --]

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 8b0a396947cc8e7345f178b926128d7224fb218a..876577fc20daee30ecdf03942c0d81c15bf8fe9a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2954,20 +2954,14 @@ (define_insn "neon_<sup>dot_lane<vsi2qi>"
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
 (define_expand "<sup>dot_prod<vsi2qi>"
   [(set (match_operand:VCVTI 0 "register_operand")
-	(plus:VCVTI (unspec:VCVTI [(match_operand:<VSI2QI> 1
+	(plus:VCVTI (match_operand:VCVTI 3 "register_operand")
+		    (unspec:VCVTI [(match_operand:<VSI2QI> 1
 							"register_operand")
 				   (match_operand:<VSI2QI> 2
 							"register_operand")]
-		     DOTPROD)
-		    (match_operand:VCVTI 3 "register_operand")))]
+		     DOTPROD)))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-    gen_neon_<sup>dot<vsi2qi> (operands[3], operands[3], operands[1],
-				 operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+)
 
 ;; Auto-vectorizer pattern for usdot
 (define_expand "usdot_prod<vsi2qi>"


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs
  2021-07-15 16:39 ` [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs Tamar Christina
@ 2021-07-15 19:34   ` Richard Sandiford
  2021-07-20 12:34     ` Tamar Christina
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2021-07-15 19:34 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov

Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> There's a slight mismatch between the vectorizer optabs and the intrinsics
> patterns for NEON.  The vectorizer expects operands[3] and operands[0] to be
> the same but the aarch64 intrinsics expanders expect operands[0] and
> operands[1] to be the same.
>
> This means we need different patterns here.  This adds a separate usdot
> vectorizer pattern which just shuffles around the RTL params.
>
> There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
> patterns which is not corrected here.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

Couldn't we just change:

> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
>  {
> -  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
> +  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);

…this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) etc.?
I think that's an OK thing to do when the function is named after
an optab rather than an arm_neon.h intrinsic.

Thanks,
Richard

>  }
>  
>  __extension__ extern __inline int32x4_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
>  {
> -  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
> +  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
>  }
>  
>  __extension__ extern __inline int32x2_t

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.
  2021-07-15 16:40 ` [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64 Tamar Christina
@ 2021-07-15 19:44   ` Richard Sandiford
  2021-07-22 11:51     ` Tamar Christina
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2021-07-15 19:44 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard.Earnshaw, Marcus.Shawcroft, Kyrylo.Tkachov

Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> The previous fix for this problem was wrong due to a subtle difference between
> where NEON expects the RMW values and where intrinsics expects them.
>
> The insn pattern is modeled after the intrinsics and so needs an expand for
> the vectorizer optab to switch the RTL.
>
> However operand[3] is not expected to be written to so the current pattern is
> bogus.
>
> Instead we use the expand to shuffle around the RTL.
>
> The vectorizer expects operands[3] and operands[0] to be
> the same but the aarch64 intrinsics expanders expect operands[0] and
> operands[1] to be the same.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? and active branches after some stew?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Correct
> 	RTL.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 7397f1ec5ca0cb9e3cdd5c46772f604e640666e4..51789f954affd9fa88e2bc1bcc3dacf64ccb5bde 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -635,18 +635,12 @@ (define_insn "aarch64_usdot<vsi2qi>"
>  ;; and so the vectorizer provides r, in which the result has to be accumulated.
>  (define_expand "<sur>dot_prod<vsi2qi>"
>    [(set (match_operand:VS 0 "register_operand")
> -	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
> +	(plus:VS (match_operand:VS 3 "register_operand")
> +		 (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
>  			    (match_operand:<VSI2QI> 2 "register_operand")]
> -		 DOTPROD)
> -		(match_operand:VS 3 "register_operand")))]
> +		 DOTPROD)))]
>    "TARGET_DOTPROD"

The canonical plus: operand order was the original one, so I think
it would be better to keep this rtl as-is and instead change
aarch64_<sur>dot<vsi2qi> to:

	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
			     (match_operand:<VSI2QI> 3 "register_operand" "w")]
			    DOTPROD)
		 (match_operand:VS 1 "register_operand" "0"))

Same idea for aarch64_<sur>dot_lane<vsi2qi> and
aarch64_<sur>dot_laneq<vsi2qi>.

Sorry to be awkward…

Thanks,
Richard

> -{
> -  emit_insn (
> -    gen_aarch64_<sur>dot<vsi2qi> (operands[3], operands[3], operands[1],
> -				    operands[2]));
> -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> -  DONE;
> -})
> +)
>  
>  ;; Auto-vectorizer pattern for usdot.  The operand[3] and operand[0] are the
>  ;; RMW parameters that when it comes to the vectorizer.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457
  2021-07-15 16:39 [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 Tamar Christina
                   ` (2 preceding siblings ...)
  2021-07-15 16:40 ` [PATCH 4/4][AArch32]: correct dot-product RTL patterns Tamar Christina
@ 2021-07-16  2:20 ` H.J. Lu
  2021-07-16  8:42   ` Tamar Christina
  3 siblings, 1 reply; 16+ messages in thread
From: H.J. Lu @ 2021-07-16  2:20 UTC (permalink / raw)
  To: Tamar Christina; +Cc: GCC Patches, Richard Sandiford, nd

On Thu, Jul 15, 2021 at 9:40 AM Tamar Christina via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi All,
>
> These testcases accidentally contain the wrong signs for the expected values
> for the scalar code.  The vector code however is correct.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Committed as a trivial fix.
>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
>         PR middle-end/101457
>         * gcc.dg/vect/vect-reduc-dot-17.c: Fix signs of scalar code.
>         * gcc.dg/vect/vect-reduc-dot-18.c: Likewise.
>         * gcc.dg/vect/vect-reduc-dot-22.c: Likewise.
>         * gcc.dg/vect/vect-reduc-dot-9.c: Likewise.
>
> --- inline copy of patch --
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> index aa269c4d657f65e07e36df7f3fd0098cf3aaf4d0..38f86fe458adcc7ebbbae22f5cc1e720928f2d48 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> @@ -35,8 +35,9 @@ main (void)
>  {
>    check_vect ();
>
> -  SIGNEDNESS_3 char a[N], b[N];
> -  int expected = 0x12345;
> +  SIGNEDNESS_3 char a[N];
> +  SIGNEDNESS_4 char b[N];
> +  SIGNEDNESS_1 int expected = 0x12345;
>    for (int i = 0; i < N; ++i)
>      {
>        a[i] = BASE + i * 5;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> index 2b1cc0411c3256ccd876d8b4da18ce4881dc0af9..2e86ebe3c6c6a0da9ac242868592f30028ed2155 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> @@ -35,8 +35,9 @@ main (void)
>  {
>    check_vect ();
>
> -  SIGNEDNESS_3 char a[N], b[N];
> -  int expected = 0x12345;
> +  SIGNEDNESS_3 char a[N];
> +  SIGNEDNESS_4 char b[N];
> +  SIGNEDNESS_1 int expected = 0x12345;
>    for (int i = 0; i < N; ++i)
>      {
>        a[i] = BASE + i * 5;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> index febeb19784c6aaca72dc0871af0d32cc91fa6ea2..0bde43a6cb855ce5edd9015ebf34ca226353d77e 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> @@ -37,7 +37,7 @@ main (void)
>
>    SIGNEDNESS_3 char a[N];
>    SIGNEDNESS_4 short b[N];
> -  int expected = 0x12345;
> +  SIGNEDNESS_1 long expected = 0x12345;

Does it work with long == int? I still got

FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects
scan-tree-dump-not vect "vect_recog_dot_prod_pattern: detected"
FAIL: gcc.dg/vect/vect-reduc-dot-22.c scan-tree-dump-not vect
"vect_recog_dot_prod_pattern: detected"

with -m32 on Linux/x86-64.

>    for (int i = 0; i < N; ++i)
>      {
>        a[i] = BASE + i * 5;
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> index cbbeedec3bfd0810a8ce8036e6670585d9334924..d1049c96bf1febfc8933622e292b44cc8dd129cc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> @@ -35,8 +35,9 @@ main (void)
>  {
>    check_vect ();
>
> -  SIGNEDNESS_3 char a[N], b[N];
> -  int expected = 0x12345;
> +  SIGNEDNESS_3 char a[N];
> +  SIGNEDNESS_4 char b[N];
> +  SIGNEDNESS_1 int expected = 0x12345;
>    for (int i = 0; i < N; ++i)
>      {
>        a[i] = BASE + i * 5;
>
>
> --


-- 
H.J.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457
  2021-07-16  2:20 ` [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 H.J. Lu
@ 2021-07-16  8:42   ` Tamar Christina
  0 siblings, 0 replies; 16+ messages in thread
From: Tamar Christina @ 2021-07-16  8:42 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches, Richard Sandiford, nd

> -----Original Message-----
> From: H.J. Lu <hjl.tools@gmail.com>
> Sent: Friday, July 16, 2021 3:21 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Richard Sandiford
> <Richard.Sandiford@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests
> PR101457
> 
> On Thu, Jul 15, 2021 at 9:40 AM Tamar Christina via Gcc-patches <gcc-
> patches@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > These testcases accidentally contain the wrong signs for the expected
> > values for the scalar code.  The vector code however is correct.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Committed as a trivial fix.
> >
> > Thanks,
> > Tamar
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR middle-end/101457
> >         * gcc.dg/vect/vect-reduc-dot-17.c: Fix signs of scalar code.
> >         * gcc.dg/vect/vect-reduc-dot-18.c: Likewise.
> >         * gcc.dg/vect/vect-reduc-dot-22.c: Likewise.
> >         * gcc.dg/vect/vect-reduc-dot-9.c: Likewise.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> > b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> > index
> >
> aa269c4d657f65e07e36df7f3fd0098cf3aaf4d0..38f86fe458adcc7ebbbae22f5cc
> 1
> > e720928f2d48 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-17.c
> > @@ -35,8 +35,9 @@ main (void)
> >  {
> >    check_vect ();
> >
> > -  SIGNEDNESS_3 char a[N], b[N];
> > -  int expected = 0x12345;
> > +  SIGNEDNESS_3 char a[N];
> > +  SIGNEDNESS_4 char b[N];
> > +  SIGNEDNESS_1 int expected = 0x12345;
> >    for (int i = 0; i < N; ++i)
> >      {
> >        a[i] = BASE + i * 5;
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> > b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> > index
> >
> 2b1cc0411c3256ccd876d8b4da18ce4881dc0af9..2e86ebe3c6c6a0da9ac2428685
> 92
> > f30028ed2155 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-18.c
> > @@ -35,8 +35,9 @@ main (void)
> >  {
> >    check_vect ();
> >
> > -  SIGNEDNESS_3 char a[N], b[N];
> > -  int expected = 0x12345;
> > +  SIGNEDNESS_3 char a[N];
> > +  SIGNEDNESS_4 char b[N];
> > +  SIGNEDNESS_1 int expected = 0x12345;
> >    for (int i = 0; i < N; ++i)
> >      {
> >        a[i] = BASE + i * 5;
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> > b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> > index
> >
> febeb19784c6aaca72dc0871af0d32cc91fa6ea2..0bde43a6cb855ce5edd9015eb
> f34
> > ca226353d77e 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-22.c
> > @@ -37,7 +37,7 @@ main (void)
> >
> >    SIGNEDNESS_3 char a[N];
> >    SIGNEDNESS_4 short b[N];
> > -  int expected = 0x12345;
> > +  SIGNEDNESS_1 long expected = 0x12345;
> 
> Does it work with long == int? I still got

Ah no, It requires double widening.  I'll replace it with a long long.

Thanks,
Tamar
> 
> FAIL: gcc.dg/vect/vect-reduc-dot-22.c -flto -ffat-lto-objects scan-tree-dump-
> not vect "vect_recog_dot_prod_pattern: detected"
> FAIL: gcc.dg/vect/vect-reduc-dot-22.c scan-tree-dump-not vect
> "vect_recog_dot_prod_pattern: detected"
> 
> with -m32 on Linux/x86-64.
> 
> >    for (int i = 0; i < N; ++i)
> >      {
> >        a[i] = BASE + i * 5;
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> > b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> > index
> >
> cbbeedec3bfd0810a8ce8036e6670585d9334924..d1049c96bf1febfc8933622e2
> 92b
> > 44cc8dd129cc 100644
> > --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-9.c
> > @@ -35,8 +35,9 @@ main (void)
> >  {
> >    check_vect ();
> >
> > -  SIGNEDNESS_3 char a[N], b[N];
> > -  int expected = 0x12345;
> > +  SIGNEDNESS_3 char a[N];
> > +  SIGNEDNESS_4 char b[N];
> > +  SIGNEDNESS_1 int expected = 0x12345;
> >    for (int i = 0; i < N; ++i)
> >      {
> >        a[i] = BASE + i * 5;
> >
> >
> > --
> 
> 
> --
> H.J.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs
  2021-07-15 19:34   ` Richard Sandiford
@ 2021-07-20 12:34     ` Tamar Christina
  2021-07-20 16:15       ` Richard Sandiford
  0 siblings, 1 reply; 16+ messages in thread
From: Tamar Christina @ 2021-07-20 12:34 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov



> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Thursday, July 15, 2021 8:35 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics
> optabs
> 
> Tamar Christina <tamar.christina@arm.com> writes:
> > Hi All,
> >
> > There's a slight mismatch between the vectorizer optabs and the
> > intrinsics patterns for NEON.  The vectorizer expects operands[3] and
> > operands[0] to be the same but the aarch64 intrinsics expanders expect
> > operands[0] and operands[1] to be the same.
> >
> > This means we need different patterns here.  This adds a separate
> > usdot vectorizer pattern which just shuffles around the RTL params.
> >
> > There's also an inconsistency between the usdot and (u|s)dot
> > intrinsics RTL patterns which is not corrected here.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Couldn't we just change:
> 
> > diff --git a/gcc/config/aarch64/arm_neon.h
> > b/gcc/config/aarch64/arm_neon.h index
> >
> 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ac
> e4f
> > c7f43e2040a8 100644
> > --- a/gcc/config/aarch64/arm_neon.h
> > +++ b/gcc/config/aarch64/arm_neon.h
> > @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
> > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> >  vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)  {
> > -  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
> > +  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
> 
> …this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) etc.?

Not easily, as I was mentioning before, Neon intrinsics have the assumption that
operands[0] and operands[1] are the same. And this goes much further than just
the header call.

The actual type is determined by the optabs and the C stubs that are generated.

aarch64_init_simd_builtins which creates the C function stubs starts processing
arguments from the end and on non-void functions assumes that the value at
operands[0] be the return type. So simply moving __r will get it to think that
the result type should be uint8x8_t.

I can bypass this but then have to write a custom expander in expand code to
handle this, but at point, is it really worth it..

Tamar

> I think that's an OK thing to do when the function is named after
> an optab rather than an arm_neon.h intrinsic.
> 
> Thanks,
> Richard
> 
> >  }
> >
> >  __extension__ extern __inline int32x4_t
> >  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> >  vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
> >  {
> > -  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
> > +  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
> >  }
> >
> >  __extension__ extern __inline int32x2_t

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs
  2021-07-20 12:34     ` Tamar Christina
@ 2021-07-20 16:15       ` Richard Sandiford
  2021-07-22 11:50         ` Tamar Christina
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2021-07-20 16:15 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Thursday, July 15, 2021 8:35 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
>> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics
>> optabs
>> 
>> Tamar Christina <tamar.christina@arm.com> writes:
>> > Hi All,
>> >
>> > There's a slight mismatch between the vectorizer optabs and the
>> > intrinsics patterns for NEON.  The vectorizer expects operands[3] and
>> > operands[0] to be the same but the aarch64 intrinsics expanders expect
>> > operands[0] and operands[1] to be the same.
>> >
>> > This means we need different patterns here.  This adds a separate
>> > usdot vectorizer pattern which just shuffles around the RTL params.
>> >
>> > There's also an inconsistency between the usdot and (u|s)dot
>> > intrinsics RTL patterns which is not corrected here.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> 
>> Couldn't we just change:
>> 
>> > diff --git a/gcc/config/aarch64/arm_neon.h
>> > b/gcc/config/aarch64/arm_neon.h index
>> >
>> 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ac
>> e4f
>> > c7f43e2040a8 100644
>> > --- a/gcc/config/aarch64/arm_neon.h
>> > +++ b/gcc/config/aarch64/arm_neon.h
>> > @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
>> > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> >  vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)  {
>> > -  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
>> > +  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
>> 
>> …this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) etc.?
>
> Not easily, as I was mentioning before, Neon intrinsics have the assumption that
> operands[0] and operands[1] are the same. And this goes much further than just
> the header call.
>
> The actual type is determined by the optabs and the C stubs that are generated.
>
> aarch64_init_simd_builtins which creates the C function stubs starts processing
> arguments from the end and on non-void functions assumes that the value at
> operands[0] be the return type. So simply moving __r will get it to think that
> the result type should be uint8x8_t.

Yeah, the mode of operand 0 (i.e. the output) determines the return type.
But that mode isn't changing, so the return type will be correct for both
input operand orders.  It works for me locally with:

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 88fa5ba5a44..5987d9af7c6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -610,12 +610,12 @@ (define_expand "cmul<conj_op><mode>3"
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
 (define_insn "<sur>dot_prod<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=w")
-	(plus:VS (match_operand:VS 1 "register_operand" "0")
-		(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
-			    (match_operand:<VSI2QI> 3 "register_operand" "w")]
-		DOTPROD)))]
+	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand" "w")
+			     (match_operand:<VSI2QI> 2 "register_operand" "w")]
+			    DOTPROD)
+		 (match_operand:VS 3 "register_operand" "0")))]
   "TARGET_DOTPROD"
-  "<sur>dot\\t%0.<Vtype>, %2.<Vdottype>, %3.<Vdottype>"
+  "<sur>dot\\t%0.<Vtype>, %1.<Vdottype>, %2.<Vdottype>"
   [(set_attr "type" "neon_dot<q>")]
 )
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 597f44ce106..64b6d43a1a0 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -31767,28 +31767,28 @@ __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdot_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b)
 {
-  return __builtin_aarch64_udot_prodv8qi_uuuu (__r, __a, __b);
+  return __builtin_aarch64_udot_prodv8qi_uuuu (__a, __b, __r);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
 {
-  return __builtin_aarch64_udot_prodv16qi_uuuu (__r, __a, __b);
+  return __builtin_aarch64_udot_prodv16qi_uuuu (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdot_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_sdot_prodv8qi (__r, __a, __b);
+  return __builtin_aarch64_sdot_prodv8qi (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdotq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_sdot_prodv16qi (__r, __a, __b);
+  return __builtin_aarch64_sdot_prodv16qi (__a, __b, __r);
 }
 
 __extension__ extern __inline uint32x2_t

Thanks,
Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs
  2021-07-20 16:15       ` Richard Sandiford
@ 2021-07-22 11:50         ` Tamar Christina
  2021-07-22 18:09           ` Richard Sandiford
  0 siblings, 1 reply; 16+ messages in thread
From: Tamar Christina @ 2021-07-22 11:50 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 10858 bytes --]

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SUSS,
	aarch64_types_ternop_suss_qualifiers): New.
	* config/aarch64/aarch64-simd-builtins.def (usdot_prod): Use it.
	* config/aarch64/aarch64-simd.md (usdot_prod<vsi2qi>): Re-organize RTL.
	* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Use it.

--- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 9ed4b72d005799b8984a858f96d4763e7fa5aa39..f6b41d9c200d6300dee65ba60ae94488231a8a38 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -209,6 +209,10 @@ static enum aarch64_type_qualifiers
 aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none };
 #define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_ternop_suss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_none, qualifier_none };
+#define TYPES_TERNOP_SUSS (aarch64_types_ternop_suss_qualifiers)
 
 
 static enum aarch64_type_qualifiers
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index b7f1237b1ffd0d4ca283c853be1cc94b9fc35260..3bb45a82945b143497035ec30d35543b2dad55a3 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -377,7 +377,7 @@
   /* Implemented by <sur><dotprod>_prod<dot_mode>.  */
   BUILTIN_VB (TERNOP, sdot, 0, NONE)
   BUILTIN_VB (TERNOPU, udot, 0, NONE)
-  BUILTIN_VB (TERNOP_SSUS, usdot_prod, 10, NONE)
+  BUILTIN_VB (TERNOP_SUSS, usdot_prod, 10, NONE)
   /* Implemented by aarch64_<sur><dotprod>_lane{q}<dot_mode>.  */
   BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
   BUILTIN_VB (QUADOPU_LANE, udot_lane, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7332a735d35846e0d9375ad2686ed7ecdb09cd29..bf667b99944e3fcce618a21c77bd5b804b3a0b5d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -599,20 +599,6 @@ (define_insn "aarch64_<sur>dot<vsi2qi>"
   [(set_attr "type" "neon_dot<q>")]
 )
 
-;; These instructions map to the __builtins for the armv8.6a I8MM usdot
-;; (vector) Dot Product operation.
-(define_insn "usdot_prod<vsi2qi>"
-  [(set (match_operand:VS 0 "register_operand" "=w")
-	(plus:VS
-	  (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
-		      (match_operand:<VSI2QI> 3 "register_operand" "w")]
-	  UNSPEC_USDOT)
-	  (match_operand:VS 1 "register_operand" "0")))]
-  "TARGET_I8MM"
-  "usdot\\t%0.<Vtype>, %2.<Vdottype>, %3.<Vdottype>"
-  [(set_attr "type" "neon_dot<q>")]
-)
-
 ;; These expands map to the Dot Product optab the vectorizer checks for.
 ;; The auto-vectorizer expects a dot product builtin that also does an
 ;; accumulation into the provided register.
@@ -648,6 +634,20 @@ (define_expand "<sur>dot_prod<vsi2qi>"
   DONE;
 })
 
+;; These instructions map to the __builtins for the Armv8.6-a I8MM usdot
+;; (vector) Dot Product operation and the vectorized optab.
+(define_insn "usdot_prod<vsi2qi>"
+  [(set (match_operand:VS 0 "register_operand" "=w")
+	(plus:VS
+	  (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand" "w")
+		      (match_operand:<VSI2QI> 2 "register_operand" "w")]
+	  UNSPEC_USDOT)
+	  (match_operand:VS 3 "register_operand" "0")))]
+  "TARGET_I8MM"
+  "usdot\\t%0.<Vtype>, %1.<Vdottype>, %2.<Vdottype>"
+  [(set_attr "type" "neon_dot<q>")]
+)
+
 ;; These instructions map to the __builtins for the Dot Product
 ;; indexed operations.
 (define_insn "aarch64_<sur>dot_lane<vsi2qi>"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 1048d7c7eaac14554142eaa7544159a50929b7f1..8396e872580bc9fb32b872f3915485b02ec2b334 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34021,14 +34021,14 @@ __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdot_prodv8qi_suss (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdot_prodv16qi_suss (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x2_t

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Tuesday, July 20, 2021 5:16 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics
> optabs
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Sent: Thursday, July 15, 2021 8:35 PM
> >> To: Tamar Christina <Tamar.Christina@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> >> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> >> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>
> >> Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and
> >> intrinsics optabs
> >>
> >> Tamar Christina <tamar.christina@arm.com> writes:
> >> > Hi All,
> >> >
> >> > There's a slight mismatch between the vectorizer optabs and the
> >> > intrinsics patterns for NEON.  The vectorizer expects operands[3]
> >> > and operands[0] to be the same but the aarch64 intrinsics expanders
> >> > expect operands[0] and operands[1] to be the same.
> >> >
> >> > This means we need different patterns here.  This adds a separate
> >> > usdot vectorizer pattern which just shuffles around the RTL params.
> >> >
> >> > There's also an inconsistency between the usdot and (u|s)dot
> >> > intrinsics RTL patterns which is not corrected here.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >
> >> > Ok for master?
> >>
> >> Couldn't we just change:
> >>
> >> > diff --git a/gcc/config/aarch64/arm_neon.h
> >> > b/gcc/config/aarch64/arm_neon.h index
> >> >
> >>
> 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ac
> >> e4f
> >> > c7f43e2040a8 100644
> >> > --- a/gcc/config/aarch64/arm_neon.h
> >> > +++ b/gcc/config/aarch64/arm_neon.h
> >> > @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
> >> > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> >> >  vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)  {
> >> > -  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
> >> > +  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
> >>
> >> …this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) etc.?
> >
> > Not easily, as I was mentioning before, Neon intrinsics have the
> > assumption that operands[0] and operands[1] are the same. And this
> > goes much further than just the header call.
> >
> > The actual type is determined by the optabs and the C stubs that are
> generated.
> >
> > aarch64_init_simd_builtins which creates the C function stubs starts
> > processing arguments from the end and on non-void functions assumes
> > that the value at operands[0] be the return type. So simply moving __r
> > will get it to think that the result type should be uint8x8_t.
> 
> Yeah, the mode of operand 0 (i.e. the output) determines the return type.
> But that mode isn't changing, so the return type will be correct for both input
> operand orders.  It works for me locally with:
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index 88fa5ba5a44..5987d9af7c6 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -610,12 +610,12 @@ (define_expand "cmul<conj_op><mode>3"
>  ;; and so the vectorizer provides r, in which the result has to be accumulated.
>  (define_insn "<sur>dot_prod<vsi2qi>"
>    [(set (match_operand:VS 0 "register_operand" "=w")
> -	(plus:VS (match_operand:VS 1 "register_operand" "0")
> -		(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand"
> "w")
> -			    (match_operand:<VSI2QI> 3 "register_operand"
> "w")]
> -		DOTPROD)))]
> +	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1
> "register_operand" "w")
> +			     (match_operand:<VSI2QI> 2 "register_operand"
> "w")]
> +			    DOTPROD)
> +		 (match_operand:VS 3 "register_operand" "0")))]
>    "TARGET_DOTPROD"
> -  "<sur>dot\\t%0.<Vtype>, %2.<Vdottype>, %3.<Vdottype>"
> +  "<sur>dot\\t%0.<Vtype>, %1.<Vdottype>, %2.<Vdottype>"
>    [(set_attr "type" "neon_dot<q>")]
>  )
> 
> diff --git a/gcc/config/aarch64/arm_neon.h
> b/gcc/config/aarch64/arm_neon.h index 597f44ce106..64b6d43a1a0 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -31767,28 +31767,28 @@ __extension__ extern __inline uint32x2_t
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vdot_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b)  {
> -  return __builtin_aarch64_udot_prodv8qi_uuuu (__r, __a, __b);
> +  return __builtin_aarch64_udot_prodv8qi_uuuu (__a, __b, __r);
>  }
> 
>  __extension__ extern __inline uint32x4_t  __attribute__
> ((__always_inline__, __gnu_inline__, __artificial__))
>  vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)  {
> -  return __builtin_aarch64_udot_prodv16qi_uuuu (__r, __a, __b);
> +  return __builtin_aarch64_udot_prodv16qi_uuuu (__a, __b, __r);
>  }
> 
>  __extension__ extern __inline int32x2_t  __attribute__
> ((__always_inline__, __gnu_inline__, __artificial__))
>  vdot_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b)  {
> -  return __builtin_aarch64_sdot_prodv8qi (__r, __a, __b);
> +  return __builtin_aarch64_sdot_prodv8qi (__a, __b, __r);
>  }
> 
>  __extension__ extern __inline int32x4_t  __attribute__
> ((__always_inline__, __gnu_inline__, __artificial__))
>  vdotq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)  {
> -  return __builtin_aarch64_sdot_prodv16qi (__r, __a, __b);
> +  return __builtin_aarch64_sdot_prodv16qi (__a, __b, __r);
>  }
> 
>  __extension__ extern __inline uint32x2_t
> 
> Thanks,
> Richard

[-- Attachment #2: rb14659.patch --]
[-- Type: application/octet-stream, Size: 4289 bytes --]

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 9ed4b72d005799b8984a858f96d4763e7fa5aa39..f6b41d9c200d6300dee65ba60ae94488231a8a38 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -209,6 +209,10 @@ static enum aarch64_type_qualifiers
 aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none };
 #define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_ternop_suss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_none, qualifier_none };
+#define TYPES_TERNOP_SUSS (aarch64_types_ternop_suss_qualifiers)
 
 
 static enum aarch64_type_qualifiers
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index b7f1237b1ffd0d4ca283c853be1cc94b9fc35260..3bb45a82945b143497035ec30d35543b2dad55a3 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -377,7 +377,7 @@
   /* Implemented by <sur><dotprod>_prod<dot_mode>.  */
   BUILTIN_VB (TERNOP, sdot, 0, NONE)
   BUILTIN_VB (TERNOPU, udot, 0, NONE)
-  BUILTIN_VB (TERNOP_SSUS, usdot_prod, 10, NONE)
+  BUILTIN_VB (TERNOP_SUSS, usdot_prod, 10, NONE)
   /* Implemented by aarch64_<sur><dotprod>_lane{q}<dot_mode>.  */
   BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
   BUILTIN_VB (QUADOPU_LANE, udot_lane, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7332a735d35846e0d9375ad2686ed7ecdb09cd29..bf667b99944e3fcce618a21c77bd5b804b3a0b5d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -599,20 +599,6 @@ (define_insn "aarch64_<sur>dot<vsi2qi>"
   [(set_attr "type" "neon_dot<q>")]
 )
 
-;; These instructions map to the __builtins for the armv8.6a I8MM usdot
-;; (vector) Dot Product operation.
-(define_insn "usdot_prod<vsi2qi>"
-  [(set (match_operand:VS 0 "register_operand" "=w")
-	(plus:VS
-	  (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
-		      (match_operand:<VSI2QI> 3 "register_operand" "w")]
-	  UNSPEC_USDOT)
-	  (match_operand:VS 1 "register_operand" "0")))]
-  "TARGET_I8MM"
-  "usdot\\t%0.<Vtype>, %2.<Vdottype>, %3.<Vdottype>"
-  [(set_attr "type" "neon_dot<q>")]
-)
-
 ;; These expands map to the Dot Product optab the vectorizer checks for.
 ;; The auto-vectorizer expects a dot product builtin that also does an
 ;; accumulation into the provided register.
@@ -648,6 +634,20 @@ (define_expand "<sur>dot_prod<vsi2qi>"
   DONE;
 })
 
+;; These instructions map to the __builtins for the Armv8.6-a I8MM usdot
+;; (vector) Dot Product operation and the vectorized optab.
+(define_insn "usdot_prod<vsi2qi>"
+  [(set (match_operand:VS 0 "register_operand" "=w")
+	(plus:VS
+	  (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand" "w")
+		      (match_operand:<VSI2QI> 2 "register_operand" "w")]
+	  UNSPEC_USDOT)
+	  (match_operand:VS 3 "register_operand" "0")))]
+  "TARGET_I8MM"
+  "usdot\\t%0.<Vtype>, %1.<Vdottype>, %2.<Vdottype>"
+  [(set_attr "type" "neon_dot<q>")]
+)
+
 ;; These instructions map to the __builtins for the Dot Product
 ;; indexed operations.
 (define_insn "aarch64_<sur>dot_lane<vsi2qi>"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 1048d7c7eaac14554142eaa7544159a50929b7f1..8396e872580bc9fb32b872f3915485b02ec2b334 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34021,14 +34021,14 @@ __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdot_prodv8qi_suss (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
+  return __builtin_aarch64_usdot_prodv16qi_suss (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x2_t

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.
  2021-07-15 19:44   ` Richard Sandiford
@ 2021-07-22 11:51     ` Tamar Christina
  2021-07-22 18:11       ` Richard Sandiford
  0 siblings, 1 reply; 16+ messages in thread
From: Tamar Christina @ 2021-07-22 11:51 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

[-- Attachment #1: Type: text/plain, Size: 10660 bytes --]

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def (sdot, udot): Rename to..
	(sdot_prod, udot_prod): ... This.
	* config/aarch64/aarch64-simd.md (aarch64_<sur>dot<vsi2qi>): Merged
	into...
	(<sur>dot_prod<vsi2qi>): ... this.
	(aarch64_<sur>dot_lane<vsi2qi>, aarch64_<sur>dot_laneq<vsi2qi>):
	Change operands order.
	(<sur>sadv16qi): Use new operands order.
	* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32,
	vdotq_s32): Use new RTL ordering.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 3bb45a82945b143497035ec30d35543b2dad55a3..402453aa9bba5949da43c984c4603196b1efd092 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -375,8 +375,8 @@
   BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0, NONE)
 
   /* Implemented by <sur><dotprod>_prod<dot_mode>.  */
-  BUILTIN_VB (TERNOP, sdot, 0, NONE)
-  BUILTIN_VB (TERNOPU, udot, 0, NONE)
+  BUILTIN_VB (TERNOP, sdot_prod, 10, NONE)
+  BUILTIN_VB (TERNOPU, udot_prod, 10, NONE)
   BUILTIN_VB (TERNOP_SUSS, usdot_prod, 10, NONE)
   /* Implemented by aarch64_<sur><dotprod>_lane{q}<dot_mode>.  */
   BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bf667b99944e3fcce618a21c77bd5b804b3a0b5d..13c86984df147f2033b81a2a5278252f5ac52779 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -587,19 +587,8 @@ (define_expand "cmul<conj_op><mode>3"
   DONE;
 })
 
-;; These instructions map to the __builtins for the Dot Product operations.
-(define_insn "aarch64_<sur>dot<vsi2qi>"
-  [(set (match_operand:VS 0 "register_operand" "=w")
-	(plus:VS (match_operand:VS 1 "register_operand" "0")
-		(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
-			    (match_operand:<VSI2QI> 3 "register_operand" "w")]
-		DOTPROD)))]
-  "TARGET_DOTPROD"
-  "<sur>dot\\t%0.<Vtype>, %2.<Vdottype>, %3.<Vdottype>"
-  [(set_attr "type" "neon_dot<q>")]
-)
-
-;; These expands map to the Dot Product optab the vectorizer checks for.
+;; These expands map to the Dot Product optab the vectorizer checks for
+;; and to the intrinsics patttern.
 ;; The auto-vectorizer expects a dot product builtin that also does an
 ;; accumulation into the provided register.
 ;; Given the following pattern
@@ -619,20 +608,17 @@ (define_insn "aarch64_<sur>dot<vsi2qi>"
 ;; ...
 ;;
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
-(define_expand "<sur>dot_prod<vsi2qi>"
-  [(set (match_operand:VS 0 "register_operand")
-	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
-			    (match_operand:<VSI2QI> 2 "register_operand")]
-		 DOTPROD)
-		(match_operand:VS 3 "register_operand")))]
+(define_insn "<sur>dot_prod<vsi2qi>"
+  [(set (match_operand:VS 0 "register_operand" "=w")
+	(plus:VS
+	  (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand" "w")
+		      (match_operand:<VSI2QI> 2 "register_operand" "w")]
+		      DOTPROD)
+	  (match_operand:VS 3 "register_operand" "0")))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-    gen_aarch64_<sur>dot<vsi2qi> (operands[3], operands[3], operands[1],
-				    operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+  "<sur>dot\\t%0.<Vtype>, %1.<Vdottype>, %2.<Vdottype>"
+  [(set_attr "type" "neon_dot<q>")]
+)
 
 ;; These instructions map to the __builtins for the Armv8.6-a I8MM usdot
 ;; (vector) Dot Product operation and the vectorized optab.
@@ -652,11 +638,12 @@ (define_insn "usdot_prod<vsi2qi>"
 ;; indexed operations.
 (define_insn "aarch64_<sur>dot_lane<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=w")
-	(plus:VS (match_operand:VS 1 "register_operand" "0")
-		(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
-			    (match_operand:V8QI 3 "register_operand" "<h_con>")
-			    (match_operand:SI 4 "immediate_operand" "i")]
-		DOTPROD)))]
+	(plus:VS
+	  (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
+		      (match_operand:V8QI 3 "register_operand" "<h_con>")
+		      (match_operand:SI 4 "immediate_operand" "i")]
+		      DOTPROD)
+	  (match_operand:VS 1 "register_operand" "0")))]
   "TARGET_DOTPROD"
   {
     operands[4] = aarch64_endian_lane_rtx (V8QImode, INTVAL (operands[4]));
@@ -667,11 +654,12 @@ (define_insn "aarch64_<sur>dot_lane<vsi2qi>"
 
 (define_insn "aarch64_<sur>dot_laneq<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=w")
-	(plus:VS (match_operand:VS 1 "register_operand" "0")
-		(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
-			    (match_operand:V16QI 3 "register_operand" "<h_con>")
-			    (match_operand:SI 4 "immediate_operand" "i")]
-		DOTPROD)))]
+	(plus:VS
+	  (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
+		      (match_operand:V16QI 3 "register_operand" "<h_con>")
+		      (match_operand:SI 4 "immediate_operand" "i")]
+		      DOTPROD)
+	  (match_operand:VS 1 "register_operand" "0")))]
   "TARGET_DOTPROD"
   {
     operands[4] = aarch64_endian_lane_rtx (V16QImode, INTVAL (operands[4]));
@@ -944,8 +932,7 @@ (define_expand "<sur>sadv16qi"
 	rtx ones = force_reg (V16QImode, CONST1_RTX (V16QImode));
 	rtx abd = gen_reg_rtx (V16QImode);
 	emit_insn (gen_aarch64_<sur>abdv16qi (abd, operands[1], operands[2]));
-	emit_insn (gen_aarch64_udotv16qi (operands[0], operands[3],
-					  abd, ones));
+	emit_insn (gen_udot_prodv16qi (operands[0], abd, ones, operands[3]));
 	DONE;
       }
     rtx reduc = gen_reg_rtx (V8HImode);
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 8396e872580bc9fb32b872f3915485b02ec2b334..08bede79ad252b3728fdb278036a4de73696a5db 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -31749,28 +31749,28 @@ __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdot_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b)
 {
-  return __builtin_aarch64_udotv8qi_uuuu (__r, __a, __b);
+  return __builtin_aarch64_udot_prodv8qi_uuuu (__a, __b, __r);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
 {
-  return __builtin_aarch64_udotv16qi_uuuu (__r, __a, __b);
+  return __builtin_aarch64_udot_prodv16qi_uuuu (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdot_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_sdotv8qi (__r, __a, __b);
+  return __builtin_aarch64_sdot_prodv8qi (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdotq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_sdotv16qi (__r, __a, __b);
+  return __builtin_aarch64_sdot_prodv16qi (__a, __b, __r);
 }
 
 __extension__ extern __inline uint32x2_t

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Thursday, July 15, 2021 8:45 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH 3/4]AArch64: correct dot-product RTL patterns for
> aarch64.
> 
> Tamar Christina <tamar.christina@arm.com> writes:
> > Hi All,
> >
> > The previous fix for this problem was wrong due to a subtle difference
> > between where NEON expects the RMW values and where intrinsics
> expects them.
> >
> > The insn pattern is modeled after the intrinsics and so needs an
> > expand for the vectorizer optab to switch the RTL.
> >
> > However operand[3] is not expected to be written to so the current
> > pattern is bogus.
> >
> > Instead we use the expand to shuffle around the RTL.
> >
> > The vectorizer expects operands[3] and operands[0] to be the same but
> > the aarch64 intrinsics expanders expect operands[0] and operands[1] to
> > be the same.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master? and active branches after some stew?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Correct
> > 	RTL.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index
> >
> 7397f1ec5ca0cb9e3cdd5c46772f604e640666e4..51789f954affd9fa88e2bc1bcc3
> d
> > acf64ccb5bde 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -635,18 +635,12 @@ (define_insn "aarch64_usdot<vsi2qi>"
> >  ;; and so the vectorizer provides r, in which the result has to be
> accumulated.
> >  (define_expand "<sur>dot_prod<vsi2qi>"
> >    [(set (match_operand:VS 0 "register_operand")
> > -	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1
> "register_operand")
> > +	(plus:VS (match_operand:VS 3 "register_operand")
> > +		 (unspec:VS [(match_operand:<VSI2QI> 1
> "register_operand")
> >  			    (match_operand:<VSI2QI> 2 "register_operand")]
> > -		 DOTPROD)
> > -		(match_operand:VS 3 "register_operand")))]
> > +		 DOTPROD)))]
> >    "TARGET_DOTPROD"
> 
> The canonical plus: operand order was the original one, so I think it would be
> better to keep this rtl as-is and instead change aarch64_<sur>dot<vsi2qi> to:
> 
> 	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 2
> "register_operand" "w")
> 			     (match_operand:<VSI2QI> 3 "register_operand"
> "w")]
> 			    DOTPROD)
> 		 (match_operand:VS 1 "register_operand" "0"))
> 
> Same idea for aarch64_<sur>dot_lane<vsi2qi> and
> aarch64_<sur>dot_laneq<vsi2qi>.
> 
> Sorry to be awkward…
> 
> Thanks,
> Richard
> 
> > -{
> > -  emit_insn (
> > -    gen_aarch64_<sur>dot<vsi2qi> (operands[3], operands[3], operands[1],
> > -				    operands[2]));
> > -  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> > -  DONE;
> > -})
> > +)
> >
> >  ;; Auto-vectorizer pattern for usdot.  The operand[3] and operand[0]
> > are the  ;; RMW parameters that when it comes to the vectorizer.

[-- Attachment #2: rb14660.patch --]
[-- Type: application/octet-stream, Size: 6579 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 3bb45a82945b143497035ec30d35543b2dad55a3..402453aa9bba5949da43c984c4603196b1efd092 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -375,8 +375,8 @@
   BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0, NONE)
 
   /* Implemented by <sur><dotprod>_prod<dot_mode>.  */
-  BUILTIN_VB (TERNOP, sdot, 0, NONE)
-  BUILTIN_VB (TERNOPU, udot, 0, NONE)
+  BUILTIN_VB (TERNOP, sdot_prod, 10, NONE)
+  BUILTIN_VB (TERNOPU, udot_prod, 10, NONE)
   BUILTIN_VB (TERNOP_SUSS, usdot_prod, 10, NONE)
   /* Implemented by aarch64_<sur><dotprod>_lane{q}<dot_mode>.  */
   BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bf667b99944e3fcce618a21c77bd5b804b3a0b5d..13c86984df147f2033b81a2a5278252f5ac52779 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -587,19 +587,8 @@ (define_expand "cmul<conj_op><mode>3"
   DONE;
 })
 
-;; These instructions map to the __builtins for the Dot Product operations.
-(define_insn "aarch64_<sur>dot<vsi2qi>"
-  [(set (match_operand:VS 0 "register_operand" "=w")
-	(plus:VS (match_operand:VS 1 "register_operand" "0")
-		(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
-			    (match_operand:<VSI2QI> 3 "register_operand" "w")]
-		DOTPROD)))]
-  "TARGET_DOTPROD"
-  "<sur>dot\\t%0.<Vtype>, %2.<Vdottype>, %3.<Vdottype>"
-  [(set_attr "type" "neon_dot<q>")]
-)
-
-;; These expands map to the Dot Product optab the vectorizer checks for.
+;; These expands map to the Dot Product optab the vectorizer checks for
+;; and to the intrinsics patttern.
 ;; The auto-vectorizer expects a dot product builtin that also does an
 ;; accumulation into the provided register.
 ;; Given the following pattern
@@ -619,20 +608,17 @@ (define_insn "aarch64_<sur>dot<vsi2qi>"
 ;; ...
 ;;
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
-(define_expand "<sur>dot_prod<vsi2qi>"
-  [(set (match_operand:VS 0 "register_operand")
-	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
-			    (match_operand:<VSI2QI> 2 "register_operand")]
-		 DOTPROD)
-		(match_operand:VS 3 "register_operand")))]
+(define_insn "<sur>dot_prod<vsi2qi>"
+  [(set (match_operand:VS 0 "register_operand" "=w")
+	(plus:VS
+	  (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand" "w")
+		      (match_operand:<VSI2QI> 2 "register_operand" "w")]
+		      DOTPROD)
+	  (match_operand:VS 3 "register_operand" "0")))]
   "TARGET_DOTPROD"
-{
-  emit_insn (
-    gen_aarch64_<sur>dot<vsi2qi> (operands[3], operands[3], operands[1],
-				    operands[2]));
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  DONE;
-})
+  "<sur>dot\\t%0.<Vtype>, %1.<Vdottype>, %2.<Vdottype>"
+  [(set_attr "type" "neon_dot<q>")]
+)
 
 ;; These instructions map to the __builtins for the Armv8.6-a I8MM usdot
 ;; (vector) Dot Product operation and the vectorized optab.
@@ -652,11 +638,12 @@ (define_insn "usdot_prod<vsi2qi>"
 ;; indexed operations.
 (define_insn "aarch64_<sur>dot_lane<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=w")
-	(plus:VS (match_operand:VS 1 "register_operand" "0")
-		(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
-			    (match_operand:V8QI 3 "register_operand" "<h_con>")
-			    (match_operand:SI 4 "immediate_operand" "i")]
-		DOTPROD)))]
+	(plus:VS
+	  (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
+		      (match_operand:V8QI 3 "register_operand" "<h_con>")
+		      (match_operand:SI 4 "immediate_operand" "i")]
+		      DOTPROD)
+	  (match_operand:VS 1 "register_operand" "0")))]
   "TARGET_DOTPROD"
   {
     operands[4] = aarch64_endian_lane_rtx (V8QImode, INTVAL (operands[4]));
@@ -667,11 +654,12 @@ (define_insn "aarch64_<sur>dot_lane<vsi2qi>"
 
 (define_insn "aarch64_<sur>dot_laneq<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=w")
-	(plus:VS (match_operand:VS 1 "register_operand" "0")
-		(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
-			    (match_operand:V16QI 3 "register_operand" "<h_con>")
-			    (match_operand:SI 4 "immediate_operand" "i")]
-		DOTPROD)))]
+	(plus:VS
+	  (unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
+		      (match_operand:V16QI 3 "register_operand" "<h_con>")
+		      (match_operand:SI 4 "immediate_operand" "i")]
+		      DOTPROD)
+	  (match_operand:VS 1 "register_operand" "0")))]
   "TARGET_DOTPROD"
   {
     operands[4] = aarch64_endian_lane_rtx (V16QImode, INTVAL (operands[4]));
@@ -944,8 +932,7 @@ (define_expand "<sur>sadv16qi"
 	rtx ones = force_reg (V16QImode, CONST1_RTX (V16QImode));
 	rtx abd = gen_reg_rtx (V16QImode);
 	emit_insn (gen_aarch64_<sur>abdv16qi (abd, operands[1], operands[2]));
-	emit_insn (gen_aarch64_udotv16qi (operands[0], operands[3],
-					  abd, ones));
+	emit_insn (gen_udot_prodv16qi (operands[0], abd, ones, operands[3]));
 	DONE;
       }
     rtx reduc = gen_reg_rtx (V8HImode);
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 8396e872580bc9fb32b872f3915485b02ec2b334..08bede79ad252b3728fdb278036a4de73696a5db 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -31749,28 +31749,28 @@ __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdot_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b)
 {
-  return __builtin_aarch64_udotv8qi_uuuu (__r, __a, __b);
+  return __builtin_aarch64_udot_prodv8qi_uuuu (__a, __b, __r);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
 {
-  return __builtin_aarch64_udotv16qi_uuuu (__r, __a, __b);
+  return __builtin_aarch64_udot_prodv16qi_uuuu (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdot_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_sdotv8qi (__r, __a, __b);
+  return __builtin_aarch64_sdot_prodv8qi (__a, __b, __r);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdotq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_sdotv16qi (__r, __a, __b);
+  return __builtin_aarch64_sdot_prodv16qi (__a, __b, __r);
 }
 
 __extension__ extern __inline uint32x2_t

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs
  2021-07-22 11:50         ` Tamar Christina
@ 2021-07-22 18:09           ` Richard Sandiford
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2021-07-22 18:09 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-builtins.c (TYPES_TERNOP_SUSS,
> 	aarch64_types_ternop_suss_qualifiers): New.
> 	* config/aarch64/aarch64-simd-builtins.def (usdot_prod): Use it.
> 	* config/aarch64/aarch64-simd.md (usdot_prod<vsi2qi>): Re-organize RTL.
> 	* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Use it.

OK, thanks.

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.
  2021-07-22 11:51     ` Tamar Christina
@ 2021-07-22 18:11       ` Richard Sandiford
  2021-07-23  8:14         ` Tamar Christina
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Sandiford @ 2021-07-22 18:11 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-simd-builtins.def (sdot, udot): Rename to..
> 	(sdot_prod, udot_prod): ... This.
> 	* config/aarch64/aarch64-simd.md (aarch64_<sur>dot<vsi2qi>): Merged
> 	into...
> 	(<sur>dot_prod<vsi2qi>): ... this.
> 	(aarch64_<sur>dot_lane<vsi2qi>, aarch64_<sur>dot_laneq<vsi2qi>):
> 	Change operands order.
> 	(<sur>sadv16qi): Use new operands order.
> 	* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32,
> 	vdotq_s32): Use new RTL ordering.

OK, thanks.

Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.
  2021-07-22 18:11       ` Richard Sandiford
@ 2021-07-23  8:14         ` Tamar Christina
  2021-07-26 13:56           ` Richard Sandiford
  0 siblings, 1 reply; 16+ messages in thread
From: Tamar Christina @ 2021-07-23  8:14 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Hi,

Sorry It looks like I forgot to ask if OK for backport to GCC 9, 10, 11 after some stew.

Thanks,
Tamar

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Thursday, July 22, 2021 7:11 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH 3/4]AArch64: correct dot-product RTL patterns for
> aarch64.
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* config/aarch64/aarch64-simd-builtins.def (sdot, udot): Rename to..
> > 	(sdot_prod, udot_prod): ... This.
> > 	* config/aarch64/aarch64-simd.md (aarch64_<sur>dot<vsi2qi>):
> Merged
> > 	into...
> > 	(<sur>dot_prod<vsi2qi>): ... this.
> > 	(aarch64_<sur>dot_lane<vsi2qi>, aarch64_<sur>dot_laneq<vsi2qi>):
> > 	Change operands order.
> > 	(<sur>sadv16qi): Use new operands order.
> > 	* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32,
> > 	vdotq_s32): Use new RTL ordering.
> 
> OK, thanks.
> 
> Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64.
  2021-07-23  8:14         ` Tamar Christina
@ 2021-07-26 13:56           ` Richard Sandiford
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2021-07-26 13:56 UTC (permalink / raw)
  To: Tamar Christina
  Cc: gcc-patches, nd, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov

Tamar Christina <Tamar.Christina@arm.com> writes:
> Hi,
>
> Sorry It looks like I forgot to ask if OK for backport to GCC 9, 10, 11 after some stew.

Yeah, OK for backports too.

Thanks,
Richard

>
> Thanks,
> Tamar
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Thursday, July 22, 2021 7:11 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
>> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> Subject: Re: [PATCH 3/4]AArch64: correct dot-product RTL patterns for
>> aarch64.
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> > 	* config/aarch64/aarch64-simd-builtins.def (sdot, udot): Rename to..
>> > 	(sdot_prod, udot_prod): ... This.
>> > 	* config/aarch64/aarch64-simd.md (aarch64_<sur>dot<vsi2qi>):
>> Merged
>> > 	into...
>> > 	(<sur>dot_prod<vsi2qi>): ... this.
>> > 	(aarch64_<sur>dot_lane<vsi2qi>, aarch64_<sur>dot_laneq<vsi2qi>):
>> > 	Change operands order.
>> > 	(<sur>sadv16qi): Use new operands order.
>> > 	* config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32,
>> > 	vdotq_s32): Use new RTL ordering.
>> 
>> OK, thanks.
>> 
>> Richard

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-07-26 13:56 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-15 16:39 [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 Tamar Christina
2021-07-15 16:39 ` [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs Tamar Christina
2021-07-15 19:34   ` Richard Sandiford
2021-07-20 12:34     ` Tamar Christina
2021-07-20 16:15       ` Richard Sandiford
2021-07-22 11:50         ` Tamar Christina
2021-07-22 18:09           ` Richard Sandiford
2021-07-15 16:40 ` [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64 Tamar Christina
2021-07-15 19:44   ` Richard Sandiford
2021-07-22 11:51     ` Tamar Christina
2021-07-22 18:11       ` Richard Sandiford
2021-07-23  8:14         ` Tamar Christina
2021-07-26 13:56           ` Richard Sandiford
2021-07-15 16:40 ` [PATCH 4/4][AArch32]: correct dot-product RTL patterns Tamar Christina
2021-07-16  2:20 ` [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 H.J. Lu
2021-07-16  8:42   ` Tamar Christina

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).