From: Tamar Christina <tamar.christina@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com,
Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs
Date: Thu, 15 Jul 2021 17:39:58 +0100 [thread overview]
Message-ID: <20210715163953.GA2861@arm.com> (raw)
In-Reply-To: <patch-14658-tamar@arm.com>
[-- Attachment #1: Type: text/plain, Size: 4050 bytes --]
Hi All,
There's a slight mismatch between the vectorizer optabs and the intrinsics
patterns for NEON. The vectorizer expects operands[3] and operands[0] to be
the same but the aarch64 intrinsics expanders expect operands[0] and
operands[1] to be the same.
This means we need different patterns here. This adds a separate usdot
vectorizer pattern which just shuffles around the RTL params.
There's also an inconsistency between the usdot and (u|s)dot intrinsics RTL
patterns which is not corrected here.
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (usdot_prod<vsi2qi>): Rename to...
(aarch64_usdot<vsi2qi>): ..This
(usdot_prod<vsi2qi>): New.
* config/aarch64/arm_neon.h (vusdot_s32, vusdotq_s32): Use
aarch64_usdot<vsi2qi>.
* config/aarch64/aarch64-simd-builtins.def: Likewise.
--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 063f503ebd96657f017dfaa067cb231991376bda..ac5d4fc7ff1e61d404e66193b629986382ee4ffd 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -374,11 +374,10 @@
BUILTIN_VSDQ_I_DI (BINOP, srshl, 0, NONE)
BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0, NONE)
- /* Implemented by <sur><dotprod>_prod<dot_mode>. */
+ /* Implemented by aarch64_<sur><dotprod>{_lane}{q}<dot_mode>. */
BUILTIN_VB (TERNOP, sdot, 0, NONE)
BUILTIN_VB (TERNOPU, udot, 0, NONE)
- BUILTIN_VB (TERNOP_SSUS, usdot_prod, 10, NONE)
- /* Implemented by aarch64_<sur><dotprod>_lane{q}<dot_mode>. */
+ BUILTIN_VB (TERNOP_SSUS, usdot, 0, NONE)
BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
BUILTIN_VB (QUADOPU_LANE, udot_lane, 0, NONE)
BUILTIN_VB (QUADOP_LANE, sdot_laneq, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 74890989cb3045798bf8d0241467eaaf72238297..7397f1ec5ca0cb9e3cdd5c46772f604e640666e4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -601,7 +601,7 @@ (define_insn "aarch64_<sur>dot<vsi2qi>"
;; These instructions map to the __builtins for the armv8.6a I8MM usdot
;; (vector) Dot Product operation.
-(define_insn "usdot_prod<vsi2qi>"
+(define_insn "aarch64_usdot<vsi2qi>"
[(set (match_operand:VS 0 "register_operand" "=w")
(plus:VS
(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
@@ -648,6 +648,17 @@ (define_expand "<sur>dot_prod<vsi2qi>"
DONE;
})
+;; Auto-vectorizer pattern for usdot. The operand[3] and operand[0] are the
+;; RMW parameters that when it comes to the vectorizer.
+(define_expand "usdot_prod<vsi2qi>"
+ [(set (match_operand:VS 0 "register_operand")
+ (plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
+ (match_operand:<VSI2QI> 2 "register_operand")]
+ UNSPEC_USDOT)
+ (match_operand:VS 3 "register_operand")))]
+ "TARGET_I8MM"
+)
+
;; These instructions map to the __builtins for the Dot Product
;; indexed operations.
(define_insn "aarch64_<sur>dot_lane<vsi2qi>"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
{
- return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
+ return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
}
__extension__ extern __inline int32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
{
- return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
+ return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
}
__extension__ extern __inline int32x2_t
--
[-- Attachment #2: rb14659.patch --]
[-- Type: text/x-diff, Size: 3125 bytes --]
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 063f503ebd96657f017dfaa067cb231991376bda..ac5d4fc7ff1e61d404e66193b629986382ee4ffd 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -374,11 +374,10 @@
BUILTIN_VSDQ_I_DI (BINOP, srshl, 0, NONE)
BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0, NONE)
- /* Implemented by <sur><dotprod>_prod<dot_mode>. */
+ /* Implemented by aarch64_<sur><dotprod>{_lane}{q}<dot_mode>. */
BUILTIN_VB (TERNOP, sdot, 0, NONE)
BUILTIN_VB (TERNOPU, udot, 0, NONE)
- BUILTIN_VB (TERNOP_SSUS, usdot_prod, 10, NONE)
- /* Implemented by aarch64_<sur><dotprod>_lane{q}<dot_mode>. */
+ BUILTIN_VB (TERNOP_SSUS, usdot, 0, NONE)
BUILTIN_VB (QUADOP_LANE, sdot_lane, 0, NONE)
BUILTIN_VB (QUADOPU_LANE, udot_lane, 0, NONE)
BUILTIN_VB (QUADOP_LANE, sdot_laneq, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 74890989cb3045798bf8d0241467eaaf72238297..7397f1ec5ca0cb9e3cdd5c46772f604e640666e4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -601,7 +601,7 @@ (define_insn "aarch64_<sur>dot<vsi2qi>"
;; These instructions map to the __builtins for the armv8.6a I8MM usdot
;; (vector) Dot Product operation.
-(define_insn "usdot_prod<vsi2qi>"
+(define_insn "aarch64_usdot<vsi2qi>"
[(set (match_operand:VS 0 "register_operand" "=w")
(plus:VS
(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
@@ -648,6 +648,17 @@ (define_expand "<sur>dot_prod<vsi2qi>"
DONE;
})
+;; Auto-vectorizer pattern for usdot. The operand[3] and operand[0] are the
+;; RMW parameters that when it comes to the vectorizer.
+(define_expand "usdot_prod<vsi2qi>"
+ [(set (match_operand:VS 0 "register_operand")
+ (plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
+ (match_operand:<VSI2QI> 2 "register_operand")]
+ UNSPEC_USDOT)
+ (match_operand:VS 3 "register_operand")))]
+ "TARGET_I8MM"
+)
+
;; These instructions map to the __builtins for the Dot Product
;; indexed operations.
(define_insn "aarch64_<sur>dot_lane<vsi2qi>"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ace4fc7f43e2040a8 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
{
- return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
+ return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
}
__extension__ extern __inline int32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
{
- return __builtin_aarch64_usdot_prodv16qi_ssus (__r, __a, __b);
+ return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
}
__extension__ extern __inline int32x2_t
next prev parent reply other threads:[~2021-07-15 16:40 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-15 16:39 [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 Tamar Christina
2021-07-15 16:39 ` Tamar Christina [this message]
2021-07-15 19:34 ` [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs Richard Sandiford
2021-07-20 12:34 ` Tamar Christina
2021-07-20 16:15 ` Richard Sandiford
2021-07-22 11:50 ` Tamar Christina
2021-07-22 18:09 ` Richard Sandiford
2021-07-15 16:40 ` [PATCH 3/4]AArch64: correct dot-product RTL patterns for aarch64 Tamar Christina
2021-07-15 19:44 ` Richard Sandiford
2021-07-22 11:51 ` Tamar Christina
2021-07-22 18:11 ` Richard Sandiford
2021-07-23 8:14 ` Tamar Christina
2021-07-26 13:56 ` Richard Sandiford
2021-07-15 16:40 ` [PATCH 4/4][AArch32]: correct dot-product RTL patterns Tamar Christina
2021-07-16 2:20 ` [PATCH 1/4][committed] testsuite: Fix testisms in scalar tests PR101457 H.J. Lu
2021-07-16 8:42 ` Tamar Christina
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210715163953.GA2861@arm.com \
--to=tamar.christina@arm.com \
--cc=Kyrylo.Tkachov@arm.com \
--cc=Marcus.Shawcroft@arm.com \
--cc=Richard.Earnshaw@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=nd@arm.com \
--cc=richard.sandiford@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).