From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 90C6C384B110
 for <gcc-patches@gcc.gnu.org>; Tue, 20 Jul 2021 16:15:40 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 90C6C384B110
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4497131B;
 Tue, 20 Jul 2021 09:15:40 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 544093F694;
 Tue, 20 Jul 2021 09:15:39 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Mail-Followup-To: Tamar Christina <Tamar.Christina@arm.com>,
 "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>,
 Richard Earnshaw <Richard.Earnshaw@arm.com>,
 Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
 Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>, richard.sandiford@arm.com
Cc: "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>,
 Richard Earnshaw <Richard.Earnshaw@arm.com>,
 Marcus Shawcroft <Marcus.Shawcroft@arm.com>,
 Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs
References: <20210715163953.GA2861@arm.com> <mptczrj9xrn.fsf@arm.com>
 <VI1PR08MB5325F04470C2A82250FABB7AFFE29@VI1PR08MB5325.eurprd08.prod.outlook.com>
Date: Tue, 20 Jul 2021 17:15:38 +0100
In-Reply-To: <VI1PR08MB5325F04470C2A82250FABB7AFFE29@VI1PR08MB5325.eurprd08.prod.outlook.com>
 (Tamar Christina's message of "Tue, 20 Jul 2021 12:34:52 +0000")
Message-ID: <mptim15vu51.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Jul 2021 16:15:42 -0000

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Thursday, July 15, 2021 8:35 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw
>> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
>> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
>> Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics
>> optabs
>>=20
>> Tamar Christina <tamar.christina@arm.com> writes:
>> > Hi All,
>> >
>> > There's a slight mismatch between the vectorizer optabs and the
>> > intrinsics patterns for NEON.  The vectorizer expects operands[3] and
>> > operands[0] to be the same but the aarch64 intrinsics expanders expect
>> > operands[0] and operands[1] to be the same.
>> >
>> > This means we need different patterns here.  This adds a separate
>> > usdot vectorizer pattern which just shuffles around the RTL params.
>> >
>> > There's also an inconsistency between the usdot and (u|s)dot
>> > intrinsics RTL patterns which is not corrected here.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>>=20
>> Couldn't we just change:
>>=20
>> > diff --git a/gcc/config/aarch64/arm_neon.h
>> > b/gcc/config/aarch64/arm_neon.h index
>> >
>> 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ac
>> e4f
>> > c7f43e2040a8 100644
>> > --- a/gcc/config/aarch64/arm_neon.h
>> > +++ b/gcc/config/aarch64/arm_neon.h
>> > @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t
>> > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> >  vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)  {
>> > -  return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b);
>> > +  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
>>=20
>> =E2=80=A6this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) e=
tc.?
>
> Not easily, as I was mentioning before, Neon intrinsics have the assumpti=
on that
> operands[0] and operands[1] are the same. And this goes much further than=
 just
> the header call.
>
> The actual type is determined by the optabs and the C stubs that are gene=
rated.
>
> aarch64_init_simd_builtins which creates the C function stubs starts proc=
essing
> arguments from the end and on non-void functions assumes that the value at
> operands[0] be the return type. So simply moving __r will get it to think=
 that
> the result type should be uint8x8_t.

Yeah, the mode of operand 0 (i.e. the output) determines the return type.
But that mode isn't changing, so the return type will be correct for both
input operand orders.  It works for me locally with:

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch6=
4-simd.md
index 88fa5ba5a44..5987d9af7c6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -610,12 +610,12 @@ (define_expand "cmul<conj_op><mode>3"
 ;; and so the vectorizer provides r, in which the result has to be accumul=
ated.
 (define_insn "<sur>dot_prod<vsi2qi>"
   [(set (match_operand:VS 0 "register_operand" "=3Dw")
-	(plus:VS (match_operand:VS 1 "register_operand" "0")
-		(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
-			    (match_operand:<VSI2QI> 3 "register_operand" "w")]
-		DOTPROD)))]
+	(plus:VS (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand" "w")
+			     (match_operand:<VSI2QI> 2 "register_operand" "w")]
+			    DOTPROD)
+		 (match_operand:VS 3 "register_operand" "0")))]
   "TARGET_DOTPROD"
-  "<sur>dot\\t%0.<Vtype>, %2.<Vdottype>, %3.<Vdottype>"
+  "<sur>dot\\t%0.<Vtype>, %1.<Vdottype>, %2.<Vdottype>"
   [(set_attr "type" "neon_dot<q>")]
 )
=20
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 597f44ce106..64b6d43a1a0 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -31767,28 +31767,28 @@ __extension__ extern __inline uint32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdot_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b)
 {
-  return __builtin_aarch64_udot_prodv8qi_uuuu (__r, __a, __b);
+  return __builtin_aarch64_udot_prodv8qi_uuuu (__a, __b, __r);
 }
=20
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b)
 {
-  return __builtin_aarch64_udot_prodv16qi_uuuu (__r, __a, __b);
+  return __builtin_aarch64_udot_prodv16qi_uuuu (__a, __b, __r);
 }
=20
 __extension__ extern __inline int32x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdot_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b)
 {
-  return __builtin_aarch64_sdot_prodv8qi (__r, __a, __b);
+  return __builtin_aarch64_sdot_prodv8qi (__a, __b, __r);
 }
=20
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vdotq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b)
 {
-  return __builtin_aarch64_sdot_prodv16qi (__r, __a, __b);
+  return __builtin_aarch64_sdot_prodv16qi (__a, __b, __r);
 }
=20
 __extension__ extern __inline uint32x2_t

Thanks,
Richard