From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 90C6C384B110 for ; Tue, 20 Jul 2021 16:15:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 90C6C384B110 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4497131B; Tue, 20 Jul 2021 09:15:40 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 544093F694; Tue, 20 Jul 2021 09:15:39 -0700 (PDT) From: Richard Sandiford To: Tamar Christina Mail-Followup-To: Tamar Christina , "gcc-patches\@gcc.gnu.org" , nd , Richard Earnshaw , Marcus Shawcroft , Kyrylo Tkachov , richard.sandiford@arm.com Cc: "gcc-patches\@gcc.gnu.org" , nd , Richard Earnshaw , Marcus Shawcroft , Kyrylo Tkachov Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics optabs References: <20210715163953.GA2861@arm.com> Date: Tue, 20 Jul 2021 17:15:38 +0100 In-Reply-To: (Tamar Christina's message of "Tue, 20 Jul 2021 12:34:52 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jul 2021 16:15:42 -0000 Tamar Christina writes: >> -----Original Message----- >> From: Richard Sandiford >> Sent: Thursday, July 15, 2021 8:35 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; Kyrylo Tkachov >> Subject: Re: [PATCH 2/4]AArch64: correct usdot vectorizer and intrinsics >> optabs >>=20 >> Tamar Christina writes: >> > Hi All, >> > >> > There's a slight mismatch between the vectorizer optabs and the >> > intrinsics patterns for NEON. The vectorizer expects operands[3] and >> > operands[0] to be the same but the aarch64 intrinsics expanders expect >> > operands[0] and operands[1] to be the same. >> > >> > This means we need different patterns here. This adds a separate >> > usdot vectorizer pattern which just shuffles around the RTL params. >> > >> > There's also an inconsistency between the usdot and (u|s)dot >> > intrinsics RTL patterns which is not corrected here. >> > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> > >> > Ok for master? >>=20 >> Couldn't we just change: >>=20 >> > diff --git a/gcc/config/aarch64/arm_neon.h >> > b/gcc/config/aarch64/arm_neon.h index >> > >> 00d76ea937ace5763746478cbdfadf6479e0b15a..17e059efb80fa86a8a32127ac >> e4f >> > c7f43e2040a8 100644 >> > --- a/gcc/config/aarch64/arm_neon.h >> > +++ b/gcc/config/aarch64/arm_neon.h >> > @@ -34039,14 +34039,14 @@ __extension__ extern __inline int32x2_t >> > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) >> > vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b) { >> > - return __builtin_aarch64_usdot_prodv8qi_ssus (__r, __a, __b); >> > + return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b); >>=20 >> =E2=80=A6this to __builtin_aarch64_usdot_prodv8qi_ssus (__a, __b, __r) e= tc.? > > Not easily, as I was mentioning before, Neon intrinsics have the assumpti= on that > operands[0] and operands[1] are the same. And this goes much further than= just > the header call. > > The actual type is determined by the optabs and the C stubs that are gene= rated. > > aarch64_init_simd_builtins which creates the C function stubs starts proc= essing > arguments from the end and on non-void functions assumes that the value at > operands[0] be the return type. So simply moving __r will get it to think= that > the result type should be uint8x8_t. Yeah, the mode of operand 0 (i.e. the output) determines the return type. But that mode isn't changing, so the return type will be correct for both input operand orders. It works for me locally with: diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch6= 4-simd.md index 88fa5ba5a44..5987d9af7c6 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -610,12 +610,12 @@ (define_expand "cmul3" ;; and so the vectorizer provides r, in which the result has to be accumul= ated. (define_insn "dot_prod" [(set (match_operand:VS 0 "register_operand" "=3Dw") - (plus:VS (match_operand:VS 1 "register_operand" "0") - (unspec:VS [(match_operand: 2 "register_operand" "w") - (match_operand: 3 "register_operand" "w")] - DOTPROD)))] + (plus:VS (unspec:VS [(match_operand: 1 "register_operand" "w") + (match_operand: 2 "register_operand" "w")] + DOTPROD) + (match_operand:VS 3 "register_operand" "0")))] "TARGET_DOTPROD" - "dot\\t%0., %2., %3." + "dot\\t%0., %1., %2." [(set_attr "type" "neon_dot")] ) =20 diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 597f44ce106..64b6d43a1a0 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -31767,28 +31767,28 @@ __extension__ extern __inline uint32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdot_u32 (uint32x2_t __r, uint8x8_t __a, uint8x8_t __b) { - return __builtin_aarch64_udot_prodv8qi_uuuu (__r, __a, __b); + return __builtin_aarch64_udot_prodv8qi_uuuu (__a, __b, __r); } =20 __extension__ extern __inline uint32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdotq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b) { - return __builtin_aarch64_udot_prodv16qi_uuuu (__r, __a, __b); + return __builtin_aarch64_udot_prodv16qi_uuuu (__a, __b, __r); } =20 __extension__ extern __inline int32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdot_s32 (int32x2_t __r, int8x8_t __a, int8x8_t __b) { - return __builtin_aarch64_sdot_prodv8qi (__r, __a, __b); + return __builtin_aarch64_sdot_prodv8qi (__a, __b, __r); } =20 __extension__ extern __inline int32x4_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vdotq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b) { - return __builtin_aarch64_sdot_prodv16qi (__r, __a, __b); + return __builtin_aarch64_sdot_prodv16qi (__a, __b, __r); } =20 __extension__ extern __inline uint32x2_t Thanks, Richard