From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 67BD43854803 for ; Thu, 22 Jul 2021 17:16:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 67BD43854803 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 04B8E106F; Thu, 22 Jul 2021 10:16:40 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 687A03F694; Thu, 22 Jul 2021 10:16:39 -0700 (PDT) From: Richard Sandiford To: Jonathan Wright Mail-Followup-To: Jonathan Wright , "gcc-patches\@gcc.gnu.org" , Kyrylo Tkachov , richard.sandiford@arm.com Cc: "gcc-patches\@gcc.gnu.org" , Kyrylo Tkachov Subject: Re: [PATCH] aarch64: Don't include vec_select in SIMD multiply cost References: Date: Thu, 22 Jul 2021 18:16:38 +0100 In-Reply-To: (Jonathan Wright's message of "Tue, 20 Jul 2021 11:46:39 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Jul 2021 17:16:42 -0000 Jonathan Wright writes: > Hi, > > The Neon multiply/multiply-accumulate/multiply-subtract instructions > can take various forms - multiplying full vector registers of values > or multiplying one vector by a single element of another. Regardless > of the form used, these instructions have the same cost, and this > should be reflected by the RTL cost function. > > This patch adds RTL tree traversal in the Neon multiply cost function > to match the vec_select used by the lane-referencing forms of the > instructions already mentioned. This traversal prevents the cost of > the vec_select from being added into the cost of the multiply - > meaning that these instructions can now be emitted in the combine > pass as they are no longer deemed prohibitively expensive. > > Regression tested and bootstrapped on aarch64-none-linux-gnu - no > issues. > > Ok for master? > > Thanks, > Jonathan > > --- > > gcc/ChangeLog: > > 2021-07-19 Jonathan Wright > > * config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Traverse > RTL tree to prevents vec_select from being added into Neon > multiply cost. > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index f5b25a7f7041645921e6ad85714efda73b993492..b368303b0e699229266e6d008= e28179c496bf8cd 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -11985,6 +11985,21 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code= , int outer, bool speed) > op0 =3D XEXP (op0, 0); > else if (GET_CODE (op1) =3D=3D VEC_DUPLICATE) > op1 =3D XEXP (op1, 0); > + /* The same argument applies to the VEC_SELECT when using the lane- > + referencing forms of the MUL/MLA/MLS instructions. Without the > + traversal here, the combine pass deems these patterns too > + expensive and subsequently does not emit the lane-referencing > + forms of the instructions. In addition, canonical form is for the > + VEC_SELECT to be the second argument of the multiply - thus only > + op1 is traversed. */ > + if (GET_CODE (op1) =3D=3D VEC_SELECT > + && GET_MODE_NUNITS (GET_MODE (op1)).to_constant () =3D=3D 1) > + op1 =3D XEXP (op1, 0); > + else if ((GET_CODE (op1) =3D=3D ZERO_EXTEND > + || GET_CODE (op1) =3D=3D SIGN_EXTEND) > + && GET_CODE (XEXP (op1, 0)) =3D=3D VEC_SELECT > + && GET_MODE_NUNITS (GET_MODE (op1)).to_constant () =3D=3D 1) > + op1 =3D XEXP (XEXP (op1, 0), 0); I think this logically belongs in the =E2=80=9CGET_CODE (op1) =3D=3D VEC_DU= PLICATE=E2=80=9D if block, since the condition is never true otherwise. We can probably skip the GET_MODE_NUNITS tests, but if you'd prefer to keep them, I think it would be better to add them to the existing VEC_DUPLICATE tests rather than restrict them to the VEC_SELECT ones. Also, although this is in Advanced SIMD-specific code, I think it'd be better to use: is_a (GET_MODE (op1)) instead of: GET_MODE_NUNITS (GET_MODE (op1)).to_constant () =3D=3D 1 Do you have a testcase? Thanks, Richard