From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 67BD43854803
 for <gcc-patches@gcc.gnu.org>; Thu, 22 Jul 2021 17:16:40 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 67BD43854803
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 04B8E106F;
 Thu, 22 Jul 2021 10:16:40 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.126])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 687A03F694;
 Thu, 22 Jul 2021 10:16:39 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Jonathan Wright <Jonathan.Wright@arm.com>
Mail-Followup-To: Jonathan Wright <Jonathan.Wright@arm.com>,
 "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
 Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>, richard.sandiford@arm.com
Cc: "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
 Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
Subject: Re: [PATCH] aarch64: Don't include vec_select in SIMD multiply cost
References: <DB9PR08MB69592E736E56DA740E93EB56EBE29@DB9PR08MB6959.eurprd08.prod.outlook.com>
Date: Thu, 22 Jul 2021 18:16:38 +0100
In-Reply-To: <DB9PR08MB69592E736E56DA740E93EB56EBE29@DB9PR08MB6959.eurprd08.prod.outlook.com>
 (Jonathan Wright's message of "Tue, 20 Jul 2021 11:46:39 +0100")
Message-ID: <mptbl6utgjt.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 22 Jul 2021 17:16:42 -0000

Jonathan Wright <Jonathan.Wright@arm.com> writes:
> Hi,
>
> The Neon multiply/multiply-accumulate/multiply-subtract instructions
> can take various forms - multiplying full vector registers of values
> or multiplying one vector by a single element of another. Regardless
> of the form used, these instructions have the same cost, and this
> should be reflected by the RTL cost function.
>
> This patch adds RTL tree traversal in the Neon multiply cost function
> to match the vec_select used by the lane-referencing forms of the
> instructions already mentioned. This traversal prevents the cost of
> the vec_select from being added into the cost of the multiply -
> meaning that these instructions can now be emitted in the combine
> pass as they are no longer deemed prohibitively expensive.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-19  Jonathan Wright  <jonathan.wright@arm.com>
>
>         * config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Traverse
>         RTL tree to prevents vec_select from being added into Neon
>         multiply cost.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index f5b25a7f7041645921e6ad85714efda73b993492..b368303b0e699229266e6d008=
e28179c496bf8cd 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -11985,6 +11985,21 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code=
, int outer, bool speed)
>  	    op0 =3D XEXP (op0, 0);
>  	  else if (GET_CODE (op1) =3D=3D VEC_DUPLICATE)
>  	    op1 =3D XEXP (op1, 0);
> +	  /* The same argument applies to the VEC_SELECT when using the lane-
> +	     referencing forms of the MUL/MLA/MLS instructions. Without the
> +	     traversal here, the combine pass deems these patterns too
> +	     expensive and subsequently does not emit the lane-referencing
> +	     forms of the instructions. In addition, canonical form is for the
> +	     VEC_SELECT to be the second argument of the multiply - thus only
> +	     op1 is traversed.  */
> +	  if (GET_CODE (op1) =3D=3D VEC_SELECT
> +	      && GET_MODE_NUNITS (GET_MODE (op1)).to_constant () =3D=3D 1)
> +	    op1 =3D XEXP (op1, 0);
> +	  else if ((GET_CODE (op1) =3D=3D ZERO_EXTEND
> +		    || GET_CODE (op1) =3D=3D SIGN_EXTEND)
> +		   && GET_CODE (XEXP (op1, 0)) =3D=3D VEC_SELECT
> +		   && GET_MODE_NUNITS (GET_MODE (op1)).to_constant () =3D=3D 1)
> +	    op1 =3D XEXP (XEXP (op1, 0), 0);

I think this logically belongs in the =E2=80=9CGET_CODE (op1) =3D=3D VEC_DU=
PLICATE=E2=80=9D
if block, since the condition is never true otherwise.  We can probably
skip the GET_MODE_NUNITS tests, but if you'd prefer to keep them, I think
it would be better to add them to the existing VEC_DUPLICATE tests rather
than restrict them to the VEC_SELECT ones.

Also, although this is in Advanced SIMD-specific code, I think it'd be
better to use:

  is_a<scalar_mode> (GET_MODE (op1))

instead of:

  GET_MODE_NUNITS (GET_MODE (op1)).to_constant () =3D=3D 1

Do you have a testcase?

Thanks,
Richard