From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 53E113858C39
 for <gcc-patches@gcc.gnu.org>; Sat, 23 Oct 2021 10:39:50 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 53E113858C39
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A66221FB;
 Sat, 23 Oct 2021 03:39:49 -0700 (PDT)
Received: from localhost (unknown [10.32.98.88])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 9E1D03F70D;
 Sat, 23 Oct 2021 03:39:47 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
Mail-Followup-To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>,
 Tamar Christina <Tamar.Christina@arm.com>,
 Richard Earnshaw <Richard.Earnshaw@arm.com>, nd <nd@arm.com>,
 Marcus Shawcroft <Marcus.Shawcroft@arm.com>, richard.sandiford@arm.com
Cc: Tamar Christina <Tamar.Christina@arm.com>,
 Richard Earnshaw <Richard.Earnshaw@arm.com>, nd <nd@arm.com>,
 Marcus Shawcroft <Marcus.Shawcroft@arm.com>
Subject: Re: [PATCH 2/2]AArch64: Add better costing for vector constants and
 operations
References: <patch-14774-tamar@arm.com> <mptsfyplk9a.fsf@arm.com>
 <VI1PR08MB53259D8B7FC65B7F193B1589FFCC9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mpt5yvllhsd.fsf@arm.com>
 <VI1PR08MB5325A025B1855D206116FF0AFFCC9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mptlf4hjw8z.fsf@arm.com>
 <VI1PR08MB532531694F553DEE5D5A689AFFD49@VI1PR08MB5325.eurprd08.prod.outlook.com>
Date: Sat, 23 Oct 2021 11:39:46 +0100
In-Reply-To: <VI1PR08MB532531694F553DEE5D5A689AFFD49@VI1PR08MB5325.eurprd08.prod.outlook.com>
 (Tamar Christina via Gcc-patches's message of "Wed, 8 Sep 2021
 12:58:15 +0000")
Message-ID: <mptmtn0t47h.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Sat, 23 Oct 2021 10:39:52 -0000

Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> I'm still a bit sceptical about treating the high-part cost as lower.
>> ISTM that the subreg cases are the ones that are truly “free” and any others
>> should have a normal cost.  So if CSE handled the subreg case itself (to model
>> how the rtx would actually be generated) then aarch64 code would have to
>> do less work.  I imagine that will be true for other targets as well.
>
> I guess the main problem is that CSE lacks context because it's not until after
> combine that the high part becomes truly "free" when pushed into a high operation.

Yeah.  And the aarch64 code is just being asked to cost the operation
it's given, which could for example come from an existing
aarch64_simd_mov_from_<mode>high.  I think we should try to ensure that
an aarch64_simd_mov_from_<mode>high followed by some arithmetic on the
result is more expensive than the fused operation (when fusing is
possible).

An analogy might be: if the cost code is given:

  (add (reg X) (reg Y))

then, at some later point, the (reg X) might be replaced with a
multiplication, in which case we'd have a MADD operation and the
addition is effectively free.  Something similar would happen if
(reg X) became a shift by a small amount on newer cores, although
I guess then you could argue either that the cost of the add
disappears or that the cost of the shift disappears.

But we shouldn't count ADD as free on the basis that it could be
combined with a multiplication or shift in future.  We have to cost
what we're given.  I think the same thing applies to the high part.
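
To make the analogy concrete, here is a minimal sketch of a cost function
that only gives the fused (MADD-style) cost when the multiplication is
actually present in the expression being costed.  Everything in it is
hypothetical: the toy_rtx type, the toy_cost function and the cost numbers
are made up for illustration and are not GCC's rtx representation or any
real tuning table.

```c
#include <stddef.h>

/* Toy expression codes: a hypothetical stand-in for real rtx codes.  */
enum toy_code { TOY_REG, TOY_MULT, TOY_PLUS };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0, *op1;   /* NULL for TOY_REG.  */
};

/* Made-up instruction costs, chosen only so that MUL + ADD > MADD.  */
enum { COST_ADD = 4, COST_MUL = 12, COST_MADD = 12 };

/* Cost the expression exactly as given.  */
static int
toy_cost (const struct toy_rtx *x)
{
  switch (x->code)
    {
    case TOY_REG:
      return 0;

    case TOY_MULT:
      return COST_MUL + toy_cost (x->op0) + toy_cost (x->op1);

    case TOY_PLUS:
      /* The addition is only "free" when the multiplication is already
         there to fuse with.  A plain (add (reg X) (reg Y)) pays the
         full ADD cost, even though X might become a MULT later.  */
      if (x->op0->code == TOY_MULT)
        return (COST_MADD
                + toy_cost (x->op0->op0)
                + toy_cost (x->op0->op1)
                + toy_cost (x->op1));
      return COST_ADD + toy_cost (x->op0) + toy_cost (x->op1);
    }
  return 0;
}
```

With these made-up numbers, a lone (add (reg X) (reg Y)) costs 4, a
separate multiplication followed by an addition costs 12 + 4 = 16, and the
fused form costs 12.  So the unfused sequence is more expensive than the
fused one without the ADD ever being treated as free on its own.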

Here we're trying to prevent cse1 from replacing a DUP (lane) with
a MOVI by saying that the DUP is strictly cheaper than the MOVI.
I don't think that's really true though, and the cost tables in the
patch say that DUP is more expensive (rather than less expensive)
than MOVI.

Also, if I've understood correctly, it looks like we'd be relying
on the vget_high of a constant remaining unfolded until RTL cse1.
I think it's likely in future that we'd try to fold vget_high
at the gimple level instead, since that could expose more
optimisations of a different kind.  The gimple optimisers would
then fold vget_high(constant) in a similar way to how cse1 does now.

So perhaps we should continue to allow the vget_high(constant)
to be folded in cse1 and come up with some way of coping with
the folded form.
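
As a rough illustration of what "the folded form" means here, the sketch
below models folding vget_high on a constant 4 x 32-bit vector: the result
is simply the constant made up of lanes 2 and 3.  The function names are
hypothetical and this is plain scalar C, not the actual cse1 or gimple
folding code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of folding vget_high on a constant vector of four 32-bit
   lanes: the high half is lanes 2 and 3.  */
static void
toy_fold_get_high (const int32_t in[4], int32_t out[2])
{
  out[0] = in[2];
  out[1] = in[3];
}

/* Return true if all N lanes of V hold the same value, i.e. the
   constant is a duplicate of a single scalar.  */
static bool
toy_is_splat (const int32_t *v, size_t n)
{
  for (size_t i = 1; i < n; ++i)
    if (v[i] != v[0])
      return false;
  return true;
}
```

For example, folding the high half of { 1, 2, 7, 7 } gives { 7, 7 }, which
is a duplicated constant even though the full vector is not.  Whatever
handles the folded form would be looking at a constant like that rather
than at an extraction from the original vector.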

Thanks,
Richard