From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id DDF903857363
 for <gcc-patches@gcc.gnu.org>; Mon, 16 May 2022 13:24:07 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DDF903857363
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6968D1063;
 Mon, 16 May 2022 06:24:07 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 8F8083F73D;
 Mon, 16 May 2022 06:24:06 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Mail-Followup-To: Tamar Christina <Tamar.Christina@arm.com>,
 "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>,
 "rguenther\@suse.de" <rguenther@suse.de>,
 "jeffreyalaw\@gmail.com" <jeffreyalaw@gmail.com>, richard.sandiford@arm.com
Cc: "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>,
 "rguenther\@suse.de" <rguenther@suse.de>,
 "jeffreyalaw\@gmail.com" <jeffreyalaw@gmail.com>
Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
 the method of argument promotions.
References: <patch-15721-tamar@arm.com> <mptzgjhzp3d.fsf@arm.com>
 <VI1PR08MB53258489F144549575D8E233FFCF9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 <mptee0tznb5.fsf@arm.com> <mpta6bhzn5c.fsf@arm.com>
 <VI1PR08MB5325AEC08030B2414491AF0AFFCF9@VI1PR08MB5325.eurprd08.prod.outlook.com>
Date: Mon, 16 May 2022 14:24:05 +0100
In-Reply-To: <VI1PR08MB5325AEC08030B2414491AF0AFFCF9@VI1PR08MB5325.eurprd08.prod.outlook.com>
 (Tamar Christina's message of "Mon, 16 May 2022 13:02:55 +0000")
Message-ID: <mptlev1y5je.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 16 May 2022 13:24:10 -0000

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Monday, May 16, 2022 1:18 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
>> jeffreyalaw@gmail.com
>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target de=
cide
>> the method of argument promotions.
>>=20
>> Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > Tamar Christina <Tamar.Christina@arm.com> writes:
>> >>> -----Original Message-----
>> >>> From: Richard Sandiford <richard.sandiford@arm.com>
>> >>> Sent: Monday, May 16, 2022 12:36 PM
>> >>> To: Tamar Christina <Tamar.Christina@arm.com>
>> >>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
>> >>> jeffreyalaw@gmail.com
>> >>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the
>> >>> target decide the method of argument promotions.
>> >>>
>> >>> Tamar Christina <tamar.christina@arm.com> writes:
>> >>> > Hi All,
>> >>> >
>> >>> > Some targets require function parameters to be promoted to a
>> >>> > different type on expand time because the target may not have
>> >>> > native instructions to work on such types.  As an example the
>> >>> > AArch64 port does not have native instructions working on integer
>> >>> > 8- or 16-bit values.  As such it promotes every parameter of these
>> types to 32-bits.
>> >>>
>> >>> This doesn't seem specific to parameters though.  It applies to any
>> >>> 8- or 16-bit variable.  E.g.:
>> >>>
>> >>> #include <stdint.h>
>> >>> uint8_t foo(uint32_t x, uint32_t y) {
>> >>>     uint8_t z =3D x !=3D 0 ? x : y;
>> >>>     return z + 1;
>> >>> }
>> >>>
>> >>> generates:
>> >>>
>> >>> foo:
>> >>>         cmp     w0, 0
>> >>>         and     w1, w1, 255
>> >>>         and     w0, w0, 255
>> >>>         csel    w0, w1, w0, eq
>> >>>         add     w0, w0, 1
>> >>>         ret
>> >>>
>> >>> So I think the new behaviour is really a modification of the
>> >>> PROMOTE_MODE behaviour rather than the
>> PROMOTE_FUNCTION_MODE behaviour.
>> >>>
>> >>> FWIW, I agree with Richard that it would be better not to add a new
>> hook.
>> >>> I think we're really making PROMOTE_MODE choose between
>> SIGN_EXTEND,
>> >>> ZERO_EXTEND or SUBREG (what LLVM would call =E2=80=9Cany
>> >>> extend=E2=80=9D) rather than the current SIGN_EXTEND vs. ZERO_EXTEND
>> choice.
>> >>
>> >> Ah, I hadn't realized this also applied to locals.. ok I can modify
>> >> PROMOTE_MODE then, but I also need the actual SSA_NAME and not just
>> the type so will have to pass this along.
>> >>
>> >> From a practical point of view.. the actual hook however is
>> >> implemented by 34 targets, would I need to CC maintainers for each of
>> >> them or would global maintainer approval suffice for these mostly
>> mechanical changes?
>> >
>> > Yeah, single approval should be enough mechanical changes.  It would
>> > be good to do the interface change and mechanical target changes as a
>> > separate prepatch if possible though.
>> >
>> > I'm not sure about passing the SSA name to the target though, or the
>> > way that the aarch64 hook uses the info.  It looks like a single cold
>> > comparison could defeat the optimisation for hot code.
>
> I'm not sure I follow why the likelihood of the comparison matters in thi=
s case..
> I'll expand on it below..

I meant the likelihood that the comparison is executed at all,
not which outcome is more likely.  E.g. suppose the only comparison
occurs on a failure path that eventually calls abort, and that there are
other paths (without comparisons of the same value) that would benefit
from the any-extend optimisation.  We'd prioritise the cold comparison
over optimising the other (hot) code.

I'm just suspicious of heuristics along the lines of =E2=80=9Cdon't do X
if there is a single instance of Y=E2=80=9D. :-)

>> > If we do try to make the decision based on uses at expand time, it
>> > might be better for the analysis to be in target-independent code,
>> > with help from the target to decide where extensions are cheap.  It
>> > still feels a bit hacky though.
>
> I thought about it but can't see most target having this problem. I did go
> with an optimistic heuristics. There are of course various ways to defeat=
 it
> but looking through the corpus of code I don't see any but the simple cas=
es
> in practice. (more below).
>
>> >
>> > What stops us from forming cbz/cbnz when the extension is done close
>> > to the comparison (from the comment in 2/3)?  If we can solve that,
>> > could we simply do an any-extend all the time, and treat removing
>> > redundant extensions as a global availability problem?
>>=20
>
> In such cases there's no gain from doing the extension at all, e.g.
> and w0, w0, 255
> cmp w0, 0
> b.eq .Lfoo
>
> will be optimized to
>
> tst w0, 0xff
> b.ne .Lfoo
>
> already.
>
> In RTL the problem occurs when you have nested control flow like nested i=
f and switch statements
> The example in 2/3 was intended to show that before what we'd do is
>
> and w22, w0, 255
> .... <code that clobbers cc and caller saves>
> <switch1>
> cbz w22, .Lfoo1
> ....
> <switch2>
> cbz w22, .Lfoo2
>
> If we have a single comparison we already sink the zero_extend today.
>
> Now if we instead any-extend w0 we end up with:
>
> mov w22, w0
> .... <code that clobbers cc and caller saves>
> <switch1>
> tst w22, 0xff
> b.ne .Lfoo1
> ....
> <switch2>
> tst w22, 0xff
> b.ne .Lfoo2
>
> So you get an additional tst before each branch. You also can't perform t=
he tst higher since CC is clobbered.
> The cbz is nice because the zero extend doesn't use CC of course and so h=
aving the value live allows us to optimize
> The branch.

Once the cbz has been formed (in combine), where does the optimisation
of it happen?

> I don't think branch likeliness matters here as you must keep w22 live in=
 both cases somehow. In the SPECCPU 2017
> Benchmark perlbench (which uses a lot of nested switches) this pattern is=
 responsible for an extra 0.3% codesize increase
> which the approach in 2/3 prevents.
>
>> (which would run after combine)
>>=20
>> >
>> > What kind of code do we emit when do an extension just before an
>> > operation?  If the DECL_RTL is (subreg:QI (reg:SI R) 0), say, then it
>> > should be safe to do the extension directly into R:
>> >
>> >   (set (reg:SI X) (zero_extend:SI (subreg:QI (reg:SI X))))
>>=20
>> Oops, that should of course be:
>>=20
>>   (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))
>>=20
>> > which avoids the problem of having two values live at once (the
>> > zero-extended value and the any-extended value).
>
> I'm not sure it does, as the any-extended value must remain live. i.e. ab=
ove you can't get rid of w22,
> you can only choose between having it be zero of any extended.  But I am =
not sure how you would do
> that after expand.

These per-operation extends are emitted during expand.  The question is
whether we do them into fresh registers:

   (set (reg:SI Rtmp) (zero_extend:SI (subreg:QI (reg:SI R))))

which leaves both R and Rtmp live at the same time, or whether we
do them in-situ:

   (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))

Expand should know that the latter is valid, given the DECL_RTL.

Thanks,
Richard