RE: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Sandiford <Richard.Sandiford@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, "rguenther@suse.de" <rguenther@suse.de>,
	"jeffreyalaw@gmail.com" <jeffreyalaw@gmail.com>
Subject: RE: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions.
Date: Mon, 16 May 2022 13:02:55 +0000	[thread overview]
Message-ID: <VI1PR08MB5325AEC08030B2414491AF0AFFCF9@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <mpta6bhzn5c.fsf@arm.com>

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Monday, May 16, 2022 1:18 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
> jeffreyalaw@gmail.com
> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide
> the method of argument promotions.
> 
> Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > Tamar Christina <Tamar.Christina@arm.com> writes:
> >>> -----Original Message-----
> >>> From: Richard Sandiford <richard.sandiford@arm.com>
> >>> Sent: Monday, May 16, 2022 12:36 PM
> >>> To: Tamar Christina <Tamar.Christina@arm.com>
> >>> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; rguenther@suse.de;
> >>> jeffreyalaw@gmail.com
> >>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the
> >>> target decide the method of argument promotions.
> >>>
> >>> Tamar Christina <tamar.christina@arm.com> writes:
> >>> > Hi All,
> >>> >
> >>> > Some targets require function parameters to be promoted to a
> >>> > different type on expand time because the target may not have
> >>> > native instructions to work on such types.  As an example the
> >>> > AArch64 port does not have native instructions working on integer
> >>> > 8- or 16-bit values.  As such it promotes every parameter of these
> types to 32-bits.
> >>>
> >>> This doesn't seem specific to parameters though.  It applies to any
> >>> 8- or 16-bit variable.  E.g.:
> >>>
> >>> #include <stdint.h>
> >>> uint8_t foo(uint32_t x, uint32_t y) {
> >>>     uint8_t z = x != 0 ? x : y;
> >>>     return z + 1;
> >>> }
> >>>
> >>> generates:
> >>>
> >>> foo:
> >>>         cmp     w0, 0
> >>>         and     w1, w1, 255
> >>>         and     w0, w0, 255
> >>>         csel    w0, w1, w0, eq
> >>>         add     w0, w0, 1
> >>>         ret
> >>>
> >>> So I think the new behaviour is really a modification of the
> >>> PROMOTE_MODE behaviour rather than the
> PROMOTE_FUNCTION_MODE behaviour.
> >>>
> >>> FWIW, I agree with Richard that it would be better not to add a new
> hook.
> >>> I think we're really making PROMOTE_MODE choose between
> SIGN_EXTEND,
> >>> ZERO_EXTEND or SUBREG (what LLVM would call “any
> >>> extend”) rather than the current SIGN_EXTEND vs. ZERO_EXTEND
> choice.
> >>
> >> Ah, I hadn't realized this also applied to locals.. ok I can modify
> >> PROMOTE_MODE then, but I also need the actual SSA_NAME and not just
> the type so will have to pass this along.
> >>
> >> From a practical point of view.. the actual hook however is
> >> implemented by 34 targets, would I need to CC maintainers for each of
> >> them or would global maintainer approval suffice for these mostly
> mechanical changes?
> >
> > Yeah, single approval should be enough mechanical changes.  It would
> > be good to do the interface change and mechanical target changes as a
> > separate prepatch if possible though.
> >
> > I'm not sure about passing the SSA name to the target though, or the
> > way that the aarch64 hook uses the info.  It looks like a single cold
> > comparison could defeat the optimisation for hot code.

I'm not sure I follow why the likelihood of the comparison matters in this case..
I'll expand on it below..

> >
> > If we do try to make the decision based on uses at expand time, it
> > might be better for the analysis to be in target-independent code,
> > with help from the target to decide where extensions are cheap.  It
> > still feels a bit hacky though.

I thought about it but can't see most target having this problem. I did go
with an optimistic heuristics. There are of course various ways to defeat it
but looking through the corpus of code I don't see any but the simple cases
in practice. (more below).

> >
> > What stops us from forming cbz/cbnz when the extension is done close
> > to the comparison (from the comment in 2/3)?  If we can solve that,
> > could we simply do an any-extend all the time, and treat removing
> > redundant extensions as a global availability problem?
> 

In such cases there's no gain from doing the extension at all, e.g.
and w0, w0, 255
cmp w0, 0
b.eq .Lfoo

will be optimized to

tst w0, 0xff
b.ne .Lfoo

already.

In RTL the problem occurs when you have nested control flow like nested if and switch statements
The example in 2/3 was intended to show that before what we'd do is

and w22, w0, 255
.... <code that clobbers cc and caller saves>
<switch1>
cbz w22, .Lfoo1
....
<switch2>
cbz w22, .Lfoo2

If we have a single comparison we already sink the zero_extend today.

Now if we instead any-extend w0 we end up with:

mov w22, w0
.... <code that clobbers cc and caller saves>
<switch1>
tst w22, 0xff
b.ne .Lfoo1
....
<switch2>
tst w22, 0xff
b.ne .Lfoo2

So you get an additional tst before each branch. You also can't perform the tst higher since CC is clobbered.
The cbz is nice because the zero extend doesn't use CC of course and so having the value live allows us to optimize
The branch.

I don't think branch likeliness matters here as you must keep w22 live in both cases somehow. In the SPECCPU 2017
Benchmark perlbench (which uses a lot of nested switches) this pattern is responsible for an extra 0.3% codesize increase
which the approach in 2/3 prevents.

> (which would run after combine)
> 
> >
> > What kind of code do we emit when do an extension just before an
> > operation?  If the DECL_RTL is (subreg:QI (reg:SI R) 0), say, then it
> > should be safe to do the extension directly into R:
> >
> >   (set (reg:SI X) (zero_extend:SI (subreg:QI (reg:SI X))))
> 
> Oops, that should of course be:
> 
>   (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R))))
> 
> > which avoids the problem of having two values live at once (the
> > zero-extended value and the any-extended value).

I'm not sure it does, as the any-extended value must remain live. i.e. above you can't get rid of w22,
you can only choose between having it be zero of any extended.  But I am not sure how you would do
that after expand.

Kind Regards,
Tamar

> >
> > Having identical instructions for each extension would also make it
> > easier for any global pass to move them and remove redundancies.
> >
> > Thanks,
> > Richard

next prev parent reply	other threads:[~2022-05-16 13:03 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-13 17:11 Tamar Christina
2022-05-13 17:11 ` [PATCH 2/3]AArch64 Promote function arguments using a paradoxical subreg when beneficial Tamar Christina
2022-10-27  3:15   ` Andrew Pinski
2022-10-28  9:57     ` Tamar Christina
2022-05-13 17:12 ` [PATCH 3/3]AArch64 Update the testsuite to remove xfails Tamar Christina
2022-05-16  6:31 ` [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions Richard Biener
2022-05-16  8:26   ` Tamar Christina
2022-05-16 11:36 ` Richard Sandiford
2022-05-16 11:49   ` Tamar Christina
2022-05-16 12:14     ` Richard Sandiford
2022-05-16 12:18       ` Richard Sandiford
2022-05-16 13:02         ` Tamar Christina [this message]
2022-05-16 13:24           ` Richard Sandiford
2022-05-16 15:29             ` Tamar Christina
2022-05-16 16:48               ` Richard Sandiford
2022-05-17  7:55                 ` Tamar Christina
2022-05-17  9:03                   ` Richard Sandiford
2022-05-17 17:45                     ` Tamar Christina
2022-05-18  7:49                       ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=VI1PR08MB5325AEC08030B2414491AF0AFFCF9@VI1PR08MB5325.eurprd08.prod.outlook.com \
    --to=tamar.christina@arm.com \
    --cc=Richard.Sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=nd@arm.com \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).