public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: Richard Biener <rguenther@suse.de>,
	 Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>,
	 nd <nd@arm.com>,  Jeff Law <jeffreyalaw@gmail.com>
Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend
Date: Fri, 30 Sep 2022 15:28:22 +0100	[thread overview]
Message-ID: <mptill5ndh5.fsf@arm.com> (raw)
In-Reply-To: <VI1PR08MB5325F5ABDE3FED9B4015958DFF569@VI1PR08MB5325.eurprd08.prod.outlook.com> (Tamar Christina's message of "Fri, 30 Sep 2022 13:48:49 +0100")

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Biener <rguenther@suse.de>
>> Sent: Friday, September 30, 2022 12:53 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina via
>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
>> <jeffreyalaw@gmail.com>
>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>> branches, give hint if argument is a truth type to backend
>>
>> On Fri, 30 Sep 2022, Tamar Christina wrote:
>>
>> > > -----Original Message-----
>> > > From: Richard Biener <rguenther@suse.de>
>> > > Sent: Friday, September 30, 2022 11:17 AM
>> > > To: Tamar Christina <Tamar.Christina@arm.com>
>> > > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina
>> > > via Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
>> Law
>> > > <jeffreyalaw@gmail.com>
>> > > Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>> > > branches, give hint if argument is a truth type to backend
>> > >
>> > > On Fri, 30 Sep 2022, Tamar Christina wrote:
>> > >
>> > > >
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: Richard Sandiford <richard.sandiford@arm.com>
>> > > > > Sent: Friday, September 30, 2022 9:49 AM
>> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
>> > > > > Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
>> > > > > Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
>> Law
>> > > > > <jeffreyalaw@gmail.com>
>> > > > > Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>> > > > > conditional branches, give hint if argument is a truth type to
>> > > > > backend
>> > > > >
>> > > > > Tamar Christina <Tamar.Christina@arm.com> writes:
>> > > > > >> -----Original Message-----
>> > > > > >> From: Richard Sandiford <richard.sandiford@arm.com>
>> > > > > >> Sent: Friday, September 30, 2022 9:29 AM
>> > > > > >> To: Tamar Christina <Tamar.Christina@arm.com>
>> > > > > >> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
>> > > > > >> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
>> > > Law
>> > > > > >> <jeffreyalaw@gmail.com>
>> > > > > >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>> > > > > >> conditional branches, give hint if argument is a truth type
>> > > > > >> to backend
>> > > > > >>
>> > > > > >> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > > > > >> >> -----Original Message-----
>> > > > > >> >> From: Gcc-patches <gcc-patches-
>> > > > > >> >> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of
>> > > > > Richard
>> > > > > >> >> Biener via Gcc-patches
>> > > > > >> >> Sent: Thursday, September 29, 2022 12:09 PM
>> > > > > >> >> To: Tamar Christina via Gcc-patches
>> > > > > >> >> <gcc-patches@gcc.gnu.org>
>> > > > > >> >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd
>> > > > > <nd@arm.com>
>> > > > > >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>> > > > > >> >> conditional branches, give hint if argument is a truth
>> > > > > >> >> type to backend
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via
>> > > > > >> >> > Gcc-patches
>> > > > > >> >> > <gcc-
>> > > > > >> >> patches@gcc.gnu.org>:
>> > > > > >> >> >
>> > > > > >> >> >
>> > > > > >> >> >>
>> > > > > >> >> >> -----Original Message-----
>> > > > > >> >> >> From: Richard Biener <rguenther@suse.de>
>> > > > > >> >> >> Sent: Thursday, September 29, 2022 10:41 AM
>> > > > > >> >> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
>> > > > > >> >> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
>> > > > > >> >> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
>> > > > > >> >> <nd@arm.com>
>> > > > > >> >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion
>> > > > > >> >> >> of conditional branches, give hint if argument is a
>> > > > > >> >> >> truth type to backend
>> > > > > >> >> >>
>> > > > > >> >> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
>> > > > > >> >> >>>
>> > > > > >> >> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
>> > > > > >> >> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
>> > > > > >> >> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > > > > >> >> >>>>>>> Maybe the target could use (subreg:SI (reg:BI
>> > > > > >> >> >>>>>>> ...)) as
>> > > > > argument.
>> > > > > >> >> Heh.
>> > > > > >> >> >>>>>> But then I'd still need to change the expansion
>> > > > > >> >> >>>>>> code. I suppose this could prevent the issue with
>> > > > > >> >> >>>>>> changes to code on
>> > > > > >> other targets.
>> > > > > >> >> >>>>>>
>> > > > > >> >> >>>>>>>>> We have undocumented addcc, negcc, etc.
>> > > > > >> >> >>>>>>>>> patterns,
>> > > > > should
>> > > > > >> we
>> > > > > >> >> >>>>>>>>> have aandcc
>> > > > > >> >> >>>>>> pattern for this indicating support for andcc +
>> > > > > >> >> >>>>>> jump as opposedto
>> > > > > >> >> >> cmpcc + jump?
>> > > > > >> >> >>>>>>>> This could work yeah. I didn't know these existed.
>> > > > > >> >> >>>>>>> Ah, so they are conditional add, not add setting
>> > > > > >> >> >>>>>>> CC, so andcc wouldn't be appropriate.
>> > > > > >> >> >>>>>>> So I'm not sure how we'd handle such situation -
>> > > > > >> >> >>>>>>> maybe looking at REG_DECL and recognizing a _Bool
>> > > > > >> >> >>>>>>> PARM_DECL is
>> > > > > OK?
>> > > > > >> >> >>>>>> I have a slight suspicion that Richard Sandiford
>> > > > > >> >> >>>>>> would likely reject this though..
>> > > > > >> >> >>>>> Good guess :-P  We shouldn't rely on something like
>> > > > > >> >> >>>>> that for
>> > > > > >> >> >> correctness.
>> > > > > >> >> >>>>>
>> > > > > >> >> >>>>> Would it help if we promoted the test-and-branch
>> > > > > >> >> >>>>> instructions to optabs, alongside cbranch?  The jump
>> > > > > >> >> >>>>> expanders could then target it
>> > > > > >> >> >> directly.
>> > > > > >> >> >>>>>
>> > > > > >> >> >>>>> IMO that'd be a reasonable thing to do if it does help.
>> > > > > >> >> >>>>> It's a relatively common operation, especially on
>> > > > > >> >> >>>>> CISCy
>> > > targets.
>> > > > > >> >> >>>>
>> > > > > >> >> >>>> But don't we represent these single bit tests using
>> > > > > >> >> >>>> zero_extract as the condition of the branch?  I guess
>> > > > > >> >> >>>> if we can generate them directly rather than waiting
>> > > > > >> >> >>>> for combine to deduce that we're dealing with a
>> > > > > >> >> >>>> single bit test and constructing the zero_extract
>> > > > > >> >> >>>> form would be an improvement and might help aarch at
>> > > > > >> >> >>>> the same
>> > > > > >> time.
>> > > > > >> >> >>>
>> > > > > >> >> >>> Do you mean that the promote_mode stuff should use
>> > > > > >> >> >>> ext(z)v rather than zero_extend to promote a bool, where
>> available?
>> > > > > >> >> >>> If so, I agree that might help.  But it sounds like it
>> > > > > >> >> >>> would have downsides
>> > > > > >> too.
>> > > > > >> >> >>> Currently a bool memory can be zero-extended on the
>> > > > > >> >> >>> fly using a load, but if we used the zero_extract form
>> > > > > >> >> >>> instead, we'd have to extract the bit after the load.
>> > > > > >> >> >>> And (as an
>> > > > > >> >> >>> alternative) choosing different behaviour based on
>> > > > > >> >> >>> whether expand sees a REG or a MEM sounds like it
>> > > > > >> >> >>> could still cause problems, since REGs could be
>> > > > > >> >> >>> replaced by MEMs (or vice versa)
>> > > > > later in the RTL passes.
>> > > > > >> >> >>>
>> > > > > >> >> >>> ISTM that the original patch was inserting an extra
>> > > > > >> >> >>> operation in the branch expansion in order to target a
>> > > > > >> >> >>> specific
>> > > instruction.
>> > > > > >> >> >>> Targeting the instruction in expand seems good, but
>> > > > > >> >> >>> IMO we should do it directly, based on knowledge of
>> > > > > >> >> >>> whether the instruction actually
>> > > > > >> >> exists.
>> > > > > >> >> >>
>> > > > > >> >> >> Yes, I think a compare-and-branch pattern is the best fit
>> here.
>> > > > > >> >> >> Note on GIMPLE we'd rely on the fact this is a
>> > > > > >> >> >> BOOLEAN_TYPE (so even 8 bit precision bools only have 1
>> > > > > >> >> >> and 0 as meaningful
>> > > values).
>> > > > > >> >> >> So the 'compare-' bit in compare-and-branch would be
>> > > > > >> >> >> interpreting a BOOLEAN_TYPE, not so much a general
>> compare.
>> > > > > >> >> >
>> > > > > >> >> > Oh, I was thinking of adding a constant argument
>> > > > > >> >> > representing the precision that is relevant for the
>> > > > > >> >> > compare in order to make this a bit more
>> > > > > >> >> general/future proof.
>> > > > > >> >> >
>> > > > > >> >> > Are you thinking I should instead just make the optab
>> > > > > >> >> > implicitly only work for 1-bit precision comparisons?
>> > > > > >> >>
>> > > > > >> >> What?s the optab you propose (cite also the documentation
>> part)?
>> > > > > >> >
>> > > > > >> > tbranchmode5
>> > > > > >> >   Conditional branch instruction combined with a bit test
>> instruction.
>> > > > > >> Operand 0 is a comparison operator.
>> > > > > >> >   Operand 1 and Operand 2 are the first and second operands
>> > > > > >> > of the
>> > > > > >> comparison, respectively.
>> > > > > >> >   Operand 3 is the number of low-order bits that are
>> > > > > >> > relevant for the
>> > > > > >> comparison.
>> > > > > >> >   Operand 4 is the code_label to jump to.
>> > > > > >>
>> > > > > >> For the TB instructions (and for other similar instructions
>> > > > > >> that I've seen on other architectures) it would be more
>> > > > > >> useful to have a single-bit test, with operand 4 specifying the bit
>> position.
>> > > > > >> Arguably it might then be better to have separate eq and ne
>> > > > > >> optabs, to avoid the awkward doubling of the operands
>> > > > > >> (operand 1 contains
>> > > > > operands 2 and 3).
>> > > > > >>
>> > > > > >> I guess a more general way of achieving the same thing would
>> > > > > >> be to make operand 4 in the optab above a mask rather than a bit
>> count.
>> > > > > >> But that might be overly general, if there are no known
>> > > > > >> architectures that have such an instruction.
>> > > > > >
>> > > > > > One of the reasons I wanted a range rather than a single bit
>> > > > > > is that I can the use this to generate cbz/cbnz early on as well.
>> > > > >
>> > > > > We already have the opportunity to do that via cbranch<mode>4.
>> > > > > But at the moment aarch64.md always forces the separate
>> > > > > comparison instead.  (Not sure why TBH.  Does it enable more
>> > > > > ifcvt
>> > > > > opportunities?)
>> > > > >
>> > > > > If we change the body of cbranch<mode>4 to:
>> > > > >
>> > > > >   if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) !=
>> NE)
>> > > > >       || operands[2] != const0_rtx)
>> > > > >     {
>> > > > >       operands[1] = aarch64_gen_compare_reg (GET_CODE
>> (operands[0]),
>> > > > >                                            operands[1], operands[2]);
>> > > > >       operands[2] = const0_rtx;
>> > > > >     }
>> > > > >
>> > > > > then we generate the cbz/cbnz directly.
>> > > > >
>> > > >
>> > > > Ah ok, then if Richi agrees, bitpos it is then instead of bit count.
>> > >
>> > > Somehow I understood that cbranch<>4 is already fully capable of the
>> > > optimization?
>> > >
>> > > On your earlier proposal I'd have commented that if it wouldn't make
>> > > sense to instead have a CCmode setter instead of an expander with a
>> branch label?
>> > > That would be a bit test, like {sign,zero}_extract compared against
>> > > zero which can then be combined with a branch?
>> > >
>> >
>> > I missed that part, that could work too.
>> >
>> > > Of course if the ISA is really bit-test-and-branch then cbranch<>4
>> > > itself or a variant of it might be more useful.  Maybe
>> > > cbranchbi4 would be "abused" for this?
>> >
>> > The instruction is an actual bit-test-and-branch with any arbitrary bitpos.
>> > Yes we can abuse cbranchbi4 for this, but then it also means we can't e.g.
>> > use this to optimize a < 0 where a is a signed value.  With the new
>> > optab this would just be a bit-test-and-branch of the sign bit.
>> >
>> > But also I'm not entirely convinced that using `BImode` and assuming a
>> > single bit is safe here. What happens if I compile my source with -std=c89?
>> >
>> > So I personally think the new optab makes more sense here. The CC setter
>> would work too.
>> >
>> > I guess my question is, between you folks, which approach would you
>> > like. It seems that Richi You'd like a CC setter. Richard do you have a
>> preference of one over the other?
>>
>> My order of preference is
>>
>> a) an existing pattern, if possible
>> b) something usable by N > 1 targets we know of, even if it requires some
>> combine magic
>> c) something that fits the actual ISA
>>
>> For b), x86 has BEXTR which performs a zero_extract from reg/mem and sets
>> ZF when the result is zero.  For a branch on sign bit there's easier ways, so it's
>> probably not very useful for general compare and branch optimization and if
>> (a & 0x30) is probably handled by combine(?).
>
> Agreed, I was more giving an example of other uses. But ok.
>
>> It also seems that if (a & 1) is handled for aarch64 already and it's just a lack of
>> an optab that allows RTL expansion to expand if (bool) as if (bool & 1)?
>
> Yes this was what I was originally after.  Your original suggestion was to abuse BImode
> and cbranch for this.  That could work, but I worry about introducing BImode in the RTL,
> as I don't think we currently use it at all and wonder about new missed optimizations.
>
> Richard Sandiford, would this be OK with you? I also don't want to emit an extract here as I think
> that's gonna have a bigger effect on other targets than & 1 did.
>
> I would rather have something I can explicitly check for target support for. I also wonder if just a
> target hook asking the target how to expand a given tree Boolean argument would work.

Personally I'd prefer the test-and-branch optab.  We used to have
separate compare optabs and branch optabs in the cc0 days, but having
the combined pattern turned out to be much easier.  We don't currently
expose CC registers at the optabs level and I think that's a good thing.

We could have a test-bit-and-set-pseudo optab.  But using that optab
here would reintroduce the problem that I had with the original patch,
which is:

- Currently, the branch on the boolean value expands to a single optab
  (cbranch<mode>4).  That optab could easily expand to a single cb(n)z
  instruction on aarch64 (even though the port chooses not to do that
  currently).  So taken on its own, the branch is already maximally
  efficient.

  The patch instead adds an extra instruction to the branch expansion,
  so that it uses two optabs rather than one.  On the face of it,
  this extra instruction is entirely redundant.  But emitting it can
  sometimes allow profitable combines.  That is, rather than expand to:

    and reg, reg, 0xFF (from promote_mode)
    cbranch on reg     (from the branch expansion)

  the patch expanded to:

    and reg, reg, 0xFF (from promote_mode)
    and reg, reg, 0x1  (from the branch expansion)
    cbranch on reg     (from the branch expansion)

  The first two ANDs come from separate sources.  When both ANDs exist,
  combine can get rid of the redundancy and can then see that the
  retained AND 0x1 + cbranch optimise to a test-and-branch instruction.

  But if the target can't in fact use a test-and-branch in a particular
  situation, we'd be left with the AND 0x1 and the cbranch.  That's not
  too bad if the AND from promote_mode and the AND 0x1 can be combined.
  But if the AND with 0xff happens "too far" from the AND with 0x1
  (e.g. because it's in a loop preheader), or if there's intermediate
  logic, we might end up keeping all three instructions.

Emitting an extra bit test instruction as part of the branch expansion
IMO has the same problem.  We're relying on combine combining the
(redundant) bit test with the cbranch.  If combine doesn't do that
(e.g. because the target doesn't support the combination in certain
cases) then we could be left with three instructions rather than the
original two.

That's why I think the fix is either for promote_mode to use a different
expansion for booleans (which could also have knock-on effects) or for
the branch expanders to target the test-and-branch instruction directly.

The optab wouldn't be an aarch64 special.  arc, avr, and h8300sx have
similar instructions.  There are problem others too: h8300sx was from
memory and I stopped looking after avr. :-)

Thanks,
Richard

  reply	other threads:[~2022-09-30 14:28 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-23  9:24 Tamar Christina
2022-09-23  9:25 ` [PATCH 2/2]AArch64 Extend tbz pattern to allow SI to SI extensions Tamar Christina
2022-09-23  9:42   ` Richard Sandiford
2022-09-23  9:48     ` Tamar Christina
2022-09-26 10:35 ` [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend Richard Biener
2022-09-26 11:05   ` Tamar Christina
2022-09-26 11:32     ` Richard Biener
2022-09-26 11:46       ` Tamar Christina
2022-09-26 12:34         ` Richard Biener
2022-09-26 12:43           ` Richard Biener
2022-09-26 14:02             ` Tamar Christina
2022-09-28 15:04         ` Richard Sandiford
2022-09-28 17:23           ` Jeff Law
2022-09-29  9:37             ` Richard Sandiford
2022-09-29  9:40               ` Richard Biener
2022-09-29 10:21                 ` Tamar Christina
2022-09-29 11:09                   ` Richard Biener
2022-09-30  8:00                     ` Tamar Christina
2022-09-30  8:28                       ` Richard Sandiford
2022-09-30  8:38                         ` Tamar Christina
2022-09-30  8:48                           ` Richard Sandiford
2022-09-30  9:15                             ` Tamar Christina
2022-09-30 10:16                               ` Richard Biener
2022-09-30 11:11                                 ` Tamar Christina
2022-09-30 11:52                                   ` Richard Biener
2022-09-30 12:48                                     ` Tamar Christina
2022-09-30 14:28                                       ` Richard Sandiford [this message]
2022-09-30 14:33                                         ` Richard Biener
2022-09-29 20:49               ` Jeff Law
2022-10-27  3:22 ` Andrew Pinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptill5ndh5.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=Tamar.Christina@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=nd@arm.com \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).