Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

From: Segher Boessenkool <segher@kernel.crashing.org>
To: Jakub Jelinek <jakub@redhat.com>,
	Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org>,
	Tamar Christina <Tamar.Christina@arm.com>,
	Roger Sayle <roger@nextmovesoftware.com>,
	Jeff Law <jeffreyalaw@gmail.com>,
	richard.sandiford@arm.com
Subject: Re: [PATCH] combine: Try harder to form zero_extends [PR106594]
Date: Wed, 8 Mar 2023 16:50:15 -0600
Message-ID: <20230308225015.GT25951@gate.crashing.org>
In-Reply-To: <mptcz5j4fpg.fsf@arm.com>

On Wed, Mar 08, 2023 at 11:58:51AM +0000, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > An #ifdef is a way of making a change that is not finished yet not hurt
> > the other targets.  It still hurts generic development, which indirectly
> > hurts all targets.
> 
> Seems like this might be moot anyway given that your results
> suggest no impact on other targets.

Which means the patch does not do what it says it does.  It is a net
negative on the only target it did change code on, too.

If the patch did do what it promises it would be a (large!) net benefit,
and also on various other targets.

As it is, either the regression wasn't P1 at all, or the patch doesn't
fix the problem, or the problem only happens in unusual code (or vector
or float code).  Please explain what the regression is you want to
solve?  With a compilable testcase etc., the usual.

> >> Reorganising the
> >> expression in this way isn't something new.  extract_left_shift already
> >> does a similar thing (and does it for all targets).
> >
> > That is not similar at all, no.
> >
> > /* See if X (of mode MODE) contains an ASHIFT of COUNT or more bits that
> >    can be commuted with any other operations in X.  Return X without
> >    that shift if so.  */
> >
> > If you can factor out a utility function like that, with an actual nice
> > description like that, it would be a much more palatable patch.
> 
> OK, I've factored it out below.  Does the comment look OK?

> As mentioned in the patch description below, I think some of the other
> "and" handling could be moved here too, which should avoid a bit of
> (existing) duplication.  But that isn't necessary to fix the bug.

And that is stage 1 material, as I still think the current patch is as well.

> On the code size results: as I mentioned on IRC yesterday, when I tried
> building the linux kernel locally (with -Os -fno-schedule-insns{,2}),
> I saw code size improvements in 182 .os and a regression in only one .o.
> (I was comparing individual .os because it makes isolation easier.)

Nothing here is about code size.  It is just a good stand-in to compare
the result of a change in combine with: almost all changes in generated
code are because combine could combine more (or fewer) instructions.

This is good if you just want to look at a table of numbers.  It will
often show something is obviously not good, or obviously good, and it
shows what targets are of extra interest.

You still need to look at the actual generated code to confirm things.
For example, with your previous patch on aarch64 part of the code size
increase is extra tail duplication (in the bpf interpreter), not a bad
thing.  Unfortunately that was only a small part of the code size
increase.

> And the testcase in the PR was from a real workload (ArmRAL).

What testcase?  Oh the snippet in #c0?

This is not a good example if you want this to be P1, heh.

> If you can isolate the case you saw, I'll have a look.  But ISTM that
> the change is a pretty clear win on aarch64.

No, *you* have to show it is an improvement.  I have a day job as well,
you know.

> Combine's approach to simplifying a pattern X is to:
> 
> (1) expand "compound" operations (such as extends and extracts)
>     into multiple operations (such as shifts), to give a new
>     expression X1
> 
> (2) simplify X1 in the normal way, and with some combine-specific
>     extras, to give X2
> 
> (3) remake compound operations from X2, to give X3

This is not a good description of what really goes on.  This is only
what is done for some compound operations in some cases.  And then it
is a tiny part of what is done.  And yes, it sucks, but disabling it
causes regressions.

> For example, (1) expands sign_extend to an ashift/ashiftrt pair,
> with the ashift being an immediate operand of the ashiftrt.

It more often converts it to a zero_extend here, for example (a real
one, or what expand_compound_operation comes up with).

> By design, (2) can perturb the structure of the expanded operations
> in X1.

Where X1 is just an abstraction you use here, not something that
actually exists in combine.  Okay.

> Sometimes it will rework them entirely.  But sometimes it
> will keep the outer operations and inner operations, but in a
> slightly different arrangement.  For example, the inner operation
> might no longer be a direct operand of the outer operation.

What does that mean?  It will always still be a single expression.

What is a "direct operand"?  A subexpression?  But it always *is* one.

> The PR contains another case where we need this.  We have:
> 
>   (set (reg:DI R2) (sign_extend:DI (reg:SI R1)))
>   ... (plus:DI (mult:DI (reg:DI R2) (const_int 4)) (reg:DI R3)) ...
> 
> which is a natural, direct combination on aarch64.
> 
> First, (1) expands the sign_extend into two operations.  It uses
> R1's nonzero_bits information to determine that the sign extension
> is equivalent to a zero extension:
> 
>   /* Convert sign extension to zero extension, if we know that the high
>      bit is not set, as this is easier to optimize.  It will be converted
>      back to cheaper alternative in make_extraction.  */
> 
> As I'll explain below, the problem is that this conversion back to a
> cheaper alternative doesn't happen.
> 
> The zero extension is expanded into an "and" of a subreg.
> Again, the nonzero_bits information narrows the "and" mask
> from a full SImode mask to a smaller constant (63).  So X1
> contains:
> 
>   (mult:DI (and:DI (subreg:DI (reg:SI R1) 0)
>                    (const_int 63))
>            (const_int 4))
> 
> The simplification rules for mult convert this to an ashift by 2.
> Then, this rule in simplify_shift_const_1:
> 
> 	  /* If we have (shift (logical)), move the logical to the outside
> 	     to allow it to possibly combine with another logical and the
> 	     shift to combine with another shift.  This also canonicalizes to
> 	     what a ZERO_EXTRACT looks like.  Also, some machines have
> 	     (and (shift)) insns.  */
> 
> moves the shift inside the "and", so that X2 contains:
> 
>   (and:DI (ashift:DI (subreg:DI (reg:SI R1) 0)
>                      (const_int 2))
>           (const_int 252))
> 
> We later recanonicalise to a mult (since this is an address):

Yeah that is a big wart, one that causes problems on most targets.  But
nothing new ;-)

>   (and:DI (mult:DI (subreg:DI (reg:SI R1) 0)
>                    (const_int 4))
>           (const_int 252))
> 
> But we fail to transform this back to the natural substitution:
> 
>   (mult:DI (sign_extend:DI (reg:SI R1))
>            (const_int 4))

You call this "natural".  Is that a reasonable thing to do on aarch64?
What MD patterns should I look at?
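
For reference, the arithmetic equivalence claimed in the quoted text can be
checked with a small C sketch.  This is only an illustration, not combine
code: the function names are made up, the garbage constant stands in for the
undefined upper half of the paradoxical subreg, and the range 0..63 is the
assumption that nonzero_bits (R1) is 63, as in the quoted example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models (mult:DI (sign_extend:DI (reg:SI R1))
   (const_int 4)), the form the quoted text calls the natural one.  */
static uint64_t mult_of_sign_extend(int32_t r1)
{
    return (uint64_t)(int64_t)r1 * 4;
}

/* Hypothetical helper: models (and:DI (mult:DI (subreg:DI (reg:SI R1) 0)
   (const_int 4)) (const_int 252)).  The caller supplies the full 64-bit
   subreg value, whose upper 32 bits may hold anything.  */
static uint64_t and_of_mult(uint64_t subreg_value)
{
    return (subreg_value * 4) & 252;
}

/* Returns 1 if both forms agree for every R1 in 0..63, even when the
   upper half of the subreg is filled with arbitrary bits.  */
static int forms_agree_on_known_range(void)
{
    const uint64_t garbage = 0xdeadbeef00000000ULL; /* assumed junk bits */
    for (int32_t r1 = 0; r1 <= 63; r1++) {
        uint64_t wide = garbage | (uint64_t)(uint32_t)r1;
        if (mult_of_sign_extend(r1) != and_of_mult(wide))
            return 0;
    }
    return 1;
}
```

The mask 252 is just 63 << 2, so the "and" discards whatever sits in the
upper bits of the subreg; with the sign bit of R1 known to be clear, sign
and zero extension give the same value.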

> 	* combine.cc (make_compound_operation_int): Extend the AND to
> 	ZERO_EXTEND transformation so that it can handle an intervening
> 	multiplication by a power of two.

If that is truly all it does, that sounds nice :-)

But the patch does more?  make_compound_operation_and is the obvious
example, a new function.

Oh, this is factored out from existing code?  Please do that as a
separate patch.  First the refactoring, then the (hopefully tiny!) one
with the actual changes.

(And please send a new patch series as a new mail thread.)

Segher
