Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination
       [not found] <202404031107.433B7hCA019240@gate.crashing.org>
@ 2024-04-05 21:27 ` Segher Boessenkool
  2024-04-05 21:46   ` Jeff Law
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Segher Boessenkool @ 2024-04-05 21:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jakub Jelinek, jeffreyalaw

Hi!

On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote:
> The following avoids re-walking and re-combining the instructions
> between i2 and i3 when the pattern of i2 doesn't change.
> 
> Bootstrap and regtest running ontop of a reversal of 
> r14-9692-g839bc42772ba7a.

Please include that in the patch (or series, preferably).

> It brings down memory use frmo 9GB to 400MB and compile-time from
> 80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
> but has shown code generation regressions across acrchitectures.
> 
> OK to revert r14-9692-g839bc42772ba7a?

No.

The patch solved a very real problem.  How does your replacement handle
that?  You don't say.  It looks like it only battles symptoms a bit,
instead :-(

We had this before: 3->2 combinations that leave an instruction
identical to what was there before.  This was just a combination with
context as well.  The only reason this wasn't a huge problem then
already was because this is a 3->2 combination, even if it really is a
2->1 one it still is beneficial in all the same cases.  But in the new
case it can iterate indefinitely -- well not quite, but some polynomial
number of times, for a polynomial at least of degree three, possibly
more :-(

With this patch you need to show combine still is linear.  I don't think
it is, but some deeper analysis might show it still is.

  ~ - ~ - ~

What should *really* be done is something that has been on the wish list
for decades: an uncse pass.

The things that combine no longer works on after my patch are actually
1->1 combinations (which we never do currently, although we probably
should); or alternatively, an un-CSE followed by a 2->1 combination.

We can do the latter of course, but we need to do an actual uncse first!
Somewhere before combine, and then redo a CSE after it.  An actual CSE,
not doing ten gazillion other things.

Segher

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination
  2024-04-05 21:27 ` [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination Segher Boessenkool
@ 2024-04-05 21:46   ` Jeff Law
  2024-04-05 21:50     ` Jakub Jelinek
  2024-04-06 12:50   ` Richard Biener
  2024-04-08 12:18   ` Richard Sandiford
  2 siblings, 1 reply; 6+ messages in thread
From: Jeff Law @ 2024-04-05 21:46 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Biener; +Cc: gcc-patches, Jakub Jelinek



On 4/5/24 3:27 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote:
>> The following avoids re-walking and re-combining the instructions
>> between i2 and i3 when the pattern of i2 doesn't change.
>>
>> Bootstrap and regtest running ontop of a reversal of
>> r14-9692-g839bc42772ba7a.
> 
> Please include that in the patch (or series, preferably).
> 
>> It brings down memory use frmo 9GB to 400MB and compile-time from
>> 80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
>> but has shown code generation regressions across acrchitectures.
>>
>> OK to revert r14-9692-g839bc42772ba7a?
> 
> No.
> 
> The patch solved a very real problem.  How does your replacement handle
> that?  You don't say.  It looks like it only battles symptoms a bit,
> instead :-(
But that patch very clearly created new problems in the process.  I 
would argue that it was a step backwards given that we're in stage4.


Jeff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination
  2024-04-05 21:46   ` Jeff Law
@ 2024-04-05 21:50     ` Jakub Jelinek
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Jelinek @ 2024-04-05 21:50 UTC (permalink / raw)
  To: Jeff Law; +Cc: Segher Boessenkool, Richard Biener, gcc-patches

On Fri, Apr 05, 2024 at 03:46:30PM -0600, Jeff Law wrote:
> 
> 
> On 4/5/24 3:27 PM, Segher Boessenkool wrote:
> > Hi!
> > 
> > On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote:
> > > The following avoids re-walking and re-combining the instructions
> > > between i2 and i3 when the pattern of i2 doesn't change.
> > > 
> > > Bootstrap and regtest running ontop of a reversal of
> > > r14-9692-g839bc42772ba7a.
> > 
> > Please include that in the patch (or series, preferably).
> > 
> > > It brings down memory use frmo 9GB to 400MB and compile-time from
> > > 80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
> > > but has shown code generation regressions across acrchitectures.
> > > 
> > > OK to revert r14-9692-g839bc42772ba7a?
> > 
> > No.
> > 
> > The patch solved a very real problem.  How does your replacement handle
> > that?  You don't say.  It looks like it only battles symptoms a bit,
> > instead :-(
> But that patch very clearly created new problems in the process.  I would
> argue that it was a step backwards given that we're in stage4.

Yeah.  E.g. PR114518 and PR114522, both P1s.

	Jakub


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination
  2024-04-05 21:27 ` [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination Segher Boessenkool
  2024-04-05 21:46   ` Jeff Law
@ 2024-04-06 12:50   ` Richard Biener
  2024-04-08 12:18   ` Richard Sandiford
  2 siblings, 0 replies; 6+ messages in thread
From: Richard Biener @ 2024-04-06 12:50 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Biener, gcc-patches, Jakub Jelinek, jeffreyalaw

On Fri, Apr 5, 2024 at 11:29 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote:
> > The following avoids re-walking and re-combining the instructions
> > between i2 and i3 when the pattern of i2 doesn't change.
> >
> > Bootstrap and regtest running ontop of a reversal of
> > r14-9692-g839bc42772ba7a.
>
> Please include that in the patch (or series, preferably).

I'd like reversal to be considered independently of this patch which is why
I didn't include the reversal.  Of course without reversal this patch doesn't
make sense.

> > It brings down memory use frmo 9GB to 400MB and compile-time from
> > 80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
> > but has shown code generation regressions across acrchitectures.
> >
> > OK to revert r14-9692-g839bc42772ba7a?
>
> No.
>
> The patch solved a very real problem.  How does your replacement handle
> that?  You don't say.  It looks like it only battles symptoms a bit,
> instead :-(

My patch isn't a replacement for your solution.  Reversal is to address
the P1 regressions caused by the change.  My change offers to address
some of the symptoms shown with the testcase without disabling the
offending 2->2 combinations.

> We had this before: 3->2 combinations that leave an instruction
> identical to what was there before.  This was just a combination with
> context as well.  The only reason this wasn't a huge problem then
> already was because this is a 3->2 combination, even if it really is a
> 2->1 one it still is beneficial in all the same cases.  But in the new
> case it can iterate indefinitely -- well not quite, but some polynomial
> number of times, for a polynomial at least of degree three, possibly
> more :-(
>
> With this patch you need to show combine still is linear.  I don't think
> it is, but some deeper analysis might show it still is.

We have come to different conclusions as to what the issue is that the
testcase exposes in combine.  While you spotted a transform that
combine shouldn't have done (and I think I agree on that!) I identified
the fact that while the number of successful combines is linear in the
number of log-links (and thus linear in the size of a basic-block), the
number of combination _attempts_ appears to be O(log-links) times
O(successful combinations).  The testcase hits hard on this because of
those 2->2 combinations done but this problem persists with all N->2
combinations.  This "quadraticness" (I know you don't like to call it that
way) is because for each successful N->2 combination of insns
{I2, .. <n intervening insns> .., I1} we try combining into all insns
between (and including) I2 and I1.  combine wants to retry combining
into I2 rightfully so but there's no good reason to retry _all_ of the
n intervening insns.  Yes, we want to retry all insns that have their
log-links point to I2 (and possibly more for second-level references,
dependent on how combine exactly identifies I2, I3 and I4).  Re-trying
all of them obviously works, but there's your quadraticness.

My patch makes this problem a little bit (in general) less often hit
since it avoids retrying iff I2 did not change.  I _think_ in that case
the log-links/notes shouldn't have changed either.  In any case I'm
disabling less of combine than what your patch did.

So again - is it OK to revert your patch?  Or do you expect you can
address the code generation regressions within the next week?  Since
the original issue is quite old postponing your solution to the next stage1
should be acceptable.  Currently those regressions will block the release
of GCC 14.

Do you agree that the patch I propose, while it doesn't solve any actual
issue (it doesn't fix the walking scheme nor does it avoid the combinations
combine shouldn't do), it helps in some cases and shouldn't cause code
generation regressions that your patch wouldn't have caused as well?
So is that change OK until we get your real solution implemented in a way
that doesn't cause regressions?

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination
  2024-04-05 21:27 ` [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination Segher Boessenkool
  2024-04-05 21:46   ` Jeff Law
  2024-04-06 12:50   ` Richard Biener
@ 2024-04-08 12:18   ` Richard Sandiford
  2 siblings, 0 replies; 6+ messages in thread
From: Richard Sandiford @ 2024-04-08 12:18 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Richard Biener, gcc-patches, Jakub Jelinek, jeffreyalaw

Segher Boessenkool <segher@kernel.crashing.org> writes:
> Hi!
>
> On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote:
>> The following avoids re-walking and re-combining the instructions
>> between i2 and i3 when the pattern of i2 doesn't change.
>> 
>> Bootstrap and regtest running ontop of a reversal of 
>> r14-9692-g839bc42772ba7a.
>
> Please include that in the patch (or series, preferably).
>
>> It brings down memory use frmo 9GB to 400MB and compile-time from
>> 80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
>> but has shown code generation regressions across acrchitectures.
>> 
>> OK to revert r14-9692-g839bc42772ba7a?
>
> No.
>
> The patch solved a very real problem.  How does your replacement handle
> that?  You don't say.  It looks like it only battles symptoms a bit,
> instead :-(
>
> We had this before: 3->2 combinations that leave an instruction
> identical to what was there before.  This was just a combination with
> context as well.  The only reason this wasn't a huge problem then
> already was because this is a 3->2 combination, even if it really is a
> 2->1 one it still is beneficial in all the same cases.  But in the new
> case it can iterate indefinitely -- well not quite, but some polynomial
> number of times, for a polynomial at least of degree three, possibly
> more :-(
>
> With this patch you need to show combine still is linear.  I don't think
> it is, but some deeper analysis might show it still is.
>
>   ~ - ~ - ~
>
> What should *really* be done is something that has been on the wish list
> for decades: an uncse pass.
>
> The things that combine no longer works on after my patch are actually
> 1->1 combinations (which we never do currently, although we probably
> should); or alternatively, an un-CSE followed by a 2->1 combination.
>
> We can do the latter of course, but we need to do an actual uncse first!
> Somewhere before combine, and then redo a CSE after it.  An actual CSE,
> not doing ten gazillion other things.

Can you give a specific example of a 2->2 combination that we still
want to apply after r14-9692-g839bc42772ba7a?

2->2 combinations as I understand them were added by
c4c5ad1d6d1e1e1fe7a1c2b3bb097cc269dc7306:

Author: Segher Boessenkool <segher@kernel.crashing.org>
Date:   Mon Jul 30 15:18:17 2018 +0200

    combine: Allow combining two insns to two insns

    This patch allows combine to combine two insns into two.  This helps
    in many cases, by reducing instruction path length, and also allowing
    further combinations to happen.  PR85160 is a typical example of code
    that it can improve.

    This patch does not allow such combinations if either of the original
    instructions was a simple move instruction.  In those cases combining
    the two instructions increases register pressure without improving the
    code.  With this move test register pressure does no longer increase
    noticably as far as I can tell.

    (At first I also didn't allow either of the resulting insns to be a
    move instruction.  But that is actually a very good thing to have, as
    should have been obvious).

            PR rtl-optimization/85160
            * combine.c (is_just_move): New function.
            (try_combine): Allow combining two instructions into two if neither of
            the original instructions was a move.

That patch didn't have a testcase, but one was added later in
81bdfc1e2940fc93bcd0bba4416daff47f04f3b3:

    testcase for 2-2 combine

    gcc/testsuite/
            PR rtl-optimization/85160
            * gcc.target/powerpc/combine-2-2.c: New testcase.

But this is the powerpc test that regresses with the recent patch (PR114518).

The patches reference aarch64 bug PR85160.  If I check out and build
c4c5ad1d6d1e above, I can see that it does indeed remove two mvns from
the PR85160 testcase.  The diff from c4c5ad1d6d1e~ is:

@@ -10,12 +10,10 @@
        .cfi_startproc
        ldr     w3, [x2, w3, sxtw 2]
        ldr     w2, [x2, w4, sxtw 2]
-       mvn     w3, w3
-       mvn     w2, w2
-       and     w4, w3, w1
-       and     w1, w2, w1
-       and     w3, w3, w0
-       and     w2, w2, w0
+       bic     w4, w1, w3
+       bic     w3, w0, w3
+       bic     w1, w1, w2
+       bic     w2, w0, w2
        asr     w4, w4, 9
        asr     w1, w1, 7
        orr     w3, w4, w3, asr 7

(which is great).  But if I apply 839bc42772ba on top of c4c5ad1d6d1e
then the optimisation is undone.

Is that the intention?  I.e. are we effectively removing the kind of
2->2 combinations added in c4c5ad1d6d1e1e1fe?  If so, why not simply
revert c4c5ad1d6d1e1e1fe itself?

Or is there a specific testcase that is still optimised with the
combination of c4c5ad1d6d1e1e1fe and 839bc42772ba7a that would not
be optimised without c4c5ad1d6d1e1e1fe?  If so, can you say what it is?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination
@ 2024-04-03 11:07 Richard Biener
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Biener @ 2024-04-03 11:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, segher, jeffreyalaw

The following avoids re-walking and re-combining the instructions
between i2 and i3 when the pattern of i2 doesn't change.

Bootstrap and regtest running ontop of a reversal of 
r14-9692-g839bc42772ba7a.

It brings down memory use frmo 9GB to 400MB and compile-time from
80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
but has shown code generation regressions across acrchitectures.

OK to revert r14-9692-g839bc42772ba7a?

OK to install this instead?

Thanks,
Richard.

	PR rtl-optimization/101523
	* combine.cc (try_combine): When the pattern of i2 doesn't
	change do not re-start combining at i2 or an earlier insn which
	had links or notes added.
---
 gcc/combine.cc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..ff25752cac4 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4186,6 +4186,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
       adjust_for_new_dest (i3);
     }
 
+  bool i2_unchanged = false;
+  if (rtx_equal_p (newi2pat, PATTERN (i2)))
+    i2_unchanged = true;
+
   /* We now know that we can do this combination.  Merge the insns and
      update the status of registers and LOG_LINKS.  */
 
@@ -4752,6 +4756,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
   combine_successes++;
   undo_commit ();
 
+  if (i2_unchanged)
+    return i3;
+
   rtx_insn *ret = newi2pat ? i2 : i3;
   if (added_links_insn && DF_INSN_LUID (added_links_insn) < DF_INSN_LUID (ret))
     ret = added_links_insn;
-- 
2.35.3

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-04-08 12:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <202404031107.433B7hCA019240@gate.crashing.org>
2024-04-05 21:27 ` [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination Segher Boessenkool
2024-04-05 21:46   ` Jeff Law
2024-04-05 21:50     ` Jakub Jelinek
2024-04-06 12:50   ` Richard Biener
2024-04-08 12:18   ` Richard Sandiford
2024-04-03 11:07 Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).