Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Richard Biener <rguenther@suse.de>,
	 gcc-patches@gcc.gnu.org,  Jakub Jelinek <jakub@redhat.com>,
	 jeffreyalaw@gmail.com
Subject: Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination
Date: Mon, 08 Apr 2024 13:18:53 +0100	[thread overview]
Message-ID: <mpth6gcvyhe.fsf@arm.com> (raw)
In-Reply-To: <20240405212754.GL19790@gate.crashing.org> (Segher Boessenkool's message of "Fri, 5 Apr 2024 16:27:54 -0500")

Segher Boessenkool <segher@kernel.crashing.org> writes:
> Hi!
>
> On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote:
>> The following avoids re-walking and re-combining the instructions
>> between i2 and i3 when the pattern of i2 doesn't change.
>> 
>> Bootstrap and regtest running ontop of a reversal of 
>> r14-9692-g839bc42772ba7a.
>
> Please include that in the patch (or series, preferably).
>
>> It brings down memory use frmo 9GB to 400MB and compile-time from
>> 80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
>> but has shown code generation regressions across acrchitectures.
>> 
>> OK to revert r14-9692-g839bc42772ba7a?
>
> No.
>
> The patch solved a very real problem.  How does your replacement handle
> that?  You don't say.  It looks like it only battles symptoms a bit,
> instead :-(
>
> We had this before: 3->2 combinations that leave an instruction
> identical to what was there before.  This was just a combination with
> context as well.  The only reason this wasn't a huge problem then
> already was because this is a 3->2 combination, even if it really is a
> 2->1 one it still is beneficial in all the same cases.  But in the new
> case it can iterate indefinitely -- well not quite, but some polynomial
> number of times, for a polynomial at least of degree three, possibly
> more :-(
>
> With this patch you need to show combine still is linear.  I don't think
> it is, but some deeper analysis might show it still is.
>
>   ~ - ~ - ~
>
> What should *really* be done is something that has been on the wish list
> for decades: an uncse pass.
>
> The things that combine no longer works on after my patch are actually
> 1->1 combinations (which we never do currently, although we probably
> should); or alternatively, an un-CSE followed by a 2->1 combination.
>
> We can do the latter of course, but we need to do an actual uncse first!
> Somewhere before combine, and then redo a CSE after it.  An actual CSE,
> not doing ten gazillion other things.

Can you give a specific example of a 2->2 combination that we still
want to apply after r14-9692-g839bc42772ba7a?

2->2 combinations as I understand them were added by
c4c5ad1d6d1e1e1fe7a1c2b3bb097cc269dc7306:

Author: Segher Boessenkool <segher@kernel.crashing.org>
Date:   Mon Jul 30 15:18:17 2018 +0200

    combine: Allow combining two insns to two insns

    This patch allows combine to combine two insns into two.  This helps
    in many cases, by reducing instruction path length, and also allowing
    further combinations to happen.  PR85160 is a typical example of code
    that it can improve.

    This patch does not allow such combinations if either of the original
    instructions was a simple move instruction.  In those cases combining
    the two instructions increases register pressure without improving the
    code.  With this move test register pressure does no longer increase
    noticably as far as I can tell.

    (At first I also didn't allow either of the resulting insns to be a
    move instruction.  But that is actually a very good thing to have, as
    should have been obvious).

            PR rtl-optimization/85160
            * combine.c (is_just_move): New function.
            (try_combine): Allow combining two instructions into two if neither of
            the original instructions was a move.

That patch didn't have a testcase, but one was added later in
81bdfc1e2940fc93bcd0bba4416daff47f04f3b3:

    testcase for 2-2 combine

    gcc/testsuite/
            PR rtl-optimization/85160
            * gcc.target/powerpc/combine-2-2.c: New testcase.

But this is the powerpc test that regresses with the recent patch (PR114518).

The patches reference aarch64 bug PR85160.  If I check out and build
c4c5ad1d6d1e above, I can see that it does indeed remove two mvns from
the PR85160 testcase.  The diff from c4c5ad1d6d1e~ is:

@@ -10,12 +10,10 @@
        .cfi_startproc
        ldr     w3, [x2, w3, sxtw 2]
        ldr     w2, [x2, w4, sxtw 2]
-       mvn     w3, w3
-       mvn     w2, w2
-       and     w4, w3, w1
-       and     w1, w2, w1
-       and     w3, w3, w0
-       and     w2, w2, w0
+       bic     w4, w1, w3
+       bic     w3, w0, w3
+       bic     w1, w1, w2
+       bic     w2, w0, w2
        asr     w4, w4, 9
        asr     w1, w1, 7
        orr     w3, w4, w3, asr 7

(which is great).  But if I apply 839bc42772ba on top of c4c5ad1d6d1e
then the optimisation is undone.

Is that the intention?  I.e. are we effectively removing the kind of
2->2 combinations added in c4c5ad1d6d1e1e1fe?  If so, why not simply
revert c4c5ad1d6d1e1e1fe itself?

Or is there a specific testcase that is still optimised with the
combination of c4c5ad1d6d1e1e1fe and 839bc42772ba7a that would not
be optimised without c4c5ad1d6d1e1e1fe?  If so, can you say what it is?

Thanks,
Richard

next prev parent reply	other threads:[~2024-04-08 12:18 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <202404031107.433B7hCA019240@gate.crashing.org>
2024-04-05 21:27 ` Segher Boessenkool
2024-04-05 21:46   ` Jeff Law
2024-04-05 21:50     ` Jakub Jelinek
2024-04-06 12:50   ` Richard Biener
2024-04-08 12:18   ` Richard Sandiford [this message]
2024-04-03 11:07 Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mpth6gcvyhe.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=jeffreyalaw@gmail.com \
    --cc=rguenther@suse.de \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).