From: Richard Biener <richard.guenther@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH/RFC] cprop_hardreg... Third time's a charm.
Date: Thu, 2 Jun 2022 12:55:34 +0200
Message-ID: <CAFiYyc1VqPKuL2rv3RdBaP=Q+NUVymX8n2Q_gddWZP45fJ_L8g@mail.gmail.com>
In-Reply-To: <03c201d8766a$450d9a80$cf28cf80$@nextmovesoftware.com>

On Thu, Jun 2, 2022 at 12:20 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> This middle-end patch proposes that the "hard register constant
> propagation" pass be performed up to three times on each basic block
> (up from the current two times) if the second pass successfully made
> changes.
>
> The motivation for three passes is to handle the "swap idiom" (i.e.
> "t = x; x = y; y = t;" sequences) that gets generated by register
> allocation (reload).
>
> Consider the x86_64 test case for __int128 addition recently discussed
> on gcc-patches.  With that proposed patch, the input to the
> cprop_hardreg pass looks like:
>
>         movq    %rdi, %r8
>         movq    %rsi, %rdi
>         movq    %r8, %rsi
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %rsi, %rax
>         adcq    %rdi, %rdx
>         ret
>
> where the first three instructions effectively swap %rsi and %rdi.
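>
> For reference, a minimal source-level reproducer along the lines of
> that test case (reconstructed here; the exact version posted may
> differ) is:
>
>     __int128
>     add128 (__int128 a, __int128 b)
>     {
>       return a + b;
>     }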
>
> On the first pass of cprop_hardreg, we notice that the third insn,
> %rsi := %r8, is redundant and can be eliminated/propagated to produce:
>
>         movq    %rdi, %r8
>         movq    %rsi, %rdi
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %r8, %rax
>         adcq    %rdi, %rdx
>         ret
>
> Because a successful propagation was found, cprop_hardreg then runs
> a second pass/sweep over the affected basic blocks (using a worklist),
> and on this second pass notices that the second instruction,
> %rdi := %rsi, may now be propagated (%rsi was killed before the first
> transform), and after the second pass we end up with:
>
>         movq    %rdi, %r8
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %r8, %rax
>         adcq    %rsi, %rdx
>         ret
>
> which is the current behaviour on mainline.  However, a third and
> final pass would notice that the first insn, "%r8 := %rdi", is now
> also eliminable, and a third iteration would produce optimal code:
>
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
>         ret
>
> The patch below creates an additional worklist, third_pass, that is
> populated with the basic block ids of blocks that were updated during
> the second pass.  Does the motivation for three passes (reload
> generates exactly three instructions to swap a pair of registers)
> seem reasonable for all targets?  If cprop_hardreg is considered an
> expensive pass, this change could be gated on basic block count or
> similar.  Finally, I should point out that this is a regression fix;
> GCC 4.8 generated optimal code with two moves (whereas GCC 12 required
> 5 moves, up from GCC 11's 4 moves).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Thoughts?  Ok for mainline?

Can you instead refactor the code to have

auto_vec<int> worklist1, worklist2;
vec<int> *worklist = &worklist1;

and alternate between worklist1 and worklist2 N times (const int N = 2
is OK for now I guess, OTOH for -O1 we might want to stick to 1).
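
A self-contained sketch of that shape (std::vector standing in for
auto_vec, and process_block a hypothetical stand-in for the per-block
cprop_hardreg_bb work) might look like:

  #include <functional>
  #include <utility>
  #include <vector>

  /* Run one sweep over every block, then re-sweep only the blocks
     that changed, ping-ponging between two worklists up to
     max_iterations extra times (N = 2 above).  process_block returns
     true if it changed the given block.  */
  void
  iterate_cprop (const std::vector<int> &all_blocks,
                 const std::function<bool (int)> &process_block,
                 int max_iterations)
  {
    std::vector<int> worklist1, worklist2;
    std::vector<int> *worklist = &worklist1;
    std::vector<int> *next = &worklist2;

    for (int bb : all_blocks)
      if (process_block (bb))
        worklist->push_back (bb);

    for (int i = 0; i < max_iterations && !worklist->empty (); i++)
      {
        next->clear ();
        for (int bb : *worklist)
          if (process_block (bb))
            next->push_back (bb);
        std::swap (worklist, next);
      }
  }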

It might be tempting to maintain the number of times we visited each block
and just directly iterate on changed BBs.  There's also the comment
in cprop_hardreg_bb:

  /* If a block has a single predecessor, that we've already
     processed, begin with the value data that was live at
     the end of the predecessor block.  */
  /* ??? Ought to use more intelligent queuing of blocks.  */

which suggests we want to improve on its dataflow (like iterating
in RPO order so we can intersect the live value data of predecessors?)
That said, calling df_analyze () between iterations is what makes
iterating expensive I guess, as well as cprop_hardreg_debug.
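
For illustration, here is a minimal standalone reverse post-order
computation over a toy CFG (the real pass would use GCC's
pre_and_rev_post_order_compute; the CFG representation here is a
hypothetical stand-in).  Visiting blocks in RPO means every block is
seen before its successors (back edges aside), so each block's
predecessors have already been processed when we reach it and their
exit value data could be intersected on entry:

  #include <algorithm>
  #include <vector>

  /* succs[b] lists the successors of block b; block 0 is the entry.  */
  static void
  dfs_postorder (int bb, const std::vector<std::vector<int>> &succs,
                 std::vector<bool> &visited, std::vector<int> &out)
  {
    visited[bb] = true;
    for (int succ : succs[bb])
      if (!visited[succ])
        dfs_postorder (succ, succs, visited, out);
    out.push_back (bb);
  }

  std::vector<int>
  compute_rpo (const std::vector<std::vector<int>> &succs)
  {
    std::vector<bool> visited (succs.size (), false);
    std::vector<int> postorder;
    dfs_postorder (0, succs, visited, postorder);
    std::reverse (postorder.begin (), postorder.end ());
    return postorder;
  }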

Richard.

>
> 2022-06-02  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * regcprop.cc (pass_cprop_hardreg::execute): Perform a third
>         iteration over each basic block that was updated by the second
>         iteration.
>
> Thanks in advance,
> Roger
> --