* [PATCH/RFC] cprop_hardreg... Third time's a charm.
@ 2022-06-02 10:19 Roger Sayle
2022-06-02 10:55 ` Richard Biener
0 siblings, 1 reply; 2+ messages in thread
From: Roger Sayle @ 2022-06-02 10:19 UTC (permalink / raw)
To: 'GCC Patches'
[-- Attachment #1: Type: text/plain, Size: 3051 bytes --]
This middle-end patch proposes the "hard register constant propagation"
pass be performed up to three times on each basic block (up from the
current two times) if the second pass successfully made changes.
The motivation for three passes is to handle the "swap idiom" (i.e.
"t = x; x = y; y = t" sequences) that get generated by register allocation
(reload).
Consider the x86_64 test case for __int128 addition recently discussed
on gcc-patches. With that proposed patch, the input to the cprop_hardreg
pass looks like:
movq %rdi, %r8
movq %rsi, %rdi
movq %r8, %rsi
movq %rdx, %rax
movq %rcx, %rdx
addq %rsi, %rax
adcq %rdi, %rdx
ret
where the first three instructions effectively swap %rsi and %rdi.
On the first pass of cprop_hardreg, we notice that the third insn,
%rsi := %r8, is redundant and can be eliminated/propagated to produce:
movq %rdi, %r8
movq %rsi, %rdi
movq %rdx, %rax
movq %rcx, %rdx
addq %r8, %rax
adcq %rdi, %rdx
ret
Because a successful propagation was found, cprop_hardreg then runs
a second pass/sweep over the affected basic blocks (using a worklist), and
on this second pass notices that the second instruction, %rdi := %rsi,
may now be propagated (the kill of %rsi was removed by the first
transform), so after a second pass we end up with:
movq %rdi, %r8
movq %rdx, %rax
movq %rcx, %rdx
addq %r8, %rax
adcq %rsi, %rdx
ret
which is the current behaviour on mainline. However, a third and final
pass would notice that the first insn, "%r8 := %rdi", is now also
eliminable, and a third iteration produces optimal code:
movq %rdx, %rax
movq %rcx, %rdx
addq %rdi, %rax
adcq %rsi, %rdx
ret
The patch below creates an additional worklist, third_pass, that is
populated with the basic block ids of blocks that were updated during
the second pass. Does the motivation for three passes (reload generates
exactly three instructions to swap a pair of registers) seem reasonable
for all targets? If cprop_hardreg is considered an expensive pass, this
change could be gated on basic block count or similar. Finally, I should
point out that this is a regression fix; GCC 4.8 generated optimal code
with two moves (whereas GCC 12 requires five moves, up from GCC 11's four).
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures. Thoughts? Ok for mainline?
2022-06-02 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* regcprop.cc (pass_cprop_hardreg::execute): Perform a third
iteration over each basic block that was updated by the second
iteration.
Thanks in advance,
Roger
--
[-- Attachment #2: patchch.txt --]
[-- Type: text/plain, Size: 1377 bytes --]
diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
index 1fdc367..5555c4a 100644
--- a/gcc/regcprop.cc
+++ b/gcc/regcprop.cc
@@ -1384,6 +1384,7 @@ pass_cprop_hardreg::execute (function *fun)
bitmap_clear (visited);
auto_vec<int> worklist;
+ auto_vec<int> third_pass;
bool any_debug_changes = false;
/* We need accurate notes. Earlier passes such as if-conversion may
@@ -1425,7 +1426,27 @@ pass_cprop_hardreg::execute (function *fun)
for (int index : worklist)
{
bb = BASIC_BLOCK_FOR_FN (fun, index);
- cprop_hardreg_bb (bb, all_vd, visited);
+ /* Perform a third pass, if the second pass changed anything.
+ Three passes are required for swaps: t = x; x = y; y = t. */
+ if (cprop_hardreg_bb (bb, all_vd, visited))
+ third_pass.safe_push (bb->index);
+ if (all_vd[bb->index].n_debug_insn_changes)
+ any_debug_changes = true;
+ }
+
+ df_analyze ();
+ if (MAY_HAVE_DEBUG_BIND_INSNS && any_debug_changes)
+ cprop_hardreg_debug (fun, all_vd);
+ }
+
+ if (!third_pass.is_empty ())
+ {
+ any_debug_changes = false;
+ bitmap_clear (visited);
+ for (int index : third_pass)
+ {
+ bb = BASIC_BLOCK_FOR_FN (fun, index);
+ cprop_hardreg_bb (bb, all_vd, visited);
if (all_vd[bb->index].n_debug_insn_changes)
any_debug_changes = true;
}
* Re: [PATCH/RFC] cprop_hardreg... Third time's a charm.
2022-06-02 10:19 [PATCH/RFC] cprop_hardreg... Third time's a charm Roger Sayle
@ 2022-06-02 10:55 ` Richard Biener
0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2022-06-02 10:55 UTC (permalink / raw)
To: Roger Sayle; +Cc: GCC Patches
On Thu, Jun 2, 2022 at 12:20 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
> [original message quoted in full; trimmed]
Can you instead refactor the code to have
auto_vec<int> worklist1, worklist2;
vec<int> *worklist = &worklist1;
and alternate between worklist1 and worklist2 N times (const int N = 2
is OK for now I guess, OTOH for -O1 we might want to stick to 1).
It might be tempting to maintain the number of times we visited a block
and just directly iterate on changed BBs. There's also the comment
in cprop_hardreg_bb:
/* If a block has a single predecessor, that we've already
processed, begin with the value data that was live at
the end of the predecessor block. */
/* ??? Ought to use more intelligent queuing of blocks. */
which suggests we want to improve on its dataflow (like iterating
in RPO order so we can intersect the live value data of predecessors?)
That said, calling df_analyze () between iterations is what makes
iterating expensive I guess, as well as cprop_hardreg_debug.
Richard.