This middle-end patch proposes the "hard register constant propagation"

pass be performed up to three times on each basic block (up from the

current two times) if the second pass successfully made changes.

 

The motivation for three passes is to handle the "swap idiom" (i.e.

t = x; x = y; y = t;" sequences) that get generated by register allocation

(reload).

 

Consider the x86_64 test case for __int128 addition recently discussed

on gcc-patches.  With that proposed patch, the input to the cprop_hardreg

pass looks like:

 

        movq    %rdi, %r8

        movq    %rsi, %rdi

        movq    %r8, %rsi

        movq    %rdx, %rax

        movq    %rcx, %rdx

        addq    %rsi %rax

        adcq    %rdi, %rdx

        ret

 

where the first three instructions effectively swap %rsi and %rdi.

 

On the first pass of cprop_hardreg, we notice that the third insn,

%rsi := %r8, is redundant and can eliminated/propagated to produce:

 

        movq    %rdi, %r8

        movq    %rsi, %rdi

        movq    %rdx, %rax

        movq    %rcx, %rdx

        addq    %r8 %rax

        adcq    %rdi, %rdx

        ret

 

Because a successful propagation was found, cprop_hardreg then runs

a second pass/sweep on affected basic blocks (using worklist), and

on this second pass notices that the second instruction, %rdi := %rsi,

may now be propagated (%rsi was killed in the before the first transform),

and after a second pass, we now end up with:

 

        movq    %rdi, %r8

        movq    %rdx, %rax

        movq    %rcx, %rdx

        addq    %r8, %rax

        adcq    %rsi, %rdx

        ret

 

which is the current behaviour on mainline.  However, a third and final

pass would now notice that the first insn, "%r8 := %rdi" is also now

eliminable, and a third iteration would produce optimal code:

 

        movq    %rdx, %rax

        movq    %rcx, %rdx

        addq    %rdi, %rax

        adcq    %rsi, %rdx

        ret

 

The patch below creates an additional worklist, third_pass, that is

populated with the basic block id's of blocks that were updated during

the second pass.  Does the motivation for three passes (reload doesn't

generate more or less than three instructions to swap a pair of registers)

seem reasonable for all targets?  If cprop_hardreg is considered an

expensive pass, this change could be gated based on basic block count

or similar.  Finally, I should point out that this a regression fix;

GCC 4.8 generated optimal code with two moves (whereas GCC 12 required

5 moves, up from GCC 11's 4 moves).

 

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap

and make -k check, both with and without --target_board=unix{-m32} with

no new failures.  Thoughts?  Ok for mainline?

 

 

2022-06-02  Roger Sayle  <roger@nextmovesoftware.com>

 

gcc/ChangeLog

        * regcprop.cc (pass_cprop_hardreg::execute): Perform a third

        iteration over each basic block that was updated by the second

        iteration.

 

 

Thanks in advance,

Roger

--