Hi,
The attached patch tries to fix PR88833.
For the following test-case:

subroutine foo(x)
  real :: x(100)
  x = x + 10
end subroutine foo

Assembly with -O3 -march=armv8.2-a+sve:

foo_:
.LFB0:
        .cfi_startproc
        mov     w2, 100
        mov     w3, w2
        mov     x1, 0
        whilelo p0.s, wzr, w2
        fmov    z1.s, #1.0e+1
        .p2align 3,,7
.L2:
        ld1w    z0.s, p0/z, [x0, x1, lsl 2]
        fadd    z0.s, z0.s, z1.s
        st1w    z0.s, p0, [x0, x1, lsl 2]
        incw    x1
        whilelo p0.s, w1, w3
        bne     .L2
        ret

As we can see, it generates an extra mov w3, w2.
Instead it could have reused w2 in both whilelo's.

expand produces:

insn 7:  reg:SI 97 = 100
insn 8:  use (reg:SI 97)
insn 22: reg:SI 105 = 100
insn 23: use (reg:SI 105)

Both reg:SI 97 and reg:SI 105 have only a single definition (and also
a single use).

cse2 then replaces 100 with reg:SI 97 in insn 22, which becomes:

insn 22: reg:SI 105 = reg:SI 97

sched1 then reorders instructions, and insn 7 and insn 22 end up in
the same basic block.

Looking at the reload dump:

Choosing alt 3 in insn 7:  (0) r  (1) M {*movsi_aarch64}
    alt=0,overall=0,losers=0,rld_nregs=0
Choosing alt 0 in insn 2:  (0) =r  (1) r {*movdi_aarch64}
    alt=0,overall=0,losers=0,rld_nregs=0
Choosing alt 0 in insn 22:  (0) =r  (1) r {*movsi_aarch64}
    1 Non-pseudo reload: reject+=2
    1 Non input pseudo reload: reject++
    Cycle danger: overall += LRA_MAX_REJECT
    alt=0,overall=609,losers=1,rld_nregs=1

which shows that it ends up taking an extra register.

The issue here is that the cse2 pass is leaving opportunities for
propagating register copies. To address this, the patch makes the
following changes to fwprop.c:

(a) Add support for handling UNSPEC in propagate_rtx_1 in a similar
    manner to simplify_replace_fn_rtx.
(b) Allow propagating a def inside a loop in forward_propagate_into
    if the source of the def is a register. AFAIU, replacing a register
    by another register shouldn't increase cost.
(c) Integrate fwprop and fwprop_addr, and make fwprop_addr propagate
    register copies.

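The intended effect of propagating register copies on the four insns above can be sketched with a toy model. This is not GCC code and does not use any fwprop.c interface: the tuple layout, the register names "r97"/"r105" and the function propagate_reg_copies are all hypothetical, and the model assumes straight-line code in a single basic block where every definition precedes its uses.

```python
# Toy model of register-copy propagation; NOT GCC code.
# An insn is (uid, kind, dest, src) with kind "set" or "use".
# Registers are strings, constants are ints. All names are hypothetical.

def propagate_reg_copies(insns):
    """Forward-propagate single-def register copies, then drop dead copies."""
    # Count definitions so only single-def registers are considered.
    def_count = {}
    for _, kind, dest, _ in insns:
        if kind == "set":
            def_count[dest] = def_count.get(dest, 0) + 1

    # Record "dest = src" copies where both registers have one definition.
    copy_of = {}
    for _, kind, dest, src in insns:
        if (kind == "set" and isinstance(src, str)
                and def_count.get(dest) == 1 and def_count.get(src) == 1):
            # Follow an earlier copy so chains collapse to the root register.
            copy_of[dest] = copy_of.get(src, src)

    # Replace every register operand that is a known copy by its source.
    rewritten = []
    for uid, kind, dest, src in insns:
        if isinstance(src, str) and src in copy_of:
            src = copy_of[src]
        rewritten.append((uid, kind, dest, src))

    # A copy whose destination is no longer read anywhere is dead.
    still_read = {src for _, _, _, src in rewritten if isinstance(src, str)}
    return [insn for insn in rewritten
            if not (insn[1] == "set" and insn[2] in copy_of
                    and insn[2] not in still_read)]


# The insn stream after cse2, as described above.
after_cse2 = [
    (7, "set", "r97", 100),
    (8, "use", None, "r97"),
    (22, "set", "r105", "r97"),
    (23, "use", None, "r105"),
]

for insn in propagate_reg_copies(after_cse2):
    print(insn)
```

In this model insn 23 ends up using r97 and insn 22 is removed, leaving insns 7, 8 and 23. The real pass additionally has to check dominance and loop structure, which the toy model sidesteps by assuming one basic block.
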
With the patch, fwprop_addr propagates reg:SI 97 in insn 23 and deletes
insn 22, which eliminates the redundant mov.

Does this patch look OK?
Bootstrapped + tested on x86_64-unknown-linux-gnu and aarch64-linux-gnu.
Cross-testing with SVE in progress.

Thanks,
Prathamesh