On 7/25/23 20:31, Jeff Law via Gcc-patches wrote:
>
>
> On 7/25/23 05:24, Jivan Hakobyan wrote:
>> Hi.
>>
>> I re-ran the benchmarks and hopefully got the same profit.
>> I also compared leela's code and figured out the reason.
>>
>> Actually, my patch and Manolis's patch do the same thing. The
>> difference is only the execution order.
> But shouldn't your patch also allow for at least the potential
> to pull the fp+offset computation out of a loop?  I'm pretty sure
> Manolis's patch can't do that.
>
>> Because f-m-o runs after register allocation, it cannot
>> eliminate the redundant move of 'sp' to another register.
> Actually that's supposed to be handled by a different patch that
> should already be upstream.  Specifically:
>
>> commit 6a2e8dcbbd4bab374b27abea375bf7a921047800
>> Author: Manolis Tsamis
>> Date:   Thu May 25 13:44:41 2023 +0200
>>
>>     cprop_hardreg: Enable propagation of the stack pointer if possible
>>
>>     Propagation of the stack pointer in cprop_hardreg is currenty
>>     forbidden in all cases, due to maybe_mode_change returning NULL.
>>     Relax this restriction and allow propagation when no mode change is
>>     requested.
>>
>>     gcc/ChangeLog:
>>
>>             * regcprop.cc (maybe_mode_change): Enable stack pointer
>>             propagation.
> I think there were a couple of follow-ups.  But that's the key change
> that should allow propagation of copies from the stack pointer and
> thus eliminate the mov gpr,sp instructions.  If that's not happening,
> then it's worth investigating why.
>
>>
>> Besides that, I have checked the build failure on x264_r. It is
>> already fixed in the third version.
> Yea, this was a problem with re-recognition.  I think it was fixed by:
>
>> commit ecfa870ff29d979bd2c3d411643b551f2b6915b0
>> Author: Vineet Gupta
>> Date:   Thu Jul 20 11:15:37 2023 -0700
>>
>>     RISC-V: optim const DF +0.0 store to mem [PR/110748]
>>
>>     Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
>>
>>     DF +0.0 is bitwise all zeros so int x0 store to mem can be
>>     used to optimize it.
> [ ... ]
>
>
> So I think the big question WRT your patch is does it still help the
> case where we weren't pulling the fp+offset computation out of a loop.

I have some numbers for f-m-o v3 vs. this patch. They're attached rather than inline, so Thunderbird doesn't mangle the formatting.