On 7/25/23 20:31, Jeff Law via Gcc-patches wrote:
>
>
> On 7/25/23 05:24, Jivan Hakobyan wrote:
>> Hi.
>>
>> I re-ran the benchmarks and, fortunately, got the same gains.
>> I also compared leela's code and figured out the reason.
>>
>> Actually, my patch and Manolis's do the same thing. The only
>> difference is the order of execution.
> But shouldn't your patch also allow for at least the potential
> to pull the fp+offset computation out of a loop?  I'm pretty sure 
> Manolis's patch can't do that.
>
>> Because f-m-o runs after register allocation, it cannot
>> eliminate the redundant move of 'sp' to another register.
> Actually that's supposed to be handled by a different patch that 
> should already be upstream.  Specifically:
>
>> commit 6a2e8dcbbd4bab374b27abea375bf7a921047800
>> Author: Manolis Tsamis <manolis.tsamis@vrull.eu>
>> Date:   Thu May 25 13:44:41 2023 +0200
>>
>>     cprop_hardreg: Enable propagation of the stack pointer if possible
>>
>>     Propagation of the stack pointer in cprop_hardreg is currently
>>     forbidden in all cases, due to maybe_mode_change returning NULL.
>>     Relax this restriction and allow propagation when no mode change is
>>     requested.
>>
>>     gcc/ChangeLog:
>>
>>             * regcprop.cc (maybe_mode_change): Enable stack pointer
>>             propagation.
> I think there were a couple of follow-ups.  But that's the key change
> that should allow propagation of copies from the stack pointer and 
> thus eliminate the mov gpr,sp instructions.  If that's not happening, 
> then it's worth investigating why.
>
>>
>> Besides that, I have checked the build failure on x264_r. It is
>> already fixed in the third version.
> Yea, this was a problem with re-recognition.  I think it was fixed by:
>
>> commit ecfa870ff29d979bd2c3d411643b551f2b6915b0
>> Author: Vineet Gupta <vineetg@rivosinc.com>
>> Date:   Thu Jul 20 11:15:37 2023 -0700
>>
>>     RISC-V: optim const DF +0.0 store to mem [PR/110748]
>>
>>     Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
>>
>>     DF +0.0 is bitwise all zeros so int x0 store to mem can be
>>     used to optimize it.
> [ ... ]
>
>
> So I think the big question WRT your patch is whether it still helps the
> case where we weren't pulling the fp+offset computation out of a loop.

I have some numbers for f-m-o v3 vs. this. They are attached here (rather
than inline, to avoid Thunderbird mangling the text formatting).