On 7/12/23 14:59, Jivan Hakobyan via Gcc-patches wrote:
> Accessing a local array element turns into a load whose address has
> the form (fp + (index << C1)) + C2.  When the access is inside a
> loop, that address contains a loop-invariant computation, but for
> some reason the loop-invariant motion passes cannot move it out.
> We can, however, handle it in the target-specific hook
> (legitimize_address), which provides an opportunity to rewrite the
> memory access into a form more suitable for the target architecture.
>
> This patch addresses the case above by rewriting the address to
> ((fp + C2) + (index << C1)).  I have evaluated it on SPEC2017 and got
> an improvement on leela (over 7b instructions, .39% of the dynamic
> count), which dwarfs the regression on gcc (14m instructions, .0012%
> of the dynamic count).
>
>
> gcc/ChangeLog:
> 	* config/riscv/riscv.cc (riscv_legitimize_address): Handle folding.
> 	(mem_shadd_or_shadd_rtx_p): New predicate.

So I poked a bit more in this space today.  As you may have noted,
Manolis's patch still needs another rev.  But I was able to test this
patch in conjunction with the f-m-o patch as well as the additional
improvements made to hard register cprop.

The net result was that this patch still shows a nice decrease in
instruction counts on leela.  It's a bit of a mixed bag elsewhere.

I dove a bit deeper into the small regression in x264.  In the case I
looked at, the patch regresses because the original form of the address
calculation exposes a common subexpression, i.e.

  addr1 = (reg1 << 2) + fp + C1
  addr2 = (reg1 << 2) + fp + C2

(reg1 << 2) + fp is a common subexpression, resulting in something like
this as we leave CSE:

  t = (reg1 << 2) + fp;
  addr1 = t + C1
  addr2 = t + C2
  mem (addr1)
  mem (addr2)

C1 and C2 are small constants, so combine generates

  t = (reg1 << 2) + fp;
  mem (t + C1)
  mem (t + C2)

FP elimination occurs after IRA and we get:

  t2 = sp + C3
  t = (reg << 2) + t2
  mem (t + C1)
  mem (t + C2)

Not bad.  Manolis's work should allow us to improve that a bit more.

With this patch we don't capture the CSE and ultimately generate
slightly worse code.  This kind of issue is fairly inherent in
reassociation, and given that the regression is two orders of magnitude
smaller than the improvement, my inclination is to go forward with this
patch.

I've fixed a few formatting issues, changed one conditional to use
CONST_INT_P rather than checking the code directly, and pushed the
final version to the trunk.

Thanks for your patience.

jeff
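
P.S. To make the address form above concrete, here is a small,
hypothetical C function (the function and names are mine, not taken
from the patch or from SPEC) whose array access produces that shape.
The index is loaded from memory rather than derived from the induction
variable, so the access cannot be strength-reduced into a pointer walk:

  /* buf lives in the frame at fp + C2, so each buf[idx[i] & 255]
     access computes the address (fp + (index << 2)) + C2.  The
     fp + C2 part is loop-invariant, but it never appears as a
     subexpression under this association, so the loop-invariant
     motion passes have nothing to hoist.  */
  int
  sum_indexed (const int *idx, int n)
  {
    int buf[256];

    for (int i = 0; i < 256; i++)
      buf[i] = i;

    int s = 0;
    for (int i = 0; i < n; i++)
      s += buf[idx[i] & 255];
    return s;
  }

After the rewrite the address is ((fp + C2) + (index << 2)): fp + C2 is
now a hoistable subexpression, leaving just the shift and a single add
in the loop, a shape that can presumably also map onto the Zba shNadd
instructions (hence, I assume, the new mem_shadd_or_shadd_rtx_p
predicate).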