Hi Richard,

The patch is working correctly with the four lines deleted and "get_addr"
on "load_mem" inlined into the "rtx_equal_for_cselib_1" call. I also
changed store_info to str_info to avoid a duplicate-name warning during
bootstrap (one definition rule). Rebased on top of your SME2 changes and
submitted as v5.

Thanks!
Manos.

On Mon, Dec 4, 2023 at 10:35 PM Richard Sandiford wrote:
> Manos Anagnostakis writes:
> > On Mon, Dec 4, 2023, 21:22 Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >
> >> Manos Anagnostakis writes:
> >> > This is an RTL pass that detects store forwarding from stores to
> >> > larger loads (load pairs).
> >> >
> >> > This optimization is SPEC2017-driven and was found to be beneficial
> >> > for some benchmarks, through testing on ampere1/ampere1a machines.
> >> >
> >> > For example, it can transform cases like
> >> >
> >> >   str  d5, [sp, #320]
> >> >   fmul d5, d31, d29
> >> >   ldp  d31, d17, [sp, #312]  # Large load from small store
> >> >
> >> > to
> >> >
> >> >   str  d5, [sp, #320]
> >> >   fmul d5, d31, d29
> >> >   ldr  d31, [sp, #312]
> >> >   ldr  d17, [sp, #320]
> >> >
> >> > Currently, the pass is disabled by default on all architectures and
> >> > enabled by a target-specific option.
> >> >
> >> > If deemed beneficial enough for a default, it will be enabled on
> >> > ampere1/ampere1a, or on other architectures as well, without
> >> > needing to be turned on by this option.
> >> >
> >> > Bootstrapped and regtested on aarch64-linux.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >         * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> >> >         * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER):
> >> >         New pass.
> >> >         * config/aarch64/aarch64-protos.h
> >> >         (make_pass_avoid_store_forwarding): Declare.
> >> >         * config/aarch64/aarch64.opt (mavoid-store-forwarding):
> >> >         New option.
> >> >         (aarch64-store-forwarding-threshold): New param.
> >> >         * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o.
> >> >         * doc/invoke.texi: Document the new option and new param.
> >> >         * config/aarch64/aarch64-store-forwarding.cc: New file.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> >         * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> >> >         * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> >> >         * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> >> >
> >> > Signed-off-by: Manos Anagnostakis
> >> > Co-Authored-By: Manolis Tsamis
> >> > Co-Authored-By: Philipp Tomsich
> >> > ---
> >> > Changes in v4:
> >> >   - I had problems making cselib_subst_to_values work correctly,
> >> >     so I used cselib_lookup to implement the exact same behaviour
> >> >     and record the store value at the time we iterate over it.
> >> >   - Removed the store/load_mem_addr check from is_forwarding as
> >> >     unnecessary.
> >> >   - The pass is called on all optimization levels right now.
> >> >   - The threshold check should remain as it is, since we only care
> >> >     about the front element of the list. The comment above the
> >> >     check explains why a single if is enough.
> >>
> >> I still think this is structurally better as a while. There's no
> >> reason in principle why we wouldn't want to record the stores in:
> >>
> >>   stp x0, x1, [x4, #8]
> >>   ldp x0, x1, [x4, #0]
> >>   ldp x2, x3, [x4, #16]
> >>
> >> and then the two stores should have the same distance value.
> >> I realise we don't do that yet, but still.
> >
> > Ah, you mean forwarding from stp. I was a bit confused about what you
> > meant the previous time. This was not initially meant for this patch,
> > but I think it wouldn't take long to implement before pushing this.
> > It is your call, of course, whether I should include it.
>
> No strong opinion either way, really. It's definitely fine to check
> only for STRs initially. I'd just rather not bake that assumption into
> places where we don't need to.
> If you do check for STPs, it'd probably be worth committing the pass
> without it at first, waiting until Alex's patch is in, and then doing
> STP support as a separate follow-up patch. That's because Alex's patch
> will convert STPs to using a single double-width memory operand.
>
> (Waiting shouldn't jeopardise the chances of the patch going in, since
> the pass was posted well in time and the hold-up isn't your fault.)
>
> > [...]
> >> > +/* Return true if STORE_MEM_ADDR is forwarding to the address of
> >> > +   LOAD_MEM; otherwise false.  STORE_MEM_MODE is the mode of the
> >> > +   MEM rtx containing STORE_MEM_ADDR.  */
> >> > +
> >> > +static bool
> >> > +is_forwarding (rtx store_mem_addr, rtx load_mem,
> >> > +               machine_mode store_mem_mode)
> >> > +{
> >> > +  /* Sometimes we do not have the proper value.  */
> >> > +  if (!CSELIB_VAL_PTR (store_mem_addr))
> >> > +    return false;
> >> > +
> >> > +  gcc_checking_assert (MEM_P (load_mem));
> >> > +
> >> > +  rtx load_mem_addr = get_addr (XEXP (load_mem, 0));
> >> > +  machine_mode load_mem_mode = GET_MODE (load_mem);
> >> > +  load_mem_addr = cselib_lookup (load_mem_addr, load_mem_mode, 1,
> >> > +                                 load_mem_mode)->val_rtx;
> >>
> >> Like I said in the previous review, it shouldn't be necessary to do
> >> any manual lookup on the load address.  rtx_equal_for_cselib_1 does
> >> the lookup itself.  Does that not work?
> >
> > I thought you meant only that the if check was redundant here, which
> > it was.
>
> Ah, no, I meant the full lookup, sorry.
>
> > I'll reply if cselib can handle the load all by itself.
>
> Thanks!
>
> Richard

-- 
Manos Anagnostakis | Compiler Engineer
E: manos.anagnostakis@vrull.eu
VRULL GmbH | Beatrixgasse 32, 1030 Vienna
W: www.vrull.eu | LinkedIn
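
[Editor's note: the hazard the pass targets, a store whose bytes are wholly
contained in a later, wider load, can be sketched outside GCC as a simple
byte-range check. This is an illustrative model only, not code from the
patch; the function name and parameters are hypothetical.]

```python
# Sketch (hypothetical, not GCC code): a store-forwarding stall arises
# when a store's byte range lies strictly inside a wider later load, as
# with the str/ldp pair in the patch description.

def store_forwards_into_larger_load(store_off, store_size,
                                    load_off, load_size):
    """Return True if the store's bytes lie inside a strictly wider load."""
    return (load_size > store_size
            and store_off >= load_off
            and store_off + store_size <= load_off + load_size)

# The example from the patch description: an 8-byte str at sp+320
# followed by a 16-byte ldp covering sp+312..sp+327.
print(store_forwards_into_larger_load(320, 8, 312, 16))  # True
# A same-width reload of the same address forwards cleanly: no stall.
print(store_forwards_into_larger_load(320, 8, 320, 8))   # False
```

Splitting the ldp into two ldr instructions, as the pass does, makes each
load exactly match (or miss) the store's range, restoring forwarding.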