On Mon, Dec 4, 2023, 21:22 Richard Sandiford <richard.sandiford@arm.com> wrote:

> Manos Anagnostakis writes:
> > This is an RTL pass that detects store forwarding from stores to larger
> > loads (load pairs).
> >
> > This optimization is SPEC2017-driven and was found to be beneficial for
> > some benchmarks, through testing on ampere1/ampere1a machines.
> >
> > For example, it can transform cases like
> >
> >   str  d5, [sp, #320]
> >   fmul d5, d31, d29
> >   ldp  d31, d17, [sp, #312]  # Large load from small store
> >
> > to
> >
> >   str  d5, [sp, #320]
> >   fmul d5, d31, d29
> >   ldr  d31, [sp, #312]
> >   ldr  d17, [sp, #320]
> >
> > Currently, the pass is disabled by default on all architectures and
> > enabled by a target-specific option.
> >
> > If deemed beneficial enough for a default, it will be enabled on
> > ampere1/ampere1a, or other architectures as well, without needing to
> > be turned on by this option.
> >
> > Bootstrapped and regtested on aarch64-linux.
> >
> > gcc/ChangeLog:
> >
> > 	* config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> > 	* config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
> > 	* config/aarch64/aarch64-protos.h
> > 	(make_pass_avoid_store_forwarding): Declare.
> > 	* config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
> > 	(aarch64-store-forwarding-threshold): New param.
> > 	* config/aarch64/t-aarch64: Add aarch64-store-forwarding.o.
> > 	* doc/invoke.texi: Document the new option and new param.
> > 	* config/aarch64/aarch64-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> > 	* gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> > 	* gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> >
> > Signed-off-by: Manos Anagnostakis
> > Co-Authored-By: Manolis Tsamis
> > Co-Authored-By: Philipp Tomsich
> > ---
> > Changes in v4:
> > 	- I had problems making cselib_subst_to_values work correctly,
> > 	  so I used cselib_lookup to implement the exact same behaviour
> > 	  and record the store value at the time we iterate over it.
> > 	- Removed the store/load_mem_addr check from is_forwarding as
> > 	  unnecessary.
> > 	- The pass is called on all optimization levels right now.
> > 	- The threshold check should remain as it is, as we only care
> > 	  about the front element of the list.  The comment above the
> > 	  check explains why a single if is enough.
>
> I still think this is structurally better as a while.  There's no reason
> in principle why we wouldn't want to record the stores in:
>
>   stp x0, x1, [x4, #8]
>   ldp x0, x1, [x4, #0]
>   ldp x2, x3, [x4, #16]
>
> and then the two stores should have the same distance value.
> I realise we don't do that yet, but still.
>

Ah, you mean forwarding from stp. I was a bit confused with what you meant
the previous time. This was not initially meant for this patch, but I think
it wouldn't take long to implement that before pushing this. It is your
call of course whether I should include it.

> > 	- The documentation changes requested.
> > 	- Adjusted a comment.
> >
> >  gcc/config.gcc                                |   1 +
> >  gcc/config/aarch64/aarch64-passes.def         |   1 +
> >  gcc/config/aarch64/aarch64-protos.h           |   1 +
> >  .../aarch64/aarch64-store-forwarding.cc       | 321 ++++++++++++++++++
> >  gcc/config/aarch64/aarch64.opt                |   9 +
> >  gcc/config/aarch64/t-aarch64                  |  10 +
> >  gcc/doc/invoke.texi                           |  11 +-
> >  .../aarch64/ldp_ssll_no_overlap_address.c     |  33 ++
> >  .../aarch64/ldp_ssll_no_overlap_offset.c      |  33 ++
> >  .../gcc.target/aarch64/ldp_ssll_overlap.c     |  33 ++
> >  10 files changed, 452 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 748430194f3..2ee3b61c4fa 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -350,6 +350,7 @@ aarch64*-*-*)
> >  	cxx_target_objs="aarch64-c.o"
> >  	d_target_objs="aarch64-d.o"
> >  	extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
> > +	extra_objs="${extra_objs} aarch64-store-forwarding.o"
> >  	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc \$(srcdir)/config/aarch64/aarch64-sve-builtins.h \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
> >  	target_has_targetm_common=yes
> >  	;;
> > diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
> > index 6ace797b738..fa79e8adca8 100644
> > --- a/gcc/config/aarch64/aarch64-passes.def
> > +++ b/gcc/config/aarch64/aarch64-passes.def
> > @@ -23,3 +23,4 @@ INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
> >  INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
> >  INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
> >  INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
> > +INSERT_PASS_AFTER (pass_peephole2, 1, pass_avoid_store_forwarding);
> > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> > index d2718cc87b3..7d9dfa06af9 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -1050,6 +1050,7 @@ rtl_opt_pass *make_pass_track_speculation (gcc::context *);
> >  rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
> >  rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
> >  rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt);
> > +rtl_opt_pass *make_pass_avoid_store_forwarding (gcc::context *ctxt);
> >
> >  poly_uint64 aarch64_regmode_natural_size (machine_mode);
> >
> > diff --git a/gcc/config/aarch64/aarch64-store-forwarding.cc b/gcc/config/aarch64/aarch64-store-forwarding.cc
> > new file mode 100644
> > index 00000000000..ae3cbe519cd
> > --- /dev/null
> > +++ b/gcc/config/aarch64/aarch64-store-forwarding.cc
> > @@ -0,0 +1,321 @@
> > +/* Avoid store forwarding optimization pass.
> > +   Copyright (C) 2023 Free Software Foundation, Inc.
> > +   Contributed by VRULL GmbH.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   You should have received a copy of the GNU General Public License
> > +   along with GCC; see the file COPYING3.
> > +   If not see
> > +   <http://www.gnu.org/licenses/>.  */
> > +
> > +#define IN_TARGET_CODE 1
> > +
> > +#include "config.h"
> > +#define INCLUDE_LIST
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "rtl.h"
> > +#include "alias.h"
> > +#include "rtlanal.h"
> > +#include "tree-pass.h"
> > +#include "cselib.h"
> > +
> > +/* This is an RTL pass that detects store forwarding from stores to larger
> > +   loads (load pairs).  For example, it can transform cases like
> > +
> > +     str  d5, [sp, #320]
> > +     fmul d5, d31, d29
> > +     ldp  d31, d17, [sp, #312]  # Large load from small store
> > +
> > +   to
> > +
> > +     str  d5, [sp, #320]
> > +     fmul d5, d31, d29
> > +     ldr  d31, [sp, #312]
> > +     ldr  d17, [sp, #320]
> > +
> > +   Design: The pass follows a straightforward design.  It starts by
> > +   initializing the alias analysis and the cselib.  Both of these are
> > +   used to find stores and larger loads with overlapping addresses, which
> > +   are candidates for store forwarding optimizations.  It then scans at
> > +   the basic-block level to find stores that forward to larger loads and
> > +   handles them accordingly as described in the above example.  Finally,
> > +   the alias analysis and the cselib library are closed.  */
> > +
> > +typedef struct
> > +{
> > +  rtx_insn *store_insn;
> > +  rtx store_mem_addr;
> > +  unsigned int insn_cnt;
> > +} store_info;
> > +
> > +typedef std::list<store_info> list_store_info;
> > +
> > +/* Statistics counters.  */
> > +static unsigned int stats_store_count = 0;
> > +static unsigned int stats_ldp_count = 0;
> > +static unsigned int stats_ssll_count = 0;
> > +static unsigned int stats_transformed_count = 0;
> > +
> > +/* Default.  */
> > +static rtx dummy;
> > +static bool is_load (rtx expr, rtx &op_1 = dummy);
> > +
> > +/* Return true if SET expression EXPR is a store; otherwise false.  */
> > +
> > +static bool
> > +is_store (rtx expr)
> > +{
> > +  return MEM_P (SET_DEST (expr));
> > +}
> > +
> > +/* Return true if SET expression EXPR is a load; otherwise false.  OP_1
> > +   will contain the MEM operand of the load.  */
> > +
> > +static bool
> > +is_load (rtx expr, rtx &op_1)
> > +{
> > +  op_1 = SET_SRC (expr);
> > +
> > +  if (GET_CODE (op_1) == ZERO_EXTEND
> > +      || GET_CODE (op_1) == SIGN_EXTEND)
> > +    op_1 = XEXP (op_1, 0);
> > +
> > +  return MEM_P (op_1);
> > +}
> > +
> > +/* Return true if STORE_MEM_ADDR is forwarding to the address of
> > +   LOAD_MEM; otherwise false.  STORE_MEM_MODE is the mode of the MEM rtx
> > +   containing STORE_MEM_ADDR.  */
> > +
> > +static bool
> > +is_forwarding (rtx store_mem_addr, rtx load_mem, machine_mode store_mem_mode)
> > +{
> > +  /* Sometimes we do not have the proper value.  */
> > +  if (!CSELIB_VAL_PTR (store_mem_addr))
> > +    return false;
> > +
> > +  gcc_checking_assert (MEM_P (load_mem));
> > +
> > +  rtx load_mem_addr = get_addr (XEXP (load_mem, 0));
> > +  machine_mode load_mem_mode = GET_MODE (load_mem);
> > +  load_mem_addr = cselib_lookup (load_mem_addr, load_mem_mode, 1,
> > +				 load_mem_mode)->val_rtx;
>
> Like I said in the previous review, it shouldn't be necessary to do any
> manual lookup on the load address.  rtx_equal_for_cselib_1 does the
> lookup itself.  Does that not work?
>

I thought you meant only that the if check was redundant here, which it
was. I'll reply whether cselib can handle the load all by itself. Thanks
for the review!

Manos.

> The patch is OK with the four lines above deleted, if that works,
> and with s/if/while/.  But please reply if that combination doesn't work.
>
> Thanks,
> Richard
>
> > +  return rtx_equal_for_cselib_1 (store_mem_addr,
> > +				 load_mem_addr,
> > +				 store_mem_mode, 0);
> > +}
> > +
> > +/* Return true if INSN is a load pair, preceded by a store forwarding to
> > +   it; otherwise false.  STORE_EXPRS contains the stores.
> > +   */
> > +
> > +static bool
> > +is_small_store_to_large_load (list_store_info store_exprs, rtx_insn *insn)
> > +{
> > +  unsigned int load_count = 0;
> > +  bool forwarding = false;
> > +  rtx expr = PATTERN (insn);
> > +
> > +  if (GET_CODE (expr) != PARALLEL
> > +      || XVECLEN (expr, 0) != 2)
> > +    return false;
> > +
> > +  for (int i = 0; i < XVECLEN (expr, 0); i++)
> > +    {
> > +      rtx op_1;
> > +      rtx out_exp = XVECEXP (expr, 0, i);
> > +
> > +      if (GET_CODE (out_exp) != SET)
> > +	continue;
> > +
> > +      if (!is_load (out_exp, op_1))
> > +	continue;
> > +
> > +      load_count++;
> > +
> > +      for (store_info str : store_exprs)
> > +	{
> > +	  rtx store_insn = str.store_insn;
> > +
> > +	  if (!is_forwarding (str.store_mem_addr, op_1,
> > +			      GET_MODE (SET_DEST (PATTERN (store_insn)))))
> > +	    continue;
> > +
> > +	  if (dump_file)
> > +	    {
> > +	      fprintf (dump_file,
> > +		       "Store forwarding to PARALLEL with loads:\n");
> > +	      fprintf (dump_file, "  From: ");
> > +	      print_rtl_single (dump_file, store_insn);
> > +	      fprintf (dump_file, "  To: ");
> > +	      print_rtl_single (dump_file, insn);
> > +	    }
> > +
> > +	  forwarding = true;
> > +	}
> > +    }
> > +
> > +  if (load_count == 2)
> > +    stats_ldp_count++;
> > +
> > +  return load_count == 2 && forwarding;
> > +}
> > +
> > +/* Break a load pair into its two distinct loads, except if the base
> > +   source address to load from is overwritten in the first load.  INSN
> > +   should be the PARALLEL of the load pair.  */
> > +
> > +static void
> > +break_ldp (rtx_insn *insn)
> > +{
> > +  rtx expr = PATTERN (insn);
> > +
> > +  gcc_checking_assert (GET_CODE (expr) == PARALLEL && XVECLEN (expr, 0) == 2);
> > +
> > +  rtx load_0 = XVECEXP (expr, 0, 0);
> > +  rtx load_1 = XVECEXP (expr, 0, 1);
> > +
> > +  gcc_checking_assert (is_load (load_0) && is_load (load_1));
> > +
> > +  /* The base address was overwritten in the first load.  */
> > +  if (reg_mentioned_p (SET_DEST (load_0), SET_SRC (load_1)))
> > +    return;
> > +
> > +  emit_insn_before (load_0, insn);
> > +  emit_insn_before (load_1, insn);
> > +  remove_insn (insn);
> > +
> > +  stats_transformed_count++;
> > +}
> > +
> > +static void
> > +scan_and_transform_bb_level ()
> > +{
> > +  rtx_insn *insn, *next;
> > +  basic_block bb;
> > +  FOR_EACH_BB_FN (bb, cfun)
> > +    {
> > +      list_store_info store_exprs;
> > +      unsigned int insn_cnt = 0;
> > +      for (insn = BB_HEAD (bb); insn != NEXT_INSN (BB_END (bb)); insn = next)
> > +	{
> > +	  next = NEXT_INSN (insn);
> > +
> > +	  /* If we cross a CALL_P insn, clear the list, because the
> > +	     small-store-to-large-load is unlikely to cause performance
> > +	     difference.  */
> > +	  if (CALL_P (insn))
> > +	    store_exprs.clear ();
> > +
> > +	  if (!NONJUMP_INSN_P (insn))
> > +	    continue;
> > +
> > +	  cselib_process_insn (insn);
> > +
> > +	  rtx expr = single_set (insn);
> > +
> > +	  /* If a store is encountered, append it to the store_exprs list
> > +	     to check it later.  */
> > +	  if (expr && is_store (expr))
> > +	    {
> > +	      rtx store_mem = SET_DEST (expr);
> > +	      rtx store_mem_addr = get_addr (XEXP (store_mem, 0));
> > +	      machine_mode store_mem_mode = GET_MODE (store_mem);
> > +	      store_mem_addr = cselib_lookup (store_mem_addr,
> > +					      store_mem_mode, 1,
> > +					      store_mem_mode)->val_rtx;
> > +	      store_exprs.push_back ({ insn, store_mem_addr, insn_cnt++ });
> > +	      stats_store_count++;
> > +	    }
> > +
> > +	  /* Check for small-store-to-large-load.  */
> > +	  if (is_small_store_to_large_load (store_exprs, insn))
> > +	    {
> > +	      stats_ssll_count++;
> > +	      break_ldp (insn);
> > +	    }
> > +
> > +	  /* Pop the first store from the list if its distance crosses the
> > +	     maximum accepted threshold.  The list contains unique values
> > +	     sorted in ascending order, meaning that only one distance can
> > +	     be off at a time.  */
> > +	  if (!store_exprs.empty ()
> > +	      && (insn_cnt - store_exprs.front ().insn_cnt
> > +		  > (unsigned int) aarch64_store_forwarding_threshold_param))
> > +	    store_exprs.pop_front ();
> > +	}
> > +    }
> > +}
> > +
> > +static void
> > +execute_avoid_store_forwarding ()
> > +{
> > +  init_alias_analysis ();
> > +  cselib_init (CSELIB_RECORD_MEMORY | CSELIB_PRESERVE_CONSTANTS);
> > +  scan_and_transform_bb_level ();
> > +  end_alias_analysis ();
> > +  cselib_finish ();
> > +  statistics_counter_event (cfun, "Number of stores identified: ",
> > +			    stats_store_count);
> > +  statistics_counter_event (cfun, "Number of load pairs identified: ",
> > +			    stats_ldp_count);
> > +  statistics_counter_event (cfun,
> > +			    "Number of forwarding cases identified: ",
> > +			    stats_ssll_count);
> > +  statistics_counter_event (cfun, "Number of transformed cases: ",
> > +			    stats_transformed_count);
> > +}
> > +
> > +const pass_data pass_data_avoid_store_forwarding =
> > +{
> > +  RTL_PASS, /* type.  */
> > +  "avoid_store_forwarding", /* name.  */
> > +  OPTGROUP_NONE, /* optinfo_flags.  */
> > +  TV_NONE, /* tv_id.  */
> > +  0, /* properties_required.  */
> > +  0, /* properties_provided.  */
> > +  0, /* properties_destroyed.  */
> > +  0, /* todo_flags_start.  */
> > +  0 /* todo_flags_finish.  */
> > +};
> > +
> > +class pass_avoid_store_forwarding : public rtl_opt_pass
> > +{
> > +public:
> > +  pass_avoid_store_forwarding (gcc::context *ctxt)
> > +    : rtl_opt_pass (pass_data_avoid_store_forwarding, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *)
> > +  {
> > +    return aarch64_flag_avoid_store_forwarding;
> > +  }
> > +
> > +  virtual unsigned int execute (function *)
> > +  {
> > +    execute_avoid_store_forwarding ();
> > +    return 0;
> > +  }
> > +
> > +}; // class pass_avoid_store_forwarding
> > +
> > +/* Create a new avoid store forwarding pass instance.
> > +   */
> > +
> > +rtl_opt_pass *
> > +make_pass_avoid_store_forwarding (gcc::context *ctxt)
> > +{
> > +  return new pass_avoid_store_forwarding (ctxt);
> > +}
> > diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> > index f5a518202a1..e4498d53b46 100644
> > --- a/gcc/config/aarch64/aarch64.opt
> > +++ b/gcc/config/aarch64/aarch64.opt
> > @@ -304,6 +304,10 @@ moutline-atomics
> >  Target Var(aarch64_flag_outline_atomics) Init(2) Save
> >  Generate local calls to out-of-line atomic operations.
> >
> > +mavoid-store-forwarding
> > +Target Bool Var(aarch64_flag_avoid_store_forwarding) Init(0) Optimization
> > +Avoid store forwarding to load pairs.
> > +
> >  -param=aarch64-sve-compare-costs=
> >  Target Joined UInteger Var(aarch64_sve_compare_costs) Init(1) IntegerRange(0, 1) Param
> >  When vectorizing for SVE, consider using unpacked vectors for smaller elements and use the cost model to pick the cheapest approach.  Also use the cost model to choose between SVE and Advanced SIMD vectorization.
> > @@ -360,3 +364,8 @@ Enum(aarch64_ldp_stp_policy) String(never) Value(AARCH64_LDP_STP_POLICY_NEVER)
> >
> >  EnumValue
> >  Enum(aarch64_ldp_stp_policy) String(aligned) Value(AARCH64_LDP_STP_POLICY_ALIGNED)
> > +
> > +-param=aarch64-store-forwarding-threshold=
> > +Target Joined UInteger Var(aarch64_store_forwarding_threshold_param) Init(20) Param
> > +Maximum instruction distance allowed between a store and a load pair for this to be
> > +considered a candidate to avoid when using -mavoid-store-forwarding.
> > diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
> > index a9a244ab6d6..7639b50358d 100644
> > --- a/gcc/config/aarch64/t-aarch64
> > +++ b/gcc/config/aarch64/t-aarch64
> > @@ -176,6 +176,16 @@ aarch64-cc-fusion.o: $(srcdir)/config/aarch64/aarch64-cc-fusion.cc \
> >  	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> >  		$(srcdir)/config/aarch64/aarch64-cc-fusion.cc
> >
> > +aarch64-store-forwarding.o: \
> > +    $(srcdir)/config/aarch64/aarch64-store-forwarding.cc \
> > +    $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(REGS_H) insn-config.h $(RTL_BASE_H) \
> > +    dominance.h cfg.h cfganal.h $(BASIC_BLOCK_H) $(INSN_ATTR_H) $(RECOG_H) \
> > +    output.h hash-map.h $(DF_H) $(OBSTACK_H) $(TARGET_H) $(RTL_H) \
> > +    $(CONTEXT_H) $(TREE_PASS_H) regrename.h \
> > +    $(srcdir)/config/aarch64/aarch64-protos.h
> > +	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> > +		$(srcdir)/config/aarch64/aarch64-store-forwarding.cc
> > +
> >  comma=,
> >  MULTILIB_OPTIONS    = $(subst $(comma),/, $(patsubst %, mabi=%, $(subst $(comma),$(comma)mabi=,$(TM_MULTILIB_CONFIG))))
> >  MULTILIB_DIRNAMES   = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 2b51ff304f6..39dbc04207e 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -798,7 +798,7 @@ Objective-C and Objective-C++ Dialects}.
> >  -moverride=@var{string}  -mverbose-cost-dump
> >  -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg}
> >  -mstack-protector-guard-offset=@var{offset} -mtrack-speculation
> > --moutline-atomics }
> > +-moutline-atomics -mavoid-store-forwarding}
> >
> >  @emph{Adapteva Epiphany Options}
> >  @gccoptlist{-mhalf-reg-file -mprefer-short-insn-regs
> > @@ -16738,6 +16738,11 @@ With @option{--param=aarch64-stp-policy=never}, do not emit stp.
> >  With @option{--param=aarch64-stp-policy=aligned}, emit stp only if the
> >  source pointer is aligned to at least double the alignment of the type.
> >
> > +@item aarch64-store-forwarding-threshold
> > +Maximum allowed instruction distance between a store and a load pair for
> > +this to be considered a candidate to avoid when using
> > +@option{-mavoid-store-forwarding}.
> > +
> >  @item aarch64-loop-vect-issue-rate-niters
> >  The tuning for some AArch64 CPUs tries to take both latencies and issue
> >  rates into account when deciding whether a loop should be vectorized
> > @@ -20763,6 +20768,10 @@ Generate code which uses only the general-purpose registers.  This will prevent
> >  the compiler from using floating-point and Advanced SIMD registers but will not
> >  impose any restrictions on the assembler.
> >
> > +@item -mavoid-store-forwarding
> > +@itemx -mno-avoid-store-forwarding
> > +Avoid store forwarding to load pairs.
> > +
> >  @opindex mlittle-endian
> >  @item -mlittle-endian
> >  Generate little-endian code.
> >  This is the default when GCC is configured for an
> > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
> > new file mode 100644
> > index 00000000000..b77de6c64b6
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
> > @@ -0,0 +1,33 @@
> > +/* { dg-options "-O2 -mcpu=generic -mavoid-store-forwarding" } */
> > +
> > +#include <stdint.h>
> > +
> > +typedef int v4si __attribute__ ((vector_size (16)));
> > +
> > +/* Different address, same offset, no overlap */
> > +
> > +#define LDP_SSLL_NO_OVERLAP_ADDRESS(TYPE) \
> > +TYPE ldp_ssll_no_overlap_address_##TYPE(TYPE *ld_arr, TYPE *st_arr, TYPE *st_arr_2, TYPE i, TYPE dummy){ \
> > +  TYPE r, y; \
> > +  st_arr[0] = i; \
> > +  ld_arr[0] = dummy; \
> > +  r = st_arr_2[0]; \
> > +  y = st_arr_2[1]; \
> > +  return r + y; \
> > +}
> > +
> > +LDP_SSLL_NO_OVERLAP_ADDRESS(uint32_t)
> > +LDP_SSLL_NO_OVERLAP_ADDRESS(uint64_t)
> > +LDP_SSLL_NO_OVERLAP_ADDRESS(int32_t)
> > +LDP_SSLL_NO_OVERLAP_ADDRESS(int64_t)
> > +LDP_SSLL_NO_OVERLAP_ADDRESS(int)
> > +LDP_SSLL_NO_OVERLAP_ADDRESS(long)
> > +LDP_SSLL_NO_OVERLAP_ADDRESS(float)
> > +LDP_SSLL_NO_OVERLAP_ADDRESS(double)
> > +LDP_SSLL_NO_OVERLAP_ADDRESS(v4si)
> > +
> > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } } */
> > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } } */
> > +/* { dg-final { scan-assembler-times "ldp\ts\[0-9\]+, s\[0-9\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "ldp\td\[0-9\]+, d\[0-9\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
> > new file mode 100644
> > index 00000000000..f1b3a66abfd
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
> > @@ -0,0 +1,33 @@
> > +/* { dg-options "-O2 -mcpu=generic -mavoid-store-forwarding" } */
> > +
> > +#include <stdint.h>
> > +
> > +typedef int v4si __attribute__ ((vector_size (16)));
> > +
> > +/* Same address, different offset, no overlap */
> > +
> > +#define LDP_SSLL_NO_OVERLAP_OFFSET(TYPE) \
> > +TYPE ldp_ssll_no_overlap_offset_##TYPE(TYPE *ld_arr, TYPE *st_arr, TYPE i, TYPE dummy){ \
> > +  TYPE r, y; \
> > +  st_arr[0] = i; \
> > +  ld_arr[0] = dummy; \
> > +  r = st_arr[10]; \
> > +  y = st_arr[11]; \
> > +  return r + y; \
> > +}
> > +
> > +LDP_SSLL_NO_OVERLAP_OFFSET(uint32_t)
> > +LDP_SSLL_NO_OVERLAP_OFFSET(uint64_t)
> > +LDP_SSLL_NO_OVERLAP_OFFSET(int32_t)
> > +LDP_SSLL_NO_OVERLAP_OFFSET(int64_t)
> > +LDP_SSLL_NO_OVERLAP_OFFSET(int)
> > +LDP_SSLL_NO_OVERLAP_OFFSET(long)
> > +LDP_SSLL_NO_OVERLAP_OFFSET(float)
> > +LDP_SSLL_NO_OVERLAP_OFFSET(double)
> > +LDP_SSLL_NO_OVERLAP_OFFSET(v4si)
> > +
> > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } } */
> > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } } */
> > +/* { dg-final { scan-assembler-times "ldp\ts\[0-9\]+, s\[0-9\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "ldp\td\[0-9\]+, d\[0-9\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
> > new file mode 100644
> > index 00000000000..8d5ce5cc87e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
> > @@ -0,0 +1,33 @@
> > +/* { dg-options "-O2 -mcpu=generic -mavoid-store-forwarding" } */
> > +
> > +#include <stdint.h>
> > +
> > +typedef int v4si __attribute__ ((vector_size (16)));
> > +
> > +/* Same address, same offset, overlap */
> > +
> > +#define LDP_SSLL_OVERLAP(TYPE) \
> > +TYPE ldp_ssll_overlap_##TYPE(TYPE *ld_arr, TYPE *st_arr, TYPE i, TYPE dummy){ \
> > +  TYPE r, y; \
> > +  st_arr[0] = i; \
> > +  ld_arr[0] = dummy; \
> > +  r = st_arr[0]; \
> > +  y = st_arr[1]; \
> > +  return r + y; \
> > +}
> > +
> > +LDP_SSLL_OVERLAP(uint32_t)
> > +LDP_SSLL_OVERLAP(uint64_t)
> > +LDP_SSLL_OVERLAP(int32_t)
> > +LDP_SSLL_OVERLAP(int64_t)
> > +LDP_SSLL_OVERLAP(int)
> > +LDP_SSLL_OVERLAP(long)
> > +LDP_SSLL_OVERLAP(float)
> > +LDP_SSLL_OVERLAP(double)
> > +LDP_SSLL_OVERLAP(v4si)
> > +
> > +/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 0 } } */
> > +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 0 } } */
> > +/* { dg-final { scan-assembler-times "ldp\ts\[0-9\]+, s\[0-9\]" 0 } } */
> > +/* { dg-final { scan-assembler-times "ldp\td\[0-9\]+, d\[0-9\]" 0 } } */
> > +/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 0 } } */
> > --
> > 2.41.0
>