From: Jeff Law <jeffreyalaw@gmail.com>
To: Manolis Tsamis <manolis.tsamis@vrull.eu>, gcc-patches@gcc.gnu.org
Cc: "Richard Biener" <rguenther@suse.de>,
"Philipp Tomsich" <philipp.tomsich@vrull.eu>,
"Christoph Müllner" <christoph.muellner@vrull.eu>,
"Jiangning Liu" <jiangning.liu@amperecomputing.com>,
"Jakub Jelinek" <jakub@redhat.com>,
"Andrew Pinski" <quic_apinski@quicinc.com>
Subject: Re: [PATCH v2] Target-independent store forwarding avoidance.
Date: Fri, 7 Jun 2024 16:31:15 -0600 [thread overview]
Message-ID: <e7a8db67-2a90-41f3-972e-d77b97d8dba5@gmail.com>
In-Reply-To: <20240606101043.3682477-1-manolis.tsamis@vrull.eu>
On 6/6/24 4:10 AM, Manolis Tsamis wrote:
> This pass detects cases of expensive store forwarding and tries to avoid them
> by reordering the stores and using suitable bit insertion sequences.
> For example it can transform this:
>
> strb w2, [x1, 1]
> ldr x0, [x1] # Expensive store forwarding to larger load.
>
> To:
>
> ldr x0, [x1]
> strb w2, [x1]
> bfi x0, x2, 0, 8
>
> Assembly like this can appear with bitfields or type punning / unions.
> On stress-ng when running the cpu-union microbenchmark the following speedups
> have been observed.
>
> Neoverse-N1: +29.4%
> Intel Coffeelake: +13.1%
> AMD 5950X: +17.5%
>
> gcc/ChangeLog:
>
> * Makefile.in: Add avoid-store-forwarding.o.
> * common.opt: New option -favoid-store-forwarding.
> * params.opt: New param store-forwarding-max-distance.
> * doc/invoke.texi: Document new pass.
> * doc/passes.texi: Document new pass.
> * passes.def: Schedule a new pass.
> * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> * avoid-store-forwarding.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
> * gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
> * gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
> * gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
So this is getting a lot more interesting. I think the first time I
looked at this it was more concerned with stores feeding something like
a load-pair and avoiding the store forwarding penalty for that case. Am
I mis-remembering, or did it get significantly more general?
> +
> +static unsigned int stats_sf_detected = 0;
> +static unsigned int stats_sf_avoided = 0;
> +
> +static rtx
> +get_load_mem (rtx expr)
Needs a function comment. You should probably mention that EXPR must be
a single_set in that comment.
> +
> + rtx dest;
> + if (eliminate_load)
> + dest = gen_reg_rtx (load_inner_mode);
> + else
> + dest = SET_DEST (load);
> +
> + int move_to_front = -1;
> + int total_cost = 0;
> +
> + /* Check if we can emit bit insert instructions for all forwarded stores. */
> + FOR_EACH_VEC_ELT (stores, i, it)
> + {
> + it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
> + rtx_insn *insns = NULL;
> +
> + /* If we're eliminating the load then find the store with zero offset
> + and use it as the base register to avoid a bit insert. */
> + if (eliminate_load && it->offset == 0)
How often is this triggering? We have various bits of code in the gimple
optimizers to detect a store followed by a load from the same address and
do the forwarding. If these cases are happening with any frequency, that
would be a sign the code in DOM and elsewhere isn't working well.
The way those passes detect this case is to take the store, flip the
operands around (i.e., so it looks like a load) and enter that into the
expression hash tables. After that, standard redundancy elimination
approaches will work.
> + {
> + start_sequence ();
> +
> + /* We can use a paradoxical subreg to force this to a wider mode, as
> + the only use will be inserting the bits (i.e., we don't care about
> + the value of the higher bits). */
Which may be a good hint about the cases you're capturing -- if the
modes/sizes differ, that would make more sense, since I don't think the
gimple optimizers are as likely to catch those cases.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4e8967fd8ab..c769744d178 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12657,6 +12657,15 @@ loop unrolling.
> This option is enabled by default at optimization levels @option{-O1},
> @option{-O2}, @option{-O3}, @option{-Os}.
>
> +@opindex favoid-store-forwarding
> +@item -favoid-store-forwarding
> +@itemx -fno-avoid-store-forwarding
> +Many CPUs will stall for many cycles when a load partially depends on previous
> +smaller stores. This pass tries to detect such cases and avoid the penalty by
> +changing the order of the load and store and then fixing up the loaded value.
> +
> +Disabled by default.
Is there any particular reason why this would be off by default at -O1
or higher? It would seem to me that on modern cores this transformation
should easily be a win. Even on an old in-order core, avoiding the stall
with the bit insert is likely profitable, just not as much so.
> diff --git a/gcc/params.opt b/gcc/params.opt
> index d34ef545bf0..b8115f5c27a 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1032,6 +1032,10 @@ Allow the store merging pass to introduce unaligned stores if it is legal to do
> Common Joined UInteger Var(param_store_merging_max_size) Init(65536) IntegerRange(1, 65536) Param Optimization
> Maximum size of a single store merging region in bytes.
>
> +-param=store-forwarding-max-distance=
> +Common Joined UInteger Var(param_store_forwarding_max_distance) Init(10) IntegerRange(1, 1000) Param Optimization
> +Maximum number of instruction distance that a small store forwarded to a larger load may stall.
I think you may need to run the update-urls script since you've added a
new option.
In general it seems pretty reasonable.
I've actually added it to my tester just to see if there's any fallout.
It'll take a week to churn through the long-running targets that
bootstrap in QEMU, but the crosses should have data Monday.
jeff