From: Jeff Law <jeffreyalaw@gmail.com>
To: Manolis Tsamis <manolis.tsamis@vrull.eu>, gcc-patches@gcc.gnu.org
Cc: "Richard Biener" <rguenther@suse.de>,
"Philipp Tomsich" <philipp.tomsich@vrull.eu>,
"Christoph Müllner" <christoph.muellner@vrull.eu>,
"Jiangning Liu" <jiangning.liu@amperecomputing.com>,
"Jakub Jelinek" <jakub@redhat.com>,
"Andrew Pinski" <quic_apinski@quicinc.com>
Subject: Re: [PATCH v2] Target-independent store forwarding avoidance.
Date: Fri, 7 Jun 2024 16:31:15 -0600 [thread overview]
Message-ID: <e7a8db67-2a90-41f3-972e-d77b97d8dba5@gmail.com>
In-Reply-To: <20240606101043.3682477-1-manolis.tsamis@vrull.eu>
On 6/6/24 4:10 AM, Manolis Tsamis wrote:
> This pass detects cases of expensive store forwarding and tries to avoid them
> by reordering the stores and using suitable bit insertion sequences.
> For example it can transform this:
>
> strb w2, [x1, 1]
> ldr x0, [x1] # Expensive store forwarding to larger load.
>
> To:
>
> ldr x0, [x1]
> strb w2, [x1]
> bfi x0, x2, 0, 8
>
> Assembly like this can appear with bitfields or type punning / unions.
> On stress-ng when running the cpu-union microbenchmark the following speedups
> have been observed.
>
> Neoverse-N1: +29.4%
> Intel Coffeelake: +13.1%
> AMD 5950X: +17.5%
>
> gcc/ChangeLog:
>
> * Makefile.in: Add avoid-store-forwarding.o.
> * common.opt: New option -favoid-store-forwarding.
> * params.opt: New param store-forwarding-max-distance.
> * doc/invoke.texi: Document new pass.
> * doc/passes.texi: Document new pass.
> * passes.def: Schedule a new pass.
> * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> * avoid-store-forwarding.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
> * gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
> * gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
> * gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
So this is getting a lot more interesting. I think the first time I
looked at this it was more concerned with stores feeding something like
a load-pair and avoiding the store forwarding penalty for that case. Am
I mis-remembering, or did it get significantly more general?
> +
> +static unsigned int stats_sf_detected = 0;
> +static unsigned int stats_sf_avoided = 0;
> +
> +static rtx
> +get_load_mem (rtx expr)
Needs a function comment. You should probably mention that EXPR must be
a single_set in that comment.
> +
> + rtx dest;
> + if (eliminate_load)
> + dest = gen_reg_rtx (load_inner_mode);
> + else
> + dest = SET_DEST (load);
> +
> + int move_to_front = -1;
> + int total_cost = 0;
> +
> + /* Check if we can emit bit insert instructions for all forwarded stores. */
> + FOR_EACH_VEC_ELT (stores, i, it)
> + {
> + it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
> + rtx_insn *insns = NULL;
> +
> + /* If we're eliminating the load then find the store with zero offset
> + and use it as the base register to avoid a bit insert. */
> + if (eliminate_load && it->offset == 0)
How often is this triggering? We have various bits of code in the gimple
optimizers to detect a store followed by a load from the same address and
do the forwarding. If these cases are happening with any frequency, that
would be a sign the code in DOM and elsewhere isn't working well.
The way those passes detect this case is to take the store, flip the
operands around (i.e., so it looks like a load) and enter that into the
expression hash tables. After that, standard redundancy elimination
approaches will work.
> + {
> + start_sequence ();
> +
> + /* We can use a paradoxical subreg to force this to a wider mode, as
> + the only use will be inserting the bits (i.e., we don't care about
> + the value of the higher bits). */
Which may be a good hint about the cases you're capturing -- if the
modes/sizes differ, that would make more sense, since I don't think the
gimple optimizers are as likely to catch those cases.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4e8967fd8ab..c769744d178 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12657,6 +12657,15 @@ loop unrolling.
> This option is enabled by default at optimization levels @option{-O1},
> @option{-O2}, @option{-O3}, @option{-Os}.
>
> +@opindex favoid-store-forwarding
> +@item -favoid-store-forwarding
> +@itemx -fno-avoid-store-forwarding
> +Many CPUs will stall for many cycles when a load partially depends on previous
> +smaller stores. This pass tries to detect such cases and avoid the penalty by
> +changing the order of the load and store and then fixing up the loaded value.
> +
> +Disabled by default.
Is there any particular reason why this would be off by default at -O1
or higher? It would seem to me that on modern cores this transformation
should easily be a win. Even on an old in-order core, avoiding the stall
with the bit insert is likely profitable, just not as much so.
> diff --git a/gcc/params.opt b/gcc/params.opt
> index d34ef545bf0..b8115f5c27a 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1032,6 +1032,10 @@ Allow the store merging pass to introduce unaligned stores if it is legal to do
> Common Joined UInteger Var(param_store_merging_max_size) Init(65536) IntegerRange(1, 65536) Param Optimization
> Maximum size of a single store merging region in bytes.
>
> +-param=store-forwarding-max-distance=
> +Common Joined UInteger Var(param_store_forwarding_max_distance) Init(10) IntegerRange(1, 1000) Param Optimization
> +Maximum number of instruction distance that a small store forwarded to a larger load may stall.
I think you may need to run the update-urls script since you've added a
new option.
In general it seems pretty reasonable.
I've actually added it to my tester just to see if there's any fallout.
It'll take a week to churn through the long-running targets that
bootstrap in QEMU, but the crosses should have data Monday.
jeff