Re: [PATCH v2] Target-independent store forwarding avoidance.

From: Richard Biener <rguenther@suse.de>
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: "Philipp Tomsich" <philipp.tomsich@vrull.eu>,
	"Manolis Tsamis" <manolis.tsamis@vrull.eu>,
	gcc-patches@gcc.gnu.org,
	"Christoph Müllner" <christoph.muellner@vrull.eu>,
	"Jiangning Liu" <jiangning.liu@amperecomputing.com>,
	"Jakub Jelinek" <jakub@redhat.com>,
	"Andrew Pinski" <quic_apinski@quicinc.com>
Subject: Re: [PATCH v2] Target-independent store forwarding avoidance.
Date: Wed, 12 Jun 2024 08:47:17 +0200 (CEST)
Message-ID: <8r9332q4-7q15-080s-nnp4-11o24q9srsp8@fhfr.qr>
In-Reply-To: <28fa1d89-6b7b-4940-bfe6-798627ced812@gmail.com>

On Tue, 11 Jun 2024, Jeff Law wrote:

> 
> 
> On 6/11/24 7:52 AM, Philipp Tomsich wrote:
> > On Tue, 11 Jun 2024 at 15:37, Jeff Law <jeffreyalaw@gmail.com> wrote:
> >>
> >>
> >>
> >> On 6/11/24 1:22 AM, Richard Biener wrote:
> >>
> >>>> Absolutely.  But forwarding from a smaller store to a wider load is
> >>>> painful from a hardware standpoint and if we can avoid it from a
> >>>> codegen standpoint, we should.
> >>>
> >>> Note there's also the possibility to increase the distance between the
> >>> store and the load - in fact the time a store takes to a) retire and
> >>> b) get from the store buffers to where the load-store unit would pick it
> >>> up (L1-D) is another target specific tuning knob.  That said, if that
> >>> distance isn't too large (on x86 there might be only an upper bound
> >>> given by the OOO window size and the L1D store latency(?), possibly
> >>> also additionally by the store buffer size) attacking the issue in
> >>> sched1 or sched2 might be another possibility.  So I think pass placement
> >>> is another thing to look at - I'd definitely place it after sched1
> >>> but I guess without looking at the pass again it's way before that?
> >> True, but I doubt there are enough instructions we could sink the load
> >> past to make a measurable difference.  This is especially true on the
> >> class of uarchs where this is going to be most important.
> >>
> >> In the case where the store/load can't be interchanged and thus this new
> >> pass rejects any transformation, we could try to do something in the
> >> scheduler to defer the load as long as possible.  Essentially it's a
> >> true dependency through a memory location using must-aliasing properties
> >> and in that case we'd want to crank up the "latency" of the store so
> >> that the load gets pushed away.
> >>
> >> I think one of the difficulties here is we often model stores as not
> >> having any latency (which is probably OK in most cases).  Input data
> >> dependencies and structural hazards dominate considerations for
> >> stores.
> > 
> > I don't think that TARGET_SCHED_ADJUST_COST would even be called for a
> > data-dependence through a memory location.
> Probably correct, but we could adjust that behavior or add another mechanism
> to adjust costs based on memory dependencies.
> 
> > 
> > Note that, strictly speaking, the store does not have an extended
> > latency; it will be the load that will have an increased latency
> > (almost as if we knew that the load will miss to one of the outer
> > points-of-coherence).  The difference being that the load would not
> > hang around in a scheduling queue until being dispatched, but its
> > execution would start immediately and take more cycles (and
> > potentially block an execution pipeline for longer).
> Absolutely true.  I'm being imprecise in my language, increasing the "latency"
> of the store is really a proxy for "do something to encourage the load to move
> away from the store".
> 
> But overall rewriting the sequence is probably the better choice.  In my mind
> the scheduler approach would be a secondary attempt if we couldn't interchange
> the store/load.  And I'd make a small bet that its impact would be on the
> margins if we're doing a reasonable job in the new pass.

One of the points I wanted to make is that sched1 can make quite a
difference to the relative distance between the store and the load, and
we have the instruction window the pass considers when scanning
(possibly driven by target uarch details).  So doing the rewriting
before sched1 might not be ideal (but I don't know how much cleanup
work the pass leaves behind - there's nothing between sched1 and RA).
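
To make the rewriting under discussion concrete, here is a minimal
source-level sketch in C.  It is only an analogy: the pass works on RTL,
and the function names, the 1-byte/8-byte sizes and the run-time
endianness probe are all assumptions made for illustration, not taken
from the patch.  The first function has the problematic shape (narrow
store followed by a wider overlapping load); the second interchanges the
two and patches the loaded value in a register instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit position of byte `off` within a uint64_t that was loaded from
   memory.  Probed at run time so the sketch works on either endianness
   (hypothetical helper, not part of the patch).  */
static unsigned byte_shift(unsigned off)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first ? off * 8 : (7 - off) * 8;
}

/* Problematic shape: a 1-byte store followed by an 8-byte load that
   covers the stored byte.  */
uint64_t store_then_load(unsigned char *buf, unsigned off, uint8_t v)
{
    uint64_t w;
    assert(off < 8);
    buf[off] = v;               /* narrow store */
    memcpy(&w, buf, sizeof w);  /* wide load overlapping the store */
    return w;
}

/* Interchanged shape: do the wide load first, then the store, and
   insert the stored byte into the loaded value with mask-and-or.  */
uint64_t load_then_insert(unsigned char *buf, unsigned off, uint8_t v)
{
    uint64_t w;
    unsigned shift;
    assert(off < 8);
    shift = byte_shift(off);
    memcpy(&w, buf, sizeof w);          /* wide load, no pending store */
    buf[off] = v;                       /* memory still gets the store */
    w &= ~((uint64_t)0xff << shift);    /* clear the overwritten byte */
    w |= (uint64_t)v << shift;          /* insert the new byte */
    return w;
}
```

Both functions return the same value and leave the same bytes in
memory; only the order of the memory accesses differs.  A compiler is of
course free to transform either function itself, so this shows the
intent of the rewrite rather than the code any target would end up
executing.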

On the hardware side I always wondered whether a failed store-to-load
forward results in the load uop stalling (because the hardware actually
_did_ see the conflict with an in-flight store) or whether this gets
caught later, as the hardware speculates a load from L1 (with the
wrong value) but has to roll back because of the conflict.  I would
imagine the latter is cheaper to implement but worse in case of
conflict.

Richard.
