public inbox for gcc-bugs@sourceware.org
From: "rsandifo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies
Date: Wed, 08 Dec 2021 09:33:42 +0000
Message-ID: <bug-98782-4-Q1RNgnBXVX@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-98782-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #16 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Tamar Christina from comment #15)
> > From that point of view, it doesn't look like the memory and register
> > costs of R are too wrong here.  The things being costed are the store
> > and load around the call (which do exist if we allocate a call-clobbered
> > register) and the loads at each use site (which do exist if we spill R).
> 
> Indeed, I don't think the heuristics are wrong, but because one frequency
> (CALL_FREQ) grows much quicker than BB_FREQ, at the smaller values they are a
> bit sensitive to any changes.  The edge probabilities can barely change while
> the BB_FREQ can change dramatically.
On “CALL_FREQ grows much quicker than BB_FREQ”: for r104, the
ALLOCNO_FREQ ought in principle to be fixed for a given loop iteration
count.  It shouldn't grow or shrink based on the value of SPILLED.
That's because every execution of the loop body involves exactly one
reference to r104.  SPILLED specifies the probability that that single
reference is the “call” use rather than the “non-call” use, but it doesn't
change the total number of references per iteration.

So I think the only reason we see the different ALLOCNO_FREQs in:

   ALLOCNO_FREQ 989, …

vs:

   ALLOCNO_FREQ 990, …

is round-off error.  If the values had more precision, I think we'd
have a fixed ALLOCNO_FREQ and a varying ALLOCNO_CALL_FREQ.
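
Concretely, here is a toy model of that claim (not GCC code; the
loop-body frequency and the scaling are made up), showing that only
the call-crossing share should move as SPILLED varies:

  /* Toy model: one reference to r104 per loop iteration, so the total
     reference frequency is independent of SPILLED; only the share that
     crosses the call changes.  All values are illustrative.  */
  #include <stdio.h>

  int
  main (void)
  {
    const double body_freq = 990.0;   /* assumed scaled loop-body frequency */
    for (int pct = 10; pct <= 90; pct += 40)
      {
        double spilled = pct / 100.0;
        printf ("SPILLED=%.1f  ALLOCNO_FREQ=%.0f  ALLOCNO_CALL_FREQ=%.0f\n",
                spilled, body_freq, body_freq * spilled);
      }
    return 0;
  }

With exact arithmetic the ALLOCNO_FREQ column never changes; the 989
vs 990 difference in the dumps is just that rounding applied to scaled
integer counts.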

At one extreme, if SPILLED is very low (but we nevertheless continue
to duplicate the common part of the loop body), then storing and
loading around the call becomes very cheap: ALLOCNO_CALL_FREQ is much
lower than the conceptually fixed ALLOCNO_FREQ.  At the other extreme,
if SPILLED is very high (but again we continue to duplicate the common
part of the loop body), then storing and loading around the call
becomes very expensive.  So I guess the question is: where is the
cut-off?

Given that:

- the loop iterates 1024 times
- there is a single store outside the loop
- the target claims that loads and stores have equal cost

ISTM that the cut-off really is in the range [0.5, 0.51].  If the
value of SPILLED reflects the real probability at runtime then I think
IRA is doing the right thing, given the claimed costs of storing and
loading.
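
A back-of-the-envelope version of that calculation (again not GCC
code, and assuming one reference to R per iteration plus the single
store outside the loop):

  /* Rough cut-off model: keeping R in a call-clobbered register costs a
     store + load around the call on the SPILLED fraction of the 1024
     iterations; spilling R costs one store outside the loop plus a load
     at each use.  Equal load/store costs are assumed, as aarch64
     currently claims.  */
  #include <stdio.h>

  int
  main (void)
  {
    const double iters = 1024.0;
    for (int pct = 49; pct <= 52; pct++)
      {
        double spilled = pct / 100.0;
        double reg_cost = 2.0 * spilled * iters;   /* save/restore around call */
        double mem_cost = 1.0 + iters;             /* store once + load per use */
        printf ("SPILLED=%.2f  register=%6.1f  spill=%6.1f  -> %s\n",
                spilled, reg_cost, mem_cost,
                reg_cost <= mem_cost ? "allocate" : "spill");
      }
    return 0;
  }

The two curves cross at SPILLED = 1025/2048, i.e. just above 0.5,
which is why the decision flips somewhere in [0.5, 0.51].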

I guess the problems are:

- SPILLED is only a guess
- the aarch64 costs of stores and loads don't reflect reality

> > Like Feng Xue says in comment 1, I think the main missed optimisation
> > opportunity here is that foo + 1024 is invariant, so if we allocate
> > a call-clobbered register, we could save R once outside the loop
> > and reload it after each call.  That would give:
> > 
> > - a store of R outside the loop (low execution count)
> > - a load of R inside the loop (after the call) with freq 0.51 * loop iters
> > 
> 
> Yes, that is the ideal solution, but also requires more changes to RA.
True :-)
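
For reference, a source-level caricature of that ideal allocation
(hypothetical names throughout; the real testcase is different, and at
source level the "store" and "reload" are of course RA artifacts
rather than statements):

  /* Caricature only: R = foo + 1024 is loop-invariant, so it can live in
     a call-clobbered register, be stored once before the loop, and be
     reloaded only on the iterations that actually make the call.  */
  extern int foo[2048];        /* hypothetical stand-in for the PR's foo */
  extern int need_call (int);  /* the path taken with probability SPILLED */
  extern void bar (void);      /* the call that clobbers registers */
  extern void use (int *);     /* the one reference to R per iteration */

  void
  example (void)
  {
    int *r = foo + 1024;       /* loop-invariant value R */
    int *slot = r;             /* the single store outside the loop */
    for (int i = 0; i < 1024; i++)
      {
        if (need_call (i))
          {
            bar ();            /* clobbers call-clobbered registers */
            r = slot;          /* reload R only after the call */
          }
        use (r);
      }
  }

The store happens once, and the reload happens only after a call
(freq roughly SPILLED * iterations) rather than on every iteration,
which is what makes this cheaper than both of the current approaches.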

> Instead I've chosen a middle ground here (same as yours but done in
> ira_tune_allocno_costs instead), which is to store and load only inside
> the loop, but to do so only in the BB which contains the call.
I don't think you were saying otherwise, but just FTR: I wasn't
proposing a solution, I was just describing a hack.  It seemed
to me like IRA was making the right decision for r104 in isolation,
for the given SPILLED value and target costs.  My hack to force
an allocation for r104 made things worse.

> > which is cheaper than both the current approaches.  We don't do that
> > optimisation yet though, so the current costing seems to reflect what we
> > currently generate.
> 
> In many (if not most) arches, stores are significantly cheaper than loads,
> though.  So the store before the call doesn't end up making that much of a
> difference, but yes, it adds up if you have many of them.
Yeah.  Could we fix the problem that way instead?  The only reason IRA is
treating loads and stores as equal cost is because aarch64 asked it to :-)
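
The knob in question is the target's memory-move cost.  A sketch of
the kind of hook change that would encode "loads are more expensive
than stores" (illustrative numbers; this is not the real
aarch64_memory_move_cost, just the general shape of GCC's
TARGET_MEMORY_MOVE_COST hook):

  /* Sketch only.  The hook is passed IN == true for a load
     (memory -> register) and false for a store, so a target can charge
     the two directions differently.  */
  static int
  example_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
                            reg_class_t rclass ATTRIBUTE_UNUSED,
                            bool in)
  {
    return in ? 8 : 4;   /* pretend a load costs twice a store */
  }

  #undef TARGET_MEMORY_MOVE_COST
  #define TARGET_MEMORY_MOVE_COST example_memory_move_cost

Whether those relative numbers are right for any given core is of
course a question for the target maintainers.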

At least for the reduced testcase, ISTM that IRA is making the right choice
for the “loads and stores are equally expensive” assumption.  It also seems
to make the right choice for a “loads are more expensive than stores”
assumption.  It's up to the target to choose which is right.

So I think the reduced testcase is showing a target bug.


Thread overview: 47+ messages
2021-01-21 14:31 [Bug rtl-optimization/98782] New: IRA artificially creating spills due to " tnfchris at gcc dot gnu.org
2021-01-21 14:35 ` [Bug rtl-optimization/98782] " jgreenhalgh at gcc dot gnu.org
2021-01-22 10:12 ` fxue at os dot amperecomputing.com
2021-01-29 13:34 ` tnfchris at gcc dot gnu.org
2021-02-05 12:02 ` [Bug rtl-optimization/98782] [11 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in " tnfchris at gcc dot gnu.org
2021-02-23  2:11 ` jiangning.liu at amperecomputing dot com
2021-02-23 12:06 ` rguenth at gcc dot gnu.org
2021-02-26 12:32 ` rguenth at gcc dot gnu.org
2021-11-28 19:07 ` [Bug rtl-optimization/98782] [11/12 " hubicka at gcc dot gnu.org
2021-11-29  1:33 ` jiangning.liu at amperecomputing dot com
2021-11-29  6:59 ` tnfchris at gcc dot gnu.org
2021-12-03 11:44 ` hubicka at gcc dot gnu.org
2021-12-03 11:47 ` hubicka at gcc dot gnu.org
2021-12-07 11:19 ` tnfchris at gcc dot gnu.org
2021-12-07 11:21 ` tnfchris at gcc dot gnu.org
2021-12-07 11:21 ` tnfchris at gcc dot gnu.org
2021-12-07 19:44 ` rsandifo at gcc dot gnu.org
2021-12-07 23:52 ` tnfchris at gcc dot gnu.org
2021-12-08  9:33 ` rsandifo at gcc dot gnu.org [this message]
2021-12-08 14:31 ` tnfchris at gcc dot gnu.org
2021-12-08 15:02 ` rsandifo at gcc dot gnu.org
2021-12-09 19:56 ` pthaugen at gcc dot gnu.org
2021-12-09 20:12 ` hubicka at gcc dot gnu.org
2021-12-09 21:27 ` pthaugen at gcc dot gnu.org
2021-12-10 11:36 ` hubicka at gcc dot gnu.org
2021-12-14 14:38 ` tnfchris at gcc dot gnu.org
2021-12-14 14:40 ` hubicka at kam dot mff.cuni.cz
2021-12-14 14:48 ` tnfchris at gcc dot gnu.org
2021-12-14 14:58 ` hubicka at kam dot mff.cuni.cz
2021-12-14 15:07 ` tnfchris at gcc dot gnu.org
2021-12-14 15:08 ` tnfchris at gcc dot gnu.org
2021-12-14 18:16 ` jamborm at gcc dot gnu.org
2021-12-15 12:15 ` tnfchris at gcc dot gnu.org
2021-12-20 18:06 ` rsandifo at gcc dot gnu.org
2021-12-31 17:28 ` rsandifo at gcc dot gnu.org
2022-01-04 22:26 ` pthaugen at gcc dot gnu.org
2022-01-04 22:29 ` pthaugen at gcc dot gnu.org
2022-01-06 14:53 ` rsandifo at gcc dot gnu.org
2022-01-10  1:29 ` crazylht at gmail dot com
2022-01-11 10:14   ` Jan Hubicka
2022-01-10 14:47 ` cvs-commit at gcc dot gnu.org
2022-01-10 14:47 ` cvs-commit at gcc dot gnu.org
2022-01-10 14:47 ` cvs-commit at gcc dot gnu.org
2022-01-10 14:47 ` cvs-commit at gcc dot gnu.org
2022-01-10 14:52 ` [Bug rtl-optimization/98782] [11 " rsandifo at gcc dot gnu.org
2022-01-11 10:14 ` hubicka at kam dot mff.cuni.cz
2022-01-11 14:22 ` rsandifo at gcc dot gnu.org
