From: Jeff Law <jeffreyalaw@gmail.com>
To: Aldy Hernandez <aldyh@redhat.com>
Cc: Andrew MacLeod <amacleod@redhat.com>, GCC Mailing List <gcc@gcc.gnu.org>
Subject: Re: replacing the backwards threader and more
Date: Sun, 13 Jun 2021 15:59:28 -0600 [thread overview]
Message-ID: <f127eac2-5d65-390a-ddea-3df1006fdd84@gmail.com> (raw)
In-Reply-To: <07775b9d-b8eb-48cb-57ef-9cc278d38967@redhat.com>
On 6/9/2021 5:48 AM, Aldy Hernandez wrote:
> Hi Jeff. Hi folks.
>
> What started as a foray into severing the old (forward) threader's
> dependency on evrp, turned into a rewrite of the backwards threader
> code. I'd like to discuss the possibility of replacing the current
> backwards threader with a new one that gets far more threads and can
> potentially subsume all threaders in the future.
>
> I won't include code here, as it will just detract from the high level
> discussion. But if it helps, I could post what I have, which just
> needs some cleanups and porting to the latest trunk changes Andrew has
> made.
>
> Currently the backwards threader works by traversing DEF chains
> through PHIs leading to possible paths that start in a constant. When
> such a path is found, it is checked to see if it is profitable, and if
> so, the constant path is threaded. The current implementation is
> rather limited since backwards paths must end in a constant. For
> example, the backwards threader can't get any of the tests in
> gcc.dg/tree-ssa/ssa-thread-14.c:
>
> if (a && b)
> foo ();
> if (!b && c)
> bar ();
>
> etc.
Right. And these kinds of cases are particularly interesting to capture
-- not only do you remove the runtime test/compare, all the setup code
usually dies as well. I can't remember who, but someone added some bits
to detect these cases in DOM a while back and while the number of
additional jumps threaded wasn't great, the overall impact was much
better than we initially realized. Instead of allowign removal of a
single compare/branch, it typically allowed removal of a chain of
logicals that fed the conditional.
>
> After my refactoring patches to the threading code, it is now possible
> to drop in an alternate implementation that shares the profitability
> code (is this path profitable?), the jump registry, and the actual
> jump threading code. I have leveraged this to write a ranger-based
> threader that gets every single thread the current code gets, plus
> 90-130% more.
Sweet.
>
> Here are the details from the branch, which should be very similar to
> trunk. I'm presenting the branch numbers because they contain
> Andrew's upcoming relational query which significantly juices up the
> results.
Yea, I'm not surprised that the relational query helps significantly
here. And I'm not surprised that we can do much better with the
backwards threader with a rewrite.
Much of the ranger design was with the idea behind using it in the
backwards jump threader in mind. Backwards threading is, IMHO, a much
better way to think about the problem. THe backwards threader also has
a much stronger region copier -- so we don't have to live with the
various limitations of the old jump threading approach.
>
> New threader:
> ethread:65043 (+3.06%)
> dom:32450 (-13.3%)
> backwards threader:72482 (+89.6%)
> vrp:40532 (-30.7%)
> Total threaded: 210507 (+6.70%)
>
> This means that the new code gets 89.6% more jump threading
> opportunities than the code I want to replace. In doing so, it
> reduces the amount of DOM threading opportunities by 13.3% and by
> 30.7% from the VRP jump threader. The total improvement across the
> jump threading opportunities in the compiler is 6.70%.
This looks good at first glance. It's worth noting that the backwards
threader runs before the others, so, yea, as it captures more stuff I
would expect DOM/VRP to capture fewer things. It would be interesting
to know the breakdown of things caught by VRP1/VRP2 and how much of that
is secondary opportunities that are only appearing because we've done a
better job earlier.
And just to be clear, I expect that we're going to leave some of those
secondary opportunities on the table -- we just don't want it to be too
many :-) When I last looked at this my sense was wiring the backwards
threader and ranger together should be enough to subsume VRP1/VRP2 jump
threading.
>
> However, these are pessimistic numbers...
>
> I have noticed that some of the threading opportunities that DOM and
> VRP now get are not because they're smarter, but because they're
> picking up opportunities that the new code exposes. I experimented
> with running an iterative threader, and then seeing what VRP and DOM
> could actually get. This is too expensive to do in real life, but it
> at least shows what the effect of the new code is on DOM/VRP's abilities:
>
> Iterative threader:
> ethread:65043 (+3.06%)
> dom:31170 (-16.7%)
> thread:86717 (+127%)
> vrp:33851 (-42.2%)
> Total threaded: 216781 (+9.90%)
>
> This means that the new code not only gets 127% more cases, but it
> reduces the DOM and VRP opportunities considerably (16.7% and 42.2%
> respectively). The end result is that we have the possibility of
> getting almost 10% more jump threading opportunities in the entire
> compilation run.
>
> (Note that the new code gets even more opportunities, but I'm only
> reporting the profitable ones that made it all the way through to the
> threader backend, and actually eliminated a branch.)
Thanks for clarifying that. It was one of the questions that first
popped into my mind.
>
> The overall compilation hit from this work is currently 1.38% as
> measured by callgrind. We should be able to reduce this a bit, plus
> we could get some of that back if we can replace the DOM and VRP
> threaders (future work).
Given how close we were to dropping the VRP threaders before, I would
support dropping them at the same time. That gives you a bit more
compile-time budget.
>
> My proposed implementation should be able to get any threading
> opportunity, and will get more as range-ops and ranger improve.
>
> I can go into the details if necessary, but the gist of it is that we
> leverage the import facility in the ranger to only look up paths that
> have a direct repercussion in the conditional being threaded, thus
> reducing the search space. This enhanced path discovery, plus an
> engine to resolve conditionals based on knowledge from a CFG path, is
> all that is needed to register new paths. There is no limit to how
> far back we look, though in practice, we stop looking once a path is
> too expensive to continue the search in a given direction.
Right. That's one of the great things about Ranger -- it can
dramatically reduce the search space.
>
> The solver API is simple:
>
> // This class is a thread path solver. Given a set of BBs indicating
> // a path through the CFG, range_in_path() will return the range
> // of an SSA as if the BBs in the path would have been executed in
> // order.
> //
> // Note that the blocks are in reverse order, thus the exit block is
> path[0].
>
> class thread_solver : gori_compute
> {
>
> public:
> thread_solver (gimple_ranger &ranger);
> virtual ~thread_solver ();
> void set_path (const vec<basic_block> *, const bitmap_head *imports);
> void range_in_path (irange &, tree name);
> void range_in_path (irange &, gimple *);
> ...
> };
>
> Basically, as we're discovering paths, we ask the solver what the
> value of the final conditional in a BB is in a given path. If it
> resolves, we register the path.
Exactly. Given a path, do we know enough to resolve the conditional at
the end. If so register the path as a potential jump threading opportunity.
>
> A follow-up project would be to analyze what DOM/VRP are actually
> getting that we don't, because in theory with an enhanced ranger, we
> should be able to get everything they do (minus some float stuff, and
> some CSE things DOM does). However, IMO, this is good enough to at
> least replace the current backwards threading code.
I bet it's going to be tougher to remove DOM's threader. It knows how
to do thinks like resolve memory references using temporary equivalences
and such. But I bet it's enough to drop the VRP based threaders.
>
> My suggestion would be to keep both implementations, defaulting to the
> ranger based, and running the old code immediately after-- trapping if
> it can find any threading opportunities. After a few weeks, we could
> kill the old code.
I can live with that :-)
> p.s. BTW, ranger-based is technically a minomer. It's gori based. We
> don't need the entire ranger caching ability here. I'm only using it
> to get the imports for the interesting conditionals, since those are
> static.
That's probably sufficient to capture the first order effects. I can
come up with theoritical cases that it'd likely miss (involving
temporary equivalences which could affect the set of interesting
conditionals), but I doubt they're important in practice.
jeff
next prev parent reply other threads:[~2021-06-13 21:59 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-09 11:48 Aldy Hernandez
2021-06-09 12:09 ` Richard Biener
2021-06-09 15:34 ` Aldy Hernandez
2021-06-09 16:18 ` Richard Biener
2021-06-09 19:47 ` Jeff Law
2021-06-09 20:39 ` Aldy Hernandez
2021-06-09 20:43 ` Jeff Law
2021-06-21 14:40 ` Aldy Hernandez
2021-06-24 16:14 ` Jeff Law
2021-06-25 7:54 ` Richard Biener
2021-06-25 7:58 ` Richard Biener
2021-06-25 16:20 ` Aldy Hernandez
2021-06-25 17:19 ` Martin Sebor
2021-06-28 6:17 ` Aldy Hernandez
2021-06-28 22:29 ` Martin Sebor
2021-06-29 0:32 ` Aldy Hernandez
2021-06-29 22:16 ` Martin Sebor
2021-06-30 6:07 ` Aldy Hernandez
2021-06-28 8:22 ` Richard Biener
2021-06-28 12:05 ` Aldy Hernandez
2021-06-13 21:59 ` Jeff Law [this message]
2021-06-14 6:40 ` Richard Biener
2021-06-15 4:03 ` Jeff Law
2021-06-15 5:39 ` Aldy Hernandez
2021-06-15 16:35 ` Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f127eac2-5d65-390a-ddea-3df1006fdd84@gmail.com \
--to=jeffreyalaw@gmail.com \
--cc=aldyh@redhat.com \
--cc=amacleod@redhat.com \
--cc=gcc@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).