Re: replacing the backwards threader and more

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jeff Law <jeffreyalaw@gmail.com>
To: Aldy Hernandez <aldyh@redhat.com>
Cc: Andrew MacLeod <amacleod@redhat.com>, GCC Mailing List <gcc@gcc.gnu.org>
Subject: Re: replacing the backwards threader and more
Date: Sun, 13 Jun 2021 15:59:28 -0600	[thread overview]
Message-ID: <f127eac2-5d65-390a-ddea-3df1006fdd84@gmail.com> (raw)
In-Reply-To: <07775b9d-b8eb-48cb-57ef-9cc278d38967@redhat.com>



On 6/9/2021 5:48 AM, Aldy Hernandez wrote:
> Hi Jeff.  Hi folks.
>
> What started as a foray into severing the old (forward) threader's 
> dependency on evrp, turned into a rewrite of the backwards threader 
> code.  I'd like to discuss the possibility of replacing the current 
> backwards threader with a new one that gets far more threads and can 
> potentially subsume all threaders in the future.
>
> I won't include code here, as it will just detract from the high level 
> discussion.  But if it helps, I could post what I have, which just 
> needs some cleanups and porting to the latest trunk changes Andrew has 
> made.
>
> Currently the backwards threader works by traversing DEF chains 
> through PHIs leading to possible paths that start in a constant. When 
> such a path is found, it is checked to see if it is profitable, and if 
> so, the constant path is threaded.  The current implementation is 
> rather limited since backwards paths must end in a constant.  For 
> example, the backwards threader can't get any of the tests in 
> gcc.dg/tree-ssa/ssa-thread-14.c:
>
>   if (a && b)
>     foo ();
>   if (!b && c)
>     bar ();
>
> etc.
Right.  And these kinds of cases are particularly interesting to capture 
-- not only do you remove the runtime test/compare, all the setup code 
usually dies as well.  I can't remember who, but someone added some bits 
to detect these cases in DOM a while back and while the number of 
additional jumps threaded wasn't great, the overall impact was much 
better than we initially realized.   Instead of allowign removal of a 
single compare/branch, it typically allowed removal of a chain of 
logicals that fed the conditional.

>
> After my refactoring patches to the threading code, it is now possible 
> to drop in an alternate implementation that shares the profitability 
> code (is this path profitable?), the jump registry, and the actual 
> jump threading code.  I have leveraged this to write a ranger-based 
> threader that gets every single thread the current code gets, plus 
> 90-130% more.
Sweet.

>
> Here are the details from the branch, which should be very similar to 
> trunk.  I'm presenting the branch numbers because they contain 
> Andrew's upcoming relational query which significantly juices up the 
> results.
Yea, I'm not surprised that the relational query helps significantly 
here.  And I'm not surprised that we can do much better with the 
backwards threader with a rewrite.

Much of the ranger design was with the idea behind using it in the 
backwards jump threader in mind.  Backwards threading is, IMHO, a much 
better way to think about the problem.  THe backwards threader also has 
a much stronger region copier -- so we don't have to live with the 
various limitations of the old jump threading approach.



>
> New threader:
>          ethread:65043    (+3.06%)
>          dom:32450      (-13.3%)
>          backwards threader:72482   (+89.6%)
>          vrp:40532      (-30.7%)
>   Total threaded:  210507 (+6.70%)
>
> This means that the new code gets 89.6% more jump threading 
> opportunities than the code I want to replace.  In doing so, it 
> reduces the amount of DOM threading opportunities by 13.3% and by 
> 30.7% from the VRP jump threader.  The total  improvement across the 
> jump threading opportunities in the compiler is 6.70%.
This looks good at first glance.  It's worth noting that the backwards 
threader runs before the others, so, yea, as it captures more stuff I 
would expect DOM/VRP to capture fewer things.    It would be interesting 
to know the breakdown of things caught by VRP1/VRP2 and how much of that 
is secondary opportunities that are only appearing because we've done a 
better job earlier.

And just to be clear, I expect that we're going to leave some of those 
secondary opportunities on the table -- we just don't want it to be too 
many :-)  When I last looked at this my sense was wiring the backwards 
threader and ranger together should be enough to subsume VRP1/VRP2 jump 
threading.

>
> However, these are pessimistic numbers...
>
> I have noticed that some of the threading opportunities that DOM and 
> VRP now get are not because they're smarter, but because they're 
> picking up opportunities that the new code exposes.  I experimented 
> with running an iterative threader, and then seeing what VRP and DOM 
> could actually get.  This is too expensive to do in real life, but it 
> at least shows what the effect of the new code is on DOM/VRP's abilities:
>
>   Iterative threader:
>     ethread:65043    (+3.06%)
>     dom:31170    (-16.7%)
>         thread:86717    (+127%)
>         vrp:33851    (-42.2%)
>   Total threaded:  216781 (+9.90%)
>
> This means that the new code not only gets 127% more cases, but it 
> reduces the DOM and VRP opportunities considerably (16.7% and 42.2% 
> respectively).   The end result is that we have the possibility of 
> getting almost 10% more jump threading opportunities in the entire 
> compilation run.
>
> (Note that the new code gets even more opportunities, but I'm only 
> reporting the profitable ones that made it all the way through to the 
> threader backend, and actually eliminated a branch.)
Thanks for clarifying that.  It was one of the questions that first 
popped into my mind.

>
> The overall compilation hit from this work is currently 1.38% as 
> measured by callgrind.  We should be able to reduce this a bit, plus 
> we could get some of that back if we can replace the DOM and VRP 
> threaders (future work).
Given how close we were to dropping the VRP threaders before, I would 
support dropping them at the same time.  That gives you a bit more 
compile-time budget.

>
> My proposed implementation should be able to get any threading 
> opportunity, and will get more as range-ops and ranger improve.
>
> I can go into the details if necessary, but the gist of it is that we 
> leverage the import facility in the ranger to only look up paths that 
> have a direct repercussion in the conditional being threaded, thus 
> reducing the search space.  This enhanced path discovery, plus an 
> engine to resolve conditionals based on knowledge from a CFG path, is 
> all that is needed to register new paths.  There is no limit to how 
> far back we look, though in practice, we stop looking once a path is 
> too expensive to continue the search in a given direction.
Right.  That's one of the great things about Ranger -- it can 
dramatically reduce the search space.

>
> The solver API is simple:
>
> // This class is a thread path solver.  Given a set of BBs indicating
> // a path through the CFG, range_in_path() will return the range
> // of an SSA as if the BBs in the path would have been executed in
> // order.
> //
> // Note that the blocks are in reverse order, thus the exit block is 
> path[0].
>
> class thread_solver : gori_compute
> {
>
> public:
>   thread_solver (gimple_ranger &ranger);
>   virtual ~thread_solver ();
>   void set_path (const vec<basic_block> *, const bitmap_head *imports);
>   void range_in_path (irange &, tree name);
>   void range_in_path (irange &, gimple *);
> ...
> };
>
> Basically, as we're discovering paths, we ask the solver what the 
> value of the final conditional in a BB is in a given path.  If it 
> resolves, we register the path.
Exactly.  Given a path, do we know enough to resolve the conditional at 
the end.  If so register the path as a potential jump threading opportunity.

>
> A follow-up project would be to analyze what DOM/VRP are actually 
> getting that we don't, because in theory with an enhanced ranger, we 
> should be able to get everything they do (minus some float stuff, and 
> some CSE things DOM does).  However, IMO, this is good enough to at 
> least replace the current backwards threading code.
I bet it's going to be tougher to remove DOM's threader.  It knows how 
to do thinks like resolve memory references using temporary equivalences 
and such.  But I bet it's enough to drop the VRP based threaders.

>
> My suggestion would be to keep both implementations, defaulting to the 
> ranger based, and running the old code immediately after-- trapping if 
> it can find any threading opportunities.  After a few weeks, we could 
> kill the old code.
I can live with that :-)

> p.s. BTW, ranger-based is technically a minomer.  It's gori based.  We 
> don't need the entire ranger caching ability here.  I'm only using it 
> to get the imports for the interesting conditionals, since those are 
> static.
That's probably sufficient to capture the first order effects.  I can 
come up with theoritical cases that it'd likely miss (involving 
temporary equivalences which could affect the set of interesting 
conditionals), but I doubt they're important in practice.

jeff

next prev parent reply	other threads:[~2021-06-13 21:59 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-09 11:48 Aldy Hernandez
2021-06-09 12:09 ` Richard Biener
2021-06-09 15:34   ` Aldy Hernandez
2021-06-09 16:18     ` Richard Biener
2021-06-09 19:47     ` Jeff Law
2021-06-09 20:39       ` Aldy Hernandez
2021-06-09 20:43         ` Jeff Law
2021-06-21 14:40   ` Aldy Hernandez
2021-06-24 16:14     ` Jeff Law
2021-06-25  7:54       ` Richard Biener
2021-06-25  7:58         ` Richard Biener
2021-06-25 16:20         ` Aldy Hernandez
2021-06-25 17:19           ` Martin Sebor
2021-06-28  6:17             ` Aldy Hernandez
2021-06-28 22:29               ` Martin Sebor
2021-06-29  0:32                 ` Aldy Hernandez
2021-06-29 22:16                   ` Martin Sebor
2021-06-30  6:07                     ` Aldy Hernandez
2021-06-28  8:22           ` Richard Biener
2021-06-28 12:05             ` Aldy Hernandez
2021-06-13 21:59 ` Jeff Law [this message]
2021-06-14  6:40   ` Richard Biener
2021-06-15  4:03     ` Jeff Law
2021-06-15  5:39       ` Aldy Hernandez
2021-06-15 16:35         ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f127eac2-5d65-390a-ddea-3df1006fdd84@gmail.com \
    --to=jeffreyalaw@gmail.com \
    --cc=aldyh@redhat.com \
    --cc=amacleod@redhat.com \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).