From: Aldy Hernandez <aldyh@redhat.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>, GCC Mailing List <gcc@gcc.gnu.org>
Subject: Re: replacing the backwards threader and more
Date: Wed, 9 Jun 2021 17:34:03 +0200
Message-ID: <e24871cd-5455-ca71-02cf-26d54066f7bb@redhat.com>
In-Reply-To: <CAFiYyc2OgPLMyvoGk=XQY0xJvLDG-dPCV+6dQj-WuRVDLocuKA@mail.gmail.com>



On 6/9/21 2:09 PM, Richard Biener wrote:
> On Wed, Jun 9, 2021 at 1:50 PM Aldy Hernandez via Gcc <gcc@gcc.gnu.org> wrote:
>>
>> Hi Jeff.  Hi folks.
>>
>> What started as a foray into severing the old (forward) threader's
>> dependency on evrp, turned into a rewrite of the backwards threader
>> code.  I'd like to discuss the possibility of replacing the current
>> backwards threader with a new one that gets far more threads and can
>> potentially subsume all threaders in the future.
>>
>> I won't include code here, as it will just detract from the high level
>> discussion.  But if it helps, I could post what I have, which just needs
>> some cleanups and porting to the latest trunk changes Andrew has made.
>>
>> Currently the backwards threader works by traversing DEF chains through
>> PHIs leading to possible paths that start in a constant.  When such a
>> path is found, it is checked to see if it is profitable, and if so, the
>> constant path is threaded.  The current implementation is rather limited
>> since backwards paths must end in a constant.  For example, the
>> backwards threader can't get any of the tests in
>> gcc.dg/tree-ssa/ssa-thread-14.c:
>>
>>     if (a && b)
>>       foo ();
>>     if (!b && c)
>>       bar ();
>>
>> etc.
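>>
>> (For contrast, here is a rough sketch of the kind of code the current
>> implementation does handle; example, other_work, and foo are made-up
>> names for illustration.  The controlling variable is fed by constants
>> through a PHI, so walking the DEF chain backwards bottoms out in
>> constants and each incoming path resolves the branch:)
>>
>>     void
>>     example (int x)
>>     {
>>       int flag;
>>       if (x > 10)
>>         flag = 1;   /* constant feeding the PHI for flag */
>>       else
>>         flag = 0;   /* constant feeding the PHI for flag */
>>       other_work ();
>>       /* Walking backwards from this use of flag reaches a PHI whose
>>          arguments are all constants, so the branch is known on each
>>          incoming path and both paths can be threaded.  */
>>       if (flag)
>>         foo ();
>>     }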
>>
>> After my refactoring patches to the threading code, it is now possible
>> to drop in an alternate implementation that shares the profitability
>> code (is this path profitable?), the jump registry, and the actual jump
>> threading code.  I have leveraged this to write a ranger-based threader
>> that gets every single thread the current code gets, plus 90-130% more.
>>
>> Here are the details from the branch, which should be very similar to
>> trunk.  I'm presenting the branch numbers because they contain Andrew's
>> upcoming relational query which significantly juices up the results.
>>
>> New threader:
>>     ethread:            65043   (+3.06%)
>>     dom:                32450   (-13.3%)
>>     backwards threader: 72482   (+89.6%)
>>     vrp:                40532   (-30.7%)
>>     Total threaded:    210507   (+6.70%)
>>
>> This means that the new code gets 89.6% more jump threading
>> opportunities than the code I want to replace.  In doing so, it reduces
>> the number of DOM threading opportunities by 13.3% and the VRP jump
>> threader's opportunities by 30.7%.  The total improvement across the
>> jump threading opportunities in the compiler is 6.70%.
>>
>> However, these are pessimistic numbers...
>>
>> I have noticed that some of the threading opportunities that DOM and VRP
>> now get are not because they're smarter, but because they're picking up
>> opportunities that the new code exposes.  I experimented with running an
>> iterative threader, and then seeing what VRP and DOM could actually get.
>> This is too expensive to do in real life, but it at least shows what
>> the effect of the new code is on DOM/VRP's abilities:
>>
>> Iterative threader:
>>     ethread:            65043   (+3.06%)
>>     dom:                31170   (-16.7%)
>>     thread:             86717   (+127%)
>>     vrp:                33851   (-42.2%)
>>     Total threaded:    216781   (+9.90%)
>>
>> This means that the new code not only gets 127% more cases, but it
>> reduces the DOM and VRP opportunities considerably (16.7% and 42.2%
>> respectively).  The end result is that we have the possibility of
>> getting almost 10% more jump threading opportunities in the entire
>> compilation run.
> 
> Yeah, DOM once was iterating ...
> 
> You probably have noticed that we have very many (way too many)
> 'thread' passes, often in close succession with each other or
> DOM or VRP.  So in the above numbers I wonder if you can break
> down the numbers individually for the actual passes (in their order)?

Sure, I can do that.  Let me whip up the old branch and gather some info.

> 
>> (Note that the new code gets even more opportunities, but I'm only
>> reporting the profitable ones that made it all the way through to the
>> threader backend, and actually eliminated a branch.)
>>
>> The overall compilation hit from this work is currently 1.38% as
>> measured by callgrind.  We should be able to reduce this a bit, plus we
>> could get some of that back if we can replace the DOM and VRP threaders
>> (future work).
>>
>> My proposed implementation should be able to get any threading
>> opportunity, and will get more as range-ops and ranger improve.
>>
>> I can go into the details if necessary, but the gist of it is that we
>> leverage the import facility in the ranger to only look up paths that
>> have a direct repercussion in the conditional being threaded, thus
>> reducing the search space.  This enhanced path discovery, plus an engine
>> to resolve conditionals based on knowledge from a CFG path, is all that
>> is needed to register new paths.  There is no limit to how far back we
>> look, though in practice, we stop looking once a path is too expensive
>> to continue the search in a given direction.
>>
>> The solver API is simple:
>>
>> // This class is a thread path solver.  Given a set of BBs indicating
>> // a path through the CFG, range_in_path() will return the range
>> // of an SSA as if the BBs in the path had been executed in
>> // order.
>> //
>> // Note that the blocks are in reverse order, thus the exit block is
>> // path[0].
>>
>> class thread_solver : gori_compute
>> {
>>
>> public:
>>     thread_solver (gimple_ranger &ranger);
>>     virtual ~thread_solver ();
>>     void set_path (const vec<basic_block> *, const bitmap_head *imports);
>>     void range_in_path (irange &, tree name);
>>     void range_in_path (irange &, gimple *);
>> ...
>> };
>>
>> Basically, as we're discovering paths, we ask the solver what the value
>> of the final conditional in a BB is in a given path.  If it resolves, we
>> register the path.
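>>
>> Very roughly, and with a made-up helper name standing in for the
>> actual registry call, resolving one candidate path looks something
>> like this sketch:
>>
>>   bool
>>   maybe_register_path (thread_solver &solver,
>>                        const vec<basic_block> *path,
>>                        const bitmap_head *imports,
>>                        gimple *final_cond)
>>   {
>>     int_range_max r;
>>     // Tell the solver which path we are asking about (path[0] is the
>>     // exit block) and which SSA names are relevant to the conditional.
>>     solver.set_path (path, imports);
>>     // Ask for the range of the final conditional as if the blocks in
>>     // the path had been executed in order.
>>     solver.range_in_path (r, final_cond);
>>     // If the conditional folds to a single value on this path, hand
>>     // the path to the shared registry/profitability code.
>>     if (r.singleton_p ())
>>       {
>>         register_candidate_path (path);  // stand-in for the real call
>>         return true;
>>       }
>>     return false;
>>   }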
>>
>> A follow-up project would be to analyze what DOM/VRP are actually
>> getting that we don't, because in theory with an enhanced ranger, we
>> should be able to get everything they do (minus some float stuff, and
>> some CSE things DOM does).  However, IMO, this is good enough to at
>> least replace the current backwards threading code.
>>
>> My suggestion would be to keep both implementations, defaulting to the
>> ranger-based one, and running the old code immediately after, trapping
>> if it can find any threading opportunities.
> 
> But due to iteration uncovering new opportunities this will inevitably
> break, no?

No, actually, because I'm comparing the current and new backwards
threader behavior on the same IL, which is something I can't do with VRP
because its IL is slightly different (VRP asserts).

So I can compare apples to apples in the backwards threader code.  What 
I do is run the new code first, and then run the old code on the same IL 
before the threader has altered the CFG, while asserting that the old 
code cannot register any "new" paths not already present in the registry.

I have tested this assertion throughout 200+ .ii files from a bootstrap, 
and there has never been a case where the old code can get something we 
can't.
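
Roughly, the checking mode looks like the following sketch; the helper
names are made up, only the overall shape matches what I'm actually
running against the shared registry:

  void
  verify_old_threader_is_subsumed (function *fun)
  {
    // Run the new ranger-based discovery on the unmodified IL and
    // record everything it registers.
    path_set new_paths = run_new_backwards_threader (fun);

    // Run the old backwards threader on the *same* IL, before any CFG
    // updates, recording what it would have registered.
    path_set old_paths = run_old_backwards_threader (fun);

    // Trap if the old code found a path the new code did not.
    for (const auto &p : old_paths)
      gcc_assert (new_paths.contains (p));
  }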

> 
>> After a few weeks, we could
>> kill the old code.
> 
> Note that for analyzing the threadings done, apart from looking at
> overall numbers, the statistics infrastructure can be useful, as could
> be the opt-info one, where you can diff stats based on file or function
> (statistics) or even the location of a participating jump (opt-info).
> 
> If you are re-using tree-ssa-thread{edge,update}.{c,h} anyway you
> probably only have to amend one or two places.  I'm personally
> breaking things down to file/function via statistics to spot gross
> differences more locally.

I am indeed re-using all of tree-ssa-thread{edge,update}.  That is 
unchanged.  I've been using the overall statistics plus an awk script to 
collate it all.  But thanks for the opt-info tip.  I didn't know about that.

> 
> IMHO removing threading from VRP (as a step to make "VRP"
> into another EVRP run) should be part of the initial transition;
> there's always a thread pass nearby.  Performing threading from
> EVRP itself might be another option to evaluate.  Trimming down
> the number of (now backwards-)threaders would be another goal.

Agreed.  I will have to sit down and investigate what VRP is getting.
Last I peeked, it was either things range-ops could be taught to handle,
or unconditional jumps to unconditional jumps that could be handled
easily.  However, I think we can prove that the current backwards
threader code cannot get anything in the presence of the new code, so it
could easily be replaced before concentrating on DOM/VRP.

That being said, figuring out exactly the discrepancies with DOM/VRP is 
on my short-term radar ;-).

Aldy


Thread overview: 25+ messages
2021-06-09 11:48 Aldy Hernandez
2021-06-09 12:09 ` Richard Biener
2021-06-09 15:34   ` Aldy Hernandez [this message]
2021-06-09 16:18     ` Richard Biener
2021-06-09 19:47     ` Jeff Law
2021-06-09 20:39       ` Aldy Hernandez
2021-06-09 20:43         ` Jeff Law
2021-06-21 14:40   ` Aldy Hernandez
2021-06-24 16:14     ` Jeff Law
2021-06-25  7:54       ` Richard Biener
2021-06-25  7:58         ` Richard Biener
2021-06-25 16:20         ` Aldy Hernandez
2021-06-25 17:19           ` Martin Sebor
2021-06-28  6:17             ` Aldy Hernandez
2021-06-28 22:29               ` Martin Sebor
2021-06-29  0:32                 ` Aldy Hernandez
2021-06-29 22:16                   ` Martin Sebor
2021-06-30  6:07                     ` Aldy Hernandez
2021-06-28  8:22           ` Richard Biener
2021-06-28 12:05             ` Aldy Hernandez
2021-06-13 21:59 ` Jeff Law
2021-06-14  6:40   ` Richard Biener
2021-06-15  4:03     ` Jeff Law
2021-06-15  5:39       ` Aldy Hernandez
2021-06-15 16:35         ` Jeff Law
