Re: [PATCH] Refactor back_threader_profitability

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Aldy Hernandez <aldyh@redhat.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Refactor back_threader_profitability
Date: Wed, 17 Aug 2022 10:04:00 +0200	[thread overview]
Message-ID: <CAGm3qMUmAbAd5aGNERR_BX6Vi9MrrhK95qRg3tD6dXTKVKumBQ@mail.gmail.com> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2208170746280.13569@jbgna.fhfr.qr>

On Wed, Aug 17, 2022 at 9:54 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Wed, 17 Aug 2022, Aldy Hernandez wrote:
>
> > I just have a few high level comments.
> >
> > On Tue, Aug 16, 2022 at 4:05 PM Richard Biener <rguenther@suse.de> wrote:
> > >
> > > The following refactors profitable_path_p in the backward threader,
> > > splitting out parts that can be computed once the exit block is known,
> > > parts that contiguously update and that can be checked allowing
> > > for the path to be later identified as FSM with larger limits,
> > > possibly_profitable_path_p, and final checks done when the whole
> > > path is known, profitable_path_p.
> >
> > I thought we were removing references to FSM, as they were leftovers
> > from some previous incarnation.  For that matter, I don't think I ever
> > understood what they are, so if we're gonna keep them, could you
> > comment what makes FSM threads different from other threads?
>
> I don't know exactly what 'FSM' stands for but the FSM threader
> specifically tried to cover

It probably stands for finite state machine then?

>
> for (;;)
>  {
>    switch (state)
>     {
>     case 1:
>       /* sth */
>       state = 3;
>       break;
>     ...
>     case 3:
>       ...
>     }
> }
>
> so optimizing state machine transitions.  That is, these are all
> multiway (switch or goto, but goto support has been removed with the
> ranger rewrite it seems) and the thread path going through the

ISTR going through the computed goto path in the old code, and it
never got triggered.  I just didn't get around to removing the
references to GIMPLE_GOTO.  At least, the old threader not once got a
thread we didn't get with the new code, even through a full Fedora
build.  I could be wrong though, but that's my recollection.

> loop backedge.  This is why FSM has different size limits because
> it was thought those threadings are especially worthwhile and
> they tend to be larger.  To make discovery cheaper a TODO item
> would be to skip to the loop backedge when we reach the regular
> thread limit and only continue the real search there (and
> pick the "smallest" ways through any diamonds on the way).

Ah, thanks.  If you could, add some similar blurb, it'd be great.  I'm
sure we'll all forget it at some time (well, I will :)).

Aldy

>
> > In your possibly_profitable_path_p function, could you document a bit
> > better what's the difference between profitable_path_p and
> > possibly_profitable_path_p?
>
> Sure.
>
> > >
> > > I've removed the back_threader_profitability instance from the
> > > back_threader class and instead instantiate it once per path
> > > discovery.  I've kept the size compute non-incremental to simplify
> > > the patch and not worry about unwinding.
> > >
> > > There's key changes to previous behavior - namely we apply
> > > the param_max_jump_thread_duplication_stmts early only when
> > > we know the path cannot become an FSM one (multiway + thread through
> > > latch) but make sure to elide the path query when we we didn't
> > > yet discover that but are over this limit.  Similarly the
> > > speed limit is now used even when we did not yet discover a
> > > hot BB on the path.  Basically the idea is to only stop path
> > > discovery when we know the path will never become profitable
> > > but avoid the expensive path range query when we know it's
> > > currently not.
> > >
> > > I've done a few cleanups, merging functions, on the way.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > Statistics show an overall slight increase in threading but
> > > looking at different files theres noise up and down.  That's
> > > somewhat expected since we now are applying the "more correct"
> > > limits in the end.  Unless I made big mistakes of course.
> > >
> > > The next thing cost-wise would be to raise the backwards
> > > threading limit to the limit of DOM so we don't get
> > > artificial high counts for that.
> >
> > The DOM threader has limits?  I thought most of those limits were just
> > due to the fact that it couldn't determine long enough paths?  Either
> > way, I like that we're merging the necessary forward threader bits
> > here, in preparation for its demise ;-).
>
> Both use param_max_jump_thread_duplication_stmts, but the backwards
> threader applies this limit to non-FSM threads with
>
>   /* The generic copier used by the backthreader does not re-use an
>      existing threading path to reduce code duplication.  So for that
>      case, drastically reduce the number of statements we are allowed
>      to copy.  */
>   if (!(threaded_through_latch && threaded_multiway_branch)
>       && (n_insns * param_fsm_scale_path_stmts
>           >= param_max_jump_thread_duplication_stmts))
>
> and param_fsm_scale_path_stmts happens to be two.  I have no idea
> why we apply this scaling, the scaling is otherwise used in
>
>   /* We avoid creating irreducible inner loops unless we thread through
>      a multiway branch, in which case we have deemed it worth losing
>      other loop optimizations later.
>
>      We also consider it worth creating an irreducible inner loop if
>      the number of copied statement is low relative to the length of
>      the path -- in that case there's little the traditional loop
>      optimizer would have done anyway, so an irreducible loop is not
>      so bad.  */
>   if (!threaded_multiway_branch
>       && creates_irreducible_loop
>       && *creates_irreducible_loop
>       && (n_insns * (unsigned) param_fsm_scale_path_stmts
>           > (m_path.length () *
>              (unsigned) param_fsm_scale_path_blocks)))
>
> so my idea is to drop the scaling from applying the
> param_max_jump_thread_duplication_stmts limit which raises the
> effective default limit from 15 / 2 to 15, just like what DOM
> applies (where DOM also estimates some optimization on the path,
> reducing the stmt count).
>
> > Looks good.
>
> Thanks - I'll adjust the commentary and push.
>
> Richard.
>

next prev parent reply	other threads:[~2022-08-17  8:04 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <81970.122081610045900133@us-mta-26.us.mimecast.lan>
2022-08-17  7:31 ` Aldy Hernandez
2022-08-17  7:53   ` Richard Biener
2022-08-17  8:04     ` Aldy Hernandez [this message]
2022-08-17  8:38       ` Richard Biener
2022-08-17  8:46         ` Aldy Hernandez
2022-08-17  8:59           ` Richard Biener
2022-08-17  9:27             ` Aldy Hernandez
2022-08-19 16:02   ` Jeff Law
2022-08-16 14:04 Richard Biener
  -- strict thread matches above, loose matches on Subject: below --
2022-08-16 14:04 Richard Biener
2022-08-16 14:04 Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGm3qMUmAbAd5aGNERR_BX6Vi9MrrhK95qRg3tD6dXTKVKumBQ@mail.gmail.com \
    --to=aldyh@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).