public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Aldy Hernandez <aldyh@redhat.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>,
	GCC patches <gcc-patches@gcc.gnu.org>,
	 "MacLeod, Andrew" <amacleod@redhat.com>
Subject: Re: [PATCH] Replace EVRP in DOM with ranger.
Date: Mon, 2 May 2022 09:17:12 +0200	[thread overview]
Message-ID: <CAGm3qMWb2yc8cVdkC=ftNunKmtto0B_xq5C-k3L5sWafUicsXg@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc1giCBE4t7NdsLZCX72yTvU3coonrmwo7pRD2QHqhCbHw@mail.gmail.com>

On Mon, May 2, 2022 at 8:30 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Apr 29, 2022 at 6:22 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Fri, Apr 29, 2022 at 12:13 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Fri, Apr 29, 2022 at 11:53 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > >
> > > > On Fri, Apr 29, 2022 at 9:09 AM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > > > > > > [Jeff, this is the same patch I sent you last week with minor tweaks
> > > > > > > to the commit message.]
> > > > > > And I just dropped it into the tester.  We should have the vast majority
> > > > > > of targets tested by this time tomorrow.
> > > > > >
> > > > > > >
> > > > > > > [Despite the verbosity of the message, this is actually a pretty
> > > > > > > straightforward patch.  It should've gone in last cycle, but there
> > > > > > > was a nagging regression I couldn't get to until after stage1
> > > > > > > had closed.]
> > > > > > You had other, non-work/gcc priorities.  No worries at missing gcc-12.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > There are 3 uses of EVRP in DOM that must be converted.
> > > > > > > Unfortunately, they need to be converted in one go, so further
> > > > > > > splitting of this patch would be problematic.
> > > > > > ACK
> > > > > >
> > > > > >
> > > > > > > There's nothing here earth shattering.  It's all pretty obvious in
> > > > > > > retrospect, but I've added a short description of each use to aid in
> > > > > > > reviewing:
> > > > > > >
> > > > > > > * Convert evrp use in cprop to ranger.
> > > > > > >
> > > > > > >    This is easy, as cprop in DOM was converted to the ranger API last
> > > > > > >    cycle, so this is just a matter of using a ranger instead of an
> > > > > > >    evrp_range_analyzer.
> > > > > > Yea.  We may have covered this before, but what does ranger do with a
> > > > > > conditional equivalence?
> > > > > >
> > > > > > ie if we have something like
> > > > > >
> > > > > > if (a == b)
> > > > > >    {
> > > > > >      blah blah
> > > > > >    }
> > > > > >
> > > > > > Within the true arm we know there's an equivalence.  *But* we don't want
> > > > > > to just blindly replace a with b or vice-versa.  The equivalence is
> > > > > > primarily useful for simplification rather than propagation.
> > > > > >
> > > > > > In fact, we every much do not want to cprop in these cases.   It rarely
> > > > > > helps in a meaningful way and there's no good way to know which is the
> > > > > > better value to use.  Canonicalization on SSA_NAMEs doesn't  really help
> > > > > > due to SSA_NAME recycling.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > * Convert evrp use in threader to ranger.
> > > > > > >
> > > > > > >    The idea here is to use the hybrid approach we used for the initial
> > > > > > >    VRP threader conversion last cycle.  The DOM threader will continue
> > > > > > >    using the forward threader infrastructure while continuing to query
> > > > > > >    DOM data structures, and only if the conditional does not relsolve,
> > > > > > >    using the ranger.  This gives us the best of both worlds, and is a
> > > > > > >    proven approach.
> > > > > > >
> > > > > > >    Furthermore, as frange and prange come live in the next cycle, we
> > > > > > >    can move away from the forward threader altogether, and just add
> > > > > > >    another backward threader.  This will not only remove the last use
> > > > > > >    of the forward threader, but will allow us to remove at least 1 or 2
> > > > > > >    threader instances.
> > > > > > Excellent.
> > > > > >
> > > > > > >
> > > > > > > * Convert conditional folding to use the method used by the ranger and
> > > > > > >    evrp.  Previously DOM was calling into the guts of
> > > > > > >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
> > > > > > >    using fold_cond() which rewrites the conditional and edges
> > > > > > >    automatically.
> > > > > > >
> > > > > > >    When legacy is removed, simplify_using_ranges will be further
> > > > > > >    cleaned up, and there will only be one entry point into simplifying
> > > > > > >    a statement.
> > > > > > Sure.
> > > > > >
> > > > > > >
> > > > > > > * DOM was setting global ranges determined from unreachable edges as a
> > > > > > >    side-effect of using the evrp engine.  We must handle these cases
> > > > > > >    before nuking evrp, and DOM seems like a good fit.  I've just moved
> > > > > > >    the snippet to DOM, but it could live anywhere else we do a DOM
> > > > > > >    walk.
> > > > > >   It was a semi-desirable to record/refine global ranges in DOM, but I'd
> > > > > > be surprised if too much code depended on that.  Presumably there's a
> > > > > > testcase in the suite which we don't handle well if we don't let DOM
> > > > > > refine/record global ranges?
> > > > > >
> > > > > > >
> > > > > > >    For the record, this is the case *vrp handled:
> > > > > > >
> > > > > > >       <bb C>:
> > > > > > >       ...
> > > > > > >       if (c_5(D) != 5)
> > > > > > >       goto <bb N>;
> > > > > > >       else
> > > > > > >       goto <bb M>;
> > > > > > >       <bb N>:
> > > > > > >       __builtin_unreachable ();
> > > > > > >       <bb M>:
> > > > > > >
> > > > > > >    If M dominates all uses of c_5, we can set the global range of c_5
> > > > > > >    to [5,5].
> > > > > > And that will only happen if M eventually loops back to the conditional,
> > > > > > right?  Otherwise all the uses wouldn't be dominated. I suspect this is
> > > > > > exceedingly rare in practice an it looks really familiar.  This is
> > > > > > coming from a testcase, right?
> > > > >
> > > > > This is how we make use of assert() when it uses __builtin_unreachable.
> > > > >
> > > > > Note the difficulty here is that we have to do this at a very specific point
> > > > > in the pass pipeline (setting the global range), since the first time we'll
> > > > > fold the if (c_5(D) != 5) it will be folded to false and the IL representation
> > > > > of the assert() is gone.
> > > > >
> > > > > So the idea was that with the IL present value-range analysis can determine
> > > > > the range omf c_5(D) on the false path but once we remove it (and we do
> > > > > want to remove it) we want to put a global range on c_5(D).
> > > > >
> > > > > That means the best of both worlds would be to detect this pattern at
> > > > > a specific point in the pass pipeline (somewhen before loop opts for sure)
> > > > > and do both - remove the IL _and_ set the global range.  The fear of
> > > > > doing this too early is of course that we lose the range by copy propagation
> > > > > or similar transforms.  The fear of doing this change at a single place only
> > > > > is that we miss it because there's a stray use before the conditional.
> > > > >
> > > > > Note DOM is also run in pass_ipa_oacc_kernels which may be too early.
> > > > >
> > > > > In the end we want to perform dead code elimination here so maybe
> > > > > DCE is the best fit to do this (but as said we want to avoid doing this too
> > > > > early).
> > > >
> > > > I agree that doing both would be the ideal scenario.  However, I would
> > > > prefer this be done in a follow-up patch to minimize the amount of
> > > > stuff being changed in one go.
> > > >
> > > > To provide additional background, this was being done in evrp which
> > > > runs at -O2, but was also being done at -O1 as a side-effect of DOM
> > > > using the evrp engine and exporting global ranges.  For -O2, I don't
> > > > know how much DOM was originally (prior to ranger) contributing to
> > > > these assert global ranges,  since evrp was probably picking them up
> > > > first.  However, since evrp now runs in ranger mode, this
> > > > functionality has silently moved to DOM (without us realizing) for all
> > > > compilation levels.
> > >
> > > I see.
> > >
> > > > Do we care about this functionality at -O1?  ISTR at least one -O1
> > > > testcase failing in RTL land if exporting global ranges isn't done in
> > > > DOM.  I don't remember if it was a global range due to a
> > > > __builtin_unreachable or otherwise.
> > >
> > > Well, at -O1 we definitely didn't set the global range (did we?) but
> > > eventually pruned paths running into __builtin_unreachable ()
> > > anyway (would have to check at which point).
> >
> > I think we always have:
> >
> > evrp_range_analyzer analyzer (/*update_global_ranges=*/true);
> >
> > ??
> >
> > DOM has been running as an incognito evrp even at -O1.  I don't know
> > if this has been by design, or as an unintended side-effect.  I mean,
> > it has been visiting all conditionals in the IL as well as calling
> > evrp_range_analyzer::record_ranges_from_stmt() on each statement it
> > tries to optimize.  And of course, updating global ranges along the
> > way.
>
> As said, it wasn't intended to be that way originally.  But yes, as we
> separated out evrp_range_analyzer and made use of it in places not
> guarded by -ftree-vrp this might have happened.
>
> We might want to reconsider (based on now long-standing facts),
> and maybe alter the -ftree-vrp documentation indicating that even
> when not enabled effecive value-range propagation and optimization
> might happen.
>
> > >
> > > >
> > > > One more thing, we have batted around the idea of running a fast evrp
> > > > even at -O1 (ranger without looking at back edges, IIRC).  This would
> > > > mean that when the DOM threader becomes disentangled from DOM (when we
> > > > have frange and prange), there's very little DOM will be doing that
> > > > for instance PRE can't get.  So perhaps, if we remove DOM, the place
> > > > to put this optimization is in the fast evrp, and fold the conditional
> > > > as has been suggested?  Anyways... I'm digressing.
> > >
> > > The idea is to get rid of DOM once its threading is separated out.
> >
> > Excellent.  There's no reason why we couldn't separate threading out
> > in this release.
> >
> > >
> > > I think doing O (n * log n) value range analysis at -O1 would be welcome.
> > > We didn't get around doing that for EVRP which I think had such
> > > bound - the iterating VRP definitely didn't.  I'm not sure ranger really
> > > qualifies given gori and all it relies on.  The advantage of the simple
> > > whole-IL walk way was to reasonably easily guarantee such bound.
> > > That's unfortunately lost now :/
> >
> > Andrew was mumbling something about a fast ranger mode for this
> > release that should be on par with legacy evrp.  IIRC it would be
> > purely DOM based, won't visit back edges, and there's no caching.  But
> > he'll have to expand on it when he returns from vacation.  I don't
> > know the details.
>
> I would guess the stmt analysis building blocks (whatever API part of
> ranger that is) can be used to produce something like that.  But in the
> end it would be the old EVRP pass with the VRP stmt analysis it
> re-used replaced with the appropriate ranger parts.
>
> But yes, I'd welcome that.  I'd also like to revisit integration of some
> of this with value-numbering which doesn't do a DOM walk but instead
> a RPO walk.
>
> > >
> > > > Doing this optimization in DOM at least keeps with GCC 12, but I'm
> > > > happy to move it to whatever is recommended.
> > >
> > > As said the intent of this trick is to preserve range information from
> > > asserts even when we removed the IL carrying it.  When we care
> > > about range info, which means when some VRP pass runs, which
> > > is why it was done in such a pass.  That is, we probably want to
> > > avoid altering the global range for -fno-tree-vrp.
> >
> > Not just -fno-tree-vrp then.  There are other range consumers.  For
> > example, CCP, threading, etc.  Threading will look at global ranges
> > even in dumb mode, and CCP uses global ranges in a round about way--
> > by looking at global nonzero bits which the evrp code diligently sets:
> >
> >         {
> >           set_ssa_range_info (vrs[i].first, vrs[i].second);
> >           maybe_set_nonzero_bits (pred_e, vrs[i].first);
> >         }
> >
> > For example, in this particular testcase, DOM will set the global
> > range, and CCP will remove at least one of the conditionals by looking
> > at nonzero bits.
>
> Yes, as said we might want to reconsider and adjust documentation.
> In theory we could gate the global range/nonzero setters on
> flag_tree_vrp so they become no-ops when not enabled
> (there is also flag_tree_bit_ccp which is documented to track
> alignment, it fails to mention non-zero bits on integers).

Well, we could just not update global ranges for !flag_tree_vrp.  In
the old world:

evrp_range_analyzer (/*update_global_ranges=*/ flag_tree_vrp)

and in the new world:

if (flag_tree_vrp)
  ranger->export_global_ranges().

This should fit in nicely with my plan to integrate nonzero bits into
the irange, because we won't have to worry about them independently

>
> But let's say the plan is to more conciously enable some range
> propagation at -O1.

Sounds like a plan.

Aldy


  reply	other threads:[~2022-05-02  7:17 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-28 16:30 Aldy Hernandez
2022-04-28 18:43 ` Jeff Law
2022-04-29  7:09   ` Richard Biener
2022-04-29  9:52     ` Aldy Hernandez
2022-04-29 10:13       ` Richard Biener
2022-04-29 16:22         ` Aldy Hernandez
2022-05-02  6:30           ` Richard Biener
2022-05-02  7:17             ` Aldy Hernandez [this message]
2022-05-09 12:42             ` Andrew MacLeod
2022-05-09 12:55               ` Richard Biener
2022-04-29 11:47   ` Aldy Hernandez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGm3qMWb2yc8cVdkC=ftNunKmtto0B_xq5C-k3L5sWafUicsXg@mail.gmail.com' \
    --to=aldyh@redhat.com \
    --cc=amacleod@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).