public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Aldy Hernandez <aldyh@redhat.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>,
	GCC patches <gcc-patches@gcc.gnu.org>,
	 "MacLeod, Andrew" <amacleod@redhat.com>
Subject: Re: [PATCH] Replace EVRP in DOM with ranger.
Date: Mon, 2 May 2022 08:30:42 +0200	[thread overview]
Message-ID: <CAFiYyc1giCBE4t7NdsLZCX72yTvU3coonrmwo7pRD2QHqhCbHw@mail.gmail.com> (raw)
In-Reply-To: <CAGm3qMX+-MQO0MXaN5_FzStW7mf73X9T80Wo4WdO2=Dhf1kCrg@mail.gmail.com>

On Fri, Apr 29, 2022 at 6:22 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Fri, Apr 29, 2022 at 12:13 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Fri, Apr 29, 2022 at 11:53 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > >
> > > On Fri, Apr 29, 2022 at 9:09 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > > > > > [Jeff, this is the same patch I sent you last week with minor tweaks
> > > > > > to the commit message.]
> > > > > And I just dropped it into the tester.  We should have the vast majority
> > > > > of targets tested by this time tomorrow.
> > > > >
> > > > > >
> > > > > > [Despite the verbosity of the message, this is actually a pretty
> > > > > > straightforward patch.  It should've gone in last cycle, but there
> > > > > > was a nagging regression I couldn't get to until after stage1
> > > > > > had closed.]
> > > > > You had other, non-work/gcc priorities.  No worries at missing gcc-12.
> > > > >
> > > > >
> > > > > >
> > > > > > There are 3 uses of EVRP in DOM that must be converted.
> > > > > > Unfortunately, they need to be converted in one go, so further
> > > > > > splitting of this patch would be problematic.
> > > > > ACK
> > > > >
> > > > >
> > > > > > There's nothing here earth shattering.  It's all pretty obvious in
> > > > > > retrospect, but I've added a short description of each use to aid in
> > > > > > reviewing:
> > > > > >
> > > > > > * Convert evrp use in cprop to ranger.
> > > > > >
> > > > > >    This is easy, as cprop in DOM was converted to the ranger API last
> > > > > >    cycle, so this is just a matter of using a ranger instead of an
> > > > > >    evrp_range_analyzer.
> > > > > Yea.  We may have covered this before, but what does ranger do with a
> > > > > conditional equivalence?
> > > > >
> > > > > ie if we have something like
> > > > >
> > > > > if (a == b)
> > > > >    {
> > > > >      blah blah
> > > > >    }
> > > > >
> > > > > Within the true arm we know there's an equivalence.  *But* we don't want
> > > > > to just blindly replace a with b or vice-versa.  The equivalence is
> > > > > primarily useful for simplification rather than propagation.
> > > > >
> > > > > In fact, we every much do not want to cprop in these cases.   It rarely
> > > > > helps in a meaningful way and there's no good way to know which is the
> > > > > better value to use.  Canonicalization on SSA_NAMEs doesn't  really help
> > > > > due to SSA_NAME recycling.
> > > > >
> > > > >
> > > > > >
> > > > > > * Convert evrp use in threader to ranger.
> > > > > >
> > > > > >    The idea here is to use the hybrid approach we used for the initial
> > > > > >    VRP threader conversion last cycle.  The DOM threader will continue
> > > > > >    using the forward threader infrastructure while continuing to query
> > > > > >    DOM data structures, and only if the conditional does not relsolve,
> > > > > >    using the ranger.  This gives us the best of both worlds, and is a
> > > > > >    proven approach.
> > > > > >
> > > > > >    Furthermore, as frange and prange come live in the next cycle, we
> > > > > >    can move away from the forward threader altogether, and just add
> > > > > >    another backward threader.  This will not only remove the last use
> > > > > >    of the forward threader, but will allow us to remove at least 1 or 2
> > > > > >    threader instances.
> > > > > Excellent.
> > > > >
> > > > > >
> > > > > > * Convert conditional folding to use the method used by the ranger and
> > > > > >    evrp.  Previously DOM was calling into the guts of
> > > > > >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way now is
> > > > > >    using fold_cond() which rewrites the conditional and edges
> > > > > >    automatically.
> > > > > >
> > > > > >    When legacy is removed, simplify_using_ranges will be further
> > > > > >    cleaned up, and there will only be one entry point into simplifying
> > > > > >    a statement.
> > > > > Sure.
> > > > >
> > > > > >
> > > > > > * DOM was setting global ranges determined from unreachable edges as a
> > > > > >    side-effect of using the evrp engine.  We must handle these cases
> > > > > >    before nuking evrp, and DOM seems like a good fit.  I've just moved
> > > > > >    the snippet to DOM, but it could live anywhere else we do a DOM
> > > > > >    walk.
> > > > >   It was a semi-desirable to record/refine global ranges in DOM, but I'd
> > > > > be surprised if too much code depended on that.  Presumably there's a
> > > > > testcase in the suite which we don't handle well if we don't let DOM
> > > > > refine/record global ranges?
> > > > >
> > > > > >
> > > > > >    For the record, this is the case *vrp handled:
> > > > > >
> > > > > >       <bb C>:
> > > > > >       ...
> > > > > >       if (c_5(D) != 5)
> > > > > >       goto <bb N>;
> > > > > >       else
> > > > > >       goto <bb M>;
> > > > > >       <bb N>:
> > > > > >       __builtin_unreachable ();
> > > > > >       <bb M>:
> > > > > >
> > > > > >    If M dominates all uses of c_5, we can set the global range of c_5
> > > > > >    to [5,5].
> > > > > And that will only happen if M eventually loops back to the conditional,
> > > > > right?  Otherwise all the uses wouldn't be dominated. I suspect this is
> > > > > exceedingly rare in practice an it looks really familiar.  This is
> > > > > coming from a testcase, right?
> > > >
> > > > This is how we make use of assert() when it uses __builtin_unreachable.
> > > >
> > > > Note the difficulty here is that we have to do this at a very specific point
> > > > in the pass pipeline (setting the global range), since the first time we'll
> > > > fold the if (c_5(D) != 5) it will be folded to false and the IL representation
> > > > of the assert() is gone.
> > > >
> > > > So the idea was that with the IL present value-range analysis can determine
> > > > the range omf c_5(D) on the false path but once we remove it (and we do
> > > > want to remove it) we want to put a global range on c_5(D).
> > > >
> > > > That means the best of both worlds would be to detect this pattern at
> > > > a specific point in the pass pipeline (somewhen before loop opts for sure)
> > > > and do both - remove the IL _and_ set the global range.  The fear of
> > > > doing this too early is of course that we lose the range by copy propagation
> > > > or similar transforms.  The fear of doing this change at a single place only
> > > > is that we miss it because there's a stray use before the conditional.
> > > >
> > > > Note DOM is also run in pass_ipa_oacc_kernels which may be too early.
> > > >
> > > > In the end we want to perform dead code elimination here so maybe
> > > > DCE is the best fit to do this (but as said we want to avoid doing this too
> > > > early).
> > >
> > > I agree that doing both would be the ideal scenario.  However, I would
> > > prefer this be done in a follow-up patch to minimize the amount of
> > > stuff being changed in one go.
> > >
> > > To provide additional background, this was being done in evrp which
> > > runs at -O2, but was also being done at -O1 as a side-effect of DOM
> > > using the evrp engine and exporting global ranges.  For -O2, I don't
> > > know how much DOM was originally (prior to ranger) contributing to
> > > these assert global ranges,  since evrp was probably picking them up
> > > first.  However, since evrp now runs in ranger mode, this
> > > functionality has silently moved to DOM (without us realizing) for all
> > > compilation levels.
> >
> > I see.
> >
> > > Do we care about this functionality at -O1?  ISTR at least one -O1
> > > testcase failing in RTL land if exporting global ranges isn't done in
> > > DOM.  I don't remember if it was a global range due to a
> > > __builtin_unreachable or otherwise.
> >
> > Well, at -O1 we definitely didn't set the global range (did we?) but
> > eventually pruned paths running into __builtin_unreachable ()
> > anyway (would have to check at which point).
>
> I think we always have:
>
> evrp_range_analyzer analyzer (/*update_global_ranges=*/true);
>
> ??
>
> DOM has been running as an incognito evrp even at -O1.  I don't know
> if this has been by design, or as an unintended side-effect.  I mean,
> it has been visiting all conditionals in the IL as well as calling
> evrp_range_analyzer::record_ranges_from_stmt() on each statement it
> tries to optimize.  And of course, updating global ranges along the
> way.

As said, it wasn't intended to be that way originally.  But yes, as we
separated out evrp_range_analyzer and made use of it in places not
guarded by -ftree-vrp this might have happened.

We might want to reconsider (based on now long-standing facts),
and maybe alter the -ftree-vrp documentation indicating that even
when not enabled effecive value-range propagation and optimization
might happen.

> >
> > >
> > > One more thing, we have batted around the idea of running a fast evrp
> > > even at -O1 (ranger without looking at back edges, IIRC).  This would
> > > mean that when the DOM threader becomes disentangled from DOM (when we
> > > have frange and prange), there's very little DOM will be doing that
> > > for instance PRE can't get.  So perhaps, if we remove DOM, the place
> > > to put this optimization is in the fast evrp, and fold the conditional
> > > as has been suggested?  Anyways... I'm digressing.
> >
> > The idea is to get rid of DOM once its threading is separated out.
>
> Excellent.  There's no reason why we couldn't separate threading out
> in this release.
>
> >
> > I think doing O (n * log n) value range analysis at -O1 would be welcome.
> > We didn't get around doing that for EVRP which I think had such
> > bound - the iterating VRP definitely didn't.  I'm not sure ranger really
> > qualifies given gori and all it relies on.  The advantage of the simple
> > whole-IL walk way was to reasonably easily guarantee such bound.
> > That's unfortunately lost now :/
>
> Andrew was mumbling something about a fast ranger mode for this
> release that should be on par with legacy evrp.  IIRC it would be
> purely DOM based, won't visit back edges, and there's no caching.  But
> he'll have to expand on it when he returns from vacation.  I don't
> know the details.

I would guess the stmt analysis building blocks (whatever API part of
ranger that is) can be used to produce something like that.  But in the
end it would be the old EVRP pass with the VRP stmt analysis it
re-used replaced with the appropriate ranger parts.

But yes, I'd welcome that.  I'd also like to revisit integration of some
of this with value-numbering which doesn't do a DOM walk but instead
a RPO walk.

> >
> > > Doing this optimization in DOM at least keeps with GCC 12, but I'm
> > > happy to move it to whatever is recommended.
> >
> > As said the intent of this trick is to preserve range information from
> > asserts even when we removed the IL carrying it.  When we care
> > about range info, which means when some VRP pass runs, which
> > is why it was done in such a pass.  That is, we probably want to
> > avoid altering the global range for -fno-tree-vrp.
>
> Not just -fno-tree-vrp then.  There are other range consumers.  For
> example, CCP, threading, etc.  Threading will look at global ranges
> even in dumb mode, and CCP uses global ranges in a round about way--
> by looking at global nonzero bits which the evrp code diligently sets:
>
>         {
>           set_ssa_range_info (vrs[i].first, vrs[i].second);
>           maybe_set_nonzero_bits (pred_e, vrs[i].first);
>         }
>
> For example, in this particular testcase, DOM will set the global
> range, and CCP will remove at least one of the conditionals by looking
> at nonzero bits.

Yes, as said we might want to reconsider and adjust documentation.
In theory we could gate the global range/nonzero setters on
flag_tree_vrp so they become no-ops when not enabled
(there is also flag_tree_bit_ccp which is documented to track
alignment, it fails to mention non-zero bits on integers).

But let's say the plan is to more conciously enable some range
propagation at -O1.

> > > The test case is gcc.dg/builtin-unreachable-6a.c by the way.
> >
> > But 6.c checks the same but even w/ DOM disabled...
>
> That's what I thought, but they actually test the opposite:
>
> /* { dg-final { scan-tree-dump-times "__builtin_unreachable" 1 "fab1" } } */
> /* { dg-final { scan-tree-dump-not "__builtin_unreachable" "fab1" } } */
>
> The non-DOM, non-VRP, version checks that the unreachable is still
> there.  Whereas the plain -O2 version checks that we've cleaned it up.

Oh ... I guess we should ensure that we either have the global range set
or the unreachable is still there.

> > > >
> > > > > This is the only part of the patch that makes me go "ewwww", so I would
> > > > > like to at least explore if we can rethink the value of that test.
> > > > >
> > > > > > There is a a regression on Wstringop-overflow-4.C which I'm planning
> > > > > > on XFAILing.  It's another variant of the usual middle-end false
> > > > > > positives: having no ranges produces no warnings, but slightly refined
> > > > > > ranges, or worse-- isolating specific problematic cases in the
> > > > > > threader causes flare-ups.
> > > > > ACK.
> > > > >
> > > > >
> > > > > >
> > > > > > As an aside, as Richi has suggested, I think we should discuss
> > > > > > restricting the threader's ability to thread highly unlikely paths.
> > > > > > These cause no end of pain for middle-end warnings.  However,
> > > > > > I don't know if this would conflict with path isolation for
> > > > > > things like null dereferencing.  ISTR you were interested in this.
> > > > > I've never though too  much about it.  You're thinking of Richi :-)
> > > > >
> > > > > It's a balance.  I bet if we stop threading those paths we're going to
> > > > > see other issues start popping up like uninitialized warnings.
> > > > >
> > > > > It may be the case that it's really the constant propagation on an
> > > > > isolated path that's more problematical for those warnings.  But I would
> > > > > probably contend that what isolation is doing is showing how certain
> > > > > constant values can flow into places where they're not expected and
> > > > > could cause problems.  So I wouldn't necessary suggest throttling it.
> > > > >
> > > > > Happy to be proven wrong here!
> > > >
> > > > I think we need to revisit the whole costing of threading which is of course
> > > > best done when we only have the backwards threader left.  I also always
> > > > wanted to play with some kind of optimistic code generation & rollback
> > > > on things like threading and unrolling, basically value-number the
> > > > duplicated stmts on-the-fly and make it "stop" when we reach a defined
> > > > threshold (and then throw away the result).  I never went around to
> > > > do this for unrolling since there we can end up duplicating diverging
> > > > control flow, but that at least doesn't happen with threading(?)
> > >
> > > WRT stopping after a threshold is reached,
> > > back_threader_profitablity::profitable_path_p() bails after a number
> > > of statements is reached.  If so, the path is not even put into the
> > > queue.  So, no rolling back would be necessary.  Is this what you
> > > meant, or something else?  The threaders (all variants) queue things
> > > up and modify the IL at the end of the pass.
> >
> > I meant that as currently we have no idea about followup simplifications
> > on the isolated path we conservatively account for all stmts in there
> > and thus need an optimistically high threading --param.  We should
> > be able to do better by simplifying copied stmts with the some now
> > constant PHIs or conditionally derived values.  So we could do
> > "unbound" analysis but when we instantiate stop the actual copying
> > if simplification doesn't keep us within the desired bounds.
> > For example unrolling tries to estimate what's going to be optimized
> > away but if we actually did the simplification that would be more precise.
>
> Very cool!

Yeah, if I only get to implement it ;)

Richard.

> Aldy
>
> >
> > > >
> > > > And just a note - please wait with installing this change - if approved - until
> > > > after GCC 12 is actually released.
> > >
> > > Indeed.  This is far too invasive for the interim.  I just sent the
> > > patch early so Jeff could get a chance to review it before I leave on
> > > paternity leave.  But he's also volunteered to kick it over the goal
> > > line if we can't get it in before May 9.
> > >
> > > Aldy
> > >
> >
>

  reply	other threads:[~2022-05-02  6:30 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-28 16:30 Aldy Hernandez
2022-04-28 18:43 ` Jeff Law
2022-04-29  7:09   ` Richard Biener
2022-04-29  9:52     ` Aldy Hernandez
2022-04-29 10:13       ` Richard Biener
2022-04-29 16:22         ` Aldy Hernandez
2022-05-02  6:30           ` Richard Biener [this message]
2022-05-02  7:17             ` Aldy Hernandez
2022-05-09 12:42             ` Andrew MacLeod
2022-05-09 12:55               ` Richard Biener
2022-04-29 11:47   ` Aldy Hernandez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc1giCBE4t7NdsLZCX72yTvU3coonrmwo7pRD2QHqhCbHw@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=aldyh@redhat.com \
    --cc=amacleod@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).