From: Aldy Hernandez <aldyh@redhat.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>,
GCC patches <gcc-patches@gcc.gnu.org>,
"MacLeod, Andrew" <amacleod@redhat.com>
Subject: Re: [PATCH] Replace EVRP in DOM with ranger.
Date: Mon, 2 May 2022 09:17:12 +0200 [thread overview]
Message-ID: <CAGm3qMWb2yc8cVdkC=ftNunKmtto0B_xq5C-k3L5sWafUicsXg@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc1giCBE4t7NdsLZCX72yTvU3coonrmwo7pRD2QHqhCbHw@mail.gmail.com>
On Mon, May 2, 2022 at 8:30 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Apr 29, 2022 at 6:22 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Fri, Apr 29, 2022 at 12:13 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Fri, Apr 29, 2022 at 11:53 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > >
> > > > On Fri, Apr 29, 2022 at 9:09 AM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > > > > > > [Jeff, this is the same patch I sent you last week with minor tweaks
> > > > > > > to the commit message.]
> > > > > > And I just dropped it into the tester. We should have the vast majority
> > > > > > of targets tested by this time tomorrow.
> > > > > >
> > > > > > >
> > > > > > > [Despite the verbosity of the message, this is actually a pretty
> > > > > > > straightforward patch. It should've gone in last cycle, but there
> > > > > > > was a nagging regression I couldn't get to until after stage1
> > > > > > > had closed.]
> > > > > > You had other, non-work/gcc priorities. No worries at missing gcc-12.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > There are 3 uses of EVRP in DOM that must be converted.
> > > > > > > Unfortunately, they need to be converted in one go, so further
> > > > > > > splitting of this patch would be problematic.
> > > > > > ACK
> > > > > >
> > > > > >
> > > > > > > There's nothing here earth shattering. It's all pretty obvious in
> > > > > > > retrospect, but I've added a short description of each use to aid in
> > > > > > > reviewing:
> > > > > > >
> > > > > > > * Convert evrp use in cprop to ranger.
> > > > > > >
> > > > > > > This is easy, as cprop in DOM was converted to the ranger API last
> > > > > > > cycle, so this is just a matter of using a ranger instead of an
> > > > > > > evrp_range_analyzer.
> > > > > > Yea. We may have covered this before, but what does ranger do with a
> > > > > > conditional equivalence?
> > > > > >
> > > > > > ie if we have something like
> > > > > >
> > > > > > if (a == b)
> > > > > > {
> > > > > > blah blah
> > > > > > }
> > > > > >
> > > > > > Within the true arm we know there's an equivalence. *But* we don't want
> > > > > > to just blindly replace a with b or vice-versa. The equivalence is
> > > > > > primarily useful for simplification rather than propagation.
> > > > > >
> > > > > > In fact, we every much do not want to cprop in these cases. It rarely
> > > > > > helps in a meaningful way and there's no good way to know which is the
> > > > > > better value to use. Canonicalization on SSA_NAMEs doesn't really help
> > > > > > due to SSA_NAME recycling.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > * Convert evrp use in threader to ranger.
> > > > > > >
> > > > > > > The idea here is to use the hybrid approach we used for the initial
> > > > > > > VRP threader conversion last cycle. The DOM threader will continue
> > > > > > > using the forward threader infrastructure while continuing to query
> > > > > > > DOM data structures, and only if the conditional does not relsolve,
> > > > > > > using the ranger. This gives us the best of both worlds, and is a
> > > > > > > proven approach.
> > > > > > >
> > > > > > > Furthermore, as frange and prange come live in the next cycle, we
> > > > > > > can move away from the forward threader altogether, and just add
> > > > > > > another backward threader. This will not only remove the last use
> > > > > > > of the forward threader, but will allow us to remove at least 1 or 2
> > > > > > > threader instances.
> > > > > > Excellent.
> > > > > >
> > > > > > >
> > > > > > > * Convert conditional folding to use the method used by the ranger and
> > > > > > > evrp. Previously DOM was calling into the guts of
> > > > > > > simplify_using_ranges::vrp_visit_cond_stmt. The blessed way now is
> > > > > > > using fold_cond() which rewrites the conditional and edges
> > > > > > > automatically.
> > > > > > >
> > > > > > > When legacy is removed, simplify_using_ranges will be further
> > > > > > > cleaned up, and there will only be one entry point into simplifying
> > > > > > > a statement.
> > > > > > Sure.
> > > > > >
> > > > > > >
> > > > > > > * DOM was setting global ranges determined from unreachable edges as a
> > > > > > > side-effect of using the evrp engine. We must handle these cases
> > > > > > > before nuking evrp, and DOM seems like a good fit. I've just moved
> > > > > > > the snippet to DOM, but it could live anywhere else we do a DOM
> > > > > > > walk.
> > > > > > It was a semi-desirable to record/refine global ranges in DOM, but I'd
> > > > > > be surprised if too much code depended on that. Presumably there's a
> > > > > > testcase in the suite which we don't handle well if we don't let DOM
> > > > > > refine/record global ranges?
> > > > > >
> > > > > > >
> > > > > > > For the record, this is the case *vrp handled:
> > > > > > >
> > > > > > > <bb C>:
> > > > > > > ...
> > > > > > > if (c_5(D) != 5)
> > > > > > > goto <bb N>;
> > > > > > > else
> > > > > > > goto <bb M>;
> > > > > > > <bb N>:
> > > > > > > __builtin_unreachable ();
> > > > > > > <bb M>:
> > > > > > >
> > > > > > > If M dominates all uses of c_5, we can set the global range of c_5
> > > > > > > to [5,5].
> > > > > > And that will only happen if M eventually loops back to the conditional,
> > > > > > right? Otherwise all the uses wouldn't be dominated. I suspect this is
> > > > > > exceedingly rare in practice an it looks really familiar. This is
> > > > > > coming from a testcase, right?
> > > > >
> > > > > This is how we make use of assert() when it uses __builtin_unreachable.
> > > > >
> > > > > Note the difficulty here is that we have to do this at a very specific point
> > > > > in the pass pipeline (setting the global range), since the first time we'll
> > > > > fold the if (c_5(D) != 5) it will be folded to false and the IL representation
> > > > > of the assert() is gone.
> > > > >
> > > > > So the idea was that with the IL present value-range analysis can determine
> > > > > the range omf c_5(D) on the false path but once we remove it (and we do
> > > > > want to remove it) we want to put a global range on c_5(D).
> > > > >
> > > > > That means the best of both worlds would be to detect this pattern at
> > > > > a specific point in the pass pipeline (somewhen before loop opts for sure)
> > > > > and do both - remove the IL _and_ set the global range. The fear of
> > > > > doing this too early is of course that we lose the range by copy propagation
> > > > > or similar transforms. The fear of doing this change at a single place only
> > > > > is that we miss it because there's a stray use before the conditional.
> > > > >
> > > > > Note DOM is also run in pass_ipa_oacc_kernels which may be too early.
> > > > >
> > > > > In the end we want to perform dead code elimination here so maybe
> > > > > DCE is the best fit to do this (but as said we want to avoid doing this too
> > > > > early).
> > > >
> > > > I agree that doing both would be the ideal scenario. However, I would
> > > > prefer this be done in a follow-up patch to minimize the amount of
> > > > stuff being changed in one go.
> > > >
> > > > To provide additional background, this was being done in evrp which
> > > > runs at -O2, but was also being done at -O1 as a side-effect of DOM
> > > > using the evrp engine and exporting global ranges. For -O2, I don't
> > > > know how much DOM was originally (prior to ranger) contributing to
> > > > these assert global ranges, since evrp was probably picking them up
> > > > first. However, since evrp now runs in ranger mode, this
> > > > functionality has silently moved to DOM (without us realizing) for all
> > > > compilation levels.
> > >
> > > I see.
> > >
> > > > Do we care about this functionality at -O1? ISTR at least one -O1
> > > > testcase failing in RTL land if exporting global ranges isn't done in
> > > > DOM. I don't remember if it was a global range due to a
> > > > __builtin_unreachable or otherwise.
> > >
> > > Well, at -O1 we definitely didn't set the global range (did we?) but
> > > eventually pruned paths running into __builtin_unreachable ()
> > > anyway (would have to check at which point).
> >
> > I think we always have:
> >
> > evrp_range_analyzer analyzer (/*update_global_ranges=*/true);
> >
> > ??
> >
> > DOM has been running as an incognito evrp even at -O1. I don't know
> > if this has been by design, or as an unintended side-effect. I mean,
> > it has been visiting all conditionals in the IL as well as calling
> > evrp_range_analyzer::record_ranges_from_stmt() on each statement it
> > tries to optimize. And of course, updating global ranges along the
> > way.
>
> As said, it wasn't intended to be that way originally. But yes, as we
> separated out evrp_range_analyzer and made use of it in places not
> guarded by -ftree-vrp this might have happened.
>
> We might want to reconsider (based on now long-standing facts),
> and maybe alter the -ftree-vrp documentation indicating that even
> when not enabled effecive value-range propagation and optimization
> might happen.
>
> > >
> > > >
> > > > One more thing, we have batted around the idea of running a fast evrp
> > > > even at -O1 (ranger without looking at back edges, IIRC). This would
> > > > mean that when the DOM threader becomes disentangled from DOM (when we
> > > > have frange and prange), there's very little DOM will be doing that
> > > > for instance PRE can't get. So perhaps, if we remove DOM, the place
> > > > to put this optimization is in the fast evrp, and fold the conditional
> > > > as has been suggested? Anyways... I'm digressing.
> > >
> > > The idea is to get rid of DOM once its threading is separated out.
> >
> > Excellent. There's no reason why we couldn't separate threading out
> > in this release.
> >
> > >
> > > I think doing O (n * log n) value range analysis at -O1 would be welcome.
> > > We didn't get around doing that for EVRP which I think had such
> > > bound - the iterating VRP definitely didn't. I'm not sure ranger really
> > > qualifies given gori and all it relies on. The advantage of the simple
> > > whole-IL walk way was to reasonably easily guarantee such bound.
> > > That's unfortunately lost now :/
> >
> > Andrew was mumbling something about a fast ranger mode for this
> > release that should be on par with legacy evrp. IIRC it would be
> > purely DOM based, won't visit back edges, and there's no caching. But
> > he'll have to expand on it when he returns from vacation. I don't
> > know the details.
>
> I would guess the stmt analysis building blocks (whatever API part of
> ranger that is) can be used to produce something like that. But in the
> end it would be the old EVRP pass with the VRP stmt analysis it
> re-used replaced with the appropriate ranger parts.
>
> But yes, I'd welcome that. I'd also like to revisit integration of some
> of this with value-numbering which doesn't do a DOM walk but instead
> a RPO walk.
>
> > >
> > > > Doing this optimization in DOM at least keeps with GCC 12, but I'm
> > > > happy to move it to whatever is recommended.
> > >
> > > As said the intent of this trick is to preserve range information from
> > > asserts even when we removed the IL carrying it. When we care
> > > about range info, which means when some VRP pass runs, which
> > > is why it was done in such a pass. That is, we probably want to
> > > avoid altering the global range for -fno-tree-vrp.
> >
> > Not just -fno-tree-vrp then. There are other range consumers. For
> > example, CCP, threading, etc. Threading will look at global ranges
> > even in dumb mode, and CCP uses global ranges in a round about way--
> > by looking at global nonzero bits which the evrp code diligently sets:
> >
> > {
> > set_ssa_range_info (vrs[i].first, vrs[i].second);
> > maybe_set_nonzero_bits (pred_e, vrs[i].first);
> > }
> >
> > For example, in this particular testcase, DOM will set the global
> > range, and CCP will remove at least one of the conditionals by looking
> > at nonzero bits.
>
> Yes, as said we might want to reconsider and adjust documentation.
> In theory we could gate the global range/nonzero setters on
> flag_tree_vrp so they become no-ops when not enabled
> (there is also flag_tree_bit_ccp which is documented to track
> alignment, it fails to mention non-zero bits on integers).
Well, we could just not update global ranges for !flag_tree_vrp. In
the old world:
evrp_range_analyzer (/*update_global_ranges=*/ flag_tree_vrp)
and in the new world:
if (flag_tree_vrp)
ranger->export_global_ranges().
This should fit in nicely with my plan to integrate nonzero bits into
the irange, because we won't have to worry about them independently
>
> But let's say the plan is to more conciously enable some range
> propagation at -O1.
Sounds like a plan.
Aldy
next prev parent reply other threads:[~2022-05-02 7:17 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-28 16:30 Aldy Hernandez
2022-04-28 18:43 ` Jeff Law
2022-04-29 7:09 ` Richard Biener
2022-04-29 9:52 ` Aldy Hernandez
2022-04-29 10:13 ` Richard Biener
2022-04-29 16:22 ` Aldy Hernandez
2022-05-02 6:30 ` Richard Biener
2022-05-02 7:17 ` Aldy Hernandez [this message]
2022-05-09 12:42 ` Andrew MacLeod
2022-05-09 12:55 ` Richard Biener
2022-04-29 11:47 ` Aldy Hernandez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGm3qMWb2yc8cVdkC=ftNunKmtto0B_xq5C-k3L5sWafUicsXg@mail.gmail.com' \
--to=aldyh@redhat.com \
--cc=amacleod@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jeffreyalaw@gmail.com \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).