From: Richard Biener <rguenther@suse.de>
To: Andrew MacLeod <amacleod@redhat.com>
Cc: gcc-patches@gcc.gnu.org, aldyh@redhat.com
Subject: Re: [PATCH][RFH] Wire ranger into FRE
Date: Thu, 22 Sep 2022 06:33:28 +0000 (UTC) [thread overview]
Message-ID: <nycvar.YFH.7.77.849.2209220624410.6652@jbgna.fhfr.qr> (raw)
In-Reply-To: <b311afe3-abd5-39f9-8f2d-ef1cb3934d3e@redhat.com>
On Wed, 21 Sep 2022, Andrew MacLeod wrote:
>
> On 9/21/22 06:13, Richard Biener wrote:
> > On Mon, 19 Sep 2022, Andrew MacLeod wrote:
> >
> >
> >> It looks like you created a fur_source to manually adjust PHIs within the
> >> fold_stmt query to ignore edges that are not marked executable.
> > Yes, and use the current values from the VN lattice when looking at
> > statement operands.
>
> yes, that is exactly how its intended to be used.
>
>
> >
> >> That would then just leave you with the stale cache state to deal with?
> >> And if we can resolve that, would it all just work?  At least in theory?
> > In theory, yes.  Besides that, the use-def walking of the cache is not
> > wired up with fur_*.
>
> Well, yes.  Hmm, you want to set cache values based on the VN lattice as
> well.  OK, let me do a bit of cache explanation, since I haven't done that
> yet.  The cache does not need a fur_source of any kind, and I'll explain why.
>
> The cache has 2 primary functions:
> 1) maintain the global definition table (used to decide if a name has been
> processed).  This is local, and not the one the rest of GCC uses.
> 2) maintain the range-on-entry cache and resolve queries to it efficiently.
>
> The cache does not actually create any NEW information.  This is one of its
> key features in preventing any kind of cascading cyclic updates.  All it does
> is propagate existing information from the definition table, with values
> extracted from the global value table.  Your example is not a good
> demonstration of this, as there isn't much in the cache for it, so let's
> tweak it and add another block:
>
>   n_3 = 1
>   i_4 = 0
>   val_5 = 0
> <bb 3>:
>   # i_1 = PHI <i_4(2), i_7(7)>
>   # val_2 = PHI <val_5(2), val_6(7)>
>   val_6 = val_2 + 1;
>   i_7 = i_1 + 1;
>   if (i_7 > 22)
>     goto <bb 12>;
>   else
>     goto <bb 7>;
> <bb 7>:
>   if (i_7 < n_3)
>     goto <bb 3>;
>   else
>     goto <bb 4>;
> <bb 4>:
>   _8 = val_6;
>   return _8;
>
> For the sake of simplicity, let's also assume bb2 and bb3 have been looked at
> and all the ssa-names defined in those blocks have an entry in ranger's
> definition table.
>
> Moving to <bb 7>, if we ask for "if (i_7 < n_3)" to be evaluated, ranger
> checks that i_7 and n_3 have been evaluated before it proceeds.  Both have
> entries, which means the next task is to get their values at this location.
> range_of_expr is called on each one, and as they are not defined in this
> block, ranger asks the cache for the value of i_7 on entry to bb7 (and
> likewise, once it gets an answer back, for n_3).
>
> The cache walks back the dominators until it finds either:
>   a) the block with the definition of i_7, or
>   b) a block which already has an on-entry cache value set for i_7.
> During its walk, it tags any block which has i_7 in its export list, meaning
> an outgoing edge from that block may change the value of i_7.
>
> There are additional complexities, but the fundamental operation is to now
> take the value it saw from a) or b) as the starting value, and supply that to
> GORI at every intervening outgoing edge i_7 was exported from. Whenever the
> value changes along the way, we write a cache update at the end of the edge to
> facilitate future queries. At the end, the answer has been calculated and is
> stored as the on-entry value for this block.
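To make that two-phase walk concrete, here is a self-contained toy sketch. All of the structure and function names below are invented for illustration; this is not ranger's actual implementation, and the edge refinement hard-codes the !(i_7 > 22) edge from the example in place of GORI:

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Toy model of the range-on-entry cache walk described above.
struct range { long lo, hi; };

struct block
{
  block *idom;                    // immediate dominator
  bool has_def;                   // the name is defined in this block
  bool exports_name;              // an outgoing edge may refine the name
  std::optional<range> on_entry;  // cached range-on-entry, if computed
};

// Stand-in for GORI: the refinement the dominator's outgoing edge
// applies.  Here we hard-code the !(i_7 > 22) edge from the example.
static range
apply_edge (const range &r, const block &pred)
{
  if (!pred.exports_name)
    return r;
  return { r.lo, std::min (r.hi, 22L) };
}

static range
range_on_entry (block *bb, const range &def_range)
{
  // 1) Walk up the dominator tree until we hit the definition block
  //    or a block with a cached on-entry value.
  std::vector<block *> path;
  block *b = bb;
  while (!b->has_def && !b->on_entry)
    {
      path.push_back (b);
      b = b->idom;
    }
  range r = b->has_def ? def_range : *b->on_entry;

  // 2) Walk back down, applying each intervening outgoing edge and
  //    caching the result so future queries stop early.
  block *dom = b;
  for (auto it = path.rbegin (); it != path.rend (); ++it)
    {
      r = apply_edge (r, *dom);
      (*it)->on_entry = r;
      dom = *it;
    }
  return r;
}
```

With a bb2 -> bb3 -> bb7 dominator chain where bb3 exports the name, asking for the range on entry to bb7 fills the on-entry cache for both bb3 and bb7 along the way, so a second query stops at the first cached block.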
>
> So returning to the example: assume i_7 was set to VARYING in bb3.  GORI
> would apply !(i_7 > 22) to that value, and we would end up in <bb 7> with a
> range-on-entry of [0, 21], which would be stored for bb7.
>
> In your example, if you had disabled that back edge, you would have a value
> of [1,1] for i_7.  GORI would not change that value, since it is already
> < 22, and we would store [1,1] as the range-on-entry to <bb 7>.
>
> Likewise, we do something similar for n_3.  The point is, the cache has not
> created any new information.  Its *only* purpose is to propagate known
> values through the CFG, adjusting them for any outgoing edges that are
> encountered.  It uses a temporal marking in an attempt to identify when a
> global value has changed, meaning it may need to go and repopulate
> something, but the bottom line is that it never contains anything beyond
> "reductions" of the ranges of values in the global table.  And it only ever
> works on one name at a time.
>
> The bottom line: ranger effectively only ever changes values via the global
> table, and the cache simply propagates those values around, adjusting them
> with GORI as appropriate.
Hmm, but when it reaches the definition of i_7 it ends up calling
range_of_stmt on it, which then recursively processes operands, and the
result is stored into the "global table"?  For the definition of i_7,
fur_* isn't asked for the operands, so valueization would not happen.
The path-ranger seems to override the def walking (range_of_expr) and thus
could transparently valueize.
OK, so you say there are two things we need to invalidate: first the
range-on-entry cache, and second the "global table" entries?  Note that
value-numbering doesn't need to undo things in its value table when
iterating but it of course updates values when re-visiting definitions.
Since with ranger we'd query definitions upthread on-demand, that
scheme likely would not work unless we "range-interpret" each stmt
when value numbering them.  That _might_ be the cheapest option in
the end if all the undoing is too expensive - basically force-push
a new range each time we visit a stmt. For range-on-entry/exit
(aka range-on-edge) we probably would simply clear the range-on-entry
cache when (re-)entering a block?
And we'd like to just use the SSA range storage for defs outside of
the region.
Richard.
> So there are multiple approaches.  We could simply kill the global table and
> cache line for any ssa_name we want to change the value of.  That gets a
> little trickier for on-entry values of secondary effects (ie, those used in
> the calculation of the primary names).  It would probably work, but something
> unforeseen could show up.
>
> More advanced would be to "layer" the cache.  Ie, we use the cache, and at
> some point, you issue a "push".  The push creates a new cache, and all
> queries look first to the new cache, and if they can't be answered there,
> look "down" through the previous caches.  This resolves all queries as if
> the cache layers were all "one" thing.  Sets would always go to the latest
> layer.  When we "pop", we delete the latest layer; the layers underneath
> should all reflect the exact state at the time of the push.  All in theory,
> of course :-)  There would be some expense to it, but possibly not as much
> as one thinks, since most components defer allocations and such until
> something is actually set.  It seems like it might not be a hard experiment;
> I will try it once I get through the bits I'm on now.
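The push/pop scheme described above can be sketched in a few lines. This is a hypothetical illustration (the class name and shape are invented, not an existing ranger class): get() searches from the newest layer down, set() always writes the newest layer, and pop() discards the newest layer, restoring exactly the state at push() time:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

template <typename K, typename V>
class layered_cache
{
  std::vector<std::map<K, V>> m_layers;

public:
  layered_cache () : m_layers (1) {}

  void push () { m_layers.emplace_back (); }

  void pop ()
  {
    // Never discard the base layer.
    if (m_layers.size () > 1)
      m_layers.pop_back ();
  }

  // Sets always go to the latest layer, so layers underneath stay intact.
  void set (const K &k, const V &v) { m_layers.back ()[k] = v; }

  // Lookups resolve as if all layers were one cache, newest first.
  std::optional<V> get (const K &k) const
  {
    for (auto it = m_layers.rbegin (); it != m_layers.rend (); ++it)
      {
        auto f = it->find (k);
        if (f != it->end ())
          return f->second;
      }
    return std::nullopt;
  }
};
```

The invariant that makes pop() cheap and exact is that writes never bypass the top layer; if anything ever wrote directly to a lower layer, popping would no longer restore the pre-push state.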
>
>
> >
> >
> >> That would have the drawback of losing any information that had been
> >> calculated earlier, and possibly triggering some additional calculations,
> >> as that value will now be stale elsewhere in the IL.  We could make it
> >> more efficient if you also provided a bitmap over the basic blocks in the
> >> region (or some way of determining that).  Then we would just kill the
> >> entries in those blocks for those ssa-names.
> >>
> >> As long as you weren't making any ranger queries from outside the region
> >> during this time, that might work.
> >>
> >> Before I go any further, are we on the same page, or are you looking for
> >> something different than this?
> >>
> >> You had mentioned the path_ranger model also, which effectively uses
> >> ranger for the values on entry to the path/region, and then uses
> >> overridden API entry points to choose whether to use the local cache
> >> vector or go to ranger's on-entry values.
> >>
> >> This approach could also work, and one way would be to implement the
> >> mark/reset API.  The new class would maybe instantiate an additional local
> >> cache, from which values are set/picked up first for any queries within
> >> the region blocks, and when you are done, it throws away the cache for the
> >> region.  This would require some tweaking of ranger's primary cache
> >> lookup/set routine, but perhaps there is some benefit to generalizing this
> >> model and integrating it, ie, a region_ranger which builds this facility
> >> on top and allows you to reset regions.  Presumably you want to be able to
> >> "stack" these marks so you can push and pop regions?  We would also have
> >> to figure out how to make the relational oracle work with it.
> > Yeah, so my thinking was that I'd keep my own range cache for in-region,
> > like path_ranger does, and track changes similarly to how I do for VN,
> > so I can undo things efficiently.  That way I also get to decide
> > how far to do the caching use-def walks.
> >
> > Of course the disadvantage is duplicating quite some ranger functionality
> > that's already there without too many changes, besides my own cache plus
> > ensuring use of the VN-computed values and edge executability.  Maybe
> > it isn't too much work though.
>
> Let me give the layered cache a shot.  I don't think that will be much work
> (ha, famous last words), and we can see if it works and how expensive it
> is.  There might be alternatives along this line, especially if you are only
> querying a few things here and there.  The cache is designed to be able to
> plug in alternative mechanisms on a name-by-name basis (there are already a
> couple of variations, depending on CFG size and name density usage), so it
> might be that we can do something really cheap for these kinds of pushed
> layers.
>
> I'm still mulling around options.
>
>
> >> I can think of a couple of possible approaches for that, some more
> >> efficient than others.  I suppose the path_ranger could even be completely
> >> replaced with that model; it would just call mark() at the start of a path
> >> and reset() when it's done with the path.  It might not be as efficient,
> >> but it would at least be a good proof-of-correctness test.
> >>
> >> Anyway, am I in the ballpark somewhere of what you are looking for?  Or am
> >> I over-engineering something?  :-)
> > No, you basically identified the problems.
> >
> > Note that when not iterating the proposed change should already work
> > optimally, but it's of course bad to be "better" in the weaker setting ...
> >
> > The second thing would be to look at how to use conditional equivalences.
> > What's required to make ranger aware of them, and what's required to
> > query them, since for them the actual simplification would happen in
> > the folding done by value-numbering, as with
> >
> >   if (a == b)
> >     c = a - b;
> >
> > which EVRP manages to simplify, for example, but VN fails at.  There are
> > proposed enhancements to VN that might catch this as well, but I'm not
> > too happy about the implementation.  Mainly because equivalences are a
> > bitch.
>
> The relational oracle currently snags this one by putting a and b in an
> equivalence set on that outgoing edge.  The fold_stmt mechanism you are using
> automatically queries the relation oracle (if the fur_source supplies one)
> for a relation between the 2 use operands of a statement, and passes that to
> the range-ops fold routine, so it will ultimately invoke operator_minus with
>
>   fold_range (irange, int, range_of_a, range_of_b, VREL_EQ)
>
> Range-ops for operator_minus defines the routine "op1_op2_relation_effect()"
> (which is automatically applied to any value calculated by fold_range()
> after the fact).  That routine has:
>
>   // == and != produce [0,0] and ~[0,0] regardless of wrapping.
>   if (rel == VREL_EQ)
>     rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
>   else if (rel == VREL_NE)
>     rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
>                               VR_ANTI_RANGE);
>
> So as long as the relation was registered, fold_stmt will automatically get a
> range of [0,0] for c, all through range-ops for operator_minus.
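As a self-contained model of that plumbing (all names here are simplified stand-ins, not GCC's actual types): a fold routine computes a naive interval for a - b, then an op1_op2_relation_effect-style hook overrides the result when a relation between the operands is known, regardless of the operand ranges:

```cpp
#include <cassert>

enum relation_kind { VREL_VARYING, VREL_EQ, VREL_NE };

struct toy_range
{
  long lo, hi;
  bool anti;  // true means ~[lo, hi] (an anti-range)
};

// a == b makes a - b exactly [0,0]; a != b makes it ~[0,0],
// regardless of the operand ranges or of wrapping.
static bool
minus_relation_effect (toy_range &r, relation_kind rel)
{
  if (rel == VREL_EQ)
    {
      r = { 0, 0, false };
      return true;
    }
  if (rel == VREL_NE)
    {
      r = { 0, 0, true };
      return true;
    }
  return false;
}

static toy_range
fold_minus (const toy_range &a, const toy_range &b, relation_kind rel)
{
  // Naive interval subtraction (ignores overflow for this sketch).
  toy_range r = { a.lo - b.hi, a.hi - b.lo, false };
  // Applied automatically after the fact, as described above.
  minus_relation_effect (r, rel);
  return r;
}
```

So folding a - b with both operands in [0,10] gives [-10,10] with no relation, but [0,0] once VREL_EQ has been registered, which is exactly the c = a - b simplification under if (a == b).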
>
> If the fur_source used when fold_stmt is called supplies the routine
> "register_relation" (which I think only a fur_depend currently does), then
> relations are also automatically registered with the oracle supplied by the
> fur_depend.
>
> There's a lot of this that can happen auto-magically.  I just need to make
> sure we can reset some of the underlying cache entries (and the oracle as
> well; it might need to be layered so that we don't pick up invalid
> relations, but it should also be relatively simple to adjust, I think).
>
> Andrew
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)