On Wed, 21 Sep 2022, Andrew MacLeod wrote:

> On 9/21/22 06:13, Richard Biener wrote:
> > On Mon, 19 Sep 2022, Andrew MacLeod wrote:
> >
> >> It looks like you created a fur_source to manually adjust PHIs within
> >> the fold_stmt query to ignore edges that are not marked executable.
> > Yes, and use the current values from the VN lattice when looking at
> > statement operands.
> yes, that is exactly how it's intended to be used.
> >
> >> That would then just leave you with the stale cache state to deal
> >> with?  And if we can resolve that, would it all just work?  At least
> >> in theory?
> > In theory, yes.  Besides that, the use-def walking of the cache is not
> > wired up with fur_*.
> Well, yes.  Hmm, you want to set cache values based on the VN lattice
> as well.  Yes.  OK, let me do a bit of cache explanation since I
> haven't done that yet.  It does not need a fur_source of any kind, and
> I'll explain why.
>
> The cache has 2 primary functions:
>   1) maintain the global definition table (used to decide if a name has
>      been processed).  This is local and not the one the rest of GCC
>      uses.  And
>   2) maintain the range-on-entry cache and resolve queries to that
>      efficiently.
>
> The cache does not actually create any NEW information.  This is one of
> its key features in preventing any kind of cascading cyclic updates.
> All it does is propagate existing information from the definition
> table, with values extracted from the global value table.  So your
> example is not good for this, as there isn't much in the cache for it,
> so let's tweak it and add another block.  Example:
>
>     n_2 = 1
>     i_4 = 0
>     val_5 = 0
>
>   <bb 3>:
>     # i_1 = PHI <i_4, i_7>
>     # val_2 = PHI <val_5, val_6>
>     val_6 = val_2 + 1;
>     i_7 = i_1 + 1
>     if (i_7 > 22)
>       goto <bb ...>
>     else
>       goto <bb 7>
>
>     if (i_7 < n_3)
>       goto <bb 3>;
>     else
>       goto <bb ...>;
>
>     _8 = val_6
>     return _8
>
> For the sake of simplicity, let's also assume bb2 and bb3 have been
> looked at and all the ssa-names defined in those blocks have an entry
> in ranger's definition table.
>
> Moving to <bb 7>, if we ask for the range of "if (i_7 < n_3)" to be
> evaluated, it checks that i_7 and n_3 have been evaluated before it
> proceeds.  Both have entries, which means the next task is to get their
> values at this location.  range_of_expr is called on each one, and as
> they are not defined in this block, ranger asks the cache for the value
> of i_7 on entry to bb7.  (Likewise, when it gets an answer back, it
> will do so for n_3 as well.)
>
> The cache walks back the dominators until it finds either:
>   a) the block with the definition of i_7, or
>   b) a block which has an on-entry cache value for i_7 already set.
> During its walk, it tags any block which has i_7 in the export list,
> meaning an outgoing edge from that block may change the value of i_7.
>
> There are additional complexities, but the fundamental operation is to
> now take the value it saw from a) or b) as the starting value, and
> supply that to GORI at every intervening outgoing edge i_7 was exported
> from.  Whenever the value changes along the way, we write a cache
> update at the end of the edge to facilitate future queries.  At the
> end, the answer has been calculated and is stored as the on-entry value
> for this block.
>
> So returning to the example, assume i_7 was set to VARYING in bb3; GORI
> would apply !(i_7 > 22) to the value, and we would end up in <bb 7>
> with a range-on-entry of [0, 21], and it would be stored in bb7.
>
> In your example, if you have disabled that back edge, you would have a
> value of [1,1] for i_7.  GORI would not have changed that value since
> it's already < 22, and we would store [1,1] as the range-on-entry to
> <bb 7>.
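
To make that walk concrete, here is a minimal self-contained sketch of
the scheme as described above.  This is plain C++ and not ranger's
gimple-range-cache code; block, value_range, gori_on_edge and
range_on_entry are simplified stand-ins invented for illustration only.

  #include <functional>
  #include <map>
  #include <vector>

  struct value_range { long lo, hi; };  // toy stand-in for an integer range

  struct block
  {
    int id;
    block *idom;          // immediate dominator
    bool defines_name;    // the queried ssa-name is defined in this block
    bool exports_name;    // an outgoing edge here may refine the name
  };

  // Stand-in for GORI: refine VAL on the walk from PRED down to SUCC.
  using gori_on_edge
    = std::function<value_range (const block *pred, const block *succ,
                                 value_range val)>;

  // Compute (and cache) the range of one name on entry to BB.
  value_range
  range_on_entry (const block *bb, value_range def_value,
                  std::map<int, value_range> &on_entry_cache,
                  const gori_on_edge &adjust)
  {
    // 1) Walk up the dominators until we hit the definition block or a
    //    block whose range-on-entry is already cached.
    std::vector<const block *> path;
    const block *b = bb;
    while (!b->defines_name && !on_entry_cache.count (b->id))
      {
        path.push_back (b);
        b = b->idom;
      }

    // Starting value: the cached on-entry range, or the def's value.
    value_range val = on_entry_cache.count (b->id)
                      ? on_entry_cache[b->id] : def_value;

    // 2) Walk back down, letting the GORI stand-in adjust the value at
    //    every block that exports the name, and caching the resulting
    //    range-on-entry of each block along the way.
    for (auto it = path.rbegin (); it != path.rend (); ++it)
      {
        if (b->exports_name)
          val = adjust (b, *it, val);
        on_entry_cache[(*it)->id] = val;
        b = *it;
      }
    return val;
  }

In the example above, the walk from bb7 would stop at bb3 (the
definition of i_7), apply the !(i_7 > 22) refinement on the way back
down, and record the result as bb7's range-on-entry.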
> Likewise, we do something similar for n_3.  The point is, the cache has
> not gone and created any new information.  Its *only* purpose is to
> propagate known values thru the CFG, adjusting them for any outgoing
> edges that are encountered.  It uses a temporal marking in an attempt
> to identify when a global value has been changed, meaning it may need
> to go and repopulate something, but the bottom line is it never
> contains anything beyond "reductions" in the ranges of values in the
> global table.  And it only ever works on one name at a time.
>
> The bottom line: Ranger effectively only ever changes values via the
> global table, and the cache simply propagates those values around,
> adjusting them with GORI as appropriate.

Hmm, but when it reaches the definition of _7 then it ends up calling
range_of_stmt on it, which then recursively processes operands, and the
result is stored into the "global table"?  For the definition of _7
fur_* isn't asked for the operands, so valueization would not happen.
path-ranger seems to override the def walking (range_of_expr) and thus
could transparently valueize.

OK, so you say there are two things we need to invalidate - first the
range-on-entry cache and second the "global table" entries?

Note that value-numbering doesn't need to undo things in its value table
when iterating, but it of course updates values when re-visiting
definitions.  Since with ranger we'd query definitions upthread
on-demand, that scheme likely would not work unless we "range-interpret"
each stmt when value numbering them.  That _might_ be the cheapest
option in the end if all the undoing is too expensive - basically
force-push a new range each time we visit a stmt.

For range-on-entry/exit (aka range-on-edge) we probably would simply
clear the range-on-entry cache when (re-)entering a block?  And we'd
like to just use the SSA range storage for defs outside of the region.

Richard.

> So there are multiple approaches.  We could simply kill the global
> table and cache line for any ssa_name we want to change the value of.
> That gets a little trickier for on-entry values of secondary effects
> (ie, those used in the calculation of the primary names).  It would
> probably work, but something unforeseen could show up.
>
> More advanced would be to "layer" the cache.  ie, we use the cache, and
> at some point, you issue a "push".  The push creates a new cache, and
> all queries look first to the new cache, and if it can't be answered,
> look "down" thru to the previous caches.  This resolves all queries as
> if the cache layers are all "one" thing.  Sets would always go to the
> latest layer.  When we "pop", we delete the latest layer; the layers
> underneath should all reflect the exact state at the time of the push.
> All in theory of course :-)  There would be some expense to it, but
> possibly not as much as one thinks since most components defer
> allocations and such until something is actually set.  It seems like it
> might not be a hard experiment that I will try once I get thru the bits
> I'm on now.
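
For what it's worth, the push/pop layering described above could look
roughly like the following.  This is a self-contained sketch only, not
ranger's cache classes; layered_cache and its Key/Value parameters are
placeholders for something like an (ssa-name, block) -> range mapping.

  #include <map>
  #include <optional>
  #include <vector>

  template <typename Key, typename Value>
  class layered_cache
  {
    std::vector<std::map<Key, Value>> m_layers;  // back() is the newest layer

  public:
    layered_cache () : m_layers (1) {}           // base layer always exists

    void push () { m_layers.emplace_back (); }   // start a new layer

    void pop ()                                  // discard the newest layer
    {
      if (m_layers.size () > 1)
        m_layers.pop_back ();
    }

    // Sets always land in the newest layer, shadowing older entries.
    void set (const Key &k, const Value &v) { m_layers.back ()[k] = v; }

    // Lookups scan from newest to oldest, so all layers behave as one cache.
    std::optional<Value> get (const Key &k) const
    {
      for (auto it = m_layers.rbegin (); it != m_layers.rend (); ++it)
        {
          auto f = it->find (k);
          if (f != it->end ())
            return f->second;
        }
      return std::nullopt;
    }
  };

A push() at the start of a region and a pop() when leaving it would
discard everything set while inside, leaving the layers underneath
exactly as they were at push time, which matches the semantics sketched
above.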
> >> That would have the drawback of losing any information that had been
> >> calculated earlier, and possibly trigger some additional calculations
> >> as that value will now be stale elsewhere in the IL.  We could make
> >> it more efficient if you also provided a bitmap over the basic blocks
> >> in the region (or some way of determining that).  Then we would just
> >> kill the entries in those blocks for those ssa-names.
> >>
> >> As long as you weren't making any ranger queries from outside the
> >> region during this time, that might work.
> >>
> >> Before I go any further, are we on the same page, or are you looking
> >> for something different than this?
> >>
> >> You had mentioned the path_ranger model also, which effectively uses
> >> ranger for the values on entry to the path/region, and then uses
> >> overridden API entry points to choose whether to use the local cache
> >> vector, or go to ranger's on-entry values.
> >>
> >> This approach could also work, and one way would be to implement the
> >> mark/reset API.  The new class would maybe instantiate an additional
> >> local cache from which values are set/picked up first for any queries
> >> within the region blocks and, when you are done, throw away the cache
> >> for the region.  This would require some tweaking of ranger's primary
> >> cache lookup/set routine, but perhaps there is some benefit to
> >> generalizing this model and integrating it, ie, a region_ranger which
> >> builds this facility on top and allows you to reset regions.
> >> Presumably you want to be able to "stack" these marks so you can push
> >> and pop regions?  Would also have to figure out how to work the
> >> relational oracle with it.
> > Yeah, so my thinking was that I'd keep my own range cache for
> > in-region, like path_ranger does, and track changes similarly to how I
> > do for VN so I can undo things efficiently.  That way I also get to
> > decide how far to do the caching use-def walks.
> >
> > Of course the disadvantage is duplicating quite some ranger
> > functionality that's already there without too many changes besides my
> > own cache, plus ensuring use of the VN-computed values and edge
> > executability.  Maybe it isn't too much work though.
> Let me give the layered cache a shot.  I don't think that will be much
> work (ha, famous last words), and we can see if it works, and how
> expensive it is.  There might be alternatives along this line,
> especially if you are only querying a few things here and there.  The
> cache is designed to be able to plug in alternative mechanisms on a
> name-by-name basis (there are already a couple of variations depending
> on CFG size and name density usage), so it might be that we can do
> something really cheap for these kinds of pushed layers.
>
> I'm still mulling around options.
>
> >> I can think of a couple of possible approaches for that, some more
> >> efficient than others.  I suppose the path_ranger could even be
> >> completely replaced with that model.  Huh, it would just call mark()
> >> at the start of a path, and reset() when it's done with the path.
> >> Might not be as efficient however.  Be a good proof of correctness
> >> test I guess at least.
> >>
> >> Anyway, am I in the ballpark somewhere of what you are looking for?
> >> Or am I over-engineering something? :-)
> > No, you basically identified the problems.
> >
> > Note that when not iterating the proposed change should already work
> > optimally, but it's of course bad to be "better" in the weaker
> > setting ...
> >
> > The second thing would be to look at how to use conditional
> > equivalences.  What's required to make ranger aware of them, and
> > what's required to query them, since for them the actual
> > simplification would happen in the folding done by value-numbering,
> > like with
> >
> >   if (a == b)
> >     c = a - b;
> >
> > which EVRP gets simplified, for example, but VN fails at this.
> > There are proposed enhancements to VN that might catch this as well,
> > but I'm not too happy about the implementation.  Mainly because
> > equivalences are a bitch.
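
To illustrate the kind of simplification under discussion, before the
mechanics described below: a self-contained toy model, not GCC's
range-ops API.  relation_kind, int_range and fold_minus here are made-up
stand-ins, and the ~[0,0] anti-range produced for != is omitted because
a plain lo/hi pair cannot express it.

  #include <cassert>

  enum relation_kind { VREL_VARYING, VREL_EQ };

  struct int_range { long lo, hi; };            // toy integer range

  // Fold op1 - op2 given a known relation between op1 and op2.
  int_range
  fold_minus (const int_range &op1, const int_range &op2, relation_kind rel)
  {
    if (rel == VREL_EQ)
      return { 0, 0 };                          // a == b  =>  a - b == 0

    // Otherwise fall back to ordinary interval arithmetic for subtraction.
    return { op1.lo - op2.hi, op1.hi - op2.lo };
  }

  int
  main ()
  {
    int_range a = { 0, 100 }, b = { 0, 100 };

    // Without the relation, the best interval arithmetic can do.
    int_range plain = fold_minus (a, b, VREL_VARYING);
    assert (plain.lo == -100 && plain.hi == 100);

    // On the edge where "a == b" was registered, the subtraction folds.
    int_range folded = fold_minus (a, b, VREL_EQ);
    assert (folded.lo == 0 && folded.hi == 0);
    return 0;
  }

With the relation registered, the same operand ranges fold to [0,0];
without it the result is only [-100, 100].  That mirrors what the
relation oracle and operator_minus's op1_op2_relation_effect do in the
description that follows.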
> The relational oracle currently snags this one by putting a and b in an
> equivalence set on that outgoing edge.  The fold_stmt mechanism you are
> using automatically queries the relation oracle (if the fur_source
> supplies one) for a relation between the 2 use operands of a statement,
> and passes that to the range-ops fold routine.  So it will ultimately
> invoke operator_minus with
>
>   fold_range (irange, int, range_of_a, range_of_b, VREL_EQ)
>
> Range-ops for operator minus defines the routine
> "op1_op2_relation_effect()" (which is automatically applied to any
> value calculated by fold_range() after the fact).  That routine has:
>
>   // == and != produce [0,0] and ~[0,0] regardless of wrapping.
>   if (rel == VREL_EQ)
>     rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
>   else if (rel == VREL_NE)
>     rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
>                               VR_ANTI_RANGE);
>
> So as long as the relation was registered, fold_stmt will automatically
> get a range of [0,0] for c, all through range-ops for operator_minus.
>
> If the fur_source used when fold_stmt is called supplies the routine
> "register_relation" (which I think only a fur_depend currently does),
> then relations are also automatically registered with the oracle
> supplied by the fur_depend.
>
> There's a lot of this that can happen auto-magically.  I just need to
> make sure we can reset some of the underlying cache entries (and the
> oracle as well; it might need to be layered so that we don't pick up
> invalid relations.  It should also be relatively simple to adjust, I
> think).
>
> Andrew
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)