On Wed, 21 Sep 2022, Andrew MacLeod wrote:

> On 9/21/22 06:13, Richard Biener wrote:
> > On Mon, 19 Sep 2022, Andrew MacLeod wrote:
> >
> >> It looks like you created a fur_source to manually adjust PHIs within
> >> the fold_stmt query to ignore edges that are not marked executable.
> > Yes, and use the current values from the VN lattice when looking at
> > statement operands.
> yes, that is exactly how it's intended to be used.
> >
> >> That would then just leave you with the stale cache state to deal
> >> with?  And if we can resolve that, would it all just work?  At least
> >> in theory?
> > In theory, yes.  Besides that, the use-def walking of the cache is not
> > wired up with fur_*.
> Well, yes.  Hmm, you want to set cache values based on the VN lattice
> as well.  Yes.  OK, let me do a bit of cache explanation since I
> haven't done that yet.  It does not need a fur_source of any kind, and
> I'll explain why.
>
> The cache has 2 primary functions:
>   1) maintain the global definition table (used to decide if a name has
>      been processed).  This is local and not the one the rest of GCC
>      uses.  And
>   2) maintain the range-on-entry cache and resolve queries to that
>      efficiently.
>
> The cache does not actually create any NEW information.  This is one of
> its key features in preventing any kind of cascading cyclic updates.
> All it does is propagate existing information from the definition
> table, with values extracted from the global value table.  So your
> example is not good for this, as there isn't much in the cache for it,
> so let's tweak it and add another block.  Example:
>
>     n_2 = 1
>     i_4 = 0
>     val_5 = 0
>
>   <bb 3>:
>     # i_1 = PHI <i_4, i_7>
>     # val_2 = PHI <val_5, val_6>
>     val_6 = val_2 + 1;
>     i_7 = i_1 + 1
>     if (i_7 > 22)
>       goto <bb ...>
>     else
>       goto <bb 7>
>
>     if (i_7 < n_3)
>       goto <bb 3>;
>     else
>       goto <bb ...>;
>
>     _8 = val_6
>     return _8
>
> For the sake of simplicity, let's also assume bb2 and bb3 have been
> looked at and all the ssa-names defined in those blocks have an entry
> in ranger's definition table.
>
> Moving to <bb 7>, if we ask for the range of "if (i_7 < n_3)" to be
> evaluated, it checks that i_7 and n_3 have been evaluated before it
> proceeds.  Both have entries, which means the next task is to get their
> values at this location.  range_of_expr is called on each one, and as
> they are not defined in this block, ranger asks the cache for the value
> of i_7 on entry to bb7.  (Likewise, when it gets an answer back, it
> will do so for n_3 as well.)
>
> The cache walks back the dominators until it finds either:
>   a) the block with the definition of i_7, or
>   b) a block which has an on-entry cache value for i_7 already set.
> During its walk, it tags any block which has i_7 in the export list,
> meaning an outgoing edge from that block may change the value of i_7.
>
> There are additional complexities, but the fundamental operation is to
> now take the value it saw from a) or b) as the starting value, and
> supply that to GORI at every intervening outgoing edge i_7 was exported
> from.  Whenever the value changes along the way, we write a cache
> update at the end of the edge to facilitate future queries.  At the
> end, the answer has been calculated and is stored as the on-entry value
> for this block.
>
> So returning to the example, assume i_7 was set to VARYING in bb3; GORI
> would apply !(i_7 > 22) to the value, and we would end up in <bb 7>
> with a range-on-entry of [0, 21], and it would be stored in bb7.
>
> In your example, if you have disabled that back edge, you would have a
> value of [1,1] for i_7.  GORI would not have changed that value since
> it's already < 22, and we would store [1,1] as the range-on-entry to
> <bb 7>.
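
To make that walk concrete, here is a minimal self-contained sketch of
the scheme as described above.  This is plain C++ and not ranger's
gimple-range-cache code; block, value_range, gori_on_edge and
range_on_entry are simplified stand-ins invented for illustration only.

  #include <functional>
  #include <map>
  #include <vector>

  struct value_range { long lo, hi; };  // toy stand-in for an integer range

  struct block
  {
    int id;
    block *idom;          // immediate dominator
    bool defines_name;    // the queried ssa-name is defined in this block
    bool exports_name;    // an outgoing edge here may refine the name
  };

  // Stand-in for GORI: refine VAL on the walk from PRED down to SUCC.
  using gori_on_edge
    = std::function<value_range (const block *pred, const block *succ,
                                 value_range val)>;

  // Compute (and cache) the range of one name on entry to BB.
  value_range
  range_on_entry (const block *bb, value_range def_value,
                  std::map<int, value_range> &on_entry_cache,
                  const gori_on_edge &adjust)
  {
    // 1) Walk up the dominators until we hit the definition block or a
    //    block whose range-on-entry is already cached.
    std::vector<const block *> path;
    const block *b = bb;
    while (!b->defines_name && !on_entry_cache.count (b->id))
      {
        path.push_back (b);
        b = b->idom;
      }

    // Starting value: the cached on-entry range, or the def's value.
    value_range val = on_entry_cache.count (b->id)
                      ? on_entry_cache[b->id] : def_value;

    // 2) Walk back down, letting the GORI stand-in adjust the value at
    //    every block that exports the name, and caching the resulting
    //    range-on-entry of each block along the way.
    for (auto it = path.rbegin (); it != path.rend (); ++it)
      {
        if (b->exports_name)
          val = adjust (b, *it, val);
        on_entry_cache[(*it)->id] = val;
        b = *it;
      }
    return val;
  }

In the example above, the walk from bb7 would stop at bb3 (the
definition of i_7), apply the !(i_7 > 22) refinement on the way back
down, and record the result as bb7's range-on-entry.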
> Likewise, we do something similar for n_3.  The point is, the cache has
> not gone and created any new information.  Its *only* purpose is to
> propagate known values thru the CFG, adjusting them for any outgoing
> edges that are encountered.  It uses a temporal marking in an attempt
> to identify when a global value has been changed, meaning it may need
> to go and repopulate something, but the bottom line is it never
> contains anything beyond "reductions" in the ranges of values in the
> global table.  And it only ever works on one name at a time.
>
> The bottom line: Ranger effectively only ever changes values via the
> global table, and the cache simply propagates those values around,
> adjusting them with GORI as appropriate.

Hmm, but when it reaches the definition of _7 then it ends up calling
range_of_stmt on it, which then recursively processes operands, and the
result is stored into the "global table"?  For the definition of _7
fur_* isn't asked for the operands, so valueization would not happen.
path-ranger seems to override the def walking (range_of_expr) and thus
could transparently valueize.

OK, so you say there are two things we need to invalidate - first the
range-on-entry cache and second the "global table" entries?

Note that value-numbering doesn't need to undo things in its value table
when iterating, but it of course updates values when re-visiting
definitions.  Since with ranger we'd query definitions upthread
on-demand, that scheme likely would not work unless we "range-interpret"
each stmt when value numbering them.  That _might_ be the cheapest
option in the end if all the undoing is too expensive - basically
force-push a new range each time we visit a stmt.

For range-on-entry/exit (aka range-on-edge) we probably would simply
clear the range-on-entry cache when (re-)entering a block?  And we'd
like to just use the SSA range storage for defs outside of the region.

Richard.

> So there are multiple approaches.  We could simply kill the global
> table and cache line for any ssa_name we want to change the value of.
> That gets a little trickier for on-entry values of secondary effects
> (ie, those used in the calculation of the primary names).  It would
> probably work, but something unforeseen could show up.
>
> More advanced would be to "layer" the cache.  ie, we use the cache, and
> at some point, you issue a "push".  The push creates a new cache, and
> all queries look first to the new cache, and if it can't be answered,
> look "down" thru to the previous caches.  This resolves all queries as
> if the cache layers are all "one" thing.  Sets would always go to the
> latest layer.  When we "pop", we delete the latest layer; the layers
> underneath should all reflect the exact state at the time of the push.
> All in theory of course :-)  There would be some expense to it, but
> possibly not as much as one thinks since most components defer
> allocations and such until something is actually set.  It seems like it
> might not be a hard experiment that I will try once I get thru the bits
> I'm on now.
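
For what it's worth, the push/pop layering described above could look
roughly like the following.  This is a self-contained sketch only, not
ranger's cache classes; layered_cache and its Key/Value parameters are
placeholders for something like an (ssa-name, block) -> range mapping.

  #include <map>
  #include <optional>
  #include <vector>

  template <typename Key, typename Value>
  class layered_cache
  {
    std::vector<std::map<Key, Value>> m_layers;  // back() is the newest layer

  public:
    layered_cache () : m_layers (1) {}           // base layer always exists

    void push () { m_layers.emplace_back (); }   // start a new layer

    void pop ()                                  // discard the newest layer
    {
      if (m_layers.size () > 1)
        m_layers.pop_back ();
    }

    // Sets always land in the newest layer, shadowing older entries.
    void set (const Key &k, const Value &v) { m_layers.back ()[k] = v; }

    // Lookups scan from newest to oldest, so all layers behave as one cache.
    std::optional<Value> get (const Key &k) const
    {
      for (auto it = m_layers.rbegin (); it != m_layers.rend (); ++it)
        {
          auto f = it->find (k);
          if (f != it->end ())
            return f->second;
        }
      return std::nullopt;
    }
  };

A push() at the start of a region and a pop() when leaving it would
discard everything set while inside, leaving the layers underneath
exactly as they were at push time, which matches the semantics sketched
above.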
> >> That would have the drawback of losing any information that had been
> >> calculated earlier, and possibly trigger some additional calculations
> >> as that value will now be stale elsewhere in the IL.  We could make
> >> it more efficient if you also provided a bitmap over the basic blocks
> >> in the region (or some way of determining that).  Then we would just
> >> kill the entries in those blocks for those ssa-names.
> >>
> >> As long as you weren't making any ranger queries from outside the
> >> region during this time, that might work.
> >>
> >> Before I go any further, are we on the same page, or are you looking
> >> for something different than this?
> >>
> >> You had mentioned the path_ranger model also, which effectively uses
> >> ranger for the values on entry to the path/region, and then uses
> >> overridden API entry points to choose whether to use the local cache
> >> vector, or go to ranger's on-entry values.
> >>
> >> This approach could also work, and one way would be to implement the
> >> mark/reset API.  The new class would maybe instantiate an additional
> >> local cache from which values are set/picked up first for any queries
> >> within the region blocks and, when you are done, throw away the cache
> >> for the region.  This would require some tweaking of ranger's primary
> >> cache lookup/set routine, but perhaps there is some benefit to
> >> generalizing this model and integrating it, ie, a region_ranger which
> >> builds this facility on top and allows you to reset regions.
> >> Presumably you want to be able to "stack" these marks so you can push
> >> and pop regions?  Would also have to figure out how to work the
> >> relational oracle with it.
> > Yeah, so my thinking was that I'd keep my own range cache for
> > in-region, like path_ranger does, and track changes similarly to how I
> > do for VN so I can undo things efficiently.  That way I also get to
> > decide how far to do the caching use-def walks.
> >
> > Of course the disadvantage is duplicating quite some ranger
> > functionality that's already there without too many changes besides my
> > own cache, plus ensuring use of the VN-computed values and edge
> > executability.  Maybe it isn't too much work though.
> Let me give the layered cache a shot.  I don't think that will be much
> work (ha, famous last words), and we can see if it works, and how
> expensive it is.  There might be alternatives along this line,
> especially if you are only querying a few things here and there.  The
> cache is designed to be able to plug in alternative mechanisms on a
> name-by-name basis (there are already a couple of variations depending
> on CFG size and name density usage), so it might be that we can do
> something really cheap for these kinds of pushed layers.
>
> I'm still mulling around options.
>
> >> I can think of a couple of possible approaches for that, some more
> >> efficient than others.  I suppose the path_ranger could even be
> >> completely replaced with that model.  Huh, it would just call mark()
> >> at the start of a path, and reset() when it's done with the path.
> >> Might not be as efficient however.  Be a good proof of correctness
> >> test I guess at least.
> >>
> >> Anyway, am I in the ballpark somewhere of what you are looking for?
> >> Or am I over-engineering something? :-)
> > No, you basically identified the problems.
> >
> > Note that when not iterating the proposed change should already work
> > optimally, but it's of course bad to be "better" in the weaker
> > setting ...
> >
> > The second thing would be to look at how to use conditional
> > equivalences.  What's required to make ranger aware of them, and
> > what's required to query them, since for them the actual
> > simplification would happen in the folding done by value-numbering,
> > like with
> >
> >   if (a == b)
> >     c = a - b;
> >
> > which EVRP gets simplified, for example, but VN fails at this.
> > There are proposed enhancements to VN that might catch this as well,
> > but I'm not too happy about the implementation.  Mainly because
> > equivalences are a bitch.
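
To illustrate the kind of simplification under discussion, before the
mechanics described below: a self-contained toy model, not GCC's
range-ops API.  relation_kind, int_range and fold_minus here are made-up
stand-ins, and the ~[0,0] anti-range produced for != is omitted because
a plain lo/hi pair cannot express it.

  #include <cassert>

  enum relation_kind { VREL_VARYING, VREL_EQ };

  struct int_range { long lo, hi; };            // toy integer range

  // Fold op1 - op2 given a known relation between op1 and op2.
  int_range
  fold_minus (const int_range &op1, const int_range &op2, relation_kind rel)
  {
    if (rel == VREL_EQ)
      return { 0, 0 };                          // a == b  =>  a - b == 0

    // Otherwise fall back to ordinary interval arithmetic for subtraction.
    return { op1.lo - op2.hi, op1.hi - op2.lo };
  }

  int
  main ()
  {
    int_range a = { 0, 100 }, b = { 0, 100 };

    // Without the relation, the best interval arithmetic can do.
    int_range plain = fold_minus (a, b, VREL_VARYING);
    assert (plain.lo == -100 && plain.hi == 100);

    // On the edge where "a == b" was registered, the subtraction folds.
    int_range folded = fold_minus (a, b, VREL_EQ);
    assert (folded.lo == 0 && folded.hi == 0);
    return 0;
  }

With the relation registered, the same operand ranges fold to [0,0];
without it the result is only [-100, 100].  That mirrors what the
relation oracle and operator_minus's op1_op2_relation_effect do in the
description that follows.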
> The relational oracle currently snags this one by putting a and b in an
> equivalence set on that outgoing edge.  The fold_stmt mechanism you are
> using automatically queries the relation oracle (if the fur_source
> supplies one) for a relation between the 2 use operands of a statement,
> and passes that to the range-ops fold routine.  So it will ultimately
> invoke operator_minus with
>
>   fold_range (irange, int, range_of_a, range_of_b, VREL_EQ)
>
> Range-ops for operator minus defines the routine
> "op1_op2_relation_effect()" (which is automatically applied to any
> value calculated by fold_range() after the fact).  That routine has:
>
>   // == and != produce [0,0] and ~[0,0] regardless of wrapping.
>   if (rel == VREL_EQ)
>     rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
>   else if (rel == VREL_NE)
>     rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
>                               VR_ANTI_RANGE);
>
> So as long as the relation was registered, fold_stmt will automatically
> get a range of [0,0] for c, all through range-ops for operator_minus.
>
> If the fur_source used when fold_stmt is called supplies the routine
> "register_relation" (which I think only a fur_depend currently does),
> then relations are also automatically registered with the oracle
> supplied by the fur_depend.
>
> There's a lot of this that can happen auto-magically.  I just need to
> make sure we can reset some of the underlying cache entries (and the
> oracle as well; it might need to be layered so that we don't pick up
> invalid relations.  It should also be relatively simple to adjust, I
> think).
>
> Andrew
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)