Subject: Re: [PATCH][RFH] Wire ranger into FRE
From: Andrew MacLeod
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org, aldyh@redhat.com
Date: Wed, 21 Sep 2022 14:52:41 -0400
References: <20220919111916.192EE13A96@imap2.suse-dmz.suse.de> <6b618bf2-0c12-ea69-faf0-d701b3f4f2c5@redhat.com>

On 9/21/22 06:13, Richard Biener wrote:
> On Mon, 19 Sep 2022, Andrew MacLeod wrote:
>
>> It looks like you created a fur_source to manually adjust PHIs within the
>> fold_stmt query to ignore edges that are not marked executable.
> Yes, and use the current values from the VN lattice when looking at
> statement operands.

Yes, that is exactly how it's intended to be used.

>
>> That would then just leave you with the stale cache state to deal with?
>> And if we can resolve that, would it all just work?  At least in theory?
> In theory, yes.  Besides that, the use-def walking of the cache is not
> wired up with fur_*.

Well, yes.  Hmm, you want to set cache values based on the VN lattice as
well.  Yes.

OK, let me do a bit of cache explanation, since I haven't done that yet.  It
does not need a fur_source of any kind, and I'll explain why.

The cache has two primary functions:

  1) maintain the global definition table (used to decide whether a name has
     been processed).  This table is local to ranger, not the one the rest of
     GCC uses.

  2) maintain the range-on-entry cache and resolve queries to it efficiently.

The cache does not actually create any NEW information.  This is one of its
key features in preventing any kind of cascading cyclic updates.  All it does
is propagate existing information from the definition table, with values
extracted from the global value table.  So your example is not good for this,
as there isn't much in the cache for it, so let's tweak it and add another
block.  Example:

  <bb 2> :
    n_2 = 1
    i_4 = 0
    val_5 = 0

  <bb 3> :
    # i_1 = PHI <i_4(2), i_7(7)>
    # val_2 = PHI <val_5(2), val_6(7)>
    val_6 = val_2 + 1;
    i_7 = i_1 + 1
    if (i_7 > 22)
      goto <bb ..>
    else
      goto <bb 7>

  <bb 7> :
    if (i_7 < n_3)
      goto <bb 3>;
    else
      goto <bb ..>;

  <bb ..> :
    _8 = val_6
    return _8

For the sake of simplicity, let's also assume bb2 and bb3 have been looked at
and all the ssa-names defined in those blocks have an entry in ranger's
definition table.

Moving to bb7, if we ask for the range of "if (i_7 < n_3)" to be evaluated,
it checks that i_7 and n_3 have been evaluated before it proceeds.  Both have
entries, which means the next task is to get their values at this location.
range_of_expr is called on each one, and as they are not defined in this
block, ranger asks the cache for the value of i_7 on entry to bb7.
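(From the outside, by the way, this is just the usual range query; roughly,
and hand-waving the exact signatures:

    gimple_ranger ranger;
    int_range_max r;
    // "What is i_7 at the branch in bb7?"  Internally this is the
    // definition-table check plus the range-on-entry cache walk
    // described here.
    ranger.range_of_expr (r, i_7, cond_stmt);

where i_7 stands for the SSA name and cond_stmt for the "if (i_7 < n_3)"
statement.)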
(Likewise, when it gets an answer back, it will do the same for n_3.)

The cache walks back the dominators until it finds either:

  a) the block with the definition of i_7, or
  b) a block which already has an on-entry cache value set for i_7.

During its walk, it tags any block which has i_7 in the export list, meaning
an outgoing edge from that block may change the value of i_7.

There are additional complexities, but the fundamental operation is to now
take the value it saw at a) or b) as the starting value, and supply that to
GORI at every intervening outgoing edge i_7 was exported from.  Whenever the
value changes along the way, we write a cache update at the end of the edge
to facilitate future queries.  At the end, the answer has been calculated and
is stored as the on-entry value for this block.

So returning to the example: assume i_7 was set to VARYING in bb3.  GORI
would apply !(i_7 > 22) to that value, and we would end up with a
range-on-entry of [0, 21], which would be stored in bb7.  In your example, if
you have disabled that back edge, you would have a value of [1, 1] for i_7.
GORI would not have changed that value, since it is already < 22, and we
would store [1, 1] as the range-on-entry to bb7.  Likewise, we do something
similar for n_3.

The point is, the cache has not gone and created any new information.  Its
*only* purpose is to propagate known values thru the CFG, adjusting them for
any outgoing edges that are encountered.  It uses a temporal marking in an
attempt to identify when a global value has been changed, meaning it may need
to go and repopulate something, but the bottom line is that it never contains
anything beyond "reductions" of the ranges of values in the global table.
And it only ever works on one name at a time.

The bottom line: ranger effectively only ever changes values via the global
table, and the cache simply propagates those values around, adjusting them
with GORI as appropriate.

So there are multiple approaches.  We could simply kill the global table and
cache line for any ssa_name we want to change the value of.  That gets a
little trickier for on-entry values of secondary effects (ie, those used in
the calculation of the primary names).  It would probably work, but something
unforeseen could show up.

More advanced would be to "layer" the cache.  ie, we use the cache, and at
some point you issue a "push".  The push creates a new cache, and all queries
look first to the new cache and, if they can't be answered there, look "down"
thru the previous caches.  This resolves all queries as if the cache layers
were all "one" thing.  Sets would always go to the latest layer.  When we
"pop", we delete the latest layer; the layers underneath should all reflect
the exact state at the time of the push.  All in theory of course :-)

There would be some expense to it, but possibly not as much as one thinks,
since most components defer allocations and such until something is actually
set.  It seems like it might not be a hard experiment; I will try it once I
get thru the bits I'm on now.

>
>> That would have the drawback of losing any information that had been
>> calculated earlier, and possibly trigger some additional calculations as
>> that value will now be stale elsewhere in the IL.  We could make it more
>> efficient if you also provided a bitmap over the basic blocks in the
>> region (or some way of determining that).  Then we would just kill the
>> entries in those blocks for those ssa-names.
>>
>> As long as you weren't making any ranger queries from outside the region
>> during this time, that might work.
>>
>> Before I go any further, are we on the same page, or are you looking for
>> something different than this?
>>
>> You had also mentioned the path_ranger model, which effectively uses
>> ranger for the values on entry to the path/region, and then uses
>> overridden API entry points to choose whether to use the local cache
>> vector or go to ranger's on-entry values.
>>
>> This approach could also work, and one way would be to implement the
>> mark/reset API.  The new class would maybe instantiate an additional local
>> cache from which values are set/picked up first for any queries within the
>> region blocks, and, when you are done, throw away the cache for the
>> region.  This would require some tweaking of ranger's primary cache
>> lookup/set routine, but perhaps there is some benefit to generalizing this
>> model and integrating it, ie, a region_ranger which builds this facility
>> on top and allows you to reset regions.  Presumably you want to be able to
>> "stack" these marks so you can push and pop regions?  We would also have
>> to figure out how to work the relational oracle with it.
> Yeah, so my thinking was that I'd keep my own range cache for in-region,
> like path_ranger does, and track changes similarly to how I do for VN
> so I can undo things efficiently.  That way I also get to decide
> how far to do the caching use-def walks.
>
> Of course the disadvantage is duplicating quite some ranger functionality
> that's already there without too many changes, besides my own cache plus
> ensuring we use the VN's computed values and edge executability.  Maybe
> it isn't too much work though.

Let me give the layered cache a shot.  I don't think that will be much work
(ha, famous last words), and we can see if it works and how expensive it is.
There might be alternatives along this line, especially if you are only
querying a few things here and there.  The cache is designed to be able to
plug in alternative mechanisms on a name-by-name basis (there are already a
couple of variations depending on CFG size and name density usage), so it
might be that we can do something really cheap for these kinds of pushed
layers.  I'm still mulling over options.

>> I can think of a couple of possible approaches for that, some more
>> efficient than others.  I suppose the path_ranger could even be completely
>> replaced with that model.  Huh, it would just call mark() at the start of
>> a path, and reset() when it's done with the path.  Might not be as
>> efficient, however.  It would be a good proof-of-correctness test, at
>> least.
>>
>> Anyway, am I in the ballpark somewhere of what you are looking for?  Or am
>> I over-engineering something? :-)
> No, you basically identified the problems.
>
> Note that when not iterating, the proposed change should already work
> optimally, but it's of course bad to be "better" in the weaker setting ...
>
> The second thing would be to look at how to use conditional equivalences.
> What's required to make ranger aware of them, and what's required to
> query them, since for them the actual simplification would happen in
> the folding done by value-numbering, as with
>
>   if (a == b)
>     c = a - b;
>
> which EVRP simplifies, for example, but VN fails at.
> There are proposed enhancements to VN that might catch this as well,
> but I'm not too happy about the implementation.  Mainly because
> equivalences are a bitch.
The relational oracle currently snags this one by putting a and b in an
equivalence set on that outgoing edge.  The fold_stmt mechanism you are using
automatically queries the relation oracle (if the fur_source supplies one)
for a relation between the two use operands of a statement, and passes that
to the range-ops fold routine.  So it will ultimately invoke operator_minus
with

  fold_range (irange, int, range_of_a, range_of_b, VREL_EQ)

Range-ops for operator minus defines the routine "op1_op2_relation_effect()",
which is automatically applied to any value calculated by fold_range() after
the fact.  That routine has:

  // == and != produce [0,0] and ~[0,0] regardless of wrapping.
  if (rel == VREL_EQ)
    rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
  else if (rel == VREL_NE)
    rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
                              VR_ANTI_RANGE);

So as long as the relation was registered, fold_stmt will automatically get a
range of [0,0] for c, all through range-ops for operator_minus.

If the fur_source used when fold_stmt is called supplies the routine
"register_relation" (which I think only a fur_depend currently does), then
relations are also automatically registered with the oracle supplied by the
fur_depend.

There's a lot of this that can happen auto-magically.  I just need to make
sure we can reset some of the underlying cache entries (and the oracle as
well; it might need to be layered so that we don't pick up invalid relations.
It should also be relatively simple to adjust, I think).

Andrew
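P.S.  To make that flow concrete for your "if (a == b)  c = a - b" case, it
ends up looking roughly like this; I'm hand-waving the exact signatures, so
take it as a sketch rather than the literal code path:

  // Processing the condition records the equivalence on the true edge,
  // via the oracle supplied by the fur_depend:
  //     src.register_relation (true_edge, VREL_EQ, a, b);
  //
  // Folding "c = a - b" with that same fur_source then asks the oracle for
  // the relation between the two operands, sees VREL_EQ, and
  // operator_minus' op1_op2_relation_effect intersects the folded value
  // with [0,0]:
  //     fold_stmt (r, minus_stmt, src);   // r comes back [0,0]
  //
  // regardless of what the individual ranges of a and b were.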