Subject: Re: [PATCH][RFH] Wire ranger into FRE
From: Andrew MacLeod
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org, aldyh@redhat.com
Date: Wed, 21 Sep 2022 14:52:41 -0400
References: <20220919111916.192EE13A96@imap2.suse-dmz.suse.de> <6b618bf2-0c12-ea69-faf0-d701b3f4f2c5@redhat.com>

On 9/21/22 06:13, Richard Biener wrote:
> On Mon, 19 Sep 2022, Andrew MacLeod wrote:
>
>> It looks like you created a fur_source to manually adjust PHIs within the
>> fold_stmt query to ignore edges that are not marked executable.
> Yes, and use the current values from the VN lattice when looking at
> statement operands.

Yes, that is exactly how it's intended to be used.

>
>> That would then just leave you with the stale cache state to deal with?
>> And if we can resolve that, would it all just work?  At least in theory?
> In theory, yes.  Besides that, the use-def walking of the cache is not
> wired up with fur_*.

Well, yes.  Hmm, you want to set cache values based on the VN lattice as
well.  Yes.

OK, let me do a bit of cache explanation, since I haven't done that yet.  It
does not need a fur_source of any kind, and I'll explain why.

The cache has two primary functions:

  1) maintain the global definition table (used to decide whether a name has
     been processed).  This table is local to ranger, not the one the rest of
     GCC uses.

  2) maintain the range-on-entry cache and resolve queries to it efficiently.

The cache does not actually create any NEW information.  This is one of its
key features in preventing any kind of cascading cyclic updates.  All it does
is propagate existing information from the definition table, with values
extracted from the global value table.  So your example is not good for this,
as there isn't much in the cache for it, so let's tweak it and add another
block.  Example:

  <bb 2> :
    n_2 = 1
    i_4 = 0
    val_5 = 0

  <bb 3> :
    # i_1 = PHI <i_4(2), i_7(7)>
    # val_2 = PHI <val_5(2), val_6(7)>
    val_6 = val_2 + 1;
    i_7 = i_1 + 1
    if (i_7 > 22)
      goto <bb ..>
    else
      goto <bb 7>

  <bb 7> :
    if (i_7 < n_3)
      goto <bb 3>;
    else
      goto <bb ..>;

  <bb ..> :
    _8 = val_6
    return _8

For the sake of simplicity, let's also assume bb2 and bb3 have been looked at
and all the ssa-names defined in those blocks have an entry in ranger's
definition table.

Moving to bb7, if we ask for the range of "if (i_7 < n_3)" to be evaluated,
it checks that i_7 and n_3 have been evaluated before it proceeds.  Both have
entries, which means the next task is to get their values at this location.
range_of_expr is called on each one, and as they are not defined in this
block, ranger asks the cache for the value of i_7 on entry to bb7.
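(From the outside, by the way, this is just the usual range query; roughly,
and hand-waving the exact signatures:

    gimple_ranger ranger;
    int_range_max r;
    // "What is i_7 at the branch in bb7?"  Internally this is the
    // definition-table check plus the range-on-entry cache walk
    // described here.
    ranger.range_of_expr (r, i_7, cond_stmt);

where i_7 stands for the SSA name and cond_stmt for the "if (i_7 < n_3)"
statement.)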
(Likewise, when it gets an answer back, it will do the same for n_3.)

The cache walks back the dominators until it finds either:

  a) the block with the definition of i_7, or
  b) a block which already has an on-entry cache value set for i_7.

During its walk, it tags any block which has i_7 in the export list, meaning
an outgoing edge from that block may change the value of i_7.

There are additional complexities, but the fundamental operation is to now
take the value it saw at a) or b) as the starting value, and supply that to
GORI at every intervening outgoing edge i_7 was exported from.  Whenever the
value changes along the way, we write a cache update at the end of the edge
to facilitate future queries.  At the end, the answer has been calculated and
is stored as the on-entry value for this block.

So returning to the example: assume i_7 was set to VARYING in bb3.  GORI
would apply !(i_7 > 22) to that value, and we would end up with a
range-on-entry of [0, 21], which would be stored in bb7.  In your example, if
you have disabled that back edge, you would have a value of [1, 1] for i_7.
GORI would not have changed that value, since it is already < 22, and we
would store [1, 1] as the range-on-entry to bb7.  Likewise, we do something
similar for n_3.

The point is, the cache has not gone and created any new information.  Its
*only* purpose is to propagate known values thru the CFG, adjusting them for
any outgoing edges that are encountered.  It uses a temporal marking in an
attempt to identify when a global value has been changed, meaning it may need
to go and repopulate something, but the bottom line is that it never contains
anything beyond "reductions" of the ranges of values in the global table.
And it only ever works on one name at a time.

The bottom line: ranger effectively only ever changes values via the global
table, and the cache simply propagates those values around, adjusting them
with GORI as appropriate.

So there are multiple approaches.  We could simply kill the global table and
cache line for any ssa_name we want to change the value of.  That gets a
little trickier for on-entry values of secondary effects (ie, those used in
the calculation of the primary names).  It would probably work, but something
unforeseen could show up.

More advanced would be to "layer" the cache.  ie, we use the cache, and at
some point you issue a "push".  The push creates a new cache, and all queries
look first to the new cache and, if they can't be answered there, look "down"
thru the previous caches.  This resolves all queries as if the cache layers
were all "one" thing.  Sets would always go to the latest layer.  When we
"pop", we delete the latest layer; the layers underneath should all reflect
the exact state at the time of the push.  All in theory of course :-)

There would be some expense to it, but possibly not as much as one thinks,
since most components defer allocations and such until something is actually
set.  It seems like it might not be a hard experiment; I will try it once I
get thru the bits I'm on now.

>
>> That would have the drawback of losing any information that had been
>> calculated earlier, and possibly trigger some additional calculations as
>> that value will now be stale elsewhere in the IL.  We could make it more
>> efficient if you also provided a bitmap over the basic blocks in the
>> region (or some way of determining that).  Then we would just kill the
>> entries in those blocks for those ssa-names.
>>
>> As long as you weren't making any ranger queries from outside the region
>> during this time, that might work.
>>
>> Before I go any further, are we on the same page, or are you looking for
>> something different than this?
>>
>> You had also mentioned the path_ranger model, which effectively uses
>> ranger for the values on entry to the path/region, and then uses
>> overridden API entry points to choose whether to use the local cache
>> vector or go to ranger's on-entry values.
>>
>> This approach could also work, and one way would be to implement the
>> mark/reset API.  The new class would maybe instantiate an additional local
>> cache from which values are set/picked up first for any queries within the
>> region blocks, and, when you are done, throw away the cache for the
>> region.  This would require some tweaking of ranger's primary cache
>> lookup/set routine, but perhaps there is some benefit to generalizing this
>> model and integrating it, ie, a region_ranger which builds this facility
>> on top and allows you to reset regions.  Presumably you want to be able to
>> "stack" these marks so you can push and pop regions?  We would also have
>> to figure out how to work the relational oracle with it.
> Yeah, so my thinking was that I'd keep my own range cache for in-region,
> like path_ranger does, and track changes similarly to how I do for VN
> so I can undo things efficiently.  That way I also get to decide
> how far to do the caching use-def walks.
>
> Of course the disadvantage is duplicating quite some ranger functionality
> that's already there without too many changes, besides my own cache plus
> ensuring we use the VN's computed values and edge executability.  Maybe
> it isn't too much work though.

Let me give the layered cache a shot.  I don't think that will be much work
(ha, famous last words), and we can see if it works and how expensive it is.
There might be alternatives along this line, especially if you are only
querying a few things here and there.  The cache is designed to be able to
plug in alternative mechanisms on a name-by-name basis (there are already a
couple of variations depending on CFG size and name density usage), so it
might be that we can do something really cheap for these kinds of pushed
layers.  I'm still mulling over options.

>> I can think of a couple of possible approaches for that, some more
>> efficient than others.  I suppose the path_ranger could even be completely
>> replaced with that model.  Huh, it would just call mark() at the start of
>> a path, and reset() when it's done with the path.  Might not be as
>> efficient, however.  It would be a good proof-of-correctness test, at
>> least.
>>
>> Anyway, am I in the ballpark somewhere of what you are looking for?  Or am
>> I over-engineering something? :-)
> No, you basically identified the problems.
>
> Note that when not iterating, the proposed change should already work
> optimally, but it's of course bad to be "better" in the weaker setting ...
>
> The second thing would be to look at how to use conditional equivalences.
> What's required to make ranger aware of them, and what's required to
> query them, since for them the actual simplification would happen in
> the folding done by value-numbering, as with
>
>   if (a == b)
>     c = a - b;
>
> which EVRP simplifies, for example, but VN fails at.
> There are proposed enhancements to VN that might catch this as well,
> but I'm not too happy about the implementation.  Mainly because
> equivalences are a bitch.
The relational oracle currently snags this one by putting a and b in an
equivalence set on that outgoing edge.  The fold_stmt mechanism you are using
automatically queries the relation oracle (if the fur_source supplies one)
for a relation between the two use operands of a statement, and passes that
to the range-ops fold routine.  So it will ultimately invoke operator_minus
with

  fold_range (irange, int, range_of_a, range_of_b, VREL_EQ)

Range-ops for operator minus defines the routine "op1_op2_relation_effect()",
which is automatically applied to any value calculated by fold_range() after
the fact.  That routine has:

  // == and != produce [0,0] and ~[0,0] regardless of wrapping.
  if (rel == VREL_EQ)
    rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec));
  else if (rel == VREL_NE)
    rel_range = int_range<2> (type, wi::zero (prec), wi::zero (prec),
                              VR_ANTI_RANGE);

So as long as the relation was registered, fold_stmt will automatically get a
range of [0,0] for c, all through range-ops for operator_minus.

If the fur_source used when fold_stmt is called supplies the routine
"register_relation" (which I think only a fur_depend currently does), then
relations are also automatically registered with the oracle supplied by the
fur_depend.

There's a lot of this that can happen auto-magically.  I just need to make
sure we can reset some of the underlying cache entries (and the oracle as
well; it might need to be layered so that we don't pick up invalid relations.
It should also be relatively simple to adjust, I think).

Andrew
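P.S.  To make that flow concrete for your "if (a == b)  c = a - b" case, it
ends up looking roughly like this; I'm hand-waving the exact signatures, so
take it as a sketch rather than the literal code path:

  // Processing the condition records the equivalence on the true edge,
  // via the oracle supplied by the fur_depend:
  //     src.register_relation (true_edge, VREL_EQ, a, b);
  //
  // Folding "c = a - b" with that same fur_source then asks the oracle for
  // the relation between the two operands, sees VREL_EQ, and
  // operator_minus' op1_op2_relation_effect intersects the folded value
  // with [0,0]:
  //     fold_stmt (r, minus_stmt, src);   // r comes back [0,0]
  //
  // regardless of what the individual ranges of a and b were.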