From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <6b618bf2-0c12-ea69-faf0-d701b3f4f2c5@redhat.com>
Date: Mon, 19 Sep 2022 11:25:45 -0400
Subject: Re: [PATCH][RFH] Wire ranger into FRE
From: Andrew MacLeod
To: Richard Biener, gcc-patches@gcc.gnu.org
Cc: aldyh@redhat.com
References: <20220919111916.192EE13A96@imap2.suse-dmz.suse.de>
In-Reply-To: <20220919111916.192EE13A96@imap2.suse-dmz.suse.de>
Yeah, currently the internal cache isn't really wired into fold_using_range, as it is supposed to be a consistent internal state, so it isn't currently expected to work in a situation where what it thinks is global state might change.

I figured I had better go back and watch your entire VN presentation; I only caught the last bit live. Let me try to understand exactly what you want/need, and let me use a simple example from that talk (yes, I just typed it in :-).

   n_2 = 1
   i_4 = 0
   val_5 = 0

 <bb 3>:
   # i_1 = PHI <i_4(2), i_7(4)>
   # val_2 = PHI <val_5(2), val_6(4)>
   val_6 = val_2 + 1;
   i_7 = i_1 + 1
   if (i_7 < n_3)
     goto <bb 4>;
   else
     goto <bb 5>;

 <bb 4>:
   goto <bb 3>;

 <bb 5>:
   _8 = val_6
   return _8

In an ideal world, while you are processing bb 3 on the first iteration, you want edge 4->3 to be unexecutable as far as ranger is concerned.  You walk through the block and get to the end, and in this example you'd be done, because any queries you make would have ranger seeing if (2 < 1).

For the case when we do need to iterate (say n_2 = 3), you need to go back and rewalk that block with that edge no longer marked as unexecutable?  And part of the problem would be that ranger's cache for bb 3 would still have all those values from the first pass, and everything explodes.

It looks like you created a fur_source to manually adjust PHIs within the fold_stmt query to ignore edges that are not marked executable.

So now let me start with my questions.  In theory, if you could tell it "kill the cache for bb 3", change the executable state, and re-walk the block, that would work?  This clearly gets a little more complicated as the size of the region being iterated grows, so you would want to be able to invalidate the cache for a range of blocks.  That's not quite as straightforward as it might seem, because the cache of values per basic block is indexed by ssa-name, so you can't simply say "kill bb 3's values"; you need to walk all the ssa-names that have a cache object, and if one has an entry for bb 3, kill that.
But let's leave that alone for a minute, because that is a solvable problem.

Ranger also has a globally available flag that it uses to internally flag and track the executable state of edges:

   auto_edge_flag non_executable_edge_flag;

The gori object which is instantiated with any ranger instance only uses that to determine if an edge should be processed for a value, ie:

 gori_compute::outgoing_edge_range_p (vrange &r, edge e, tree name, range_query &q)
 {
   if ((e->flags & m_not_executable_flag))
     {
       r.set_undefined ();
       return true;
       ...

So it seems to me that if you also set/cleared this flag along with EDGE_EXECUTABLE (and it means the opposite, by the way: TRUE means the edge is not_executable), then at least all your queries through ranger would be picking up the values you are looking for, including those PHIs.  That would then just leave you with the stale cache state to deal with?  And if we can resolve that, it would all just work?  At least in theory?

There is a concept of staleness in the cache which is used when we recalculate over back edges, but it's really oriented towards a model where we only ever move edges from executable to non-executable.  Your VN approach moves them in the other direction, and I have no confidence it would work properly in that mode.

It's not super efficient, but as a starting point we might be able to do something like:

  1 - When you enter a region, call a new routine called "mark state", which pushes an empty bitmap over ssa-names onto a stack.
  2 - Every time the cache creates an entry, set the bit for the ssa-name in the bitmap at the top of the stack.
  3 - When you call "reset state", simply kill all the cache entries for any ssa-name set in the bitmap, and also kill ranger's internal global value for the name.

This would in essence reset everything ranger knows about that ssa-name.
That would have the drawback of losing any information that had been calculated earlier, and possibly trigger some additional calculations, as that value will now be stale elsewhere in the IL.  We could make it more efficient if you also provided a bitmap over the basic blocks in the region (or some way of determining that); then we would just kill the entries in those blocks for those ssa-names.  As long as you weren't making any ranger queries from outside the region during this time, that might work.

Before I go any further, are we on the same page, or are you looking for something different than this?

You had mentioned the path_ranger model also, which effectively uses ranger for the values on entry to the path/region, and then uses overridden API entry points to choose whether to use the local cache vector or go to ranger's on-entry values.  This approach could also work, and one way would be to implement the mark/reset API.  The new class would maybe instantiate an additional local cache, from which values are set/picked up first for any queries within the region's blocks, and which is thrown away when you are done with the region.  This would require some tweaking of ranger's primary cache lookup/set routine, but perhaps there is some benefit to generalizing this model and integrating it; ie, a region_ranger which builds this facility on top and allows you to reset regions.  Presumably you want to be able to "stack" these marks so you can push and pop regions?  We would also have to figure out how to make the relational oracle work with it.  I can think of a couple of possible approaches for that, some more efficient than others.  I suppose the path_ranger could even be completely replaced with that model, huh.  It would just call mark() at the start of a path and reset() when it's done with the path.  Might not be as efficient, however.  It would be a good proof-of-correctness test at least, I guess.

Anyway, am I in the ballpark somewhere of what you are looking for?
Or am I over-engineering something? :-)

Andrew

On 9/19/22 07:19, Richard Biener wrote:
> The following is an attempt to utilize ranger for simplification
> of conditionals during value-numbering.  It fails the added
> testcase because it seems the fur source is only involved when
> processing the stmt to be folded but not when filling the cache
> of ranges on the operand use->def chain.
>
> When iterating that's not a problem since earlier stmts
> have operands eliminated but when not iterating or when
> elimination is disabled the IL remains unchanged.
>
> When iterating it will probably do things wrong due to the lack
> of pruning the cache for the region we unwind and when doing
> region based VN it likely picks up stale EDGE_EXECUTABLE when
> leaving the region over the entry because there's no way to stop
> "local" cache prefilling when leaving the region (and just using
> global ranges from there).
>
> Knowing path-range-query it's probably possible to clone all
> the block walking and cache filling, making a special variant
> for value-numbering but I wonder whether that's really what's
> intended?  I would probably need to implement a special
> range_of_expr and range_of_stmt for this and keep my own cache?
> Basically a region_range_query with wired in valueization?
>
> 	* tree-ssa-sccvn.cc (...): ...
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c | 15 ++++
>  gcc/tree-ssa-sccvn.cc                       | 77 ++++++++++++++++++---
>  gcc/tree-ssa-sccvn.h                        |  3 +-
>  3 files changed, 83 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
> new file mode 100644
> index 00000000000..f479a322b70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-fre1" } */
> +
> +int foo (int *p, int *q)
> +{
> +  if (*p > 0 && *q > 0)
> +    {
> +      int v = *p + *q;
> +      if (v > 0)
> +        return *q;
> +    }
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "known outgoing edge" 1 "fre1" } } */
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 74b8d8d18ef..2753200c2c2 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "fold-const-call.h"
>  #include "ipa-modref-tree.h"
>  #include "ipa-modref.h"
> +#include "gimple-range.h"
>  #include "tree-ssa-sccvn.h"
>
>  /* This algorithm is based on the SCC algorithm presented by Keith
> @@ -360,10 +361,11 @@ static vn_ssa_aux_t last_pushed_avail;
>     correct.  */
>  static vn_tables_t valid_info;
>
> +tree (*vn_valueize) (tree);
>
>  /* Valueization hook for simplify_replace_tree.  Valueize NAME if it is
>     an SSA name, otherwise just return it.  */
> -tree (*vn_valueize) (tree);
> +
>  static tree
>  vn_valueize_for_srt (tree t, void* context ATTRIBUTE_UNUSED)
>  {
> @@ -380,6 +382,36 @@ vn_valueize_for_srt (tree t, void* context ATTRIBUTE_UNUSED)
>    return res;
>  }
>
> +class vn_fur : public fur_depend
> +{
> +public:
> +  vn_fur (gimple *s, gori_compute *g, range_query *q)
> +    : fur_depend (s, g, q) {}
> +  virtual bool get_operand (vrange &, tree);
> +  virtual bool get_phi_operand (vrange &, tree, edge);
> +};
> +
> +bool
> +vn_fur::get_operand (vrange &r, tree op)
> +{
> +  return fur_depend::get_operand (r, vn_valueize (op));
> +}
> +
> +bool
> +vn_fur::get_phi_operand (vrange &r, tree op, edge e)
> +{
> +  // ??? Check region boundary, avoid walking outside of the region
> +  // somehow?  Possibly when initializing the ranger object?
> +  // or can/should we simply return 'false' when we arrive with
> +  // e being the entry edge?  What about for non-PHIs?  Can we
> +  // somehow force to only use the global range and stop caching
> +  // after that?
> +  if (e->flags & EDGE_EXECUTABLE)
> +    return fur_depend::get_phi_operand (r, vn_valueize (op), e);
> +  r.set_undefined ();
> +  return true;
> +}
> +
>
>  /* This represents the top of the VN lattice, which is the universal
>     value.  */
> @@ -7292,12 +7324,12 @@ eliminate_with_rpo_vn (bitmap inserted_exprs)
>
>  static unsigned
>  do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
> -	     bool iterate, bool eliminate, vn_lookup_kind kind);
> +	     bool iterate, bool eliminate, vn_lookup_kind kind, gimple_ranger *);
>
>  void
>  run_rpo_vn (vn_lookup_kind kind)
>  {
> -  do_rpo_vn_1 (cfun, NULL, NULL, true, false, kind);
> +  do_rpo_vn_1 (cfun, NULL, NULL, true, false, kind, NULL);
>
>    /* ??? Prune requirement of these.  */
>    constant_to_value_id = new hash_table<vn_constant_hasher> (23);
> @@ -7580,7 +7612,8 @@ insert_related_predicates_on_edge (enum tree_code code, tree *ops, edge pred_e)
>  static unsigned
>  process_bb (rpo_elim &avail, basic_block bb,
>  	    bool bb_visited, bool iterate_phis, bool iterate, bool eliminate,
> -	    bool do_region, bitmap exit_bbs, bool skip_phis)
> +	    bool do_region, bitmap exit_bbs, bool skip_phis,
> +	    gimple_ranger *ranger)
>  {
>    unsigned todo = 0;
>    edge_iterator ei;
> @@ -7766,6 +7799,20 @@ process_bb (rpo_elim &avail, basic_block bb,
>  		}
>  	    }
>  	}
> +      if (!val && ranger)
> +	{
> +	  int_range_max r;
> +	  fold_using_range f;
> +	  // ??? We get here with not valueized operands but
> +	  // the fur_source only is involved for operands of
> +	  // this very stmt, not when ranger prefills its cache
> +	  // walking use->def edges.
> +	  vn_fur src (last, &ranger->gori (), ranger);
> +	  tree tem;
> +	  if (f.fold_stmt (r, last, src)
> +	      && r.singleton_p (&tem))
> +	    val = tem;
> +	}
>        if (val)
>  	e = find_taken_edge (bb, val);
>        if (! e)
> @@ -7997,7 +8044,8 @@ do_unwind (unwind_state *to, rpo_elim &avail)
>
>  static unsigned
>  do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
> -	     bool iterate, bool eliminate, vn_lookup_kind kind)
> +	     bool iterate, bool eliminate, vn_lookup_kind kind,
> +	     gimple_ranger *ranger)
>  {
>    unsigned todo = 0;
>    default_vn_walk_kind = kind;
> @@ -8213,7 +8261,8 @@ do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
>  	  todo |= process_bb (avail, bb,
>  			      rpo_state[idx].visited != 0,
>  			      rpo_state[idx].iterate,
> -			      iterate, eliminate, do_region, exit_bbs, false);
> +			      iterate, eliminate, do_region, exit_bbs, false,
> +			      ranger);
>  	  rpo_state[idx].visited++;
>
>  	  /* Verify if changed values flow over executable outgoing backedges
> @@ -8326,7 +8375,8 @@ do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
>  	  nblk++;
>  	  todo |= process_bb (avail, bb, false, false, false, eliminate,
>  			      do_region, exit_bbs,
> -			      skip_entry_phis && bb == entry->dest);
> +			      skip_entry_phis && bb == entry->dest,
> +			      ranger);
>  	  rpo_state[idx].visited++;
>
>  	  FOR_EACH_EDGE (e, ei, bb->succs)
> @@ -8419,14 +8469,17 @@ do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
>     If ITERATE is true then treat backedges optimistically as not
>     executed and iterate.  If ELIMINATE is true then perform
>     elimination, otherwise leave that to the caller.
> -   KIND specifies the amount of work done for handling memory operations.  */
> +   KIND specifies the amount of work done for handling memory operations.
> +   When RANGER is specified use that to simplify conditionals.  */
>
>  unsigned
>  do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
> -	   bool iterate, bool eliminate, vn_lookup_kind kind)
> +	   bool iterate, bool eliminate, vn_lookup_kind kind,
> +	   gimple_ranger *ranger)
>  {
>    auto_timevar tv (TV_TREE_RPO_VN);
> -  unsigned todo = do_rpo_vn_1 (fn, entry, exit_bbs, iterate, eliminate, kind);
> +  unsigned todo = do_rpo_vn_1 (fn, entry, exit_bbs, iterate, eliminate, kind,
> +			       ranger);
>    free_rpo_vn ();
>    return todo;
>  }
> @@ -8482,7 +8535,9 @@ pass_fre::execute (function *fun)
>    if (iterate_p)
>      loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
>
> -  todo = do_rpo_vn_1 (fun, NULL, NULL, iterate_p, true, VN_WALKREWRITE);
> +  gimple_ranger ranger;
> +  todo = do_rpo_vn_1 (fun, NULL, NULL, iterate_p, true, VN_WALKREWRITE,
> +		      &ranger); //!iterate_p ? &ranger : NULL);
>    free_rpo_vn ();
>
>    if (iterate_p)
> diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
> index abcf7e666c2..b9c4ac787b4 100644
> --- a/gcc/tree-ssa-sccvn.h
> +++ b/gcc/tree-ssa-sccvn.h
> @@ -298,7 +298,8 @@ tree vn_nary_simplify (vn_nary_op_t);
>  unsigned do_rpo_vn (function *, edge, bitmap,
>  		    /* iterate */ bool = false,
>  		    /* eliminate */ bool = true,
> -		    vn_lookup_kind = VN_WALKREWRITE);
> +		    vn_lookup_kind = VN_WALKREWRITE,
> +		    class gimple_ranger * = NULL);
>
>  /* Private interface for PRE.  */
>  void run_rpo_vn (vn_lookup_kind);