public inbox for gcc-patches@gcc.gnu.org
* [PATCH] tree-optimization/101014 - Remove poor value computations.
@ 2021-06-18 21:46 Andrew MacLeod
  2021-06-19  7:51 ` Richard Biener
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew MacLeod @ 2021-06-18 21:46 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3259 bytes --]

I am pleased to say that this patch kills the poor value computations in 
the ranger's cache.

It's been a bit of a thorn, and is mostly a hack that was applied early 
on to catch some opportunities that were hard to get otherwise.

The more consistent propagation we now do, combined with other changes, 
means I can kill this wart on trunk. It even results in a 1% speedup, 
and should resolve some of the excessive compile-time issues caused by 
undesirable iteration, including PR 101014.. for good I hope :-).

I tried turning off the poor_value computations on the GCC 11 branch, 
and we may want to consider doing it there too.  In my testsuite, we 
miss a total of 3 cases out of the 4700 new ones identified by ranger.  
For stability, I'd suggest we turn off poor_value computations there as 
well.  This patch rips out all the code, but for GCC 11 I'd just change 
push_poor_value to always return false, thus never registering any 
values; less churn that way.  I'll run some tests and post that 
separately if you think we should go ahead with it.

Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

The details:

The cache is designed to propagate info without creating NEW info.  
I.e., it cannot query new ranges, cannot set global ranges, etc... only 
the higher level ranger can do that.  This is how we avoid cycles when 
iterating.  The ranger says "set this value to X here", and the cache's 
job is to propagate that info around the CFG as needed, applying static 
GORI outgoing ranges along the way.  Only the ranger can request/set 
NEW information.
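To make that contract concrete, here is a toy C++ model (not GCC code; 
toy_cache, set_by_ranger and the int-pair range are made-up stand-ins for 
ranger_cache, set_global_range and irange): only the "ranger" side may 
create new range info, while the cache side can only look up what is 
already known, falling back to VARYING on a miss instead of computing 
anything new.

```cpp
#include <cassert>
#include <climits>
#include <map>
#include <string>

// Toy stand-in for irange: a closed [lo, hi] interval over int.
struct range { int lo, hi; };

class toy_cache
{
  std::map<std::string, range> m_known;  // filled only by the "ranger"
public:
  // Only the ranger calls this: "set this value to X here".
  void set_by_ranger (const std::string &name, range r)
  {
    m_known[name] = r;
  }

  // The cache may only propagate what is already known.  A miss does
  // NOT trigger a new computation; it degrades to VARYING, which is
  // how cycles are avoided while iterating.
  range lookup (const std::string &name) const
  {
    auto it = m_known.find (name);
    if (it != m_known.end ())
      return it->second;
    return range { INT_MIN, INT_MAX };  // VARYING
  }
};
```

The point of the sketch is the asymmetry: there is no code path by which 
`lookup` can ever add an entry to `m_known`.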

There were some cases where back edges were missing key bits of info 
that hadn't been created yet.  The "poor value" approach was a stop-gap 
measure until things improved.  When the cache is trying to propagate a 
range, the GORI edge computation sometimes wants a value which is not 
available.  Under some conditions it is allowed to register this as a 
"poor value" and continue propagating.  When done, it looks at the poor 
value list and asks the ranger to "go get a new value for this".  If 
the ranger finds a better value, then this new value is propagated 
around.  So it's a bit of a cheat from the original design: the ranger 
is still the only new-info creator, but the request is sometimes 
started from the cache.  This is not desirable, and can lead to some 
inconsistencies and inefficiencies.
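A hedged sketch of the bookkeeping being removed, mirroring the shape of 
push_poor_value in the diff below (blocks and SSA names are plain ints 
here, and poor_value_list is an invented wrapper, not GCC's class): at 
most one request is queued per block, and queuing can be switched off 
entirely to stop re-entrant requests from piling up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the "poor value" request list.
struct update_record { int bb; int name; };

class poor_value_list
{
  std::vector<update_record> m_list;
  bool m_new_value_p = true;
public:
  // Allow or disallow queuing; returns the previous state so callers
  // can restore it (as range_of_expr did for debug statements).
  bool enable_new_values (bool state)
  {
    bool ret = m_new_value_p;
    m_new_value_p = state;
    return ret;
  }

  // Returns true only if the request is actually queued.
  bool push (int bb, int name)
  {
    if (!m_new_value_p)
      return false;
    // Don't push anything else for the same block; a later evaluation
    // will re-request it, preventing oscillation building depth.
    if (!m_list.empty () && m_list.back ().bb == bb)
      return false;
    m_list.push_back (update_record { bb, name });
    return true;
  }

  size_t length () const { return m_list.size (); }
};
```

Even with the dedup guard, each queued entry makes the cache call back 
into the ranger, which can queue more entries, which is exactly the 
iteration the patch eliminates.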

As one can imagine, this sometimes causes significant iteration, as in 
this testcase: the new ranger request from the cache triggers another 
poor value request, etc., etc.  Which is why it wasn't designed to work 
that way.  Anyway, long story short, I revisited the poor value code 
and discovered that with the new GORI and fold_using_range rework from 
the past month, dropping the poor value code results in *0* difference 
in any of our test cases/suites.  Furthermore, the biggest thing it 
really enabled was picking up range restrictions imposed by unvisited 
statements that were being forced to VARYING.  The new rework allows 
such statements to simply be folded using the new global_range_query, 
so even that bit of info is now easily captured.  
That's the enhancement from the second patch.
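The replacement behaviour can be sketched like this (again a toy model, 
not GCC's API: toy_query, the string names, and the callback map are 
invented; in the real second patch, range_of_def calls fold_range on 
the defining statement with get_global_range_query): when a name has no 
registered global range, fold its definition using only globally-known 
operand ranges instead of giving up with VARYING.

```cpp
#include <cassert>
#include <climits>
#include <functional>
#include <map>
#include <string>

struct range { int lo, hi; };
static const range varying { INT_MIN, INT_MAX };

struct toy_query
{
  // Registered global ranges, analogous to the m_globals cache.
  std::map<std::string, range> globals;
  // Per-name "fold the defining statement from operand ranges" hooks,
  // standing in for fold_range over the def stmt.
  std::map<std::string, std::function<range (const toy_query &)>> defs;

  range range_of_def (const std::string &name) const
  {
    auto g = globals.find (name);
    if (g != globals.end ())
      return g->second;            // best registered global range
    auto d = defs.find (name);
    if (d != defs.end ())
      return d->second (*this);    // fold the def using global values
    return varying;                // nothing better is known
  }
};
```

So an unvisited statement like b = a + 1 with a known global range for a 
no longer degrades to VARYING; its definition is folded on demand.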



[-- Attachment #2: no-poor.patch --]
[-- Type: text/x-patch, Size: 10464 bytes --]

commit 870b674f72d4894b94efa61764fd87ecec29ffde
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Fri Jun 18 12:33:18 2021 -0400

    Remove poor value computations.
    
    Remove the old "poor value" approach which made callbacks into ranger
    from the cache.  Use only the best available value for all propagation.
    
            PR tree-optimization/101014
            * gimple-range-cache.cc (ranger_cache::ranger_cache): Remove poor
            value list.
            (ranger_cache::~ranger_cache): Ditto.
            (ranger_cache::enable_new_values): Delete.
            (ranger_cache::push_poor_value): Delete.
            (ranger_cache::range_of_def): Remove poor value processing.
            (ranger_cache::entry_range): Ditto.
            (ranger_cache::fill_block_cache): Ditto.
            * gimple-range-cache.h (class ranger_cache): Remove poor value members.
            * gimple-range.cc (gimple_ranger::range_of_expr): Remove call.
            * gimple-range.h (class gimple_ranger): Adjust.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 2c6e5bb2d38..c85b299d13e 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -706,16 +706,13 @@ temporal_cache::set_always_current (tree name)
 
 // --------------------------------------------------------------------------
 
-ranger_cache::ranger_cache (gimple_ranger &q) : query (q)
+ranger_cache::ranger_cache ()
 {
   m_workback.create (0);
   m_workback.safe_grow_cleared (last_basic_block_for_fn (cfun));
   m_update_list.create (0);
   m_update_list.safe_grow_cleared (last_basic_block_for_fn (cfun));
   m_update_list.truncate (0);
-  m_poor_value_list.create (0);
-  m_poor_value_list.safe_grow_cleared (20);
-  m_poor_value_list.truncate (0);
   m_temporal = new temporal_cache;
   unsigned x, lim = last_basic_block_for_fn (cfun);
   // Calculate outgoing range info upfront.  This will fully populate the
@@ -727,13 +724,11 @@ ranger_cache::ranger_cache (gimple_ranger &q) : query (q)
       if (bb)
 	m_gori.exports (bb);
     }
-  m_new_value_p = true;
 }
 
 ranger_cache::~ranger_cache ()
 {
   delete m_temporal;
-  m_poor_value_list.release ();
   m_workback.release ();
   m_update_list.release ();
 }
@@ -748,17 +743,6 @@ ranger_cache::dump (FILE *f)
   fprintf (f, "\n");
 }
 
-// Allow or disallow the cache to flag and query new values when propagation
-// is forced to use an unknown value.  The previous state is returned.
-
-bool
-ranger_cache::enable_new_values (bool state)
-{
-  bool ret = m_new_value_p;
-  m_new_value_p = state;
-  return ret;
-}
-
 // Dump the caches for basic block BB to file F.
 
 void
@@ -836,30 +820,6 @@ ranger_cache::set_global_range (tree name, const irange &r)
   m_temporal->set_timestamp (name);
 }
 
-// Push a request for a new lookup in block BB of name.  Return true if
-// the request is actually made (ie, isn't a duplicate).
-
-bool
-ranger_cache::push_poor_value (basic_block bb, tree name)
-{
-  if (!m_new_value_p)
-    return false;
-  if (m_poor_value_list.length ())
-    {
-      // Don't push anything else to the same block.  If there are multiple 
-      // things required, another request will come during a later evaluation
-      // and this prevents oscillation building uneccessary depth.
-      if ((m_poor_value_list.last ()).bb == bb)
-	return false;
-    }
-
-  struct update_record rec;
-  rec.bb = bb;
-  rec.calc = name;
-  m_poor_value_list.safe_push (rec);
-  return true;
-}
-
 //  Provide lookup for the gori-computes class to access the best known range
 //  of an ssa_name in any given basic block.  Note, this does no additonal
 //  lookups, just accesses the data that is already known.
@@ -872,31 +832,16 @@ ranger_cache::range_of_def (irange &r, tree name, basic_block bb)
   gcc_checking_assert (gimple_range_ssa_p (name));
   gcc_checking_assert (bb == gimple_bb (SSA_NAME_DEF_STMT (name)));
 
+  // Pick up the best global range available.
   if (!m_globals.get_global_range (r, name))
-    {
-      // If it doesn't have a value calculated, it means it's a
-      // "poor" value being used in some calculation.  Queue it up
-      // as a poor value to be improved later.
-      r = gimple_range_global (name);
-      if (push_poor_value (bb, name))
-	{
-	  if (DEBUG_RANGE_CACHE)
-	    {
-	      fprintf (dump_file,
-		       "*CACHE* no global def in bb %d for ", bb->index);
-	      print_generic_expr (dump_file, name, TDF_SLIM);
-	      fprintf (dump_file, " depth : %d\n",
-		       m_poor_value_list.length ());
-	    }
-	}
-    }
+    r = gimple_range_global (name);
+
   if (r.varying_p () && m_non_null.non_null_deref_p (name, bb, false) &&
       !cfun->can_throw_non_call_exceptions)
     r = range_nonzero (TREE_TYPE (name));
 }
 
-// Get the range of NAME as it occurs on entry to block BB.  If it is not set,
-// mark it as a poor value for possible later improvement.
+// Get the range of NAME as it occurs on entry to block BB.
 
 void
 ranger_cache::entry_range (irange &r, tree name, basic_block bb)
@@ -910,20 +855,7 @@ ranger_cache::entry_range (irange &r, tree name, basic_block bb)
   // Look for the on-entry value of name in BB from the cache.
   if (!m_on_entry.get_bb_range (r, name, bb))
     {
-      // If it has no entry but should, then mark this as a poor value.
-      // Its not a poor value if it does not have *any* edge ranges,
-      // Then global range is as good as it gets.
-      if (m_gori.has_edge_range_p (name) && push_poor_value (bb, name))
-	{
-	  if (DEBUG_RANGE_CACHE)
-	    {
-	      fprintf (dump_file,
-		       "*CACHE* no on entry range in bb %d for ", bb->index);
-	      print_generic_expr (dump_file, name, TDF_SLIM);
-	      fprintf (dump_file, " depth : %d\n", m_poor_value_list.length ());
-	    }
-	}
-      // Try to pick up any known global value as a best guess for now.
+      // Try to pick up any known global value.
       if (!m_globals.get_global_range (r, name))
 	r = gimple_range_global (name);
     }
@@ -1195,7 +1127,6 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
   edge e;
   int_range_max block_result;
   int_range_max undefined;
-  unsigned poor_list_start = m_poor_value_list.length ();  
 
   // At this point we shouldn't be looking at the def, entry or exit block.
   gcc_checking_assert (bb != def_bb && bb != ENTRY_BLOCK_PTR_FOR_FN (cfun) &&
@@ -1301,49 +1232,5 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
   propagate_cache (name);
   if (DEBUG_RANGE_CACHE)
     fprintf (dump_file, "  Propagation update done.\n");
-
-  // Now that the cache has been updated, check to see if there were any 
-  // SSA_NAMES used in filling the cache which were "poor values".
-  // Evaluate them, and inject any new values into the propagation
-  // list, and see if it improves any on-entry values.
-  if (poor_list_start !=  m_poor_value_list.length ())
-    {
-      gcc_checking_assert (poor_list_start < m_poor_value_list.length ());
-      while (poor_list_start < m_poor_value_list.length ())
-	{
-	  // Find a range for this unresolved value.   
-	  // Note, this may spawn new cache filling cycles, but by the time it
-	  // is finished, the work vectors will all be back to the same state
-	  // as before the call.  The update record vector will always be
-	  // returned to the current state upon return.
-	  struct update_record rec = m_poor_value_list.pop ();
-	  basic_block calc_bb = rec.bb;
-	  int_range_max tmp;
-
-	  if (DEBUG_RANGE_CACHE)
-	    {
-	      fprintf (dump_file, "(%d:%d)Calculating ",
-		       m_poor_value_list.length () + 1, poor_list_start);
-	      print_generic_expr (dump_file, name, TDF_SLIM);
-	      fprintf (dump_file, " used POOR VALUE for ");
-	      print_generic_expr (dump_file, rec.calc, TDF_SLIM);
-	      fprintf (dump_file, " in bb%d, trying to improve:\n",
-		       calc_bb->index);
-	    }
-
-	  // Calculate a range at the exit from the block so the caches feeding
-	  // this block will be filled, and we'll get a "better" value.
-	  // Disallow additonal "poor values" during this phase to avoid
-	  // iterations that are unlikely to be profitable for this name.
-	  // See PR 101014.
-	  bool state = enable_new_values (false);
-	  query.range_on_exit (tmp, calc_bb, rec.calc);
-	  enable_new_values (state);
-	  
-	  // Then ask for NAME to be re-evaluated on outgoing edges and 
-	  // use any new values.
-	  propagate_updated_value (name, calc_bb);
-	}
-    }
 }
 
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index 1a2aaceb3ed..e67af68f30b 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -90,7 +90,7 @@ private:
 class ranger_cache : public range_query
 {
 public:
-  ranger_cache (class gimple_ranger &q);
+  ranger_cache ();
   ~ranger_cache ();
 
   virtual bool range_of_expr (irange &r, tree name, gimple *stmt);
@@ -123,17 +123,6 @@ private:
 
   vec<basic_block> m_workback;
   vec<basic_block> m_update_list;
-
-  // Iterative "poor value" calculations.
-  struct update_record
-  {
-    basic_block bb;	// Block which value needs to be calculated in.
-    tree calc;		// SSA_NAME which needs its value calculated.
-  };
-  bool push_poor_value (basic_block bb, tree name);
-  vec<update_record> m_poor_value_list;
-  class gimple_ranger &query;
-  bool m_new_value_p;
 };
 
 #endif // GCC_SSA_RANGE_CACHE_H
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 624cfb12562..0a2c72b29aa 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -1167,9 +1167,7 @@ gimple_ranger::range_of_expr (irange &r, tree expr, gimple *stmt)
   // trigger new value calculations.  PR 100781.
   if (is_gimple_debug (stmt))
     {
-      bool state = m_cache.enable_new_values (false);
       m_cache.range_of_expr (r, expr, stmt);
-      m_cache.enable_new_values (state);
       return true;
     }
   basic_block bb = gimple_bb (stmt);
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 9ac779a720c..fc28123f9ec 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -58,7 +58,6 @@ along with GCC; see the file COPYING3.  If not see
 class gimple_ranger : public range_query
 {
 public:
-  gimple_ranger () : m_cache (*this) { }
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL) OVERRIDE;
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) OVERRIDE;
   virtual bool range_on_edge (irange &r, edge e, tree name) OVERRIDE;

[-- Attachment #3: global.patch --]
[-- Type: text/x-patch, Size: 3240 bytes --]

commit cb448ade74da1de1633e6ed97f8c5ecbac24b27a
Author: Andrew MacLeod <amacleod@redhat.com>
Date:   Fri Jun 18 12:52:10 2021 -0400

    Calculate a global definition if one has not been registered.
    
    With poor values gone, Pick up range restrictions from statements
    by folding them with global cache values.
    
            * gimple-range-cache.cc (ranger_cache::range_of_def):  Calculate
            a range if global is not available.
            (ranger_cache::entry_range): Fallback to range_of_def.
            * gimple-range-cache.h (range_of_def): Adjust prototype.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index c85b299d13e..def604dc149 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -824,19 +824,27 @@ ranger_cache::set_global_range (tree name, const irange &r)
 //  of an ssa_name in any given basic block.  Note, this does no additonal
 //  lookups, just accesses the data that is already known.
 
-// Get the range of NAME when the def occurs in block BB
+// Get the range of NAME when the def occurs in block BB.  If BB is NULL
+// get the best global value available.
 
 void
 ranger_cache::range_of_def (irange &r, tree name, basic_block bb)
 {
   gcc_checking_assert (gimple_range_ssa_p (name));
-  gcc_checking_assert (bb == gimple_bb (SSA_NAME_DEF_STMT (name)));
+  gcc_checking_assert (!bb || bb == gimple_bb (SSA_NAME_DEF_STMT (name)));
 
   // Pick up the best global range available.
   if (!m_globals.get_global_range (r, name))
-    r = gimple_range_global (name);
+    {
+      // If that fails, try to calculate the range using just global values.
+      gimple *s = SSA_NAME_DEF_STMT (name);
+      if (gimple_get_lhs (s) == name)
+	fold_range (r, s, get_global_range_query ());
+      else
+	r = gimple_range_global (name);
+    }
 
-  if (r.varying_p () && m_non_null.non_null_deref_p (name, bb, false) &&
+  if (bb && r.varying_p () && m_non_null.non_null_deref_p (name, bb, false) &&
       !cfun->can_throw_non_call_exceptions)
     r = range_nonzero (TREE_TYPE (name));
 }
@@ -853,12 +861,10 @@ ranger_cache::entry_range (irange &r, tree name, basic_block bb)
     }
 
   // Look for the on-entry value of name in BB from the cache.
+  // Otherwise pick up the best available global value.
   if (!m_on_entry.get_bb_range (r, name, bb))
-    {
-      // Try to pick up any known global value.
-      if (!m_globals.get_global_range (r, name))
-	r = gimple_range_global (name);
-    }
+    range_of_def (r, name);
+
   // Check if pointers have any non-null dereferences.  Non-call
   // exceptions mean we could throw in the middle of the block, so just
   // punt for now on those.
diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index e67af68f30b..04150ea54a8 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -115,7 +115,7 @@ private:
   void fill_block_cache (tree name, basic_block bb, basic_block def_bb);
   void propagate_cache (tree name);
 
-  void range_of_def (irange &r, tree name, basic_block bb);
+  void range_of_def (irange &r, tree name, basic_block bb = NULL);
   void entry_range (irange &r, tree expr, basic_block bb);
   void exit_range (irange &r, tree expr, basic_block bb);
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] tree-optimization/101014 - Remove poor value computations.
  2021-06-18 21:46 [PATCH] tree-optimization/101014 - Remove poor value computations Andrew MacLeod
@ 2021-06-19  7:51 ` Richard Biener
  2021-06-21 12:58   ` Andrew MacLeod
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2021-06-19  7:51 UTC (permalink / raw)
  To: Andrew MacLeod, gcc-patches

On June 18, 2021 11:46:08 PM GMT+02:00, Andrew MacLeod <amacleod@redhat.com> wrote:
>I am pleased to say that this patch kills the poor value computations
>in the ranger's cache.
>
>Its been a bit of a thorn, and is mostly a hack that was applied early
>on to enable getting some opportunities that were hard to get otherwise.
>
>The more consistent propagation we now do combined with other changes
>means I can kill this wart on trunk. It even results in a 1% speedup..
>and should resolve some of the excessive compile time issues causes by
>undesirable iteration, including 101014.. for good I hope :-).
>
>I tried turning off the poor_value computations on the GCC11 branch, and
>we may want to consider doing it there too.  In my testsuite, we miss a
>total of 3 cases out of 4700 new ones identified by ranger.  For the
>stability, I'd suggest we turn off poor_value computations there as
>well.  This patch rips out all the code, but for GCC11 I'd just change
>push_poor_value to always return false, thus never registering any
>values. less churn that way. I'll run some tests and post that
>separately if you think we should go ahead with it.
>
>Bootstraps on 86_64-pc-linux-gnu with no regressions.  pushed.

Nice. I think we should indeed consider mostly syncing the algorithmic changes with GCC 11 to make maintenance easier, at least up to 11.2. Now, please leave such changes some time to bake on trunk before backporting. 

Thanks, 
Richard. 

>Andrew
>
>The details:
>
>The cache is designed to propagate info without creating NEW info.  Ie,
>It cannot query new ranges, cannot set global ranges, etc... only the
>higher level ranger can do that.  This is how we avoid cycles when
>iterating..  Ranger says "This set this value to X here" , and the
>cache's job is to propagate that info around the CFG as needed,
>applying static GORI outgoing ranges along the way.   Only the ranger
>can request /set NEW information.
>
>There were some cases where back edges were missing key bits of info
>that hadn't been created yet.  The "poor value" approach was a stop-gap
>measure until things improve. When the cache is trying to propagate a
>range, the GORI edge computations sometimes wants a value which is not
>available.  Under some conditions it is allowed to register this as a
>"poor value" and continue propagating.  When done, it looks at the poor
>value list, and asks the ranger to "go get a new value for this".  If
>the ranger finds a better value, then this new value is propagated
>around.  So its a bit of a cheat from the original design. The ranger
>is still the only new-info creator, but the request is sometimes
>started from the cache. This is not desirable, and can lead to some
>inconsistencies & inefficiencies.
>
>As one can imagine. this sometimes causes significant iteration, as in
>this testcase.  Ie, the new ranger request from the cache triggers
>another poor value request, etc etc.  Which is why it wasn't designed
>to work that way.  Anyway, longer story shorter, I revisited the poor
>value code, and discovered that with the new GORI and fold_using_range
>rework from the past month, dropping the poor value code results in *0*
>difference in any of our test cases/suites.   Further more, the biggest
>thing that it really enabled was picking up range restrictions imposed
>by unvisited statements that were being forced to VARYING.  The new
>rework allows such statements to simply be folded using the new
>global_range_query and even so that bit of info is now easily captured.
>That's the enhancement from the second patch.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] tree-optimization/101014 - Remove poor value computations.
  2021-06-19  7:51 ` Richard Biener
@ 2021-06-21 12:58   ` Andrew MacLeod
  0 siblings, 0 replies; 3+ messages in thread
From: Andrew MacLeod @ 2021-06-21 12:58 UTC (permalink / raw)
  To: Richard Biener, gcc-patches

On 6/19/21 3:51 AM, Richard Biener wrote:
> On June 18, 2021 11:46:08 PM GMT+02:00, Andrew MacLeod <amacleod@redhat.com> wrote:
>> I am pleased to say that this patch kills the poor value computations
>> in the ranger's cache.
>>
>> Its been a bit of a thorn, and is mostly a hack that was applied early
>> on to enable getting some opportunities that were hard to get
>> otherwise.
>>
>> The more consistent propagation we now do combined with other changes
>> means I can kill this wart on trunk. It even results in a 1% speedup..
>> and should resolve some of the excessive compile time issues causes by
>> undesirable iteration, including 101014.. for good I hope :-).
>>
>> I tried turning off the poor_value computations on the GCC11 branch,
>> and we may want to consider doing it there too.  In my testsuite, we
>> miss a total of 3 cases out of 4700 new ones identified by ranger.
>> For the stability, I'd suggest we turn off poor_value computations
>> there as well.  This patch rips out all the code, but for GCC11 I'd
>> just change push_poor_value to always return false, thus never
>> registering any values. less churn that way. I'll run some tests and
>> post that separately if you think we should go ahead with it.
>>
>> Bootstraps on 86_64-pc-linux-gnu with no regressions.  pushed.
> Nice. I think we should indeed consider mostly syncing the algorithmic changes with GCC 11 to make maintenance easier, at least up to 11.2. Now, please leave such changes some time to bake on trunk before backporting.
>
> Thanks,
> Richard.
>
For sure.  I'm accumulating the GCC 11 patches, and will hold them for 
a bit yet.

Andrew


^ permalink raw reply	[flat|nested] 3+ messages in thread
