public inbox for gcc-patches@gcc.gnu.org
* [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading
@ 2022-08-09 13:01 Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2022-08-09 13:01 UTC (permalink / raw)
  To: gcc-patches

This revisits how we compute imports later used for the ranger path
query during backwards threading.  The compute_imports function
of the path solver ends up pulling the SSA def chain of regular
stmts without limit and since it starts with just the gori imports
of the path exit it misses some interesting names to translate
during path discovery.  In fact, with a still-empty path, this
compute_imports function does not look like the correct tool.

The following instead implements this computation during path discovery
itself, and since we add the exit block first we seed the initial imports
and interesting names from just the exit conditional.  When we then
process interesting names (i.e., imports whose definition we have not yet
seen) we prune local defs but add their uses, in a similar way to what
compute_imports would have done.
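
The pruning described above can be sketched with plain containers (a
simplified model with hypothetical int names, where def_block stands in
for SSA_NAME_DEF_STMT/gimple_bb and uses for the operands of a local
defining statement; PHI handling and the abnormal-PHI check are omitted):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Process the interesting set for one block BB: names defined outside
// BB stay interesting, local defs are pruned but their uses are walked,
// amending the import set on the way.
static std::set<int>
prune_block (int bb, const std::set<int> &interesting,
             const std::map<int, int> &def_block,
             const std::map<int, std::vector<int>> &uses,
             std::set<int> &imports)
{
  std::set<int> new_interesting;
  std::vector<int> worklist (interesting.begin (), interesting.end ());
  while (!worklist.empty ())
    {
      int name = worklist.back ();
      worklist.pop_back ();
      // Names defined outside BB remain interesting; they are imports
      // whose definition we have not yet seen.
      if (def_block.at (name) != bb)
        {
          new_interesting.insert (name);
          continue;
        }
      // Local defs are pruned, but the uses of their defining stmt are
      // added, growing the import set on the way.
      for (int rhs : uses.at (name))
        if (imports.insert (rhs).second)
          worklist.push_back (rhs);
    }
  return new_interesting;
}
```

With name 5 defined in block 1 from {6, 7}, 7 defined locally from {8},
and 6 and 8 defined elsewhere, pruning {5} yields {6, 8} as the new
interesting set while imports grows to {5, 6, 7, 8}.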

The patch also properly unwinds m_imports during path discovery
backtracking; from a debugging session I have verified that the two sets
now evolve as expected, whereas previously they behaved slightly
erratically.

Fortunately the m_imports set is now also shrunk significantly for
the PR69592 testcase (aka PR106514), so that there's an overall speedup
when increasing --param max-jump-thread-duplication-stmts through
15 -> 30 -> 60 -> 120: compile time goes from 1s -> 2s -> 13s -> 27s
before the patch to 1s -> 2s -> 4s -> 8s with it.

This runs into a latent issue in X which doesn't seem to expect
any PHI nodes with a constant argument on an edge inside the path.
But we now have those as interesting, for example for the ICEing
g++.dg/torture/pr100925.C, which has something like

  if (i)
    x = 1;
  else
    x = 5;
  if (x == 1)
    ...

where we now have the path from if (i) to if (x == 1) and the PHI for x
in the set of imports to consider for resolving x == 1, which IMHO
is exactly what we want.  The path_range_query::ssa_range_in_phi
papers over the issue and drops the range to varying instead of
crashing.  I didn't want to mess with this any further in this patch
(but I couldn't resist replacing the loop over PHI args with
PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).
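
For reference, a minimal compilable variant of the shape above (the
function name is made up, this is not the actual pr100925.C testcase):

```cpp
#include <cassert>

// After the if/else diamond x is represented by a PHI <1, 5>; a jump
// thread from if (i) through that PHI can resolve x == 1 to a constant
// on each of the two paths.
int
resolves_to_one (int i)
{
  int x;
  if (i)
    x = 1;
  else
    x = 5;
  return x == 1;
}
```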

Bootstrapped and tested on x86_64-unknown-linux-gnu w/o the
path_range_query::ssa_range_in_phi fix, now re-running with.

OK?

Thanks,
Richard.

	PR tree-optimization/106514
	* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
	Compute and unwind both m_imports and interesting on the fly during
	path discovery.
	(back_threader::find_paths): Compute the original m_imports
	from just the SSA uses of the exit conditional.  Drop
	handling single_succ_to_potentially_threadable_block.
	* gimple-range-path.cc (path_range_query::ssa_range_in_phi): Handle
	constant PHI arguments without crashing.  Use PHI_ARG_DEF_FROM_EDGE.
---
 gcc/gimple-range-path.cc       |  52 ++++++++---------
 gcc/tree-ssa-threadbackward.cc | 104 ++++++++++++++++++++++++++-------
 2 files changed, 106 insertions(+), 50 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 43e7526b6fc..b4376011ea8 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -276,8 +276,6 @@ void
 path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
 {
   tree name = gimple_phi_result (phi);
-  basic_block bb = gimple_bb (phi);
-  unsigned nargs = gimple_phi_num_args (phi);
 
   if (at_entry ())
     {
@@ -287,6 +285,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
       // Try to fold the phi exclusively with global or cached values.
       // This will get things like PHI <5(99), 6(88)>.  We do this by
       // calling range_of_expr with no context.
+      unsigned nargs = gimple_phi_num_args (phi);
       Value_Range arg_range (TREE_TYPE (name));
       r.set_undefined ();
       for (size_t i = 0; i < nargs; ++i)
@@ -303,36 +302,31 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
       return;
     }
 
+  basic_block bb = gimple_bb (phi);
   basic_block prev = prev_bb ();
   edge e_in = find_edge (prev, bb);
-
-  for (size_t i = 0; i < nargs; ++i)
-    if (e_in == gimple_phi_arg_edge (phi, i))
-      {
-	tree arg = gimple_phi_arg_def (phi, i);
-	// Avoid using the cache for ARGs defined in this block, as
-	// that could create an ordering problem.
-	if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
-	  {
-	    if (m_resolve)
-	      {
-		Value_Range tmp (TREE_TYPE (name));
-		// Using both the range on entry to the path, and the
-		// range on this edge yields significantly better
-		// results.
-		if (defined_outside_path (arg))
-		  range_on_path_entry (r, arg);
-		else
-		  r.set_varying (TREE_TYPE (name));
-		m_ranger->range_on_edge (tmp, e_in, arg);
-		r.intersect (tmp);
-		return;
-	      }
+  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e_in);
+  // Avoid using the cache for ARGs defined in this block, as
+  // that could create an ordering problem.
+  if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
+    {
+      if (m_resolve)
+	{
+	  Value_Range tmp (TREE_TYPE (name));
+	  // Using both the range on entry to the path, and the
+	  // range on this edge yields significantly better
+	  // results.
+	  if (TREE_CODE (arg) == SSA_NAME
+	      && defined_outside_path (arg))
+	    range_on_path_entry (r, arg);
+	  else
 	    r.set_varying (TREE_TYPE (name));
-	  }
-	return;
-      }
-  gcc_unreachable ();
+	  m_ranger->range_on_edge (tmp, e_in, arg);
+	  r.intersect (tmp);
+	  return;
+	}
+      r.set_varying (TREE_TYPE (name));
+    }
 }
 
 // If NAME is defined in BB, set R to the range of NAME, and return
diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 4cd4d21c712..a544e97b2af 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -482,32 +482,76 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
     {
       // For further greedy searching we want to remove interesting
       // names defined in BB but add ones on the PHI edges for the
-      // respective edges.  We do this by starting with all names
+      // respective edges and adding imports from those stmts.
+      // We do this by starting with all names
       // not defined in BB as interesting, collecting a list of
       // interesting PHIs in BB on the fly.  Then we iterate over
       // predecessor edges, adding interesting PHI edge defs to
       // the set of interesting names to consider when processing it.
       auto_bitmap new_interesting;
+      auto_vec<int, 16> new_imports;
       auto_vec<gphi *, 4> interesting_phis;
       bitmap_iterator bi;
       unsigned i;
+      auto_vec<tree, 16> worklist;
       EXECUTE_IF_SET_IN_BITMAP (interesting, 0, i, bi)
 	{
 	  tree name = ssa_name (i);
 	  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+	  /* Imports remain interesting.  */
 	  if (gimple_bb (def_stmt) != bb)
-	    bitmap_set_bit (new_interesting, i);
-	  else if (gphi *phi = dyn_cast<gphi *> (def_stmt))
 	    {
-	      tree res = gimple_phi_result (phi);
-	      if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
-		interesting_phis.safe_push (phi);
+	      bitmap_set_bit (new_interesting, i);
+	      continue;
+	    }
+	  worklist.quick_push (name);
+	  while (!worklist.is_empty ())
+	    {
+	      tree name = worklist.pop ();
+	      gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+	      /* Newly discovered imports are interesting.  */
+	      if (gimple_bb (def_stmt) != bb)
+		{
+		  bitmap_set_bit (new_interesting, SSA_NAME_VERSION (name));
+		  continue;
+		}
+	      /* Local PHIs participate in renaming below.  */
+	      if (gphi *phi = dyn_cast<gphi *> (def_stmt))
+		{
+		  tree res = gimple_phi_result (phi);
+		  if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
+		    interesting_phis.safe_push (phi);
+		}
+	      /* For other local defs process their uses, amending
+		 imports on the way.  */
+	      else if (is_gimple_assign (def_stmt))
+		{
+		  for (unsigned j = 1; j < gimple_num_ops (def_stmt); ++j)
+		    {
+		      tree rhs = gimple_op (def_stmt, j);
+		      if (TREE_CODE (rhs) == SSA_NAME
+			  && bitmap_set_bit (m_imports,
+					     SSA_NAME_VERSION (rhs)))
+			{
+			  /* ???  We probably want to track a 'visited'
+			     flag separately and add to imports only
+			     when the def is handled by ranger.  The
+			     current way adds us defs that are neither
+			     PHIs nor "interesting" assigns, like for
+			     example loads.  */
+			  new_imports.safe_push (SSA_NAME_VERSION (rhs));
+			  worklist.safe_push (rhs);
+			}
+		    }
+		}
 	    }
 	}
+      worklist.release ();
       if (!bitmap_empty_p (new_interesting)
 	  || !interesting_phis.is_empty ())
 	{
-	  auto_vec<tree, 4> unwind (interesting_phis.length ());
+	  auto_vec<int, 4> unwind (interesting_phis.length ());
+	  auto_vec<int, 4> imports_unwind (interesting_phis.length ());
 	  edge_iterator iter;
 	  edge e;
 	  FOR_EACH_EDGE (e, iter, bb->preds)
@@ -525,22 +569,31 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
 		{
 		  tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
 		  if (TREE_CODE (def) == SSA_NAME)
-		    if (bitmap_set_bit (new_interesting,
-					SSA_NAME_VERSION (def)))
-		      {
-			bitmap_set_bit (m_imports, SSA_NAME_VERSION (def));
-			unwind.quick_push (def);
-		      }
+		    {
+		      int ver = SSA_NAME_VERSION (def);
+		      if (bitmap_set_bit (new_interesting, ver))
+			{
+			  if (bitmap_set_bit (m_imports, ver))
+			    imports_unwind.quick_push (ver);
+			  unwind.quick_push (ver);
+			}
+		    }
 		}
 	      find_paths_to_names (e->src, new_interesting, overall_paths);
-	      // Restore new_interesting.  We leave m_imports alone since
-	      // we do not prune defs in BB from it and separately keeping
-	      // track of which bits to unwind isn't worth the trouble.
-	      for (tree def : unwind)
-		bitmap_clear_bit (new_interesting, SSA_NAME_VERSION (def));
+	      // Restore new_interesting.
+	      for (int def : unwind)
+		bitmap_clear_bit (new_interesting, def);
 	      unwind.truncate (0);
+	      // Restore m_imports.
+	      for (int def : imports_unwind)
+		bitmap_clear_bit (m_imports, def);
+	      imports_unwind.truncate (0);
 	    }
 	}
+      /* m_imports tracks all interesting names on the path, so when
+	 backtracking we have to restore it.  */
+      for (int j : new_imports)
+	bitmap_clear_bit (m_imports, j);
     }
   else if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "  FAIL: Search space limit %d reached.\n",
@@ -566,15 +619,24 @@ back_threader::find_paths (basic_block bb, tree name)
 	  && gimple_code (stmt) != GIMPLE_SWITCH))
     return;
 
-  if (EDGE_COUNT (bb->succs) > 1
-      || single_succ_to_potentially_threadable_block (bb))
+  if (EDGE_COUNT (bb->succs) > 1)
     {
       m_last_stmt = stmt;
       m_visited_bbs.empty ();
       m_path.truncate (0);
       m_name = name;
-      m_solver->compute_imports (m_imports, bb);
 
+      // We compute imports of the path during discovery starting
+      // just with names used in the conditional.
+      bitmap_clear (m_imports);
+      ssa_op_iter iter;
+      FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
+	bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
+
+      // Interesting is the set of imports we have not yet seen
+      // the definition of.  So while imports only grow, the
+      // set of interesting defs dwindles and once empty we can
+      // stop searching.
       auto_bitmap interesting;
       bitmap_copy (interesting, m_imports);
       find_paths_to_names (bb, interesting, 1);
-- 
2.35.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading
  2022-08-11 17:46       ` Andrew MacLeod
@ 2022-08-15  9:59         ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2022-08-15  9:59 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc-patches, aldyh

On Thu, 11 Aug 2022, Andrew MacLeod wrote:

> 
> On 8/10/22 06:46, Richard Biener wrote:
> >
> > I see the solver itself adds relations from edges on the path so
> > the crucial item here seems to be to add imports for the path
> > entry conditional, but those would likely be GORI imports for that
> > block?  Unfortunately that fails to add t[012], the GORI exports
> > seem to cover all that's needed but then exports might be too much
> 
> the GORI export list is the list of names which GORI thinks might be
> different on the outgoing edges from a block.
> 
> The GORI import list is the list of names which GORI thinks can affect those
> ranges from outside the block.  It doesn't try to look at individual PHI
> arguments, so IIRC it treats a PHI def as originating outside the block for
> import purposes.  This should be a subset of the export list.

Ah - so for the exit block of the path (the block with the conditional
we want to simplify in the threader) using the GORI exports list as
m_imports for the path query (the set of interesting names to compute
ranges for) would be correct then.  Currently 
path_range_query::compute_imports starts with

  // Start with the imports from the exit block...
  basic_block exit = path[0];
  gori_compute &gori = m_ranger->gori ();
  bitmap r_imports = gori.imports (exit);
  bitmap_copy (imports, r_imports);

so that should use gori.exports (exit) I think.  For blocks further up
the path we cannot easily re-use the GORI imports or exports since those
concern only names related to the exit conditionals of those blocks, not
the conditional of the path's exit block that we are interested in.  So
the special computation in the range path query and the threader seems
OK.

> GORI info is only ever concerned with the basic block... all external
> influences are considered to be in the import list. And all GORI calculations
> utilize a range_of_expr query from a range_query to resolve the incoming value
> of an import in order to calculate an outgoing edge.
> 
> outgoing_edge_range_p () has some additional smarts in it which allows for
> recomputations. You may have noticed the depend1/depend2 fields in the
> range_def_chain structure.  These are quick and dirty caches which represent
> the first 2 ssa-names encountered on the stmt when it was processed.  so for
> PHIs, the first 2 ssa names encountered, and for simple range-op qualifying
> stmts, the 1 or 2 ssanames in the def stmt for an ssa-name.
> 
> If you ask for the range of a_32 on an outgoing edge, and it is not in the
> export list, GORI would not be able to provide a range. Outgoing_edge_range_p
> utilizes those 2 depend fields as a quick check to see if a recomputation may
> give us a better range.  If either depend1 or depend2 associated with a_32 IS in
> the export list, then it performs a recomputation of a_32 instead of a GORI
> calculation.  I.e., from the example in my previous note:
> 
>   a_32 = f_16 + 10
> <...>
> bb88:
>   if (f_16 < 20)
> bb89:
>      b_8 = a_32 + 8
> 
> depend1 for a_32 will be f_16.
> during rangers evaluation of b_8 in bb89, it will ask for the range of a_32 on
> the edge 88->89.  a_32 is not an export, but it sees that depend1 (f_16) is in
> the export list, so it does a recalculation of a_32 on the edge, coming up
> with f_16 being [0, 19] and returning a_32 as [10, 29].
> 
> This is why you may see outgoing_edge_range_p coming up with things that GORI
> itself doesn't actually provide...

I see ;)

> your example:
> 
> void foo (int nest, int print_nest)
> {
>   _Bool t0 = nest != 0;
>   _Bool t1 = nest == print_nest;
>   _Bool t2 = t0 & t1;
>   if (t2)
>     __builtin_puts ("x");
>   nest++;
>   if (nest > 2)
>     __builtin_abort ();
>   if (print_nest == nest)
>     __builtin_puts ("y");
> }
> 
> 
> For if (nest > 2), nest is the only import and export in that block, but
> outgoing_edge_range_p would be able to recompute t1 and t0 on those edges as
> nest is used in defining them. Likewise, print_nest is used in defining t1, so
> it can also be recomputed on the final edges.  This could be why adding
> some dependencies from outside the block sometimes gets a better result.
> 
> the recomputation information is not rolled into the GORI exports, as it is
> more dynamic.  We don't know which order this will be seen in, so we don't know
> in advance which ssa_names use 'nest' in their calculations.  We could add
> them as they are seen, but then the bitmaps could explode in size... so leaving
> it in this dynamic quick check seemed more practical.
> 
> There is no mapping from a name to the other ssa_names that depend on it
> either... i.e., given 'nest', there is no mechanism to see that t0 and t1 use
> it.  I had considered building this list as well, but at the moment didn't
> have a lot of use for it and thought a use map might get large quickly.
> 
> I wonder if what might help is to loop through all the interesting names that
> have been calculated, and add the GORI export names from the original
> block which occur in either a depend1 or depend2 field of those names.  That
> would be more concise than the entire GORI export list.  If it is just the
> entry path exports that are relevant, then perhaps as names are added to the
> interesting list we could check if either of their 'depend' fields is in the
> entry block's export list, and add them then.
> 
> 
> > here?  At least that's what the code in compute_imports effectively
> > does (also for the entry block).  But I do wonder why compute_ranges
> > does not add relations computed by the entry conditional ...
> > That's probably because of
> >
> > void
> > path_range_query::compute_outgoing_relations (basic_block bb, basic_block
> > next)
> > {
> >    gimple *stmt = last_stmt (bb);
> >
> >    if (stmt
> >        && gimple_code (stmt) == GIMPLE_COND
> >        && (import_p (gimple_cond_lhs (stmt))
> >            || import_p (gimple_cond_rhs (stmt))))
> >
> > where for a condition like above the names in the imports are not
> > in the condition itself.  The gori.outgoing_edge_range_p compute
> > of the import ranges is apparently not affected by this
> > restriction and picks up nest as ~[0, 0] on the respective path
> > even though, while 'nest' is in imports, the temporaries are not.
> > That would support the argument to drop the import_p checks in
> > path_range_query::compute_outgoing_relations.
> >
> > I'm not sure it's worth fixing incrementally though, it's fallout
> > of r12-5157-gbfa04d0ec958eb that in some way did the reverse of
> > my patch.  Interestingly hybrid_jt_simplifier::compute_ranges_from_state
> > still computes imports itself before calling compute_ranges.
> >
> > For comparing thread numbers fixing the current state incrementally
> > would be nice I guess, I will test something but not further pursue it
> > if not successful.
> >
> > Richard.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


* Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading
  2022-08-10 10:46     ` Richard Biener
  2022-08-10 16:08       ` Aldy Hernandez
@ 2022-08-11 17:46       ` Andrew MacLeod
  2022-08-15  9:59         ` Richard Biener
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew MacLeod @ 2022-08-11 17:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, aldyh


On 8/10/22 06:46, Richard Biener wrote:
>
> I see the solver itself adds relations from edges on the path so
> the crucial item here seems to be to add imports for the path
> entry conditional, but those would likely be GORI imports for that
> block?  Unfortunately that fails to add t[012], the GORI exports
> seem to cover all that's needed but then exports might be too much

the GORI export list is the list of names which GORI thinks might
be different on the outgoing edges from a block.

The GORI import list is the list of names which GORI thinks can affect
those ranges from outside the block.  It doesn't try to look at
individual PHI arguments, so IIRC it treats a PHI def as originating
outside the block for import purposes.  This should be a subset of the
export list.

GORI info is only ever concerned with the basic block... all
external influences are considered to be in the import list. And all
GORI calculations utilize a range_of_expr query from a range_query to
resolve the incoming value of an import in order to calculate an
outgoing edge.

outgoing_edge_range_p () has some additional smarts in it which allows 
for recomputations. You may have noticed the depend1/depend2 fields in 
the range_def_chain structure.  These are quick and dirty caches which 
represent the first 2 ssa-names encountered on the stmt when it was 
processed.  so for PHIs, the first 2 ssa names encountered, and for 
simple range-op qualifying stmts, the 1 or 2 ssanames in the def stmt 
for an ssa-name.

If you ask for the range of a_32 on an outgoing edge, and it is not in
the export list, GORI would not be able to provide a range.
Outgoing_edge_range_p utilizes those 2 depend fields as a quick check to
see if a recomputation may give us a better range.  If either depend1 or
depend2 associated with a_32 IS in the export list, then it performs a
recomputation of a_32 instead of a GORI calculation.  I.e., from the
example in my previous note:

   a_32 = f_16 + 10
<...>
bb88:
   if (f_16 < 20)
bb89:
      b_8 = a_32 + 8

depend1 for a_32 will be f_16.
during rangers evaluation of b_8 in bb89, it will ask for the range of 
a_32 on the edge 88->89.  a_32 is not an export, but it sees that 
depend1 (f_16) is in the export list, so it does a recalculation of a_32 
on the edge, coming up with f_16 being [0, 19] and returning a_32 as 
[10, 29].

This is why you may see outgoing_edge_range_p coming up with things that
GORI itself doesn't actually provide...

your example:

void foo (int nest, int print_nest)
{
   _Bool t0 = nest != 0;
   _Bool t1 = nest == print_nest;
   _Bool t2 = t0 & t1;
   if (t2)
     __builtin_puts ("x");
   nest++;
   if (nest > 2)
     __builtin_abort ();
   if (print_nest == nest)
     __builtin_puts ("y");
}


For if (nest > 2), nest is the only import and export in that block, but
outgoing_edge_range_p would be able to recompute t1 and t0 on those
edges as nest is used in defining them. Likewise, print_nest is used in
defining t1, so it can also be recomputed on the final edges.  This
could be why adding some dependencies from outside the block sometimes
gets a better result.

the recomputation information is not rolled into the GORI exports, as it
is more dynamic.  We don't know which order this will be seen in, so we
don't know in advance which ssa_names use 'nest' in their calculations.
We could add them as they are seen, but then the bitmaps could explode
in size... so leaving it in this dynamic quick check seemed more practical.

There is no mapping from a name to the other ssa_names that depend on it
either... i.e., given 'nest', there is no mechanism to see that t0 and
t1 use it.  I had considered building this list as well, but at the
moment didn't have a lot of use for it and thought a use map might get
large quickly.

I wonder if what might help is to loop through all the interesting names
that have been calculated, and add the GORI export names from the
original block which occur in either a depend1 or depend2 field of those
names.  That would be more concise than the entire GORI export list.
If it is just the entry path exports that are relevant, then perhaps as
names are added to the interesting list we could check if either of their
'depend' fields is in the entry block's export list, and add them then.
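
That suggestion could be sketched roughly like this, with hypothetical
int names in place of SSA names and a plain map standing in for the
range_def_chain depend1/depend2 caches (-1 meaning no cached dependency):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Collect the interesting names for which a recomputation from the
// entry block is possible: a name qualifies when one of its (up to two)
// cached dependencies is in the entry block's export list.
static std::set<int>
names_with_exported_depends (const std::set<int> &interesting,
                             const std::map<int, std::pair<int, int>> &depends,
                             const std::set<int> &entry_exports)
{
  std::set<int> extra;
  for (int name : interesting)
    {
      auto it = depends.find (name);
      if (it == depends.end ())
        continue;
      if (entry_exports.count (it->second.first)
          || entry_exports.count (it->second.second))
        extra.insert (name);
    }
  return extra;
}
```

For the foo example, where t0 and t1 depend on nest (t1 also on
print_nest) and t2 depends on t0/t1, only t0 and t1 would be picked up
when nest is the entry block's sole export.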


> here?  At least that's what the code in compute_imports effectively
> does (also for the entry block).  But I do wonder why compute_ranges
> does not add relations computed by the entry conditional ...
> That's probably because of
>
> void
> path_range_query::compute_outgoing_relations (basic_block bb, basic_block
> next)
> {
>    gimple *stmt = last_stmt (bb);
>
>    if (stmt
>        && gimple_code (stmt) == GIMPLE_COND
>        && (import_p (gimple_cond_lhs (stmt))
>            || import_p (gimple_cond_rhs (stmt))))
>
> where for a condition like above the names in the imports are not
> in the condition itself.  The gori.outgoing_edge_range_p compute
> of the import ranges is apparently not affected by this
> restriction and picks up nest as ~[0, 0] on the respective path
> even though, while 'nest' is in imports, the temporaries are not.
> That would support the argument to drop the import_p checks in
> path_range_query::compute_outgoing_relations.
>
> I'm not sure it's worth fixing incrementally though, it's fallout
> of r12-5157-gbfa04d0ec958eb that in some way did the reverse of
> my patch.  Interestingly hybrid_jt_simplifier::compute_ranges_from_state
> still computes imports itself before calling compute_ranges.
>
> For comparing thread numbers fixing the current state incrementally
> would be nice I guess, I will test something but not further pursue it
> if not successful.
>
> Richard.



* Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading
  2022-08-10 10:46     ` Richard Biener
@ 2022-08-10 16:08       ` Aldy Hernandez
  2022-08-11 17:46       ` Andrew MacLeod
  1 sibling, 0 replies; 10+ messages in thread
From: Aldy Hernandez @ 2022-08-10 16:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andrew MacLeod, gcc-patches

On Wed, Aug 10, 2022 at 12:46 PM Richard Biener <rguenther@suse.de> wrote:
>
> On Wed, 10 Aug 2022, Richard Biener wrote:
>
> > On Tue, 9 Aug 2022, Andrew MacLeod wrote:
> >
> > >
> > > On 8/9/22 09:01, Richard Biener wrote:
> > > > This revisits how we compute imports later used for the ranger path
> > > > query during backwards threading.  The compute_imports function
> > > > of the path solver ends up pulling the SSA def chain of regular
> > > > stmts without limit and since it starts with just the gori imports
> > > > of the path exit it misses some interesting names to translate
> > > > during path discovery.  In fact with a still empty path this
> > > > compute_imports function looks like not the correct tool.
> > >
> > > I don't really know how this works in practice.  Aldys off this week, so he
> > > can comment when he returns.
> > >
> > > The original premise was along the line of recognizing that only changes to a
> > > GORI import name to a block can affect the branch at the end of the block.
> > > ie, if the path doesn't change any import to block A, then the branch at the
> > > end of block A will not change either.    Likewise, if it does change an
> > > import, then we look at whether the branch can be threaded.    Beyond that
> > > basic premise, I dont know what all it does.
> >
> > Yep, I also think that's the idea.
>
> [...]
>
> > Anyway, the important result of the change is that the imports
> > set is vastly smaller since it is now constrained to the
> > actual path.  Plus it now contains the local defs in blocks
> > of the path that do not dominate the exit block which means
> > it might get more threading opportunities.  Which reminds me
> > to re-do the cc1files experiment for this change.
>
> Results are a bit inconclusive.  We have
> ethread 14044 -> 14058, first threadfull 36986 -> 37174,
> first thread 8060 -> 8049, dom 21723 -> 21822, second thread
> (after loop opts) 6242 -> 5582, second dom 8522 -> 8765,
> second threadfull 3072 -> 2998 which makes an overall drop
> in the number of threads from 98649 to 98448.

The threading dance between all passes is a bit fickle as you can see.
Particularly, DOM is a red herring.  If it starts firing up, as it
looks like from your numbers, it's usually because we regressed in the
backward threaders and DOM is picking up the slack.  Though a slight
increase in DOM threading can be because we opened up more
opportunities for DOM because of better threading in the backward
threaders.  Sometimes it helps to run without DOM (or disable DOM
threading) to compare apples with apples... being careful, of course,
that you're not dropping a bunch of threads in thread2 for instance,
and picking them up in threadfull2 or whatever.

Ughhh...I really hate that we have 20 million threading passes.

>
> I've isolated one testcase we threaded before but no longer after
> this change and that is
>
> void foo (int nest, int print_nest)
> {
>   _Bool t0 = nest != 0;
>   _Bool t1 = nest == print_nest;
>   _Bool t2 = t0 & t1;
>   if (t2)
>     __builtin_puts ("x");
>   nest++;
>   if (nest > 2)
>     __builtin_abort ();
>   if (print_nest == nest)
>     __builtin_puts ("y");
> }
>
> where we are able to thread from if (t2) to if (print_nest == nest)
> resolving that to false when t2 is true using the nest == print_nest
> relation.  Now, the reason is because of the imports added by
>
>   // Exported booleans along the path, may help conditionals.
>   if (m_resolve)
>     for (i = 0; i < m_path.length (); ++i)
>       {
>         basic_block bb = m_path[i];
>         tree name;
>         FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
>           if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
>             bitmap_set_bit (imports, SSA_NAME_VERSION (name));
>       }
>
> _BUT_ - it turns out that the m_path local to the solver is still
> the one from the last back_threader::find_taken_edge_switch
> calling m_solver->compute_ranges (path, m_imports), so for a possibly
> completely unrelated path.  Truncating the path also on the solver
> side before calling compute_imports makes the testcase fail to thread.
> I'll note the PHI handling also looks at the stale m_path.

Not good.  Thanks for noticing.

>
> I see the solver itself adds relations from edges on the path so
> the crucial item here seems to be to add imports for the path
> entry conditional, but those would likely be GORI imports for that
> block?  Unfortunately that fails to add t[012], the GORI exports
> seem to cover all that's needed but then exports might be too much
> here?  At least that's what the code in compute_imports effectively
> does (also for the entry block).  But I do wonder why compute_ranges
> does not add relations computed by the entry conditional ...
> That's probably because of
>
> void
> path_range_query::compute_outgoing_relations (basic_block bb, basic_block
> next)
> {
>   gimple *stmt = last_stmt (bb);
>
>   if (stmt
>       && gimple_code (stmt) == GIMPLE_COND
>       && (import_p (gimple_cond_lhs (stmt))
>           || import_p (gimple_cond_rhs (stmt))))
>
> where for a condition like above the names in the imports are not
> in the condition itself.  The gori.outgoing_edge_range_p compute
> of the import ranges is apparently not affected by this
> restriction and picks up nest as ~[0, 0] on the respective path
> even though, while 'nest' is in imports, the temporaries are not.
> That would support the argument to drop the import_p checks in
> path_range_query::compute_outgoing_relations.

Perhaps Andrew can better answer your questions on  GORI imports /
exports here and elsewhere.

>
> I'm not sure it's worth fixing incrementally though, it's fallout
> of r12-5157-gbfa04d0ec958eb that in some way did the reverse of
> my patch.  Interestingly hybrid_jt_simplifier::compute_ranges_from_state
> still computes imports itself before calling compute_ranges.

Hmmm, compute_ranges_from_state(), which is the interface from the
forward threader into the path solver probably shouldn't be rolling
its own.  It looks like a poor man's calculation of imports.  It's
just the GORI imports to the final conditional and the SSA operands to
the final conditional (in case GORI didn't have it in the list of
imports??).  This may have been there originally because the
calculation of imports was in the backward threader, and the forward
threader was doing its own thing before calling the path solver.

ISTM, both the forward and backward threader should use
path_range_query::compute_imports.  In practice, it probably didn't
matter because the forward threader couldn't feed the path solver long
or complex enough paths for it to make a difference in the thread
count.

Aldy

>
> For comparing thread numbers fixing the current state incrementally
> would be nice I guess, I will test something but not further pursue it
> if not successful.
>
> Richard.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading
  2022-08-10  6:45   ` Richard Biener
  2022-08-10 10:46     ` Richard Biener
@ 2022-08-10 15:42     ` Aldy Hernandez
  1 sibling, 0 replies; 10+ messages in thread
From: Aldy Hernandez @ 2022-08-10 15:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: Andrew MacLeod, gcc-patches

I'm still on vacation, but I figured I'd mention a few things to
either unblock you or move the conversation along.

On Wed, Aug 10, 2022 at 8:45 AM Richard Biener <rguenther@suse.de> wrote:
>
> On Tue, 9 Aug 2022, Andrew MacLeod wrote:
>
> >
> > On 8/9/22 09:01, Richard Biener wrote:
> > > This revisits how we compute imports later used for the ranger path
> > > query during backwards threading.  The compute_imports function
> > > of the path solver ends up pulling the SSA def chain of regular
> > > stmts without limit and since it starts with just the gori imports
> > > of the path exit it misses some interesting names to translate
> > > during path discovery.  In fact with a still empty path this
> > > compute_imports function does not look like the correct tool.
> >
> > I don't really know how this works in practice.  Aldy's off this week, so he
> > can comment when he returns.
> >
> > The original premise was along the lines of recognizing that only changes to a
> > GORI import name to a block can affect the branch at the end of the block.
> > I.e., if the path doesn't change any import to block A, then the branch at the
> > end of block A will not change either.  Likewise, if it does change an
> > import, then we look at whether the branch can be threaded.  Beyond that
> > basic premise, I don't know what all it does.
>
> Yep, I also think that's the idea.

Yes.

>
> > I presume the unbounded def chain is for local defs within a block that in
> > turn feed the import to another block.  I'm not sure why we need to do much
> > with those.  Again, it's only the import to the def chain that can affect the
> > outcome at the end of the chain, and if it changes, then you need to
> > recalculate the entire chain, but that would be part of the normal path
> > walk.  I suspect there is also some pruning that can be done there, as GORI
> > reflects "can affect the range", not "will affect the range".
> >
> > Perhaps what's going on is that all those local elements are being added up
> > front to the list of interesting names?  That would certainly blow up the
> > bitmaps and loops and such.
>
> What it does is, if we have
>
> bb:
>   _3 = _5 + 1;
>   _1 = _3 + _4;
>   if (_1 > _2)
>
> it puts _3 and _4 and _5 into the set of interesting names.  That's
> OK and desired I think?  The actual problem is that compute_imports
> will follow the def of _5 and _4 into dominating blocks recursively,
> adding things to the imports even if the definition blocks are not
> on the path (the path is empty at the point we call compute_imports).

That sounds like a bug.  We shouldn't recurse unbounded outside of the
path.  For that matter, for anything outside the path, we should be
querying the ranger, which can cache things appropriately.  The intent
was that for anything queried outside of the path, we should be using
the ranger through range_on_path_entry.

> For the testcase at hand this pulls in some 1000s of names into the
> initial set of imports.  Now, the current path discovery code
> only adds to imports by means of PHI translating from a PHI
> def to a PHI use on the path edge - it doesn't add any further
> local names used to define such PHI edge use from blocks not
> dominating the path exit (but on the about to be threaded path).
>
> I'll also note that compute_imports does
>
>   // Exported booleans along the path, may help conditionals.
>   if (m_resolve)
>     for (i = 0; i < m_path.length (); ++i)
>       {
>         basic_block bb = m_path[i];
>         tree name;
>         FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
>           if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
>             bitmap_set_bit (imports, SSA_NAME_VERSION (name));
>       }
>
> but at least for the backwards threader this isn't effective
> since the path is empty at the point we call this.  And no
> other code in the backwards threader does sth like this.

IIRC, this came about to help with statements leading up to the
condition.  See fur_source::register_outgoing_edges:

  // Now look for other relations in the exports.  This will find stmts
  // leading to the condition such as:
  // c_2 = a_4 < b_7
  // if (c_2)
  FOR_EACH_GORI_EXPORT_NAME (*(gori ()), bb, name)
    {
      if (TREE_CODE (TREE_TYPE (name)) != BOOLEAN_TYPE)
    continue;

At least back when I wrote it, it made a difference in the number of
threads we got.

> I would also say for the exit block it doesn't make
> much sense to look at gori exports.  Plus I don't understand
> what this tries to capture - it presumably catches stmts
> like _1 = _2 == _3 that have uses in downstream blocks but
> might not be directly part of the conditional op def chain.
>
> In the end all the threader does is, once the path is complete,
> compute ranges for the path and the imports and then fold the
> path exit stmt.
>
> I'm not sure which names we need in the import set for this
> but as I guess the set of imports constrain the names we
> compute ranges for so for the BB above, if we want to have
> a range of _1 we need _3 in the imports set even though it
> is not in the set of GORI imports?
>
> What I'm seeing with the threader is that we have quite some
> cases where we have an "unrelated" CFG diamond on the path but
> end up threading a random path through it (because neither the
> path query infrastructure nor the copier knows to copy the
> whole diamond).  When the def chain of the operands on the

Hmmm, this is unrelated, but we had talked about constraining the
backward threader to notice that we had gone back up to the beginning
of a diamond, and there was no sense going up the other arm to the top
of the diamond.  So something like this:

    A
   / \
  L   R
   \ /
    B

If we had gone up B<-R<-A, we should have a way to notice that
anything before B<-L<-A would yield the same result, and only recurse
back on one side.  Alas, stage3 came upon us, and besides, I wasn't
convinced that calculating all that would be worth it.

> controlling conditions are not in the import set then I
> suppose the path range computation doesn't do anything to
> simplify things there (we do after all specify a condition
> result by means of arbitrarily choosing one of the paths).
> Such diamonds are the main source of exponential behavior
> of the path discovery and I'm thinking of ways to improve
> things here.  The original backwards threader, supposedly
> due to a bug, only ever considered one path for continuing
> beyond a diamond - but maybe that was on purpose because
> the intermediate condition doesn't meaningfully contribute
> to resolving the exit condition.
>
> Anyway, the important result of the change is that the imports
> set is vastly smaller since it is now constrained to the
> actual path.  Plus it now contains the local defs in blocks

That sounds good.

> of the path that do not dominate the exit block which means
> it might get more threading opportunities.  Which reminds me
> to re-do the cc1files experiment for this change.

Double check threading results at each step/patch.  Things that are
seemingly obvious can have drastic changes in the thread count.

>
> > I'm sure Aldy will pitch in when he returns from vacation.
>
> Good, I'll wait for his comments.
>
> Richard.

Overall, I don't want you to be blocked until I return.  You seem to
have a good grasp of what's going on, and if you're careful to check
we're not regressing in the thread count (unless the threads are
unreachable or senseless), we could always do things as a follow-up.
It also helps to have more than one person knowledgeable in this
area.

Thanks again.
Aldy

>
> >
> >
> > >
> > > The following instead implements what it does during the path discovery
> > > and since we add the exit block we seed the initial imports and
> > > interesting names from just the exit conditional.  When we then
> > > process interesting names (aka imports we did not yet see the definition
> > > of) we prune local defs but add their uses in a similar way as
> > > compute_imports would have done.
> > >
> > > The patch also properly unwinds m_imports during the path discovery
> > > backtracking and from a debugging session I have verified the two
> > > sets evolve as expected now while previously behaving slightly erratic.
> > >
> > > Fortunately the m_imports set is now also shrunk significantly for
> > > the PR69592 testcase (aka PR106514) so that there's overall speedup
> > > when increasing --param max-jump-thread-duplication-stmts as
> > > 15 -> 30 -> 60 -> 120: from 1s -> 2s -> 13s -> 27s before the patch to
> > > 1s -> 2s -> 4s -> 8s with it.
> > >
> > > This runs into a latent issue in X which doesn't seem to expect
> > > any PHI nodes with a constant argument on an edge inside the path.
> > > But we now have those as interesting, for example for the ICEing
> > > g++.dg/torture/pr100925.C which just has sth like
> > >
> > >    if (i)
> > >      x = 1;
> > >    else
> > >      x = 5;
> > >    if (x == 1)
> > >      ...
> > >
> > > where we now have the path from if (i) to if (x) and the PHI for x
> > > in the set of imports to consider for resolving x == 1 which IMHO
> > > looks exactly like what we want.  The path_range_query::ssa_range_in_phi
> > > papers over the issue and drops the range to varying instead of
> > > crashing.  I didn't want to mess with this any further in this patch
> > > (but I couldn't resist replacing the loop over PHI args with
> > > PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu w/o the
> > > path_range_query::ssa_range_in_phi fix, now re-running with.
> > >
> > > OK?
> > >
> > > Thanks,
> > > Richard.
> > >
> > >  PR tree-optimization/106514
> > >  * tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
> > >  Compute and unwind both m_imports and interesting on the fly during
> > >  path discovery.
> > >  (back_threader::find_paths): Compute the original m_imports
> > >  from just the SSA uses of the exit conditional.  Drop
> > >  handling single_succ_to_potentially_threadable_block.
> > >  * gimple-range-path.cc (path_range_query::ssa_range_in_phi): Handle
> > >  constant PHI arguments without crashing.  Use PHI_ARG_DEF_FROM_EDGE.
> > > ---
> > >   gcc/gimple-range-path.cc       |  52 ++++++++---------
> > >   gcc/tree-ssa-threadbackward.cc | 104 ++++++++++++++++++++++++++-------
> > >   2 files changed, 106 insertions(+), 50 deletions(-)
> > >
> > > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > > index 43e7526b6fc..b4376011ea8 100644
> > > --- a/gcc/gimple-range-path.cc
> > > +++ b/gcc/gimple-range-path.cc
> > > @@ -276,8 +276,6 @@ void
> > >   path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
> > >   {
> > >     tree name = gimple_phi_result (phi);
> > > -  basic_block bb = gimple_bb (phi);
> > > -  unsigned nargs = gimple_phi_num_args (phi);
> > >
> > >     if (at_entry ())
> > >       {
> > > @@ -287,6 +285,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi
> > > *phi)
> > >         // Try to fold the phi exclusively with global or cached values.
> > >         // This will get things like PHI <5(99), 6(88)>.  We do this by
> > >         // calling range_of_expr with no context.
> > > +      unsigned nargs = gimple_phi_num_args (phi);
> > >         Value_Range arg_range (TREE_TYPE (name));
> > >         r.set_undefined ();
> > >         for (size_t i = 0; i < nargs; ++i)
> > > @@ -303,36 +302,31 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi
> > > *phi)
> > >         return;
> > >       }
> > >   +  basic_block bb = gimple_bb (phi);
> > >     basic_block prev = prev_bb ();
> > >     edge e_in = find_edge (prev, bb);
> > > -
> > > -  for (size_t i = 0; i < nargs; ++i)
> > > -    if (e_in == gimple_phi_arg_edge (phi, i))
> > > -      {
> > > -   tree arg = gimple_phi_arg_def (phi, i);
> > > -   // Avoid using the cache for ARGs defined in this block, as
> > > -   // that could create an ordering problem.
> > > -   if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
> > > -     {
> > > -       if (m_resolve)
> > > -         {
> > > -           Value_Range tmp (TREE_TYPE (name));
> > > -           // Using both the range on entry to the path, and the
> > > -           // range on this edge yields significantly better
> > > -           // results.
> > > -           if (defined_outside_path (arg))
> > > -             range_on_path_entry (r, arg);
> > > -           else
> > > -             r.set_varying (TREE_TYPE (name));
> > > -           m_ranger->range_on_edge (tmp, e_in, arg);
> > > -           r.intersect (tmp);
> > > -           return;
> > > -         }
> > > +  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e_in);
> > > +  // Avoid using the cache for ARGs defined in this block, as
> > > +  // that could create an ordering problem.
> > > +  if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
> > > +    {
> > > +      if (m_resolve)
> > > +   {
> > > +     Value_Range tmp (TREE_TYPE (name));
> > > +     // Using both the range on entry to the path, and the
> > > +     // range on this edge yields significantly better
> > > +     // results.
> > > +     if (TREE_CODE (arg) == SSA_NAME
> > > +         && defined_outside_path (arg))
> > > +       range_on_path_entry (r, arg);
> > > +     else
> > >        r.set_varying (TREE_TYPE (name));
> > > -     }
> > > -   return;
> > > -      }
> > > -  gcc_unreachable ();
> > > +     m_ranger->range_on_edge (tmp, e_in, arg);
> > > +     r.intersect (tmp);
> > > +     return;
> > > +   }
> > > +      r.set_varying (TREE_TYPE (name));
> > > +    }
> > >   }
> > >
> > >   // If NAME is defined in BB, set R to the range of NAME, and return
> > > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> > > index 4cd4d21c712..a544e97b2af 100644
> > > --- a/gcc/tree-ssa-threadbackward.cc
> > > +++ b/gcc/tree-ssa-threadbackward.cc
> > > @@ -482,32 +482,76 @@ back_threader::find_paths_to_names (basic_block bb,
> > > bitmap interesting,
> > >       {
> > >         // For further greedy searching we want to remove interesting
> > >         // names defined in BB but add ones on the PHI edges for the
> > > -      // respective edges.  We do this by starting with all names
> > > +      // respective edges and adding imports from those stmts.
> > > +      // We do this by starting with all names
> > >         // not defined in BB as interesting, collecting a list of
> > >         // interesting PHIs in BB on the fly.  Then we iterate over
> > >         // predecessor edges, adding interesting PHI edge defs to
> > >         // the set of interesting names to consider when processing it.
> > >         auto_bitmap new_interesting;
> > > +      auto_vec<int, 16> new_imports;
> > >         auto_vec<gphi *, 4> interesting_phis;
> > >         bitmap_iterator bi;
> > >         unsigned i;
> > > +      auto_vec<tree, 16> worklist;
> > >          EXECUTE_IF_SET_IN_BITMAP (interesting, 0, i, bi)
> > >    {
> > >      tree name = ssa_name (i);
> > >      gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> > > +     /* Imports remain interesting.  */
> > >       if (gimple_bb (def_stmt) != bb)
> > > -       bitmap_set_bit (new_interesting, i);
> > > -     else if (gphi *phi = dyn_cast<gphi *> (def_stmt))
> > >        {
> > > -         tree res = gimple_phi_result (phi);
> > > -         if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
> > > -           interesting_phis.safe_push (phi);
> > > +         bitmap_set_bit (new_interesting, i);
> > > +         continue;
> > > +       }
> > > +     worklist.quick_push (name);
> > > +     while (!worklist.is_empty ())
> > > +       {
> > > +         tree name = worklist.pop ();
> > > +         gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> > > +         /* Newly discovered imports are interesting.  */
> > > +         if (gimple_bb (def_stmt) != bb)
> > > +           {
> > > +             bitmap_set_bit (new_interesting, SSA_NAME_VERSION (name));
> > > +             continue;
> > > +           }
> > > +         /* Local PHIs participate in renaming below.  */
> > > +         if (gphi *phi = dyn_cast<gphi *> (def_stmt))
> > > +           {
> > > +             tree res = gimple_phi_result (phi);
> > > +             if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
> > > +               interesting_phis.safe_push (phi);
> > > +           }
> > > +         /* For other local defs process their uses, amending
> > > +            imports on the way.  */
> > > +         else if (is_gimple_assign (def_stmt))
> > > +           {
> > > +             for (unsigned j = 1; j < gimple_num_ops (def_stmt); ++j)
> > > +               {
> > > +                 tree rhs = gimple_op (def_stmt, j);
> > > +                 if (TREE_CODE (rhs) == SSA_NAME
> > > +                     && bitmap_set_bit (m_imports,
> > > +                                        SSA_NAME_VERSION (rhs)))
> > > +                   {
> > > +                     /* ???  We probably want to track a 'visited'
> > > +                        flag separately and add to imports only
> > > +                        when the def is handled by ranger.  The
> > > +                        current way adds us defs that are neither
> > > +                        PHIs nor "interesting" assigns, like for
> > > +                        example loads.  */
> > > +                     new_imports.safe_push (SSA_NAME_VERSION (rhs));
> > > +                     worklist.safe_push (rhs);
> > > +                   }
> > > +               }
> > > +           }
> > >        }
> > >     }
> > > +      worklist.release ();
> > >         if (!bitmap_empty_p (new_interesting)
> > >      || !interesting_phis.is_empty ())
> > >     {
> > > -     auto_vec<tree, 4> unwind (interesting_phis.length ());
> > > +     auto_vec<int, 4> unwind (interesting_phis.length ());
> > > +     auto_vec<int, 4> imports_unwind (interesting_phis.length ());
> > >      edge_iterator iter;
> > >      edge e;
> > >      FOR_EACH_EDGE (e, iter, bb->preds)
> > > @@ -525,22 +569,31 @@ back_threader::find_paths_to_names (basic_block bb,
> > > bitmap interesting,
> > >     {
> > >       tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
> > >       if (TREE_CODE (def) == SSA_NAME)
> > > -               if (bitmap_set_bit (new_interesting,
> > > -                                   SSA_NAME_VERSION (def)))
> > > -                 {
> > > -                   bitmap_set_bit (m_imports, SSA_NAME_VERSION (def));
> > > -                   unwind.quick_push (def);
> > > -                 }
> > > +               {
> > > +                 int ver = SSA_NAME_VERSION (def);
> > > +                 if (bitmap_set_bit (new_interesting, ver))
> > > +                   {
> > > +                     if (bitmap_set_bit (m_imports, ver))
> > > +                       imports_unwind.quick_push (ver);
> > > +                     unwind.quick_push (ver);
> > > +                   }
> > > +               }
> > >             }
> > >           find_paths_to_names (e->src, new_interesting, overall_paths);
> > > -         // Restore new_interesting.  We leave m_imports alone since
> > > -         // we do not prune defs in BB from it and separately keeping
> > > -         // track of which bits to unwind isn't worth the trouble.
> > > -         for (tree def : unwind)
> > > -           bitmap_clear_bit (new_interesting, SSA_NAME_VERSION (def));
> > > +         // Restore new_interesting.
> > > +         for (int def : unwind)
> > > +           bitmap_clear_bit (new_interesting, def);
> > >           unwind.truncate (0);
> > > +         // Restore m_imports.
> > > +         for (int def : imports_unwind)
> > > +           bitmap_clear_bit (m_imports, def);
> > > +         imports_unwind.truncate (0);
> > >        }
> > >     }
> > > +      /* m_imports tracks all interesting names on the path, so when
> > > +    backtracking we have to restore it.  */
> > > +      for (int j : new_imports)
> > > +   bitmap_clear_bit (m_imports, j);
> > >       }
> > >     else if (dump_file && (dump_flags & TDF_DETAILS))
> > >       fprintf (dump_file, "  FAIL: Search space limit %d reached.\n",
> > > @@ -566,15 +619,24 @@ back_threader::find_paths (basic_block bb, tree name)
> > >               && gimple_code (stmt) != GIMPLE_SWITCH))
> > >       return;
> > >   -  if (EDGE_COUNT (bb->succs) > 1
> > > -      || single_succ_to_potentially_threadable_block (bb))
> > > +  if (EDGE_COUNT (bb->succs) > 1)
> > >       {
> > >         m_last_stmt = stmt;
> > >         m_visited_bbs.empty ();
> > >         m_path.truncate (0);
> > >         m_name = name;
> > > -      m_solver->compute_imports (m_imports, bb);
> > >   +      // We compute imports of the path during discovery starting
> > > +      // just with names used in the conditional.
> > > +      bitmap_clear (m_imports);
> > > +      ssa_op_iter iter;
> > > +      FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
> > > +   bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
> > > +
> > > +      // Interesting is the set of imports we have not yet seen
> > > +      // the definition of.  So while imports only grow, the
> > > +      // set of interesting defs dwindles and once empty we can
> > > +      // stop searching.
> > >         auto_bitmap interesting;
> > >         bitmap_copy (interesting, m_imports);
> > >         find_paths_to_names (bb, interesting, 1);
> >
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)



* Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading
  2022-08-10  6:45   ` Richard Biener
@ 2022-08-10 10:46     ` Richard Biener
  2022-08-10 16:08       ` Aldy Hernandez
  2022-08-11 17:46       ` Andrew MacLeod
  2022-08-10 15:42     ` Aldy Hernandez
  1 sibling, 2 replies; 10+ messages in thread
From: Richard Biener @ 2022-08-10 10:46 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc-patches, aldyh

On Wed, 10 Aug 2022, Richard Biener wrote:

> On Tue, 9 Aug 2022, Andrew MacLeod wrote:
> 
> > 
> > On 8/9/22 09:01, Richard Biener wrote:
> > > This revisits how we compute imports later used for the ranger path
> > > query during backwards threading.  The compute_imports function
> > > of the path solver ends up pulling the SSA def chain of regular
> > > stmts without limit and since it starts with just the gori imports
> > > of the path exit it misses some interesting names to translate
> > > during path discovery.  In fact with a still empty path this
> > > compute_imports function does not look like the correct tool.
> > 
> > I don't really know how this works in practice.  Aldy's off this week, so he
> > can comment when he returns.
> > 
> > The original premise was along the lines of recognizing that only changes to a
> > GORI import name to a block can affect the branch at the end of the block.
> > I.e., if the path doesn't change any import to block A, then the branch at the
> > end of block A will not change either.  Likewise, if it does change an
> > import, then we look at whether the branch can be threaded.  Beyond that
> > basic premise, I don't know what all it does.
> 
> Yep, I also think that's the idea.

[...]

> Anyway, the important result of the change is that the imports
> set is vastly smaller since it is now constrained to the
> actual path.  Plus it now contains the local defs in blocks
> of the path that do not dominate the exit block which means
> it might get more threading opportunities.  Which reminds me
> to re-do the cc1files experiment for this change.

Results are a bit inconclusive.  We have
ethread 14044 -> 14058, first threadfull 36986 -> 37174,
first thread 8060 -> 8049, dom 21723 -> 21822, second thread
(after loop opts) 6242 -> 5582, second dom 8522 -> 8765,
second threadfull 3072 -> 2998 which makes an overall drop
in the number of threads from 98649 to 98448.

I've isolated one testcase we threaded before but no longer after
this change and that is

void foo (int nest, int print_nest)
{
  _Bool t0 = nest != 0;
  _Bool t1 = nest == print_nest;
  _Bool t2 = t0 & t1;
  if (t2)
    __builtin_puts ("x");
  nest++;
  if (nest > 2)
    __builtin_abort ();
  if (print_nest == nest)
    __builtin_puts ("y");
}

where we are able to thread from if (t2) to if (print_nest == nest)
resolving that to false when t2 is true using the nest == print_nest
relation.  Now, the reason is because of the imports added by

  // Exported booleans along the path, may help conditionals.
  if (m_resolve)
    for (i = 0; i < m_path.length (); ++i)
      {   
        basic_block bb = m_path[i];
        tree name;
        FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
          if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
            bitmap_set_bit (imports, SSA_NAME_VERSION (name));
      }

_BUT_ - it turns out that the m_path local to the solver is still
the one from the last back_threader::find_taken_edge_switch
calling m_solver->compute_ranges (path, m_imports), so for a possibly
completely unrelated path.  Truncating the path also on the solver
side before calling compute_imports makes the testcase fail to thread.
I'll note the PHI handling also looks at the stale m_path.

I see the solver itself adds relations from edges on the path so
the crucial item here seems to be to add imports for the path
entry conditional, but those would likely be GORI imports for that
block?  Unfortunately that fails to add t[012], the GORI exports
seem to cover all that's needed but then exports might be too much
here?  At least that's what the code in compute_imports effectively
does (also for the entry block).  But I do wonder why compute_ranges
does not add relations computed by the entry conditional ...
That's probably because of

void
path_range_query::compute_outgoing_relations (basic_block bb, basic_block 
next)
{
  gimple *stmt = last_stmt (bb);

  if (stmt
      && gimple_code (stmt) == GIMPLE_COND
      && (import_p (gimple_cond_lhs (stmt))
          || import_p (gimple_cond_rhs (stmt))))

where for a condition like above the names in the imports are not
in the condition itself.  The gori.outgoing_edge_range_p compute
of the import ranges is apparently not affected by this
restriction and picks up nest as ~[0, 0] on the respective path,
even though only 'nest' is in the imports and the temporaries are not.
That would support the argument to drop the import_p checks in
path_range_query::compute_outgoing_relations.

I'm not sure it's worth fixing incrementally though, it's fallout
of r12-5157-gbfa04d0ec958eb that in some way did the reverse of
my patch.  Interestingly hybrid_jt_simplifier::compute_ranges_from_state
still computes imports itself before calling compute_ranges.

For comparing thread numbers fixing the current state incrementally
would be nice I guess, I will test something but not further pursue it
if not successful.

Richard.


* Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading
  2022-08-09 18:52 ` Andrew MacLeod
@ 2022-08-10  6:45   ` Richard Biener
  2022-08-10 10:46     ` Richard Biener
  2022-08-10 15:42     ` Aldy Hernandez
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Biener @ 2022-08-10  6:45 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc-patches, aldyh

On Tue, 9 Aug 2022, Andrew MacLeod wrote:

> 
> On 8/9/22 09:01, Richard Biener wrote:
> > This revisits how we compute imports later used for the ranger path
> > query during backwards threading.  The compute_imports function
> > of the path solver ends up pulling the SSA def chain of regular
> > stmts without limit and since it starts with just the gori imports
> > of the path exit it misses some interesting names to translate
> > during path discovery.  In fact with a still empty path this
> > compute_imports function does not look like the correct tool.
> 
> I don't really know how this works in practice.  Aldy's off this week, so he
> can comment when he returns.
> 
> The original premise was along the lines of recognizing that only changes to a
> GORI import name to a block can affect the branch at the end of the block.
> I.e., if the path doesn't change any import to block A, then the branch at the
> end of block A will not change either.  Likewise, if it does change an
> import, then we look at whether the branch can be threaded.  Beyond that
> basic premise, I don't know what all it does.

Yep, I also think that's the idea.

> I presume the unbounded def chain is for local defs within a block that in
> turn feed the import to another block.  I'm not sure why we need to do much
> with those.  Again, it's only the import to the def chain that can affect the
> outcome at the end of the chain, and if it changes, then you need to
> recalculate the entire chain, but that would be part of the normal path
> walk.  I suspect there is also some pruning that can be done there, as GORI
> reflects "can affect the range", not "will affect the range".
>
> Perhaps what's going on is that all those local elements are being added up
> front to the list of interesting names?  That would certainly blow up the
> bitmaps and loops and such.

What it does is, if we have

bb:
  _3 = _5 + 1;
  _1 = _3 + _4;
  if (_1 > _2)

it puts _3 and _4 and _5 into the set of interesting names.  That's
OK and desired I think?  The actual problem is that compute_imports
will follow the def of _5 and _4 into dominating blocks recursively,
adding things to the imports even if the definition blocks are not
on the path (the path is empty at the point we call compute_imports).
For the testcase at hand this pulls in some 1000s of names into the
initial set of imports.  Now, the current path discovery code
only adds to imports by means of PHI translating from a PHI
def to a PHI use on the path edge - it doesn't add any further
local names used to define such PHI edge use from blocks not
dominating the path exit (but on the about to be threaded path).

I'll also note that compute_imports does

  // Exported booleans along the path, may help conditionals.
  if (m_resolve)
    for (i = 0; i < m_path.length (); ++i)
      {
        basic_block bb = m_path[i];
        tree name;
        FOR_EACH_GORI_EXPORT_NAME (gori, bb, name)
          if (TREE_CODE (TREE_TYPE (name)) == BOOLEAN_TYPE)
            bitmap_set_bit (imports, SSA_NAME_VERSION (name));
      }

but at least for the backwards threader this isn't effective
since the path is empty at the point we call this.  And no
other code in the backwards threader does something like this.
I would also say for the exit block it doesn't make
much sense to look at GORI exports.  Plus I don't understand
what this tries to capture - it presumably catches stmts
like _1 = _2 == _3 that have uses in downstream blocks but
might not be directly part of the conditional op def chain.

In the end all the threader does is, once the path is complete,
compute ranges for the path and the imports and then fold the
path exit stmt.

I'm not sure which names we need in the import set for this,
but I guess the set of imports constrains the names we
compute ranges for.  So for the BB above, if we want to have
a range of _1 we need _3 in the imports set even though it
is not in the set of GORI imports?
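To illustrate the dependence with plain interval arithmetic as a stand-in
for what ranger computes (assumed ranges, hand-rolled types, not the real
irange/Value_Range API), a range for _1 is only obtainable by going
through _3:

```cpp
#include <cassert>

// Interval stand-in for a value range -- purely illustrative, not
// GCC's irange/Value_Range API.
struct range { long lo, hi; };

// Interval addition: [a.lo,a.hi] + [b.lo,b.hi].
static range
radd (range a, range b)
{
  return { a.lo + b.lo, a.hi + b.hi };
}

// Range of _1 for the BB above (_3 = _5 + 1; _1 = _3 + _4), computed
// from assumed ranges of _5 and _4.  Without _3 among the names we
// compute ranges for, _1 could only be VARYING.
static range
range_of_1 (range r5, range r4)
{
  range r3 = radd (r5, range { 1, 1 });	// _3 = _5 + 1
  return radd (r3, r4);			// _1 = _3 + _4
}
```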

What I'm seeing with the threader is that we have quite some
cases where we have an "unrelated" CFG diamond on the path but
end up threading a random path through it (because neither the
path query infrastructure nor the copier knows to copy the
whole diamond).  When the def chains of the operands of the
controlling conditions are not in the import set, then I
suppose the path range computation doesn't do anything to
simplify things there (we do after all specify a condition
result by means of arbitrarily choosing one of the paths).
Such diamonds are the main source of exponential behavior
of the path discovery and I'm thinking of ways to improve
things here.  The original backwards threader, supposedly
due to a bug, only ever considered one path for continuing
beyond a diamond - but maybe that was on purpose because
the intermediate condition doesn't meaningfully contribute
to resolving the exit condition.
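A toy recursion shows the shape of that exponential behavior; this is not
the threader's actual walk, just the branching pattern of descending into
both predecessor arms of each diamond:

```cpp
#include <cassert>
#include <cstdint>

// Toy recursion mirroring how path discovery branches into both
// predecessor arms of each CFG diamond: with N diamonds between the
// exit condition and the path starts, the number of candidate paths
// doubles per diamond.
static std::uint64_t
count_paths (unsigned diamonds)
{
  if (diamonds == 0)
    return 1;				// reached a path start
  // Recurse into the then-arm and the else-arm of this diamond.
  return count_paths (diamonds - 1) + count_paths (diamonds - 1);
}
```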

Anyway, the important result of the change is that the imports
set is vastly smaller since it is now constrained to the
actual path.  Plus it now contains the local defs in blocks
of the path that do not dominate the exit block which means
it might get more threading opportunities.  Which reminds me
to re-do the cc1files experiment for this change.

> I'm sure Aldy will pitch in when he returns from vacation.

Good, I'll wait for his comments.

Richard.

> 
> 
> >
> > The following instead implements what it does during the path discovery
> > and since we add the exit block we seed the initial imports and
> > interesting names from just the exit conditional.  When we then
> > process interesting names (aka imports we have not yet seen the definition
> > of) we prune local defs but add their uses in a similar way as
> > compute_imports would have done.
> >
> > The patch also properly unwinds m_imports during the path discovery
> > backtracking and from a debugging session I have verified the two
> > sets evolve as expected now while previously behaving slightly erratically.
> >
> > Fortunately the m_imports set now also is shrunken significantly for
> > the PR69592 testcase (aka PR106514) so that there's overall speedup
> > when increasing --param max-jump-thread-duplication-stmts as
> > 15 -> 30 -> 60 -> 120 from 1s -> 2s -> 13s -> 27s to, with the patch,
> > 1s -> 2s -> 4s -> 8s.
> >
> > This runs into a latent issue in X which doesn't seem to expect
> > any PHI nodes with a constant argument on an edge inside the path.
> > But we now have those as interesting, for example for the ICEing
> > g++.dg/torture/pr100925.C which just has something like
> >
> >    if (i)
> >      x = 1;
> >    else
> >      x = 5;
> >    if (x == 1)
> >      ...
> >
> > where we now have the path from if (i) to if (x) and the PHI for x
> > in the set of imports to consider for resolving x == 1 which IMHO
> > looks exactly like what we want.  The path_range_query::ssa_range_in_phi
> > papers over the issue and drops the range to varying instead of
> > crashing.  I didn't want to mess with this any further in this patch
> > (but I couldn't resist replacing the loop over PHI args with
> > PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu w/o the
> > path_range_query::ssa_range_in_phi fix, now re-running with.
> >
> > OK?
> >
> > Thanks,
> > Richard.
> >
> >  PR tree-optimization/106514
> >  * tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
> >  Compute and unwind both m_imports and interesting on the fly during
> >  path discovery.
> >  (back_threader::find_paths): Compute the original m_imports
> >  from just the SSA uses of the exit conditional.  Drop
> >  handling single_succ_to_potentially_threadable_block.
> >  * gimple-range-path.cc (path_range_query::ssa_range_in_phi): Handle
> >  constant PHI arguments without crashing.  Use PHI_ARG_DEF_FROM_EDGE.
> > ---
> >   gcc/gimple-range-path.cc       |  52 ++++++++---------
> >   gcc/tree-ssa-threadbackward.cc | 104 ++++++++++++++++++++++++++-------
> >   2 files changed, 106 insertions(+), 50 deletions(-)
> >
> > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > index 43e7526b6fc..b4376011ea8 100644
> > --- a/gcc/gimple-range-path.cc
> > +++ b/gcc/gimple-range-path.cc
> > @@ -276,8 +276,6 @@ void
> >   path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
> >   {
> >     tree name = gimple_phi_result (phi);
> > -  basic_block bb = gimple_bb (phi);
> > -  unsigned nargs = gimple_phi_num_args (phi);
> >   
> >     if (at_entry ())
> >       {
> > @@ -287,6 +285,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi
> > *phi)
> >         // Try to fold the phi exclusively with global or cached values.
> >         // This will get things like PHI <5(99), 6(88)>.  We do this by
> >         // calling range_of_expr with no context.
> > +      unsigned nargs = gimple_phi_num_args (phi);
> >         Value_Range arg_range (TREE_TYPE (name));
> >         r.set_undefined ();
> >         for (size_t i = 0; i < nargs; ++i)
> > @@ -303,36 +302,31 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi
> > *phi)
> >         return;
> >       }
> >   +  basic_block bb = gimple_bb (phi);
> >     basic_block prev = prev_bb ();
> >     edge e_in = find_edge (prev, bb);
> > -
> > -  for (size_t i = 0; i < nargs; ++i)
> > -    if (e_in == gimple_phi_arg_edge (phi, i))
> > -      {
> > -	tree arg = gimple_phi_arg_def (phi, i);
> > -	// Avoid using the cache for ARGs defined in this block, as
> > -	// that could create an ordering problem.
> > -	if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
> > -	  {
> > -	    if (m_resolve)
> > -	      {
> > -		Value_Range tmp (TREE_TYPE (name));
> > -		// Using both the range on entry to the path, and the
> > -		// range on this edge yields significantly better
> > -		// results.
> > -		if (defined_outside_path (arg))
> > -		  range_on_path_entry (r, arg);
> > -		else
> > -		  r.set_varying (TREE_TYPE (name));
> > -		m_ranger->range_on_edge (tmp, e_in, arg);
> > -		r.intersect (tmp);
> > -		return;
> > -	      }
> > +  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e_in);
> > +  // Avoid using the cache for ARGs defined in this block, as
> > +  // that could create an ordering problem.
> > +  if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
> > +    {
> > +      if (m_resolve)
> > +	{
> > +	  Value_Range tmp (TREE_TYPE (name));
> > +	  // Using both the range on entry to the path, and the
> > +	  // range on this edge yields significantly better
> > +	  // results.
> > +	  if (TREE_CODE (arg) == SSA_NAME
> > +	      && defined_outside_path (arg))
> > +	    range_on_path_entry (r, arg);
> > +	  else
> >        r.set_varying (TREE_TYPE (name));
> > -	  }
> > -	return;
> > -      }
> > -  gcc_unreachable ();
> > +	  m_ranger->range_on_edge (tmp, e_in, arg);
> > +	  r.intersect (tmp);
> > +	  return;
> > +	}
> > +      r.set_varying (TREE_TYPE (name));
> > +    }
> >   }
> >   
> >   // If NAME is defined in BB, set R to the range of NAME, and return
> > diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> > index 4cd4d21c712..a544e97b2af 100644
> > --- a/gcc/tree-ssa-threadbackward.cc
> > +++ b/gcc/tree-ssa-threadbackward.cc
> > @@ -482,32 +482,76 @@ back_threader::find_paths_to_names (basic_block bb,
> > bitmap interesting,
> >       {
> >         // For further greedy searching we want to remove interesting
> >         // names defined in BB but add ones on the PHI edges for the
> > -      // respective edges.  We do this by starting with all names
> > +      // respective edges and adding imports from those stmts.
> > +      // We do this by starting with all names
> >         // not defined in BB as interesting, collecting a list of
> >         // interesting PHIs in BB on the fly.  Then we iterate over
> >         // predecessor edges, adding interesting PHI edge defs to
> >         // the set of interesting names to consider when processing it.
> >         auto_bitmap new_interesting;
> > +      auto_vec<int, 16> new_imports;
> >         auto_vec<gphi *, 4> interesting_phis;
> >         bitmap_iterator bi;
> >         unsigned i;
> > +      auto_vec<tree, 16> worklist;
> >          EXECUTE_IF_SET_IN_BITMAP (interesting, 0, i, bi)
> >    {
> >      tree name = ssa_name (i);
> >      gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> > +	  /* Imports remain interesting.  */
> >   	  if (gimple_bb (def_stmt) != bb)
> > -	    bitmap_set_bit (new_interesting, i);
> > -	  else if (gphi *phi = dyn_cast<gphi *> (def_stmt))
> >        {
> > -	      tree res = gimple_phi_result (phi);
> > -	      if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
> > -		interesting_phis.safe_push (phi);
> > +	      bitmap_set_bit (new_interesting, i);
> > +	      continue;
> > +	    }
> > +	  worklist.quick_push (name);
> > +	  while (!worklist.is_empty ())
> > +	    {
> > +	      tree name = worklist.pop ();
> > +	      gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> > +	      /* Newly discovered imports are interesting.  */
> > +	      if (gimple_bb (def_stmt) != bb)
> > +		{
> > +		  bitmap_set_bit (new_interesting, SSA_NAME_VERSION (name));
> > +		  continue;
> > +		}
> > +	      /* Local PHIs participate in renaming below.  */
> > +	      if (gphi *phi = dyn_cast<gphi *> (def_stmt))
> > +		{
> > +		  tree res = gimple_phi_result (phi);
> > +		  if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
> > +		    interesting_phis.safe_push (phi);
> > +		}
> > +	      /* For other local defs process their uses, amending
> > +		 imports on the way.  */
> > +	      else if (is_gimple_assign (def_stmt))
> > +		{
> > +		  for (unsigned j = 1; j < gimple_num_ops (def_stmt); ++j)
> > +		    {
> > +		      tree rhs = gimple_op (def_stmt, j);
> > +		      if (TREE_CODE (rhs) == SSA_NAME
> > +			  && bitmap_set_bit (m_imports,
> > +					     SSA_NAME_VERSION (rhs)))
> > +			{
> > +			  /* ???  We probably want to track a 'visited'
> > +			     flag separately and add to imports only
> > +			     when the def is handled by ranger.  The
> > +			     current way adds us defs that are neither
> > +			     PHIs nor "interesting" assigns, like for
> > +			     example loads.  */
> > +			  new_imports.safe_push (SSA_NAME_VERSION (rhs));
> > +			  worklist.safe_push (rhs);
> > +			}
> > +		    }
> > +		}
> >        }
> >   	}
> > +      worklist.release ();
> >         if (!bitmap_empty_p (new_interesting)
> >      || !interesting_phis.is_empty ())
> >   	{
> > -	  auto_vec<tree, 4> unwind (interesting_phis.length ());
> > +	  auto_vec<int, 4> unwind (interesting_phis.length ());
> > +	  auto_vec<int, 4> imports_unwind (interesting_phis.length ());
> >      edge_iterator iter;
> >      edge e;
> >      FOR_EACH_EDGE (e, iter, bb->preds)
> > @@ -525,22 +569,31 @@ back_threader::find_paths_to_names (basic_block bb,
> > bitmap interesting,
> >     {
> >       tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
> >       if (TREE_CODE (def) == SSA_NAME)
> > -		    if (bitmap_set_bit (new_interesting,
> > -					SSA_NAME_VERSION (def)))
> > -		      {
> > -			bitmap_set_bit (m_imports, SSA_NAME_VERSION (def));
> > -			unwind.quick_push (def);
> > -		      }
> > +		    {
> > +		      int ver = SSA_NAME_VERSION (def);
> > +		      if (bitmap_set_bit (new_interesting, ver))
> > +			{
> > +			  if (bitmap_set_bit (m_imports, ver))
> > +			    imports_unwind.quick_push (ver);
> > +			  unwind.quick_push (ver);
> > +			}
> > +		    }
> >          	}
> >   	      find_paths_to_names (e->src, new_interesting, overall_paths);
> > -	      // Restore new_interesting.  We leave m_imports alone since
> > -	      // we do not prune defs in BB from it and separately keeping
> > -	      // track of which bits to unwind isn't worth the trouble.
> > -	      for (tree def : unwind)
> > -		bitmap_clear_bit (new_interesting, SSA_NAME_VERSION (def));
> > +	      // Restore new_interesting.
> > +	      for (int def : unwind)
> > +		bitmap_clear_bit (new_interesting, def);
> >   	      unwind.truncate (0);
> > +	      // Restore m_imports.
> > +	      for (int def : imports_unwind)
> > +		bitmap_clear_bit (m_imports, def);
> > +	      imports_unwind.truncate (0);
> >        }
> >   	}
> > +      /* m_imports tracks all interesting names on the path, so when
> > +	 backtracking we have to restore it.  */
> > +      for (int j : new_imports)
> > +	bitmap_clear_bit (m_imports, j);
> >       }
> >     else if (dump_file && (dump_flags & TDF_DETAILS))
> >       fprintf (dump_file, "  FAIL: Search space limit %d reached.\n",
> > @@ -566,15 +619,24 @@ back_threader::find_paths (basic_block bb, tree name)
> >       	  && gimple_code (stmt) != GIMPLE_SWITCH))
> >       return;
> >   -  if (EDGE_COUNT (bb->succs) > 1
> > -      || single_succ_to_potentially_threadable_block (bb))
> > +  if (EDGE_COUNT (bb->succs) > 1)
> >       {
> >         m_last_stmt = stmt;
> >         m_visited_bbs.empty ();
> >         m_path.truncate (0);
> >         m_name = name;
> > -      m_solver->compute_imports (m_imports, bb);
> >   +      // We compute imports of the path during discovery starting
> > +      // just with names used in the conditional.
> > +      bitmap_clear (m_imports);
> > +      ssa_op_iter iter;
> > +      FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
> > +	bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
> > +
> > +      // Interesting is the set of imports we have not yet seen
> > +      // the definition of.  So while imports only grow, the
> > +      // set of interesting defs dwindles and once empty we can
> > +      // stop searching.
> >         auto_bitmap interesting;
> >         bitmap_copy (interesting, m_imports);
> >         find_paths_to_names (bb, interesting, 1);
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading
       [not found] <37255.122080909012202281@us-mta-359.us.mimecast.lan>
@ 2022-08-09 18:52 ` Andrew MacLeod
  2022-08-10  6:45   ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew MacLeod @ 2022-08-09 18:52 UTC (permalink / raw)
  To: Richard Biener, gcc-patches


On 8/9/22 09:01, Richard Biener wrote:
> This revisits how we compute imports later used for the ranger path
> query during backwards threading.  The compute_imports function
> of the path solver ends up pulling the SSA def chain of regular
> stmts without limit and since it starts with just the gori imports
> of the path exit it misses some interesting names to translate
> during path discovery.  In fact with a still empty path this
> compute_imports function does not look like the correct tool.

I don't really know how this works in practice.  Aldy's off this week, so 
he can comment when he returns.

The original premise was along the lines of recognizing that only changes 
to a GORI import name to a block can affect the branch at the end of the 
block.  I.e., if the path doesn't change any import to block A, then the 
branch at the end of block A will not change either.    Likewise, if it 
does change an import, then we look at whether the branch can be 
threaded.    Beyond that basic premise, I don't know what all it does.
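That premise can be illustrated standalone (hand-rolled tri-state and
ranges as a stand-in, not the GORI engine): the branch at the end of a
block like "_3 = _5 + 1; if (_3 > 10)" is a function of the block's single
import _5, so only a path that changes _5's range can change the branch:

```cpp
#include <cassert>

// Tri-state outcome for the branch at the end of the block.
enum branch_state { UNKNOWN, ALWAYS_TRUE, ALWAYS_FALSE };

// Evaluate "if (_3 > 10)" with _3 = _5 + 1 over the range [lo, hi] of
// the import _5 on a path.  Narrowing _5 enough makes the branch known
// (threadable); leaving _5 untouched cannot change the outcome.
static branch_state
branch_on_import (long lo, long hi)
{
  long rlo = lo + 1, rhi = hi + 1;	// local def chain: _3 = _5 + 1
  if (rlo > 10)
    return ALWAYS_TRUE;			// all of the range satisfies _3 > 10
  if (rhi <= 10)
    return ALWAYS_FALSE;		// none of the range satisfies it
  return UNKNOWN;
}
```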

I presume the unbounded def chain is for local defs within a block that 
in turn feed the import to another block.   I'm not sure why we need to 
do much with those..  again, it's only the import to the defchain that 
can affect the outcome at the end of the chain.. and if it changes, then 
you need to recalculate the entire chain.. but that would be part of the 
normal path walk.  I suspect there is also some pruning that can be done 
there, as GORI reflects "can affect the range" not "will affect the range".

Perhaps what's going on is that all those local elements are being added 
up front to the list of interesting names?  That would certainly blow up 
the bitmaps and loops and such.

I'm sure Aldy will pitch in when he returns from vacation.



>
> The following instead implements what it does during the path discovery
> and since we add the exit block we seed the initial imports and
> interesting names from just the exit conditional.  When we then
> process interesting names (aka imports we have not yet seen the definition
> of) we prune local defs but add their uses in a similar way as
> compute_imports would have done.
>
> The patch also properly unwinds m_imports during the path discovery
> backtracking and from a debugging session I have verified the two
> sets evolve as expected now while previously behaving slightly erratically.
>
> Fortunately the m_imports set now also is shrunken significantly for
> the PR69592 testcase (aka PR106514) so that there's overall speedup
> when increasing --param max-jump-thread-duplication-stmts as
> 15 -> 30 -> 60 -> 120 from 1s -> 2s -> 13s -> 27s to, with the patch,
> 1s -> 2s -> 4s -> 8s.
>
> This runs into a latent issue in X which doesn't seem to expect
> any PHI nodes with a constant argument on an edge inside the path.
> But we now have those as interesting, for example for the ICEing
> g++.dg/torture/pr100925.C which just has something like
>
>    if (i)
>      x = 1;
>    else
>      x = 5;
>    if (x == 1)
>      ...
>
> where we now have the path from if (i) to if (x) and the PHI for x
> in the set of imports to consider for resolving x == 1 which IMHO
> looks exactly like what we want.  The path_range_query::ssa_range_in_phi
> papers over the issue and drops the range to varying instead of
> crashing.  I didn't want to mess with this any further in this patch
> (but I couldn't resist replacing the loop over PHI args with
> PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu w/o the
> path_range_query::ssa_range_in_phi fix, now re-running with.
>
> OK?
>
> Thanks,
> Richard.
>
> 	PR tree-optimization/106514
> 	* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
> 	Compute and unwind both m_imports and interesting on the fly during
> 	path discovery.
> 	(back_threader::find_paths): Compute the original m_imports
> 	from just the SSA uses of the exit conditional.  Drop
> 	handling single_succ_to_potentially_threadable_block.
> 	* gimple-range-path.cc (path_range_query::ssa_range_in_phi): Handle
> 	constant PHI arguments without crashing.  Use PHI_ARG_DEF_FROM_EDGE.
> ---
>   gcc/gimple-range-path.cc       |  52 ++++++++---------
>   gcc/tree-ssa-threadbackward.cc | 104 ++++++++++++++++++++++++++-------
>   2 files changed, 106 insertions(+), 50 deletions(-)
>
> diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> index 43e7526b6fc..b4376011ea8 100644
> --- a/gcc/gimple-range-path.cc
> +++ b/gcc/gimple-range-path.cc
> @@ -276,8 +276,6 @@ void
>   path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
>   {
>     tree name = gimple_phi_result (phi);
> -  basic_block bb = gimple_bb (phi);
> -  unsigned nargs = gimple_phi_num_args (phi);
>   
>     if (at_entry ())
>       {
> @@ -287,6 +285,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
>         // Try to fold the phi exclusively with global or cached values.
>         // This will get things like PHI <5(99), 6(88)>.  We do this by
>         // calling range_of_expr with no context.
> +      unsigned nargs = gimple_phi_num_args (phi);
>         Value_Range arg_range (TREE_TYPE (name));
>         r.set_undefined ();
>         for (size_t i = 0; i < nargs; ++i)
> @@ -303,36 +302,31 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
>         return;
>       }
>   
> +  basic_block bb = gimple_bb (phi);
>     basic_block prev = prev_bb ();
>     edge e_in = find_edge (prev, bb);
> -
> -  for (size_t i = 0; i < nargs; ++i)
> -    if (e_in == gimple_phi_arg_edge (phi, i))
> -      {
> -	tree arg = gimple_phi_arg_def (phi, i);
> -	// Avoid using the cache for ARGs defined in this block, as
> -	// that could create an ordering problem.
> -	if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
> -	  {
> -	    if (m_resolve)
> -	      {
> -		Value_Range tmp (TREE_TYPE (name));
> -		// Using both the range on entry to the path, and the
> -		// range on this edge yields significantly better
> -		// results.
> -		if (defined_outside_path (arg))
> -		  range_on_path_entry (r, arg);
> -		else
> -		  r.set_varying (TREE_TYPE (name));
> -		m_ranger->range_on_edge (tmp, e_in, arg);
> -		r.intersect (tmp);
> -		return;
> -	      }
> +  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e_in);
> +  // Avoid using the cache for ARGs defined in this block, as
> +  // that could create an ordering problem.
> +  if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
> +    {
> +      if (m_resolve)
> +	{
> +	  Value_Range tmp (TREE_TYPE (name));
> +	  // Using both the range on entry to the path, and the
> +	  // range on this edge yields significantly better
> +	  // results.
> +	  if (TREE_CODE (arg) == SSA_NAME
> +	      && defined_outside_path (arg))
> +	    range_on_path_entry (r, arg);
> +	  else
>   	    r.set_varying (TREE_TYPE (name));
> -	  }
> -	return;
> -      }
> -  gcc_unreachable ();
> +	  m_ranger->range_on_edge (tmp, e_in, arg);
> +	  r.intersect (tmp);
> +	  return;
> +	}
> +      r.set_varying (TREE_TYPE (name));
> +    }
>   }
>   
>   // If NAME is defined in BB, set R to the range of NAME, and return
> diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> index 4cd4d21c712..a544e97b2af 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -482,32 +482,76 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
>       {
>         // For further greedy searching we want to remove interesting
>         // names defined in BB but add ones on the PHI edges for the
> -      // respective edges.  We do this by starting with all names
> +      // respective edges and adding imports from those stmts.
> +      // We do this by starting with all names
>         // not defined in BB as interesting, collecting a list of
>         // interesting PHIs in BB on the fly.  Then we iterate over
>         // predecessor edges, adding interesting PHI edge defs to
>         // the set of interesting names to consider when processing it.
>         auto_bitmap new_interesting;
> +      auto_vec<int, 16> new_imports;
>         auto_vec<gphi *, 4> interesting_phis;
>         bitmap_iterator bi;
>         unsigned i;
> +      auto_vec<tree, 16> worklist;
>         EXECUTE_IF_SET_IN_BITMAP (interesting, 0, i, bi)
>   	{
>   	  tree name = ssa_name (i);
>   	  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> +	  /* Imports remain interesting.  */
>   	  if (gimple_bb (def_stmt) != bb)
> -	    bitmap_set_bit (new_interesting, i);
> -	  else if (gphi *phi = dyn_cast<gphi *> (def_stmt))
>   	    {
> -	      tree res = gimple_phi_result (phi);
> -	      if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
> -		interesting_phis.safe_push (phi);
> +	      bitmap_set_bit (new_interesting, i);
> +	      continue;
> +	    }
> +	  worklist.quick_push (name);
> +	  while (!worklist.is_empty ())
> +	    {
> +	      tree name = worklist.pop ();
> +	      gimple *def_stmt = SSA_NAME_DEF_STMT (name);
> +	      /* Newly discovered imports are interesting.  */
> +	      if (gimple_bb (def_stmt) != bb)
> +		{
> +		  bitmap_set_bit (new_interesting, SSA_NAME_VERSION (name));
> +		  continue;
> +		}
> +	      /* Local PHIs participate in renaming below.  */
> +	      if (gphi *phi = dyn_cast<gphi *> (def_stmt))
> +		{
> +		  tree res = gimple_phi_result (phi);
> +		  if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
> +		    interesting_phis.safe_push (phi);
> +		}
> +	      /* For other local defs process their uses, amending
> +		 imports on the way.  */
> +	      else if (is_gimple_assign (def_stmt))
> +		{
> +		  for (unsigned j = 1; j < gimple_num_ops (def_stmt); ++j)
> +		    {
> +		      tree rhs = gimple_op (def_stmt, j);
> +		      if (TREE_CODE (rhs) == SSA_NAME
> +			  && bitmap_set_bit (m_imports,
> +					     SSA_NAME_VERSION (rhs)))
> +			{
> +			  /* ???  We probably want to track a 'visited'
> +			     flag separately and add to imports only
> +			     when the def is handled by ranger.  The
> +			     current way adds us defs that are neither
> +			     PHIs nor "interesting" assigns, like for
> +			     example loads.  */
> +			  new_imports.safe_push (SSA_NAME_VERSION (rhs));
> +			  worklist.safe_push (rhs);
> +			}
> +		    }
> +		}
>   	    }
>   	}
> +      worklist.release ();
>         if (!bitmap_empty_p (new_interesting)
>   	  || !interesting_phis.is_empty ())
>   	{
> -	  auto_vec<tree, 4> unwind (interesting_phis.length ());
> +	  auto_vec<int, 4> unwind (interesting_phis.length ());
> +	  auto_vec<int, 4> imports_unwind (interesting_phis.length ());
>   	  edge_iterator iter;
>   	  edge e;
>   	  FOR_EACH_EDGE (e, iter, bb->preds)
> @@ -525,22 +569,31 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
>   		{
>   		  tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
>   		  if (TREE_CODE (def) == SSA_NAME)
> -		    if (bitmap_set_bit (new_interesting,
> -					SSA_NAME_VERSION (def)))
> -		      {
> -			bitmap_set_bit (m_imports, SSA_NAME_VERSION (def));
> -			unwind.quick_push (def);
> -		      }
> +		    {
> +		      int ver = SSA_NAME_VERSION (def);
> +		      if (bitmap_set_bit (new_interesting, ver))
> +			{
> +			  if (bitmap_set_bit (m_imports, ver))
> +			    imports_unwind.quick_push (ver);
> +			  unwind.quick_push (ver);
> +			}
> +		    }
>   		}
>   	      find_paths_to_names (e->src, new_interesting, overall_paths);
> -	      // Restore new_interesting.  We leave m_imports alone since
> -	      // we do not prune defs in BB from it and separately keeping
> -	      // track of which bits to unwind isn't worth the trouble.
> -	      for (tree def : unwind)
> -		bitmap_clear_bit (new_interesting, SSA_NAME_VERSION (def));
> +	      // Restore new_interesting.
> +	      for (int def : unwind)
> +		bitmap_clear_bit (new_interesting, def);
>   	      unwind.truncate (0);
> +	      // Restore m_imports.
> +	      for (int def : imports_unwind)
> +		bitmap_clear_bit (m_imports, def);
> +	      imports_unwind.truncate (0);
>   	    }
>   	}
> +      /* m_imports tracks all interesting names on the path, so when
> +	 backtracking we have to restore it.  */
> +      for (int j : new_imports)
> +	bitmap_clear_bit (m_imports, j);
>       }
>     else if (dump_file && (dump_flags & TDF_DETAILS))
>       fprintf (dump_file, "  FAIL: Search space limit %d reached.\n",
> @@ -566,15 +619,24 @@ back_threader::find_paths (basic_block bb, tree name)
>   	  && gimple_code (stmt) != GIMPLE_SWITCH))
>       return;
>   
> -  if (EDGE_COUNT (bb->succs) > 1
> -      || single_succ_to_potentially_threadable_block (bb))
> +  if (EDGE_COUNT (bb->succs) > 1)
>       {
>         m_last_stmt = stmt;
>         m_visited_bbs.empty ();
>         m_path.truncate (0);
>         m_name = name;
> -      m_solver->compute_imports (m_imports, bb);
>   
> +      // We compute imports of the path during discovery starting
> +      // just with names used in the conditional.
> +      bitmap_clear (m_imports);
> +      ssa_op_iter iter;
> +      FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
> +	bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
> +
> +      // Interesting is the set of imports we have not yet seen
> +      // the definition of.  So while imports only grow, the
> +      // set of interesting defs dwindles and once empty we can
> +      // stop searching.
>         auto_bitmap interesting;
>         bitmap_copy (interesting, m_imports);
>         find_paths_to_names (bb, interesting, 1);


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading
@ 2022-08-09 13:01 Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2022-08-09 13:01 UTC (permalink / raw)
  To: gcc-patches

This revisits how we compute imports later used for the ranger path
query during backwards threading.  The compute_imports function
of the path solver ends up pulling the SSA def chain of regular
stmts without limit and since it starts with just the gori imports
of the path exit it misses some interesting names to translate
during path discovery.  In fact with a still empty path this
compute_imports function does not look like the correct tool.

The following instead implements what it does during the path discovery
and since we add the exit block we seed the initial imports and
interesting names from just the exit conditional.  When we then
process interesting names (aka imports we have not yet seen the definition
of) we prune local defs but add their uses in a similar way as
compute_imports would have done.

The patch also properly unwinds m_imports during the path discovery
backtracking and from a debugging session I have verified the two
sets evolve as expected now while previously behaving slightly erratically.

Fortunately the m_imports set now also is shrunken significantly for
the PR69592 testcase (aka PR106514) so that there's overall speedup
when increasing --param max-jump-thread-duplication-stmts as
15 -> 30 -> 60 -> 120 from 1s -> 2s -> 13s -> 27s to, with the patch,
1s -> 2s -> 4s -> 8s.

This runs into a latent issue in X which doesn't seem to expect
any PHI nodes with a constant argument on an edge inside the path.
But we now have those as interesting, for example for the ICEing
g++.dg/torture/pr100925.C which just has something like

  if (i)
    x = 1;
  else
    x = 5;
  if (x == 1)
    ...

where we now have the path from if (i) to if (x) and the PHI for x
in the set of imports to consider for resolving x == 1 which IMHO
looks exactly like what we want.  The path_range_query::ssa_range_in_phi
papers over the issue and drops the range to varying instead of
crashing.  I didn't want to mess with this any further in this patch
(but I couldn't resist replacing the loop over PHI args with
PHI_ARG_DEF_FROM_EDGE, so mind the re-indenting).

Bootstrapped and tested on x86_64-unknown-linux-gnu w/o the
path_range_query::ssa_range_in_phi fix, now re-running with.

OK?

Thanks,
Richard.

	PR tree-optimization/106514
	* tree-ssa-threadbackward.cc (back_threader::find_paths_to_names):
	Compute and unwind both m_imports and interesting on the fly during
	path discovery.
	(back_threader::find_paths): Compute the original m_imports
	from just the SSA uses of the exit conditional.  Drop
	handling single_succ_to_potentially_threadable_block.
	* gimple-range-path.cc (path_range_query::ssa_range_in_phi): Handle
	constant PHI arguments without crashing.  Use PHI_ARG_DEF_FROM_EDGE.
---
 gcc/gimple-range-path.cc       |  52 ++++++++---------
 gcc/tree-ssa-threadbackward.cc | 104 ++++++++++++++++++++++++++-------
 2 files changed, 106 insertions(+), 50 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 43e7526b6fc..b4376011ea8 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -276,8 +276,6 @@ void
 path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
 {
   tree name = gimple_phi_result (phi);
-  basic_block bb = gimple_bb (phi);
-  unsigned nargs = gimple_phi_num_args (phi);
 
   if (at_entry ())
     {
@@ -287,6 +285,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
       // Try to fold the phi exclusively with global or cached values.
       // This will get things like PHI <5(99), 6(88)>.  We do this by
       // calling range_of_expr with no context.
+      unsigned nargs = gimple_phi_num_args (phi);
       Value_Range arg_range (TREE_TYPE (name));
       r.set_undefined ();
       for (size_t i = 0; i < nargs; ++i)
@@ -303,36 +302,31 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
       return;
     }
 
+  basic_block bb = gimple_bb (phi);
   basic_block prev = prev_bb ();
   edge e_in = find_edge (prev, bb);
-
-  for (size_t i = 0; i < nargs; ++i)
-    if (e_in == gimple_phi_arg_edge (phi, i))
-      {
-	tree arg = gimple_phi_arg_def (phi, i);
-	// Avoid using the cache for ARGs defined in this block, as
-	// that could create an ordering problem.
-	if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
-	  {
-	    if (m_resolve)
-	      {
-		Value_Range tmp (TREE_TYPE (name));
-		// Using both the range on entry to the path, and the
-		// range on this edge yields significantly better
-		// results.
-		if (defined_outside_path (arg))
-		  range_on_path_entry (r, arg);
-		else
-		  r.set_varying (TREE_TYPE (name));
-		m_ranger->range_on_edge (tmp, e_in, arg);
-		r.intersect (tmp);
-		return;
-	      }
+  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e_in);
+  // Avoid using the cache for ARGs defined in this block, as
+  // that could create an ordering problem.
+  if (ssa_defined_in_bb (arg, bb) || !get_cache (r, arg))
+    {
+      if (m_resolve)
+	{
+	  Value_Range tmp (TREE_TYPE (name));
+	  // Using both the range on entry to the path, and the
+	  // range on this edge yields significantly better
+	  // results.
+	  if (TREE_CODE (arg) == SSA_NAME
+	      && defined_outside_path (arg))
+	    range_on_path_entry (r, arg);
+	  else
 	    r.set_varying (TREE_TYPE (name));
-	  }
-	return;
-      }
-  gcc_unreachable ();
+	  m_ranger->range_on_edge (tmp, e_in, arg);
+	  r.intersect (tmp);
+	  return;
+	}
+      r.set_varying (TREE_TYPE (name));
+    }
 }
 
 // If NAME is defined in BB, set R to the range of NAME, and return
diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
index 4cd4d21c712..a544e97b2af 100644
--- a/gcc/tree-ssa-threadbackward.cc
+++ b/gcc/tree-ssa-threadbackward.cc
@@ -482,32 +482,76 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
     {
       // For further greedy searching we want to remove interesting
       // names defined in BB but add ones on the PHI edges for the
-      // respective edges.  We do this by starting with all names
+      // respective edges and adding imports from those stmts.
+      // We do this by starting with all names
       // not defined in BB as interesting, collecting a list of
       // interesting PHIs in BB on the fly.  Then we iterate over
       // predecessor edges, adding interesting PHI edge defs to
       // the set of interesting names to consider when processing it.
       auto_bitmap new_interesting;
+      auto_vec<int, 16> new_imports;
       auto_vec<gphi *, 4> interesting_phis;
       bitmap_iterator bi;
       unsigned i;
+      auto_vec<tree, 16> worklist;
       EXECUTE_IF_SET_IN_BITMAP (interesting, 0, i, bi)
 	{
 	  tree name = ssa_name (i);
 	  gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+	  /* Imports remain interesting.  */
 	  if (gimple_bb (def_stmt) != bb)
-	    bitmap_set_bit (new_interesting, i);
-	  else if (gphi *phi = dyn_cast<gphi *> (def_stmt))
 	    {
-	      tree res = gimple_phi_result (phi);
-	      if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
-		interesting_phis.safe_push (phi);
+	      bitmap_set_bit (new_interesting, i);
+	      continue;
+	    }
+	  worklist.quick_push (name);
+	  while (!worklist.is_empty ())
+	    {
+	      tree name = worklist.pop ();
+	      gimple *def_stmt = SSA_NAME_DEF_STMT (name);
+	      /* Newly discovered imports are interesting.  */
+	      if (gimple_bb (def_stmt) != bb)
+		{
+		  bitmap_set_bit (new_interesting, SSA_NAME_VERSION (name));
+		  continue;
+		}
+	      /* Local PHIs participate in renaming below.  */
+	      if (gphi *phi = dyn_cast<gphi *> (def_stmt))
+		{
+		  tree res = gimple_phi_result (phi);
+		  if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (res))
+		    interesting_phis.safe_push (phi);
+		}
+	      /* For other local defs process their uses, amending
+		 imports on the way.  */
+	      else if (is_gimple_assign (def_stmt))
+		{
+		  for (unsigned j = 1; j < gimple_num_ops (def_stmt); ++j)
+		    {
+		      tree rhs = gimple_op (def_stmt, j);
+		      if (TREE_CODE (rhs) == SSA_NAME
+			  && bitmap_set_bit (m_imports,
+					     SSA_NAME_VERSION (rhs)))
+			{
+			  /* ???  We probably want to track a 'visited'
+			     flag separately and add to imports only
+			     when the def is handled by ranger.  The
+			     current way adds defs that are neither
+			     PHIs nor "interesting" assigns, like for
+			     example loads.  */
+			  new_imports.safe_push (SSA_NAME_VERSION (rhs));
+			  worklist.safe_push (rhs);
+			}
+		    }
+		}
 	    }
 	}
+      worklist.release ();
       if (!bitmap_empty_p (new_interesting)
 	  || !interesting_phis.is_empty ())
 	{
-	  auto_vec<tree, 4> unwind (interesting_phis.length ());
+	  auto_vec<int, 4> unwind (interesting_phis.length ());
+	  auto_vec<int, 4> imports_unwind (interesting_phis.length ());
 	  edge_iterator iter;
 	  edge e;
 	  FOR_EACH_EDGE (e, iter, bb->preds)
@@ -525,22 +569,31 @@ back_threader::find_paths_to_names (basic_block bb, bitmap interesting,
 		{
 		  tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
 		  if (TREE_CODE (def) == SSA_NAME)
-		    if (bitmap_set_bit (new_interesting,
-					SSA_NAME_VERSION (def)))
-		      {
-			bitmap_set_bit (m_imports, SSA_NAME_VERSION (def));
-			unwind.quick_push (def);
-		      }
+		    {
+		      int ver = SSA_NAME_VERSION (def);
+		      if (bitmap_set_bit (new_interesting, ver))
+			{
+			  if (bitmap_set_bit (m_imports, ver))
+			    imports_unwind.quick_push (ver);
+			  unwind.quick_push (ver);
+			}
+		    }
 		}
 	      find_paths_to_names (e->src, new_interesting, overall_paths);
-	      // Restore new_interesting.  We leave m_imports alone since
-	      // we do not prune defs in BB from it and separately keeping
-	      // track of which bits to unwind isn't worth the trouble.
-	      for (tree def : unwind)
-		bitmap_clear_bit (new_interesting, SSA_NAME_VERSION (def));
+	      // Restore new_interesting.
+	      for (int def : unwind)
+		bitmap_clear_bit (new_interesting, def);
 	      unwind.truncate (0);
+	      // Restore m_imports.
+	      for (int def : imports_unwind)
+		bitmap_clear_bit (m_imports, def);
+	      imports_unwind.truncate (0);
 	    }
 	}
+      /* m_imports tracks all interesting names on the path, so when
+	 backtracking we have to restore it.  */
+      for (int j : new_imports)
+	bitmap_clear_bit (m_imports, j);
     }
   else if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "  FAIL: Search space limit %d reached.\n",
@@ -566,15 +619,24 @@ back_threader::find_paths (basic_block bb, tree name)
 	  && gimple_code (stmt) != GIMPLE_SWITCH))
     return;
 
-  if (EDGE_COUNT (bb->succs) > 1
-      || single_succ_to_potentially_threadable_block (bb))
+  if (EDGE_COUNT (bb->succs) > 1)
     {
       m_last_stmt = stmt;
       m_visited_bbs.empty ();
       m_path.truncate (0);
       m_name = name;
-      m_solver->compute_imports (m_imports, bb);
 
+      // We compute imports of the path during discovery starting
+      // just with names used in the conditional.
+      bitmap_clear (m_imports);
+      ssa_op_iter iter;
+      FOR_EACH_SSA_TREE_OPERAND (name, stmt, iter, SSA_OP_USE)
+	bitmap_set_bit (m_imports, SSA_NAME_VERSION (name));
+
+      // Interesting is the set of imports we have not yet seen
+      // the definition of.  So while imports only grow, the
+      // set of interesting defs dwindles and once empty we can
+      // stop searching.
       auto_bitmap interesting;
       bitmap_copy (interesting, m_imports);
       find_paths_to_names (bb, interesting, 1);
-- 
2.35.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-08-15  9:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-09 13:01 [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading Richard Biener
2022-08-09 13:01 Richard Biener
2022-08-09 13:01 Richard Biener
     [not found] <37255.122080909012202281@us-mta-359.us.mimecast.lan>
2022-08-09 18:52 ` Andrew MacLeod
2022-08-10  6:45   ` Richard Biener
2022-08-10 10:46     ` Richard Biener
2022-08-10 16:08       ` Aldy Hernandez
2022-08-11 17:46       ` Andrew MacLeod
2022-08-15  9:59         ` Richard Biener
2022-08-10 15:42     ` Aldy Hernandez
