public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
@ 2015-10-07  2:26 Jeff Law
  2015-10-07  8:26 ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Law @ 2015-10-07  2:26 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1723 bytes --]


As touched on in the BZ, we record jump threads as a list of edges to 
traverse.  A jump thread may be recorded through a block which hasn't 
been optimized by DOM yet.

If DOM is able to optimize a control flow statement in such a block, 
then it will remove one or more outgoing edges from the block.

Removal of an edge triggers releasing the edge back to the GC system, so 
naturally bad things happen if the threader then looks at the content of 
those edges.

After some instrumentation I found this was happening for both jump 
threads with joiner blocks as well as FSM jump threads.  The former are 
actually more common than the latter.

Given the sequencing of the recording of the jump thread, DOM optimizing 
the COND_EXPR, subsequent releasing the edge structure and finally 
examination of the jump thread to modify the CFG I saw only one good 
solution and one bad solution.

The bad solution involved removing jump thread paths at the time in 
which DOM optimizes the COND_EXPR.  The problem is we have to walk the 
entire path of every recorded jump thread each time DOM performs this 
optimization.  Obviously not good.

So instead we record the affected edge pointers into a hash table and 
use those later to prune the jump thread paths.  It's a single walk over 
each recorded jump threading path.  Since we're not looking at the 
contents of the edge, this works reasonably well.

One more reason to push harder for jump threading to occur independently 
of DOM using Sebastian's backward walking FSM bits.

Anyway, bootstrapped and regression tested on x86_64-linux-gnu.  Also 
bootstrapped and testcase tested on x86_64-linux-gnu with just release 
checking enabled.

Installed on the trunk.

Jeff

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 6430 bytes --]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 732b3d1..db6f1b6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,17 @@
+2015-10-06  Jeff Law  <law@redhat.com>
+
+	PR tree-optimization/67816
+	* tree-ssa-threadupdate.h (remove_jump_threads_including): Renamed
+	from remove_jump_threads_starting_at.  Accept an edge rather than
+	a basic block.
+	* tree-ssa-threadupdate.c (removed_edges): New hash table.
+	(remove_jump_threads_including): Note edges that get removed from
+	the CFG for later pruning of jump threading paths including them.
+	(thread_through_all_blocks): Remove paths which include edges that
+	have been removed.
+	* tree-ssa-dom.c (optimize_stmt): Call remove_jump_threads_including
+	on each outgoing edges when optimizing away a control statement.
+
 2015-10-06  Trevor Saunders  <tbsaunde+gcc@tbsaunde.org>
 
 	* reorg.c (emit_delay_sequence): Store list of delay slot insns
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4ec4743..1882fbd 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2015-10-06  Jeff Law  <law@redhat.com>
+
+	* gcc.c-torture/compile/pr67816.c: New test.
+
 2015-10-07  Kugan Vivekanandarajah  <kuganv@linaro.org>
 
 	* gcc.target/aarch64/get_lane_f16_1.c: New test.
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr67816.c b/gcc/testsuite/gcc.c-torture/compile/pr67816.c
new file mode 100644
index 0000000..c3db3a3
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr67816.c
@@ -0,0 +1,19 @@
+int a, c, d, e;
+int b[10];
+void fn1() {
+  int i, f = 0;
+  for (;;) {
+    i = 0;
+    for (; i < a; i++)
+      if (c) {
+        if (b[i])
+          f = 1;
+      } else if (b[i])
+        f = 0;
+    if (f)
+      d++;
+    while (e)
+      ;
+  }
+}
+
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index c226da5..941087d 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1840,8 +1840,13 @@ optimize_stmt (basic_block bb, gimple_stmt_iterator si,
 	  edge taken_edge = find_taken_edge (bb, val);
 	  if (taken_edge)
 	    {
-	      /* Delete threads that start at BB.  */
-	      remove_jump_threads_starting_at (bb);
+
+	      /* We need to remove any queued jump threads that
+		 reference outgoing edges from this block.  */
+	      edge_iterator ei;
+	      edge e;
+	      FOR_EACH_EDGE (e, ei, bb->succs)
+		remove_jump_threads_including (e);
 
 	      /* If BB is in a loop, then removing an outgoing edge from BB
 		 may cause BB to move outside the loop, changes in the
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 4a147bb..26b199b 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -215,6 +215,18 @@ redirection_data::equal (const redirection_data *p1, const redirection_data *p2)
   return true;
 }
 
+/* Rather than search all the edges in jump thread paths each time
+   DOM is able to simply if control statement, we build a hash table
+   with the deleted edges.  We only care about the address of the edge,
+   not its contents.  */
+struct removed_edges : nofree_ptr_hash<edge_def>
+{
+  static hashval_t hash (edge e) { return htab_hash_pointer (e); }
+  static bool equal (edge e1, edge e2) { return e1 == e2; }
+};
+
+static hash_table<removed_edges> *removed_edges;
+
 /* Data structure of information to pass to hash table traversal routines.  */
 struct ssa_local_info_t
 {
@@ -2539,35 +2551,23 @@ valid_jump_thread_path (vec<jump_thread_edge *> *path)
   return true;
 }
 
-/* Remove any queued jump threads that start at BB.  */
+/* Remove any queued jump threads that include edge E.
+
+   We don't actually remove them here, just record the edges into ax
+   hash table.  That way we can do the search once per iteration of
+   DOM/VRP rather than for every case where DOM optimizes away a COND_EXPR.  */
 
 void
-remove_jump_threads_starting_at (basic_block bb)
+remove_jump_threads_including (edge_def *e)
 {
   if (!paths.exists ())
     return;
 
-  for (unsigned i = 0; i < paths.length ();)
-    {
-      vec<jump_thread_edge *> *path = paths[i];
+  if (!removed_edges)
+    removed_edges = new hash_table<struct removed_edges> (17);
 
-      /* Sadly, FSM jump threads have a slightly different
-	 representation than the rest of the jump threads.  */
-      if ((*path)[0]->type == EDGE_FSM_THREAD
-	  && (*path)[0]->e->src == bb)
-	{
-	  delete_jump_thread_path (path);
-	  paths.unordered_remove (i);
-	}
-      else if ((*path)[0]->type != EDGE_FSM_THREAD
-	       && (*path)[0]->e->dest == bb)
-	{
-	  delete_jump_thread_path (path);
-	  paths.unordered_remove (i);
-	}
-      else
-	i++;
-    }
+  edge *slot = removed_edges->find_slot (e, INSERT);
+  *slot = e;
 }
 
 /* Walk through all blocks and thread incoming edges to the appropriate
@@ -2591,11 +2591,37 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   struct loop *loop;
 
   if (!paths.exists ())
-    return false;
+    {
+      retval = false;
+      goto out;
+    }
 
   threaded_blocks = BITMAP_ALLOC (NULL);
   memset (&thread_stats, 0, sizeof (thread_stats));
 
+  /* Remove any paths that referenced removed edges.  */
+  if (removed_edges)
+    for (i = 0; i < paths.length (); )
+      {
+	unsigned int j;
+	vec<jump_thread_edge *> *path = paths[i];
+
+	for (j = 0; j < path->length (); j++)
+	  {
+	    edge e = (*path)[j]->e;
+	    if (removed_edges->find_slot (e, NO_INSERT))
+	      break;
+	  }
+
+	if (j != path->length ())
+	  {
+	    delete_jump_thread_path (path);
+	    paths.unordered_remove (i);
+	    continue;
+	  }
+	i++;
+      }
+
   /* Jump-thread all FSM threads before other jump-threads.  */
   for (i = 0; i < paths.length ();)
     {
@@ -2778,6 +2804,9 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   if (retval)
     loops_state_set (LOOPS_NEED_FIXUP);
 
+ out:
+  delete removed_edges;
+  removed_edges = NULL;
   return retval;
 }
 
diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
index 30428e8..984b6c4 100644
--- a/gcc/tree-ssa-threadupdate.h
+++ b/gcc/tree-ssa-threadupdate.h
@@ -43,7 +43,7 @@ public:
 };
 
 extern void register_jump_thread (vec <class jump_thread_edge *> *);
-extern void remove_jump_threads_starting_at (basic_block);
+extern void remove_jump_threads_including (edge);
 extern void delete_jump_thread_path (vec <class jump_thread_edge *> *);
 extern void remove_ctrl_stmt_and_useless_edges (basic_block, basic_block);
 #endif

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-10-07  2:26 [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path Jeff Law
@ 2015-10-07  8:26 ` Richard Biener
  2015-10-07 22:00   ` Jeff Law
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2015-10-07  8:26 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Wed, Oct 7, 2015 at 4:26 AM, Jeff Law <law@redhat.com> wrote:
>
> As touched on in the BZ, we record jump threads as a list of edges to
> traverse.  A jump thread may be recorded through a block which hasn't been
> optimized by DOM yet.
>
> If DOM is able to optimize a control flow statement in such a block, then it
> will remove one or more outgoing edges from the block.
>
> Removal of an edge triggers releasing the edge back to the GC system, so
> naturally bad things happen if the threader then looks at the content of
> those edges.
>
> After some instrumentation I found this was happening for both jump threads
> with joiner blocks as well as FSM jump threads.  The former are actually
> more common than the latter.
>
> Given the sequencing of the recording of the jump thread, DOM optimizing the
> COND_EXPR, subsequent releasing the edge structure and finally examination
> of the jump thread to modify the CFG I saw only one good solution and one
> bad solution.
>
> The bad solution involved removing jump thread paths at the time in which
> DOM optimizes the COND_EXPR.  The problem is we have to walk the entire path
> of every recorded jump thread each time DOM performs this optimization.
> Obviously not good.
>
> So instead we record the affected edge pointers into a hash table and use
> those later to prune the jump thread paths.  It's a single walk over each
> recorded jump threading path.  Since we're not looking at the contents of
> the edge, this works reasonably well.
>
> One more reason to push harder for jump threading to occur independently of
> DOM using Sebastian's backward walking FSM bits.
>
> Anyway, bootstrapped and regression tested on x86_64-linux-gnu.  Also
> bootstrapped and testcase tested on x86_64-linux-gnu with just release
> checking enabled.
>
> Installed on the trunk.

Hmm, other passes avoid all this by not removing edges or blocks themselves
but leaving that to cfgcleanup.  They simply replace the condition in
GIMPLE_CONDs
or the switch value in GIMPLE_SWITCHes and let cleanup_control_expr_graph
do the hard part of the job.

Now - I don't know how that would interact with jump threads covering
those paths,
that is, what kind of "mess" jump threading would produce with the
conditions/switches
mangled that way.

The other nice thing is that CFG cleanup properly deals with loop
structure changes.

Richard.

> Jeff
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 732b3d1..db6f1b6 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,17 @@
> +2015-10-06  Jeff Law  <law@redhat.com>
> +
> +       PR tree-optimization/67816
> +       * tree-ssa-threadupdate.h (remove_jump_threads_including): Renamed
> +       from remove_jump_threads_starting_at.  Accept an edge rather than
> +       a basic block.
> +       * tree-ssa-threadupdate.c (removed_edges): New hash table.
> +       (remove_jump_threads_including): Note edges that get removed from
> +       the CFG for later pruning of jump threading paths including them.
> +       (thread_through_all_blocks): Remove paths which include edges that
> +       have been removed.
> +       * tree-ssa-dom.c (optimize_stmt): Call remove_jump_threads_including
> +       on each outgoing edges when optimizing away a control statement.
> +
>  2015-10-06  Trevor Saunders  <tbsaunde+gcc@tbsaunde.org>
>
>         * reorg.c (emit_delay_sequence): Store list of delay slot insns
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 4ec4743..1882fbd 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,7 @@
> +2015-10-06  Jeff Law  <law@redhat.com>
> +
> +       * gcc.c-torture/compile/pr67816.c: New test.
> +
>  2015-10-07  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
>         * gcc.target/aarch64/get_lane_f16_1.c: New test.
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr67816.c
> b/gcc/testsuite/gcc.c-torture/compile/pr67816.c
> new file mode 100644
> index 0000000..c3db3a3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr67816.c
> @@ -0,0 +1,19 @@
> +int a, c, d, e;
> +int b[10];
> +void fn1() {
> +  int i, f = 0;
> +  for (;;) {
> +    i = 0;
> +    for (; i < a; i++)
> +      if (c) {
> +        if (b[i])
> +          f = 1;
> +      } else if (b[i])
> +        f = 0;
> +    if (f)
> +      d++;
> +    while (e)
> +      ;
> +  }
> +}
> +
> diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
> index c226da5..941087d 100644
> --- a/gcc/tree-ssa-dom.c
> +++ b/gcc/tree-ssa-dom.c
> @@ -1840,8 +1840,13 @@ optimize_stmt (basic_block bb, gimple_stmt_iterator
> si,
>           edge taken_edge = find_taken_edge (bb, val);
>           if (taken_edge)
>             {
> -             /* Delete threads that start at BB.  */
> -             remove_jump_threads_starting_at (bb);
> +
> +             /* We need to remove any queued jump threads that
> +                reference outgoing edges from this block.  */
> +             edge_iterator ei;
> +             edge e;
> +             FOR_EACH_EDGE (e, ei, bb->succs)
> +               remove_jump_threads_including (e);
>
>               /* If BB is in a loop, then removing an outgoing edge from BB
>                  may cause BB to move outside the loop, changes in the
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index 4a147bb..26b199b 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> @@ -215,6 +215,18 @@ redirection_data::equal (const redirection_data *p1,
> const redirection_data *p2)
>    return true;
>  }
>
> +/* Rather than search all the edges in jump thread paths each time
> +   DOM is able to simply if control statement, we build a hash table
> +   with the deleted edges.  We only care about the address of the edge,
> +   not its contents.  */
> +struct removed_edges : nofree_ptr_hash<edge_def>
> +{
> +  static hashval_t hash (edge e) { return htab_hash_pointer (e); }
> +  static bool equal (edge e1, edge e2) { return e1 == e2; }
> +};
> +
> +static hash_table<removed_edges> *removed_edges;
> +
>  /* Data structure of information to pass to hash table traversal routines.
> */
>  struct ssa_local_info_t
>  {
> @@ -2539,35 +2551,23 @@ valid_jump_thread_path (vec<jump_thread_edge *>
> *path)
>    return true;
>  }
>
> -/* Remove any queued jump threads that start at BB.  */
> +/* Remove any queued jump threads that include edge E.
> +
> +   We don't actually remove them here, just record the edges into ax
> +   hash table.  That way we can do the search once per iteration of
> +   DOM/VRP rather than for every case where DOM optimizes away a COND_EXPR.
> */
>
>  void
> -remove_jump_threads_starting_at (basic_block bb)
> +remove_jump_threads_including (edge_def *e)
>  {
>    if (!paths.exists ())
>      return;
>
> -  for (unsigned i = 0; i < paths.length ();)
> -    {
> -      vec<jump_thread_edge *> *path = paths[i];
> +  if (!removed_edges)
> +    removed_edges = new hash_table<struct removed_edges> (17);
>
> -      /* Sadly, FSM jump threads have a slightly different
> -        representation than the rest of the jump threads.  */
> -      if ((*path)[0]->type == EDGE_FSM_THREAD
> -         && (*path)[0]->e->src == bb)
> -       {
> -         delete_jump_thread_path (path);
> -         paths.unordered_remove (i);
> -       }
> -      else if ((*path)[0]->type != EDGE_FSM_THREAD
> -              && (*path)[0]->e->dest == bb)
> -       {
> -         delete_jump_thread_path (path);
> -         paths.unordered_remove (i);
> -       }
> -      else
> -       i++;
> -    }
> +  edge *slot = removed_edges->find_slot (e, INSERT);
> +  *slot = e;
>  }
>
>  /* Walk through all blocks and thread incoming edges to the appropriate
> @@ -2591,11 +2591,37 @@ thread_through_all_blocks (bool
> may_peel_loop_headers)
>    struct loop *loop;
>
>    if (!paths.exists ())
> -    return false;
> +    {
> +      retval = false;
> +      goto out;
> +    }
>
>    threaded_blocks = BITMAP_ALLOC (NULL);
>    memset (&thread_stats, 0, sizeof (thread_stats));
>
> +  /* Remove any paths that referenced removed edges.  */
> +  if (removed_edges)
> +    for (i = 0; i < paths.length (); )
> +      {
> +       unsigned int j;
> +       vec<jump_thread_edge *> *path = paths[i];
> +
> +       for (j = 0; j < path->length (); j++)
> +         {
> +           edge e = (*path)[j]->e;
> +           if (removed_edges->find_slot (e, NO_INSERT))
> +             break;
> +         }
> +
> +       if (j != path->length ())
> +         {
> +           delete_jump_thread_path (path);
> +           paths.unordered_remove (i);
> +           continue;
> +         }
> +       i++;
> +      }
> +
>    /* Jump-thread all FSM threads before other jump-threads.  */
>    for (i = 0; i < paths.length ();)
>      {
> @@ -2778,6 +2804,9 @@ thread_through_all_blocks (bool may_peel_loop_headers)
>    if (retval)
>      loops_state_set (LOOPS_NEED_FIXUP);
>
> + out:
> +  delete removed_edges;
> +  removed_edges = NULL;
>    return retval;
>  }
>
> diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h
> index 30428e8..984b6c4 100644
> --- a/gcc/tree-ssa-threadupdate.h
> +++ b/gcc/tree-ssa-threadupdate.h
> @@ -43,7 +43,7 @@ public:
>  };
>
>  extern void register_jump_thread (vec <class jump_thread_edge *> *);
> -extern void remove_jump_threads_starting_at (basic_block);
> +extern void remove_jump_threads_including (edge);
>  extern void delete_jump_thread_path (vec <class jump_thread_edge *> *);
>  extern void remove_ctrl_stmt_and_useless_edges (basic_block, basic_block);
>  #endif
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-10-07  8:26 ` Richard Biener
@ 2015-10-07 22:00   ` Jeff Law
  2015-10-08  9:56     ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Law @ 2015-10-07 22:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 10/07/2015 02:26 AM, Richard Biener wrote:

>
> Hmm, other passes avoid all this by not removing edges or blocks
> themselves but leaving that to cfgcleanup.  They simply replace the
> condition in GIMPLE_CONDs or the switch value in GIMPLE_SWITCHes and
> let cleanup_control_expr_graph do the hard part of the job.
>
> Now - I don't know how that would interact with jump threads
> covering those paths, that is, what kind of "mess" jump threading
> would produce with the conditions/switches mangled that way.
>
> The other nice thing is that CFG cleanup properly deals with loop
> structure changes.
Threading will handle it correctly as that's essentially how we handled 
it until last week ;-)

The "problem" is the threader thread through those blocks.  And while 
that works correctly, it is quite wasteful in terms of compile time and 
memory consumption due to the block duplication required to preserve 
side effects.  It also churns the blocks & ssa-names used in the 
isolated regions which makes comparing debugging dumps before/after 
threading much harder than it ought to be.

Jeff


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-10-07 22:00   ` Jeff Law
@ 2015-10-08  9:56     ` Richard Biener
  2015-10-09 15:45       ` Jeff Law
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2015-10-08  9:56 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Thu, Oct 8, 2015 at 12:00 AM, Jeff Law <law@redhat.com> wrote:
> On 10/07/2015 02:26 AM, Richard Biener wrote:
>
>>
>> Hmm, other passes avoid all this by not removing edges or blocks
>> themselves but leaving that to cfgcleanup.  They simply replace the
>> condition in GIMPLE_CONDs or the switch value in GIMPLE_SWITCHes and
>> let cleanup_control_expr_graph do the hard part of the job.
>>
>> Now - I don't know how that would interact with jump threads
>> covering those paths, that is, what kind of "mess" jump threading
>> would produce with the conditions/switches mangled that way.
>>
>> The other nice thing is that CFG cleanup properly deals with loop
>> structure changes.
>
> Threading will handle it correctly as that's essentially how we handled it
> until last week ;-)
>
> The "problem" is the threader thread through those blocks.  And while that
> works correctly, it is quite wasteful in terms of compile time and memory
> consumption due to the block duplication required to preserve side effects.
> It also churns the blocks & ssa-names used in the isolated regions which
> makes comparing debugging dumps before/after threading much harder than it
> ought to be.

Yes, but as you remove jump threading paths you could leave the CFG change to
cfg-cleanup anyway?  To get better behavior wrt loop fixup at least?

> Jeff
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-10-08  9:56     ` Richard Biener
@ 2015-10-09 15:45       ` Jeff Law
  2015-12-01 21:32         ` Jeff Law
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Law @ 2015-10-09 15:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 10/08/2015 03:56 AM, Richard Biener wrote:
> On Thu, Oct 8, 2015 at 12:00 AM, Jeff Law <law@redhat.com> wrote:
>> On 10/07/2015 02:26 AM, Richard Biener wrote:
>>
>>>
>>> Hmm, other passes avoid all this by not removing edges or blocks
>>> themselves but leaving that to cfgcleanup.  They simply replace the
>>> condition in GIMPLE_CONDs or the switch value in GIMPLE_SWITCHes and
>>> let cleanup_control_expr_graph do the hard part of the job.
>>>
>>> Now - I don't know how that would interact with jump threads
>>> covering those paths, that is, what kind of "mess" jump threading
>>> would produce with the conditions/switches mangled that way.
>>>
>>> The other nice thing is that CFG cleanup properly deals with loop
>>> structure changes.
>>
>> Threading will handle it correctly as that's essentially how we handled it
>> until last week ;-)
>>
>> The "problem" is the threader thread through those blocks.  And while that
>> works correctly, it is quite wasteful in terms of compile time and memory
>> consumption due to the block duplication required to preserve side effects.
>> It also churns the blocks & ssa-names used in the isolated regions which
>> makes comparing debugging dumps before/after threading much harder than it
>> ought to be.
>
> Yes, but as you remove jump threading paths you could leave the CFG change to
> cfg-cleanup anyway?  To get better behavior wrt loop fixup at least?
So go ahead and detect, remove the threading paths, but leave final 
fixup to cfg-cleanup.  I can certainly try that.

It'd actually be a good thing to experiement with regardless -- I've 
speculated that removing the edges in DOM allows DOM to do a better job, 
but never did the instrumentation to find out for sure.  Deferring the 
final cleanup like you've suggested ought to give me most of what I'd 
want to see if there's really any good secondary effects of cleaning up 
the edges in DOM.

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-10-09 15:45       ` Jeff Law
@ 2015-12-01 21:32         ` Jeff Law
  2015-12-02  9:54           ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Law @ 2015-12-01 21:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 10/09/2015 09:45 AM, Jeff Law wrote:
>> Yes, but as you remove jump threading paths you could leave the CFG
>> change to
>> cfg-cleanup anyway?  To get better behavior wrt loop fixup at least?
> So go ahead and detect, remove the threading paths, but leave final
> fixup to cfg-cleanup.  I can certainly try that.
>
> It'd actually be a good thing to experiement with regardless -- I've
> speculated that removing the edges in DOM allows DOM to do a better job,
> but never did the instrumentation to find out for sure.  Deferring the
> final cleanup like you've suggested ought to give me most of what I'd
> want to see if there's really any good secondary effects of cleaning up
> the edges in DOM.
So I started looking at this in response to 68619, where this approach 
does indeed solve the problem.

Essentially DOM's optimization of those edges results in two irredicuble 
loops becoming reducible.  The loop analysis code then complains because 
we don't have proper loop structures for the new natural loops.

Deferring to cfg_cleanup works because if cfg_cleanup does anything, it 
sets LOOPS_NEED_FIXUP (which we were trying to avoid in DOM).  So it 
seems that the gyrations we often do to avoid LOOPS_NEED_FIXUP are 
probably not all that valuable in the end.  Anyway...


There's some fallout which I'm still exploring.  For example, we have 
cases where removal of the edge by DOM results in removal of a PHI 
argument in the target, which in turn results in the PHI becoming a 
degenerate which we can then propagate away.  I have a possible solution 
for this that I'm playing with.

I suspect the right path is to continue down this path.

Jeff


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-12-01 21:32         ` Jeff Law
@ 2015-12-02  9:54           ` Richard Biener
  2015-12-02 15:31             ` Jeff Law
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2015-12-02  9:54 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Tue, Dec 1, 2015 at 10:32 PM, Jeff Law <law@redhat.com> wrote:
> On 10/09/2015 09:45 AM, Jeff Law wrote:
>>>
>>> Yes, but as you remove jump threading paths you could leave the CFG
>>> change to
>>> cfg-cleanup anyway?  To get better behavior wrt loop fixup at least?
>>
>> So go ahead and detect, remove the threading paths, but leave final
>> fixup to cfg-cleanup.  I can certainly try that.
>>
>> It'd actually be a good thing to experiement with regardless -- I've
>> speculated that removing the edges in DOM allows DOM to do a better job,
>> but never did the instrumentation to find out for sure.  Deferring the
>> final cleanup like you've suggested ought to give me most of what I'd
>> want to see if there's really any good secondary effects of cleaning up
>> the edges in DOM.
>
> So I started looking at this in response to 68619, where this approach does
> indeed solve the problem.
>
> Essentially DOM's optimization of those edges results in two irredicuble
> loops becoming reducible.  The loop analysis code then complains because we
> don't have proper loop structures for the new natural loops.
>
> Deferring to cfg_cleanup works because if cfg_cleanup does anything, it sets
> LOOPS_NEED_FIXUP (which we were trying to avoid in DOM).  So it seems that
> the gyrations we often do to avoid LOOPS_NEED_FIXUP are probably not all
> that valuable in the end.  Anyway...

Yeah, I have partially working patches lying around to "fix" CFG cleanup to
avoid this.  Of course in the case of new loops appearing that's not easily
possible.

> There's some fallout which I'm still exploring.  For example, we have cases
> where removal of the edge by DOM results in removal of a PHI argument in the
> target, which in turn results in the PHI becoming a degenerate which we can
> then propagate away.  I have a possible solution for this that I'm playing
> with.
>
> I suspect the right path is to continue down this path.

Yeah, the issue here is that DOM isn't tracking which edges are executable
to handle merge PHIs (or to aovid doing work in unreachable regions).  It should
be possible to make it do that much like I extended SCCVN to do this
(when doing the DOM walk see if any incoming edge is marked executable
and if not, mark all outgoing edges as not executable, if the block is
executable
at the time we process the last stmt determine if we can compute the edge
that ends up always executed and mark all others as not executable)

Richard.

> Jeff
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-12-02  9:54           ` Richard Biener
@ 2015-12-02 15:31             ` Jeff Law
  2015-12-02 15:35               ` Richard Biener
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Law @ 2015-12-02 15:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 12/02/2015 02:54 AM, Richard Biener wrote:
>> Deferring to cfg_cleanup works because if cfg_cleanup does anything, it sets
>> LOOPS_NEED_FIXUP (which we were trying to avoid in DOM).  So it seems that
>> the gyrations we often do to avoid LOOPS_NEED_FIXUP are probably not all
>> that valuable in the end.  Anyway...
>
> Yeah, I have partially working patches lying around to "fix" CFG cleanup to
> avoid this.  Of course in the case of new loops appearing that's not easily
> possible.
And that may argue that it's largely inevitable if we collapse a 
conditional (and thus delete an edge).


>
>> There's some fallout which I'm still exploring.  For example, we have cases
>> where removal of the edge by DOM results in removal of a PHI argument in the
>> target, which in turn results in the PHI becoming a degenerate which we can
>> then propagate away.  I have a possible solution for this that I'm playing
>> with.
>>
>> I suspect the right path is to continue down this path.
>
> Yeah, the issue here is that DOM isn't tracking which edges are executable
> to handle merge PHIs (or to aovid doing work in unreachable regions).
Right.


It should
> be possible to make it do that much like I extended SCCVN to do this
> (when doing the DOM walk see if any incoming edge is marked executable
> and if not, mark all outgoing edges as not executable, if the block is
> executable
> at the time we process the last stmt determine if we can compute the edge
> that ends up always executed and mark all others as not executable)
Essentially yes. I'm using the not-executable flag and bypassing things 
when it's discovered.

The most interesting side effect, and one I haven't fully analyzed yet 
is an unexpected jump thread -- which I've traced back to differences in 
what the alias oracle is able to find when we walk unaliased vuses. 
Which makes totally no sense that it's unable to find the unaliased vuse 
in the simplified CFG, but finds it when we don't remove the 
unexecutable edge.  As I said, it makes no sense to me yet and I'm still 
digging.

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-12-02 15:31             ` Jeff Law
@ 2015-12-02 15:35               ` Richard Biener
  2015-12-02 22:57                 ` Jeff Law
  2015-12-03 20:29                 ` Jeff Law
  0 siblings, 2 replies; 14+ messages in thread
From: Richard Biener @ 2015-12-02 15:35 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Wed, Dec 2, 2015 at 4:31 PM, Jeff Law <law@redhat.com> wrote:
> On 12/02/2015 02:54 AM, Richard Biener wrote:
>>>
>>> Deferring to cfg_cleanup works because if cfg_cleanup does anything, it
>>> sets
>>> LOOPS_NEED_FIXUP (which we were trying to avoid in DOM).  So it seems
>>> that
>>> the gyrations we often do to avoid LOOPS_NEED_FIXUP are probably not all
>>> that valuable in the end.  Anyway...
>>
>>
>> Yeah, I have partially working patches lying around to "fix" CFG cleanup
>> to
>> avoid this.  Of course in the case of new loops appearing that's not
>> easily
>> possible.
>
> And that may argue that it's largely inevitable if we collapse a conditional
> (and thus delete an edge).
>
>
>>
>>> There's some fallout which I'm still exploring.  For example, we have
>>> cases
>>> where removal of the edge by DOM results in removal of a PHI argument in
>>> the
>>> target, which in turn results in the PHI becoming a degenerate which we
>>> can
>>> then propagate away.  I have a possible solution for this that I'm
>>> playing
>>> with.
>>>
>>> I suspect the right path is to continue down this path.
>>
>>
>> Yeah, the issue here is that DOM isn't tracking which edges are executable
>> to handle merge PHIs (or to aovid doing work in unreachable regions).
>
> Right.
>
>
> It should
>>
>> be possible to make it do that much like I extended SCCVN to do this
>> (when doing the DOM walk see if any incoming edge is marked executable
>> and if not, mark all outgoing edges as not executable, if the block is
>> executable
>> at the time we process the last stmt determine if we can compute the edge
>> that ends up always executed and mark all others as not executable)
>
> Essentially yes. I'm using the not-executable flag and bypassing things when
> it's discovered.
>
> The most interesting side effect, and one I haven't fully analyzed yet is an
> unexpected jump thread -- which I've traced back to differences in what the
> alias oracle is able to find when we walk unaliased vuses. Which makes
> totally no sense that it's unable to find the unaliased vuse in the
> simplified CFG, but finds it when we don't remove the unexecutable edge.  As
> I said, it makes no sense to me yet and I'm still digging.

The walking of PHI nodes is quite simplistic to avoid doing too much work so
an extra (not executable) edge may confuse it enough.  So this might be
"expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
trusted would be an option (also helping SCCVN).

Richard.

> jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-12-02 15:35               ` Richard Biener
@ 2015-12-02 22:57                 ` Jeff Law
  2015-12-03  9:57                   ` Richard Biener
  2015-12-03 20:29                 ` Jeff Law
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff Law @ 2015-12-02 22:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 12/02/2015 08:35 AM, Richard Biener wrote:

>>> be possible to make it do that much like I extended SCCVN to do this
>>> (when doing the DOM walk see if any incoming edge is marked executable
>>> and if not, mark all outgoing edges as not executable, if the block is
>>> executable
>>> at the time we process the last stmt determine if we can compute the edge
>>> that ends up always executed and mark all others as not executable)
>>
>> Essentially yes. I'm using the not-executable flag and bypassing things when
>> it's discovered.
I took at look at what you did with SCCVN and I like it better than what 
I was doing -- your handling is more complete.

There's some code that ought to be factored out.  In particular at the 
start of before_dom_children you've got code that clears 
EDGE_EXECUTABLE.   There's no reason we should duplicate in SCCVN, DOM 
and perhaps other DOM walkers in the future.  If you've got a place in 
mind where you think it ought to live (cfg-something) speak up.  Else 
I'll find a spot on my own.




>>
>> The most interesting side effect, and one I haven't fully analyzed yet is an
>> unexpected jump thread -- which I've traced back to differences in what the
>> alias oracle is able to find when we walk unaliased vuses. Which makes
>> totally no sense that it's unable to find the unaliased vuse in the
>> simplified CFG, but finds it when we don't remove the unexecutable edge.  As
>> I said, it makes no sense to me yet and I'm still digging.
>
> The walking of PHI nodes is quite simplistic to avoid doing too much work so
> an extra (not executable) edge may confuse it enough.  So this might be
> "expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
> trusted would be an option (also helping SCCVN).
It was actually the opposite effect.  ie, with the simplified CFG, we 
missed the jump thread, but with the more complex CFG we found the jump 
thread.  And the edge we were removing didn't (at first glance) appear 
to be of any real significance.

I'm not seeing the oddity anymore now that I've converted DOM to mimick 
what you were doing in SCCVN.  So I'm going to assume I botched 
something somewhere.

Back to analysis to see if there's other fallout.

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-12-02 22:57                 ` Jeff Law
@ 2015-12-03  9:57                   ` Richard Biener
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Biener @ 2015-12-03  9:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Wed, Dec 2, 2015 at 11:56 PM, Jeff Law <law@redhat.com> wrote:
> On 12/02/2015 08:35 AM, Richard Biener wrote:
>
>>>> be possible to make it do that much like I extended SCCVN to do this
>>>> (when doing the DOM walk see if any incoming edge is marked executable
>>>> and if not, mark all outgoing edges as not executable, if the block is
>>>> executable
>>>> at the time we process the last stmt determine if we can compute the
>>>> edge
>>>> that ends up always executed and mark all others as not executable)
>>>
>>>
>>> Essentially yes. I'm using the not-executable flag and bypassing things
>>> when
>>> it's discovered.
>
> I took at look at what you did with SCCVN and I like it better than what I
> was doing -- your handling is more complete.
>
> There's some code that ought to be factored out.  In particular at the start
> of before_dom_children you've got code that clears EDGE_EXECUTABLE.
> There's no reason we should duplicate in SCCVN, DOM and perhaps other DOM
> walkers in the future.  If you've got a place in mind where you think it
> ought to live (cfg-something) speak up.  Else I'll find a spot on my own.

I think domwalk.[ch] itself might be a good enough spot, a static method
inside the domwalk class

  bool before_dom_children_track_and_query_bb_executable (basic_block);

using a derived class that automagically does this would still require
to explicitely call the parent function before_dom_children, so better
make it explicit.

And document it is non-optimistic thus all edges have to be marked
EDGE_EXECUTABLE before the domwalk and before_dom_children
is supposed to mark non-executable outgoing edges.

>>>
>>> The most interesting side effect, and one I haven't fully analyzed yet is
>>> an
>>> unexpected jump thread -- which I've traced back to differences in what
>>> the
>>> alias oracle is able to find when we walk unaliased vuses. Which makes
>>> totally no sense that it's unable to find the unaliased vuse in the
>>> simplified CFG, but finds it when we don't remove the unexecutable edge.
>>> As
>>> I said, it makes no sense to me yet and I'm still digging.
>>
>>
>> The walking of PHI nodes is quite simplistic to avoid doing too much work
>> so
>> an extra (not executable) edge may confuse it enough.  So this might be
>> "expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
>> trusted would be an option (also helping SCCVN).
>
> It was actually the opposite effect.  ie, with the simplified CFG, we missed
> the jump thread, but with the more complex CFG we found the jump thread.
> And the edge we were removing didn't (at first glance) appear to be of any
> real significance.

Oh, I see...

> I'm not seeing the oddity anymore now that I've converted DOM to mimick what
> you were doing in SCCVN.  So I'm going to assume I botched something
> somewhere.
>
> Back to analysis to see if there's other fallout.
>
> jeff
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-12-02 15:35               ` Richard Biener
  2015-12-02 22:57                 ` Jeff Law
@ 2015-12-03 20:29                 ` Jeff Law
  2015-12-04 10:12                   ` Richard Biener
  1 sibling, 1 reply; 14+ messages in thread
From: Jeff Law @ 2015-12-03 20:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 12/02/2015 08:35 AM, Richard Biener wrote:

>>
>> The most interesting side effect, and one I haven't fully analyzed yet is an
>> unexpected jump thread -- which I've traced back to differences in what the
>> alias oracle is able to find when we walk unaliased vuses. Which makes
>> totally no sense that it's unable to find the unaliased vuse in the
>> simplified CFG, but finds it when we don't remove the unexecutable edge.  As
>> I said, it makes no sense to me yet and I'm still digging.
>
> The walking of PHI nodes is quite simplistic to avoid doing too much work so
> an extra (not executable) edge may confuse it enough.  So this might be
> "expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
> trusted would be an option (also helping SCCVN).
Found it.  In the CFG with the unexectuable edges _not_ removed there is 
a PHI associated with that edge which provides a dominating unaliased 
vuse.  Once that edge is removed, the PHI arg gets removed and thus we 
can't easily see the unaliased vuse.

So all is working as expected.  It wasn't ever a big issue, I just 
wanted to make sure I thoroughly understood the somewhat 
counter-intuitive result.

Jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-12-03 20:29                 ` Jeff Law
@ 2015-12-04 10:12                   ` Richard Biener
  2015-12-04 17:18                     ` Jeff Law
  0 siblings, 1 reply; 14+ messages in thread
From: Richard Biener @ 2015-12-04 10:12 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2700 bytes --]

On Thu, Dec 3, 2015 at 9:29 PM, Jeff Law <law@redhat.com> wrote:
> On 12/02/2015 08:35 AM, Richard Biener wrote:
>
>>>
>>> The most interesting side effect, and one I haven't fully analyzed yet is
>>> an
>>> unexpected jump thread -- which I've traced back to differences in what
>>> the
>>> alias oracle is able to find when we walk unaliased vuses. Which makes
>>> totally no sense that it's unable to find the unaliased vuse in the
>>> simplified CFG, but finds it when we don't remove the unexecutable edge.
>>> As
>>> I said, it makes no sense to me yet and I'm still digging.
>>
>>
>> The walking of PHI nodes is quite simplistic to avoid doing too much work
>> so
>> an extra (not executable) edge may confuse it enough.  So this might be
>> "expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
>> trusted would be an option (also helping SCCVN).
>
> Found it.  In the CFG with the unexectuable edges _not_ removed there is a
> PHI associated with that edge which provides a dominating unaliased vuse.
> Once that edge is removed, the PHI arg gets removed and thus we can't easily
> see the unaliased vuse.
>
> So all is working as expected.  It wasn't ever a big issue, I just wanted to
> make sure I thoroughly understood the somewhat counter-intuitive result.

Good.  Btw, I remembered that with the unreachable tracking in SCCVN I
ran into cases where it didn't correctly track all unreachable blocks due
to the way the dominator walk works (which doesn't ensure we've visited
all predecessors).  Like for

  <unreachable>
   |                 |
   |               <bb3>
   |                 / \
 <bb2>          /   \                /
    \       <bb4>      <reachable>
     \      /
    <bb5>

DOM order visits <unreachable>, <bb2>, <bb3>, <bb5> and then
the DOM children like <bb4>.  So we fail to detect bb5 as unreachable
(didn't visit bb4 to mark outgoing edges unreachable yet).

The fix is easy (in testing right now).  Simply track if the current block
was unreachable when visiting DOM children.

Didn't manage to produce a missed-optimization testcase (but only
tried for a few minutes), the cases I've seen it were involving
unreachable loops, but I don't have them anymore.

Sorry if that makes extracting the machinery harder ;)

Richard.

2015-12-04  Richard Biener  <rguenther@suse.de>

        * tree-ssa-sccvn.c (sccvn_dom_walker): Add unreachable_dom
        member and initialize it.
        (sccvn_dom_walker::after_dom_children): Reset unreachable_dom
        if necessary.
        (sccvn_dom_walker::before_dom_children): If unreachable_dom
        is set BB is not reachable either.  Set unreachable_dom
        if not set and BB is unreachable.



> Jeff

[-- Attachment #2: p --]
[-- Type: application/octet-stream, Size: 2490 bytes --]

2015-12-04  Richard Biener  <rguenther@suse.de>

	* tree-ssa-sccvn.c (sccvn_dom_walker): Add unreachable_dom
	member and initialize it.
	(sccvn_dom_walker::after_dom_children): Reset unreachable_dom
	if necessary.
	(sccvn_dom_walker::before_dom_children): If unreachable_dom
	is set BB is not reachable either.  Set unreachable_dom
	if not set and BB is unreachable.

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	(revision 231244)
+++ gcc/tree-ssa-sccvn.c	(working copy)
@@ -4207,7 +4207,8 @@ class sccvn_dom_walker : public dom_walk
 {
 public:
   sccvn_dom_walker ()
-    : dom_walker (CDI_DOMINATORS), fail (false), cond_stack (vNULL) {}
+    : dom_walker (CDI_DOMINATORS), fail (false), unreachable_dom (NULL),
+      cond_stack (vNULL) {}
   ~sccvn_dom_walker ();
 
   virtual void before_dom_children (basic_block);
@@ -4219,6 +4220,7 @@ public:
 		     enum tree_code code, tree lhs, tree rhs, bool value);
 
   bool fail;
+  basic_block unreachable_dom;
   vec<std::pair <basic_block, std::pair <vn_nary_op_t, vn_nary_op_t> > >
     cond_stack;
 };
@@ -4299,6 +4301,9 @@ sccvn_dom_walker::record_conds (basic_bl
 void
 sccvn_dom_walker::after_dom_children (basic_block bb)
 {
+  if (unreachable_dom == bb)
+    unreachable_dom = NULL;
+
   while (!cond_stack.is_empty ()
 	 && cond_stack.last ().first == bb)
     {
@@ -4325,10 +4330,14 @@ sccvn_dom_walker::before_dom_children (b
   /* If any of the predecessor edges that do not come from blocks dominated
      by us are still marked as possibly executable consider this block
      reachable.  */
-  bool reachable = bb == ENTRY_BLOCK_PTR_FOR_FN (cfun);
-  FOR_EACH_EDGE (e, ei, bb->preds)
-    if (!dominated_by_p (CDI_DOMINATORS, e->src, bb))
-      reachable |= (e->flags & EDGE_EXECUTABLE);
+  bool reachable = false;
+  if (!unreachable_dom)
+    {
+      reachable = bb == ENTRY_BLOCK_PTR_FOR_FN (cfun);
+      FOR_EACH_EDGE (e, ei, bb->preds)
+	if (!dominated_by_p (CDI_DOMINATORS, e->src, bb))
+	  reachable |= (e->flags & EDGE_EXECUTABLE);
+    }
 
   /* If the block is not reachable all outgoing edges are not
      executable.  Neither are incoming edges with src dominated by us.  */
@@ -4352,6 +4361,11 @@ sccvn_dom_walker::before_dom_children (b
 	      e->flags &= ~EDGE_EXECUTABLE;
 	    }
 	}
+
+      /* Record the most dominating unreachable block.  */
+      if (!unreachable_dom)
+	unreachable_dom = bb;
+
       return;
     }
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path
  2015-12-04 10:12                   ` Richard Biener
@ 2015-12-04 17:18                     ` Jeff Law
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff Law @ 2015-12-04 17:18 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 12/04/2015 03:12 AM, Richard Biener wrote:
> On Thu, Dec 3, 2015 at 9:29 PM, Jeff Law <law@redhat.com> wrote:
>> On 12/02/2015 08:35 AM, Richard Biener wrote:
>>
>>>>
>>>> The most interesting side effect, and one I haven't fully analyzed yet is
>>>> an
>>>> unexpected jump thread -- which I've traced back to differences in what
>>>> the
>>>> alias oracle is able to find when we walk unaliased vuses. Which makes
>>>> totally no sense that it's unable to find the unaliased vuse in the
>>>> simplified CFG, but finds it when we don't remove the unexecutable edge.
>>>> As
>>>> I said, it makes no sense to me yet and I'm still digging.
>>>
>>>
>>> The walking of PHI nodes is quite simplistic to avoid doing too much work
>>> so
>>> an extra (not executable) edge may confuse it enough.  So this might be
>>> "expected".  Adding a flag on whether EDGE_EXECUTABLE is to be
>>> trusted would be an option (also helping SCCVN).
>>
>> Found it.  In the CFG with the unexectuable edges _not_ removed there is a
>> PHI associated with that edge which provides a dominating unaliased vuse.
>> Once that edge is removed, the PHI arg gets removed and thus we can't easily
>> see the unaliased vuse.
>>
>> So all is working as expected.  It wasn't ever a big issue, I just wanted to
>> make sure I thoroughly understood the somewhat counter-intuitive result.
>
> Good.  Btw, I remembered that with the unreachable tracking in SCCVN I
> ran into cases where it didn't correctly track all unreachable blocks due
> to the way the dominator walk works (which doesn't ensure we've visited
> all predecessors).  Like for
>
>    <unreachable>
>     |                 |
>     |               <bb3>
>     |                 / \
>   <bb2>          /   \                /
>      \       <bb4>      <reachable>
>       \      /
>      <bb5>
>
> DOM order visits <unreachable>, <bb2>, <bb3>, <bb5> and then
> the DOM children like <bb4>.  So we fail to detect bb5 as unreachable
> (didn't visit bb4 to mark outgoing edges unreachable yet).
>
> The fix is easy (in testing right now).  Simply track if the current block
> was unreachable when visiting DOM children.
>
> Didn't manage to produce a missed-optimization testcase (but only
> tried for a few minutes), the cases I've seen it were involving
> unreachable loops, but I don't have them anymore.
>
> Sorry if that makes extracting the machinery harder ;)
I'll get it sorted out.

I found that I actually wanted those bits extracted and split into two 
functions.  One for the test and one for the propagation.  The hardest 
part is actually chosing meaningful names.

jeff

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-12-04 17:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-07  2:26 [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path Jeff Law
2015-10-07  8:26 ` Richard Biener
2015-10-07 22:00   ` Jeff Law
2015-10-08  9:56     ` Richard Biener
2015-10-09 15:45       ` Jeff Law
2015-12-01 21:32         ` Jeff Law
2015-12-02  9:54           ` Richard Biener
2015-12-02 15:31             ` Jeff Law
2015-12-02 15:35               ` Richard Biener
2015-12-02 22:57                 ` Jeff Law
2015-12-03  9:57                   ` Richard Biener
2015-12-03 20:29                 ` Jeff Law
2015-12-04 10:12                   ` Richard Biener
2015-12-04 17:18                     ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).