public inbox for gcc-regression@sourceware.org
help / color / mirror / Atom feed
* [TCWG CI] Failure after basepoints/gcc-13-3004-g1214196da79: More gimple const/copy propagation opportunities
@ 2022-10-04  6:36 ci_notify
  0 siblings, 0 replies; 2+ messages in thread
From: ci_notify @ 2022-10-04  6:36 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-regression

[-- Attachment #1: Type: text/plain, Size: 16704 bytes --]

Failure after basepoints/gcc-13-3004-g1214196da79: More gimple const/copy propagation opportunities:

Results changed to
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap_debug:
# FAILED
# First few build errors in logs:
# 00:01:45 ../../../../../../gcc/gcc/tree-ssa-dom.cc:689:13: error: ‘dst’ was not declared in this scope; did you mean ‘dse’?
# 00:01:45 ../../../../../../gcc/gcc/tree-ssa-dom.cc:689:33: error: ‘phi’ was not declared in this scope; did you mean ‘gphi’?
# 00:01:45 make[3]: *** [Makefile:1147: tree-ssa-dom.o] Error 1
# 00:03:00 make[2]: *** [Makefile:4937: all-stage1-gcc] Error 2
# 00:03:00 make[1]: *** [Makefile:25433: stage1-bubble] Error 2
# 00:03:00 make: *** [Makefile:1064: all] Error 2

from
-10
# true:
0
# build_abe binutils:
1
# build_abe bootstrap_debug:
2

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

For latest status see comments in https://linaro.atlassian.net/browse/GNU-692 .
Status of basepoints/gcc-13-3004-g1214196da79 commit for tcwg_gcc_bootstrap:
commit 1214196da79aabbe5c14ed36e5a28012e141f04c
Author: Jeff Law <jeffreyalaw@gmail.com>
Date:   Fri Sep 30 19:26:31 2022 -0400

    More gimple const/copy propagation opportunities
    
    While investigating a benchmark for optimization opportunities I came across single block loop which either iterates precisely once or forever.    This is an interesting scenario as we can ignore the infinite looping path and treat any PHI nodes as degenerates.  So more concretely let's consider this trivial testcase:
    
    volatile void abort (void);
    
    void
    foo(int a)
    {
     int b = 0;
    
     while (1)
       {
         if (!a)
           break;
         b = 1;
       }
    
     if (b != 0)
       abort ();
    }
    
    Quick analysis shows that b's initial value is 0 and its value only changes if we enter an infinite loop.  So if we get to the test b != 0, the only possible value b could have would be 0 and the test and its true arm can be eliminated.
    
    The DOM3 dump looks something like this:
    
    ;;   basic block 2, loop depth 0, count 118111600 (estimated locally), maybe hot
    ;;    prev block 0, next block 3, flags: (NEW, VISITED)
    ;;    pred:       ENTRY [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    ;;    succ:       3 [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    
    ;;   basic block 3, loop depth 1, count 1073741824 (estimated locally), maybe hot
    ;;    prev block 2, next block 4, flags: (NEW, VISITED)
    ;;    pred:       2 [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    ;;                3 [89.0% (guessed)]  count:955630224 (estimated locally) (FALSE_VALUE,EXECUTABLE)
      # b_1 = PHI <0(2), 1(3)>
      if (a_3(D) == 0)
        goto <bb 4>; [11.00%]
      else
        goto <bb 3>; [89.00%]
    ;;    succ:       4 [11.0% (guessed)]  count:118111600 (estimated locally) (TRUE_VALUE,EXECUTABLE)
    ;;                3 [89.0% (guessed)]  count:955630224 (estimated locally) (FALSE_VALUE,EXECUTABLE)
    
    ;;   basic block 4, loop depth 0, count 118111600 (estimated locally), maybe hot
    ;;    prev block 3, next block 5, flags: (NEW, VISITED)
    ;;    pred:       3 [11.0% (guessed)]  count:118111600 (estimated locally) (TRUE_VALUE,EXECUTABLE)
      if (b_1 != 0)
        goto <bb 5>; [0.00%]
      else
        goto <bb 6>; [100.00%]
    ;;    succ:       5 [never]  count:0 (precise) (TRUE_VALUE,EXECUTABLE)
    ;;                6 [always]  count:118111600 (estimated locally) (FALSE_VALUE,EXECUTABLE)
    
    This is a good representative of what the benchmark code looks like.
    
    The primary effect we want to capture is to realize that the test if (b_1 != 0) is always false and optimize it accordingly.
    
    In the benchmark, this opportunity is well hidden until after the loop optimizers have completed, so the first chance to capture this case is in DOM3.  Furthermore, DOM wants loops normalized with latch blocks/edges.  So instead of bb3 looping back to itself, there's an intermediate empty block during DOM.
    
    I originally thought this was likely to only affect the benchmark.  But when I instrumented the optimization and bootstrapped GCC, much to my surprise there were several hundred similar cases identified in GCC itself.  So it's not as benchmark specific as I'd initially feared.
    
    Anyway, detecting this in DOM is pretty simple.   We detect the infinite loop, including the latch block.  Once we've done that, we walk the PHI nodes and attach equivalences to the appropriate outgoing edge.   That's all we need to do as the rest of DOM is already prepared to handle equivalences on edges.
    
    gcc/
            * tree-ssa-dom.cc (single_block_loop_p): New function.
            (record_edge_info): Also record equivalences for the outgoing
            edge of a single block loop where the condition is an invariant.
    
    gcc/testsuite/
            * gcc.dg/infinite-loop.c: New test.
* master-arm-bootstrap_debug
** Failure after basepoints/gcc-13-3004-g1214196da79: More gimple const/copy propagation opportunities:
** https://ci.linaro.org/job/tcwg_gcc_bootstrap-build-master-arm-bootstrap_debug/475/

Bad  build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-build-master-arm-bootstrap_debug/475/artifact/artifacts
Good build: https://ci.linaro.org/job/tcwg_gcc_bootstrap-build-master-arm-bootstrap_debug/474/artifact/artifacts

Reproduce current build:
<cut>
mkdir -p investigate-gcc-1214196da79aabbe5c14ed36e5a28012e141f04c
cd investigate-gcc-1214196da79aabbe5c14ed36e5a28012e141f04c

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests for bad and good builds
mkdir -p bad/artifacts good/artifacts
curl -o bad/artifacts/manifest.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-build-master-arm-bootstrap_debug/475/artifact/artifacts/manifest.sh --fail
curl -o good/artifacts/manifest.sh https://ci.linaro.org/job/tcwg_gcc_bootstrap-build-master-arm-bootstrap_debug/474/artifact/artifacts/manifest.sh --fail

# Reproduce bad build
(cd bad; ../jenkins-scripts/tcwg_gnu-build.sh ^^ true %%rr[top_artifacts] artifacts)
# Reproduce good build
(cd good; ../jenkins-scripts/tcwg_gnu-build.sh ^^ true %%rr[top_artifacts] artifacts)
</cut>

Full commit (up to 1000 lines):
<cut>
commit 1214196da79aabbe5c14ed36e5a28012e141f04c
Author: Jeff Law <jeffreyalaw@gmail.com>
Date:   Fri Sep 30 19:26:31 2022 -0400

    More gimple const/copy propagation opportunities
    
    While investigating a benchmark for optimization opportunities I came across single block loop which either iterates precisely once or forever.    This is an interesting scenario as we can ignore the infinite looping path and treat any PHI nodes as degenerates.  So more concretely let's consider this trivial testcase:
    
    volatile void abort (void);
    
    void
    foo(int a)
    {
     int b = 0;
    
     while (1)
       {
         if (!a)
           break;
         b = 1;
       }
    
     if (b != 0)
       abort ();
    }
    
    Quick analysis shows that b's initial value is 0 and its value only changes if we enter an infinite loop.  So if we get to the test b != 0, the only possible value b could have would be 0 and the test and its true arm can be eliminated.
    
    The DOM3 dump looks something like this:
    
    ;;   basic block 2, loop depth 0, count 118111600 (estimated locally), maybe hot
    ;;    prev block 0, next block 3, flags: (NEW, VISITED)
    ;;    pred:       ENTRY [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    ;;    succ:       3 [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    
    ;;   basic block 3, loop depth 1, count 1073741824 (estimated locally), maybe hot
    ;;    prev block 2, next block 4, flags: (NEW, VISITED)
    ;;    pred:       2 [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    ;;                3 [89.0% (guessed)]  count:955630224 (estimated locally) (FALSE_VALUE,EXECUTABLE)
      # b_1 = PHI <0(2), 1(3)>
      if (a_3(D) == 0)
        goto <bb 4>; [11.00%]
      else
        goto <bb 3>; [89.00%]
    ;;    succ:       4 [11.0% (guessed)]  count:118111600 (estimated locally) (TRUE_VALUE,EXECUTABLE)
    ;;                3 [89.0% (guessed)]  count:955630224 (estimated locally) (FALSE_VALUE,EXECUTABLE)
    
    ;;   basic block 4, loop depth 0, count 118111600 (estimated locally), maybe hot
    ;;    prev block 3, next block 5, flags: (NEW, VISITED)
    ;;    pred:       3 [11.0% (guessed)]  count:118111600 (estimated locally) (TRUE_VALUE,EXECUTABLE)
      if (b_1 != 0)
        goto <bb 5>; [0.00%]
      else
        goto <bb 6>; [100.00%]
    ;;    succ:       5 [never]  count:0 (precise) (TRUE_VALUE,EXECUTABLE)
    ;;                6 [always]  count:118111600 (estimated locally) (FALSE_VALUE,EXECUTABLE)
    
    This is a good representative of what the benchmark code looks like.
    
    The primary effect we want to capture is to realize that the test if (b_1 != 0) is always false and optimize it accordingly.
    
    In the benchmark, this opportunity is well hidden until after the loop optimizers have completed, so the first chance to capture this case is in DOM3.  Furthermore, DOM wants loops normalized with latch blocks/edges.  So instead of bb3 looping back to itself, there's an intermediate empty block during DOM.
    
    I originally thought this was likely to only affect the benchmark.  But when I instrumented the optimization and bootstrapped GCC, much to my surprise there were several hundred similar cases identified in GCC itself.  So it's not as benchmark specific as I'd initially feared.
    
    Anyway, detecting this in DOM is pretty simple.   We detect the infinite loop, including the latch block.  Once we've done that, we walk the PHI nodes and attach equivalences to the appropriate outgoing edge.   That's all we need to do as the rest of DOM is already prepared to handle equivalences on edges.
    
    gcc/
            * tree-ssa-dom.cc (single_block_loop_p): New function.
            (record_edge_info): Also record equivalences for the outgoing
            edge of a single block loop where the condition is an invariant.
    
    gcc/testsuite/
            * gcc.dg/infinite-loop.c: New test.
---
 gcc/testsuite/gcc.dg/infinite-loop.c |  26 +++++++
 gcc/tree-ssa-dom.cc                  | 133 ++++++++++++++++++++++++++++++++++-
 2 files changed, 157 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/infinite-loop.c b/gcc/testsuite/gcc.dg/infinite-loop.c
new file mode 100644
index 00000000000..25037a2027e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/infinite-loop.c
@@ -0,0 +1,26 @@
+/* { dg-do link } */
+/* { dg-options "-O2" } */
+void link_error (void);
+
+void __attribute__ ((noinline,noipa))
+foo(int a)
+{
+ int b = 0;
+
+ while (1)
+   {
+     if (!a)
+       break;
+     b = 1;
+   }
+
+ if (b != 0)
+   link_error ();
+}
+
+int
+main()
+{
+  foo (0);
+}
+
diff --git a/gcc/tree-ssa-dom.cc b/gcc/tree-ssa-dom.cc
index fa43dbe6c44..8d8312ca350 100644
--- a/gcc/tree-ssa-dom.cc
+++ b/gcc/tree-ssa-dom.cc
@@ -426,6 +426,74 @@ free_all_edge_infos (void)
     }
 }
 
+/* Return TRUE if BB has precisely two preds, one of which
+   is a backedge from a forwarder block where the forwarder
+   block is a direct successor of BB.  Being a forwarder
+   block, it has no side effects other than transfer of
+   control.  Otherwise return FALSE.  */
+
+static bool
+single_block_loop_p (basic_block bb)
+{
+  /* Two preds.  */
+  if (EDGE_COUNT (bb->preds) != 2)
+    return false;
+
+  /* One and only one of the edges must be marked with
+     EDGE_DFS_BACK.  */
+  basic_block pred = NULL;
+  unsigned int count = 0;
+  if (EDGE_PRED (bb, 0)->flags & EDGE_DFS_BACK)
+    {
+      pred = EDGE_PRED (bb, 0)->src;
+      count++;
+    }
+  if (EDGE_PRED (bb, 1)->flags & EDGE_DFS_BACK)
+    {
+      pred = EDGE_PRED (bb, 1)->src;
+      count++;
+    }
+
+  if (count != 1)
+    return false;
+
+  /* Now examine PRED.  It should have a single predecessor which
+     is BB and a single successor that is also BB.  */
+  if (EDGE_COUNT (pred->preds) != 1
+      || EDGE_COUNT (pred->succs) != 1
+      || EDGE_PRED (pred, 0)->src != bb
+      || EDGE_SUCC (pred, 0)->dest != bb)
+    return false;
+
+  /* This looks good from a CFG standpoint.  Now look at the guts
+     of PRED.  Basically we want to verify there are no PHI nodes
+     and no real statements.  */
+  if (! gimple_seq_empty_p (phi_nodes (pred)))
+    return false;
+
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_last_bb (pred); !gsi_end_p (gsi); gsi_prev (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+
+      switch (gimple_code (stmt))
+	{
+	  case GIMPLE_LABEL:
+	    if (DECL_NONLOCAL (gimple_label_label (as_a <glabel *> (stmt))))
+	      return false;
+	    break;
+
+	  case GIMPLE_DEBUG:
+	    break;
+
+	  default:
+	    return false;
+	}
+    }
+
+  return true;
+}
+
 /* We have finished optimizing BB, record any information implied by
    taking a specific outgoing edge from BB.  */
 
@@ -435,6 +503,13 @@ record_edge_info (basic_block bb)
   gimple_stmt_iterator gsi = gsi_last_bb (bb);
   class edge_info *edge_info;
 
+  /* Free all the outgoing edge info data associated with
+     BB's outgoing edges.  */
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    free_dom_edge_info (e);
+
   if (! gsi_end_p (gsi))
     {
       gimple *stmt = gsi_stmt (gsi);
@@ -450,8 +525,6 @@ record_edge_info (basic_block bb)
 	      int i;
               int n_labels = gimple_switch_num_labels (switch_stmt);
 	      tree *info = XCNEWVEC (tree, last_basic_block_for_fn (cfun));
-	      edge e;
-	      edge_iterator ei;
 
 	      for (i = 0; i < n_labels; i++)
 		{
@@ -583,6 +656,62 @@ record_edge_info (basic_block bb)
               if (can_infer_simple_equiv && TREE_CODE (inverted) == EQ_EXPR)
 		edge_info->record_simple_equiv (op0, op1);
             }
+
+	  /* If this block is a single block loop, then we may be able to
+	     record some equivalences on the loop's exit edge.  */
+	  if (single_block_loop_p (bb))
+	    {
+	      /* We know it's a single block loop.  Now look at the loop
+		 exit condition.  What we're looking for is whether or not
+		 the exit condition is loop invariant which we can detect
+		 by checking if all the SSA_NAMEs referenced are defined
+		 outside the loop.  */
+	      if ((TREE_CODE (op0) != SSA_NAME
+		   || gimple_bb (SSA_NAME_DEF_STMT (op0)) != bb)
+		  && (TREE_CODE (op1) != SSA_NAME
+		      || gimple_bb (SSA_NAME_DEF_STMT (op1)) != bb))
+		{
+		  /* At this point we know the exit condition is loop
+		     invariant.  The only way to get out of the loop is
+		     if never traverses the backedge to begin with.  This
+		     implies that any PHI nodes create equivalances we can
+		     attach to the loop exit edge.  */
+		  int alternative
+		    = (EDGE_PRED (bb, 0)->flags & EDGE_DFS_BACK) ? 1 : 0;
+
+		  gphi_iterator gsi;
+		  for (gsi = gsi_start_phis (bb);
+		       !gsi_end_p (gsi);
+		       gsi_next (&gsi))
+		    {
+		      /* If the other alternative is the same as the result,
+			 then this is a degenerate and can be ignored.  */
+		      if (dst == PHI_ARG_DEF (phi, !alternative))
+			continue;
+
+		      /* Now get the EDGE_INFO class so we can append
+			 it to our list.  We want the successor edge
+			 where the destination is not the source of
+			 an incoming edge.  */
+		      gphi *phi = gsi.phi ();
+		      tree src = PHI_ARG_DEF (phi, alternative);
+		      tree dst = PHI_RESULT (phi);
+
+		      if (EDGE_SUCC (bb, 0)->dest
+			  != EDGE_PRED (bb, !alternative)->src)
+			edge_info = (class edge_info *)EDGE_SUCC (bb, 0)->aux;
+		      else
+			edge_info = (class edge_info *)EDGE_SUCC (bb, 1)->aux;
+
+		      /* Note that since this processing is done independently
+			 of other edge equivalency processing, we may not
+			 have an EDGE_INFO structure set up yet.  */
+		      if (edge_info == NULL)
+			edge_info = new class edge_info (false_edge);
+		      edge_info->record_simple_equiv (dst, src);
+		    }
+		}
+	    }
         }
     }
 }
</cut>

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [TCWG CI] Failure after basepoints/gcc-13-3004-g1214196da79: More gimple const/copy propagation opportunities
@ 2022-10-01  4:48 ci_notify
  0 siblings, 0 replies; 2+ messages in thread
From: ci_notify @ 2022-10-01  4:48 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-regression

[-- Attachment #1: Type: text/plain, Size: 16672 bytes --]

Failure after basepoints/gcc-13-3004-g1214196da79: More gimple const/copy propagation opportunities:

Results changed to
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
# FAILED
# First few build errors in logs:
# 00:02:18 ../../../../../../gcc/gcc/tree-ssa-dom.cc:689:13: error: ‘dst’ was not declared in this scope; did you mean ‘dse’?
# 00:02:18 ../../../../../../gcc/gcc/tree-ssa-dom.cc:689:33: error: ‘phi’ was not declared in this scope; did you mean ‘gphi’?
# 00:02:18 make[2]: *** [Makefile:1147: tree-ssa-dom.o] Error 1
# 00:05:06 make[1]: *** [Makefile:4565: all-gcc] Error 2
# 00:05:06 make: *** [Makefile:1004: all] Error 2

from
-10
# true:
0
# build_abe binutils:
1
# build_abe stage1:
2
# build_abe linux:
3
# build_abe glibc:
4
# build_abe stage2:
5
# build_abe gdb:
6
# build_abe qemu:
7

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.

For latest status see comments in https://linaro.atlassian.net/browse/GNU-692 .
Status of basepoints/gcc-13-3004-g1214196da79 commit for tcwg_gnu_cross_build:
commit 1214196da79aabbe5c14ed36e5a28012e141f04c
Author: Jeff Law <jeffreyalaw@gmail.com>
Date:   Fri Sep 30 19:26:31 2022 -0400

    More gimple const/copy propagation opportunities
    
    While investigating a benchmark for optimization opportunities I came across single block loop which either iterates precisely once or forever.    This is an interesting scenario as we can ignore the infinite looping path and treat any PHI nodes as degenerates.  So more concretely let's consider this trivial testcase:
    
    volatile void abort (void);
    
    void
    foo(int a)
    {
     int b = 0;
    
     while (1)
       {
         if (!a)
           break;
         b = 1;
       }
    
     if (b != 0)
       abort ();
    }
    
    Quick analysis shows that b's initial value is 0 and its value only changes if we enter an infinite loop.  So if we get to the test b != 0, the only possible value b could have would be 0 and the test and its true arm can be eliminated.
    
    The DOM3 dump looks something like this:
    
    ;;   basic block 2, loop depth 0, count 118111600 (estimated locally), maybe hot
    ;;    prev block 0, next block 3, flags: (NEW, VISITED)
    ;;    pred:       ENTRY [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    ;;    succ:       3 [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    
    ;;   basic block 3, loop depth 1, count 1073741824 (estimated locally), maybe hot
    ;;    prev block 2, next block 4, flags: (NEW, VISITED)
    ;;    pred:       2 [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    ;;                3 [89.0% (guessed)]  count:955630224 (estimated locally) (FALSE_VALUE,EXECUTABLE)
      # b_1 = PHI <0(2), 1(3)>
      if (a_3(D) == 0)
        goto <bb 4>; [11.00%]
      else
        goto <bb 3>; [89.00%]
    ;;    succ:       4 [11.0% (guessed)]  count:118111600 (estimated locally) (TRUE_VALUE,EXECUTABLE)
    ;;                3 [89.0% (guessed)]  count:955630224 (estimated locally) (FALSE_VALUE,EXECUTABLE)
    
    ;;   basic block 4, loop depth 0, count 118111600 (estimated locally), maybe hot
    ;;    prev block 3, next block 5, flags: (NEW, VISITED)
    ;;    pred:       3 [11.0% (guessed)]  count:118111600 (estimated locally) (TRUE_VALUE,EXECUTABLE)
      if (b_1 != 0)
        goto <bb 5>; [0.00%]
      else
        goto <bb 6>; [100.00%]
    ;;    succ:       5 [never]  count:0 (precise) (TRUE_VALUE,EXECUTABLE)
    ;;                6 [always]  count:118111600 (estimated locally) (FALSE_VALUE,EXECUTABLE)
    
    This is a good representative of what the benchmark code looks like.
    
    The primary effect we want to capture is to realize that the test if (b_1 != 0) is always false and optimize it accordingly.
    
    In the benchmark, this opportunity is well hidden until after the loop optimizers have completed, so the first chance to capture this case is in DOM3.  Furthermore, DOM wants loops normalized with latch blocks/edges.  So instead of bb3 looping back to itself, there's an intermediate empty block during DOM.
    
    I originally thought this was likely to only affect the benchmark.  But when I instrumented the optimization and bootstrapped GCC, much to my surprise there were several hundred similar cases identified in GCC itself.  So it's not as benchmark specific as I'd initially feared.
    
    Anyway, detecting this in DOM is pretty simple.   We detect the infinite loop, including the latch block.  Once we've done that, we walk the PHI nodes and attach equivalences to the appropriate outgoing edge.   That's all we need to do as the rest of DOM is already prepared to handle equivalences on edges.
    
    gcc/
            * tree-ssa-dom.cc (single_block_loop_p): New function.
            (record_edge_info): Also record equivalences for the outgoing
            edge of a single block loop where the condition is an invariant.
    
    gcc/testsuite/
            * gcc.dg/infinite-loop.c: New test.
* master-aarch64
** Failure after basepoints/gcc-13-3004-g1214196da79: More gimple const/copy propagation opportunities:
** https://ci.linaro.org/job/tcwg_gnu_cross_build-build-master-aarch64/1606/

Bad  build: https://ci.linaro.org/job/tcwg_gnu_cross_build-build-master-aarch64/1606/artifact/artifacts
Good build: https://ci.linaro.org/job/tcwg_gnu_cross_build-build-master-aarch64/1605/artifact/artifacts

Reproduce current build:
<cut>
mkdir -p investigate-gcc-1214196da79aabbe5c14ed36e5a28012e141f04c
cd investigate-gcc-1214196da79aabbe5c14ed36e5a28012e141f04c

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests for bad and good builds
mkdir -p bad/artifacts good/artifacts
curl -o bad/artifacts/manifest.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-build-master-aarch64/1606/artifact/artifacts/manifest.sh --fail
curl -o good/artifacts/manifest.sh https://ci.linaro.org/job/tcwg_gnu_cross_build-build-master-aarch64/1605/artifact/artifacts/manifest.sh --fail

# Reproduce bad build
(cd bad; ../jenkins-scripts/tcwg_gnu-build.sh ^^ true %%rr[top_artifacts] artifacts)
# Reproduce good build
(cd good; ../jenkins-scripts/tcwg_gnu-build.sh ^^ true %%rr[top_artifacts] artifacts)
</cut>

Full commit (up to 1000 lines):
<cut>
commit 1214196da79aabbe5c14ed36e5a28012e141f04c
Author: Jeff Law <jeffreyalaw@gmail.com>
Date:   Fri Sep 30 19:26:31 2022 -0400

    More gimple const/copy propagation opportunities
    
    While investigating a benchmark for optimization opportunities I came across single block loop which either iterates precisely once or forever.    This is an interesting scenario as we can ignore the infinite looping path and treat any PHI nodes as degenerates.  So more concretely let's consider this trivial testcase:
    
    volatile void abort (void);
    
    void
    foo(int a)
    {
     int b = 0;
    
     while (1)
       {
         if (!a)
           break;
         b = 1;
       }
    
     if (b != 0)
       abort ();
    }
    
    Quick analysis shows that b's initial value is 0 and its value only changes if we enter an infinite loop.  So if we get to the test b != 0, the only possible value b could have would be 0 and the test and its true arm can be eliminated.
    
    The DOM3 dump looks something like this:
    
    ;;   basic block 2, loop depth 0, count 118111600 (estimated locally), maybe hot
    ;;    prev block 0, next block 3, flags: (NEW, VISITED)
    ;;    pred:       ENTRY [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    ;;    succ:       3 [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    
    ;;   basic block 3, loop depth 1, count 1073741824 (estimated locally), maybe hot
    ;;    prev block 2, next block 4, flags: (NEW, VISITED)
    ;;    pred:       2 [always]  count:118111600 (estimated locally) (FALLTHRU,EXECUTABLE)
    ;;                3 [89.0% (guessed)]  count:955630224 (estimated locally) (FALSE_VALUE,EXECUTABLE)
      # b_1 = PHI <0(2), 1(3)>
      if (a_3(D) == 0)
        goto <bb 4>; [11.00%]
      else
        goto <bb 3>; [89.00%]
    ;;    succ:       4 [11.0% (guessed)]  count:118111600 (estimated locally) (TRUE_VALUE,EXECUTABLE)
    ;;                3 [89.0% (guessed)]  count:955630224 (estimated locally) (FALSE_VALUE,EXECUTABLE)
    
    ;;   basic block 4, loop depth 0, count 118111600 (estimated locally), maybe hot
    ;;    prev block 3, next block 5, flags: (NEW, VISITED)
    ;;    pred:       3 [11.0% (guessed)]  count:118111600 (estimated locally) (TRUE_VALUE,EXECUTABLE)
      if (b_1 != 0)
        goto <bb 5>; [0.00%]
      else
        goto <bb 6>; [100.00%]
    ;;    succ:       5 [never]  count:0 (precise) (TRUE_VALUE,EXECUTABLE)
    ;;                6 [always]  count:118111600 (estimated locally) (FALSE_VALUE,EXECUTABLE)
    
    This is a good representative of what the benchmark code looks like.
    
    The primary effect we want to capture is to realize that the test if (b_1 != 0) is always false and optimize it accordingly.
    
    In the benchmark, this opportunity is well hidden until after the loop optimizers have completed, so the first chance to capture this case is in DOM3.  Furthermore, DOM wants loops normalized with latch blocks/edges.  So instead of bb3 looping back to itself, there's an intermediate empty block during DOM.
    
    I originally thought this was likely to only affect the benchmark.  But when I instrumented the optimization and bootstrapped GCC, much to my surprise there were several hundred similar cases identified in GCC itself.  So it's not as benchmark specific as I'd initially feared.
    
    Anyway, detecting this in DOM is pretty simple.   We detect the infinite loop, including the latch block.  Once we've done that, we walk the PHI nodes and attach equivalences to the appropriate outgoing edge.   That's all we need to do as the rest of DOM is already prepared to handle equivalences on edges.
    
    gcc/
            * tree-ssa-dom.cc (single_block_loop_p): New function.
            (record_edge_info): Also record equivalences for the outgoing
            edge of a single block loop where the condition is an invariant.
    
    gcc/testsuite/
            * gcc.dg/infinite-loop.c: New test.
---
 gcc/testsuite/gcc.dg/infinite-loop.c |  26 +++++++
 gcc/tree-ssa-dom.cc                  | 133 ++++++++++++++++++++++++++++++++++-
 2 files changed, 157 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/infinite-loop.c b/gcc/testsuite/gcc.dg/infinite-loop.c
new file mode 100644
index 00000000000..25037a2027e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/infinite-loop.c
@@ -0,0 +1,26 @@
+/* { dg-do link } */
+/* { dg-options "-O2" } */
+void link_error (void);
+
+void __attribute__ ((noinline,noipa))
+foo(int a)
+{
+ int b = 0;
+
+ while (1)
+   {
+     if (!a)
+       break;
+     b = 1;
+   }
+
+ if (b != 0)
+   link_error ();
+}
+
+int
+main()
+{
+  foo (0);
+}
+
diff --git a/gcc/tree-ssa-dom.cc b/gcc/tree-ssa-dom.cc
index fa43dbe6c44..8d8312ca350 100644
--- a/gcc/tree-ssa-dom.cc
+++ b/gcc/tree-ssa-dom.cc
@@ -426,6 +426,74 @@ free_all_edge_infos (void)
     }
 }
 
+/* Return TRUE if BB has precisely two preds, one of which
+   is a backedge from a forwarder block where the forwarder
+   block is a direct successor of BB.  Being a forwarder
+   block, it has no side effects other than transfer of
+   control.  Otherwise return FALSE.  */
+
+static bool
+single_block_loop_p (basic_block bb)
+{
+  /* Two preds.  */
+  if (EDGE_COUNT (bb->preds) != 2)
+    return false;
+
+  /* One and only one of the edges must be marked with
+     EDGE_DFS_BACK.  */
+  basic_block pred = NULL;
+  unsigned int count = 0;
+  if (EDGE_PRED (bb, 0)->flags & EDGE_DFS_BACK)
+    {
+      pred = EDGE_PRED (bb, 0)->src;
+      count++;
+    }
+  if (EDGE_PRED (bb, 1)->flags & EDGE_DFS_BACK)
+    {
+      pred = EDGE_PRED (bb, 1)->src;
+      count++;
+    }
+
+  if (count != 1)
+    return false;
+
+  /* Now examine PRED.  It should have a single predecessor which
+     is BB and a single successor that is also BB.  */
+  if (EDGE_COUNT (pred->preds) != 1
+      || EDGE_COUNT (pred->succs) != 1
+      || EDGE_PRED (pred, 0)->src != bb
+      || EDGE_SUCC (pred, 0)->dest != bb)
+    return false;
+
+  /* This looks good from a CFG standpoint.  Now look at the guts
+     of PRED.  Basically we want to verify there are no PHI nodes
+     and no real statements.  */
+  if (! gimple_seq_empty_p (phi_nodes (pred)))
+    return false;
+
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_last_bb (pred); !gsi_end_p (gsi); gsi_prev (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+
+      switch (gimple_code (stmt))
+	{
+	  case GIMPLE_LABEL:
+	    if (DECL_NONLOCAL (gimple_label_label (as_a <glabel *> (stmt))))
+	      return false;
+	    break;
+
+	  case GIMPLE_DEBUG:
+	    break;
+
+	  default:
+	    return false;
+	}
+    }
+
+  return true;
+}
+
 /* We have finished optimizing BB, record any information implied by
    taking a specific outgoing edge from BB.  */
 
@@ -435,6 +503,13 @@ record_edge_info (basic_block bb)
   gimple_stmt_iterator gsi = gsi_last_bb (bb);
   class edge_info *edge_info;
 
+  /* Free all the outgoing edge info data associated with
+     BB's outgoing edges.  */
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, bb->succs)
+    free_dom_edge_info (e);
+
   if (! gsi_end_p (gsi))
     {
       gimple *stmt = gsi_stmt (gsi);
@@ -450,8 +525,6 @@ record_edge_info (basic_block bb)
 	      int i;
               int n_labels = gimple_switch_num_labels (switch_stmt);
 	      tree *info = XCNEWVEC (tree, last_basic_block_for_fn (cfun));
-	      edge e;
-	      edge_iterator ei;
 
 	      for (i = 0; i < n_labels; i++)
 		{
@@ -583,6 +656,62 @@ record_edge_info (basic_block bb)
               if (can_infer_simple_equiv && TREE_CODE (inverted) == EQ_EXPR)
 		edge_info->record_simple_equiv (op0, op1);
             }
+
+	  /* If this block is a single block loop, then we may be able to
+	     record some equivalences on the loop's exit edge.  */
+	  if (single_block_loop_p (bb))
+	    {
+	      /* We know it's a single block loop.  Now look at the loop
+		 exit condition.  What we're looking for is whether or not
+		 the exit condition is loop invariant which we can detect
+		 by checking if all the SSA_NAMEs referenced are defined
+		 outside the loop.  */
+	      if ((TREE_CODE (op0) != SSA_NAME
+		   || gimple_bb (SSA_NAME_DEF_STMT (op0)) != bb)
+		  && (TREE_CODE (op1) != SSA_NAME
+		      || gimple_bb (SSA_NAME_DEF_STMT (op1)) != bb))
+		{
+		  /* At this point we know the exit condition is loop
+		     invariant.  The only way to get out of the loop is
+		     if never traverses the backedge to begin with.  This
+		     implies that any PHI nodes create equivalances we can
+		     attach to the loop exit edge.  */
+		  int alternative
+		    = (EDGE_PRED (bb, 0)->flags & EDGE_DFS_BACK) ? 1 : 0;
+
+		  gphi_iterator gsi;
+		  for (gsi = gsi_start_phis (bb);
+		       !gsi_end_p (gsi);
+		       gsi_next (&gsi))
+		    {
+		      /* If the other alternative is the same as the result,
+			 then this is a degenerate and can be ignored.  */
+		      if (dst == PHI_ARG_DEF (phi, !alternative))
+			continue;
+
+		      /* Now get the EDGE_INFO class so we can append
+			 it to our list.  We want the successor edge
+			 where the destination is not the source of
+			 an incoming edge.  */
+		      gphi *phi = gsi.phi ();
+		      tree src = PHI_ARG_DEF (phi, alternative);
+		      tree dst = PHI_RESULT (phi);
+
+		      if (EDGE_SUCC (bb, 0)->dest
+			  != EDGE_PRED (bb, !alternative)->src)
+			edge_info = (class edge_info *)EDGE_SUCC (bb, 0)->aux;
+		      else
+			edge_info = (class edge_info *)EDGE_SUCC (bb, 1)->aux;
+
+		      /* Note that since this processing is done independently
+			 of other edge equivalency processing, we may not
+			 have an EDGE_INFO structure set up yet.  */
+		      if (edge_info == NULL)
+			edge_info = new class edge_info (false_edge);
+		      edge_info->record_simple_equiv (dst, src);
+		    }
+		}
+	    }
         }
     }
 }
</cut>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-10-04  6:36 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-04  6:36 [TCWG CI] Failure after basepoints/gcc-13-3004-g1214196da79: More gimple const/copy propagation opportunities ci_notify
  -- strict thread matches above, loose matches on Subject: below --
2022-10-01  4:48 ci_notify

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).