public inbox for gcc-patches@gcc.gnu.org
* Re: More aggressive threading causing loop-interchange-9.c regression
       [not found]                       ` <alpine.LSU.2.20.2109091342460.12583@wotan.suse.de>
@ 2021-09-10  7:04                         ` Aldy Hernandez
  2021-09-10 13:16                           ` Michael Matz
  0 siblings, 1 reply; 7+ messages in thread
From: Aldy Hernandez @ 2021-09-10  7:04 UTC (permalink / raw)
  To: Michael Matz
  Cc: Richard Biener, Jeff Law, GCC Mailing List, Andrew MacLeod, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2187 bytes --]

I think things are clear enough to propose a patch.  Thanks for 
everyone's input.

I have added some comments and tweaked Michael's patch so that the check 
also includes the final BB we are threading to.  Without this last 
check, we fail in tree-ssa/cunroll-15.c because the latch becomes 
polluted with PHI nodes.

OK for trunk?
Aldy

commit e735a2cd01485773635bd66c97af21059992d5a7 (HEAD -> pending/global-ranges)
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Thu Sep 9 20:30:28 2021 +0200

     Disable threading through latches until after loop optimizations.

     The motivation for this patch was enabling the use of global ranges in
     the path solver, but this caused certain properties of loops to be
     destroyed, which made subsequent loop optimizations fail.
     Consequently, this patch's main goal is to disable jump threading
     involving the latch until after loop optimizations have run.

     I have added a bit in cfun to indicate when the "loopdone" optimization
     has completed.  This helps the path threader determine when it's safe
     to thread through loop structures.  I can adapt the patch if there is
     an alternate way of determining this.

     As can be seen in the test adjustments, we mostly shift the threading
     from the early threaders (ethread, thread[12]) to the late threaders
     (thread[34]).  I have nuked some of the early notes in the testcases
     that came as part of the jump threader rewrite.  They're mostly noise
     now.

     Note that we could probably relax some other restrictions in
     profitable_path_p when loop optimizations have completed, but it would
     require more testing, and I'm hesitant to touch more things than needed
     at this point.  I have added a reminder to the function to keep this
     in mind.

     Finally, perhaps as a follow-up, we should apply the same restrictions
     to the forward threader.  At some point I'd like to combine the cost
     models.

     Tested on x86-64 Linux.

     p.s. There is a thorough discussion involving the limitations of jump
     threading involving loops here:

             https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

[-- Attachment #2: 0001-Disable-threading-through-latches-until-after-loop-o.patch --]
[-- Type: text/x-patch, Size: 11395 bytes --]

From e735a2cd01485773635bd66c97af21059992d5a7 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Thu, 9 Sep 2021 20:30:28 +0200
Subject: [PATCH] Disable threading through latches until after loop
 optimizations.

The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops to be
destroyed, which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.

I have added a bit in cfun to indicate when the "loopdone" optimization
has completed.  This helps the path threader determine when it's safe to
thread through loop structures.  I can adapt the patch if there is an
alternate way of determining this.

As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]).  I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite.  They're mostly noise
now.

Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point.  I have added a reminder to the function to keep this
in mind.

Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader.  At some point I'd like to combine the cost models.

Tested on x86-64 Linux.

p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:

	https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

gcc/ChangeLog:

	* function.h (struct function): Add loop_optimizers_done.
	(set_loop_optimizers_done): New.
	(loop_optimizers_done_p): New.
	* gimple-range-path.cc (path_range_query::internal_range_of_expr):
	Intersect with global range.
	* tree-ssa-loop.c (tree_ssa_loop_done): Call set_loop_optimizers_done.
	* tree-ssa-threadbackward.c
	(back_threader_profitability::profitable_path_p): Disable
	threading through latches until after loop optimizations have run.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
	threading through latches.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.

Co-authored-by: Michael Matz <matz@suse.de>
---
 gcc/function.h                                | 15 ++++++++
 gcc/gimple-range-path.cc                      |  3 ++
 .../gcc.dg/tree-ssa/ssa-dom-thread-2b.c       |  4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-6.c        | 37 +------------------
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        | 17 +--------
 gcc/tree-ssa-loop.c                           |  1 +
 gcc/tree-ssa-threadbackward.c                 | 28 +++++++++++++-
 7 files changed, 50 insertions(+), 55 deletions(-)

diff --git a/gcc/function.h b/gcc/function.h
index 36003e7576a..fb99b6a44fc 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -438,6 +438,9 @@ struct GTY(()) function {
 
   /* Set if there are any OMP_TARGET regions in the function.  */
   unsigned int has_omp_target : 1;
+
+  /* Set if SSA loop optimizers have completed.  */
+  unsigned int loop_optimizers_done : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
@@ -514,6 +517,18 @@ set_loops_for_fn (struct function *fn, struct loops *loops)
   fn->x_current_loops = loops;
 }
 
+inline void
+set_loop_optimizers_done (struct function *fn)
+{
+  fn->loop_optimizers_done = 1;
+}
+
+inline bool
+loop_optimizers_done_p (struct function *fn)
+{
+  return fn->loop_optimizers_done;
+}
+
 /* For backward compatibility... eventually these should all go away.  */
 #define current_function_funcdef_no (cfun->funcdef_no)
 
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index a4fa3b296ff..c616b65756f 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -127,6 +127,9 @@ path_range_query::internal_range_of_expr (irange &r, tree name, gimple *stmt)
   basic_block bb = stmt ? gimple_bb (stmt) : exit_bb ();
   if (stmt && range_defined_in_block (r, name, bb))
     {
+      if (TREE_CODE (name) == SSA_NAME)
+	r.intersect (gimple_range_global (name));
+
       set_cache (r, name);
       return true;
     }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
index e1c33e86cd7..823ada982ff 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
+/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
 
 void foo();
 void bla();
@@ -26,4 +26,4 @@ void thread_latch_through_header (void)
    case.  And we want to thread through the header as well.  These
    are both caught by threading in DOM.  */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index c7bf867b084..ee46759bacc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,41 +1,8 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
 
-/* All the threads in the thread1 dump start on a X->BB12 edge, as can
-   be seen in the dump:
-
-     Registering FSM jump thread: (x, 12) incoming edge; ...
-     etc
-     etc
-
-   Before the new evrp, we were threading paths that started at the
-   following edges:
-
-      Registering FSM jump thread: (10, 12) incoming edge
-      Registering FSM jump thread:  (6, 12) incoming edge
-      Registering FSM jump thread:  (9, 12) incoming edge
-
-   This was because the PHI at BB12 had constant values coming in from
-   BB10, BB6, and BB9:
-
-   # state_10 = PHI <state_11(7), 0(10), state_11(5), 1(6), state_11(8), 2(9), state_11(11)>
-
-   Now with the new evrp, we get:
-
-   # state_10 = PHI <0(7), 0(10), state_11(5), 1(6), 0(8), 2(9), 1(11)>
-
-   Thus, we have 3 more paths that are known to be constant and can be
-   threaded.  Which means that by the second threading pass, we can
-   only find one profitable path.
-
-   For the record, all these extra constants are better paths coming
-   out of switches.  For example:
-
-     SWITCH_BB -> BBx -> BBy -> BBz -> PHI
-
-   We now know the value of the switch index at PHI.  */
 /* { dg-final { scan-tree-dump-times "Registering FSM jump" 6 "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread2" } } */
+/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread3" } } */
 
 int sum0, sum1, sum2, sum3;
 int foo (char *s, char **ret)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 5fc2145a432..ba07942f9dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,23 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
 
-/* Here we have the same issue as was commented in ssa-dom-thread-6.c.
-   The PHI coming into the threader has a lot more constants, so the
-   threader can thread more paths.
-
-$ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2 
-252c252
-<   # s_50 = PHI <s_49(10), 5(14), s_51(18), s_51(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), s_51(30)>
----
->   # s_50 = PHI <s_49(10), 5(14), 4(18), 5(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), 7(30)>
-272a273
-
-  I spot checked a few and they all have the same pattern.  We are
-  basically tracking the switch index better through multiple
-  paths.  */
-
 /* { dg-final { scan-tree-dump "Jumps threaded: 18"  "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
 
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 0cc4b3bbccf..c827a51d3ce 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -528,6 +528,7 @@ tree_ssa_loop_done (void)
   free_numbers_of_iterations_estimates (cfun);
   scev_finalize ();
   loop_optimizer_finalize (cfun, true);
+  set_loop_optimizers_done (cfun);
   return 0;
 }
 
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 449232c7715..ded31a9e2de 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ssa.h"
 #include "tree-cfgcleanup.h"
 #include "tree-pretty-print.h"
+#include "cfghooks.h"
 
 // Path registry for the backwards threader.  After all paths have been
 // registered with register_path(), thread_through_all_blocks() is called
@@ -564,7 +565,10 @@ back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers)
    TAKEN_EDGE, otherwise it is NULL.
 
    CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path
-   would create an irreducible loop.  */
+   would create an irreducible loop.
+
+   ?? It seems we should be able to loosen some of the restrictions in
+   this function after loop optimizations have run.  */
 
 bool
 back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
@@ -725,7 +729,11 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	 the last entry in the array when determining if we thread
 	 through the loop latch.  */
       if (loop->latch == bb)
-	threaded_through_latch = true;
+	{
+	  threaded_through_latch = true;
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, " (latch)");
+	}
     }
 
   gimple *stmt = get_gimple_control_stmt (m_path[0]);
@@ -845,6 +853,22 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 		 "a multiway branch.\n");
       return false;
     }
+
+  /* Threading through a non-empty latch would cause code to be added
+     to the latch.  This could alter the loop form sufficiently to
+     cause loop optimizations to fail.  Disable these threads until
+     after loop optimizations have run.  */
+  if ((threaded_through_latch
+       || (taken_edge && taken_edge->dest == loop->latch))
+      && !loop_optimizers_done_p (cfun)
+      && empty_block_p (loop->latch))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "  FAIL: FSM Thread through latch before loop opts would create non-empty latch\n");
+      return false;
+
+    }
   return true;
 }
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More aggressive threading causing loop-interchange-9.c regression
  2021-09-10  7:04                         ` More aggressive threading causing loop-interchange-9.c regression Aldy Hernandez
@ 2021-09-10 13:16                           ` Michael Matz
  2021-09-10 13:53                             ` Aldy Hernandez
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Matz @ 2021-09-10 13:16 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches

Hi,

On Fri, 10 Sep 2021, Aldy Hernandez via Gcc-patches wrote:

>      }
> +
> +  /* Threading through a non-empty latch would cause code to be added

"through an *empty* latch".  The test in code is correct, though.

And for the before/after loops flag you added: we have a 
cfun->curr_properties field which can be used.  We even already have a 
PROP_loops flag but that is set throughout compilation from CFG 
construction until the RTL loop optimizers, so can't be re-used for what 
is needed here.  But you still could invent another PROP_ value instead of 
adding a new field in struct function.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More aggressive threading causing loop-interchange-9.c regression
  2021-09-10 13:16                           ` Michael Matz
@ 2021-09-10 13:53                             ` Aldy Hernandez
  2021-09-10 14:33                               ` Michael Matz
  2021-09-10 16:31                               ` Jeff Law
  0 siblings, 2 replies; 7+ messages in thread
From: Aldy Hernandez @ 2021-09-10 13:53 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 739 bytes --]



On 9/10/21 3:16 PM, Michael Matz wrote:
> Hi,
> 
> On Fri, 10 Sep 2021, Aldy Hernandez via Gcc-patches wrote:
> 
>>       }
>> +
>> +  /* Threading through a non-empty latch would cause code to be added
> 
> "through an *empty* latch".  The test in code is correct, though.

Whoops.

> 
> And for the before/after loops flag you added: we have a
> cfun->curr_properties field which can be used.  We even already have a
> PROP_loops flag but that is set throughout compilation from CFG
> construction until the RTL loop optimizers, so can't be re-used for what
> is needed here.  But you still could invent another PROP_ value instead of
> adding a new field in struct function.

Oooo, even better.  No inline functions.

Like this?
Aldy

[-- Attachment #2: 0001-Disable-threading-through-latches-until-after-loop-o.patch --]
[-- Type: text/x-patch, Size: 10838 bytes --]

From ff25faa8dd8721da9bb4715706c662fc09fd4e8c Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Thu, 9 Sep 2021 20:30:28 +0200
Subject: [PATCH] Disable threading through latches until after loop
 optimizations.

The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops to be
destroyed, which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.

As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]).  I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite.  They're mostly noise
now.

Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point.  I have added a reminder to the function to keep this
in mind.

Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader.  At some point I'd like to combine the cost models.

Tested on x86-64 Linux.

p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:

	https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

gcc/ChangeLog:

	* tree-pass.h (PROP_loop_opts_done): New.
	* gimple-range-path.cc (path_range_query::internal_range_of_expr):
	Intersect with global range.
	* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
	* tree-ssa-threadbackward.c
	(back_threader_profitability::profitable_path_p): Disable
	threading through latches until after loop optimizations have run.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
	threading through latches.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.

Co-authored-by: Michael Matz <matz@suse.de>
---
 gcc/gimple-range-path.cc                      |  3 ++
 .../gcc.dg/tree-ssa/ssa-dom-thread-2b.c       |  4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-6.c        | 37 +------------------
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        | 17 +--------
 gcc/tree-pass.h                               |  2 +
 gcc/tree-ssa-loop.c                           |  2 +-
 gcc/tree-ssa-threadbackward.c                 | 28 +++++++++++++-
 7 files changed, 37 insertions(+), 56 deletions(-)

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index a4fa3b296ff..c616b65756f 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -127,6 +127,9 @@ path_range_query::internal_range_of_expr (irange &r, tree name, gimple *stmt)
   basic_block bb = stmt ? gimple_bb (stmt) : exit_bb ();
   if (stmt && range_defined_in_block (r, name, bb))
     {
+      if (TREE_CODE (name) == SSA_NAME)
+	r.intersect (gimple_range_global (name));
+
       set_cache (r, name);
       return true;
     }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
index e1c33e86cd7..823ada982ff 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2b.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
+/* { dg-options "-O2 -fdump-tree-thread3-stats -fdump-tree-dom2-stats -fdisable-tree-ethread" } */
 
 void foo();
 void bla();
@@ -26,4 +26,4 @@ void thread_latch_through_header (void)
    case.  And we want to thread through the header as well.  These
    are both caught by threading in DOM.  */
 /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread1"} } */
+/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "thread3"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
index c7bf867b084..ee46759bacc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
@@ -1,41 +1,8 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread2-details" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
 
-/* All the threads in the thread1 dump start on a X->BB12 edge, as can
-   be seen in the dump:
-
-     Registering FSM jump thread: (x, 12) incoming edge; ...
-     etc
-     etc
-
-   Before the new evrp, we were threading paths that started at the
-   following edges:
-
-      Registering FSM jump thread: (10, 12) incoming edge
-      Registering FSM jump thread:  (6, 12) incoming edge
-      Registering FSM jump thread:  (9, 12) incoming edge
-
-   This was because the PHI at BB12 had constant values coming in from
-   BB10, BB6, and BB9:
-
-   # state_10 = PHI <state_11(7), 0(10), state_11(5), 1(6), state_11(8), 2(9), state_11(11)>
-
-   Now with the new evrp, we get:
-
-   # state_10 = PHI <0(7), 0(10), state_11(5), 1(6), 0(8), 2(9), 1(11)>
-
-   Thus, we have 3 more paths that are known to be constant and can be
-   threaded.  Which means that by the second threading pass, we can
-   only find one profitable path.
-
-   For the record, all these extra constants are better paths coming
-   out of switches.  For example:
-
-     SWITCH_BB -> BBx -> BBy -> BBz -> PHI
-
-   We now know the value of the switch index at PHI.  */
 /* { dg-final { scan-tree-dump-times "Registering FSM jump" 6 "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread2" } } */
+/* { dg-final { scan-tree-dump-times "Registering FSM jump" 1 "thread3" } } */
 
 int sum0, sum1, sum2, sum3;
 int foo (char *s, char **ret)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 5fc2145a432..ba07942f9dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,23 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
 
-/* Here we have the same issue as was commented in ssa-dom-thread-6.c.
-   The PHI coming into the threader has a lot more constants, so the
-   threader can thread more paths.
-
-$ diff clean/a.c.105t.mergephi2 a.c.105t.mergephi2 
-252c252
-<   # s_50 = PHI <s_49(10), 5(14), s_51(18), s_51(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), s_51(30)>
----
->   # s_50 = PHI <s_49(10), 5(14), 4(18), 5(22), 1(26), 1(29), 1(31), s_51(5), 4(12), 1(15), 5(17), 1(19), 3(21), 1(23), 6(25), 7(28), 7(30)>
-272a273
-
-  I spot checked a few and they all have the same pattern.  We are
-  basically tracking the switch index better through multiple
-  paths.  */
-
 /* { dg-final { scan-tree-dump "Jumps threaded: 18"  "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
 
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 83941bc0cee..eb75eb17951 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -225,6 +225,8 @@ protected:
 						   been optimized.  */
 #define PROP_gimple_lomp_dev	(1 << 16)	/* done omp_device_lower */
 #define PROP_rtl_split_insns	(1 << 17)	/* RTL has insns split.  */
+#define PROP_loop_opts_done	(1 << 18)	/* SSA loop optimizations
+						   have completed.  */
 
 #define PROP_gimple \
   (PROP_gimple_any | PROP_gimple_lcf | PROP_gimple_leh | PROP_gimple_lomp)
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 0cc4b3bbccf..1bbf2f1fb2c 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -540,7 +540,7 @@ const pass_data pass_data_tree_loop_done =
   OPTGROUP_LOOP, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
-  0, /* properties_provided */
+  PROP_loop_opts_done, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
   TODO_cleanup_cfg, /* todo_flags_finish */
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 449232c7715..e72992328de 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ssa.h"
 #include "tree-cfgcleanup.h"
 #include "tree-pretty-print.h"
+#include "cfghooks.h"
 
 // Path registry for the backwards threader.  After all paths have been
 // registered with register_path(), thread_through_all_blocks() is called
@@ -564,7 +565,10 @@ back_threader_registry::thread_through_all_blocks (bool may_peel_loop_headers)
    TAKEN_EDGE, otherwise it is NULL.
 
    CREATES_IRREDUCIBLE_LOOP, if non-null is set to TRUE if threading this path
-   would create an irreducible loop.  */
+   would create an irreducible loop.
+
+   ?? It seems we should be able to loosen some of the restrictions in
+   this function after loop optimizations have run.  */
 
 bool
 back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
@@ -725,7 +729,11 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 	 the last entry in the array when determining if we thread
 	 through the loop latch.  */
       if (loop->latch == bb)
-	threaded_through_latch = true;
+	{
+	  threaded_through_latch = true;
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, " (latch)");
+	}
     }
 
   gimple *stmt = get_gimple_control_stmt (m_path[0]);
@@ -845,6 +853,22 @@ back_threader_profitability::profitable_path_p (const vec<basic_block> &m_path,
 		 "a multiway branch.\n");
       return false;
     }
+
+  /* Threading through an empty latch would cause code to be added to
+     the latch.  This could alter the loop form sufficiently to cause
+     loop optimizations to fail.  Disable these threads until after
+     loop optimizations have run.  */
+  if ((threaded_through_latch
+       || (taken_edge && taken_edge->dest == loop->latch))
+      && !(cfun->curr_properties & PROP_loop_opts_done)
+      && empty_block_p (loop->latch))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "  FAIL: FSM Thread through latch before loop opts would create non-empty latch\n");
+      return false;
+
+    }
   return true;
 }
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More aggressive threading causing loop-interchange-9.c regression
  2021-09-10 13:53                             ` Aldy Hernandez
@ 2021-09-10 14:33                               ` Michael Matz
  2021-09-10 16:31                               ` Jeff Law
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Matz @ 2021-09-10 14:33 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: gcc-patches

Hello,

On Fri, 10 Sep 2021, Aldy Hernandez via Gcc-patches wrote:

> Like this?

Yep, but I can't approve.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More aggressive threading causing loop-interchange-9.c regression
  2021-09-10 13:53                             ` Aldy Hernandez
  2021-09-10 14:33                               ` Michael Matz
@ 2021-09-10 16:31                               ` Jeff Law
  2021-09-13 11:48                                 ` Christophe Lyon
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Law @ 2021-09-10 16:31 UTC (permalink / raw)
  To: Aldy Hernandez, Michael Matz; +Cc: gcc-patches



On 9/10/2021 7:53 AM, Aldy Hernandez via Gcc-patches wrote:
>
>
> On 9/10/21 3:16 PM, Michael Matz wrote:
>> Hi,
>>
>> On Fri, 10 Sep 2021, Aldy Hernandez via Gcc-patches wrote:
>>
>>>       }
>>> +
>>> +  /* Threading through a non-empty latch would cause code to be added
>>
>> "through an *empty* latch".  The test in code is correct, though.
>
> Whoops.
>
>>
>> And for the before/after loops flag you added: we have a
>> cfun->curr_properties field which can be used.  We even already have a
>> PROP_loops flag but that is set throughout compilation from CFG
>> construction until the RTL loop optimizers, so can't be re-used for what
>> is needed here.  But you still could invent another PROP_ value 
>> instead of
>> adding a new field in struct function.
>
> Oooo, even better.  No inline functions.
>
> Like this?
> Aldy
>
> 0001-Disable-threading-through-latches-until-after-loop-o.patch
>
>  From ff25faa8dd8721da9bb4715706c662fc09fd4e8c Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez <aldyh@redhat.com>
> Date: Thu, 9 Sep 2021 20:30:28 +0200
> Subject: [PATCH] Disable threading through latches until after loop
>   optimizations.
>
> The motivation for this patch was enabling the use of global ranges in
> the path solver, but this caused certain properties of loops to be
> destroyed, which made subsequent loop optimizations fail.
> Consequently, this patch's main goal is to disable jump threading
> involving the latch until after loop optimizations have run.
>
> As can be seen in the test adjustments, we mostly shift the threading
> from the early threaders (ethread, thread[12]) to the late threaders
> (thread[34]).  I have nuked some of the early notes in the testcases
> that came as part of the jump threader rewrite.  They're mostly noise
> now.
>
> Note that we could probably relax some other restrictions in
> profitable_path_p when loop optimizations have completed, but it would
> require more testing, and I'm hesitant to touch more things than needed
> at this point.  I have added a reminder to the function to keep this
> in mind.
>
> Finally, perhaps as a follow-up, we should apply the same restrictions to
> the forward threader.  At some point I'd like to combine the cost models.
>
> Tested on x86-64 Linux.
>
> p.s. There is a thorough discussion involving the limitations of jump
> threading involving loops here:
>
> 	https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html
>
> gcc/ChangeLog:
>
> 	* tree-pass.h (PROP_loop_opts_done): New.
> 	* gimple-range-path.cc (path_range_query::internal_range_of_expr):
> 	Intersect with global range.
> 	* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
> 	* tree-ssa-threadbackward.c
> 	(back_threader_profitability::profitable_path_p): Disable
> 	threading through latches until after loop optimizations have run.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
> 	threading through latches.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
> 	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
OK
jeff


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More aggressive threading causing loop-interchange-9.c regression
  2021-09-10 16:31                               ` Jeff Law
@ 2021-09-13 11:48                                 ` Christophe Lyon
  2021-09-13 13:04                                   ` Aldy Hernandez
  0 siblings, 1 reply; 7+ messages in thread
From: Christophe Lyon @ 2021-09-13 11:48 UTC (permalink / raw)
  To: Jeff Law; +Cc: Aldy Hernandez, Michael Matz, gcc-patches

On Fri, Sep 10, 2021 at 6:32 PM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
>
> On 9/10/2021 7:53 AM, Aldy Hernandez via Gcc-patches wrote:
> >
> >
> > On 9/10/21 3:16 PM, Michael Matz wrote:
> >> Hi,
> >>
> >> On Fri, 10 Sep 2021, Aldy Hernandez via Gcc-patches wrote:
> >>
> >>>       }
> >>> +
> >>> +  /* Threading through a non-empty latch would cause code to be added
> >>
> >> "through an *empty* latch".  The test in code is correct, though.
> >
> > Whoops.
> >
> >>
> >> And for the before/after loops flag you added: we have a
> >> cfun->curr_properties field which can be used.  We even already have a
> >> PROP_loops flag but that is set throughout compilation from CFG
> >> construction until the RTL loop optimizers, so can't be re-used for what
> >> is needed here.  But you still could invent another PROP_ value
> >> instead of
> >> adding a new field in struct function.
> >
> > Oooo, even better.  No inline functions.
> >
> > Like this?
> > Aldy
> >
> > 0001-Disable-threading-through-latches-until-after-loop-o.patch
> >
> >  From ff25faa8dd8721da9bb4715706c662fc09fd4e8c Mon Sep 17 00:00:00 2001
> > From: Aldy Hernandez <aldyh@redhat.com>
> > Date: Thu, 9 Sep 2021 20:30:28 +0200
> > Subject: [PATCH] Disable threading through latches until after loop
> >   optimizations.
> >
> > The motivation for this patch was enabling the use of global ranges in
> > the path solver, but this caused certain properties of loops to be
> > destroyed, which made subsequent loop optimizations fail.
> > Consequently, this patch's main goal is to disable jump threading
> > involving the latch until after loop optimizations have run.
> >
> > As can be seen in the test adjustments, we mostly shift the threading
> > from the early threaders (ethread, thread[12]) to the late threaders
> > thread[34]).  I have nuked some of the early notes in the testcases
> > that came as part of the jump threader rewrite.  They're mostly noise
> > now.
> >
> > Note that we could probably relax some other restrictions in
> > profitable_path_p when loop optimizations have completed, but it would
> > require more testing, and I'm hesitant to touch more things than needed
> > at this point.  I have added a reminder to the function to keep this
> > in mind.
> >
> > Finally, perhaps as a follow-up, we should apply the same restrictions to
> > the forward threader.  At some point I'd like to combine the cost models.
> >
> > Tested on x86-64 Linux.
> >
> > p.s. There is a thorough discussion involving the limitations of jump
> > threading involving loops here:
> >
> >       https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html
> >
> > gcc/ChangeLog:
> >
> >       * tree-pass.h (PROP_loop_opts_done): New.
> >       * gimple-range-path.cc (path_range_query::internal_range_of_expr):
> >       Intersect with global range.
> >       * tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
> >       * tree-ssa-threadbackward.c
> >       (back_threader_profitability::profitable_path_p): Disable
> >       threading through latches until after loop optimizations have run.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
> >       threading through latches.
> >       * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
> >       * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
>

Hi,

This last test now fails on aarch64:
FAIL:  gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
threaded: 8"

Can you check?

Thanks,

Christophe

OK
> jeff
>
>


* Re: More aggressive threading causing loop-interchange-9.c regression
  2021-09-13 11:48                                 ` Christophe Lyon
@ 2021-09-13 13:04                                   ` Aldy Hernandez
  0 siblings, 0 replies; 7+ messages in thread
From: Aldy Hernandez @ 2021-09-13 13:04 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Jeff Law, Michael Matz, gcc-patches, MacLeod, Andrew

[-- Attachment #1: Type: text/plain, Size: 2018 bytes --]

On Mon, Sep 13, 2021 at 1:49 PM Christophe Lyon
<christophe.lyon.oss@gmail.com> wrote:

> This last test now fails on aarch64:
> FAIL:  gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps threaded: 8"
>
> Can you check?

These rather large tests checking for some random number of jump
threads are very annoying.  In all my work in the past 2 years,
they've never found anything that the smaller threading tests (for
instance ssa-thread-14.c) couldn't find.

The thing is that the new backward threader is not a threader per se,
but a framework for solving paths.  It can find virtually any path in
the IL, depending on how good the solver or the path discovery bits
are.  So any change to a great number of components can alter the
number of threads.  For example, changes to at *least* the following
components could alter the number of threads:

* the path solver
* the path discovery bits in tree-ssa-threadbackward
* the profitability code in tree-ssa-threadbackward
* the post-registration bits in the low-level path registry
* range-ops
* ranger
* any pass altering global SSA_NAME_RANGE_INFO, since the path solver
can pull that info (evrp, VRP, sprintf, PRE, strlen, loop-manip, etc).

It is unreasonable to have to go through every assumed jump thread
candidate in these tests for every minor change.  Heck, are we even
sure there are supposed to be exactly 18 jump threads in the first
backward jump thread pass for this test?

I suggest we remove these tests.  They add a maintenance burden for
very little return.  And if we _really_ must test something in this
area, perhaps a unit test in the appropriate component would be more
useful.

For this particular test the IL is sufficiently different on aarch64
that the number of threads is different.  I'm just going to
disable this.  We already have to disable the dom3 and vrp2 jump
threading  tests on this same test on aarch64.  This is more of the
same.  Committed to trunk.

I'd be happy to hear if anyone has other solutions.
Aldy
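One hypothetical alternative to disabling the scan entirely (not what
was committed here) would be to keep the directive but stop pinning the
exact count, so the test still verifies that the pass threads *some*
jumps without breaking on every cost-model tweak:

```
/* { dg-final { scan-tree-dump "Jumps threaded: \[0-9\]+" "thread3" } } */
```

This trades precision for stability; whether that is an acceptable
trade-off for these tests is exactly the question raised above.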

[-- Attachment #2: p.patch --]
[-- Type: text/x-patch, Size: 1151 bytes --]

commit a7f59856ea8ea02d6f09d2e9e793ce2800ebbe4b
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Mon Sep 13 14:25:15 2021 +0200

    Adjust ssa-dom-thread-7.c on aarch64.
    
    gcc/testsuite/ChangeLog:
    
            * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust for aarch64.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index ba07942f9dd..e3d4b311c03 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -2,7 +2,7 @@
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
 
 /* { dg-final { scan-tree-dump "Jumps threaded: 18"  "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 8" "thread3" { target { ! aarch64*-*-* } } } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
 
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough


end of thread, other threads:[~2021-09-13 13:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
2021-09-10  7:04                         ` More aggressive threading causing loop-interchange-9.c regression Aldy Hernandez
2021-09-10 13:16                           ` Michael Matz
2021-09-10 13:53                             ` Aldy Hernandez
2021-09-10 14:33                               ` Michael Matz
2021-09-10 16:31                               ` Jeff Law
2021-09-13 11:48                                 ` Christophe Lyon
2021-09-13 13:04                                   ` Aldy Hernandez
