public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Aldy Hernandez <aldyh@redhat.com>
To: Jeff Law <jeffreyalaw@gmail.com>,
	Richard Biener <richard.guenther@gmail.com>,
	Michael Matz <matz@suse.de>
Cc: Andrew MacLeod <amacleod@redhat.com>,
	GCC patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] More jump threading restrictions in the presence of loops.
Date: Mon, 18 Oct 2021 12:14:44 +0200	[thread overview]
Message-ID: <76fb7148-3d03-9885-cc31-26f929603986@redhat.com> (raw)
In-Reply-To: <fbde72ff-acb8-0013-eed6-21d6ee67cab1@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5650 bytes --]



On 10/17/21 3:32 AM, Jeff Law wrote:

> I think once we reach a consensus on the tests, this will be good to go.
> 
> 
>> diff --git a/gcc/testsuite/gcc.dg/loop-8.c b/gcc/testsuite/gcc.dg/loop-8.c
>> index 90ea1c45524..66318fc08dc 100644
>> --- a/gcc/testsuite/gcc.dg/loop-8.c
>> +++ b/gcc/testsuite/gcc.dg/loop-8.c
>> @@ -24,5 +24,9 @@ f (int *a, int *b)
>>   
>>   /* Load of 42 is moved out of the loop, introducing a new pseudo register.  */
>>   /* { dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" } } */
>> -/* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" "loop2_invariant" } } */
>> +
>> +
>> +/* ?? The expected behavior below depends on threading the 2->3->5 path
>> +   in DOM2, but this is invalid since it would rotate the loop.  */
>> +/* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" "loop2_invariant" { xfail *-*-* } } } */
> So maybe the thing to do here since I guess we want to keep the test 
> would be to manually rotate the loop in the source.  In theory that 
> should restore the test to validating what we want it to validate 
> (specifically the behavior of LICM).

I have rotated the loop.  This fixes the xfail I introduced, but the 
"Decided"  test fails.  I've removed that instead.  Let me know if this 
is OK.

> 
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
>> index 0246ebf3c63..f83cefd8d89 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
>> @@ -22,6 +22,8 @@
>>   
>>      All the cases are picked up by VRP1 as jump threads.  */
>>   
>> -/* There used to be 6 jump threads found by thread1, but they all
>> -   depended on threading through distinct loops in ethread.  */
>> -/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp-thread1" } } */
>> +/* This test should be obsoleted.  We used to catch 2 total threads in
>> +   vrp-thread1, but after adding loop rotating restrictions, we get
>> +   none.  Interestingly, on x86-64 we now get 1 in DOM2, 5 in DOM3,
>> +   and 1 in vrp-thread2.  */
>> +/* { dg-final { scan-tree-dump-not "Threaded" "vrp-thread1" } } */
> I think that testing nothing was threaded in vrp1 is probably best. 
> Though I wouldn't lose any sleep if this just went away.

I've opted to remove these tests, since I'm testing the exact behavior 
we're disallowing in the gimple FE tests I've included in this patch.

> 
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
>> index 8f0a12c12ee..68808bd09fc 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
>> @@ -1,10 +1,9 @@
>>   /* { dg-do compile } */
>> -/* { dg-options "-O2 -fdump-tree-vrp-thread1-stats -fdump-tree-dom2-stats" } */
>> +/* { dg-options "-O2 -fdump-statistics" } */
>>   
>>   void bla();
>>   
>> -/* In the following case, we should be able to thread edge through
>> -   the loop header.  */
>> +/* No one should thread through the loop header.  */
>>   
>>   void thread_entry_through_header (void)
>>   {
>> @@ -14,8 +13,4 @@ void thread_entry_through_header (void)
>>       bla ();
>>   }
>>   
>> -/* There's a single jump thread that should be handled by the VRP
>> -   jump threading pass.  */
>> -/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp-thread1"} } */
>> -/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp-thread1"} } */
>> -/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
>> +/* { dg-final { scan-tree-dump-not "Jumps threaded" "statistics"} } */
> Similarly.

Same.

> 
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
>> index b0a7d423475..24de9d57d50 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
>> @@ -1,8 +1,12 @@
>>   /* { dg-do compile } */
>>   /* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
>>   
>> -/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
>> -/* { dg-final { scan-tree-dump-times "Registering jump" 1 "thread3" } } */
>> +/* ?? We should obsolete this test.  All the threads in thread1 and
>> +   thread3 we used to get cross the loop header but does not exit the
>> +   loop, so they have been deemed invalid.  */
>> +
>> +/* { dg-final { scan-tree-dump-times "Registering jump" 0 "thread1" } } */
>> +/* { dg-final { scan-tree-dump-times "Registering jump" 0 "thread3" } } */
>>   
>>   int sum0, sum1, sum2, sum3;
>>   int foo (char *s, char **ret)
> Similarly.

Same.

> 
>> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
>> index e68a9b62535..fc3adab3fc3 100644
>> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
>> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
>> @@ -65,5 +65,9 @@ int main (void)
>>     return 0;
>>   }
>>   
>> -/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } */
>> +/* ?? The check below depends on jump threading.  There are no a
>> +   couple threaded paths that are being rejected because they either
>> +   rotate a loop or cross the loop header without exiting the
>> +   loop.  */
>> +/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" { xfail *-*-* } } } */
> So we could manually rotate the loop in the source which in turn would 
> allow this test to continue to validate SLP's behavior.

That did the trick.  Thanks.

OK?

Aldy

[-- Attachment #2: 0001-Disallow-loop-rotation-and-loop-header-crossing-in-j.patch --]
[-- Type: text/x-patch, Size: 27979 bytes --]

From 4e21d1aa9a794356b32d25a9ed539ac4bb70ab7e Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Mon, 4 Oct 2021 09:47:02 +0200
Subject: [PATCH] Disallow loop rotation and loop header crossing in jump
 threaders.

There is a lot of fall-out from this patch, as there were many threading
tests that assumed the restrictions introduced by this patch were valid.
Some tests have merely shifted the threading to after loop
optimizations, but others ended up with no threading opportunities at
all.  Surprisingly some tests ended up with more total threads.  It was
a crapshoot all around.

On a postive note, there are 6 tests that no longer XFAIL, and one
guality test which now passes.

I would appreciate someone reviewing the test changes.  I am unsure
whether some of the tests should be removed, XFAILed, or some other
thing.

I felt a bit queasy about such a fundamental change wrt threading, so I
ran it through my callgrind test harness (.ii files from a bootstrap).
There was no change in overall compilation, DOM, or the VRP threaders.

However, there was a slight increase of 1.63% in the backward threader.
I'm pretty sure we could reduce this if we incorporated the restrictions
into their profitability code.  This way we could stop the search when
we ran into one of these restrictions.  Not sure it's worth it at this
point.

Note, that this ad-hoc testing is not meant to replace a more thorough
SPEC, etc test.

Tested on x86-64 Linux.

Co-authored-by: Richard Biener <rguenther@suse.de>

gcc/ChangeLog:

	* tree-ssa-threadupdate.c (cancel_thread): Dump threading reason
	on the same line as the threading cancellation.
	(jt_path_registry::cancel_invalid_paths): Avoid rotating loops.
	Avoid threading through loop headers where the path remains in the
	loop.

libgomp/ChangeLog:

	* testsuite/libgomp.graphite/force-parallel-5.c: Remove xfail.

gcc/testsuite/ChangeLog:

	* gcc.dg/Warray-bounds-87.c: Remove xfail.
	* gcc.dg/analyzer/pr94851-2.c: Remove xfail.
	* gcc.dg/graphite/pr69728.c: Remove xfail.
	* gcc.dg/graphite/scop-dsyr2k.c: Remove xfail.
	* gcc.dg/graphite/scop-dsyrk.c: Remove xfail.
	* gcc.dg/shrink-wrap-loop.c: Remove xfail.
	* gcc.dg/loop-8.c: Adjust for new threading restrictions.
	* gcc.dg/tree-ssa/ifc-20040816-1.c: Same.
	* gcc.dg/tree-ssa/pr21559.c: Same.
	* gcc.dg/tree-ssa/pr59597.c: Same.
	* gcc.dg/tree-ssa/pr71437.c: Same.
	* gcc.dg/tree-ssa/pr77445-2.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-4.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
	* gcc.dg/vect/bb-slp-16.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Remove.
	* gcc.dg/tree-ssa/ssa-dom-thread-18.c: Remove.
	* gcc.dg/tree-ssa/ssa-dom-thread-2a.c: Remove.
	* gcc.dg/tree-ssa/ssa-thread-invalid.c: New test.
---
 gcc/testsuite/gcc.dg/Warray-bounds-87.c       |   2 +-
 gcc/testsuite/gcc.dg/analyzer/pr94851-2.c     |   2 +-
 gcc/testsuite/gcc.dg/graphite/pr69728.c       |   4 +-
 gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c   |   2 +-
 gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c    |   2 +-
 gcc/testsuite/gcc.dg/loop-8.c                 |  19 ++--
 gcc/testsuite/gcc.dg/shrink-wrap-loop.c       |  54 +---------
 .../gcc.dg/tree-ssa/ifc-20040816-1.c          |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21559.c       |   7 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr59597.c       |  10 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr71437.c       |   8 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c     |   3 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-18.c       |  27 -----
 .../gcc.dg/tree-ssa/ssa-dom-thread-2a.c       |  21 ----
 .../gcc.dg/tree-ssa/ssa-dom-thread-4.c        |  14 ++-
 .../gcc.dg/tree-ssa/ssa-dom-thread-6.c        |  44 --------
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c        |   5 +-
 .../gcc.dg/tree-ssa/ssa-thread-invalid.c      | 102 ++++++++++++++++++
 gcc/testsuite/gcc.dg/vect/bb-slp-16.c         |  70 ++++++------
 gcc/tree-ssa-threadupdate.c                   |  26 ++++-
 .../libgomp.graphite/force-parallel-5.c       |   2 +-
 21 files changed, 207 insertions(+), 219 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-invalid.c

diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-87.c b/gcc/testsuite/gcc.dg/Warray-bounds-87.c
index a49874df5da..a5457807c3a 100644
--- a/gcc/testsuite/gcc.dg/Warray-bounds-87.c
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-87.c
@@ -33,7 +33,7 @@ static unsigned int h (int i, int j)
     case 9:
       return j;
     case 10:
-      return a[i];  // { dg-bogus "-Warray-bounds" "pr101671" { xfail *-*-* } }
+      return a[i];  // { dg-bogus "-Warray-bounds" "pr101671" }
     }
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr94851-2.c b/gcc/testsuite/gcc.dg/analyzer/pr94851-2.c
index 0acf48810c1..62176bdaee8 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr94851-2.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr94851-2.c
@@ -45,7 +45,7 @@ int pamark(void) {
     if (curbp->b_amark == (AMARK *)NULL)
       curbp->b_amark = p;
     else
-      last->m_next = p; /* { dg-warning "dereference of NULL 'last'" "deref" { xfail *-*-* } } */
+      last->m_next = p; /* { dg-warning "dereference of NULL 'last'" "deref" } */
   }
 
   p->m_name = (char)c; /* { dg-bogus "leak of 'p'" "bogus leak" } */
diff --git a/gcc/testsuite/gcc.dg/graphite/pr69728.c b/gcc/testsuite/gcc.dg/graphite/pr69728.c
index 69e28318aaf..a6f385749c2 100644
--- a/gcc/testsuite/gcc.dg/graphite/pr69728.c
+++ b/gcc/testsuite/gcc.dg/graphite/pr69728.c
@@ -24,6 +24,4 @@ fn1 ()
    run into scheduling issues before here, not being able to handle
    empty domains.  */
 
-/* XFAILed by fix for PR86865.  */
-
-/* { dg-final { scan-tree-dump "loop nest optimized" "graphite" { xfail *-*-* } } }  */
+/* { dg-final { scan-tree-dump "loop nest optimized" "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c b/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c
index 925ae306903..41c91b97b57 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-dsyr2k.c
@@ -17,4 +17,4 @@ void dsyr2k(int N) {
 #pragma endscop
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail *-*-* } } } */ 
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c b/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c
index b748946fabb..e01a517be11 100644
--- a/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c
+++ b/gcc/testsuite/gcc.dg/graphite/scop-dsyrk.c
@@ -19,4 +19,4 @@ void dsyrk(int N)
 #pragma endscop
 }
 
-/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" } } */
diff --git a/gcc/testsuite/gcc.dg/loop-8.c b/gcc/testsuite/gcc.dg/loop-8.c
index 90ea1c45524..a685fc25056 100644
--- a/gcc/testsuite/gcc.dg/loop-8.c
+++ b/gcc/testsuite/gcc.dg/loop-8.c
@@ -11,18 +11,23 @@ f (int *a, int *b)
 {
   int i;
 
-  for (i = 0; i < 100; i++)
+  i = 100;
+  if (i > 0)
     {
-      int d = 42;
+      do
+	{
+	  int d = 42;
 
-      a[i] = d;
-      if (i % 2)
-	d = i;
-      b[i] = d;
+	  a[i] = d;
+	  if (i % 2)
+	    d = i;
+	  b[i] = d;
+	  ++i;
+	}
+      while (i < 100);
     }
 }
 
 /* Load of 42 is moved out of the loop, introducing a new pseudo register.  */
-/* { dg-final { scan-rtl-dump-times "Decided" 1 "loop2_invariant" } } */
 /* { dg-final { scan-rtl-dump-not "without introducing a new temporary register" "loop2_invariant" } } */
 
diff --git a/gcc/testsuite/gcc.dg/shrink-wrap-loop.c b/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
index 6e1be8937fe..ddc99e6b75a 100644
--- a/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
+++ b/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
@@ -1,58 +1,6 @@
 /* { dg-do compile { target { { { i?86-*-* x86_64-*-* } && lp64 } || { arm_thumb2 } } } } */
 /* { dg-options "-O2 -fdump-rtl-pro_and_epilogue"  } */
 
-/*
-Our new threader is threading things a bit too early, and causing the
-testcase in gcc.dg/shrink-wrap-loop.c to fail.
-
-  The gist is this BB inside a loop:
-
-  <bb 6> :
-  # p_2 = PHI <p2_6(D)(2), p_12(5)>
-  if (p_2 != 0B)
-    goto <bb 3>; [INV]
-  else
-    goto <bb 7>; [INV]
-
-Our threader can move this check outside of the loop (good).  This is
-done before branch probabilities are calculated and causes the probs
-to be calculated as:
-
-<bb 2> [local count: 216361238]:
-  if (p2_6(D) != 0B)
-    goto <bb 7>; [54.59%]
-  else
-    goto <bb 6>; [45.41%]
-
-Logically this seems correct to me.  A simple check outside of a loop
-should slightly but not overwhelmingly favor a non-zero value.
-
-Interestingly however, the old threader couldn't get this, but the IL
-ended up identical, albeit with different probabilities.  What happens
-is that, because the old code could not thread this, the p2 != 0 check
-would remain inside the loop and probs would be calculated thusly:
-
-  <bb 6> [local count: 1073741824]:
-  # p_2 = PHI <p2_6(D)(2), p_12(5)>
-  if (p_2 != 0B)
-    goto <bb 3>; [94.50%]
-  else
-    goto <bb 7>; [5.50%]
-
-Then when the loop header copying pass ("ch") shuffled things around,
-the IL would end up identical to my early threader code, but with the
-probabilities would remain as 94.5/5.5.
-
-The above discrepancy causes the RTL ifcvt pass to generate different
-code, and by the time we get to the shrink wrapping pass, things look
-sufficiently different such that the legacy code can actually shrink
-wrap, whereas our new code does not.
-
-IMO, if the loop-ch pass moves conditionals outside of a loop, the
-probabilities should be adjusted, but that does mean the shrink wrap
-won't happen for this contrived testcase.
- */
-
 int foo (int *p1, int *p2);
 
 int
@@ -68,4 +16,4 @@ test (int *p1, int *p2)
 
   return 1;
 }
-/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" { xfail *-*-* } } } */
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c
index b55a533e374..f8a6495cbaa 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-20040816-1.c
@@ -39,4 +39,4 @@ int main1 ()
    which is folded by vectorizer.  Both outgoing edges must have probability
    100% so the resulting profile match after folding.  */
 /* { dg-final { scan-tree-dump-times "Invalid sum of outgoing probabilities 200.0" 1 "ifcvt" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum of incoming counts" 1 "ifcvt" } } */
+/* { dg-final { scan-tree-dump-times "Invalid sum of incoming counts" 2 "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c b/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c
index 51b3b7ac755..43f046edabe 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr21559.c
@@ -35,10 +35,7 @@ void foo (void)
 /* First, we should simplify the bits < 0 test within the loop.  */
 /* { dg-final { scan-tree-dump-times "Simplified relational" 1 "evrp" } } */
 
-/* Second, we should thread the edge out of the loop via the break
-   statement.  We also realize that the final bytes == 0 test is useless,
-   and thread over it.  We also know that toread != 0 is useless when
-   entering while loop and thread over it.  */
-/* { dg-final { scan-tree-dump-times "Threaded jump" 3 "vrp-thread1" } } */
+/* We used to check for 3 threaded jumps here, but they all would
+   rotate the loop.  */
 
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
index 2caa1f532ea..764b3fe2e80 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
@@ -56,11 +56,7 @@ main (int argc, char argv[])
   return crc;
 }
 
-/* Previously we had 3 jump threads, but one of them crossed loops.
-   The reason the old threader was allowing it, was because there was
-   an ASSERT_EXPR getting in the way.  Without the ASSERT_EXPR, we
-   have an empty pre-header block as the final block in the thread,
-   which the threader will simply join with the next block which *is*
-   in a different loop.  */
-/* { dg-final { scan-tree-dump-times "Registering jump thread" 2 "vrp-thread1" } } */
+/* None of the threads we can get in vrp-thread1 are valid.  They all
+   cross or rotate loops.  */
+/* { dg-final { scan-tree-dump-not "Registering jump thread" "vrp-thread1" } } */
 /* { dg-final { scan-tree-dump-not "joiner" "vrp-thread1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c
index a2386ba19f0..eab3a25928e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-ffast-math -O3 -fdump-tree-vrp-thread1-details" } */
+/* { dg-options "-ffast-math -O3 -fdump-tree-dom3-details" } */
 
 int I = 50, J = 50;
 int S, L;
@@ -39,4 +39,8 @@ void foo (int K)
 	bar (LD, SD);
     }
 }
-/* { dg-final { scan-tree-dump-times "Threaded jump " 2 "vrp-thread1" } } */
+
+/* We used to get 1 vrp-thread1 candidates here, but they now get
+   deferred until after loop opts are done, because they were rotating
+   loops.  */
+/* { dg-final { scan-tree-dump-times "Threaded jump " 2 "dom3" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
index 18f7aab2be7..f2a5e78e6be 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
@@ -123,8 +123,7 @@ enum STATES FMS( u8 **in , u32 *transitions) {
    aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
    to change decisions in switch expansion which in turn can expose new
    jump threading opportunities.  Skip the later tests on aarch64.  */
-/* { dg-final { scan-tree-dump "Jumps threaded: \[7-9\]" "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Invalid sum" 1 "thread1" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: \[7-9\]" "thread2" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread1" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread2" } } */
 /* { dg-final { scan-tree-dump-not "optimizing for size" "thread3" { target { ! aarch64*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
deleted file mode 100644
index 0246ebf3c63..00000000000
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-18.c
+++ /dev/null
@@ -1,27 +0,0 @@
-/* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp-thread1-details -std=gnu89 --param logical-op-non-short-circuit=0" } */
-
-#include "ssa-dom-thread-4.c"
-
-/* On targets that define LOGICAL_OP_NON_SHORT_CIRCUIT to 0, we split both
-   "a_elt || b_elt" and "b_elt && kill_elt" into two conditions each,
-   rather than using "(var1 != 0) op (var2 != 0)".  Also, as on other targets,
-   we duplicate the header of the inner "while" loop.  There are then
-   4 threading opportunities:
-
-   1x "!a_elt && b_elt" in the outer "while" loop
-      -> the start of the inner "while" loop,
-	 skipping the known-true "b_elt" in the first condition.
-   1x "!b_elt" in the first condition
-      -> the outer "while" loop's continuation point,
-	 skipping the known-false "b_elt" in the second condition.
-   2x "kill_elt->indx >= b_elt->indx" in the first "while" loop
-      -> "kill_elt->indx == b_elt->indx" in the second condition,
-	 skipping the known-true "b_elt && kill_elt" in the second
-	 condition.
-
-   All the cases are picked up by VRP1 as jump threads.  */
-
-/* There used to be 6 jump threads found by thread1, but they all
-   depended on threading through distinct loops in ethread.  */
-/* { dg-final { scan-tree-dump-times "Threaded" 2 "vrp-thread1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
deleted file mode 100644
index 8f0a12c12ee..00000000000
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-2a.c
+++ /dev/null
@@ -1,21 +0,0 @@
-/* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp-thread1-stats -fdump-tree-dom2-stats" } */
-
-void bla();
-
-/* In the following case, we should be able to thread edge through
-   the loop header.  */
-
-void thread_entry_through_header (void)
-{
-  int i;
-
-  for (i = 0; i < 170; i++)
-    bla ();
-}
-
-/* There's a single jump thread that should be handled by the VRP
-   jump threading pass.  */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 1" 1 "vrp-thread1"} } */
-/* { dg-final { scan-tree-dump-times "Jumps threaded: 2" 0 "vrp-thread1"} } */
-/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
index 46e464ff26a..9cd463571c4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fdump-tree-vrp-thread1-details -fdump-tree-dom2-details -std=gnu89 --param logical-op-non-short-circuit=1" } */
+/* { dg-options "-O2 -fdump-tree-vrp-thread2-details -fdump-tree-dom2-details -std=gnu89 --param logical-op-non-short-circuit=1" } */
 struct bitmap_head_def;
 typedef struct bitmap_head_def *bitmap;
 typedef const struct bitmap_head_def *const_bitmap;
@@ -53,10 +53,8 @@ bitmap_ior_and_compl (bitmap dst, const_bitmap a, const_bitmap b,
 
   return changed;
 }
-/* The block starting the second conditional has  3 incoming edges,
-   we should thread all three, but due to a bug in the threading
-   code we missed the edge when the first conditional is false
-   (b_elt is zero, which means the second conditional is always
-   zero.  VRP1 catches all three.  */
-/* { dg-final { scan-tree-dump-times "Registering jump thread" 2 "vrp-thread1" } } */
-/* { dg-final { scan-tree-dump-times "Path crosses loops" 1 "vrp-thread1" } } */
+/* We used to catch 3 jump threads in vrp-thread1, but they all
+   rotated the loop, so they were disallowed.  This in turn created
+   other opportunities for the other threaders which result in the the
+   post-loop threader (vrp-thread2) catching more.  */
+/* { dg-final { scan-tree-dump-times "Registering jump thread" 5 "vrp-thread2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
deleted file mode 100644
index b0a7d423475..00000000000
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c
+++ /dev/null
@@ -1,44 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details -fdump-tree-thread3-details" } */
-
-/* { dg-final { scan-tree-dump-times "Registering jump" 6 "thread1" } } */
-/* { dg-final { scan-tree-dump-times "Registering jump" 1 "thread3" } } */
-
-int sum0, sum1, sum2, sum3;
-int foo (char *s, char **ret)
-{
-  int state=0;
-  char c;
-
-  for (; *s && state != 4; s++)
-    {
-      c = *s;
-      if (c == '*')
-	{
-	  s++;
-	  break;
-	}
-      switch (state)
-	{
-	case 0:
-	  if (c == '+')
-	    state = 1;
-	  else if (c != '-')
-	    sum0+=c;
-	  break;
-	case 1:
-	  if (c == '+')
-	    state = 2;
-	  else if (c == '-')
-	    state = 0;
-	  else
-	    sum1+=c;
-	  break;
-	default:
-	  break;
-	}
-
-    }
-  *ret = s;
-  return state;
-}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 16abcde5053..1da00a691c8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,15 +1,14 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats -fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats -fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
 
-/* { dg-final { scan-tree-dump "Jumps threaded: 12"  "thread1" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 5" "thread3" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 12"  "thread3" } } */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
 
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
    to change decisions in switch expansion which in turn can expose new
    jump threading opportunities.  Skip the later tests on aarch64.  */
 /* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" { target { ! aarch64*-*-* } } } } */
-/* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp2" { target { ! aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp-thread2" { target { ! aarch64*-*-* } } } } */
 
 enum STATE {
   S0=0,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-invalid.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-invalid.c
new file mode 100644
index 00000000000..bd56a62a4b4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-invalid.c
@@ -0,0 +1,102 @@
+// { dg-do compile }
+// { dg-options "-O2 -fgimple -fdump-statistics" }
+//
+// This is a collection of seemingly threadble paths that should not be allowed.
+
+void foobar (int);
+
+// Possible thread from 2->4->3, but it would rotate the loop.
+void __GIMPLE (ssa)
+f1 ()
+{
+  int i;
+
+  // Pre-header.
+  __BB(2):
+  goto __BB4;
+
+  // Latch.
+  __BB(3):
+  foobar (i_1);
+  i_5 = i_1 + 1;
+  goto __BB4;
+
+  __BB(4,loop_header(1)):
+  i_1 = __PHI (__BB2: 0, __BB3: i_5);
+  if (i_1 != 101)
+    goto __BB3;
+  else
+    goto __BB5;
+
+  __BB(5):
+  return;
+
+}
+
+// Possible thread from 2->3->5 but threading through the empty latch
+// would create a non-empty latch.
+void __GIMPLE (ssa)
+f2 ()
+{
+  int i;
+
+  // Pre-header.
+  __BB(2):
+  goto __BB3;
+
+  __BB(3,loop_header(1)):
+  i_8 = __PHI (__BB5: i_5, __BB2: 0);
+  foobar (i_8);
+  i_5 = i_8 + 1;
+  if (i_5 != 256)
+    goto __BB5;
+  else
+    goto __BB4;
+
+  // Latch.
+  __BB(5):
+  goto __BB3;
+
+  __BB(4):
+  return;
+
+}
+
+// Possible thread from 3->5->6->3 but this would thread through the
+// header but not exit the loop.
+int __GIMPLE (ssa)
+f3 (int a)
+{
+  int i;
+
+  __BB(2):
+  goto __BB6;
+
+  __BB(3):
+  if (i_1 != 0)
+    goto __BB4;
+  else
+    goto __BB5;
+
+  __BB(4):
+  foobar (5);
+  goto __BB5;
+
+  // Latch.
+  __BB(5):
+  i_7 = i_1 + 1;
+  goto __BB6;
+
+  __BB(6,loop_header(1)):
+  i_1 = __PHI (__BB2: 1, __BB5: i_7);
+  if (i_1 <= 99)
+    goto __BB3;
+  else
+    goto __BB7;
+
+  __BB(7):
+  return;
+
+}
+
+// { dg-final { scan-tree-dump-not "Jumps threaded" "statistics" } }
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
index e68a9b62535..4fc176dde84 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-16.c
@@ -16,41 +16,52 @@ main1 (int dummy)
   unsigned int *pin = &in[0];
   unsigned int *pout = &out[0];
   unsigned int a = 0;
-  
-  for (i = 0; i < N; i++)
+
+  i = N;
+  if (i > 0)
     {
-      *pout++ = *pin++ + a;
-      *pout++ = *pin++ + a;
-      *pout++ = *pin++ + a;
-      *pout++ = *pin++ + a;
-      *pout++ = *pin++ + a;
-      *pout++ = *pin++ + a;
-      *pout++ = *pin++ + a;
-      *pout++ = *pin++ + a;
-      if (arr[i] = i)
-        a = i;
-      else
-        a = 2;
+      do
+	{
+	  *pout++ = *pin++ + a;
+	  *pout++ = *pin++ + a;
+	  *pout++ = *pin++ + a;
+	  *pout++ = *pin++ + a;
+	  *pout++ = *pin++ + a;
+	  *pout++ = *pin++ + a;
+	  *pout++ = *pin++ + a;
+	  *pout++ = *pin++ + a;
+	  if (arr[i] = i)
+	    a = i;
+	  else
+	    a = 2;
+	}
+      while (i < N);
     }
 
   a = 0;
-  /* check results: */ 
-  for (i = 0; i < N; i++)
+  /* check results: */
+  i = N;
+  if (i > 0)
     {
-      if (out[i*8] !=  in[i*8] + a
-         || out[i*8 + 1] != in[i*8 + 1] + a
-         || out[i*8 + 2] != in[i*8 + 2] + a
-         || out[i*8 + 3] != in[i*8 + 3] + a
-         || out[i*8 + 4] != in[i*8 + 4] + a
-         || out[i*8 + 5] != in[i*8 + 5] + a
-         || out[i*8 + 6] != in[i*8 + 6] + a
-         || out[i*8 + 7] != in[i*8 + 7] + a)
-	abort ();
+      do
+	{
+	  if (out[i*8] !=  in[i*8] + a
+	      || out[i*8 + 1] != in[i*8 + 1] + a
+	      || out[i*8 + 2] != in[i*8 + 2] + a
+	      || out[i*8 + 3] != in[i*8 + 3] + a
+	      || out[i*8 + 4] != in[i*8 + 4] + a
+	      || out[i*8 + 5] != in[i*8 + 5] + a
+	      || out[i*8 + 6] != in[i*8 + 6] + a
+	      || out[i*8 + 7] != in[i*8 + 7] + a)
+	    abort ();
 
-      if (arr[i] = i)
-        a = i;
-      else
-        a = 2;
+	  if (arr[i] = i)
+	    a = i;
+	  else
+	    a = 2;
+	  i++;
+	}
+      while (i < N);
     }
 
   return 0;
@@ -66,4 +77,3 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp1" } } */
-  
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 32ce1e3af40..293836cdc53 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -278,7 +278,7 @@ cancel_thread (vec<jump_thread_edge *> *path, const char *reason = NULL)
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
       if (reason)
-	fprintf (dump_file, "%s:\n", reason);
+	fprintf (dump_file, "%s: ", reason);
 
       dump_jump_thread_path (dump_file, *path, false);
       fprintf (dump_file, "\n");
@@ -2771,6 +2771,7 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
   bool seen_latch = false;
   int loops_crossed = 0;
   bool crossed_latch = false;
+  bool crossed_loop_header = false;
   // Use ->dest here instead of ->src to ignore the first block.  The
   // first block is allowed to be in a different loop, since it'll be
   // redirected.  See similar comment in profitable_path_p: "we don't
@@ -2804,6 +2805,14 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
 	  ++loops_crossed;
 	}
 
+      // ?? Avoid threading through loop headers that remain in the
+      // loop, as such threadings tend to create sub-loops which
+      // _might_ be OK ??.
+      if (e->dest->loop_father->header == e->dest
+	  && !flow_loop_nested_p (exit->dest->loop_father,
+				  e->dest->loop_father))
+	crossed_loop_header = true;
+
       if (flag_checking && !m_backedge_threads)
 	gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0);
     }
@@ -2829,6 +2838,21 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path)
       cancel_thread (&path, "Path crosses loops");
       return true;
     }
+  // The path should either start and end in the same loop or exit the
+  // loop it starts in but never enter a loop.  This also catches
+  // creating irreducible loops, not only rotation.
+  if (entry->src->loop_father != exit->dest->loop_father
+      && !flow_loop_nested_p (exit->src->loop_father,
+			      entry->dest->loop_father))
+    {
+      cancel_thread (&path, "Path rotates loop");
+      return true;
+    }
+  if (crossed_loop_header)
+    {
+      cancel_thread (&path, "Path crosses loop header but does not exit it");
+      return true;
+    }
   return false;
 }
 
diff --git a/libgomp/testsuite/libgomp.graphite/force-parallel-5.c b/libgomp/testsuite/libgomp.graphite/force-parallel-5.c
index b83ca79dfae..de31d6436f5 100644
--- a/libgomp/testsuite/libgomp.graphite/force-parallel-5.c
+++ b/libgomp/testsuite/libgomp.graphite/force-parallel-5.c
@@ -31,6 +31,6 @@ int main(void)
 }
 
 /* Check that parallel code generation part make the right answer.  */
-/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 "graphite" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 "graphite" } } */
 /* { dg-final { scan-tree-dump-times "loopfn.0" 4 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "loopfn.1" 4 "optimized" } } */
-- 
2.31.1


      reply	other threads:[~2021-10-18 10:14 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-04  9:43 Aldy Hernandez
2021-10-04 13:31 ` Jeff Law
2021-10-04 13:36   ` Aldy Hernandez
2021-10-04 15:30     ` Jeff Law
2021-10-04 16:29       ` Michael Matz
2021-10-05 11:22         ` Richard Biener
2021-10-05 12:43           ` Michael Matz
2021-10-05 14:56           ` Jeff Law
2021-10-05 13:33         ` Aldy Hernandez
2021-10-05 15:10           ` Jeff Law
2021-10-05 16:08           ` Jeff Law
2021-10-05 16:22             ` Aldy Hernandez
2021-10-06 13:15           ` Andreas Schwab
2021-10-06 13:47             ` Aldy Hernandez
2021-10-06 15:03               ` Martin Sebor
2021-10-06 16:22                 ` Aldy Hernandez
2021-10-06 17:03                   ` Aldy Hernandez
2021-10-06 19:11                     ` Martin Sebor
2021-10-06 23:00                   ` Jeff Law
2021-10-06 23:06               ` Jeff Law
2021-10-07  8:15                 ` Aldy Hernandez
2021-10-07 13:35                   ` Jeff Law
2021-10-06 10:22 ` Aldy Hernandez
2021-10-14  9:25   ` Aldy Hernandez
2021-10-14 14:23     ` Jeff Law
2021-10-17  1:32   ` Jeff Law
2021-10-18 10:14     ` Aldy Hernandez [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=76fb7148-3d03-9885-cc31-26f929603986@redhat.com \
    --to=aldyh@redhat.com \
    --cc=amacleod@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=matz@suse.de \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).