* [PATCH] Prevent excessive loop-header copying with multiple exits
@ 2018-04-26 12:39 Richard Biener
2018-04-27 8:29 ` Richard Biener
0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2018-04-26 12:39 UTC (permalink / raw)
To: gcc-patches
The following makes loop-header copying stop after the first exit test
it copied. The reports rightfully complain about too much peeling.
If some cases pop up which show we should peel up to a specific test
we need to improve this heuristic which simply errs on the easy side.
Bootstrap & regtest running on x86_64-unknown-linux-gnu.
Richard.
2018-04-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/28364
PR tree-optimization/85275
* tree-ssa-loop-ch.c (ch_base::copy_headers): Stop after
copying first exit test.
* gcc.dg/tree-ssa/copy-headers-5.c: New testcase.
Index: gcc/tree-ssa-loop-ch.c
===================================================================
--- gcc/tree-ssa-loop-ch.c (revision 259669)
+++ gcc/tree-ssa-loop-ch.c (working copy)
@@ -328,6 +340,11 @@ ch_base::copy_headers (function *fun)
bbs[n_bbs++] = header;
gcc_assert (bbs_size > n_bbs);
header = exit->dest;
+ /* Make sure to stop copying after we copied the first exit test.
+ Without further heuristics we do not want to rotate the loop
+ any further. */
+ if (loop_exits_from_bb_p (loop, exit->src))
+ break;
}
if (!exit)
Index: gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ch2-details" } */
+
+int is_sorted(int *a, int n)
+{
+ for (int i = 0; i < n - 1; i++)
+ if (a[i] > a[i + 1])
+ return 0;
+ return 1;
+}
+
+/* Verify we apply loop header copying but only copy the IV test and
+ not the alternate exit test. */
+
+/* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
+/* { dg-final { scan-tree-dump-times " if " 3 "ch2" } } */
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Prevent excessive loop-header copying with multiple exits
2018-04-26 12:39 [PATCH] Prevent excessive loop-header copying with multiple exits Richard Biener
@ 2018-04-27 8:29 ` Richard Biener
2018-04-30 14:47 ` Kyrill Tkachov
0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2018-04-27 8:29 UTC (permalink / raw)
To: gcc-patches
On Thu, 26 Apr 2018, Richard Biener wrote:
>
> The following makes loop-header copying stop after the first exit test
> it copied. The reports rightfully complain about too much peeling.
> If some cases pop up which show we should peel up to a specific test
> we need to improve this heuristic which simply errs on the easy side.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
Bootstrap went ok but it showed a few required testsuite adjustments.
gcc.dg/tree-ssa/cunroll-13.c looks fragile (it was a test for a
profile mismatch), so I rewrote it as a GIMPLE testcase.
With the ivopt_mult_[12].c testcases IVOPTs no longer elminiates
an IV due to the change in IL -- what loop header copying does now
looks more sensible than before though, so I added GIMPLE testcase
variants that verify IVOPTs can still pull off the trick but I
had to XFAIL the C testcases (less IVs still look good and I fail
to see why the trick shouldn't work with the new IL - sth to
investigate).
Re-bootstrapping and testing on x86_64-unknown-linux-gnu now.
Richard.
2018-04-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/28364
PR tree-optimization/85275
* tree-ssa-loop-ch.c (ch_base::copy_headers): Stop after
copying first exit test.
* gcc.dg/tree-ssa/copy-headers-5.c: New testcase.
* gcc.dg/tree-ssa/predcom-8.c: Likewise.
* gcc.dg/tree-ssa/cunroll-13.c: Rewrite to gimple testcase.
* gcc.dg/tree-ssa/ivopt_mult_1.c: XFAIL.
* gcc.dg/tree-ssa/ivopt_mult_1g.c: Add gimple variant that
still passes.
* gcc.dg/tree-ssa/ivopt_mult_2.c: XFAIL.
* gcc.dg/tree-ssa/ivopt_mult_2g.c: Add gimple variant that
still passes.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
* gcc.dg/tree-ssa/20030710-1.c: Likewise.
* gcc.dg/tree-ssa/20030711-1.c: Likewise.
Index: gcc/tree-ssa-loop-ch.c
===================================================================
--- gcc/tree-ssa-loop-ch.c (revision 259695)
+++ gcc/tree-ssa-loop-ch.c (working copy)
@@ -340,6 +340,11 @@ ch_base::copy_headers (function *fun)
bbs[n_bbs++] = header;
gcc_assert (bbs_size > n_bbs);
header = exit->dest;
+ /* Make sure to stop copying after we copied the first exit test.
+ Without further heuristics we do not want to rotate the loop
+ any further. */
+ if (loop_exits_from_bb_p (loop, exit->src))
+ break;
}
if (!exit)
Index: gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ch2-details" } */
+
+int is_sorted(int *a, int n)
+{
+ for (int i = 0; i < n - 1; i++)
+ if (a[i] > a[i + 1])
+ return 0;
+ return 1;
+}
+
+/* Verify we apply loop header copying but only copy the IV test and
+ not the alternate exit test. */
+
+/* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
+/* { dg-final { scan-tree-dump-times " if " 3 "ch2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-pcom-details" } */
+
+int is_sorted(int *a, int n)
+{
+ for (int i = 0; i < n - 1; i++)
+ if (a[i] > a[i + 1])
+ return 0;
+ return 1;
+}
+
+/* { dg-final { scan-tree-dump "Executing predictive commoning without unrolling" "pcom" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c (revision 259695)
+++ gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c (working copy)
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O3 -fdisable-tree-evrp -fdisable-tree-cunrolli -fdisable-tree-vrp1 -fdump-tree-cunroll-blocks-details" } */
+/* { dg-options "-O3 -fgimple -fdump-tree-cunroll-blocks-details" } */
#if __SIZEOF_INT__ < 4
__extension__ typedef __INT32_TYPE__ i32;
@@ -7,15 +7,53 @@ __extension__ typedef __INT32_TYPE__ i32
typedef int i32;
#endif
-struct a {int a[8];int b;};
-void
-t(struct a *a)
+struct a {i32 a[8];i32 b;};
+
+void __GIMPLE (startwith("fix_loops"))
+t (struct a * a)
{
- for (i32 i=0;i<123456 && a->a[i];i++)
- a->a[i]++;
+ i32 i;
+ i32 _1;
+ i32 _2;
+ i32 _9;
+ i32 _11;
+
+bb_2:
+ _11 = a_6(D)->a[0];
+ if (_11 != _Literal (i32) 0)
+ goto bb_6;
+ else
+ goto bb_3;
+
+bb_3:
+ return;
+
+bb_4:
+ _1 = _2 + 1;
+ a_6(D)->a[i_19] = _1;
+ i_8 = i_19 + _Literal (i32) 1;
+ if (i_8 <= _Literal (i32) 123455)
+ goto bb_5;
+ else
+ goto bb_3;
+
+bb_5:
+ i_19 = __PHI (bb_6: _Literal (i32) 1, bb_4: i_8);
+ _2 = a_6(D)->a[i_19];
+ if (_2 != _Literal (i32) 0)
+ goto bb_4;
+ else
+ goto bb_3;
+
+bb_6:
+ _9 = _11 + _Literal (i32) 1;
+ a_6(D)->a[0] = _9;
+ goto bb_5;
}
-/* This pass relies on the fact that we do not eliminate the redundant test for i early.
- It is necessary to disable all passes that do so. At the moment it is evrp, vrp1 and cunrolli. */
+
+/* This testcase relies on the fact that we do not eliminate the redundant test
+ for i early. It is necessary to disable all passes that do so, for the
+ moment starting with the loop pipeline is good enough. */
/* { dg-final { scan-tree-dump-times "Loop 1 iterates 123454 times" 1 "cunroll" } } */
/* { dg-final { scan-tree-dump-times "Last iteration exit edge was proved true" 1 "cunroll" } } */
/* { dg-final { scan-tree-dump-times "Exit condition of peeled iterations was eliminated" 1 "cunroll" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c (revision 259695)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c (working copy)
@@ -20,4 +20,4 @@ long foo(long* p, long* p2, int N1, int
return s;
}
-/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" { xfail *-*-* } } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c (revision 259695)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c (working copy)
@@ -21,4 +21,4 @@ long foo(long* p, long* p2, int N1, int
return s;
}
-/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts" { xfail *-*-* } } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c (working copy)
@@ -0,0 +1,82 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
+
+/* The test 'if (p2 > p_limit2)' can be replaced, so iv p2 can be
+ * eliminated. */
+long int __GIMPLE (startwith("fix_loops"))
+foo (long int * p, long int * p2, int N1, int N2)
+{
+ long int s;
+ long int * p_limit2;
+ long int * p_limit;
+ long unsigned int _1;
+ long unsigned int _2;
+ long unsigned int _3;
+ long unsigned int _4;
+ long int _5;
+
+ bb_2:
+ _1 = (long unsigned int) N1_10(D);
+ _2 = _1 * 8ul;
+ p_limit_12 = p_11(D) + _2;
+ _3 = (long unsigned int) N2_13(D);
+ _4 = _3 * 8ul;
+ p_limit2_15 = p2_14(D) + _4;
+ if (p_11(D) <= p_limit_12)
+ goto bb_3;
+ else
+ goto bb_13;
+
+ bb_13:
+
+ bb_9:
+ goto bb_6;
+
+ bb_3:
+ p_20 = p_11(D) + 8ul;
+ p2_23 = p2_14(D) + 8ul;
+ if (p_limit2_15 < p2_23)
+ goto bb_14;
+ else
+ goto bb_7;
+
+ bb_14:
+ goto bb_9;
+
+ bb_7:
+ goto bb_5;
+
+ bb_4:
+ p_16 = p_26 + 8ul;
+ p2_17 = p2_27 + 8ul;
+ if (p_limit2_15 < p2_17)
+ goto bb_11;
+ else
+ goto bb_8;
+
+ bb_11:
+ goto bb_6;
+
+ bb_8:
+ ;
+
+ bb_5:
+ s_24 = __PHI (bb_7: 0l, bb_8: s_19);
+ p_26 = __PHI (bb_7: p_20, bb_8: p_16);
+ p2_27 = __PHI (bb_7: p2_23, bb_8: p2_17);
+ _5 = __MEM <long int> (p_26);
+ s_19 = _5 + s_24;
+ if (p_limit_12 >= p_26)
+ goto bb_4;
+ else
+ goto bb_12;
+
+ bb_12:
+ ;
+
+ bb_6:
+ s_25 = __PHI (bb_12: s_19, bb_11: s_19, bb_9: 0l);
+ return s_25;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c (working copy)
@@ -0,0 +1,79 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
+
+/* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so
+ * two ivs i and p2 can be eliminate. */
+long int __GIMPLE (startwith("fix_loops"))
+foo (long int * p, long int * p2, int N1, int N2)
+{
+ long int s;
+ long int * p_limit2;
+ int i;
+ long unsigned int _1;
+ long unsigned int _2;
+ long int _3;
+
+ bb_2:
+ _1 = (long unsigned int) N2_9(D);
+ _2 = _1 * 8ul;
+ p_limit2_11 = p2_10(D) + _2;
+ if (N1_13(D) > 0)
+ goto bb_3;
+ else
+ goto bb_13;
+
+ bb_13:
+
+ bb_9:
+ goto bb_6;
+
+ bb_3:
+ p_22 = p_12(D) + 8ul;
+ p2_23 = p2_10(D) + 8ul;
+ if (p_limit2_11 < p2_23)
+ goto bb_14;
+ else
+ goto bb_7;
+
+ bb_14:
+ goto bb_9;
+
+ bb_7:
+ goto bb_5;
+
+ bb_4:
+ p_14 = p_27 + 8ul;
+ p2_15 = p2_28 + 8ul;
+ i_16 = i_29 + 1;
+ if (p_limit2_11 < p2_15)
+ goto bb_11;
+ else
+ goto bb_8;
+
+ bb_11:
+ goto bb_6;
+
+ bb_8:
+ ;
+
+ bb_5:
+ s_25 = __PHI (bb_7: 0l, bb_8: s_18);
+ p_27 = __PHI (bb_7: p_22, bb_8: p_14);
+ p2_28 = __PHI (bb_7: p2_23, bb_8: p2_15);
+ i_29 = __PHI (bb_7: 1, bb_8: i_16);
+ _3 = __MEM <long int> (p_27);
+ s_18 = _3 + s_25;
+ if (N1_13(D) > i_29)
+ goto bb_4;
+ else
+ goto bb_12;
+
+ bb_12:
+ ;
+
+ bb_6:
+ s_26 = __PHI (bb_12: s_18, bb_11: s_18, bb_9: 0l);
+ return s_26;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c (revision 259695)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c (working copy)
@@ -3,7 +3,7 @@
/* { dg-final { scan-tree-dump "Jumps threaded: 16" "thread1" } } */
/* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread2" } } */
/* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" } } */
-/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 1" "dom2" } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom3" } } */
/* { dg-final { scan-tree-dump-not "Jumps threaded" "vrp2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c (revision 259695)
+++ gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c (working copy)
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-dom2" } */
+/* { dg-options "-O1 -fdump-tree-phicprop1" } */
extern void abort (void);
extern void blah (void);
@@ -42,14 +42,14 @@ record_component_aliases (type)
/* The call to blah should have been eliminated. If the call is not
eliminated, then dominator optimizations failed and it'll be
impossible to delete other unnecessary code. */
-/* { dg-final { scan-tree-dump-not "blah \\(\\)" "dom2" } } */
+/* { dg-final { scan-tree-dump-not "blah \\(\\)" "phicprop1" } } */
/* There should be two IF conditionals. */
-/* { dg-final { scan-tree-dump-times "if " 2 "dom2"} } */
+/* { dg-final { scan-tree-dump-times "if " 2 "phicprop1"} } */
/* There should be a single load of type.binfo. */
-/* { dg-final { scan-tree-dump-times "type\\.binfo" 1 "dom2"} } */
+/* { dg-final { scan-tree-dump-times "type\\.binfo" 1 "phicprop1"} } */
/* There should be two loads of vec.length. */
-/* { dg-final { scan-tree-dump-times "vec.length" 2 "dom2"} } */
+/* { dg-final { scan-tree-dump-times "vec.length" 2 "phicprop1"} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c (revision 259695)
+++ gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c (working copy)
@@ -44,12 +44,12 @@ record_component_aliases (type)
/* The call to blah can not be eliminated. */
/* { dg-final { scan-tree-dump-times "blah \\(\\)" 1 "dom2" } } */
-/* There should be four IF conditionals. */
-/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */
+/* There should be three IF conditionals. */
+/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
/* There should be two loads of type.binfo. */
/* { dg-final { scan-tree-dump-times "type\\.binfo" 2 "dom2"} } */
-/* There should be four loads of vec.length. */
-/* { dg-final { scan-tree-dump-times "vec.length" 4 "dom2"} } */
+/* There should be three loads of vec.length. */
+/* { dg-final { scan-tree-dump-times "vec.length" 3 "dom2"} } */
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Prevent excessive loop-header copying with multiple exits
2018-04-27 8:29 ` Richard Biener
@ 2018-04-30 14:47 ` Kyrill Tkachov
2018-05-02 8:26 ` Richard Biener
0 siblings, 1 reply; 4+ messages in thread
From: Kyrill Tkachov @ 2018-04-30 14:47 UTC (permalink / raw)
To: Richard Biener, gcc-patches
Hi Richard,
On 27/04/18 09:24, Richard Biener wrote:
> On Thu, 26 Apr 2018, Richard Biener wrote:
>
> >
> > The following makes loop-header copying stop after the first exit test
> > it copied. The reports rightfully complain about too much peeling.
> > If some cases pop up which show we should peel up to a specific test
> > we need to improve this heuristic which simply errs on the easy side.
> >
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
> Bootstrap went ok but it showed a few required testsuite adjustments.
>
> gcc.dg/tree-ssa/cunroll-13.c looks fragile (it was a test for a
> profile mismatch), so I rewrote it as a GIMPLE testcase.
>
> With the ivopt_mult_[12].c testcases IVOPTs no longer elminiates
> an IV due to the change in IL -- what loop header copying does now
> looks more sensible than before though, so I added GIMPLE testcase
> variants that verify IVOPTs can still pull off the trick but I
> had to XFAIL the C testcases (less IVs still look good and I fail
> to see why the trick shouldn't work with the new IL - sth to
> investigate).
>
> Re-bootstrapping and testing on x86_64-unknown-linux-gnu now.
>
After this patch I'm seeing gfortran.dg/pr51434.f90 FAIL on aarch64 at -O1
It is an execution failure.
The current trunk removes almost all of main (final optimised GIMPLE dump):
__attribute__((externally_visible))
main (integer(kind=4) argc, character(kind=1) * * argv)
{
static integer(kind=4) options.3[7] = {68, 8191, 0, 1, 1, 0, 31};
<bb 2> [local count: 1073741826]:
_gfortran_set_args (argc_2(D), argv_3(D));
_gfortran_set_options (7, &options.3[0]);
return 0;
}
whereas trunk on the 28th had much more control flow in it:
__attribute__((externally_visible))
main (integer(kind=4) argc, character(kind=1) * * argv)
{
unsigned long ivtmp.5;
static struct a c = {.m=5, .t={"a", "b", "c", "d", "e", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " "}};
static integer(kind=4) options.3[7] = {68, 8191, 0, 1, 1, 0, 31};
character(kind=1) _8;
character(kind=1)[1:1] * _9;
character(kind=1) _10;
<bb 2> [local count: 1073741826]:
_gfortran_set_args (argc_2(D), argv_3(D));
_gfortran_set_options (7, &options.3[0]);
<bb 3> [local count: 4208275768]:
# ivtmp.5_12 = PHI <0(2), ivtmp.5_15(4)>
_8 = MEM[symbol: c, index: ivtmp.5_12, offset: 4B];
_9 = MEM[symbol: A.1, index: ivtmp.5_12, step: 8, offset: 0B];
_10 = *_9[1]{lb: 1 sz: 1};
if (_8 != _10)
goto <bb 6>; [5.50%]
else
goto <bb 4>; [94.50%]
<bb 4> [local count: 3976820602]:
ivtmp.5_15 = ivtmp.5_12 + 1;
if (ivtmp.5_15 == 5)
goto <bb 5>; [16.67%]
else
goto <bb 3>; [83.33%]
<bb 5> [local count: 1072883003]:
return 0;
<bb 6> [local count: 429327]:
_gfortran_stop_numeric (2, 0);
}
Cheers,
Kyrill
> Richard.
>
> 2018-04-26 Richard Biener <rguenther@suse.de>
>
> PR tree-optimization/28364
> PR tree-optimization/85275
> * tree-ssa-loop-ch.c (ch_base::copy_headers): Stop after
> copying first exit test.
>
> * gcc.dg/tree-ssa/copy-headers-5.c: New testcase.
> * gcc.dg/tree-ssa/predcom-8.c: Likewise.
> * gcc.dg/tree-ssa/cunroll-13.c: Rewrite to gimple testcase.
> * gcc.dg/tree-ssa/ivopt_mult_1.c: XFAIL.
> * gcc.dg/tree-ssa/ivopt_mult_1g.c: Add gimple variant that
> still passes.
> * gcc.dg/tree-ssa/ivopt_mult_2.c: XFAIL.
> * gcc.dg/tree-ssa/ivopt_mult_2g.c: Add gimple variant that
> still passes.
> * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
> * gcc.dg/tree-ssa/20030710-1.c: Likewise.
> * gcc.dg/tree-ssa/20030711-1.c: Likewise.
>
> Index: gcc/tree-ssa-loop-ch.c
> ===================================================================
> --- gcc/tree-ssa-loop-ch.c (revision 259695)
> +++ gcc/tree-ssa-loop-ch.c (working copy)
> @@ -340,6 +340,11 @@ ch_base::copy_headers (function *fun)
> bbs[n_bbs++] = header;
> gcc_assert (bbs_size > n_bbs);
> header = exit->dest;
> + /* Make sure to stop copying after we copied the first exit test.
> + Without further heuristics we do not want to rotate the loop
> + any further. */
> + if (loop_exits_from_bb_p (loop, exit->src))
> + break;
> }
>
> if (!exit)
> Index: gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c (nonexistent)
> +++ gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c (working copy)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ch2-details" } */
> +
> +int is_sorted(int *a, int n)
> +{
> + for (int i = 0; i < n - 1; i++)
> + if (a[i] > a[i + 1])
> + return 0;
> + return 1;
> +}
> +
> +/* Verify we apply loop header copying but only copy the IV test and
> + not the alternate exit test. */
> +
> +/* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
> +/* { dg-final { scan-tree-dump-times " if " 3 "ch2" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c (nonexistent)
> +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c (working copy)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-pcom-details" } */
> +
> +int is_sorted(int *a, int n)
> +{
> + for (int i = 0; i < n - 1; i++)
> + if (a[i] > a[i + 1])
> + return 0;
> + return 1;
> +}
> +
> +/* { dg-final { scan-tree-dump "Executing predictive commoning without unrolling" "pcom" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c (revision 259695)
> +++ gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c (working copy)
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O3 -fdisable-tree-evrp -fdisable-tree-cunrolli -fdisable-tree-vrp1 -fdump-tree-cunroll-blocks-details" } */
> +/* { dg-options "-O3 -fgimple -fdump-tree-cunroll-blocks-details" } */
>
> #if __SIZEOF_INT__ < 4
> __extension__ typedef __INT32_TYPE__ i32;
> @@ -7,15 +7,53 @@ __extension__ typedef __INT32_TYPE__ i32
> typedef int i32;
> #endif
>
> -struct a {int a[8];int b;};
> -void
> -t(struct a *a)
> +struct a {i32 a[8];i32 b;};
> +
> +void __GIMPLE (startwith("fix_loops"))
> +t (struct a * a)
> {
> - for (i32 i=0;i<123456 && a->a[i];i++)
> - a->a[i]++;
> + i32 i;
> + i32 _1;
> + i32 _2;
> + i32 _9;
> + i32 _11;
> +
> +bb_2:
> + _11 = a_6(D)->a[0];
> + if (_11 != _Literal (i32) 0)
> + goto bb_6;
> + else
> + goto bb_3;
> +
> +bb_3:
> + return;
> +
> +bb_4:
> + _1 = _2 + 1;
> + a_6(D)->a[i_19] = _1;
> + i_8 = i_19 + _Literal (i32) 1;
> + if (i_8 <= _Literal (i32) 123455)
> + goto bb_5;
> + else
> + goto bb_3;
> +
> +bb_5:
> + i_19 = __PHI (bb_6: _Literal (i32) 1, bb_4: i_8);
> + _2 = a_6(D)->a[i_19];
> + if (_2 != _Literal (i32) 0)
> + goto bb_4;
> + else
> + goto bb_3;
> +
> +bb_6:
> + _9 = _11 + _Literal (i32) 1;
> + a_6(D)->a[0] = _9;
> + goto bb_5;
> }
> -/* This pass relies on the fact that we do not eliminate the redundant test for i early.
> - It is necessary to disable all passes that do so. At the moment it is evrp, vrp1 and cunrolli. */
> +
> +/* This testcase relies on the fact that we do not eliminate the redundant test
> + for i early. It is necessary to disable all passes that do so, for the
> + moment starting with the loop pipeline is good enough. */
> /* { dg-final { scan-tree-dump-times "Loop 1 iterates 123454 times" 1 "cunroll" } } */
> /* { dg-final { scan-tree-dump-times "Last iteration exit edge was proved true" 1 "cunroll" } } */
> /* { dg-final { scan-tree-dump-times "Exit condition of peeled iterations was eliminated" 1 "cunroll" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c (revision 259695)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c (working copy)
> @@ -20,4 +20,4 @@ long foo(long* p, long* p2, int N1, int
> return s;
> }
>
> -/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
> +/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" { xfail *-*-* } } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c (revision 259695)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c (working copy)
> @@ -21,4 +21,4 @@ long foo(long* p, long* p2, int N1, int
> return s;
> }
>
> -/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */
> +/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts" { xfail *-*-* } } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c (nonexistent)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c (working copy)
> @@ -0,0 +1,82 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
> +
> +/* The test 'if (p2 > p_limit2)' can be replaced, so iv p2 can be
> + * eliminated. */
> +long int __GIMPLE (startwith("fix_loops"))
> +foo (long int * p, long int * p2, int N1, int N2)
> +{
> + long int s;
> + long int * p_limit2;
> + long int * p_limit;
> + long unsigned int _1;
> + long unsigned int _2;
> + long unsigned int _3;
> + long unsigned int _4;
> + long int _5;
> +
> + bb_2:
> + _1 = (long unsigned int) N1_10(D);
> + _2 = _1 * 8ul;
> + p_limit_12 = p_11(D) + _2;
> + _3 = (long unsigned int) N2_13(D);
> + _4 = _3 * 8ul;
> + p_limit2_15 = p2_14(D) + _4;
> + if (p_11(D) <= p_limit_12)
> + goto bb_3;
> + else
> + goto bb_13;
> +
> + bb_13:
> +
> + bb_9:
> + goto bb_6;
> +
> + bb_3:
> + p_20 = p_11(D) + 8ul;
> + p2_23 = p2_14(D) + 8ul;
> + if (p_limit2_15 < p2_23)
> + goto bb_14;
> + else
> + goto bb_7;
> +
> + bb_14:
> + goto bb_9;
> +
> + bb_7:
> + goto bb_5;
> +
> + bb_4:
> + p_16 = p_26 + 8ul;
> + p2_17 = p2_27 + 8ul;
> + if (p_limit2_15 < p2_17)
> + goto bb_11;
> + else
> + goto bb_8;
> +
> + bb_11:
> + goto bb_6;
> +
> + bb_8:
> + ;
> +
> + bb_5:
> + s_24 = __PHI (bb_7: 0l, bb_8: s_19);
> + p_26 = __PHI (bb_7: p_20, bb_8: p_16);
> + p2_27 = __PHI (bb_7: p2_23, bb_8: p2_17);
> + _5 = __MEM <long int> (p_26);
> + s_19 = _5 + s_24;
> + if (p_limit_12 >= p_26)
> + goto bb_4;
> + else
> + goto bb_12;
> +
> + bb_12:
> + ;
> +
> + bb_6:
> + s_25 = __PHI (bb_12: s_19, bb_11: s_19, bb_9: 0l);
> + return s_25;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c (nonexistent)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c (working copy)
> @@ -0,0 +1,79 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
> +
> +/* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so
> + * two ivs i and p2 can be eliminate. */
> +long int __GIMPLE (startwith("fix_loops"))
> +foo (long int * p, long int * p2, int N1, int N2)
> +{
> + long int s;
> + long int * p_limit2;
> + int i;
> + long unsigned int _1;
> + long unsigned int _2;
> + long int _3;
> +
> + bb_2:
> + _1 = (long unsigned int) N2_9(D);
> + _2 = _1 * 8ul;
> + p_limit2_11 = p2_10(D) + _2;
> + if (N1_13(D) > 0)
> + goto bb_3;
> + else
> + goto bb_13;
> +
> + bb_13:
> +
> + bb_9:
> + goto bb_6;
> +
> + bb_3:
> + p_22 = p_12(D) + 8ul;
> + p2_23 = p2_10(D) + 8ul;
> + if (p_limit2_11 < p2_23)
> + goto bb_14;
> + else
> + goto bb_7;
> +
> + bb_14:
> + goto bb_9;
> +
> + bb_7:
> + goto bb_5;
> +
> + bb_4:
> + p_14 = p_27 + 8ul;
> + p2_15 = p2_28 + 8ul;
> + i_16 = i_29 + 1;
> + if (p_limit2_11 < p2_15)
> + goto bb_11;
> + else
> + goto bb_8;
> +
> + bb_11:
> + goto bb_6;
> +
> + bb_8:
> + ;
> +
> + bb_5:
> + s_25 = __PHI (bb_7: 0l, bb_8: s_18);
> + p_27 = __PHI (bb_7: p_22, bb_8: p_14);
> + p2_28 = __PHI (bb_7: p2_23, bb_8: p2_15);
> + i_29 = __PHI (bb_7: 1, bb_8: i_16);
> + _3 = __MEM <long int> (p_27);
> + s_18 = _3 + s_25;
> + if (N1_13(D) > i_29)
> + goto bb_4;
> + else
> + goto bb_12;
> +
> + bb_12:
> + ;
> +
> + bb_6:
> + s_26 = __PHI (bb_12: s_18, bb_11: s_18, bb_9: 0l);
> + return s_26;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c (revision 259695)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c (working copy)
> @@ -3,7 +3,7 @@
> /* { dg-final { scan-tree-dump "Jumps threaded: 16" "thread1" } } */
> /* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread2" } } */
> /* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" } } */
> -/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
> +/* { dg-final { scan-tree-dump "Jumps threaded: 1" "dom2" } } */
> /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom3" } } */
> /* { dg-final { scan-tree-dump-not "Jumps threaded" "vrp2" } } */
>
> Index: gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c (revision 259695)
> +++ gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c (working copy)
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O1 -fdump-tree-dom2" } */
> +/* { dg-options "-O1 -fdump-tree-phicprop1" } */
>
> extern void abort (void);
> extern void blah (void);
> @@ -42,14 +42,14 @@ record_component_aliases (type)
> /* The call to blah should have been eliminated. If the call is not
> eliminated, then dominator optimizations failed and it'll be
> impossible to delete other unnecessary code. */
> -/* { dg-final { scan-tree-dump-not "blah \\(\\)" "dom2" } } */
> +/* { dg-final { scan-tree-dump-not "blah \\(\\)" "phicprop1" } } */
>
> /* There should be two IF conditionals. */
> -/* { dg-final { scan-tree-dump-times "if " 2 "dom2"} } */
> +/* { dg-final { scan-tree-dump-times "if " 2 "phicprop1"} } */
>
> /* There should be a single load of type.binfo. */
> -/* { dg-final { scan-tree-dump-times "type\\.binfo" 1 "dom2"} } */
> +/* { dg-final { scan-tree-dump-times "type\\.binfo" 1 "phicprop1"} } */
>
> /* There should be two loads of vec.length. */
> -/* { dg-final { scan-tree-dump-times "vec.length" 2 "dom2"} } */
> +/* { dg-final { scan-tree-dump-times "vec.length" 2 "phicprop1"} } */
>
> Index: gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c (revision 259695)
> +++ gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c (working copy)
> @@ -44,12 +44,12 @@ record_component_aliases (type)
> /* The call to blah can not be eliminated. */
> /* { dg-final { scan-tree-dump-times "blah \\(\\)" 1 "dom2" } } */
>
> -/* There should be four IF conditionals. */
> -/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */
> +/* There should be three IF conditionals. */
> +/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
>
> /* There should be two loads of type.binfo. */
> /* { dg-final { scan-tree-dump-times "type\\.binfo" 2 "dom2"} } */
>
> -/* There should be four loads of vec.length. */
> -/* { dg-final { scan-tree-dump-times "vec.length" 4 "dom2"} } */
> +/* There should be three loads of vec.length. */
> +/* { dg-final { scan-tree-dump-times "vec.length" 3 "dom2"} } */
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Prevent excessive loop-header copying with multiple exits
2018-04-30 14:47 ` Kyrill Tkachov
@ 2018-05-02 8:26 ` Richard Biener
0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2018-05-02 8:26 UTC (permalink / raw)
To: Kyrill Tkachov; +Cc: gcc-patches
On Mon, 30 Apr 2018, Kyrill Tkachov wrote:
> Hi Richard,
>
>
> On 27/04/18 09:24, Richard Biener wrote:
> > On Thu, 26 Apr 2018, Richard Biener wrote:
> >
> > >
> > > The following makes loop-header copying stop after the first exit test
> > > it copied. The reports rightfully complain about too much peeling.
> > > If some cases pop up which show we should peel up to a specific test
> > > we need to improve this heuristic which simply errs on the easy side.
> > >
> > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> >
> > Bootstrap went ok but it showed a few required testsuite adjustments.
> >
> > gcc.dg/tree-ssa/cunroll-13.c looks fragile (it was a test for a
> > profile mismatch), so I rewrote it as a GIMPLE testcase.
> >
> > With the ivopt_mult_[12].c testcases IVOPTs no longer elminiates
> > an IV due to the change in IL -- what loop header copying does now
> > looks more sensible than before though, so I added GIMPLE testcase
> > variants that verify IVOPTs can still pull off the trick but I
> > had to XFAIL the C testcases (less IVs still look good and I fail
> > to see why the trick shouldn't work with the new IL - sth to
> > investigate).
> >
> > Re-bootstrapping and testing on x86_64-unknown-linux-gnu now.
> >
>
> After this patch I'm seeing gfortran.dg/pr51434.f90 FAIL on aarch64 at -O1
> It is an execution failure.
>
> The current trunk removes almost all of main (final optimised GIMPLE dump):
>
> __attribute__((externally_visible))
> main (integer(kind=4) argc, character(kind=1) * * argv)
> {
> static integer(kind=4) options.3[7] = {68, 8191, 0, 1, 1, 0, 31};
>
> <bb 2> [local count: 1073741826]:
> _gfortran_set_args (argc_2(D), argv_3(D));
> _gfortran_set_options (7, &options.3[0]);
> return 0;
>
> }
I see this at -O3. But the IL below following at -O1.
Meanwhile PR85579 got filed for this where I noted the testcase
is somwhat fishy.
Richard.
> whereas trunk on the 28th had much more control flow in it:
>
> __attribute__((externally_visible))
> main (integer(kind=4) argc, character(kind=1) * * argv)
> {
> unsigned long ivtmp.5;
> static struct a c = {.m=5, .t={"a", "b", "c", "d", "e", " ", " ", " ", " ",
> " ", " ", " ", " ", " ", " ", " ", " ", " "}};
> static integer(kind=4) options.3[7] = {68, 8191, 0, 1, 1, 0, 31};
> character(kind=1) _8;
> character(kind=1)[1:1] * _9;
> character(kind=1) _10;
>
> <bb 2> [local count: 1073741826]:
> _gfortran_set_args (argc_2(D), argv_3(D));
> _gfortran_set_options (7, &options.3[0]);
>
> <bb 3> [local count: 4208275768]:
> # ivtmp.5_12 = PHI <0(2), ivtmp.5_15(4)>
> _8 = MEM[symbol: c, index: ivtmp.5_12, offset: 4B];
> _9 = MEM[symbol: A.1, index: ivtmp.5_12, step: 8, offset: 0B];
> _10 = *_9[1]{lb: 1 sz: 1};
> if (_8 != _10)
> goto <bb 6>; [5.50%]
> else
> goto <bb 4>; [94.50%]
>
> <bb 4> [local count: 3976820602]:
> ivtmp.5_15 = ivtmp.5_12 + 1;
> if (ivtmp.5_15 == 5)
> goto <bb 5>; [16.67%]
> else
> goto <bb 3>; [83.33%]
>
> <bb 5> [local count: 1072883003]:
> return 0;
>
> <bb 6> [local count: 429327]:
> _gfortran_stop_numeric (2, 0);
>
> }
>
> Cheers,
> Kyrill
>
> > Richard.
> >
> > 2018-04-26 Richard Biener <rguenther@suse.de>
> >
> > PR tree-optimization/28364
> > PR tree-optimization/85275
> > * tree-ssa-loop-ch.c (ch_base::copy_headers): Stop after
> > copying first exit test.
> >
> > * gcc.dg/tree-ssa/copy-headers-5.c: New testcase.
> > * gcc.dg/tree-ssa/predcom-8.c: Likewise.
> > * gcc.dg/tree-ssa/cunroll-13.c: Rewrite to gimple testcase.
> > * gcc.dg/tree-ssa/ivopt_mult_1.c: XFAIL.
> > * gcc.dg/tree-ssa/ivopt_mult_1g.c: Add gimple variant that
> > still passes.
> > * gcc.dg/tree-ssa/ivopt_mult_2.c: XFAIL.
> > * gcc.dg/tree-ssa/ivopt_mult_2g.c: Add gimple variant that
> > still passes.
> > * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.
> > * gcc.dg/tree-ssa/20030710-1.c: Likewise.
> > * gcc.dg/tree-ssa/20030711-1.c: Likewise.
> >
> > Index: gcc/tree-ssa-loop-ch.c
> > ===================================================================
> > --- gcc/tree-ssa-loop-ch.c (revision 259695)
> > +++ gcc/tree-ssa-loop-ch.c (working copy)
> > @@ -340,6 +340,11 @@ ch_base::copy_headers (function *fun)
> > bbs[n_bbs++] = header;
> > gcc_assert (bbs_size > n_bbs);
> > header = exit->dest;
> > + /* Make sure to stop copying after we copied the first exit test.
> > + Without further heuristics we do not want to rotate the loop
> > + any further. */
> > + if (loop_exits_from_bb_p (loop, exit->src))
> > + break;
> > }
> >
> > if (!exit)
> > Index: gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c (nonexistent)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c (working copy)
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-ch2-details" } */
> > +
> > +int is_sorted(int *a, int n)
> > +{
> > + for (int i = 0; i < n - 1; i++)
> > + if (a[i] > a[i + 1])
> > + return 0;
> > + return 1;
> > +}
> > +
> > +/* Verify we apply loop header copying but only copy the IV test and
> > + not the alternate exit test. */
> > +
> > +/* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
> > +/* { dg-final { scan-tree-dump-times " if " 3 "ch2" } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c (nonexistent)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c (working copy)
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-pcom-details" } */
> > +
> > +int is_sorted(int *a, int n)
> > +{
> > + for (int i = 0; i < n - 1; i++)
> > + if (a[i] > a[i + 1])
> > + return 0;
> > + return 1;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "Executing predictive commoning without
> > unrolling" "pcom" } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c (revision 259695)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c (working copy)
> > @@ -1,5 +1,5 @@
> > /* { dg-do compile } */
> > -/* { dg-options "-O3 -fdisable-tree-evrp -fdisable-tree-cunrolli
> > -fdisable-tree-vrp1 -fdump-tree-cunroll-blocks-details" } */
> > +/* { dg-options "-O3 -fgimple -fdump-tree-cunroll-blocks-details" } */
> >
> > #if __SIZEOF_INT__ < 4
> > __extension__ typedef __INT32_TYPE__ i32;
> > @@ -7,15 +7,53 @@ __extension__ typedef __INT32_TYPE__ i32
> > typedef int i32;
> > #endif
> >
> > -struct a {int a[8];int b;};
> > -void
> > -t(struct a *a)
> > +struct a {i32 a[8];i32 b;};
> > +
> > +void __GIMPLE (startwith("fix_loops"))
> > +t (struct a * a)
> > {
> > - for (i32 i=0;i<123456 && a->a[i];i++)
> > - a->a[i]++;
> > + i32 i;
> > + i32 _1;
> > + i32 _2;
> > + i32 _9;
> > + i32 _11;
> > +
> > +bb_2:
> > + _11 = a_6(D)->a[0];
> > + if (_11 != _Literal (i32) 0)
> > + goto bb_6;
> > + else
> > + goto bb_3;
> > +
> > +bb_3:
> > + return;
> > +
> > +bb_4:
> > + _1 = _2 + 1;
> > + a_6(D)->a[i_19] = _1;
> > + i_8 = i_19 + _Literal (i32) 1;
> > + if (i_8 <= _Literal (i32) 123455)
> > + goto bb_5;
> > + else
> > + goto bb_3;
> > +
> > +bb_5:
> > + i_19 = __PHI (bb_6: _Literal (i32) 1, bb_4: i_8);
> > + _2 = a_6(D)->a[i_19];
> > + if (_2 != _Literal (i32) 0)
> > + goto bb_4;
> > + else
> > + goto bb_3;
> > +
> > +bb_6:
> > + _9 = _11 + _Literal (i32) 1;
> > + a_6(D)->a[0] = _9;
> > + goto bb_5;
> > }
> > -/* This pass relies on the fact that we do not eliminate the redundant test
> > for i early.
> > - It is necessary to disable all passes that do so. At the moment it is
> > evrp, vrp1 and cunrolli. */
> > +
> > +/* This testcase relies on the fact that we do not eliminate the redundant
> > test
> > + for i early. It is necessary to disable all passes that do so, for the
> > + moment starting with the loop pipeline is good enough. */
> > /* { dg-final { scan-tree-dump-times "Loop 1 iterates 123454 times" 1
> > "cunroll" } } */
> > /* { dg-final { scan-tree-dump-times "Last iteration exit edge was proved
> > true" 1 "cunroll" } } */
> > /* { dg-final { scan-tree-dump-times "Exit condition of peeled iterations
> > was eliminated" 1 "cunroll" } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c (revision 259695)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c (working copy)
> > @@ -20,4 +20,4 @@ long foo(long* p, long* p2, int N1, int
> > return s;
> > }
> >
> > -/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
> > +/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" { xfail *-*-* }
> > } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c (revision 259695)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c (working copy)
> > @@ -21,4 +21,4 @@ long foo(long* p, long* p2, int N1, int
> > return s;
> > }
> >
> > -/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */
> > +/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts" { xfail *-*-* }
> > } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c (nonexistent)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c (working copy)
> > @@ -0,0 +1,82 @@
> > +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> > +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
> > +
> > +/* The test 'if (p2 > p_limit2)' can be replaced, so iv p2 can be
> > + * eliminated. */
> > +long int __GIMPLE (startwith("fix_loops"))
> > +foo (long int * p, long int * p2, int N1, int N2)
> > +{
> > + long int s;
> > + long int * p_limit2;
> > + long int * p_limit;
> > + long unsigned int _1;
> > + long unsigned int _2;
> > + long unsigned int _3;
> > + long unsigned int _4;
> > + long int _5;
> > +
> > + bb_2:
> > + _1 = (long unsigned int) N1_10(D);
> > + _2 = _1 * 8ul;
> > + p_limit_12 = p_11(D) + _2;
> > + _3 = (long unsigned int) N2_13(D);
> > + _4 = _3 * 8ul;
> > + p_limit2_15 = p2_14(D) + _4;
> > + if (p_11(D) <= p_limit_12)
> > + goto bb_3;
> > + else
> > + goto bb_13;
> > +
> > + bb_13:
> > +
> > + bb_9:
> > + goto bb_6;
> > +
> > + bb_3:
> > + p_20 = p_11(D) + 8ul;
> > + p2_23 = p2_14(D) + 8ul;
> > + if (p_limit2_15 < p2_23)
> > + goto bb_14;
> > + else
> > + goto bb_7;
> > +
> > + bb_14:
> > + goto bb_9;
> > +
> > + bb_7:
> > + goto bb_5;
> > +
> > + bb_4:
> > + p_16 = p_26 + 8ul;
> > + p2_17 = p2_27 + 8ul;
> > + if (p_limit2_15 < p2_17)
> > + goto bb_11;
> > + else
> > + goto bb_8;
> > +
> > + bb_11:
> > + goto bb_6;
> > +
> > + bb_8:
> > + ;
> > +
> > + bb_5:
> > + s_24 = __PHI (bb_7: 0l, bb_8: s_19);
> > + p_26 = __PHI (bb_7: p_20, bb_8: p_16);
> > + p2_27 = __PHI (bb_7: p2_23, bb_8: p2_17);
> > + _5 = __MEM <long int> (p_26);
> > + s_19 = _5 + s_24;
> > + if (p_limit_12 >= p_26)
> > + goto bb_4;
> > + else
> > + goto bb_12;
> > +
> > + bb_12:
> > + ;
> > +
> > + bb_6:
> > + s_25 = __PHI (bb_12: s_19, bb_11: s_19, bb_9: 0l);
> > + return s_25;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c (nonexistent)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c (working copy)
> > @@ -0,0 +1,79 @@
> > +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> > +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */
> > +
> > +/* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so
> > + * two ivs i and p2 can be eliminate. */
> > +long int __GIMPLE (startwith("fix_loops"))
> > +foo (long int * p, long int * p2, int N1, int N2)
> > +{
> > + long int s;
> > + long int * p_limit2;
> > + int i;
> > + long unsigned int _1;
> > + long unsigned int _2;
> > + long int _3;
> > +
> > + bb_2:
> > + _1 = (long unsigned int) N2_9(D);
> > + _2 = _1 * 8ul;
> > + p_limit2_11 = p2_10(D) + _2;
> > + if (N1_13(D) > 0)
> > + goto bb_3;
> > + else
> > + goto bb_13;
> > +
> > + bb_13:
> > +
> > + bb_9:
> > + goto bb_6;
> > +
> > + bb_3:
> > + p_22 = p_12(D) + 8ul;
> > + p2_23 = p2_10(D) + 8ul;
> > + if (p_limit2_11 < p2_23)
> > + goto bb_14;
> > + else
> > + goto bb_7;
> > +
> > + bb_14:
> > + goto bb_9;
> > +
> > + bb_7:
> > + goto bb_5;
> > +
> > + bb_4:
> > + p_14 = p_27 + 8ul;
> > + p2_15 = p2_28 + 8ul;
> > + i_16 = i_29 + 1;
> > + if (p_limit2_11 < p2_15)
> > + goto bb_11;
> > + else
> > + goto bb_8;
> > +
> > + bb_11:
> > + goto bb_6;
> > +
> > + bb_8:
> > + ;
> > +
> > + bb_5:
> > + s_25 = __PHI (bb_7: 0l, bb_8: s_18);
> > + p_27 = __PHI (bb_7: p_22, bb_8: p_14);
> > + p2_28 = __PHI (bb_7: p2_23, bb_8: p2_15);
> > + i_29 = __PHI (bb_7: 1, bb_8: i_16);
> > + _3 = __MEM <long int> (p_27);
> > + s_18 = _3 + s_25;
> > + if (N1_13(D) > i_29)
> > + goto bb_4;
> > + else
> > + goto bb_12;
> > +
> > + bb_12:
> > + ;
> > +
> > + bb_6:
> > + s_26 = __PHI (bb_12: s_18, bb_11: s_18, bb_9: 0l);
> > + return s_26;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c (revision 259695)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c (working copy)
> > @@ -3,7 +3,7 @@
> > /* { dg-final { scan-tree-dump "Jumps threaded: 16" "thread1" } } */
> > /* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread2" } } */
> > /* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" } } */
> > -/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */
> > +/* { dg-final { scan-tree-dump "Jumps threaded: 1" "dom2" } } */
> > /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom3" } } */
> > /* { dg-final { scan-tree-dump-not "Jumps threaded" "vrp2" } } */
> >
> > Index: gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c (revision 259695)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c (working copy)
> > @@ -1,5 +1,5 @@
> > /* { dg-do compile } */
> > -/* { dg-options "-O1 -fdump-tree-dom2" } */
> > +/* { dg-options "-O1 -fdump-tree-phicprop1" } */
> >
> > extern void abort (void);
> > extern void blah (void);
> > @@ -42,14 +42,14 @@ record_component_aliases (type)
> > /* The call to blah should have been eliminated. If the call is not
> > eliminated, then dominator optimizations failed and it'll be
> > impossible to delete other unnecessary code. */
> > -/* { dg-final { scan-tree-dump-not "blah \\(\\)" "dom2" } } */
> > +/* { dg-final { scan-tree-dump-not "blah \\(\\)" "phicprop1" } } */
> >
> > /* There should be two IF conditionals. */
> > -/* { dg-final { scan-tree-dump-times "if " 2 "dom2"} } */
> > +/* { dg-final { scan-tree-dump-times "if " 2 "phicprop1"} } */
> >
> > /* There should be a single load of type.binfo. */
> > -/* { dg-final { scan-tree-dump-times "type\\.binfo" 1 "dom2"} } */
> > +/* { dg-final { scan-tree-dump-times "type\\.binfo" 1 "phicprop1"} } */
> >
> > /* There should be two loads of vec.length. */
> > -/* { dg-final { scan-tree-dump-times "vec.length" 2 "dom2"} } */
> > +/* { dg-final { scan-tree-dump-times "vec.length" 2 "phicprop1"} } */
> >
> > Index: gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c (revision 259695)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c (working copy)
> > @@ -44,12 +44,12 @@ record_component_aliases (type)
> > /* The call to blah can not be eliminated. */
> > /* { dg-final { scan-tree-dump-times "blah \\(\\)" 1 "dom2" } } */
> >
> > -/* There should be four IF conditionals. */
> > -/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */
> > +/* There should be three IF conditionals. */
> > +/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */
> >
> > /* There should be two loads of type.binfo. */
> > /* { dg-final { scan-tree-dump-times "type\\.binfo" 2 "dom2"} } */
> >
> > -/* There should be four loads of vec.length. */
> > -/* { dg-final { scan-tree-dump-times "vec.length" 4 "dom2"} } */
> > +/* There should be three loads of vec.length. */
> > +/* { dg-final { scan-tree-dump-times "vec.length" 3 "dom2"} } */
> >
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-05-02 8:26 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-26 12:39 [PATCH] Prevent excessive loop-header copying with multiple exits Richard Biener
2018-04-27 8:29 ` Richard Biener
2018-04-30 14:47 ` Kyrill Tkachov
2018-05-02 8:26 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).