public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
[not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
@ 2005-10-31 1:48 ` mmitchel at gcc dot gnu dot org
2005-11-03 13:24 ` sebastian dot pop at cri dot ensmp dot fr
` (4 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-10-31 1:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #55 from mmitchel at gcc dot gnu dot org 2005-10-31 01:48 -------
I wouldn't worry too much about exceedingly deep loop nests. Even the
scientific code I've looked at rarely has loop nests deeper than ten -- and if
it does, optimizing the inner ten loops is probably good enough.
So, I'd suggest that we add a --param here for max-loop-nest-depth, and then
just not do this stuff on deeper nests, or ignore all the outer loops in that
case.
Is that feasible?
Marked as P4; I think we should fix this, but I don't think it needs to happen
for 4.1. I'll revise for 4.2.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
[not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
2005-10-31 1:48 ` [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3) mmitchel at gcc dot gnu dot org
@ 2005-11-03 13:24 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-03 15:03 ` sebastian dot pop at cri dot ensmp dot fr
` (3 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: sebastian dot pop at cri dot ensmp dot fr @ 2005-11-03 13:24 UTC (permalink / raw)
To: gcc-bugs
------- Comment #56 from sebastian dot pop at cri dot ensmp dot fr 2005-11-03 13:24 -------
Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3)
>
> So, I'd suggest that we add a --param here for max-loop-nest-depth, and then
> just not do this stuff on deeper nests, or ignore all the outer loops in that
> case.
>
> Is that feasible?
>
Mark, I think that this has already been fixed by the patch that added
PARAM_SCEV_MAX_EXPR_SIZE: we give up when the expressions that we
handle are longer than this size (if I remember correctly this
defaults to 20). For the example in this bug,
{ int j0, j1, j2, j3, j4, j5, j6, j7, j8, j9, a;
a = 0;
for (j0 = 0; j0 < 2; j0 ++)
for (j1 = j0; j1 < 2; j1 ++)
for (j2 = j1; j2 < 2; j2 ++)
for (j3 = j2; j3 < 2; j3 ++)
for (j4 = j3; j4 < 2; j4 ++)
for (j5 = j4; j5 < 2; j5 ++)
for (j6 = j5; j6 < 2; j6 ++)
for (j7 = j6; j7 < 2; j7 ++)
for (j8 = j7; j8 < 2; j8 ++)
for (j9 = j8; j9 < 2; j9 ++)
a += j0 + j1 + j2 + j3 + j4 + j5 + j6 + j7 + j8 + j9;
return a;
}
we would have a scev with 10 dimensions, i.e. an evolution in each
dimension, thus an expression with about 10 components.
Some numbers for mainline with -O2 (on amd64 with cpufreq):
time ./gcc/cc1 -O2 ~/ex/pr18595_10.c
real 0m0.345s
user 0m0.089s
sys 0m0.017s
time ./gcc/cc1 -O2 ~/ex/pr18595_100.c
tree PRE : 0.28 (21%) usr 0.02 (50%) sys 0.29 (20%) wall
7742 kB (59%) ggc
tree loop bounds : 0.14 (10%) usr 0.00 ( 0%) sys 0.15 (10%) wall
0 kB ( 0%) ggc
real 0m1.776s
user 0m1.348s
sys 0m0.050s
time ./gcc/cc1 -O2 ~/ex/pr18595_200.c
tree PRE : 1.53 (24%) usr 0.09 (60%) sys 1.63 (25%) wall
31149 kB (74%) ggc
tree loop bounds : 1.21 (19%) usr 0.00 ( 0%) sys 1.21 (18%) wall
0 kB ( 0%) ggc
real 0m7.077s
user 0m6.363s
sys 0m0.167s
time ./gcc/cc1 -O2 ~/ex/pr18595_300.c
tree PRE : 4.69 (28%) usr 0.21 (68%) sys 5.03 (29%) wall
69961 kB (80%) ggc
tree loop bounds : 4.03 (24%) usr 0.00 ( 0%) sys 4.06 (23%) wall
0 kB ( 0%) ggc
real 0m18.126s
user 0m17.022s
sys 0m0.333s
Now, if we remove PRE, we can even compile the case with 1000 nested
loops in a reasonable time:
time ./gcc/cc1 -O2 -fno-tree-pre ~/ex/pr18595_300.c
tree loop bounds : 0.09 ( 4%) usr 0.00 ( 0%) sys 0.24 ( 9%) wall
0 kB ( 0%) ggc
real 0m3.184s
user 0m2.147s
sys 0m0.054s
time ./gcc/cc1 -O2 -fno-tree-pre ~/ex/pr18595_1000.c
tree loop bounds : 2.00 ( 8%) usr 0.00 ( 0%) sys 2.00 ( 8%) wall
0 kB ( 0%) ggc
real 0m27.171s
user 0m26.298s
sys 0m0.169s
So, I think that we can safely close this PR.
Seb
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
[not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
2005-10-31 1:48 ` [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3) mmitchel at gcc dot gnu dot org
2005-11-03 13:24 ` sebastian dot pop at cri dot ensmp dot fr
@ 2005-11-03 15:03 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-03 19:31 ` sebastian dot pop at cri dot ensmp dot fr
` (2 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: sebastian dot pop at cri dot ensmp dot fr @ 2005-11-03 15:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #57 from sebastian dot pop at cri dot ensmp dot fr 2005-11-03 15:02 -------
Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3)
sebastian dot pop at cri dot ensmp dot fr wrote:
>
> So, I think that we can safely close this PR.
>
The previous numbers were with mainline + patch, otherwise mainline
gets an ice as described in PR24483. I only now realized this, sorry.
Seb
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
[not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2005-11-03 15:03 ` sebastian dot pop at cri dot ensmp dot fr
@ 2005-11-03 19:31 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-03 20:10 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-08 1:40 ` pinskia at gcc dot gnu dot org
5 siblings, 0 replies; 6+ messages in thread
From: sebastian dot pop at cri dot ensmp dot fr @ 2005-11-03 19:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #58 from sebastian dot pop at cri dot ensmp dot fr 2005-11-03 19:31 -------
Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3)
Here are again the numbers for mainline with no other patch:
time ./gcc/cc1 -O2 ~/ex/pr18595_10.c
real 0m0.164s
user 0m0.116s
sys 0m0.018s
time ./gcc/cc1 -O2 ~/ex/pr18595_100.c
tree PRE : 0.33 ( 3%) usr 0.01 ( 3%) sys 0.34 ( 3%) wall
7742 kB ( 2%) ggc
tree loop bounds : 0.77 ( 7%) usr 0.03 (10%) sys 0.80 ( 7%) wall
53463 kB (17%) ggc
tree iv optimization : 7.70 (74%) usr 0.19 (66%) sys 7.97 (73%) wall
220347 kB (71%) ggc
real 0m10.912s
user 0m10.448s
sys 0m0.317s
For 200 nested loops it begins to swap even with -fno-tree-pre, so
we'd need a patch for limiting this.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
[not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2005-11-03 19:31 ` sebastian dot pop at cri dot ensmp dot fr
@ 2005-11-03 20:10 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-08 1:40 ` pinskia at gcc dot gnu dot org
5 siblings, 0 replies; 6+ messages in thread
From: sebastian dot pop at cri dot ensmp dot fr @ 2005-11-03 20:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #59 from sebastian dot pop at cri dot ensmp dot fr 2005-11-03 20:10 -------
Subject: Re: [4.0/4.1 Regression] IV-OPTS is O(N^3)
Here is a first patch that uses PARAM_SCEV_MAX_EXPR_SIZE for limiting
the size of expressions that we want to handle. I will send it to
gcc-patches once it bootstraps, etc.
time ./gcc/cc1 -O2 ~/ex/pr18595_100.c
tree loop bounds : 0.51 (17%) usr 0.03 (23%) sys 0.53 (17%) wall
32717 kB (29%) ggc
tree iv optimization : 1.23 (41%) usr 0.06 (46%) sys 1.29 (40%) wall
64971 kB (58%) ggc
real 0m3.995s
user 0m3.015s
sys 0m0.147s
time ./gcc/cc1 -O2 ~/ex/pr18595_200.c
tree loop bounds : 2.75 (23%) usr 0.05 (13%) sys 2.80 (22%) wall
72775 kB (22%) ggc
tree iv optimization : 3.74 (31%) usr 0.20 (51%) sys 4.30 (33%) wall
210289 kB (64%) ggc
real 0m13.316s
user 0m12.163s
sys 0m0.423s
time ./gcc/cc1 -fno-tree-pre -O2 ~/ex/pr18595_300.c
tree loop bounds : 0.16 ( 3%) usr 0.00 ( 0%) sys 0.15 ( 3%) wall
508 kB ( 1%) ggc
tree iv optimization : 2.26 (47%) usr 0.08 (62%) sys 2.33 (46%) wall
74743 kB (84%) ggc
real 0m6.287s
user 0m4.849s
sys 0m0.142s
time ./gcc/cc1 -fno-tree-pre -O2 ~/ex/pr18595_1000.c
tree loop bounds : 2.92 ( 4%) usr 0.00 ( 0%) sys 2.92 ( 4%) wall
913 kB ( 0%) ggc
tree iv optimization : 37.78 (54%) usr 0.91 (65%) sys 41.15 (52%) wall
786700 kB (93%) ggc
real 1m19.611s
user 1m10.288s
sys 0m1.403s
Index: tree-scalar-evolution.c
===================================================================
--- tree-scalar-evolution.c (revision 106440)
+++ tree-scalar-evolution.c (working copy)
@@ -1982,7 +1982,8 @@ loop_closed_phi_def (tree var)
/* Analyze all the parameters of the chrec that were left under a symbolic
form,
with respect to LOOP. CHREC is the chrec to instantiate. CACHE is the
cache
of already instantiated values. FLAGS modify the way chrecs are
- instantiated. */
+ instantiated. SIZE_EXPR is used for computing the size of the expression
to
+ be instantiated, and to stop if it exceeds some limit. */
/* Values for FLAGS. */
enum
@@ -1995,12 +1996,17 @@ enum
};
static tree
-instantiate_parameters_1 (struct loop *loop, tree chrec, int flags, htab_t
cache)
+instantiate_parameters_1 (struct loop *loop, tree chrec, int flags, htab_t
cache,
+ int size_expr)
{
tree res, op0, op1, op2;
basic_block def_bb;
struct loop *def_loop;
+ /* Give up if the path is longer than the MAX that we allow. */
+ if (size_expr++ > PARAM_VALUE (PARAM_SCEV_MAX_EXPR_SIZE))
+ return chrec_dont_know;
+
if (automatically_generated_chrec_p (chrec)
|| is_gimple_min_invariant (chrec))
return chrec;
@@ -2068,7 +2074,7 @@ instantiate_parameters_1 (struct loop *l
}
else if (res != chrec_dont_know)
- res = instantiate_parameters_1 (loop, res, flags, cache);
+ res = instantiate_parameters_1 (loop, res, flags, cache, size_expr);
bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec));
@@ -2078,12 +2084,12 @@ instantiate_parameters_1 (struct loop *l
case POLYNOMIAL_CHREC:
op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec),
- flags, cache);
+ flags, cache, size_expr);
if (op0 == chrec_dont_know)
return chrec_dont_know;
op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec),
- flags, cache);
+ flags, cache, size_expr);
if (op1 == chrec_dont_know)
return chrec_dont_know;
@@ -2094,12 +2100,12 @@ instantiate_parameters_1 (struct loop *l
case PLUS_EXPR:
op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
- flags, cache);
+ flags, cache, size_expr);
if (op0 == chrec_dont_know)
return chrec_dont_know;
op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
- flags, cache);
+ flags, cache, size_expr);
if (op1 == chrec_dont_know)
return chrec_dont_know;
@@ -2110,12 +2116,12 @@ instantiate_parameters_1 (struct loop *l
case MINUS_EXPR:
op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
- flags, cache);
+ flags, cache, size_expr);
if (op0 == chrec_dont_know)
return chrec_dont_know;
op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
- flags, cache);
+ flags, cache, size_expr);
if (op1 == chrec_dont_know)
return chrec_dont_know;
@@ -2126,12 +2132,12 @@ instantiate_parameters_1 (struct loop *l
case MULT_EXPR:
op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
- flags, cache);
+ flags, cache, size_expr);
if (op0 == chrec_dont_know)
return chrec_dont_know;
op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
- flags, cache);
+ flags, cache, size_expr);
if (op1 == chrec_dont_know)
return chrec_dont_know;
@@ -2144,7 +2150,7 @@ instantiate_parameters_1 (struct loop *l
case CONVERT_EXPR:
case NON_LVALUE_EXPR:
op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
- flags, cache);
+ flags, cache, size_expr);
if (op0 == chrec_dont_know)
return chrec_dont_know;
@@ -2174,17 +2180,17 @@ instantiate_parameters_1 (struct loop *l
{
case 3:
op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
- flags, cache);
+ flags, cache, size_expr);
if (op0 == chrec_dont_know)
return chrec_dont_know;
op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
- flags, cache);
+ flags, cache, size_expr);
if (op1 == chrec_dont_know)
return chrec_dont_know;
op2 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 2),
- flags, cache);
+ flags, cache, size_expr);
if (op2 == chrec_dont_know)
return chrec_dont_know;
@@ -2198,12 +2204,12 @@ instantiate_parameters_1 (struct loop *l
case 2:
op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
- flags, cache);
+ flags, cache, size_expr);
if (op0 == chrec_dont_know)
return chrec_dont_know;
op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
- flags, cache);
+ flags, cache, size_expr);
if (op1 == chrec_dont_know)
return chrec_dont_know;
@@ -2214,7 +2220,7 @@ instantiate_parameters_1 (struct loop *l
case 1:
op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
- flags, cache);
+ flags, cache, size_expr);
if (op0 == chrec_dont_know)
return chrec_dont_know;
if (op0 == TREE_OPERAND (chrec, 0))
@@ -2252,7 +2258,8 @@ instantiate_parameters (struct loop *loo
fprintf (dump_file, ")\n");
}
- res = instantiate_parameters_1 (loop, chrec, INSERT_SUPERLOOP_CHRECS,
cache);
+ res = instantiate_parameters_1 (loop, chrec, INSERT_SUPERLOOP_CHRECS, cache,
+ 0);
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -2275,7 +2282,7 @@ static tree
resolve_mixers (struct loop *loop, tree chrec)
{
htab_t cache = htab_create (10, hash_scev_info, eq_scev_info,
del_scev_info);
- tree ret = instantiate_parameters_1 (loop, chrec, FOLD_CONVERSIONS, cache);
+ tree ret = instantiate_parameters_1 (loop, chrec, FOLD_CONVERSIONS, cache,
0);
htab_delete (cache);
return ret;
}
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
[not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2005-11-03 20:10 ` sebastian dot pop at cri dot ensmp dot fr
@ 2005-11-08 1:40 ` pinskia at gcc dot gnu dot org
5 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-11-08 1:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #60 from pinskia at gcc dot gnu dot org 2005-11-08 01:40 -------
Fixed on the mainline.
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2005-11-08 1:40 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
2005-10-31 1:48 ` [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3) mmitchel at gcc dot gnu dot org
2005-11-03 13:24 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-03 15:03 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-03 19:31 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-03 20:10 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-08 1:40 ` pinskia at gcc dot gnu dot org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).