public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
       [not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
@ 2005-10-31  1:48 ` mmitchel at gcc dot gnu dot org
  2005-11-03 13:24 ` sebastian dot pop at cri dot ensmp dot fr
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2005-10-31  1:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #55 from mmitchel at gcc dot gnu dot org  2005-10-31 01:48 -------
I wouldn't worry too much about exceedingly deep loop nests.  Even the
scientific code I've looked at rarely has loop nests deeper than ten -- and if
it does, optimizing the inner ten loops is probably good enough.

So, I'd suggest that we add a --param here for max-loop-nest-depth, and then
just not do this stuff on deeper nests, or ignore all the outer loops in that
case.

Is that feasible?

Marked as P4; I think we should fix this, but I don't think it needs to happen
for 4.1.  I'll revise for 4.2.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
       [not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
  2005-10-31  1:48 ` [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3) mmitchel at gcc dot gnu dot org
@ 2005-11-03 13:24 ` sebastian dot pop at cri dot ensmp dot fr
  2005-11-03 15:03 ` sebastian dot pop at cri dot ensmp dot fr
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: sebastian dot pop at cri dot ensmp dot fr @ 2005-11-03 13:24 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #56 from sebastian dot pop at cri dot ensmp dot fr  2005-11-03 13:24 -------
Subject: Re:  [4.0/4.1 Regression] IV-OPTS is O(N^3)

> 
> So, I'd suggest that we add a --param here for max-loop-nest-depth, and then
> just not do this stuff on deeper nests, or ignore all the outer loops in that
> case.
> 
> Is that feasible?
> 

Mark, I think that this has already been fixed by the patch that added
PARAM_SCEV_MAX_EXPR_SIZE: we give up when the expressions that we
handle are longer than this size (if I remember correctly this
defaults to 20).  For the example in this bug,

{       int j0, j1, j2, j3, j4, j5, j6, j7, j8, j9, a;
        a = 0;
        for (j0 = 0; j0 < 2; j0 ++)
        for (j1 = j0; j1 < 2; j1 ++)
        for (j2 = j1; j2 < 2; j2 ++)
        for (j3 = j2; j3 < 2; j3 ++)
        for (j4 = j3; j4 < 2; j4 ++)
        for (j5 = j4; j5 < 2; j5 ++)
        for (j6 = j5; j6 < 2; j6 ++)
        for (j7 = j6; j7 < 2; j7 ++)
        for (j8 = j7; j8 < 2; j8 ++)
        for (j9 = j8; j9 < 2; j9 ++)
        a += j0 + j1 + j2 + j3 + j4 + j5 + j6 + j7 + j8 + j9;
        return a;
}

we would have a scev with 10 dimensions, i.e. an evolution in each
dimension, thus an expression with about 10 components.

Some numbers for mainline with -O2 (on amd64 with cpufreq):

time ./gcc/cc1 -O2 ~/ex/pr18595_10.c 
real    0m0.345s
user    0m0.089s
sys     0m0.017s

time ./gcc/cc1 -O2 ~/ex/pr18595_100.c 
 tree PRE              :   0.28 (21%) usr   0.02 (50%) sys   0.29 (20%) wall   
7742 kB (59%) ggc
 tree loop bounds      :   0.14 (10%) usr   0.00 ( 0%) sys   0.15 (10%) wall   
   0 kB ( 0%) ggc
real    0m1.776s
user    0m1.348s
sys     0m0.050s

time ./gcc/cc1 -O2 ~/ex/pr18595_200.c 
 tree PRE              :   1.53 (24%) usr   0.09 (60%) sys   1.63 (25%) wall  
31149 kB (74%) ggc
 tree loop bounds      :   1.21 (19%) usr   0.00 ( 0%) sys   1.21 (18%) wall   
   0 kB ( 0%) ggc
real    0m7.077s
user    0m6.363s
sys     0m0.167s

time ./gcc/cc1 -O2 ~/ex/pr18595_300.c
 tree PRE              :   4.69 (28%) usr   0.21 (68%) sys   5.03 (29%) wall  
69961 kB (80%) ggc
 tree loop bounds      :   4.03 (24%) usr   0.00 ( 0%) sys   4.06 (23%) wall   
   0 kB ( 0%) ggc
real    0m18.126s
user    0m17.022s
sys     0m0.333s

Now, if we remove PRE, we can even compile the case with 1000 nested
loops in a reasonable time:

time ./gcc/cc1 -O2 -fno-tree-pre ~/ex/pr18595_300.c 
 tree loop bounds      :   0.09 ( 4%) usr   0.00 ( 0%) sys   0.24 ( 9%) wall   
   0 kB ( 0%) ggc
real    0m3.184s
user    0m2.147s
sys     0m0.054s

time ./gcc/cc1 -O2 -fno-tree-pre ~/ex/pr18595_1000.c 
 tree loop bounds      :   2.00 ( 8%) usr   0.00 ( 0%) sys   2.00 ( 8%) wall   
   0 kB ( 0%) ggc
real    0m27.171s
user    0m26.298s
sys     0m0.169s

So, I think that we can safely close this PR.

Seb


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
       [not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
  2005-10-31  1:48 ` [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3) mmitchel at gcc dot gnu dot org
  2005-11-03 13:24 ` sebastian dot pop at cri dot ensmp dot fr
@ 2005-11-03 15:03 ` sebastian dot pop at cri dot ensmp dot fr
  2005-11-03 19:31 ` sebastian dot pop at cri dot ensmp dot fr
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: sebastian dot pop at cri dot ensmp dot fr @ 2005-11-03 15:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #57 from sebastian dot pop at cri dot ensmp dot fr  2005-11-03 15:02 -------
Subject: Re:  [4.0/4.1 Regression] IV-OPTS is O(N^3)

sebastian dot pop at cri dot ensmp dot fr wrote:
> 
> So, I think that we can safely close this PR.
> 

The previous numbers were with mainline + patch, otherwise mainline
gets an ice as described in PR24483.  I only now realized this, sorry.

Seb


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
       [not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2005-11-03 15:03 ` sebastian dot pop at cri dot ensmp dot fr
@ 2005-11-03 19:31 ` sebastian dot pop at cri dot ensmp dot fr
  2005-11-03 20:10 ` sebastian dot pop at cri dot ensmp dot fr
  2005-11-08  1:40 ` pinskia at gcc dot gnu dot org
  5 siblings, 0 replies; 6+ messages in thread
From: sebastian dot pop at cri dot ensmp dot fr @ 2005-11-03 19:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #58 from sebastian dot pop at cri dot ensmp dot fr  2005-11-03 19:31 -------
Subject: Re:  [4.0/4.1 Regression] IV-OPTS is O(N^3)

Here are again the numbers for mainline with no other patch:

time ./gcc/cc1 -O2 ~/ex/pr18595_10.c 
real    0m0.164s
user    0m0.116s
sys     0m0.018s

time ./gcc/cc1 -O2 ~/ex/pr18595_100.c 
 tree PRE              :   0.33 ( 3%) usr   0.01 ( 3%) sys   0.34 ( 3%) wall   
7742 kB ( 2%) ggc
 tree loop bounds      :   0.77 ( 7%) usr   0.03 (10%) sys   0.80 ( 7%) wall  
53463 kB (17%) ggc
 tree iv optimization  :   7.70 (74%) usr   0.19 (66%) sys   7.97 (73%) wall 
220347 kB (71%) ggc
real    0m10.912s
user    0m10.448s
sys     0m0.317s

For 200 nested loops it begins to swap even with -fno-tree-pre, so
we'd need a patch for limiting this.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
       [not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2005-11-03 19:31 ` sebastian dot pop at cri dot ensmp dot fr
@ 2005-11-03 20:10 ` sebastian dot pop at cri dot ensmp dot fr
  2005-11-08  1:40 ` pinskia at gcc dot gnu dot org
  5 siblings, 0 replies; 6+ messages in thread
From: sebastian dot pop at cri dot ensmp dot fr @ 2005-11-03 20:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #59 from sebastian dot pop at cri dot ensmp dot fr  2005-11-03 20:10 -------
Subject: Re:  [4.0/4.1 Regression] IV-OPTS is O(N^3)

Here is a first patch that uses PARAM_SCEV_MAX_EXPR_SIZE for limiting
the size of expressions that we want to handle.  I will send it to
gcc-patches once it bootstraps, etc.

time ./gcc/cc1 -O2 ~/ex/pr18595_100.c
 tree loop bounds      :   0.51 (17%) usr   0.03 (23%) sys   0.53 (17%) wall  
32717 kB (29%) ggc
 tree iv optimization  :   1.23 (41%) usr   0.06 (46%) sys   1.29 (40%) wall  
64971 kB (58%) ggc
real    0m3.995s
user    0m3.015s
sys     0m0.147s

time ./gcc/cc1 -O2 ~/ex/pr18595_200.c
 tree loop bounds      :   2.75 (23%) usr   0.05 (13%) sys   2.80 (22%) wall  
72775 kB (22%) ggc
 tree iv optimization  :   3.74 (31%) usr   0.20 (51%) sys   4.30 (33%) wall 
210289 kB (64%) ggc
real    0m13.316s
user    0m12.163s
sys     0m0.423s

time ./gcc/cc1 -fno-tree-pre -O2 ~/ex/pr18595_300.c
 tree loop bounds      :   0.16 ( 3%) usr   0.00 ( 0%) sys   0.15 ( 3%) wall   
 508 kB ( 1%) ggc
 tree iv optimization  :   2.26 (47%) usr   0.08 (62%) sys   2.33 (46%) wall  
74743 kB (84%) ggc
real    0m6.287s
user    0m4.849s
sys     0m0.142s

time ./gcc/cc1 -fno-tree-pre -O2 ~/ex/pr18595_1000.c
 tree loop bounds      :   2.92 ( 4%) usr   0.00 ( 0%) sys   2.92 ( 4%) wall   
 913 kB ( 0%) ggc
 tree iv optimization  :  37.78 (54%) usr   0.91 (65%) sys  41.15 (52%) wall 
786700 kB (93%) ggc
real    1m19.611s
user    1m10.288s
sys     0m1.403s


Index: tree-scalar-evolution.c
===================================================================
--- tree-scalar-evolution.c     (revision 106440)
+++ tree-scalar-evolution.c     (working copy)
@@ -1982,7 +1982,8 @@ loop_closed_phi_def (tree var)
 /* Analyze all the parameters of the chrec that were left under a symbolic
form,
    with respect to LOOP.  CHREC is the chrec to instantiate.  CACHE is the
cache
    of already instantiated values.  FLAGS modify the way chrecs are
-   instantiated.  */
+   instantiated.  SIZE_EXPR is used for computing the size of the expression
to
+   be instantiated, and to stop if it exceeds some limit.  */

 /* Values for FLAGS.  */
 enum
@@ -1995,12 +1996,17 @@ enum
 };

 static tree
-instantiate_parameters_1 (struct loop *loop, tree chrec, int flags, htab_t
cache)
+instantiate_parameters_1 (struct loop *loop, tree chrec, int flags, htab_t
cache,
+                         int size_expr)
 {
   tree res, op0, op1, op2;
   basic_block def_bb;
   struct loop *def_loop;

+  /* Give up if the path is longer than the MAX that we allow.  */
+  if (size_expr++ > PARAM_VALUE (PARAM_SCEV_MAX_EXPR_SIZE))
+    return chrec_dont_know;
+
   if (automatically_generated_chrec_p (chrec)
       || is_gimple_min_invariant (chrec))
     return chrec;
@@ -2068,7 +2074,7 @@ instantiate_parameters_1 (struct loop *l
        }

       else if (res != chrec_dont_know)
-       res = instantiate_parameters_1 (loop, res, flags, cache);
+       res = instantiate_parameters_1 (loop, res, flags, cache, size_expr);

       bitmap_clear_bit (already_instantiated, SSA_NAME_VERSION (chrec));

@@ -2078,12 +2084,12 @@ instantiate_parameters_1 (struct loop *l

     case POLYNOMIAL_CHREC:
       op0 = instantiate_parameters_1 (loop, CHREC_LEFT (chrec),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op0 == chrec_dont_know)
        return chrec_dont_know;

       op1 = instantiate_parameters_1 (loop, CHREC_RIGHT (chrec),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op1 == chrec_dont_know)
        return chrec_dont_know;

@@ -2094,12 +2100,12 @@ instantiate_parameters_1 (struct loop *l

     case PLUS_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op0 == chrec_dont_know)
        return chrec_dont_know;

       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op1 == chrec_dont_know)
        return chrec_dont_know;

@@ -2110,12 +2116,12 @@ instantiate_parameters_1 (struct loop *l

     case MINUS_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op0 == chrec_dont_know)
        return chrec_dont_know;

       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op1 == chrec_dont_know)
        return chrec_dont_know;

@@ -2126,12 +2132,12 @@ instantiate_parameters_1 (struct loop *l

     case MULT_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op0 == chrec_dont_know)
        return chrec_dont_know;

       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op1 == chrec_dont_know)
        return chrec_dont_know;

@@ -2144,7 +2150,7 @@ instantiate_parameters_1 (struct loop *l
     case CONVERT_EXPR:
     case NON_LVALUE_EXPR:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op0 == chrec_dont_know)
         return chrec_dont_know;

@@ -2174,17 +2180,17 @@ instantiate_parameters_1 (struct loop *l
     {
     case 3:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op0 == chrec_dont_know)
        return chrec_dont_know;

       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op1 == chrec_dont_know)
        return chrec_dont_know;

       op2 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 2),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op2 == chrec_dont_know)
         return chrec_dont_know;

@@ -2198,12 +2204,12 @@ instantiate_parameters_1 (struct loop *l

     case 2:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op0 == chrec_dont_know)
        return chrec_dont_know;

       op1 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 1),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op1 == chrec_dont_know)
         return chrec_dont_know;

@@ -2214,7 +2220,7 @@ instantiate_parameters_1 (struct loop *l

     case 1:
       op0 = instantiate_parameters_1 (loop, TREE_OPERAND (chrec, 0),
-                                     flags, cache);
+                                     flags, cache, size_expr);
       if (op0 == chrec_dont_know)
         return chrec_dont_know;
       if (op0 == TREE_OPERAND (chrec, 0))
@@ -2252,7 +2258,8 @@ instantiate_parameters (struct loop *loo
       fprintf (dump_file, ")\n");
     }

-  res = instantiate_parameters_1 (loop, chrec, INSERT_SUPERLOOP_CHRECS,
cache);
+  res = instantiate_parameters_1 (loop, chrec, INSERT_SUPERLOOP_CHRECS, cache,
+                                 0);

   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -2275,7 +2282,7 @@ static tree
 resolve_mixers (struct loop *loop, tree chrec)
 {
   htab_t cache = htab_create (10, hash_scev_info, eq_scev_info,
del_scev_info);
-  tree ret = instantiate_parameters_1 (loop, chrec, FOLD_CONVERSIONS, cache);
+  tree ret = instantiate_parameters_1 (loop, chrec, FOLD_CONVERSIONS, cache,
0);
   htab_delete (cache);
   return ret;
 }


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3)
       [not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2005-11-03 20:10 ` sebastian dot pop at cri dot ensmp dot fr
@ 2005-11-08  1:40 ` pinskia at gcc dot gnu dot org
  5 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2005-11-08  1:40 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #60 from pinskia at gcc dot gnu dot org  2005-11-08 01:40 -------
Fixed on the mainline.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18595


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-11-08  1:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-18595-6528@http.gcc.gnu.org/bugzilla/>
2005-10-31  1:48 ` [Bug tree-optimization/18595] [4.0/4.1 Regression] IV-OPTS is O(N^3) mmitchel at gcc dot gnu dot org
2005-11-03 13:24 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-03 15:03 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-03 19:31 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-03 20:10 ` sebastian dot pop at cri dot ensmp dot fr
2005-11-08  1:40 ` pinskia at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).