public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/46590] New: long compile time with -O2 and many loops
@ 2010-11-21 13:23 tkoenig at gcc dot gnu.org
  2010-11-21 13:25 ` [Bug tree-optimization/46590] " tkoenig at gcc dot gnu.org
                   ` (45 more replies)
  0 siblings, 46 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 13:23 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

           Summary: long compile time with -O2 and many loops
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Keywords: compile-time-hog
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: tkoenig@gcc.gnu.org


The attached program, a stress test of array-assignment dependency handling
in gfortran, takes extremely long to compile at -O2: after 90 minutes there
is still no sign of it finishing.

Compilation time without -O2 is 45 seconds.

Here is the Perl script used to generate the test case.

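#! /usr/bin/perl

$cnt = 0;                       # unused
# @ind = (-6, -3, -1, 1, 3, 6);
@ind = (-5, -3, -1, 1, 3, 5);   # strides for the left- and right-hand sections
$n = 6;                         # start indices run 1..$n-1, element counts 0..$n-1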

print <<EOF;
program main
  implicit none
  integer, dimension(-100:100) :: a,b,original
  integer :: i
  original = [(15*i+2,i=-100,100)]

EOF
foreach $stride_l (@ind) {
    foreach $stride_r (@ind) {
        for ($start_l = 1; $start_l < $n; $start_l ++) {
            for ($start_r = 1; $start_r < $n; $start_r ++) {
                for ($times = 0; $times < $n; $times ++) {
                    # Each section start:end:stride holds exactly $times elements.
                    $end_l = $start_l + ($times-1)*$stride_l;
                    $end_r = $start_r + ($times-1)*$stride_r;
                    print <<EOF;
  a = original
  b = original
  a($start_l:$end_l:$stride_l) =  a($start_r:$end_r:$stride_r)
  b($start_l:$end_l:$stride_l) = original($start_r:$end_r:$stride_r)
  if (any (a /= b)) then
    print *,$start_l, $end_l,$stride_l,$start_r,$end_r, $stride_r
    call abort
  end if

EOF
                }
            }
        }
    }
}
print "end program main\n";
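
For a sense of scale, the loop nest in the script above can be counted without running Perl. The following Python sketch mirrors the loop bounds (six strides on each side, start indices 1 to 5, element counts 0 to 5) and reports how many test blocks the generated program contains. The names `section_bounds` and `count_blocks` are hypothetical, and the factor of four simply counts the four array-assignment lines in each here-document block.

```python
from itertools import product

IND = (-5, -3, -1, 1, 3, 5)   # strides, as in the Perl script
N = 6                          # loop bound, as in the Perl script


def section_bounds(start, stride, times):
    """Return (start, end) of a section start:end:stride holding `times` elements."""
    return start, start + (times - 1) * stride


def count_blocks(ind=IND, n=N):
    """Count the test blocks emitted by the generator's five nested loops."""
    return sum(
        1
        for _ in product(ind, ind, range(1, n), range(1, n), range(n))
    )


if __name__ == "__main__":
    blocks = count_blocks()
    # Each block assigns to a and b twice: two whole-array copies, two sections.
    print(blocks, "test blocks,", 4 * blocks, "array assignments")
```

With the bounds shown in the script this comes to 6 * 6 * 5 * 5 * 6 = 5400 blocks in a single program unit, which is consistent with the report that the problem depends on test case size.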



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
@ 2010-11-21 13:25 ` tkoenig at gcc dot gnu.org
  2010-11-21 17:19 ` dominiq at lps dot ens.fr
                   ` (44 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 13:25 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 13:22:50 UTC ---
Created attachment 22475
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22475
gzipped test case

Test case.



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
  2010-11-21 13:25 ` [Bug tree-optimization/46590] " tkoenig at gcc dot gnu.org
@ 2010-11-21 17:19 ` dominiq at lps dot ens.fr
  2010-11-21 18:55 ` tkoenig at gcc dot gnu.org
                   ` (43 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: dominiq at lps dot ens.fr @ 2010-11-21 17:19 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2010-11-21 16:59:39 UTC ---
On my laptop the test in comment #1 takes ~4 min to compile; at -O1 I had to
interrupt the compilation after ~20 min, during which all the time was spent
paging memory (with 4 GB of RAM).



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
  2010-11-21 13:25 ` [Bug tree-optimization/46590] " tkoenig at gcc dot gnu.org
  2010-11-21 17:19 ` dominiq at lps dot ens.fr
@ 2010-11-21 18:55 ` tkoenig at gcc dot gnu.org
  2010-11-21 19:13 ` rguenth at gcc dot gnu.org
                   ` (42 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 18:55 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |memory-hog

--- Comment #3 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 18:49:20 UTC ---
Yes, this is a memory hog as well:

time ~/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951 -O2 gener-max.f90
 MAIN__ main
Analyzing compilation unit
 {GC 183863k -> 103486k}Performing interprocedural optimizations
 <*free_lang_data> <visibility> <early_local_cleanups> {GC 156854k -> 151699k}
<whole-program> <ipa-profile> <cp> <inline> <pure-const> <static-var>Assembling
functions:
 MAIN__ {GC 281998k -> 189095k} {GC 393064k -> 341214k} {GC 554029k -> 337458k}
{GC 477071k -> 337199k} {GC 557246k -> 473808k}
f951: out of memory allocating 968552 bytes after a total of 4310765568 bytes

real    150m58.614s
user    150m16.083s
sys     0m11.924s
ig25@linux-fd1f:~/Krempel/Dep-c>



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2010-11-21 18:55 ` tkoenig at gcc dot gnu.org
@ 2010-11-21 19:13 ` rguenth at gcc dot gnu.org
  2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
                   ` (41 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-21 19:13 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-21 18:56:58 UTC ---
Please strip down the testcase and provide -ftime-report output (I suppose
you hit IVOPTs slowness, so try -fno-ivopts).



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2010-11-21 19:13 ` rguenth at gcc dot gnu.org
@ 2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
  2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
                   ` (40 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 19:49 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #6 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 19:48:41 UTC ---
Without -fno-ivopts:

ig25@linux-fd1f:~/Krempel/Dep-c> time ~/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951 -ftime-report -O2 gener-4.f90
 MAIN__ main                                                                    
Analyzing compilation unit                                                      
 {GC 44026k -> 25146k}Performing interprocedural optimizations                  
 <*free_lang_data> <visibility> <early_local_cleanups> {GC 37683k -> 36011k}
<whole-program> <ipa-profile> <cp> <inline> <pure-const> <static-var>Assembling
functions:                                   
 MAIN__ {GC 54739k -> 37948k} {GC 61997k -> 51931k} {GC 77889k -> 51575k} {GC
75220k -> 51694k} {GC 89661k -> 77264k} {GC 500191k -> 73724k} {GC 106651k ->
72832k} main                                  
Execution times (seconds)                                                       
 garbage collection    :   1.05 ( 0%) usr   0.03 ( 2%) sys   1.09 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    2465 kB ( 0%) ggc
 callgraph optimization:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       2 kB ( 0%) ggc
 ipa pure const        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 cfg construction      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 cfg cleanup           :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall     344 kB ( 0%) ggc
 CFG verifier          :   1.35 ( 0%) usr   0.00 ( 0%) sys   1.35 ( 0%) wall       0 kB ( 0%) ggc
 trivially dead code   :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns         :   0.17 ( 0%) usr   0.01 ( 1%) sys   0.18 ( 0%) wall       0 kB ( 0%) ggc
 df multiple defs      :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs      :   2.04 ( 1%) usr   0.01 ( 1%) sys   2.11 ( 1%) wall       0 kB ( 0%) ggc
 df live regs          :   1.02 ( 0%) usr   0.00 ( 0%) sys   1.06 ( 0%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.29 ( 0%) usr   0.01 ( 1%) sys   0.24 ( 0%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.12 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.43 ( 0%) usr   0.00 ( 0%) sys   0.43 ( 0%) wall    2460 kB ( 0%) ggc
 register information  :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall    2816 kB ( 0%) ggc
 alias stmt walking    : 192.54 (71%) usr   0.22 (16%) sys 193.02 (71%) wall    7023 kB ( 1%) ggc
 register scan         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall      20 kB ( 0%) ggc
 rebuild jump labels   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.43 ( 0%) usr   0.03 ( 2%) sys   0.47 ( 0%) wall   20481 kB ( 3%) ggc
 inline heuristics     :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 tree gimplify         :   0.16 ( 0%) usr   0.02 ( 1%) sys   0.19 ( 0%) wall   17257 kB ( 3%) ggc
 tree eh               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    5915 kB ( 1%) ggc
 tree CFG cleanup      :   1.06 ( 0%) usr   0.01 ( 1%) sys   1.07 ( 0%) wall     602 kB ( 0%) ggc
 tree VRP              :   0.40 ( 0%) usr   0.01 ( 1%) sys   0.37 ( 0%) wall    4754 kB ( 1%) ggc
 tree copy propagation :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall      75 kB ( 0%) ggc
 tree find ref. vars   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     865 kB ( 0%) ggc
 tree PTA              :   2.24 ( 1%) usr   0.06 ( 4%) sys   2.30 ( 1%) wall    2253 kB ( 0%) ggc
 tree PHI insertion    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    2203 kB ( 0%) ggc
 tree SSA rewrite      :   3.18 ( 1%) usr   0.00 ( 0%) sys   2.75 ( 1%) wall    6503 kB ( 1%) ggc
 tree SSA other        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall      32 kB ( 0%) ggc
 tree SSA incremental  :   6.64 ( 2%) usr   0.01 ( 1%) sys   6.73 ( 2%) wall    1372 kB ( 0%) ggc
 tree operand scan     :   0.17 ( 0%) usr   0.09 ( 7%) sys   0.28 ( 0%) wall   13590 kB ( 2%) ggc
 dominator optimization:   0.38 ( 0%) usr   0.01 ( 1%) sys   0.40 ( 0%) wall   22013 kB ( 3%) ggc
 tree SRA              :   0.09 ( 0%) usr   0.02 ( 1%) sys   0.15 ( 0%) wall   11123 kB ( 2%) ggc
 tree CCP              :   0.45 ( 0%) usr   0.02 ( 1%) sys   0.43 ( 0%) wall     838 kB ( 0%) ggc
 tree reassociation    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     559 kB ( 0%) ggc
 tree PRE              :   0.53 ( 0%) usr   0.04 ( 3%) sys   0.67 ( 0%) wall    7992 kB ( 1%) ggc
 tree FRE              :   0.34 ( 0%) usr   0.01 ( 1%) sys   0.42 ( 0%) wall     530 kB ( 0%) ggc
 tree code sinking     :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall     360 kB ( 0%) ggc
 tree linearize phis   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     337 kB ( 0%) ggc
 tree phiprop          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.08 ( 0%) usr   0.03 ( 2%) sys   0.05 ( 0%) wall      64 kB ( 0%) ggc
 tree aggressive DCE   :   0.23 ( 0%) usr   0.03 ( 2%) sys   0.27 ( 0%) wall    4227 kB ( 1%) ggc
 tree buildin call DCE :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.83 ( 0%) usr   0.00 ( 0%) sys   0.85 ( 0%) wall       0 kB ( 0%) ggc
 tree loop bounds      :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     821 kB ( 0%) ggc
 tree loop invariant motion:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 tree canonical iv     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1012 kB ( 0%) ggc
 scev constant prop    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     174 kB ( 0%) ggc
 complete unrolling    :   1.00 ( 0%) usr   0.03 ( 2%) sys   0.86 ( 0%) wall   16838 kB ( 2%) ggc
 tree iv optimization  :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall   11430 kB ( 2%) ggc
 tree loop init        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     394 kB ( 0%) ggc
 tree copy headers     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     613 kB ( 0%) ggc
 tree SSA uncprop      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA verifier     :   2.92 ( 1%) usr   0.00 ( 0%) sys   2.81 ( 1%) wall       0 kB ( 0%) ggc
 tree STMT verifier    :   5.27 ( 2%) usr   0.00 ( 0%) sys   5.41 ( 2%) wall       0 kB ( 0%) ggc
 callgraph verifier    :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   6.46 ( 2%) usr   0.01 ( 1%) sys   7.35 ( 3%) wall       0 kB ( 0%) ggc
 dominance computation :   5.33 ( 2%) usr   0.01 ( 1%) sys   5.06 ( 2%) wall       0 kB ( 0%) ggc
 control dependences   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa            :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 expand vars           :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall     726 kB ( 0%) ggc
 expand                :   0.44 ( 0%) usr   0.01 ( 1%) sys   0.45 ( 0%) wall   37845 kB ( 5%) ggc
 post expand cleanups  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       1 kB ( 0%) ggc
 forward prop          :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall    1217 kB ( 0%) ggc
 CSE                   :   0.97 ( 0%) usr   0.00 ( 0%) sys   0.97 ( 0%) wall    2127 kB ( 0%) ggc
 dead code elimination :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1      :   0.23 ( 0%) usr   0.01 ( 1%) sys   0.23 ( 0%) wall    4659 kB ( 1%) ggc
 dead store elim2      :   0.50 ( 0%) usr   0.00 ( 0%) sys   0.51 ( 0%) wall    5526 kB ( 1%) ggc
 loop analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     836 kB ( 0%) ggc
 loop invariant motion :   5.21 ( 2%) usr   0.01 ( 1%) sys   5.23 ( 2%) wall       0 kB ( 0%) ggc
 CPROP                 :   0.88 ( 0%) usr   0.03 ( 2%) sys   0.90 ( 0%) wall   13157 kB ( 2%) ggc
 PRE                   :  10.87 ( 4%) usr   0.33 (24%) sys  11.24 ( 4%) wall  409967 kB (60%) ggc
 CSE 2                 :   0.53 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall     729 kB ( 0%) ggc
 branch prediction     :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall    1542 kB ( 0%) ggc
 combiner              :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.26 ( 0%) wall    1844 kB ( 0%) ggc
 if-conversion         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     559 kB ( 0%) ggc
 regmove               :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :   2.04 ( 1%) usr   0.21 (15%) sys   2.26 ( 1%) wall    5282 kB ( 1%) ggc
 reload                :   0.90 ( 0%) usr   0.00 ( 0%) sys   0.90 ( 0%) wall   13255 kB ( 2%) ggc
 reload CSE regs       :   1.28 ( 0%) usr   0.00 ( 0%) sys   1.28 ( 0%) wall   13060 kB ( 2%) ggc
 zee                   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 thread pro- & epilogue:   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       3 kB ( 0%) ggc
 if-conversion 2       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     279 kB ( 0%) ggc
 combine stack adjustments:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2            :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     354 kB ( 0%) ggc
 hard reg cprop        :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.34 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :   1.84 ( 1%) usr   0.02 ( 1%) sys   1.85 ( 1%) wall      71 kB ( 0%) ggc
 machine dep reorg     :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
 reorder blocks        :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall     891 kB ( 0%) ggc
 final                 :   0.31 ( 0%) usr   0.01 ( 1%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation   :   0.39 ( 0%) usr   0.01 ( 1%) sys   0.38 ( 0%) wall    3151 kB ( 0%) ggc
 remove unused locals  :   0.43 ( 0%) usr   0.00 ( 0%) sys   0.38 ( 0%) wall       0 kB ( 0%) ggc
 address taken         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall      32 kB ( 0%) ggc
 unaccounted todo      :   0.07 ( 0%) usr   0.01 ( 1%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 verify loop closed    :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall       0 kB ( 0%) ggc
 verify RTL sharing    :   1.55 ( 1%) usr   0.00 ( 0%) sys   1.54 ( 1%) wall       0 kB ( 0%) ggc
 repair loop structures:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     145 kB ( 0%) ggc
 TOTAL                 : 270.29             1.38           272.32             688523 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

real    4m32.395s
user    4m30.299s
sys     0m1.406s



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
@ 2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
  2010-11-21 21:13 ` tkoenig at gcc dot gnu.org
                   ` (39 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 19:49 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #5 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 19:39:23 UTC ---
With -fno-ivopts, alias statement walking is clearly the culprit
(measured on a somewhat shorter test case):

ig25@linux-fd1f:~/Krempel/Dep-c> time ~/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951 -ftime-report -O2 -fno-ivopts gener-4.f90
 MAIN__ main                                                                    
Analyzing compilation unit                                                      
 {GC 44026k -> 25146k}Performing interprocedural optimizations                  
 <*free_lang_data> <visibility> <early_local_cleanups> {GC 37683k -> 36011k}
<whole-program> <ipa-profile> <cp> <inline> <pure-const> <static-var>Assembling
functions:                                   
 MAIN__ {GC 54739k -> 37948k} {GC 61997k -> 51931k} {GC 77889k -> 51575k} {GC
103199k -> 77477k} {GC 100723k -> 74180k} {GC 98147k -> 73521k} main            
Execution times (seconds)                                                       
 garbage collection    :   0.79 ( 0%) usr   0.00 ( 0%) sys   0.81 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    2465 kB ( 1%) ggc
 callgraph optimization:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       2 kB ( 0%) ggc
 ipa cp                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     256 kB ( 0%) ggc
 ipa profile           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 cfg construction      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 cfg cleanup           :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall     344 kB ( 0%) ggc
 CFG verifier          :   1.40 ( 0%) usr   0.01 ( 0%) sys   1.35 ( 0%) wall       0 kB ( 0%) ggc
 trivially dead code   :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns         :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall       0 kB ( 0%) ggc
 df multiple defs      :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs      :   2.12 ( 1%) usr   0.01 ( 0%) sys   2.15 ( 1%) wall       0 kB ( 0%) ggc
 df live regs          :   0.93 ( 0%) usr   0.00 ( 0%) sys   1.00 ( 0%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.26 ( 0%) usr   0.00 ( 0%) sys   0.23 ( 0%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.44 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall    2460 kB ( 1%) ggc
 register information  :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall    2816 kB ( 1%) ggc
 alias stmt walking    : 205.44 (72%) usr   1.33 (58%) sys 207.39 (72%) wall    7023 kB ( 3%) ggc
 register scan         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 rebuild jump labels   :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.44 ( 0%) usr   0.03 ( 1%) sys   0.47 ( 0%) wall   20481 kB ( 8%) ggc
 inline heuristics     :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 tree gimplify         :   0.16 ( 0%) usr   0.01 ( 0%) sys   0.18 ( 0%) wall   17257 kB ( 6%) ggc
 tree eh               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall    5915 kB ( 2%) ggc
 tree CFG cleanup      :   1.08 ( 0%) usr   0.02 ( 1%) sys   1.15 ( 0%) wall     602 kB ( 0%) ggc
 tree VRP              :   0.44 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall    4778 kB ( 2%) ggc
 tree copy propagation :   0.23 ( 0%) usr   0.01 ( 0%) sys   0.25 ( 0%) wall      75 kB ( 0%) ggc
 tree find ref. vars   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     865 kB ( 0%) ggc
 tree PTA              :   2.31 ( 1%) usr   0.06 ( 3%) sys   2.38 ( 1%) wall    2253 kB ( 1%) ggc
 tree PHI insertion    :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    2203 kB ( 1%) ggc
 tree SSA rewrite      :   3.08 ( 1%) usr   0.01 ( 0%) sys   5.10 ( 2%) wall    6503 kB ( 2%) ggc
 tree SSA other        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall      32 kB ( 0%) ggc
 tree SSA incremental  :   6.20 ( 2%) usr   0.03 ( 1%) sys   7.43 ( 3%) wall    1372 kB ( 1%) ggc
 tree operand scan     :   0.13 ( 0%) usr   0.10 ( 4%) sys   0.19 ( 0%) wall   13430 kB ( 5%) ggc
 dominator optimization:   0.40 ( 0%) usr   0.01 ( 0%) sys   0.41 ( 0%) wall   22013 kB ( 8%) ggc
 tree SRA              :   0.12 ( 0%) usr   0.02 ( 1%) sys   0.14 ( 0%) wall   11123 kB ( 4%) ggc
 tree CCP              :   0.47 ( 0%) usr   0.02 ( 1%) sys   0.52 ( 0%) wall     838 kB ( 0%) ggc
 tree PHI const/copy prop:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree reassociation    :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     559 kB ( 0%) ggc
 tree PRE              :   0.65 ( 0%) usr   0.05 ( 2%) sys   0.70 ( 0%) wall    7992 kB ( 3%) ggc
 tree FRE              :   0.43 ( 0%) usr   0.05 ( 2%) sys   0.38 ( 0%) wall     530 kB ( 0%) ggc
 tree code sinking     :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall     360 kB ( 0%) ggc
 tree linearize phis   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree forward propagate:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     337 kB ( 0%) ggc
 tree phiprop          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.10 ( 0%) usr   0.04 ( 2%) sys   0.11 ( 0%) wall      64 kB ( 0%) ggc
 tree aggressive DCE   :   0.25 ( 0%) usr   0.01 ( 0%) sys   0.29 ( 0%) wall    4194 kB ( 2%) ggc
 tree DSE              :   0.90 ( 0%) usr   0.01 ( 0%) sys   0.91 ( 0%) wall       0 kB ( 0%) ggc
 tree loop bounds      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     821 kB ( 0%) ggc
 tree loop invariant motion:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 tree canonical iv     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1012 kB ( 0%) ggc
 complete unrolling    :   1.38 ( 0%) usr   0.05 ( 2%) sys   1.84 ( 1%) wall   16838 kB ( 6%) ggc
 tree loop init        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     394 kB ( 0%) ggc
 tree loop fini        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree copy headers     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     613 kB ( 0%) ggc
 tree SSA uncprop      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA verifier     :   2.85 ( 1%) usr   0.01 ( 0%) sys   2.99 ( 1%) wall       0 kB ( 0%) ggc
 tree STMT verifier    :   5.48 ( 2%) usr   0.00 ( 0%) sys   5.55 ( 2%) wall       0 kB ( 0%) ggc
 tree switch initialization conversion:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 callgraph verifier    :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   6.84 ( 2%) usr   0.01 ( 0%) sys   4.58 ( 2%) wall       0 kB ( 0%) ggc
 dominance computation :   4.91 ( 2%) usr   0.00 ( 0%) sys   3.66 ( 1%) wall       0 kB ( 0%) ggc
 control dependences   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa            :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 expand vars           :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall    1109 kB ( 0%) ggc
 expand                :   0.54 ( 0%) usr   0.02 ( 1%) sys   0.56 ( 0%) wall   38882 kB (14%) ggc
 post expand cleanups  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       1 kB ( 0%) ggc
 forward prop          :   0.22 ( 0%) usr   0.01 ( 0%) sys   0.21 ( 0%) wall    1581 kB ( 1%) ggc
 CSE                   :   1.07 ( 0%) usr   0.00 ( 0%) sys   1.06 ( 0%) wall    1762 kB ( 1%) ggc
 dead code elimination :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1      :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%) wall    4983 kB ( 2%) ggc
 dead store elim2      :   0.52 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall    6295 kB ( 2%) ggc
 loop analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     836 kB ( 0%) ggc
 loop invariant motion :   5.52 ( 2%) usr   0.00 ( 0%) sys   5.50 ( 2%) wall       0 kB ( 0%) ggc
 CPROP                 :   0.89 ( 0%) usr   0.04 ( 2%) sys   0.94 ( 0%) wall    8097 kB ( 3%) ggc
 PRE                   :  10.12 ( 4%) usr   0.11 ( 5%) sys  10.28 ( 4%) wall    2091 kB ( 1%) ggc
 CSE 2                 :   0.55 ( 0%) usr   0.00 ( 0%) sys   0.55 ( 0%) wall     547 kB ( 0%) ggc
 branch prediction     :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall    1542 kB ( 1%) ggc
 combiner              :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.36 ( 0%) wall    4406 kB ( 2%) ggc
 if-conversion         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     559 kB ( 0%) ggc
 regmove               :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 integrated RA         :   2.07 ( 1%) usr   0.18 ( 8%) sys   2.26 ( 1%) wall    5525 kB ( 2%) ggc
 reload                :   0.93 ( 0%) usr   0.00 ( 0%) sys   0.93 ( 0%) wall   14653 kB ( 5%) ggc
 reload CSE regs       :   1.30 ( 0%) usr   0.00 ( 0%) sys   1.32 ( 0%) wall   13495 kB ( 5%) ggc
 zee                   :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 thread pro- & epilogue:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       3 kB ( 0%) ggc
 if-conversion 2       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     279 kB ( 0%) ggc
 combine stack adjustments:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2            :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall     354 kB ( 0%) ggc
 hard reg cprop        :   0.35 ( 0%) usr   0.00 ( 0%) sys   0.34 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :   1.83 ( 1%) usr   0.02 ( 1%) sys   1.94 ( 1%) wall      71 kB ( 0%) ggc
 machine dep reorg     :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 reorder blocks        :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall     891 kB ( 0%) ggc
 final                 :   0.32 ( 0%) usr   0.01 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
 tree if-combine       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation   :   0.43 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall    3151 kB ( 1%) ggc
 remove unused locals  :   0.40 ( 0%) usr   0.00 ( 0%) sys   0.45 ( 0%) wall       0 kB ( 0%) ggc
 address taken         :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall      32 kB ( 0%) ggc
 unaccounted todo      :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 verify loop closed    :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall       0 kB ( 0%) ggc
 verify RTL sharing    :   1.59 ( 1%) usr   0.00 ( 0%) sys   1.61 ( 1%) wall       0 kB ( 0%) ggc
 repair loop structures:   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     145 kB ( 0%) ggc
 TOTAL                 : 283.50             2.31           286.74             270936 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

real    4m46.813s
user    4m43.509s
sys     0m2.336s
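
The ranking of passes can also be extracted mechanically from a report like the one above. The following is a minimal Python sketch, assuming rows of the form `name : <seconds> (<percent>%) usr ...` as printed by -ftime-report here. The helper names `parse_usr_times` and `top_passes` are hypothetical, and only the user-time column is read, so rows whose ggc column was wrapped onto the next line still parse.

```python
import re

# Matches " pass name : 205.44 (72%) usr ..."; only the usr column is captured.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*(?P<pct>\d+)%\)\s*usr"
)


def parse_usr_times(report):
    """Map each pass name to (usr_seconds, usr_percent) for rows matching ROW."""
    times = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            times[m.group("name")] = (float(m.group("usr")), int(m.group("pct")))
    return times


def top_passes(report, k=3):
    """Return the k passes with the largest user time, largest first."""
    times = parse_usr_times(report)
    return sorted(times.items(), key=lambda kv: kv[1][0], reverse=True)[:k]


# Four rows copied from the -fno-ivopts report above (ggc column omitted).
SAMPLE = """\
 alias analysis        :   0.30 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall
 alias stmt walking    : 205.44 (72%) usr   1.33 (58%) sys 207.39 (72%) wall
 PRE                   :  10.12 ( 4%) usr   0.11 ( 5%) sys  10.28 ( 4%) wall
 dominance frontiers   :   6.84 ( 2%) usr   0.01 ( 0%) sys   4.58 ( 2%) wall
"""

if __name__ == "__main__":
    for name, (usr, pct) in top_passes(SAMPLE):
        print(f"{name:24s} {usr:8.2f} s ({pct}%)")
```

The TOTAL row carries no percentage, so the pattern skips it and it cannot be mistaken for a pass.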



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
@ 2010-11-21 21:13 ` tkoenig at gcc dot gnu.org
  2010-11-21 21:46 ` [Bug tree-optimization/46590] [4.6 Regression] " tkoenig at gcc dot gnu.org
                   ` (38 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 21:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #7 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 21:07:08 UTC ---
Created attachment 22478
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22478
Test case from comment #5 and comment #6

Here is a test case that is a little shorter.

A bit hard to trim it down further when the bug depends on test
case size...



* [Bug tree-optimization/46590] [4.6 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2010-11-21 21:13 ` tkoenig at gcc dot gnu.org
@ 2010-11-21 21:46 ` tkoenig at gcc dot gnu.org
  2010-11-21 23:14 ` tkoenig at gcc dot gnu.org
                   ` (37 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 21:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.4.1
            Summary|long compile time with -O2  |[4.6 Regression] long
                   |and many loops              |compile time with -O2 and
                   |                            |many loops
      Known to fail|                            |4.6.0

--- Comment #8 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 21:13:24 UTC ---
4.4 compile time is much lower:

ig25@linux-fd1f:~/Krempel/Dep-c> time /usr/bin/gfortran-4.4 -O2 gener-4.f90

real    0m59.893s
user    0m58.172s
sys     0m0.664s



* [Bug tree-optimization/46590] [4.6 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2010-11-21 21:46 ` [Bug tree-optimization/46590] [4.6 Regression] " tkoenig at gcc dot gnu.org
@ 2010-11-21 23:14 ` tkoenig at gcc dot gnu.org
  2010-11-22 13:30 ` [Bug tree-optimization/46590] [4.5/4.6 " rguenth at gcc dot gnu.org
                   ` (36 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 23:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.6.0



* [Bug tree-optimization/46590] [4.6 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2010-11-22 13:30 ` [Bug tree-optimization/46590] [4.5/4.6 " rguenth at gcc dot gnu.org
@ 2010-11-22 13:30 ` rguenth at gcc dot gnu.org
  2010-11-22 16:19 ` [Bug tree-optimization/46590] [4.5/4.6 " rguenth at gcc dot gnu.org
                   ` (34 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-22 13:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2010.11.22 13:13:03
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #9 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-22 13:13:03 UTC ---
It's a very large monolithic function.  And as usual we have a gazillion
local I/O state variables.

On trunk with release checking I see:

A quarter of the testcase:

 alias stmt walking    :  15.94 (59%) usr   0.03 ( 8%) sys  16.00 (57%) wall    1845 kB ( 2%) ggc
 TOTAL                 :  27.08             0.38            27.89              96306 kB

Half of the testcase:

 alias stmt walking    :  63.31 (68%) usr   0.51 (31%) sys  64.06 (67%) wall    3684 kB ( 2%) ggc
 TOTAL                 :  93.52             1.66            95.57             241871 kB

All of the testcase:

 alias stmt walking    : 259.19 (73%) usr   0.78 (26%) sys 261.79 (72%) wall    7023 kB ( 1%) ggc
 TOTAL                 : 356.27             2.98           361.57             690719 kB

so it's definitely nearly quadratic (but that's expected).
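
As a rough cross-check of the scaling claim, the growth exponent can be
estimated from the three "alias stmt walking" user times quoted above.  This
is only an illustrative sketch: it assumes that the quarter, half and full
testcases have relative sizes 0.25, 0.5 and 1.0, and the helper name
scaling_exponent is made up for this example.

```python
import math

# "alias stmt walking" usr seconds from the reports above, keyed by the
# fraction of the testcase compiled (assumed proportional to function size).
times = {0.25: 15.94, 0.5: 63.31, 1.0: 259.19}


def scaling_exponent(n1, t1, n2, t2):
    """Return k such that t grows like n**k between two measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


sizes = sorted(times)
exponents = [
    scaling_exponent(a, times[a], b, times[b])
    for a, b in zip(sizes, sizes[1:])
]

for (a, b), k in zip(zip(sizes, sizes[1:]), exponents):
    print(f"{a} -> {b}: exponent {k:.2f}")
```

Both estimated exponents come out close to 2, which is consistent with the
"nearly quadratic" reading of the numbers.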

4.5.x for a quarter of the testcase has:

 alias stmt walking    :  93.10 (88%) usr   0.03 ( 8%) sys  93.31 (87%) wall       0 kB ( 0%) ggc
 TOTAL                 : 106.11             0.40           106.93              87895 kB

so trunk is already a lot better.

Removing the alias stmt walk timevar gets us to the following on trunk
(quarter of the testcase again):

 tree PRE              :  12.93 (47%) usr   0.00 ( 0%) sys  12.98 (46%) wall    3607 kB ( 4%) ggc
 TOTAL                 :  27.57             0.34            27.99              96324 kB

What is costly is translating things through the loop bodies.  We can
improve this a lot by properly marking the I/O structs as dead once
they are no longer used and before they are first used.  The proposed
virtual kill stmts could be used for that.  We could also build this
kind of liveness information up-front and use it to limit the walking
(but that again is only trivial for non-address-taken variables, which
the I/O structs are not).

Anyway, confirmed.  Fortran I/O and array descriptor temporaries really
need re-use (I proposed a patch for I/O ones once but it was shot down
because of async-I/O).

Removing all prints from the testcase gives:

 alias stmt walking    : 132.44 (61%) usr   0.79 (30%) sys 133.47 (61%) wall    7023 kB ( 1%) ggc
 TOTAL                 : 216.55             2.67           219.78             645229 kB

As none of the arrays are address-taken, we really look for CSE opportunities
up to the very start of the function (PRE translates the in-loop
references from the any (a /= b) loop to the loop header using the
constant initial index and tries to CSE that, but it doesn't actually
succeed, which is another bug of course: it should look it up from
original or A.0, respectively).



* [Bug tree-optimization/46590] [4.5/4.6 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2010-11-21 23:14 ` tkoenig at gcc dot gnu.org
@ 2010-11-22 13:30 ` rguenth at gcc dot gnu.org
  2010-11-22 13:30 ` [Bug tree-optimization/46590] [4.6 " rguenth at gcc dot gnu.org
                   ` (35 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-22 13:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #10 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-22 13:29:56 UTC ---
At -O1 we have

 tree SSA rewrite      :  23.32 (13%) usr   0.06 ( 3%) sys  22.80 (13%) wall    6392 kB ( 3%) ggc
 tree SSA incremental  :  24.05 (14%) usr   0.07 ( 4%) sys  25.27 (14%) wall    3533 kB ( 2%) ggc
 tree FRE              :  45.17 (25%) usr   0.51 (29%) sys  45.83 (26%) wall    2048 kB ( 1%) ggc
 dominance frontiers   :  29.37 (17%) usr   0.01 ( 1%) sys  29.30 (16%) wall       0 kB ( 0%) ggc
 dominance computation :  19.49 (11%) usr   0.02 ( 1%) sys  19.29 (11%) wall       0 kB ( 0%) ggc
 loop invariant motion :  12.25 ( 7%) usr   0.01 ( 1%) sys  12.21 ( 7%) wall     648 kB ( 0%) ggc
 TOTAL                 : 177.31             1.74           179.57             196657 kB

The testcase is simply large.  And yes, the FRE/PRE alias stmt walks are
not limited (unlike the DCE/DSE ones).  Maybe it's time to add such a
limit to cover this kind of artificial testcase.



* [Bug tree-optimization/46590] [4.5/4.6 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2010-11-22 13:30 ` [Bug tree-optimization/46590] [4.6 " rguenth at gcc dot gnu.org
@ 2010-11-22 16:19 ` rguenth at gcc dot gnu.org
  2010-11-24 12:40 ` rguenth at gcc dot gnu.org
                   ` (33 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-22 16:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #11 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-22 16:19:37 UTC ---
(In reply to comment #10)
> At -O1 we have
> 
>  tree SSA rewrite      :  23.32 (13%) usr   0.06 ( 3%) sys  22.80 (13%) wall    6392 kB ( 3%) ggc
>  tree SSA incremental  :  24.05 (14%) usr   0.07 ( 4%) sys  25.27 (14%) wall    3533 kB ( 2%) ggc

The above is almost completely loop header copying.



* [Bug tree-optimization/46590] [4.5/4.6 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2010-11-22 16:19 ` [Bug tree-optimization/46590] [4.5/4.6 " rguenth at gcc dot gnu.org
@ 2010-11-24 12:40 ` rguenth at gcc dot gnu.org
  2010-11-24 14:19 ` rguenth at gcc dot gnu.org
                   ` (32 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-24 12:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #12 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-24 12:22:48 UTC ---
Mine for now.



* [Bug tree-optimization/46590] [4.5/4.6 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2010-11-24 12:40 ` rguenth at gcc dot gnu.org
@ 2010-11-24 14:19 ` rguenth at gcc dot gnu.org
  2011-01-03 20:25 ` rguenth at gcc dot gnu.org
                   ` (31 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-24 14:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #13 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-24 13:50:21 UTC ---
The I/O parts of the FRE cost are due to value-numbering stores in
visit_reference_op_store.  They can be drastically cut by an equivalent
of (not generating code)

Index: trans-io.c
===================================================================
--- trans-io.c  (revision 167111)
+++ trans-io.c  (working copy)
@@ -1670,6 +1670,7 @@ build_dt (tree function, gfc_code * code
   gfc_init_block (&post_iu_block);

   var = gfc_create_var (st_parameter[IOPARM_ptype_dt].type, "dt_parm");
+  gfc_add_modify (&block, var, build_constructor (TREE_TYPE (var), NULL));

   set_error_locus (&block, var, &code->loc);

which constrains where the lifetime of the dt_parm structs begins (they have
their address taken and thus lifetime analysis will have a hard time).  That
prevents store value numbering from considering all dominated blocks (which
only contain the loops).  The above brings down the previous -O1 numbers to

 tree FRE              :  28.34 (18%) usr   0.39 (18%) sys  28.83 (18%) wall    1906 kB ( 1%) ggc
 TOTAL                 : 159.87             2.17           162.72             201091 kB

all other variables are re-used and thus their lifetime isn't limited.
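
The effect of constraining where a lifetime begins can be illustrated with a
toy model.  This is a simplified sketch and not how SCCVN is implemented: it
assumes a flat statement list instead of a CFG with virtual operands, and the
names build_stmts, backward_walk_steps and total_walk are hypothetical.  Each
store walks backwards until it finds a whole-struct initialization of its own
variable or reaches the start of the function.

```python
def build_stmts(num_io_blocks, add_init):
    """Model a function as a flat list of (kind, variable) statements.

    Each I/O block uses its own dt_parm_k struct; add_init places a
    whole-struct initialization right before the first field store.
    """
    stmts = []
    for k in range(num_io_blocks):
        var = f"dt_parm_{k}"
        if add_init:
            stmts.append(("init", var))
        stmts.append(("store", var))
    return stmts


def backward_walk_steps(stmts, index):
    """Count statements visited walking back from stmts[index] until its
    variable is fully defined or the function start is reached."""
    _, var = stmts[index]
    steps = 0
    for i in range(index - 1, -1, -1):
        steps += 1
        if stmts[i] == ("init", var):
            break  # the lifetime starts here, nothing earlier can matter
    return steps


def total_walk(num_io_blocks, add_init):
    """Sum the walk lengths of all stores in the modelled function."""
    stmts = build_stmts(num_io_blocks, add_init)
    return sum(
        backward_walk_steps(stmts, i)
        for i, (kind, _) in enumerate(stmts)
        if kind == "store"
    )


walk_without_init = total_walk(1000, add_init=False)
walk_with_init = total_walk(1000, add_init=True)
print(walk_without_init, walk_with_init)
```

In this model the total walk length grows quadratically with the number of
I/O blocks when there is no initialization, and only linearly when every
struct is initialized right before its first use.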

The loads from original[] are the only unconstrained ones, and handling
them exhibits quadratic behavior.  The walk goes up to the first statement
in the function, which is

  MEM[(c_char * {ref-all})&original] = MEM[(c_char * {ref-all})&A.0];

and then fails to constant fold using the static initializer of A.0.

FRE optimizes away the self-assignments such as

  a(1:6:-5) =  a(1:6:-5)

but it doesn't do anything useful to the other loops.

Overall it's not completely unreasonable what FRE does for a, b and original
(we could have propagated A.0 to all uses of original).  The I/O struct
walks are the only thing that would be nice to fix.

And of course maybe limit walking in general.



* [Bug tree-optimization/46590] [4.5/4.6 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2010-11-24 14:19 ` rguenth at gcc dot gnu.org
@ 2011-01-03 20:25 ` rguenth at gcc dot gnu.org
  2011-03-25 19:55 ` [Bug tree-optimization/46590] [4.5/4.6/4.7 " jakub at gcc dot gnu.org
                   ` (30 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-01-03 20:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2



* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2011-01-03 20:25 ` rguenth at gcc dot gnu.org
@ 2011-03-25 19:55 ` jakub at gcc dot gnu.org
  2011-06-27 16:23 ` jakub at gcc dot gnu.org
                   ` (29 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-03-25 19:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.6.0                       |4.6.1

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-03-25 19:52:13 UTC ---
GCC 4.6.0 is being released, adjusting target milestone.



* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2011-03-25 19:55 ` [Bug tree-optimization/46590] [4.5/4.6/4.7 " jakub at gcc dot gnu.org
@ 2011-06-27 16:23 ` jakub at gcc dot gnu.org
  2011-10-26 17:37 ` jakub at gcc dot gnu.org
                   ` (28 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-06-27 16:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.6.1                       |4.6.2

--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-06-27 12:32:42 UTC ---
GCC 4.6.1 is being released.



* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2011-06-27 16:23 ` jakub at gcc dot gnu.org
@ 2011-10-26 17:37 ` jakub at gcc dot gnu.org
  2012-01-05  1:48 ` pinskia at gcc dot gnu.org
                   ` (27 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-10-26 17:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.6.2                       |4.6.3

--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-26 17:13:27 UTC ---
GCC 4.6.2 is being released.



* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2011-10-26 17:37 ` jakub at gcc dot gnu.org
@ 2012-01-05  1:48 ` pinskia at gcc dot gnu.org
  2012-01-19 15:20 ` matz at gcc dot gnu.org
                   ` (26 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: pinskia at gcc dot gnu.org @ 2012-01-05  1:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz at gcc dot gnu.org

--- Comment #17 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-01-05 01:47:57 UTC ---
On the trunk most of the time at -O0 is spent doing add_scope_conflicts.



* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2012-01-05  1:48 ` pinskia at gcc dot gnu.org
@ 2012-01-19 15:20 ` matz at gcc dot gnu.org
  2012-01-19 15:26 ` matz at gcc dot gnu.org
                   ` (25 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: matz at gcc dot gnu.org @ 2012-01-19 15:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #18 from Michael Matz <matz at gcc dot gnu.org> 2012-01-19 15:06:14 UTC ---
Author: matz
Date: Thu Jan 19 15:06:04 2012
New Revision: 183305

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183305
Log:
        PR tree-optimization/46590
    * cfgexpand.c (add_scope_conflicts_1): New old_conflicts argument,
    use it in remembering which conflicts we already created.
    (add_scope_conflicts): Adjust call to above, (de)allocate helper
    bitmap.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c



* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2012-01-19 15:20 ` matz at gcc dot gnu.org
@ 2012-01-19 15:26 ` matz at gcc dot gnu.org
  2012-01-26 16:23 ` matz at gcc dot gnu.org
                   ` (24 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: matz at gcc dot gnu.org @ 2012-01-19 15:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #19 from Michael Matz <matz at gcc dot gnu.org> 2012-01-19 15:10:43 UTC ---
The var-expansion slowness is fixed again.  The rest of course still applies.



* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2012-01-19 15:26 ` matz at gcc dot gnu.org
@ 2012-01-26 16:23 ` matz at gcc dot gnu.org
  2012-03-01 14:45 ` jakub at gcc dot gnu.org
                   ` (23 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: matz at gcc dot gnu.org @ 2012-01-26 16:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #20 from Michael Matz <matz at gcc dot gnu.org> 2012-01-26 15:50:43 UTC ---
Author: matz
Date: Thu Jan 26 15:50:33 2012
New Revision: 183566

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183566
Log:
    PR tree-optimization/46590
    * cfgexpand.c: Revert last change (r183305).
    * gimplify.c (gimplify_bind_expr): Add clobbers for all non-gimple
    regs.
    * tree-eh.c (cleanup_empty_eh): Try to optimize clobbers before
    checking for emptiness.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/cfgexpand.c
    trunk/gcc/gimplify.c
    trunk/gcc/tree-eh.c



* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2012-01-26 16:23 ` matz at gcc dot gnu.org
@ 2012-03-01 14:45 ` jakub at gcc dot gnu.org
  2012-08-21  8:10 ` [Bug tree-optimization/46590] [4.6/4.7/4.8 " rguenth at gcc dot gnu.org
                   ` (22 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-03-01 14:45 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.6.3                       |4.6.4

--- Comment #21 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-01 14:38:15 UTC ---
GCC 4.6.3 is being released.



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2012-03-01 14:45 ` jakub at gcc dot gnu.org
@ 2012-08-21  8:10 ` rguenth at gcc dot gnu.org
  2012-08-21  9:44 ` steven at gcc dot gnu.org
                   ` (21 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-21  8:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nbhargava at google dot com

--- Comment #22 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-21 08:09:55 UTC ---
*** Bug 54337 has been marked as a duplicate of this bug. ***



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2012-08-21  8:10 ` [Bug tree-optimization/46590] [4.6/4.7/4.8 " rguenth at gcc dot gnu.org
@ 2012-08-21  9:44 ` steven at gcc dot gnu.org
  2012-08-21 10:00 ` rguenther at suse dot de
                   ` (20 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-21  9:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #23 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-21 09:42:42 UTC ---
(In reply to comment #13)
> The I/O parts of the FRE cost are due to value-numbering stores in
> visit_reference_op_store.  They can be drastically cut by an equivalent
> of (not generating code)
> 
> Index: trans-io.c
> ===================================================================
> --- trans-io.c  (revision 167111)
> +++ trans-io.c  (working copy)
> @@ -1670,6 +1670,7 @@ build_dt (tree function, gfc_code * code
>    gfc_init_block (&post_iu_block);
> 
>    var = gfc_create_var (st_parameter[IOPARM_ptype_dt].type, "dt_parm");
> +  gfc_add_modify (&block, var, build_constructor (TREE_TYPE (var), NULL));
> 
>    set_error_locus (&block, var, &code->loc);
> 

You didn't post/commit this, but it looks like a reasonable change to me.



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2012-08-21  9:44 ` steven at gcc dot gnu.org
@ 2012-08-21 10:00 ` rguenther at suse dot de
  2012-08-21 14:11 ` rguenth at gcc dot gnu.org
                   ` (19 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenther at suse dot de @ 2012-08-21 10:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #24 from rguenther at suse dot de <rguenther at suse dot de> 2012-08-21 09:59:41 UTC ---
On Tue, 21 Aug 2012, steven at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
> 
> Steven Bosscher <steven at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |steven at gcc dot gnu.org
> 
> --- Comment #23 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-21 09:42:42 UTC ---
> (In reply to comment #13)
> > The I/O parts of the FRE cost are due to value-numbering stores in
> > visit_reference_op_store.  They can be drastically cut by an equivalent
> > of (not generating code)
> > 
> > Index: trans-io.c
> > ===================================================================
> > --- trans-io.c  (revision 167111)
> > +++ trans-io.c  (working copy)
> > @@ -1670,6 +1670,7 @@ build_dt (tree function, gfc_code * code
> >    gfc_init_block (&post_iu_block);
> > 
> >    var = gfc_create_var (st_parameter[IOPARM_ptype_dt].type, "dt_parm");
> > +  gfc_add_modify (&block, var, build_constructor (TREE_TYPE (var), NULL));
> > 
> >    set_error_locus (&block, var, &code->loc);
> > 
> 
> You didn't post/commit this, but it looks like a reasonable change to me.

I think this is obsolete now with Micha's change to have CLOBBERs.  Or,
if it is still useful, the above should be a CLOBBER now.

Richard.



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2012-08-21 10:00 ` rguenther at suse dot de
@ 2012-08-21 14:11 ` rguenth at gcc dot gnu.org
  2012-08-21 14:57 ` rguenth at gcc dot gnu.org
                   ` (18 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-21 14:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #25 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-21 14:10:38 UTC ---
I have a patch for the SCCVN issue, but trying to gather current trunk status
first.



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2012-08-21 14:11 ` rguenth at gcc dot gnu.org
@ 2012-08-21 14:57 ` rguenth at gcc dot gnu.org
  2012-08-22 11:43 ` rguenth at gcc dot gnu.org
                   ` (17 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-21 14:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #26 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-21 14:56:41 UTC ---
For a somewhat reduced testcase I now get at -O1:

 alias stmt walking      : 105.51 (45%) usr   0.33 (24%) sys
 tree SSA rewrite        :  22.01 ( 9%) usr   0.04 ( 3%) sys
 tree SSA incremental    :  25.25 (11%) usr   0.07 ( 5%) sys
 dominance frontiers     :  35.35 (15%) usr   0.02 ( 1%) sys
 dominance computation   :  14.60 ( 6%) usr   0.09 ( 7%) sys
 TOTAL                 : 234.28             1.38

As previously said, most of the time outside the alias stmt walk is spent
on loop header copying.  With -O1 -fno-tree-ch we have

 alias stmt walking      : 101.52 (68%) usr   0.37 (34%) sys
 tree SSA rewrite        :   4.14 ( 3%) usr   0.01 ( 1%) sys
 tree SSA incremental    :   8.00 ( 5%) usr   0.02 ( 2%) sys
 dominance frontiers     :   6.14 ( 4%) usr   0.01 ( 1%) sys
 dominance computation   :   4.74 ( 3%) usr   0.06 ( 6%) sys
 TOTAL                 : 150.14             1.09

Limiting the stmt walk makes it possible to scale its cost down arbitrarily
with a param (we can limit either the number of alias oracle queries or the
number of SCCVN table lookups).  With 100 alias oracle queries per load/store
we end up with

 alias stmt walking      :   1.60 ( 3%) usr   0.05 ( 5%) sys

with 100 SCCVN table lookups the figure is

 alias stmt walking      :   1.60 ( 3%) usr   0.06 ( 6%) sys

one assumes the lookups are expensive, the other one assumes the walk itself
is.
Increasing the latter to 1000 SCCVN table lookups produces

 alias stmt walking      :   9.24 (16%) usr   0.18 (19%) sys

which is around the expected 10-fold increase (but still reasonable given
the artificial nature of the testcase).  We could also, instead of
limiting each walk to a constant cost, have a per-function budget that
we can use up first before limiting further walks individually (helps
to not regress reasonably sized cases).
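
The two limiting schemes described above can be sketched roughly as follows.
This is a toy Python model, not GCC code: the names `walk_vuses`,
`per_access_limit` and `FunctionBudget` are made up for illustration, and a
"walk" is just a scan over a list where every step costs one alias query.
Queries are paid from the per-function budget first; only once that is used
up does the individual limit apply.

```python
# Toy model of a budgeted statement walk.  All names are hypothetical.

class FunctionBudget:
    """Pool of alias queries shared by all walks in one function."""
    def __init__(self, total):
        self.remaining = total

def walk_vuses(chain, target, per_access_limit, budget=None):
    """Scan chain for target; return (index or None, queries made).

    Queries are paid from the shared budget while it lasts.  After that,
    each walk may make at most per_access_limit further queries.
    """
    queries = 0   # all alias queries made by this walk
    charged = 0   # queries counted against the per-access limit
    for i, store in enumerate(chain):
        if budget is not None and budget.remaining > 0:
            budget.remaining -= 1          # paid by the function budget
        elif charged >= per_access_limit:
            return None, queries           # give up on this walk
        else:
            charged += 1
        queries += 1
        if store == target:
            return i, queries
    return None, queries

chain = list(range(1000))

# Constant per-access limit only: the walk is cut off after 100 queries.
hit_a, cost_a = walk_vuses(chain, 500, per_access_limit=100)

# With a per-function budget the first long walk still succeeds ...
budget = FunctionBudget(total=450)
hit_b, cost_b = walk_vuses(chain, 500, per_access_limit=100, budget=budget)

# ... while a later walk finds the budget exhausted and is limited again.
hit_c, cost_c = walk_vuses(chain, 900, per_access_limit=100, budget=budget)
```

In this model a function with few long walks is not affected by the limit at
all, which is the point of the budget: only functions that keep walking far
fall back to the constant per-walk cost.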



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (27 preceding siblings ...)
  2012-08-21 14:57 ` rguenth at gcc dot gnu.org
@ 2012-08-22 11:43 ` rguenth at gcc dot gnu.org
  2012-08-22 13:21 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-22 11:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #27 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-22 11:39:56 UTC ---
The issue with loop header copying is that we use incremental SSA updating
with PHI insertion for each individual loop header copied.  That computes
dominance frontiers which is always for the whole function.

I thought that loop header copying wouldn't need to insert new PHI nodes
and thus can do with TODO_update_ssa_no_phi if we are in loop-closed SSA form,
but apparently that is not the case (I didn't investigate further here,
but possibly that's just because virtual operands are not in loop-closed
SSA form).
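
To see why this hurts on a function with very many loops, here is a small
back-of-the-envelope model in Python.  It is only a sketch under an assumed
cost model (one dominance frontier computation costs one unit per basic
block); the function names are hypothetical.  It compares one update per
copied header with a single update after all headers have been copied.

```python
# Assumed cost model: a whole-function dominance frontier computation
# costs one unit per basic block.  All names are hypothetical.

def frontier_cost(num_blocks):
    """Cost of one whole-function dominance frontier computation."""
    return num_blocks

def cost_update_per_header(num_loops, blocks_per_loop):
    """One incremental SSA update (with frontiers) per copied loop header."""
    function_size = num_loops * blocks_per_loop
    return sum(frontier_cost(function_size) for _ in range(num_loops))

def cost_update_once(num_loops, blocks_per_loop):
    """A single SSA update after all headers have been copied."""
    return frontier_cost(num_loops * blocks_per_loop)

per_header = cost_update_per_header(1000, 4)
once = cost_update_once(1000, 4)
```

Under these assumptions the per-header scheme is quadratic in the number of
loops, while a single update stays linear.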



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (28 preceding siblings ...)
  2012-08-22 11:43 ` rguenth at gcc dot gnu.org
@ 2012-08-22 13:21 ` rguenth at gcc dot gnu.org
  2012-08-22 17:34 ` stevenb.gcc at gmail dot com
                   ` (15 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-22 13:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #28 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-22 13:17:42 UTC ---
Author: rguenth
Date: Wed Aug 22 13:17:26 2012
New Revision: 190594

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190594
Log:
2012-08-22  Richard Guenther  <rguenther@suse.de>

    PR tree-optimization/46590
    * tree-ssa-alias.h (get_continuation_for_phi): Add alias query
    counter output argument.
    (walk_non_aliased_vuses): Add alias query counter argument
    to the walker callback.
    * tree-ssa-alias.c (maybe_skip_until): Add alias query counter
    output argument and count alias queries.
    (get_continuation_for_phi_1): Likewise.
    (get_continuation_for_phi): Likewise.
    (walk_non_aliased_vuses): Add alias query counter argument
    to the walker callback and allow it to abort the walk by
    returning -1.
    * tree-ssa-pre.c (translate_vuse_through_block): Adjust.
    * tree-ssa-sccvn.c (vn_reference_lookup_2): Add alias query
    counter parameter, abort walk if that is bigger than
    --param sccvn-max-alias-queries-per-access.
    * params.def (sccvn-max-alias-queries-per-access): New param.
    * doc/invoke.texi (sccvn-max-alias-queries-per-access): Document.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/doc/invoke.texi
    trunk/gcc/params.def
    trunk/gcc/tree-ssa-alias.c
    trunk/gcc/tree-ssa-alias.h
    trunk/gcc/tree-ssa-pre.c
    trunk/gcc/tree-ssa-sccvn.c



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (29 preceding siblings ...)
  2012-08-22 13:21 ` rguenth at gcc dot gnu.org
@ 2012-08-22 17:34 ` stevenb.gcc at gmail dot com
  2012-08-23  7:14 ` rguenther at suse dot de
                   ` (14 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: stevenb.gcc at gmail dot com @ 2012-08-22 17:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #29 from stevenb.gcc at gmail dot com <stevenb.gcc at gmail dot com> 2012-08-22 17:33:00 UTC ---
> I thought that loop header copying wouldn't need to insert new PHI nodes
> and thus can do with TODO_update_ssa_no_phi if we are in loop-closed SSA form,
> but apparently that is not the case (I didn't investigate further here,
> but possibly that's just because virtual operands are not in loop-closed
> SSA form).

I'll experiment with VOPs in LC-SSA.



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (30 preceding siblings ...)
  2012-08-22 17:34 ` stevenb.gcc at gmail dot com
@ 2012-08-23  7:14 ` rguenther at suse dot de
  2012-08-23 11:43 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenther at suse dot de @ 2012-08-23  7:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #30 from rguenther at suse dot de <rguenther at suse dot de> 2012-08-23 07:13:13 UTC ---
On Wed, 22 Aug 2012, stevenb.gcc at gmail dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
> 
> --- Comment #29 from stevenb.gcc at gmail dot com <stevenb.gcc at gmail dot com> 2012-08-22 17:33:00 UTC ---
> > I thought that loop header copying wouldn't need to insert new PHI nodes
> > and thus can do with TODO_update_ssa_no_phi if we are in loop-closed SSA form,
> > but apparently that is not the case (I didn't investigate further here,
> > but possibly that's just because virtual operands are not in loop-closed
> > SSA form).
> 
> I'll experiment with VOPs in LC-SSA.

You might have seen the patch I posted - it passed testing and I will
install it today.



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (31 preceding siblings ...)
  2012-08-23  7:14 ` rguenther at suse dot de
@ 2012-08-23 11:43 ` rguenth at gcc dot gnu.org
  2012-08-23 13:46 ` steven at gcc dot gnu.org
                   ` (12 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-23 11:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #31 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-23 11:41:50 UTC ---
Btw, I have experimented with

Index: tree-into-ssa.c
===================================================================
--- tree-into-ssa.c     (revision 190594)
+++ tree-into-ssa.c     (working copy)
@@ -3224,11 +3224,35 @@ update_ssa (unsigned update_flags)
       bitmap_head *dfs;

       /* If the caller requested PHI nodes to be added, compute
-        dominance frontiers.  */
+        dominance frontiers for all interesting blocks
+        up to the dominating start_bb.  */
       dfs = XNEWVEC (bitmap_head, last_basic_block);
       FOR_EACH_BB (bb)
        bitmap_initialize (&dfs[bb->index], &bitmap_default_obstack);
-      compute_dominance_frontiers (dfs);
+      EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, i, bi)
+       {
+         basic_block b = BASIC_BLOCK (i);
+         edge_iterator ei;
+         edge p;
+         if (EDGE_COUNT (b->preds) >= 2)
+           FOR_EACH_EDGE (p, ei, b->preds)
+             {
+               basic_block runner = p->src;
+               basic_block domsb;
+               if (runner == start_bb)
+                 continue;
+
+               domsb = get_immediate_dominator (CDI_DOMINATORS, b);
+               while (runner != domsb)
+                 {
+                   if (!bitmap_set_bit (&dfs[runner->index],
+                                        b->index))
+                     break;
+                   runner = get_immediate_dominator (CDI_DOMINATORS,
+                                                     runner);
+                 }
+             }
+       }

       if (sbitmap_first_set_bit (old_ssa_names) >= 0)
        {

which helps reduce the time spent in computing dominance frontiers.  But
as we no longer have bitmaps but bitmap_heads in dfs it's hard to verify
we only ever access dfs for entries we computed ... but we should,
looking at how compute_idf works(?).
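
For readers who do not want to parse the diff, the loop it adds can be
modelled in a few lines of Python.  This is a sketch, not the GCC
implementation: the CFG is given as plain dictionaries (`preds`, `idom`),
and `restricted_frontiers` is a made-up name.  Only join blocks from
`blocks_to_update` contribute frontier entries, and the walk up the
dominator tree stops as soon as an entry is already present.

```python
def restricted_frontiers(preds, idom, blocks_to_update, start_bb):
    """Frontier entries contributed by the join blocks in blocks_to_update.

    preds maps a block to its predecessor blocks, idom maps a block to
    its immediate dominator.  Returns {block: set of frontier blocks}.
    """
    dfs = {}
    for b in sorted(blocks_to_update):
        if len(preds[b]) < 2:
            continue                    # only join points contribute
        for runner in preds[b]:
            if runner == start_bb:
                continue
            # Walk up the dominator tree until we reach idom(b).
            while runner != idom[b]:
                frontier = dfs.setdefault(runner, set())
                if b in frontier:
                    break               # already recorded from here upwards
                frontier.add(b)
                runner = idom[runner]
    return dfs

# Example CFG: 0 -> 1, 1 -> {2, 3}, 2 -> 4, 3 -> 4, 4 -> {1, 5}
preds = {0: [], 1: [0, 4], 2: [1], 3: [1], 4: [2, 3], 5: [4]}
idom = {1: 0, 2: 1, 3: 1, 4: 1, 5: 4}

only_join_4 = restricted_frontiers(preds, idom, {4}, start_bb=0)
both_joins = restricted_frontiers(preds, idom, {1, 4}, start_bb=0)
```

With only block 4 requested, entries exist just for blocks 2 and 3; blocks
that were never touched simply have no entry, which is the property the
question above is about.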



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (32 preceding siblings ...)
  2012-08-23 11:43 ` rguenth at gcc dot gnu.org
@ 2012-08-23 13:46 ` steven at gcc dot gnu.org
  2012-09-03 10:46 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-23 13:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #32 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-23 13:44:53 UTC ---
(In reply to comment #31)
> which helps reduce the time spent in computing dominance frontiers.  But
> as we no longer have bitmaps but bitmap_heads in dfs it's hard to verify
> we only ever access dfs for entries we computed ... but we should,
> looking at how compute_idf works(?).

I don't understand this comment. You can still always do:
bitmap_empty_p (&dfs[bb->index]) to see if something was
computed for bb.



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (33 preceding siblings ...)
  2012-08-23 13:46 ` steven at gcc dot gnu.org
@ 2012-09-03 10:46 ` rguenth at gcc dot gnu.org
  2012-09-03 15:41 ` matz at gcc dot gnu.org
                   ` (10 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-03 10:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
         AssignedTo|rguenth at gcc dot gnu.org  |unassigned at gcc dot
                   |                            |gnu.org

--- Comment #33 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-03 10:45:06 UTC ---
Not mine anymore.



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (34 preceding siblings ...)
  2012-09-03 10:46 ` rguenth at gcc dot gnu.org
@ 2012-09-03 15:41 ` matz at gcc dot gnu.org
  2012-09-04 13:18 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: matz at gcc dot gnu.org @ 2012-09-03 15:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #34 from Michael Matz <matz at gcc dot gnu.org> 2012-09-03 15:39:20 UTC ---
Author: matz
Date: Mon Sep  3 15:39:15 2012
New Revision: 190897

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190897
Log:
    PR tree-optimization/46590
    * tree-cfg.c (gimple_duplicate_sese_region): Don't update
    SSA web here ...
    * tree-ssa-loop-ch.c (copy_loop_headers): ... but here.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-cfg.c
    trunk/gcc/tree-ssa-loop-ch.c



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (35 preceding siblings ...)
  2012-09-03 15:41 ` matz at gcc dot gnu.org
@ 2012-09-04 13:18 ` rguenth at gcc dot gnu.org
  2012-09-05 11:01 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-04 13:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #35 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-04 13:17:53 UTC ---
With -O1 FRE uses loads of memory and compile-time.  Not the value-numbering
itself but compute_avail ()!  Bah.  Of course AVAIL sets get bigger and
bigger going down the dominator tree.  For FRE a single domwalk after
SCCVN computing "avail" and doing elimination would be enough.  Especially
as all the overhead in AVAIL is just SSA DEFs and their value.
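
A rough Python sketch of what such a single dominator walk could look like
is given below.  It is an illustration only, with hypothetical names and a
toy representation: `defs` lists the SSA names defined per block and
`value_of` gives the value number assigned to each name by an earlier value
numbering step.  One table holds the leaders that are available on the
current path in the dominator tree, and entries are removed again when the
walk leaves a block.

```python
def eliminate(dom_children, defs, value_of, root):
    """Single dominator-tree walk; returns {redundant_name: leader_name}."""
    avail = {}          # value number -> leader, valid on the current path
    replacements = {}

    def walk(bb):
        added = []
        for name in defs.get(bb, []):
            v = value_of[name]
            if v in avail:
                replacements[name] = avail[v]   # dominated by a leader
            else:
                avail[v] = name                 # name becomes the leader
                added.append(v)
        for child in dom_children.get(bb, []):
            walk(child)
        for v in added:     # leaving bb: its defs are no longer available
            del avail[v]

    walk(root)
    return replacements

# Dominator tree: 0 -> {1, 2}, 1 -> {3}
dom_children = {0: [1, 2], 1: [3]}
defs = {0: ["a"], 1: ["b", "c"], 2: ["d"], 3: ["e"]}
value_of = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2}

replacements = eliminate(dom_children, defs, value_of, root=0)
```

Note that `d` is not replaced by `c`: block 1 does not dominate block 2, so
`c` has already been dropped from the table when block 2 is visited.  The
recursion is fine for a sketch; a deep dominator tree would want an explicit
stack.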



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (36 preceding siblings ...)
  2012-09-04 13:18 ` rguenth at gcc dot gnu.org
@ 2012-09-05 11:01 ` rguenth at gcc dot gnu.org
  2012-09-05 13:30 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-05 11:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #36 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-05 10:59:52 UTC ---
If I fix that (PR54489) by iterating over immediate dominators when querying
AVAIL_OUT instead of accumulating, then other loop opts quickly take over in
compile-time, but memory usage stays reasonable at -O1.  LIM is now the pass
that pushes
memory usage to 1.8GB - all other optimization passes are happy with just
~800MB.  The issue with LIM is that it analyzes the whole function instead
of working on outermost loops at a time (PR54488).  Then of course IRA
comes along and wrecks memory usage again ... (create_loop_tree_nodes).
One can tame down IRA a bit using -fno-ira-loop-pressure -fira-region=one.
We then arrive at roughly a constant 900MB memory usage for the full(!)
testcase at -O1 and

Execution times (seconds)
 phase opt and generate  : 495.90 (99%) usr   1.98 (98%) sys 499.91 (99%) wall  870508 kB (92%) ggc
 df reaching defs        :  19.16 ( 4%) usr   0.06 ( 3%) sys  19.18 ( 4%) wall       0 kB ( 0%) ggc
 alias stmt walking      :  28.75 ( 6%) usr   0.21 (10%) sys  29.12 ( 6%) wall    2336 kB ( 0%) ggc
 tree SSA rewrite        :  63.42 (13%) usr   0.02 ( 1%) sys  63.77 (13%) wall   18830 kB ( 2%) ggc
 tree SSA incremental    :  74.64 (15%) usr   0.03 ( 1%) sys  74.44 (15%) wall   25886 kB ( 3%) ggc
 dominance frontiers     : 101.71 (20%) usr   0.09 ( 4%) sys 102.17 (20%) wall       0 kB ( 0%) ggc
 dominance computation   :  52.56 (11%) usr   0.09 ( 4%) sys  53.35 (11%) wall       0 kB ( 0%) ggc
 loop invariant motion   : 101.20 (20%) usr   0.10 ( 5%) sys 101.75 (20%) wall    2700 kB ( 0%) ggc
 TOTAL                 : 498.79             2.03           502.87             947764 kB

(all entries > 10s)

The incremental SSA stuff is complete loop unrolling / IV canonicalization
which does SSA update once per loop (similar to what loop header copying
formerly did).  Fixing that leads to

Execution times (seconds)
 phase opt and generate  : 214.62 (99%) usr   1.53 (96%) sys 217.41 (99%) wall  870508 kB (92%) ggc
 df reaching defs        :  23.07 (11%) usr   0.01 ( 1%) sys  23.10 (10%) wall       0 kB ( 0%) ggc
 alias stmt walking      :  28.51 (13%) usr   0.23 (14%) sys  28.93 (13%) wall    2336 kB ( 0%) ggc
 loop invariant motion   : 105.43 (48%) usr   0.01 ( 1%) sys 106.22 (48%) wall    2700 kB ( 0%) ggc
 TOTAL                 : 217.56             1.59           220.44             947764 kB

so RTL invariant motion is now the main offender ;)
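
The memory effect of "iterating over immediate dominators when querying
AVAIL_OUT instead of accumulating" can be illustrated with a toy Python
model.  Everything here is hypothetical (dictionary-based sets, made-up
function names); it only shows the trade-off: accumulated sets answer a
query with one lookup but store every inherited entry again in each block,
while walking the immediate-dominator chain stores each entry once and pays
at query time.

```python
def accumulate_avail_out(idom, local, order):
    """AVAIL_OUT(bb) = AVAIL_OUT(idom(bb)) plus the values defined in bb.

    order must list every block after its immediate dominator.
    """
    avail_out = {}
    for bb in order:
        inherited = avail_out[idom[bb]] if idom[bb] is not None else {}
        avail_out[bb] = {**inherited, **local[bb]}
    return avail_out

def lookup_via_idoms(idom, local, bb, value):
    """Find the leader for value by walking up the dominator tree."""
    while bb is not None:
        if value in local[bb]:
            return local[bb][value]
        bb = idom[bb]
    return None

# A dominator chain of 100 blocks, each defining one new value.
n = 100
idom = {i: (i - 1 if i > 0 else None) for i in range(n)}
local = {i: {i: f"x_{i}"} for i in range(n)}

accumulated = accumulate_avail_out(idom, local, range(n))
entries_accumulated = sum(len(s) for s in accumulated.values())
entries_local = sum(len(s) for s in local.values())
leader = lookup_via_idoms(idom, local, n - 1, 0)
```

On this chain the accumulated sets hold n(n+1)/2 entries against n entries
for the per-block sets, and both schemes return the same leader.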



* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (37 preceding siblings ...)
  2012-09-05 11:01 ` rguenth at gcc dot gnu.org
@ 2012-09-05 13:30 ` rguenth at gcc dot gnu.org
  2012-12-03 15:52 ` [Bug tree-optimization/46590] " rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-05 13:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #37 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-05 13:29:20 UTC ---
Author: rguenth
Date: Wed Sep  5 13:29:13 2012
New Revision: 190978

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190978
Log:
2012-09-05  Richard Guenther  <rguenther@suse.de>

    PR tree-optimization/46590
    * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not
    update SSA form here.
    (canonicalize_induction_variables): Assert we do not need to
    update SSA form.
    (tree_unroll_loops_completely): Update SSA form here.
    * tree-ssa-loop-manip.c (gimple_duplicate_loop_to_header_edge):
    Do not verify loop-closed SSA form if SSA form is not up-to-date.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-loop-ivcanon.c
    trunk/gcc/tree-ssa-loop-manip.c



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (38 preceding siblings ...)
  2012-09-05 13:30 ` rguenth at gcc dot gnu.org
@ 2012-12-03 15:52 ` rguenth at gcc dot gnu.org
  2014-01-16 13:49 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-12-03 15:52 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |47344
   Target Milestone|4.6.4                       |---
            Summary|[4.6/4.7/4.8 Regression]    |long compile time with -O2
                   |long compile time with -O2  |and many loops
                   |and many loops              |

--- Comment #38 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-03 15:51:47 UTC ---
Generic compile-time/memory-usage umbrella regression tracking bug.



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (39 preceding siblings ...)
  2012-12-03 15:52 ` [Bug tree-optimization/46590] " rguenth at gcc dot gnu.org
@ 2014-01-16 13:49 ` rguenth at gcc dot gnu.org
  2014-01-16 15:51 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-16 13:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #39 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Thu Jan 16 13:48:51 2014
New Revision: 206663

URL: http://gcc.gnu.org/viewcvs?rev=206663&root=gcc&view=rev
Log:
2014-01-16  Richard Biener  <rguenther@suse.de>

    PR rtl-optimization/46590
    * lcm.c (compute_antinout_edge): Use postorder iteration.
    (compute_laterin): Use inverted postorder iteration.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lcm.c
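
Why the iteration order matters for a backward problem such as
anticipatability can be shown with a small worklist solver in Python.  This
is a simplified sketch, not the code in lcm.c: it works on whole blocks
rather than edges, and all names are chosen for illustration.  The equations
used are ANTOUT(bb) = intersection of ANTIN over the successors and
ANTIN(bb) = ANTLOC(bb) | (TRANSP(bb) & ANTOUT(bb)).

```python
from collections import deque

def solve_antin(succs, preds, antloc, transp, order, universe):
    """Worklist solver; returns (antin, number of block visits)."""
    antin = {bb: set(universe) for bb in succs}    # optimistic start
    worklist = deque(order)
    queued = set(order)
    visits = 0
    while worklist:
        bb = worklist.popleft()
        queued.discard(bb)
        visits += 1
        if succs[bb]:
            antout = set.intersection(*(antin[s] for s in succs[bb]))
        else:
            antout = set()          # nothing is anticipated at the exit
        new_in = antloc[bb] | (transp[bb] & antout)
        if new_in != antin[bb]:
            antin[bb] = new_in
            for p in preds[bb]:     # predecessors must be recomputed
                if p not in queued:
                    worklist.append(p)
                    queued.add(p)
    return antin, visits

def postorder(succs, entry):
    """Blocks in DFS postorder: successors come before predecessors."""
    seen, out = set(), []
    def dfs(bb):
        seen.add(bb)
        for s in succs[bb]:
            if s not in seen:
                dfs(s)
        out.append(bb)
    dfs(entry)
    return out

# Straight-line chain of 50 blocks; expression "e" is never computed.
n = 50
succs = {i: ([i + 1] if i + 1 < n else []) for i in range(n)}
preds = {i: ([i - 1] if i > 0 else []) for i in range(n)}
antloc = {i: set() for i in range(n)}
transp = {i: {"e"} for i in range(n)}

_, visits_forward = solve_antin(
    succs, preds, antloc, transp, list(range(n)), {"e"})
antin, visits_postorder = solve_antin(
    succs, preds, antloc, transp, postorder(succs, 0), {"e"})
```

In postorder every block is visited exactly once here, while the forward
order first visits all blocks without learning anything and then has to
revisit them.  A chain only shows a modest difference; the gap can be
larger on CFGs with many joins and loops.  For a forward problem the same
argument applies with the order inverted.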



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (40 preceding siblings ...)
  2014-01-16 13:49 ` rguenth at gcc dot gnu.org
@ 2014-01-16 15:51 ` rguenth at gcc dot gnu.org
  2014-01-17 11:37 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-16 15:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #40 from Richard Biener <rguenth at gcc dot gnu.org> ---
On the full testcase tree LIM uses too much memory (I didn't merge some of
the patches that only benefit this kind of testcase ... bah).  Without LIM
we use around 1GB of memory at -O2 and

 df reaching defs        :  19.42 (14%) usr  35.49 (82%) sys  55.13 (30%) wall       0 kB ( 0%) ggc
 alias stmt walking      :  34.25 (25%) usr   0.47 ( 1%) sys  34.76 (19%) wall    1451 kB ( 0%) ggc
 tree CFG cleanup        :   7.75 ( 6%) usr   0.04 ( 0%) sys   7.73 ( 4%) wall    5002 kB ( 1%) ggc
 tree PTA                :  22.32 (16%) usr   0.10 ( 0%) sys  22.43 (12%) wall    7882 kB ( 1%) ggc
 TOTAL                 : 137.59            43.38           180.98             866898 kB

-O3 is similar.

I have a patch for the memory usage issue.



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (41 preceding siblings ...)
  2014-01-16 15:51 ` rguenth at gcc dot gnu.org
@ 2014-01-17 11:37 ` rguenth at gcc dot gnu.org
  2014-01-17 12:09 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-17 11:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #41 from Richard Biener <rguenth at gcc dot gnu.org> ---
Once the patch is committed, the original testcase will use about 1GB
of memory (regardless of optimization level) on x86_64, and take 122s at -O1
and 161s at -Ofast, with the following main contributors:

> ./f951 -quiet -Ofast t2.f90 -ftime-report 

Execution times (seconds)
 phase opt and generate  : 159.14 (98%) usr 116.92 (100%) sys 276.41 (99%) wall  821437 kB (92%) ggc
 df reaching defs        :  36.37 (22%) usr  91.52 (78%) sys 128.33 (46%) wall       0 kB ( 0%) ggc
 df live regs            :   2.60 ( 2%) usr   0.19 ( 0%) sys   2.84 ( 1%) wall       0 kB ( 0%) ggc
 alias stmt walking      :  36.03 (22%) usr   0.46 ( 0%) sys  36.53 (13%) wall    1451 kB ( 0%) ggc
 parser (global)         :   2.65 ( 2%) usr   0.08 ( 0%) sys   2.74 ( 1%) wall   70763 kB ( 8%) ggc
 tree CFG cleanup        :   8.23 ( 5%) usr   0.03 ( 0%) sys   8.19 ( 3%) wall    4982 kB ( 1%) ggc
 tree PTA                :  22.95 (14%) usr   0.17 ( 0%) sys  23.12 ( 8%) wall    7882 kB ( 1%) ggc
 complete unrolling      :   4.88 ( 3%) usr   0.14 ( 0%) sys   5.08 ( 2%) wall   80683 kB ( 9%) ggc
 loop init               :   5.93 ( 4%) usr   0.01 ( 0%) sys   5.92 ( 2%) wall   25175 kB ( 3%) ggc
 integrated RA           :   2.97 ( 2%) usr   0.03 ( 0%) sys   3.02 ( 1%) wall   64778 kB ( 7%) ggc
 TOTAL                 : 161.83           117.01           279.22             892406 kB

(everything > 1% usr listed)



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (42 preceding siblings ...)
  2014-01-17 11:37 ` rguenth at gcc dot gnu.org
@ 2014-01-17 12:09 ` rguenth at gcc dot gnu.org
  2014-01-17 14:49 ` rguenth at gcc dot gnu.org
  2014-10-21  9:23 ` rguenth at gcc dot gnu.org
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-17 12:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #42 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another interesting bit - -O0 takes 17s (and also ~1GB memory) but -Og is
not much faster than -O1 (116s).

> ./f951 -quiet -Og t2.f90 -ftime-report 

Execution times (seconds)
 df reaching defs        :  62.51 (54%) usr 104.47 (85%) sys 167.73 (70%) wall       0 kB ( 0%) ggc

bah.  I didn't constrain RTL optimizers with -Og (seems to be RTL loop
invariant motion here - after all at -Og no gimple loop opts run and
it _can_ move quite some invariants).

Disabling loop2 for -Og improves this to 50s

Execution times (seconds)
 alias stmt walking      :  10.11 (20%) usr   0.10 (14%) sys  10.29 (20%) wall     338 kB ( 0%) ggc
 parser (global)         :   2.34 ( 5%) usr   0.06 ( 8%) sys   2.39 ( 5%) wall   70763 kB (11%) ggc
 tree PTA                :   6.12 (12%) usr   0.04 ( 6%) sys   6.16 (12%) wall    2735 kB ( 0%) ggc
 loop init               :   5.78 (11%) usr   0.01 ( 1%) sys   5.78 (11%) wall   24078 kB ( 4%) ggc
 integrated RA           :   2.68 ( 5%) usr   0.04 ( 6%) sys   2.73 ( 5%) wall   64596 kB (10%) ggc
 TOTAL                 :  50.76             0.71            51.71             646243 kB

I'll push a patch to do that (the above shows PTA which also can be
disabled, and alias stmt walking could be limited some more).



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (43 preceding siblings ...)
  2014-01-17 12:09 ` rguenth at gcc dot gnu.org
@ 2014-01-17 14:49 ` rguenth at gcc dot gnu.org
  2014-10-21  9:23 ` rguenth at gcc dot gnu.org
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-17 14:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #44 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Fri Jan 17 14:49:18 2014
New Revision: 206714

URL: http://gcc.gnu.org/viewcvs?rev=206714&root=gcc&view=rev
Log:
2014-01-17  Richard Biener  <rguenther@suse.de>

    PR tree-optimization/46590
    * opts.c (default_options_table): Add entries for
    OPT_fbranch_count_reg, OPT_fmove_loop_invariants and OPT_ftree_pta,
    all enabled at -O1 but not for -Og.
    * common.opt (fbranch-count-reg): Remove Init(1).
    (fmove-loop-invariants): Likewise.
    (ftree-pta): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/common.opt
    trunk/gcc/opts.c
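
The effect of the ChangeLog entry above ("enabled at -O1 but not for -Og")
can be pictured as a table lookup.  The following Python sketch is a
simplified model and not the layout of opts.c: the level constants, the
`enabled_options` helper and the `fexample-opt` entry are all made up for
illustration.  Only the three option names are taken from the ChangeLog.

```python
# Hypothetical level tags, for illustration only.
O1_PLUS = "O1 and above"
O1_PLUS_NOT_DEBUG = "O1 and above, but not -Og"

DEFAULT_OPTIONS = [
    (O1_PLUS_NOT_DEBUG, "fbranch-count-reg"),
    (O1_PLUS_NOT_DEBUG, "fmove-loop-invariants"),
    (O1_PLUS_NOT_DEBUG, "ftree-pta"),
    (O1_PLUS, "fexample-opt"),      # made-up entry for contrast
]

def enabled_options(level, og=False):
    """Options switched on by default for -O<level>; og=True models -Og."""
    on = set()
    for levels, name in DEFAULT_OPTIONS:
        if level < 1:
            continue
        if levels == O1_PLUS or (levels == O1_PLUS_NOT_DEBUG and not og):
            on.add(name)
    return on

at_o1 = enabled_options(1)
at_og = enabled_options(1, og=True)
at_o0 = enabled_options(0)
```

In this model -Og behaves like -O1 minus the three options named in the
ChangeLog, which is how RTL loop invariant motion and PTA drop out of the
-Og timings.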



* [Bug tree-optimization/46590] long compile time with -O2 and many loops
  2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
                   ` (44 preceding siblings ...)
  2014-01-17 14:49 ` rguenth at gcc dot gnu.org
@ 2014-10-21  9:23 ` rguenth at gcc dot gnu.org
  45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-10-21  9:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org

--- Comment #45 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, DOM and incremental SSA are now the main offenders (a regression from
4.8).  This is maybe the same as PR61515.  Compile-time is up to 9 minutes
(from two).

Jeff - any progress on PR61515?  This is really a major regression for -O1
compile-time.  Can we at least disable the offending code at
!flag_expensive_optimizations on trunk and the 4.9 branch?  And fix it properly
for GCC 5?



Thread overview: 47+ messages
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
2010-11-21 13:25 ` [Bug tree-optimization/46590] " tkoenig at gcc dot gnu.org
2010-11-21 17:19 ` dominiq at lps dot ens.fr
2010-11-21 18:55 ` tkoenig at gcc dot gnu.org
2010-11-21 19:13 ` rguenth at gcc dot gnu.org
2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
2010-11-21 21:13 ` tkoenig at gcc dot gnu.org
2010-11-21 21:46 ` [Bug tree-optimization/46590] [4.6 Regression] " tkoenig at gcc dot gnu.org
2010-11-21 23:14 ` tkoenig at gcc dot gnu.org
2010-11-22 13:30 ` [Bug tree-optimization/46590] [4.5/4.6 " rguenth at gcc dot gnu.org
2010-11-22 13:30 ` [Bug tree-optimization/46590] [4.6 " rguenth at gcc dot gnu.org
2010-11-22 16:19 ` [Bug tree-optimization/46590] [4.5/4.6 " rguenth at gcc dot gnu.org
2010-11-24 12:40 ` rguenth at gcc dot gnu.org
2010-11-24 14:19 ` rguenth at gcc dot gnu.org
2011-01-03 20:25 ` rguenth at gcc dot gnu.org
2011-03-25 19:55 ` [Bug tree-optimization/46590] [4.5/4.6/4.7 " jakub at gcc dot gnu.org
2011-06-27 16:23 ` jakub at gcc dot gnu.org
2011-10-26 17:37 ` jakub at gcc dot gnu.org
2012-01-05  1:48 ` pinskia at gcc dot gnu.org
2012-01-19 15:20 ` matz at gcc dot gnu.org
2012-01-19 15:26 ` matz at gcc dot gnu.org
2012-01-26 16:23 ` matz at gcc dot gnu.org
2012-03-01 14:45 ` jakub at gcc dot gnu.org
2012-08-21  8:10 ` [Bug tree-optimization/46590] [4.6/4.7/4.8 " rguenth at gcc dot gnu.org
2012-08-21  9:44 ` steven at gcc dot gnu.org
2012-08-21 10:00 ` rguenther at suse dot de
2012-08-21 14:11 ` rguenth at gcc dot gnu.org
2012-08-21 14:57 ` rguenth at gcc dot gnu.org
2012-08-22 11:43 ` rguenth at gcc dot gnu.org
2012-08-22 13:21 ` rguenth at gcc dot gnu.org
2012-08-22 17:34 ` stevenb.gcc at gmail dot com
2012-08-23  7:14 ` rguenther at suse dot de
2012-08-23 11:43 ` rguenth at gcc dot gnu.org
2012-08-23 13:46 ` steven at gcc dot gnu.org
2012-09-03 10:46 ` rguenth at gcc dot gnu.org
2012-09-03 15:41 ` matz at gcc dot gnu.org
2012-09-04 13:18 ` rguenth at gcc dot gnu.org
2012-09-05 11:01 ` rguenth at gcc dot gnu.org
2012-09-05 13:30 ` rguenth at gcc dot gnu.org
2012-12-03 15:52 ` [Bug tree-optimization/46590] " rguenth at gcc dot gnu.org
2014-01-16 13:49 ` rguenth at gcc dot gnu.org
2014-01-16 15:51 ` rguenth at gcc dot gnu.org
2014-01-17 11:37 ` rguenth at gcc dot gnu.org
2014-01-17 12:09 ` rguenth at gcc dot gnu.org
2014-01-17 14:49 ` rguenth at gcc dot gnu.org
2014-10-21  9:23 ` rguenth at gcc dot gnu.org