public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/46590] New: long compile time with -O2 and many loops
@ 2010-11-21 13:23 tkoenig at gcc dot gnu.org
2010-11-21 13:25 ` [Bug tree-optimization/46590] " tkoenig at gcc dot gnu.org
` (45 more replies)
0 siblings, 46 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 13:23 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Summary: long compile time with -O2 and many loops
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Keywords: compile-time-hog
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: tkoenig@gcc.gnu.org
The attached program, a stress test of array assignment
dependencies for gfortran, takes extremely long to compile at -O2;
so far it has run for 90 minutes with no sign of finishing.
Compilation time without -O2 is 45 seconds.
Here is the perl script used to generate the test case.
#! /usr/bin/perl
# Generate a Fortran stress test for array-section assignment dependencies.
$cnt = 0;
# @ind = (-6, -3, -1, 1, 3, 6);
@ind = (-5, -3, -1, 1, 3, 5);   # strides to test
$n = 6;                         # bound for start indices and element counts
# Program header: declarations and reference data.
print <<EOF;
program main
implicit none
integer, dimension(-100:100) :: a,b,original
integer :: i
original = [(15*i+2,i=-100,100)]
EOF
# One test block per combination of strides, start indices and element count.
foreach $stride_l (@ind) {
  foreach $stride_r (@ind) {
    for ($start_l = 1; $start_l < $n; $start_l ++) {
      for ($start_r = 1; $start_r < $n; $start_r ++) {
        for ($times = 0; $times < $n; $times ++) {
          $end_l = $start_l + ($times-1)*$stride_l;
          $end_r = $start_r + ($times-1)*$stride_r;
          # Assign within a (sections may overlap) and from the untouched
          # copy into b; both results must agree.
          print <<EOF;
a = original
b = original
a($start_l:$end_l:$stride_l) = a($start_r:$end_r:$stride_r)
b($start_l:$end_l:$stride_l) = original($start_r:$end_r:$stride_r)
if (any (a /= b)) then
print *,$start_l, $end_l,$stride_l,$start_r,$end_r, $stride_r
call abort
end if
EOF
        }
      }
    }
  }
}
print "end program main\n";
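To get a feel for how large the generated program is, the loop nest of the Perl script can be mirrored in a few lines of Python. This is only a sketch for counting test blocks; the function name `generated_cases` is made up for illustration and is not part of the original script.

```python
# Sketch: enumerate the same parameter combinations as the Perl generator.
# The name generated_cases is hypothetical, used only for this illustration.
IND = (-5, -3, -1, 1, 3, 5)   # strides, as in the Perl script
N = 6                         # loop bound, as in the Perl script


def generated_cases(ind=IND, n=N):
    """Return one tuple per Fortran test block the script would print."""
    cases = []
    for stride_l in ind:
        for stride_r in ind:
            for start_l in range(1, n):
                for start_r in range(1, n):
                    for times in range(n):
                        end_l = start_l + (times - 1) * stride_l
                        end_r = start_r + (times - 1) * stride_r
                        cases.append(
                            (start_l, end_l, stride_l, start_r, end_r, stride_r)
                        )
    return cases


if __name__ == "__main__":
    cases = generated_cases()
    print(len(cases))   # 6 * 6 * 5 * 5 * 6 = 5400 test blocks
    print(cases[0])     # (1, 6, -5, 1, 6, -5)
```

Each of the 5400 blocks contains two whole-array copies, two array-section assignments and one `any (a /= b)` comparison, all inside a single main program.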
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
@ 2010-11-21 13:25 ` tkoenig at gcc dot gnu.org
2010-11-21 17:19 ` dominiq at lps dot ens.fr
` (44 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 13:25 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 13:22:50 UTC ---
Created attachment 22475
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22475
gzipped test case
Test case.
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
2010-11-21 13:25 ` [Bug tree-optimization/46590] " tkoenig at gcc dot gnu.org
@ 2010-11-21 17:19 ` dominiq at lps dot ens.fr
2010-11-21 18:55 ` tkoenig at gcc dot gnu.org
` (43 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: dominiq at lps dot ens.fr @ 2010-11-21 17:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2010-11-21 16:59:39 UTC ---
On my laptop the test in comment #1 takes ~4 min to compile, and at -O1 I had to
interrupt the compilation after ~20 min, during which all the time was spent
paging memory (with 4 GB of RAM).
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
2010-11-21 13:25 ` [Bug tree-optimization/46590] " tkoenig at gcc dot gnu.org
2010-11-21 17:19 ` dominiq at lps dot ens.fr
@ 2010-11-21 18:55 ` tkoenig at gcc dot gnu.org
2010-11-21 19:13 ` rguenth at gcc dot gnu.org
` (42 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 18:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |memory-hog
--- Comment #3 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 18:49:20 UTC ---
Yes, this is a memory hog as well:
time ~/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951 -O2 gener-max.f90
MAIN__ main
Analyzing compilation unit
{GC 183863k -> 103486k}Performing interprocedural optimizations
<*free_lang_data> <visibility> <early_local_cleanups> {GC 156854k -> 151699k}
<whole-program> <ipa-profile> <cp> <inline> <pure-const> <static-var>Assembling
functions:
MAIN__ {GC 281998k -> 189095k} {GC 393064k -> 341214k} {GC 554029k -> 337458k}
{GC 477071k -> 337199k} {GC 557246k -> 473808k}
f951: out of memory allocating 968552 bytes after a total of 4310765568 bytes
real 150m58.614s
user 150m16.083s
sys 0m11.924s
ig25@linux-fd1f:~/Krempel/Dep-c>
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (2 preceding siblings ...)
2010-11-21 18:55 ` tkoenig at gcc dot gnu.org
@ 2010-11-21 19:13 ` rguenth at gcc dot gnu.org
2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
` (41 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-21 19:13 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-21 18:56:58 UTC ---
Please strip down the testcase and provide -ftime-report output (I suppose
you hit IVOPTs slowness, so try -fno-ivopts).
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (3 preceding siblings ...)
2010-11-21 19:13 ` rguenth at gcc dot gnu.org
@ 2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
` (40 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 19:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #6 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 19:48:41 UTC ---
Without -fno-ivopts:
ig25@linux-fd1f:~/Krempel/Dep-c> time
~/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951 -ftime-report -O2
gener-4.f90
MAIN__ main
Analyzing compilation unit
{GC 44026k -> 25146k}Performing interprocedural optimizations
<*free_lang_data> <visibility> <early_local_cleanups> {GC 37683k -> 36011k}
<whole-program> <ipa-profile> <cp> <inline> <pure-const> <static-var>Assembling
functions:
MAIN__ {GC 54739k -> 37948k} {GC 61997k -> 51931k} {GC 77889k -> 51575k} {GC
75220k -> 51694k} {GC 89661k -> 77264k} {GC 500191k -> 73724k} {GC 106651k ->
72832k} main
Execution times (seconds)
garbage collection : 1.05 ( 0%) usr 0.03 ( 2%) sys 1.09 ( 0%) wall
0 kB ( 0%) ggc
callgraph construction: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
2465 kB ( 0%) ggc
callgraph optimization: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
2 kB ( 0%) ggc
ipa pure const : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
cfg construction : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
cfg cleanup : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
344 kB ( 0%) ggc
CFG verifier : 1.35 ( 0%) usr 0.00 ( 0%) sys 1.35 ( 0%) wall
0 kB ( 0%) ggc
trivially dead code : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
0 kB ( 0%) ggc
df scan insns : 0.17 ( 0%) usr 0.01 ( 1%) sys 0.18 ( 0%) wall
0 kB ( 0%) ggc
df multiple defs : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
df reaching defs : 2.04 ( 1%) usr 0.01 ( 1%) sys 2.11 ( 1%) wall
0 kB ( 0%) ggc
df live regs : 1.02 ( 0%) usr 0.00 ( 0%) sys 1.06 ( 0%) wall
0 kB ( 0%) ggc
df live&initialized regs: 0.29 ( 0%) usr 0.01 ( 1%) sys 0.24 ( 0%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 0.12 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%)
wall 0 kB ( 0%) ggc
df reg dead/unused notes: 0.43 ( 0%) usr 0.00 ( 0%) sys 0.43 ( 0%) wall
2460 kB ( 0%) ggc
register information : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall
2816 kB ( 0%) ggc
alias stmt walking : 192.54 (71%) usr 0.22 (16%) sys 193.02 (71%) wall
7023 kB ( 1%) ggc
register scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
20 kB ( 0%) ggc
rebuild jump labels : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
0 kB ( 0%) ggc
parser : 0.43 ( 0%) usr 0.03 ( 2%) sys 0.47 ( 0%) wall
20481 kB ( 3%) ggc
inline heuristics : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
tree gimplify : 0.16 ( 0%) usr 0.02 ( 1%) sys 0.19 ( 0%) wall
17257 kB ( 3%) ggc
tree eh : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree CFG construction : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
5915 kB ( 1%) ggc
tree CFG cleanup : 1.06 ( 0%) usr 0.01 ( 1%) sys 1.07 ( 0%) wall
602 kB ( 0%) ggc
tree VRP : 0.40 ( 0%) usr 0.01 ( 1%) sys 0.37 ( 0%) wall
4754 kB ( 1%) ggc
tree copy propagation : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
75 kB ( 0%) ggc
tree find ref. vars : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
865 kB ( 0%) ggc
tree PTA : 2.24 ( 1%) usr 0.06 ( 4%) sys 2.30 ( 1%) wall
2253 kB ( 0%) ggc
tree PHI insertion : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
2203 kB ( 0%) ggc
tree SSA rewrite : 3.18 ( 1%) usr 0.00 ( 0%) sys 2.75 ( 1%) wall
6503 kB ( 1%) ggc
tree SSA other : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
32 kB ( 0%) ggc
tree SSA incremental : 6.64 ( 2%) usr 0.01 ( 1%) sys 6.73 ( 2%) wall
1372 kB ( 0%) ggc
tree operand scan : 0.17 ( 0%) usr 0.09 ( 7%) sys 0.28 ( 0%) wall
13590 kB ( 2%) ggc
dominator optimization: 0.38 ( 0%) usr 0.01 ( 1%) sys 0.40 ( 0%) wall
22013 kB ( 3%) ggc
tree SRA : 0.09 ( 0%) usr 0.02 ( 1%) sys 0.15 ( 0%) wall
11123 kB ( 2%) ggc
tree CCP : 0.45 ( 0%) usr 0.02 ( 1%) sys 0.43 ( 0%) wall
838 kB ( 0%) ggc
tree reassociation : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
559 kB ( 0%) ggc
tree PRE : 0.53 ( 0%) usr 0.04 ( 3%) sys 0.67 ( 0%) wall
7992 kB ( 1%) ggc
tree FRE : 0.34 ( 0%) usr 0.01 ( 1%) sys 0.42 ( 0%) wall
530 kB ( 0%) ggc
tree code sinking : 0.04 ( 0%) usr 0.01 ( 1%) sys 0.04 ( 0%) wall
360 kB ( 0%) ggc
tree linearize phis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree forward propagate: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
337 kB ( 0%) ggc
tree phiprop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
tree conservative DCE : 0.08 ( 0%) usr 0.03 ( 2%) sys 0.05 ( 0%) wall
64 kB ( 0%) ggc
tree aggressive DCE : 0.23 ( 0%) usr 0.03 ( 2%) sys 0.27 ( 0%) wall
4227 kB ( 1%) ggc
tree buildin call DCE : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
tree DSE : 0.83 ( 0%) usr 0.00 ( 0%) sys 0.85 ( 0%) wall
0 kB ( 0%) ggc
tree loop bounds : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
821 kB ( 0%) ggc
tree loop invariant motion: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%)
wall 0 kB ( 0%) ggc
tree canonical iv : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
1012 kB ( 0%) ggc
scev constant prop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
174 kB ( 0%) ggc
complete unrolling : 1.00 ( 0%) usr 0.03 ( 2%) sys 0.86 ( 0%) wall
16838 kB ( 2%) ggc
tree iv optimization : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
11430 kB ( 2%) ggc
tree loop init : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
394 kB ( 0%) ggc
tree copy headers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
613 kB ( 0%) ggc
tree SSA uncprop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree rename SSA copies: 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
0 kB ( 0%) ggc
tree SSA verifier : 2.92 ( 1%) usr 0.00 ( 0%) sys 2.81 ( 1%) wall
0 kB ( 0%) ggc
tree STMT verifier : 5.27 ( 2%) usr 0.00 ( 0%) sys 5.41 ( 2%) wall
0 kB ( 0%) ggc
callgraph verifier : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
0 kB ( 0%) ggc
dominance frontiers : 6.46 ( 2%) usr 0.01 ( 1%) sys 7.35 ( 3%) wall
0 kB ( 0%) ggc
dominance computation : 5.33 ( 2%) usr 0.01 ( 1%) sys 5.06 ( 2%) wall
0 kB ( 0%) ggc
control dependences : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
out of ssa : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
0 kB ( 0%) ggc
expand vars : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
726 kB ( 0%) ggc
expand : 0.44 ( 0%) usr 0.01 ( 1%) sys 0.45 ( 0%) wall
37845 kB ( 5%) ggc
post expand cleanups : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
1 kB ( 0%) ggc
forward prop : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall
1217 kB ( 0%) ggc
CSE : 0.97 ( 0%) usr 0.00 ( 0%) sys 0.97 ( 0%) wall
2127 kB ( 0%) ggc
dead code elimination : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
0 kB ( 0%) ggc
dead store elim1 : 0.23 ( 0%) usr 0.01 ( 1%) sys 0.23 ( 0%) wall
4659 kB ( 1%) ggc
dead store elim2 : 0.50 ( 0%) usr 0.00 ( 0%) sys 0.51 ( 0%) wall
5526 kB ( 1%) ggc
loop analysis : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
836 kB ( 0%) ggc
loop invariant motion : 5.21 ( 2%) usr 0.01 ( 1%) sys 5.23 ( 2%) wall
0 kB ( 0%) ggc
CPROP : 0.88 ( 0%) usr 0.03 ( 2%) sys 0.90 ( 0%) wall
13157 kB ( 2%) ggc
PRE : 10.87 ( 4%) usr 0.33 (24%) sys 11.24 ( 4%) wall
409967 kB (60%) ggc
CSE 2 : 0.53 ( 0%) usr 0.00 ( 0%) sys 0.52 ( 0%) wall
729 kB ( 0%) ggc
branch prediction : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
1542 kB ( 0%) ggc
combiner : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall
1844 kB ( 0%) ggc
if-conversion : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
559 kB ( 0%) ggc
regmove : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
integrated RA : 2.04 ( 1%) usr 0.21 (15%) sys 2.26 ( 1%) wall
5282 kB ( 1%) ggc
reload : 0.90 ( 0%) usr 0.00 ( 0%) sys 0.90 ( 0%) wall
13255 kB ( 2%) ggc
reload CSE regs : 1.28 ( 0%) usr 0.00 ( 0%) sys 1.28 ( 0%) wall
13060 kB ( 2%) ggc
zee : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
thread pro- & epilogue: 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
3 kB ( 0%) ggc
if-conversion 2 : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
279 kB ( 0%) ggc
combine stack adjustments: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
peephole 2 : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
354 kB ( 0%) ggc
hard reg cprop : 0.33 ( 0%) usr 0.00 ( 0%) sys 0.34 ( 0%) wall
0 kB ( 0%) ggc
scheduling 2 : 1.84 ( 1%) usr 0.02 ( 1%) sys 1.85 ( 1%) wall
71 kB ( 0%) ggc
machine dep reorg : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall
0 kB ( 0%) ggc
reorder blocks : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
891 kB ( 0%) ggc
final : 0.31 ( 0%) usr 0.01 ( 1%) sys 0.31 ( 0%) wall
0 kB ( 0%) ggc
rest of compilation : 0.39 ( 0%) usr 0.01 ( 1%) sys 0.38 ( 0%) wall
3151 kB ( 0%) ggc
remove unused locals : 0.43 ( 0%) usr 0.00 ( 0%) sys 0.38 ( 0%) wall
0 kB ( 0%) ggc
address taken : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
32 kB ( 0%) ggc
unaccounted todo : 0.07 ( 0%) usr 0.01 ( 1%) sys 0.14 ( 0%) wall
0 kB ( 0%) ggc
verify loop closed : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall
0 kB ( 0%) ggc
verify RTL sharing : 1.55 ( 1%) usr 0.00 ( 0%) sys 1.54 ( 1%) wall
0 kB ( 0%) ggc
repair loop structures: 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
145 kB ( 0%) ggc
TOTAL : 270.29 1.38 272.32
688523 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
real 4m32.395s
user 4m30.299s
sys 0m1.406s
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (4 preceding siblings ...)
2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
@ 2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
2010-11-21 21:13 ` tkoenig at gcc dot gnu.org
` (39 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 19:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #5 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 19:39:23 UTC ---
With -fno-ivopts, the alias statement walking is the clear culprit
(a somewhat shorter test case):
ig25@linux-fd1f:~/Krempel/Dep-c> time
~/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951 -ftime-report -O2 -fno-ivopts
gener-4.f90
MAIN__ main
Analyzing compilation unit
{GC 44026k -> 25146k}Performing interprocedural optimizations
<*free_lang_data> <visibility> <early_local_cleanups> {GC 37683k -> 36011k}
<whole-program> <ipa-profile> <cp> <inline> <pure-const> <static-var>Assembling
functions:
MAIN__ {GC 54739k -> 37948k} {GC 61997k -> 51931k} {GC 77889k -> 51575k} {GC
103199k -> 77477k} {GC 100723k -> 74180k} {GC 98147k -> 73521k} main
Execution times (seconds)
garbage collection : 0.79 ( 0%) usr 0.00 ( 0%) sys 0.81 ( 0%) wall
0 kB ( 0%) ggc
callgraph construction: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
2465 kB ( 1%) ggc
callgraph optimization: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
2 kB ( 0%) ggc
ipa cp : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
256 kB ( 0%) ggc
ipa profile : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
ipa pure const : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
cfg construction : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
cfg cleanup : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
344 kB ( 0%) ggc
CFG verifier : 1.40 ( 0%) usr 0.01 ( 0%) sys 1.35 ( 0%) wall
0 kB ( 0%) ggc
trivially dead code : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
0 kB ( 0%) ggc
df scan insns : 0.20 ( 0%) usr 0.01 ( 0%) sys 0.21 ( 0%) wall
0 kB ( 0%) ggc
df multiple defs : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
0 kB ( 0%) ggc
df reaching defs : 2.12 ( 1%) usr 0.01 ( 0%) sys 2.15 ( 1%) wall
0 kB ( 0%) ggc
df live regs : 0.93 ( 0%) usr 0.00 ( 0%) sys 1.00 ( 0%) wall
0 kB ( 0%) ggc
df live&initialized regs: 0.26 ( 0%) usr 0.00 ( 0%) sys 0.23 ( 0%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%)
wall 0 kB ( 0%) ggc
df reg dead/unused notes: 0.44 ( 0%) usr 0.00 ( 0%) sys 0.45 ( 0%) wall
2460 kB ( 1%) ggc
register information : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 0.30 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall
2816 kB ( 1%) ggc
alias stmt walking : 205.44 (72%) usr 1.33 (58%) sys 207.39 (72%) wall
7023 kB ( 3%) ggc
register scan : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
rebuild jump labels : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
0 kB ( 0%) ggc
parser : 0.44 ( 0%) usr 0.03 ( 1%) sys 0.47 ( 0%) wall
20481 kB ( 8%) ggc
inline heuristics : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
0 kB ( 0%) ggc
tree gimplify : 0.16 ( 0%) usr 0.01 ( 0%) sys 0.18 ( 0%) wall
17257 kB ( 6%) ggc
tree eh : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree CFG construction : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
5915 kB ( 2%) ggc
tree CFG cleanup : 1.08 ( 0%) usr 0.02 ( 1%) sys 1.15 ( 0%) wall
602 kB ( 0%) ggc
tree VRP : 0.44 ( 0%) usr 0.00 ( 0%) sys 0.45 ( 0%) wall
4778 kB ( 2%) ggc
tree copy propagation : 0.23 ( 0%) usr 0.01 ( 0%) sys 0.25 ( 0%) wall
75 kB ( 0%) ggc
tree find ref. vars : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
865 kB ( 0%) ggc
tree PTA : 2.31 ( 1%) usr 0.06 ( 3%) sys 2.38 ( 1%) wall
2253 kB ( 1%) ggc
tree PHI insertion : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
2203 kB ( 1%) ggc
tree SSA rewrite : 3.08 ( 1%) usr 0.01 ( 0%) sys 5.10 ( 2%) wall
6503 kB ( 2%) ggc
tree SSA other : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
32 kB ( 0%) ggc
tree SSA incremental : 6.20 ( 2%) usr 0.03 ( 1%) sys 7.43 ( 3%) wall
1372 kB ( 1%) ggc
tree operand scan : 0.13 ( 0%) usr 0.10 ( 4%) sys 0.19 ( 0%) wall
13430 kB ( 5%) ggc
dominator optimization: 0.40 ( 0%) usr 0.01 ( 0%) sys 0.41 ( 0%) wall
22013 kB ( 8%) ggc
tree SRA : 0.12 ( 0%) usr 0.02 ( 1%) sys 0.14 ( 0%) wall
11123 kB ( 4%) ggc
tree CCP : 0.47 ( 0%) usr 0.02 ( 1%) sys 0.52 ( 0%) wall
838 kB ( 0%) ggc
tree PHI const/copy prop: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
tree reassociation : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
559 kB ( 0%) ggc
tree PRE : 0.65 ( 0%) usr 0.05 ( 2%) sys 0.70 ( 0%) wall
7992 kB ( 3%) ggc
tree FRE : 0.43 ( 0%) usr 0.05 ( 2%) sys 0.38 ( 0%) wall
530 kB ( 0%) ggc
tree code sinking : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
360 kB ( 0%) ggc
tree linearize phis : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
tree forward propagate: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
337 kB ( 0%) ggc
tree phiprop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
tree conservative DCE : 0.10 ( 0%) usr 0.04 ( 2%) sys 0.11 ( 0%) wall
64 kB ( 0%) ggc
tree aggressive DCE : 0.25 ( 0%) usr 0.01 ( 0%) sys 0.29 ( 0%) wall
4194 kB ( 2%) ggc
tree DSE : 0.90 ( 0%) usr 0.01 ( 0%) sys 0.91 ( 0%) wall
0 kB ( 0%) ggc
tree loop bounds : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
821 kB ( 0%) ggc
tree loop invariant motion: 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%)
wall 0 kB ( 0%) ggc
tree canonical iv : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
1012 kB ( 0%) ggc
complete unrolling : 1.38 ( 0%) usr 0.05 ( 2%) sys 1.84 ( 1%) wall
16838 kB ( 6%) ggc
tree loop init : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
394 kB ( 0%) ggc
tree loop fini : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree copy headers : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
613 kB ( 0%) ggc
tree SSA uncprop : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree rename SSA copies: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
tree SSA verifier : 2.85 ( 1%) usr 0.01 ( 0%) sys 2.99 ( 1%) wall
0 kB ( 0%) ggc
tree STMT verifier : 5.48 ( 2%) usr 0.00 ( 0%) sys 5.55 ( 2%) wall
0 kB ( 0%) ggc
tree switch initialization conversion: 0.00 ( 0%) usr 0.00 ( 0%) sys
0.01 ( 0%) wall 0 kB ( 0%) ggc
callgraph verifier : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
0 kB ( 0%) ggc
dominance frontiers : 6.84 ( 2%) usr 0.01 ( 0%) sys 4.58 ( 2%) wall
0 kB ( 0%) ggc
dominance computation : 4.91 ( 2%) usr 0.00 ( 0%) sys 3.66 ( 1%) wall
0 kB ( 0%) ggc
control dependences : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
out of ssa : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
0 kB ( 0%) ggc
expand vars : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
1109 kB ( 0%) ggc
expand : 0.54 ( 0%) usr 0.02 ( 1%) sys 0.56 ( 0%) wall
38882 kB (14%) ggc
post expand cleanups : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
1 kB ( 0%) ggc
forward prop : 0.22 ( 0%) usr 0.01 ( 0%) sys 0.21 ( 0%) wall
1581 kB ( 1%) ggc
CSE : 1.07 ( 0%) usr 0.00 ( 0%) sys 1.06 ( 0%) wall
1762 kB ( 1%) ggc
dead code elimination : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
0 kB ( 0%) ggc
dead store elim1 : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
4983 kB ( 2%) ggc
dead store elim2 : 0.52 ( 0%) usr 0.00 ( 0%) sys 0.52 ( 0%) wall
6295 kB ( 2%) ggc
loop analysis : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
836 kB ( 0%) ggc
loop invariant motion : 5.52 ( 2%) usr 0.00 ( 0%) sys 5.50 ( 2%) wall
0 kB ( 0%) ggc
CPROP : 0.89 ( 0%) usr 0.04 ( 2%) sys 0.94 ( 0%) wall
8097 kB ( 3%) ggc
PRE : 10.12 ( 4%) usr 0.11 ( 5%) sys 10.28 ( 4%) wall
2091 kB ( 1%) ggc
CSE 2 : 0.55 ( 0%) usr 0.00 ( 0%) sys 0.55 ( 0%) wall
547 kB ( 0%) ggc
branch prediction : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
1542 kB ( 1%) ggc
combiner : 0.36 ( 0%) usr 0.00 ( 0%) sys 0.36 ( 0%) wall
4406 kB ( 2%) ggc
if-conversion : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
559 kB ( 0%) ggc
regmove : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
0 kB ( 0%) ggc
integrated RA : 2.07 ( 1%) usr 0.18 ( 8%) sys 2.26 ( 1%) wall
5525 kB ( 2%) ggc
reload : 0.93 ( 0%) usr 0.00 ( 0%) sys 0.93 ( 0%) wall
14653 kB ( 5%) ggc
reload CSE regs : 1.30 ( 0%) usr 0.00 ( 0%) sys 1.32 ( 0%) wall
13495 kB ( 5%) ggc
zee : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
thread pro- & epilogue: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
3 kB ( 0%) ggc
if-conversion 2 : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
279 kB ( 0%) ggc
combine stack adjustments: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
peephole 2 : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
354 kB ( 0%) ggc
hard reg cprop : 0.35 ( 0%) usr 0.00 ( 0%) sys 0.34 ( 0%) wall
0 kB ( 0%) ggc
scheduling 2 : 1.83 ( 1%) usr 0.02 ( 1%) sys 1.94 ( 1%) wall
71 kB ( 0%) ggc
machine dep reorg : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
0 kB ( 0%) ggc
reorder blocks : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
891 kB ( 0%) ggc
final : 0.32 ( 0%) usr 0.01 ( 0%) sys 0.33 ( 0%) wall
0 kB ( 0%) ggc
tree if-combine : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
rest of compilation : 0.43 ( 0%) usr 0.00 ( 0%) sys 0.39 ( 0%) wall
3151 kB ( 1%) ggc
remove unused locals : 0.40 ( 0%) usr 0.00 ( 0%) sys 0.45 ( 0%) wall
0 kB ( 0%) ggc
address taken : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
32 kB ( 0%) ggc
unaccounted todo : 0.13 ( 0%) usr 0.01 ( 0%) sys 0.07 ( 0%) wall
0 kB ( 0%) ggc
verify loop closed : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
0 kB ( 0%) ggc
verify RTL sharing : 1.59 ( 1%) usr 0.00 ( 0%) sys 1.61 ( 1%) wall
0 kB ( 0%) ggc
repair loop structures: 0.05 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
145 kB ( 0%) ggc
TOTAL : 283.50 2.31 286.74
270936 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
real 4m46.813s
user 4m43.509s
sys 0m2.336s
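Reports like the one above are long, so it can help to extract the most expensive passes mechanically. The following is a minimal sketch that assumes the line layout shown in this thread (pass name, colon, user seconds, percentage, `usr`); the names `TIMING_LINE` and `top_passes` are made up for illustration, and other GCC versions may format the report differently.

```python
import re

# Sketch: matches lines such as
#   "alias stmt walking : 205.44 (72%) usr 1.33 (58%) sys ..."
# The TOTAL line has no "(..%)" field and is skipped on purpose.
TIMING_LINE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*(?P<pct>\d+)%\)\s*usr"
)


def top_passes(report, count=3):
    """Return (name, usr_seconds, usr_percent) for the costliest passes."""
    rows = []
    for line in report.splitlines():
        match = TIMING_LINE.match(line)
        if match:
            rows.append(
                (float(match.group("usr")), int(match.group("pct")), match.group("name"))
            )
    rows.sort(reverse=True)
    return [(name, usr, pct) for usr, pct, name in rows[:count]]


if __name__ == "__main__":
    # A few lines copied from the -fno-ivopts report in comment #5.
    sample = """\
alias stmt walking : 205.44 (72%) usr 1.33 (58%) sys 207.39 (72%) wall
tree PRE : 0.65 ( 0%) usr 0.05 ( 2%) sys 0.70 ( 0%) wall
dominance frontiers : 6.84 ( 2%) usr 0.01 ( 0%) sys 4.58 ( 2%) wall
PRE : 10.12 ( 4%) usr 0.11 ( 5%) sys 10.28 ( 4%) wall
TOTAL : 283.50 2.31 286.74
"""
    for name, usr, pct in top_passes(sample):
        print(f"{name}: {usr:.2f}s usr ({pct}%)")
```

Sorting by user time puts "alias stmt walking" first for this sample, which matches the conclusion drawn in the comment.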
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (5 preceding siblings ...)
2010-11-21 19:49 ` tkoenig at gcc dot gnu.org
@ 2010-11-21 21:13 ` tkoenig at gcc dot gnu.org
2010-11-21 21:46 ` [Bug tree-optimization/46590] [4.6 Regression] " tkoenig at gcc dot gnu.org
` (38 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 21:13 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #7 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 21:07:08 UTC ---
Created attachment 22478
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22478
Test case from comment #5 and comment #6
Here is a test case that is a little shorter.
A bit hard to trim it down further when the bug depends on test
case size...
* [Bug tree-optimization/46590] [4.6 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (6 preceding siblings ...)
2010-11-21 21:13 ` tkoenig at gcc dot gnu.org
@ 2010-11-21 21:46 ` tkoenig at gcc dot gnu.org
2010-11-21 23:14 ` tkoenig at gcc dot gnu.org
` (37 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 21:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |4.4.1
Summary|long compile time with -O2 |[4.6 Regression] long
|and many loops |compile time with -O2 and
| |many loops
Known to fail| |4.6.0
--- Comment #8 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2010-11-21 21:13:24 UTC ---
4.4 compile time is much lower:
ig25@linux-fd1f:~/Krempel/Dep-c> time /usr/bin/gfortran-4.4 -O2 gener-4.f90
real 0m59.893s
user 0m58.172s
sys 0m0.664s
* [Bug tree-optimization/46590] [4.6 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (7 preceding siblings ...)
2010-11-21 21:46 ` [Bug tree-optimization/46590] [4.6 Regression] " tkoenig at gcc dot gnu.org
@ 2010-11-21 23:14 ` tkoenig at gcc dot gnu.org
2010-11-22 13:30 ` [Bug tree-optimization/46590] [4.5/4.6 " rguenth at gcc dot gnu.org
` (36 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2010-11-21 23:14 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.6.0
* [Bug tree-optimization/46590] [4.6 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (9 preceding siblings ...)
2010-11-22 13:30 ` [Bug tree-optimization/46590] [4.5/4.6 " rguenth at gcc dot gnu.org
@ 2010-11-22 13:30 ` rguenth at gcc dot gnu.org
2010-11-22 16:19 ` [Bug tree-optimization/46590] [4.5/4.6 " rguenth at gcc dot gnu.org
` (34 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-22 13:30 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2010.11.22 13:13:03
CC| |rguenth at gcc dot gnu.org
Ever Confirmed|0 |1
--- Comment #9 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-22 13:13:03 UTC ---
It's a very large monolithic function. And, as usual, we have a huge
number of local I/O state variables.
On trunk with release checking I see:
A quarter of the testcase:
alias stmt walking : 15.94 (59%) usr 0.03 ( 8%) sys 16.00 (57%) wall
1845 kB ( 2%) ggc
TOTAL : 27.08 0.38 27.89
96306 kB
Half of the testcase:
alias stmt walking : 63.31 (68%) usr 0.51 (31%) sys 64.06 (67%) wall
3684 kB ( 2%) ggc
TOTAL : 93.52 1.66 95.57
241871 kB
All of the testcase:
alias stmt walking : 259.19 (73%) usr 0.78 (26%) sys 261.79 (72%) wall
7023 kB ( 1%) ggc
TOTAL : 356.27 2.98 361.57
690719 kB
so the alias stmt walking time roughly quadruples each time the testcase
doubles in size, i.e. it's definitely nearly quadratic (but that's expected).
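The "nearly quadratic" reading can be checked with a little arithmetic on the three alias stmt walking timings above. The sketch below estimates the growth exponent between consecutive testcase sizes; the helper name `growth_exponents` is made up for illustration.

```python
import math

# usr seconds for "alias stmt walking" on trunk, taken from the figures
# above, keyed by the fraction of the full testcase that was compiled.
WALK_SECONDS = {0.25: 15.94, 0.5: 63.31, 1.0: 259.19}


def growth_exponents(samples):
    """Estimate k in time ~ size**k between consecutive sample sizes."""
    sizes = sorted(samples)
    exponents = []
    for small, large in zip(sizes, sizes[1:]):
        time_ratio = samples[large] / samples[small]
        size_ratio = large / small
        exponents.append(math.log(time_ratio) / math.log(size_ratio))
    return exponents


if __name__ == "__main__":
    for exponent in growth_exponents(WALK_SECONDS):
        print(round(exponent, 2))   # both values come out close to 2
```

An exponent of 1 would mean linear scaling and 2 quadratic scaling; both steps here land very close to 2.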
4.5.x for a quarter of the testcase has:
alias stmt walking : 93.10 (88%) usr 0.03 ( 8%) sys 93.31 (87%) wall
0 kB ( 0%) ggc
TOTAL : 106.11 0.40 106.93
87895 kB
so trunk is already a lot better.
Removing the alias stmt walk timevar gets us to the following on trunk
(quarter of the testcase again):
tree PRE : 12.93 (47%) usr 0.00 ( 0%) sys 12.98 (46%) wall
3607 kB ( 4%) ggc
TOTAL : 27.57 0.34 27.99
96324 kB
What is costly is translating things through the loop bodies. We can
improve this a lot by properly marking the I/O structs as dead once
they are no longer used and before they are first used. The proposed
virtual kill stmts could be used for that. We could also build this
kind of liveness information up front and use it to limit the walking
(but that again is only trivial for non-address-taken variables, which
the I/O structs are not).
Anyway, confirmed. Fortran I/O and array descriptor temporaries really
need re-use (I proposed a patch for I/O ones once but it was shot down
because of async-I/O).
Removing all prints from the testcase gives:
alias stmt walking : 132.44 (61%) usr 0.79 (30%) sys 133.47 (61%) wall
7023 kB ( 1%) ggc
TOTAL : 216.55 2.67 219.78
645229 kB
As none of the arrays is address-taken we really look for CSE opportunities
up to the very start of the function (PRE translates the in-loop
references from the any (a /= b) loop to the loop header using the
constant initial index and tries to CSE that, but it doesn't actually
succeed - which is another bug of course, it should look it up from
original resp. A.0).
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (8 preceding siblings ...)
2010-11-21 23:14 ` tkoenig at gcc dot gnu.org
@ 2010-11-22 13:30 ` rguenth at gcc dot gnu.org
2010-11-22 13:30 ` [Bug tree-optimization/46590] [4.6 " rguenth at gcc dot gnu.org
` (35 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-22 13:30 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #10 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-22 13:29:56 UTC ---
At -O1 we have
tree SSA rewrite : 23.32 (13%) usr 0.06 ( 3%) sys 22.80 (13%) wall
6392 kB ( 3%) ggc
tree SSA incremental : 24.05 (14%) usr 0.07 ( 4%) sys 25.27 (14%) wall
3533 kB ( 2%) ggc
tree FRE : 45.17 (25%) usr 0.51 (29%) sys 45.83 (26%) wall
2048 kB ( 1%) ggc
dominance frontiers : 29.37 (17%) usr 0.01 ( 1%) sys 29.30 (16%) wall
0 kB ( 0%) ggc
dominance computation : 19.49 (11%) usr 0.02 ( 1%) sys 19.29 (11%) wall
0 kB ( 0%) ggc
loop invariant motion : 12.25 ( 7%) usr 0.01 ( 1%) sys 12.21 ( 7%) wall
648 kB ( 0%) ggc
TOTAL : 177.31 1.74 179.57
196657 kB
the testcase is simply large. And yes, the FRE/PRE alias stmt walks are
not limited (compared to the DCE/DSE ones). Maybe it's time to add such a
limit to cover this kind of artificial testcase.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (10 preceding siblings ...)
2010-11-22 13:30 ` [Bug tree-optimization/46590] [4.6 " rguenth at gcc dot gnu.org
@ 2010-11-22 16:19 ` rguenth at gcc dot gnu.org
2010-11-24 12:40 ` rguenth at gcc dot gnu.org
` (33 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-22 16:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #11 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-22 16:19:37 UTC ---
(In reply to comment #10)
> At -O1 we have
>
> tree SSA rewrite : 23.32 (13%) usr 0.06 ( 3%) sys 22.80 (13%) wall
> 6392 kB ( 3%) ggc
> tree SSA incremental : 24.05 (14%) usr 0.07 ( 4%) sys 25.27 (14%) wall
> 3533 kB ( 2%) ggc
The above is almost completely loop header copying.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (11 preceding siblings ...)
2010-11-22 16:19 ` [Bug tree-optimization/46590] [4.5/4.6 " rguenth at gcc dot gnu.org
@ 2010-11-24 12:40 ` rguenth at gcc dot gnu.org
2010-11-24 14:19 ` rguenth at gcc dot gnu.org
` (32 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-24 12:40 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
AssignedTo|unassigned at gcc dot |rguenth at gcc dot gnu.org
|gnu.org |
--- Comment #12 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-24 12:22:48 UTC ---
Mine for now.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (12 preceding siblings ...)
2010-11-24 12:40 ` rguenth at gcc dot gnu.org
@ 2010-11-24 14:19 ` rguenth at gcc dot gnu.org
2011-01-03 20:25 ` rguenth at gcc dot gnu.org
` (31 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-11-24 14:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #13 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-24 13:50:21 UTC ---
The I/O parts of the FRE cost are due to value-numbering stores in
visit_reference_op_store. They can be drastically cut by an equivalent
of (not generating code)
Index: trans-io.c
===================================================================
--- trans-io.c (revision 167111)
+++ trans-io.c (working copy)
@@ -1670,6 +1670,7 @@ build_dt (tree function, gfc_code * code
gfc_init_block (&post_iu_block);
var = gfc_create_var (st_parameter[IOPARM_ptype_dt].type, "dt_parm");
+ gfc_add_modify (&block, var, build_constructor (TREE_TYPE (var), NULL));
set_error_locus (&block, var, &code->loc);
which constrains lifetime begin of the dt_parm structs (which have their
address taken and thus lifetime analysis will have a hard time). That
prevents store value numbering from considering all dominated blocks (which
only contain the loops). The above brings down the previous -O1 numbers to
tree FRE : 28.34 (18%) usr 0.39 (18%) sys 28.83 (18%) wall
1906 kB ( 1%) ggc
TOTAL : 159.87 2.17 162.72
201091 kB
all other variables are re-used and thus their lifetime isn't limited.
The loads from original[] are the only unconstrained ones; handling
those exhibits quadratic behavior. Each walk goes up to the first statement
in the function which is
MEM[(c_char * {ref-all})&original] = MEM[(c_char * {ref-all})&A.0];
and then fails to constant fold using the static initializer of A.0.
FRE optimizes away the self-assignments
a(1:6:-5) = a(1:6:-5)
it doesn't do anything useful to the other loops.
Overall it's not completely unreasonable what FRE does for a, b and original
(we could have propagated A.0 to all uses of original). The I/O struct
walks are the only thing that would be nice to fix.
And of course maybe limit walking in general.
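To illustrate why a whole-variable store at lifetime begin shortens the store walks, here is a loose toy model. It is not GCC code: the statement kinds, the variable naming and the cost measure (statements visited while walking backwards) are all assumptions made for the sketch.

```python
def walk_cost(stmts, index):
    """Count statements visited walking backwards from stmts[index] until a
    statement that overwrites the whole variable is found (or the start of
    the function is reached)."""
    _, var = stmts[index]
    visited = 0
    for kind, other in reversed(stmts[:index]):
        visited += 1
        if kind == "init" and other == var:
            break  # whole-variable store: nothing earlier can matter
    return visited


def build_function(n_blocks, with_init):
    """Build n_blocks I/O blocks, each with its own dt_parm-like variable."""
    stmts = []
    for i in range(n_blocks):
        var = f"dt_parm.{i}"
        if with_init:
            stmts.append(("init", var))     # models dt_parm.i = {}
        stmts.append(("field_store", var))  # e.g. setting the error locus
        stmts.append(("field_store", var))
    return stmts


def total_cost(stmts):
    """Sum the walk cost over every field store in the function."""
    return sum(
        walk_cost(stmts, i)
        for i, (kind, _) in enumerate(stmts)
        if kind == "field_store"
    )


for n in (10, 100):
    print(n, total_cost(build_function(n, False)),
          total_cost(build_function(n, True)))
```

In this model the total cost without the initializing store grows quadratically with the number of I/O blocks, while with it the cost grows linearly, because every walk stops at the start of its own block.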
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (13 preceding siblings ...)
2010-11-24 14:19 ` rguenth at gcc dot gnu.org
@ 2011-01-03 20:25 ` rguenth at gcc dot gnu.org
2011-03-25 19:55 ` [Bug tree-optimization/46590] [4.5/4.6/4.7 " jakub at gcc dot gnu.org
` (30 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-01-03 20:25 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (14 preceding siblings ...)
2011-01-03 20:25 ` rguenth at gcc dot gnu.org
@ 2011-03-25 19:55 ` jakub at gcc dot gnu.org
2011-06-27 16:23 ` jakub at gcc dot gnu.org
` (29 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-03-25 19:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.6.0 |4.6.1
--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-03-25 19:52:13 UTC ---
GCC 4.6.0 is being released, adjusting target milestone.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (15 preceding siblings ...)
2011-03-25 19:55 ` [Bug tree-optimization/46590] [4.5/4.6/4.7 " jakub at gcc dot gnu.org
@ 2011-06-27 16:23 ` jakub at gcc dot gnu.org
2011-10-26 17:37 ` jakub at gcc dot gnu.org
` (28 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-06-27 16:23 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.6.1 |4.6.2
--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-06-27 12:32:42 UTC ---
GCC 4.6.1 is being released.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (16 preceding siblings ...)
2011-06-27 16:23 ` jakub at gcc dot gnu.org
@ 2011-10-26 17:37 ` jakub at gcc dot gnu.org
2012-01-05 1:48 ` pinskia at gcc dot gnu.org
` (27 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-10-26 17:37 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.6.2 |4.6.3
--- Comment #16 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-26 17:13:27 UTC ---
GCC 4.6.2 is being released.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (17 preceding siblings ...)
2011-10-26 17:37 ` jakub at gcc dot gnu.org
@ 2012-01-05 1:48 ` pinskia at gcc dot gnu.org
2012-01-19 15:20 ` matz at gcc dot gnu.org
` (26 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: pinskia at gcc dot gnu.org @ 2012-01-05 1:48 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |matz at gcc dot gnu.org
--- Comment #17 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-01-05 01:47:57 UTC ---
On the trunk most of the time at -O0 is spent doing add_scope_conflicts.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (18 preceding siblings ...)
2012-01-05 1:48 ` pinskia at gcc dot gnu.org
@ 2012-01-19 15:20 ` matz at gcc dot gnu.org
2012-01-19 15:26 ` matz at gcc dot gnu.org
` (25 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: matz at gcc dot gnu.org @ 2012-01-19 15:20 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #18 from Michael Matz <matz at gcc dot gnu.org> 2012-01-19 15:06:14 UTC ---
Author: matz
Date: Thu Jan 19 15:06:04 2012
New Revision: 183305
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183305
Log:
PR tree-optimization/46590
* cfgexpand.c (add_scope_conflicts_1): New old_conflicts argument,
use it in remembering which conflicts we already created.
(add_scope_conflicts): Adjust call to above, (de)allocate helper
bitmap.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgexpand.c
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (19 preceding siblings ...)
2012-01-19 15:20 ` matz at gcc dot gnu.org
@ 2012-01-19 15:26 ` matz at gcc dot gnu.org
2012-01-26 16:23 ` matz at gcc dot gnu.org
` (24 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: matz at gcc dot gnu.org @ 2012-01-19 15:26 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #19 from Michael Matz <matz at gcc dot gnu.org> 2012-01-19 15:10:43 UTC ---
The var-expansion slowness is fixed again. The rest of course still applies.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (20 preceding siblings ...)
2012-01-19 15:26 ` matz at gcc dot gnu.org
@ 2012-01-26 16:23 ` matz at gcc dot gnu.org
2012-03-01 14:45 ` jakub at gcc dot gnu.org
` (23 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: matz at gcc dot gnu.org @ 2012-01-26 16:23 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #20 from Michael Matz <matz at gcc dot gnu.org> 2012-01-26 15:50:43 UTC ---
Author: matz
Date: Thu Jan 26 15:50:33 2012
New Revision: 183566
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183566
Log:
PR tree-optimization/46590
* cfgexpand.c: Revert last change (r183305).
* gimplify.c (gimplify_bind_expr): Add clobbers for all non-gimple
regs.
* tree-eh.c (cleanup_empty_eh): Try to optimize clobbers before
checking for emptiness.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgexpand.c
trunk/gcc/gimplify.c
trunk/gcc/tree-eh.c
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.5/4.6/4.7 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (21 preceding siblings ...)
2012-01-26 16:23 ` matz at gcc dot gnu.org
@ 2012-03-01 14:45 ` jakub at gcc dot gnu.org
2012-08-21 8:10 ` [Bug tree-optimization/46590] [4.6/4.7/4.8 " rguenth at gcc dot gnu.org
` (22 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-03-01 14:45 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.6.3 |4.6.4
--- Comment #21 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-03-01 14:38:15 UTC ---
GCC 4.6.3 is being released.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (22 preceding siblings ...)
2012-03-01 14:45 ` jakub at gcc dot gnu.org
@ 2012-08-21 8:10 ` rguenth at gcc dot gnu.org
2012-08-21 9:44 ` steven at gcc dot gnu.org
` (21 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-21 8:10 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |nbhargava at google dot com
--- Comment #22 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-21 08:09:55 UTC ---
*** Bug 54337 has been marked as a duplicate of this bug. ***
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (23 preceding siblings ...)
2012-08-21 8:10 ` [Bug tree-optimization/46590] [4.6/4.7/4.8 " rguenth at gcc dot gnu.org
@ 2012-08-21 9:44 ` steven at gcc dot gnu.org
2012-08-21 10:00 ` rguenther at suse dot de
` (20 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-21 9:44 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Steven Bosscher <steven at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |steven at gcc dot gnu.org
--- Comment #23 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-21 09:42:42 UTC ---
(In reply to comment #13)
> The I/O parts of the FRE cost are due to value-numbering stores in
> visit_reference_op_store. They can be drastically cut by an equivalent
> of (not generating code)
>
> Index: trans-io.c
> ===================================================================
> --- trans-io.c (revision 167111)
> +++ trans-io.c (working copy)
> @@ -1670,6 +1670,7 @@ build_dt (tree function, gfc_code * code
> gfc_init_block (&post_iu_block);
>
> var = gfc_create_var (st_parameter[IOPARM_ptype_dt].type, "dt_parm");
> + gfc_add_modify (&block, var, build_constructor (TREE_TYPE (var), NULL));
>
> set_error_locus (&block, var, &code->loc);
>
You didn't post/commit this, but it looks like a reasonable change to me.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (24 preceding siblings ...)
2012-08-21 9:44 ` steven at gcc dot gnu.org
@ 2012-08-21 10:00 ` rguenther at suse dot de
2012-08-21 14:11 ` rguenth at gcc dot gnu.org
` (19 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenther at suse dot de @ 2012-08-21 10:00 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #24 from rguenther at suse dot de <rguenther at suse dot de> 2012-08-21 09:59:41 UTC ---
On Tue, 21 Aug 2012, steven at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
>
> Steven Bosscher <steven at gcc dot gnu.org> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |steven at gcc dot gnu.org
>
> --- Comment #23 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-21 09:42:42 UTC ---
> (In reply to comment #13)
> > The I/O parts of the FRE cost are due to value-numbering stores in
> > visit_reference_op_store. They can be drastically cut by an equivalent
> > of (not generating code)
> >
> > Index: trans-io.c
> > ===================================================================
> > --- trans-io.c (revision 167111)
> > +++ trans-io.c (working copy)
> > @@ -1670,6 +1670,7 @@ build_dt (tree function, gfc_code * code
> > gfc_init_block (&post_iu_block);
> >
> > var = gfc_create_var (st_parameter[IOPARM_ptype_dt].type, "dt_parm");
> > + gfc_add_modify (&block, var, build_constructor (TREE_TYPE (var), NULL));
> >
> > set_error_locus (&block, var, &code->loc);
> >
>
> You didn't post/commit this, but it looks like a reasonable change to me.
I think this is obsolete now with Micha's change to have CLOBBERs. Or
if still useful, the above should be a CLOBBER now.
Richard.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (25 preceding siblings ...)
2012-08-21 10:00 ` rguenther at suse dot de
@ 2012-08-21 14:11 ` rguenth at gcc dot gnu.org
2012-08-21 14:57 ` rguenth at gcc dot gnu.org
` (18 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-21 14:11 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #25 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-21 14:10:38 UTC ---
I have a patch for the SCCVN issue, but trying to gather current trunk status
first.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (26 preceding siblings ...)
2012-08-21 14:11 ` rguenth at gcc dot gnu.org
@ 2012-08-21 14:57 ` rguenth at gcc dot gnu.org
2012-08-22 11:43 ` rguenth at gcc dot gnu.org
` (17 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-21 14:57 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #26 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-21 14:56:41 UTC ---
For a somewhat reduced testcase I now get at -O1:
alias stmt walking : 105.51 (45%) usr 0.33 (24%) sys
tree SSA rewrite : 22.01 ( 9%) usr 0.04 ( 3%) sys
tree SSA incremental : 25.25 (11%) usr 0.07 ( 5%) sys
dominance frontiers : 35.35 (15%) usr 0.02 ( 1%) sys
dominance computation : 14.60 ( 6%) usr 0.09 ( 7%) sys
TOTAL : 234.28 1.38
as previously said most of the non-alias-stmt walk time is spent
on loop header copying. With -O1 -fno-tree-ch we have
alias stmt walking : 101.52 (68%) usr 0.37 (34%) sys
tree SSA rewrite : 4.14 ( 3%) usr 0.01 ( 1%) sys
tree SSA incremental : 8.00 ( 5%) usr 0.02 ( 2%) sys
dominance frontiers : 6.14 ( 4%) usr 0.01 ( 1%) sys
dominance computation : 4.74 ( 3%) usr 0.06 ( 6%) sys
TOTAL : 150.14 1.09
limiting stmt walk results in the ability to arbitrarily scale down its cost
with a param (we can either limit alias oracle query numbers or SCCVN
table lookups). With 100 alias oracle queries per load/store we end up with
alias stmt walking : 1.60 ( 3%) usr 0.05 ( 5%) sys
with 100 SCCVN table lookups the figure is
alias stmt walking : 1.60 ( 3%) usr 0.06 ( 6%) sys
one assumes the lookups are expensive, the other one assumes the walk itself
is.
Increasing the latter to 1000 SCCVN table lookups produces
alias stmt walking : 9.24 (16%) usr 0.18 (19%) sys
which is around the expected 10-fold increase (but still reasonable given
the artificial nature of the testcase). We could also, instead of
limiting each walk to a constant cost, have a per-function budget that
we can use up first before limiting further walks individually (helps
to not regress reasonably sized cases).
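The two limiting schemes discussed above can be sketched as a toy cost model. This assumes each access has a fixed-length chain of candidate definitions and that one query is spent per chain element; `limited_walk`, `walk_function` and the budget accounting are hypothetical names and choices for illustration, not the eventual GCC implementation.

```python
def limited_walk(chain_length, max_queries):
    """Walk up a chain of chain_length candidate definitions, giving up once
    max_queries alias queries were made. Returns (queries_made, completed)."""
    queries = min(chain_length, max_queries)
    return queries, chain_length <= max_queries


def walk_function(chain_lengths, per_access_limit, function_budget=0):
    """Process every access in a function.

    Each access may spend per_access_limit queries; while the shared
    function_budget lasts, an access may draw extra queries from it.
    Returns (total_queries, number_of_completed_walks).
    """
    total = 0
    completed = 0
    budget = function_budget
    for length in chain_lengths:
        allowance = per_access_limit + budget
        used, done = limited_walk(length, allowance)
        budget -= max(0, used - per_access_limit)  # pay back only the excess
        total += used
        completed += int(done)
    return total, completed


small = [150] * 3                 # a reasonably sized function
large = list(range(1, 1001))      # chains that keep getting longer

print(walk_function(small, 100))        # fixed limit only
print(walk_function(small, 100, 1000))  # fixed limit plus shared budget
print(walk_function(large, 100, 1000))
```

With only the fixed limit, the small function loses all three of its walks; with a shared budget they complete, while the total for the large function stays bounded by the per-access limit times the number of accesses plus the budget.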
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (27 preceding siblings ...)
2012-08-21 14:57 ` rguenth at gcc dot gnu.org
@ 2012-08-22 11:43 ` rguenth at gcc dot gnu.org
2012-08-22 13:21 ` rguenth at gcc dot gnu.org
` (16 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-22 11:43 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #27 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-22 11:39:56 UTC ---
The issue with loop header copying is that we use incremental SSA updating
with PHI insertion for each individual loop header copied. Each update
computes dominance frontiers, which is always done for the whole function.
I thought that loop header copying wouldn't need to insert new PHI nodes
and thus can do with TODO_update_ssa_no_phi if we are in loop-closed SSA form,
but apparently that is not the case (I didn't investigate further here,
but possibly that's just because virtual operands are not in loop-closed
SSA form).
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (28 preceding siblings ...)
2012-08-22 11:43 ` rguenth at gcc dot gnu.org
@ 2012-08-22 13:21 ` rguenth at gcc dot gnu.org
2012-08-22 17:34 ` stevenb.gcc at gmail dot com
` (15 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-22 13:21 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #28 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-22 13:17:42 UTC ---
Author: rguenth
Date: Wed Aug 22 13:17:26 2012
New Revision: 190594
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190594
Log:
2012-08-22 Richard Guenther <rguenther@suse.de>
PR tree-optimization/46590
* tree-ssa-alias.h (get_continuation_for_phi): Add alias query
counter output argument.
(walk_non_aliased_vuses): Add alias query counter argument
to the walker callback.
* tree-ssa-alias.c (maybe_skip_until): Add alias query counter
output argument and count alias queries.
(get_continuation_for_phi_1): Likewise.
(get_continuation_for_phi): Likewise.
(walk_non_aliased_vuses): Add alias query counter argument
to the walker callback and allow it to abort the walk by
returning -1.
* tree-ssa-pre.c (translate_vuse_through_block): Adjust.
* tree-ssa-sccvn.c (vn_reference_lookup_2): Add alias query
counter parameter, abort walk if that is bigger than
--param sccvn-max-alias-queries-per-access.
* params.def (sccvn-max-alias-queries-per-access): New param.
* doc/invoke.texi (sccvn-max-alias-queries-per-access): Document.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/doc/invoke.texi
trunk/gcc/params.def
trunk/gcc/tree-ssa-alias.c
trunk/gcc/tree-ssa-alias.h
trunk/gcc/tree-ssa-pre.c
trunk/gcc/tree-ssa-sccvn.c
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (29 preceding siblings ...)
2012-08-22 13:21 ` rguenth at gcc dot gnu.org
@ 2012-08-22 17:34 ` stevenb.gcc at gmail dot com
2012-08-23 7:14 ` rguenther at suse dot de
` (14 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: stevenb.gcc at gmail dot com @ 2012-08-22 17:34 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #29 from stevenb.gcc at gmail dot com <stevenb.gcc at gmail dot com> 2012-08-22 17:33:00 UTC ---
> I thought that loop header copying wouldn't need to insert new PHI nodes
> and thus can do with TODO_update_ssa_no_phi if we are in loop-closed SSA form,
> but apparently that is not the case (I didn't investigate further here,
> but possibly that's just because virtual operands are not in loop-closed
> SSA form).
I'll experiment with VOPs in LC-SSA.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (30 preceding siblings ...)
2012-08-22 17:34 ` stevenb.gcc at gmail dot com
@ 2012-08-23 7:14 ` rguenther at suse dot de
2012-08-23 11:43 ` rguenth at gcc dot gnu.org
` (13 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenther at suse dot de @ 2012-08-23 7:14 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #30 from rguenther at suse dot de <rguenther at suse dot de> 2012-08-23 07:13:13 UTC ---
On Wed, 22 Aug 2012, stevenb.gcc at gmail dot com wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
>
> --- Comment #29 from stevenb.gcc at gmail dot com <stevenb.gcc at gmail dot com> 2012-08-22 17:33:00 UTC ---
> > I thought that loop header copying wouldn't need to insert new PHI nodes
> > and thus can do with TODO_update_ssa_no_phi if we are in loop-closed SSA form,
> > but apparently that is not the case (I didn't investigate further here,
> > but possibly that's just because virtual operands are not in loop-closed
> > SSA form).
>
> I'll experiment with VOPs in LC-SSA.
You might have seen the patch I posted - it passed testing and I will
install it today.
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (31 preceding siblings ...)
2012-08-23 7:14 ` rguenther at suse dot de
@ 2012-08-23 11:43 ` rguenth at gcc dot gnu.org
2012-08-23 13:46 ` steven at gcc dot gnu.org
` (12 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-08-23 11:43 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #31 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-23 11:41:50 UTC ---
Btw, I have experimented with
Index: tree-into-ssa.c
===================================================================
--- tree-into-ssa.c (revision 190594)
+++ tree-into-ssa.c (working copy)
@@ -3224,11 +3224,35 @@ update_ssa (unsigned update_flags)
bitmap_head *dfs;
/* If the caller requested PHI nodes to be added, compute
- dominance frontiers. */
+ dominance frontiers for all interesting blocks
+ up to the dominating start_bb. */
dfs = XNEWVEC (bitmap_head, last_basic_block);
FOR_EACH_BB (bb)
bitmap_initialize (&dfs[bb->index], &bitmap_default_obstack);
- compute_dominance_frontiers (dfs);
+ EXECUTE_IF_SET_IN_BITMAP (blocks_to_update, 0, i, bi)
+ {
+ basic_block b = BASIC_BLOCK (i);
+ edge_iterator ei;
+ edge p;
+ if (EDGE_COUNT (b->preds) >= 2)
+ FOR_EACH_EDGE (p, ei, b->preds)
+ {
+ basic_block runner = p->src;
+ basic_block domsb;
+ if (runner == start_bb)
+ continue;
+
+ domsb = get_immediate_dominator (CDI_DOMINATORS, b);
+ while (runner != domsb)
+ {
+ if (!bitmap_set_bit (&dfs[runner->index],
+ b->index))
+ break;
+ runner = get_immediate_dominator (CDI_DOMINATORS,
+ runner);
+ }
+ }
+ }
if (sbitmap_first_set_bit (old_ssa_names) >= 0)
{
which helps reduce the time spent in computing dominance frontiers. But
as we no longer have bitmaps but bitmap_heads in dfs it's hard to verify
we only ever access dfs for entries we computed ... but we should,
looking at how compute_idf works(?).
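The "runner" walk in the patch above can be sketched outside GCC as follows. This is a minimal standalone model, assuming the CFG is given as plain dictionaries of predecessors and immediate dominators; the `blocks` argument is a stand-in for blocks_to_update, and the start_bb shortcut from the patch is left out.

```python
def dominance_frontiers(preds, idom, blocks=None):
    """Compute dominance frontiers with a runner walk up the dominator tree.

    preds:  dict block -> list of predecessor blocks
    idom:   dict block -> immediate dominator (the entry maps to itself)
    blocks: optional iterable; only these join blocks are considered,
            a toy stand-in for blocks_to_update in the patch.
    """
    frontiers = {b: set() for b in preds}
    for b in (preds if blocks is None else blocks):
        if len(preds[b]) < 2:
            continue  # only join points can appear in a frontier
        for runner in preds[b]:
            while runner != idom[b]:
                if b in frontiers[runner]:
                    break  # an earlier walk already covered the rest
                frontiers[runner].add(b)
                runner = idom[runner]
    return frontiers


# A loop whose body is a diamond: head -> left/right -> join -> latch -> head.
preds = {
    "entry": [],
    "head": ["entry", "latch"],
    "left": ["head"],
    "right": ["head"],
    "join": ["left", "right"],
    "latch": ["join"],
    "exit": ["latch"],
}
idom = {
    "entry": "entry",
    "head": "entry",
    "left": "head",
    "right": "head",
    "join": "head",
    "latch": "join",
    "exit": "latch",
}

full = dominance_frontiers(preds, idom)
partial = dominance_frontiers(preds, idom, blocks=["join"])
print(full)
print(partial)
```

Restricting the walk to the listed join blocks leaves every other frontier set empty, which is where the saving comes from; whether later consumers only ever read the computed entries is the open question raised above.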
^ permalink raw reply [flat|nested] 47+ messages in thread
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
2010-11-21 13:23 [Bug tree-optimization/46590] New: long compile time with -O2 and many loops tkoenig at gcc dot gnu.org
` (32 preceding siblings ...)
2012-08-23 11:43 ` rguenth at gcc dot gnu.org
@ 2012-08-23 13:46 ` steven at gcc dot gnu.org
2012-09-03 10:46 ` rguenth at gcc dot gnu.org
` (11 subsequent siblings)
45 siblings, 0 replies; 47+ messages in thread
From: steven at gcc dot gnu.org @ 2012-08-23 13:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #32 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-23 13:44:53 UTC ---
(In reply to comment #31)
> which helps reducing the time spent in computing dominance frontiers. But
> as we no longer have bitmaps but bitmap_heads in dfs it's hard to verify
> we only ever access dfs for entries we computed ... but we should,
> looking at how compute_idf works(?).
I don't understand this comment. You can still always do:
bitmap_empty_p (&dfs[bb->index]) to see if something was
computed for bb.
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2012-09-03 10:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEW
AssignedTo|rguenth at gcc dot gnu.org |unassigned at gcc dot
| |gnu.org
--- Comment #33 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-03 10:45:06 UTC ---
Not mine anymore.
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
From: matz at gcc dot gnu.org @ 2012-09-03 15:41 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #34 from Michael Matz <matz at gcc dot gnu.org> 2012-09-03 15:39:20 UTC ---
Author: matz
Date: Mon Sep 3 15:39:15 2012
New Revision: 190897
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190897
Log:
PR tree-optimization/46590
* tree-cfg.c (gimple_duplicate_sese_region): Don't update
SSA web here ...
* tree-ssa-loop-ch.c (copy_loop_headers): ... but here.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-cfg.c
trunk/gcc/tree-ssa-loop-ch.c
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2012-09-04 13:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #35 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-04 13:17:53 UTC ---
With -O1 FRE uses loads of memory and compile-time. Not the value-numbering
itself but compute_avail ()! Bah. Of course AVAIL sets get bigger and
bigger going down the dominator tree. For FRE a single domwalk after
SCCVN computing "avail" and doing elimination would be enough. Especially
as all the overhead in AVAIL is just SSA DEFs and their value.
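
The cost described above can be illustrated with a toy model. It is a sketch
under assumptions, not the PRE/FRE code: each block stores only the values it
defines itself, and a query walks the immediate-dominator chain instead of
storing an accumulated set per block. The names avail_by_walk and
avail_accumulated, and the use of a 32-bit mask per block, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* idom[b] is the immediate dominator of block b; the entry block is its
   own immediate dominator.  Bit v of defs[b] is set when block b itself
   defines value number v.  Both arrays are illustrative stand-ins.  */

/* Answer "is value v available at the end of block?" by walking up the
   dominator tree.  Nothing is stored per block beyond defs[].  */
bool avail_by_walk(const int *idom, const uint32_t *defs, int block,
                   unsigned value)
{
  assert(value < 32);
  for (;;) {
    if (defs[block] & (UINT32_C(1) << value))
      return true;
    if (idom[block] == block)
      return false;           /* Reached the entry block without a hit.  */
    block = idom[block];
  }
}

/* The accumulating alternative: the full set for one block is the union
   of defs[] over all its dominators.  Materialising this for every block
   is what makes the sets grow going down the dominator tree.  */
uint32_t avail_accumulated(const int *idom, const uint32_t *defs, int block)
{
  uint32_t set = defs[block];
  while (idom[block] != block) {
    block = idom[block];
    set |= defs[block];
  }
  return set;
}
```

Both functions give the same answers. The trade-off is that the walk pays per
query with the depth of the dominator tree, while the accumulated sets pay in
memory for every block up front.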
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2012-09-05 11:01 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #36 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-05 10:59:52 UTC ---
If I fix that (PR54489) by iterating over immediate dominators when querying
AVAIL_OUT instead of accumulating, then other loop opts quickly take over in
compile-time,
but memory usage stays reasonable at -O1. LIM is now the pass that pushes
memory usage to 1.8GB - all other optimization passes are happy with just
~800MB. The issue with LIM is that it analyzes the whole function instead
of working on outermost loops at a time (PR54488). Then of course IRA
comes along and wrecks memory usage again ... (create_loop_tree_nodes).
One can tame down IRA a bit using -fno-ira-loop-pressure -fira-region=one.
We then arrive at roughly a constant 900MB memory usage for the full(!)
testcase at -O1 and
Execution times (seconds)
phase opt and generate : 495.90 (99%) usr   1.98 (98%) sys 499.91 (99%) wall  870508 kB (92%) ggc
df reaching defs       :  19.16 ( 4%) usr   0.06 ( 3%) sys  19.18 ( 4%) wall       0 kB ( 0%) ggc
alias stmt walking     :  28.75 ( 6%) usr   0.21 (10%) sys  29.12 ( 6%) wall    2336 kB ( 0%) ggc
tree SSA rewrite       :  63.42 (13%) usr   0.02 ( 1%) sys  63.77 (13%) wall   18830 kB ( 2%) ggc
tree SSA incremental   :  74.64 (15%) usr   0.03 ( 1%) sys  74.44 (15%) wall   25886 kB ( 3%) ggc
dominance frontiers    : 101.71 (20%) usr   0.09 ( 4%) sys 102.17 (20%) wall       0 kB ( 0%) ggc
dominance computation  :  52.56 (11%) usr   0.09 ( 4%) sys  53.35 (11%) wall       0 kB ( 0%) ggc
loop invariant motion  : 101.20 (20%) usr   0.10 ( 5%) sys 101.75 (20%) wall    2700 kB ( 0%) ggc
TOTAL                  : 498.79             2.03           502.87             947764 kB
(all entries > 10s)
The incremental SSA stuff is complete loop unrolling / IV canonicalization
which does SSA update once per loop (similar to what loop header copying
formerly did). Fixing that leads to
Execution times (seconds)
phase opt and generate : 214.62 (99%) usr   1.53 (96%) sys 217.41 (99%) wall  870508 kB (92%) ggc
df reaching defs       :  23.07 (11%) usr   0.01 ( 1%) sys  23.10 (10%) wall       0 kB ( 0%) ggc
alias stmt walking     :  28.51 (13%) usr   0.23 (14%) sys  28.93 (13%) wall    2336 kB ( 0%) ggc
loop invariant motion  : 105.43 (48%) usr   0.01 ( 1%) sys 106.22 (48%) wall    2700 kB ( 0%) ggc
TOTAL                  : 217.56             1.59           220.44             947764 kB
so RTL invariant motion is now the main offender ;)
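
Why updating SSA form once per loop hurts can be shown with a toy cost model.
This is a sketch under one loud assumption: that each update does work
proportional to the size of the whole function, as the whole-function
dominance frontier computation discussed earlier in this thread suggests. All
names (struct toy_function, toy_update_ssa and so on) are hypothetical and
none of this is GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a function being optimized.  blocks_scanned counts
   the work done by all updates so far.  */
struct toy_function {
  long n_blocks;
  bool ssa_dirty;
  long blocks_scanned;
};

/* Assumed cost model: an update scans every block of the function.  */
static void toy_update_ssa(struct toy_function *fn)
{
  if (!fn->ssa_dirty)
    return;
  fn->blocks_scanned += fn->n_blocks;
  fn->ssa_dirty = false;
}

/* A loop transformation only marks the SSA form as needing an update.  */
static void toy_transform_loop(struct toy_function *fn)
{
  fn->ssa_dirty = true;
}

/* Update after every loop: total work is n_loops * n_blocks.  */
long run_update_per_loop(long n_blocks, long n_loops)
{
  struct toy_function fn = { n_blocks, false, 0 };
  for (long i = 0; i < n_loops; i++) {
    toy_transform_loop(&fn);
    toy_update_ssa(&fn);
  }
  return fn.blocks_scanned;
}

/* Update once after all loops: total work is at most n_blocks.  */
long run_update_once(long n_blocks, long n_loops)
{
  struct toy_function fn = { n_blocks, false, 0 };
  for (long i = 0; i < n_loops; i++)
    toy_transform_loop(&fn);
  toy_update_ssa(&fn);
  return fn.blocks_scanned;
}
```

With many loops in one large function, as in this testcase, the first variant
is quadratic in practice, while the second stays linear in the function size.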
* [Bug tree-optimization/46590] [4.6/4.7/4.8 Regression] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2012-09-05 13:30 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #37 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-05 13:29:20 UTC ---
Author: rguenth
Date: Wed Sep 5 13:29:13 2012
New Revision: 190978
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190978
Log:
2012-09-05 Richard Guenther <rguenther@suse.de>
PR tree-optimization/46590
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not
update SSA form here.
(canonicalize_induction_variables): Assert we do not need to
update SSA form.
(tree_unroll_loops_completely): Update SSA form here.
* tree-ssa-loop-manip.c (gimple_duplicate_loop_to_header_edge):
Do not verify loop-closed SSA form if SSA form is not up-to-date.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-loop-ivcanon.c
trunk/gcc/tree-ssa-loop-manip.c
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2012-12-03 15:52 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |47344
Target Milestone|4.6.4 |---
Summary|[4.6/4.7/4.8 Regression] |long compile time with -O2
|long compile time with -O2 |and many loops
|and many loops |
--- Comment #38 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-03 15:51:47 UTC ---
Generic compile-time/memory-usage umbrella regression tracking bug.
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2014-01-16 13:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #39 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Thu Jan 16 13:48:51 2014
New Revision: 206663
URL: http://gcc.gnu.org/viewcvs?rev=206663&root=gcc&view=rev
Log:
2014-01-16 Richard Biener <rguenther@suse.de>
PR rtl-optimization/46590
* lcm.c (compute_antinout_edge): Use postorder iteration.
(compute_laterin): Use inverted postorder iteration.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/lcm.c
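
The effect of the iteration order in the log above can be seen on a small
backward dataflow problem. The following is a sketch, not lcm.c: it uses
round-robin sweeps over 32-bit masks, which is an assumption of this example,
and the names toy_cfg, postorder and solve_antin are made up. The equations
are the usual anticipatability ones: ANTOUT is the intersection of ANTIN over
the successors, and ANTIN = ANTLOC | (TRANSP & ANTOUT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 8
#define MAX_SUCCS 2

/* Toy CFG (hypothetical): block 0 is the entry, succs[b] lists successors.  */
struct toy_cfg {
  int n_blocks;
  int n_succs[MAX_BLOCKS];
  int succs[MAX_BLOCKS][MAX_SUCCS];
};

static void dfs(const struct toy_cfg *cfg, int b, bool *seen, int *order,
                int *n)
{
  seen[b] = true;
  for (int k = 0; k < cfg->n_succs[b]; k++)
    if (!seen[cfg->succs[b][k]])
      dfs(cfg, cfg->succs[b][k], seen, order, n);
  order[(*n)++] = b;          /* A block is emitted after its successors.  */
}

/* Store a postorder of the blocks reachable from the entry in order[]
   and return how many blocks were stored.  */
int postorder(const struct toy_cfg *cfg, int *order)
{
  bool seen[MAX_BLOCKS] = { false };
  int n = 0;
  dfs(cfg, 0, seen, order, &n);
  return n;
}

/* Sweep the blocks in the given order until nothing changes.  Returns
   the number of sweeps, including the final one that detects the fixed
   point.  */
int solve_antin(const struct toy_cfg *cfg, const int *order, int n,
                const uint32_t *antloc, const uint32_t *transp,
                uint32_t *antin)
{
  assert(cfg->n_blocks <= MAX_BLOCKS);
  for (int b = 0; b < cfg->n_blocks; b++)
    antin[b] = UINT32_MAX;    /* Optimistic initial value.  */

  int sweeps = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    sweeps++;
    for (int i = 0; i < n; i++) {
      int b = order[i];
      /* Nothing is anticipated after a block without successors.  */
      uint32_t antout = cfg->n_succs[b] ? UINT32_MAX : 0;
      for (int k = 0; k < cfg->n_succs[b]; k++)
        antout &= antin[cfg->succs[b][k]];
      uint32_t in = antloc[b] | (transp[b] & antout);
      if (in != antin[b]) {
        antin[b] = in;
        changed = true;
      }
    }
  }
  return sweeps;
}
```

On a straight chain of four blocks where nothing is anticipated, visiting the
blocks in postorder settles in one sweep plus one confirming sweep, because
each block is visited after its successors. Visiting the same blocks in index
order needs one extra sweep per block for the information to travel back from
the exit.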
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2014-01-16 15:51 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #40 from Richard Biener <rguenth at gcc dot gnu.org> ---
On the full testcase tree LIM uses too much memory (I didn't merge some of
the patches that only benefit this kind of testcase ... bah). Without LIM
we use around 1GB of memory at -O2 and
df reaching defs       :  19.42 (14%) usr  35.49 (82%) sys  55.13 (30%) wall       0 kB ( 0%) ggc
alias stmt walking     :  34.25 (25%) usr   0.47 ( 1%) sys  34.76 (19%) wall    1451 kB ( 0%) ggc
tree CFG cleanup       :   7.75 ( 6%) usr   0.04 ( 0%) sys   7.73 ( 4%) wall    5002 kB ( 1%) ggc
tree PTA               :  22.32 (16%) usr   0.10 ( 0%) sys  22.43 (12%) wall    7882 kB ( 1%) ggc
TOTAL                  : 137.59            43.38           180.98             866898 kB
-O3 is similar.
I have a patch for the memory usage issue.
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2014-01-17 11:37 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #41 from Richard Biener <rguenth at gcc dot gnu.org> ---
Once I have committed the patch, the original testcase will use about 1GB
of memory (regardless of optimization level) on x86_64, 122s at -O1 and 161s
at -Ofast with the following main contributors:
> ./f951 -quiet -Ofast t2.f90 -ftime-report
Execution times (seconds)
phase opt and generate : 159.14 (98%) usr 116.92 (100%) sys 276.41 (99%) wall  821437 kB (92%) ggc
df reaching defs       :  36.37 (22%) usr  91.52  (78%) sys 128.33 (46%) wall       0 kB ( 0%) ggc
df live regs           :   2.60 ( 2%) usr   0.19  ( 0%) sys   2.84 ( 1%) wall       0 kB ( 0%) ggc
alias stmt walking     :  36.03 (22%) usr   0.46  ( 0%) sys  36.53 (13%) wall    1451 kB ( 0%) ggc
parser (global)        :   2.65 ( 2%) usr   0.08  ( 0%) sys   2.74 ( 1%) wall   70763 kB ( 8%) ggc
tree CFG cleanup       :   8.23 ( 5%) usr   0.03  ( 0%) sys   8.19 ( 3%) wall    4982 kB ( 1%) ggc
tree PTA               :  22.95 (14%) usr   0.17  ( 0%) sys  23.12 ( 8%) wall    7882 kB ( 1%) ggc
complete unrolling     :   4.88 ( 3%) usr   0.14  ( 0%) sys   5.08 ( 2%) wall   80683 kB ( 9%) ggc
loop init              :   5.93 ( 4%) usr   0.01  ( 0%) sys   5.92 ( 2%) wall   25175 kB ( 3%) ggc
integrated RA          :   2.97 ( 2%) usr   0.03  ( 0%) sys   3.02 ( 1%) wall   64778 kB ( 7%) ggc
TOTAL                  : 161.83           117.01            279.22             892406 kB
(everything > 1% usr listed)
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2014-01-17 12:09 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #42 from Richard Biener <rguenth at gcc dot gnu.org> ---
Another interesting bit - -O0 takes 17s (and also ~1GB memory) but -Og is
not much faster than -O1 (116s).
> ./f951 -quiet -Og t2.f90 -ftime-report
Execution times (seconds)
df reaching defs       :  62.51 (54%) usr 104.47 (85%) sys 167.73 (70%) wall       0 kB ( 0%) ggc
bah. I didn't constrain RTL optimizers with -Og (seems to be RTL loop
invariant motion here - after all at -Og no gimple loop opts run and
it _can_ move quite some invariants).
Disabling loop2 for -Og improves this to 50s
Execution times (seconds)
alias stmt walking     :  10.11 (20%) usr   0.10 (14%) sys  10.29 (20%) wall     338 kB ( 0%) ggc
parser (global)        :   2.34 ( 5%) usr   0.06 ( 8%) sys   2.39 ( 5%) wall   70763 kB (11%) ggc
tree PTA               :   6.12 (12%) usr   0.04 ( 6%) sys   6.16 (12%) wall    2735 kB ( 0%) ggc
loop init              :   5.78 (11%) usr   0.01 ( 1%) sys   5.78 (11%) wall   24078 kB ( 4%) ggc
integrated RA          :   2.68 ( 5%) usr   0.04 ( 6%) sys   2.73 ( 5%) wall   64596 kB (10%) ggc
TOTAL                  :  50.76             0.71            51.71             646243 kB
I'll push a patch to do that (the above shows PTA which also can be
disabled, and alias stmt walking could be limited some more).
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2014-01-17 14:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
--- Comment #44 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Fri Jan 17 14:49:18 2014
New Revision: 206714
URL: http://gcc.gnu.org/viewcvs?rev=206714&root=gcc&view=rev
Log:
2014-01-17 Richard Biener <rguenther@suse.de>
PR tree-optimization/46590
* opts.c (default_options_table): Add entries for
OPT_fbranch_count_reg, OPT_fmove_loop_invariants and OPT_ftree_pta,
all enabled at -O1 but not for -Og.
* common.opt (fbranch-count-reg): Remove Init(1).
(fmove-loop-invariants): Likewise.
(ftree-pta): Likewise.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/common.opt
trunk/gcc/opts.c
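
The idea in the log above, a table entry that is on from -O1 upwards except at
-Og, can be mocked up as follows. This is a simplified sketch and not the
layout of opts.c: the enum, the struct, the string keys and the entry
"fhypothetical-cheap-pass" are all invented for illustration, and -Og is
modelled here as optimization level 1 with a separate debug flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical level classes; the real table has more of them.  */
enum toy_levels {
  TOY_LEVELS_1_PLUS,            /* -O1 and above, including -Og.  */
  TOY_LEVELS_1_PLUS_NOT_DEBUG   /* -O1 and above, but not -Og.  */
};

struct toy_default {
  enum toy_levels levels;
  const char *flag;
};

/* The three flags named in the commit log, plus one made-up entry that
   stays enabled at -Og for contrast.  */
static const struct toy_default toy_table[] = {
  { TOY_LEVELS_1_PLUS_NOT_DEBUG, "fbranch-count-reg" },
  { TOY_LEVELS_1_PLUS_NOT_DEBUG, "fmove-loop-invariants" },
  { TOY_LEVELS_1_PLUS_NOT_DEBUG, "ftree-pta" },
  { TOY_LEVELS_1_PLUS, "fhypothetical-cheap-pass" },
};

/* Return whether the table turns FLAG on for the given optimization
   level.  debug_level is true for -Og.  Unknown flags are off.  */
bool toy_flag_enabled(const char *flag, int optimize, bool debug_level)
{
  assert(flag != NULL && optimize >= 0);
  for (size_t i = 0; i < sizeof toy_table / sizeof toy_table[0]; i++) {
    if (strcmp(toy_table[i].flag, flag) != 0)
      continue;
    switch (toy_table[i].levels) {
    case TOY_LEVELS_1_PLUS:
      return optimize >= 1;
    case TOY_LEVELS_1_PLUS_NOT_DEBUG:
      return optimize >= 1 && !debug_level;
    }
  }
  return false;
}
```

Moving the defaults from Init(1) in common.opt into such a table is what lets
the flags depend on the optimization level at all. With Init(1) they would be
on unconditionally unless the user turned them off.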
* [Bug tree-optimization/46590] long compile time with -O2 and many loops
From: rguenth at gcc dot gnu.org @ 2014-10-21 9:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |law at gcc dot gnu.org
--- Comment #45 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, now DOM and incremental SSA are the new main offenders (regression from
4.8).
This is maybe the same as PR61515. Compile-time is up to 9 minutes (from two).
Jeff - any progress on PR61515? This is really a major regression for -O1
compile-time. Can we at least disable the offending code at
!flag_expensive_optimizations on trunk and the 4.9 branch? And fix it properly
for GCC 5?