* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
From: Joost.VandeVondele at pci dot uzh.ch @ 2011-01-17 11:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2010-08-29 09:25:52 |2011-01-17 9:25:52
--- Comment #27 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-01-17 11:38:36 UTC ---
Timings with current trunk (release checking):
out.4_3
TOTAL : 34.62 0.43 35.27 837034 kB
out.4_5
TOTAL : 45.30 0.70 46.02 897447 kB
out.trunk
TOTAL : 165.89 0.99 166.97 1743679 kB
So compile time is up about 5x and memory about 2x relative to 4.3.
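The 5x/2x figures follow from the TOTAL lines above. Here is a minimal Python sketch of that arithmetic. It assumes the TOTAL layout shown (usr, sys and wall seconds, then memory in kB). The helper names are made up for illustration.

```python
import re

# TOTAL lines copied from the timings above (usr sys wall, then kB).
TOTALS = {
    "4.3": "TOTAL : 34.62 0.43 35.27 837034 kB",
    "4.5": "TOTAL : 45.30 0.70 46.02 897447 kB",
    "trunk": "TOTAL : 165.89 0.99 166.97 1743679 kB",
}

def parse_total(line):
    """Return (usr, sys, wall, kb) parsed from a -ftime-report TOTAL line."""
    m = re.match(r"TOTAL\s*:\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(\d+)\s*kB", line)
    if m is None:
        raise ValueError("not a TOTAL line: %r" % line)
    usr, sys_, wall = (float(m.group(i)) for i in (1, 2, 3))
    return usr, sys_, wall, int(m.group(4))

def ratio(new, base):
    """Return the (usr time, memory) ratio of `new` relative to `base`."""
    n, b = parse_total(TOTALS[new]), parse_total(TOTALS[base])
    return n[0] / b[0], n[3] / b[3]

time_x, mem_x = ratio("trunk", "4.3")
print(f"trunk vs 4.3: {time_x:.2f}x time, {mem_x:.2f}x memory")
# prints "trunk vs 4.3: 4.79x time, 2.08x memory"
```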
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
From: jakub at gcc dot gnu.org @ 2011-01-21 10:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #28 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-01-21 09:50:25 UTC ---
David, any progress with this?
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
From: xinliangli at gmail dot com @ 2011-01-21 16:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #29 from davidxl <xinliangli at gmail dot com> 2011-01-21 16:27:43 UTC ---
(In reply to comment #28)
> David, any progress with this?
The cost function fix to make sure the solution set does not become too big
will probably be very involved and won't be available in the 4.6 time frame. I
will implement a workaround using Richard's suggestion: terminate the iterating
loop when slow convergence is detected and some limit is reached.
David
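The workaround described above can be sketched generically: stop iterating once convergence is slow and an iteration limit has been reached. This is not GCC's ivopts code. The function name, the `cost`/`step` callbacks and the `max_iters`/`min_gain` thresholds are hypothetical, and costs are assumed to be non-negative.

```python
def improve_until_converged(cost, step, state, max_iters=5, min_gain=0.01):
    """Repeatedly apply `step` while it lowers `cost` (hypothetical sketch).

    Stops when a step no longer lowers the cost, or once at least
    `max_iters` iterations have run and the last step improved the cost
    by less than the relative fraction `min_gain` (slow convergence).
    Returns (final_state, iterations_run).
    """
    current = cost(state)
    iters = 0
    while True:
        candidate = step(state)
        new_cost = cost(candidate)
        if new_cost >= current:
            break  # no further improvement: converged
        gain = (current - new_cost) / current if current > 0 else 1.0
        state, current = candidate, new_cost
        iters += 1
        if iters >= max_iters and gain < min_gain:
            break  # limit reached and convergence is slow: give up early
    return state, iters

# Slow convergence (0.5% gain per step) is cut off at the limit;
# fast convergence (big gains) is allowed to run to completion.
slow_state, slow_iters = improve_until_converged(lambda x: x, lambda x: x * 0.995, 100.0)
fast_state, fast_iters = improve_until_converged(lambda x: x, lambda x: max(x - 10, 0), 100)
print(slow_iters, fast_iters)  # prints "5 10"
```

Requiring both conditions means the limit only bites when progress has stalled. A hard cap alone, like the max-5 experiment in the next comment, would also truncate searches that are still making good progress.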
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2011-01-21 16:46 ` xinliangli at gmail dot com
@ 2011-01-21 20:08 ` xinliangli at gmail dot com
2011-01-21 21:01 ` xinliangli at gmail dot com
` (13 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: xinliangli at gmail dot com @ 2011-01-21 20:08 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #30 from davidxl <xinliangli at gmail dot com> 2011-01-21 19:58:41 UTC ---
(In reply to comment #29)
Two observations:
1) I cannot reproduce Joost's timing (see below). Can someone else measure the
time independently?
2) Limiting the iteration count of the ivopt improvement loop does not help
much: going from unlimited (which can be ~40 iterations in this case) to a
maximum of 5 iterations only cuts total compile time by 2s.
The following is the timing of the trunk compiler. Options:
-O2 -ftime-report -cpp -fbounds-check -g -O3 -ffast-math -funroll-loops
-ftree-vectorize -march=native -ffree-form
parser : 0.67 ( 1%) usr 0.09 ( 6%) sys 0.77 ( 1%) wall
53556 kB ( 5%) ggc
inline heuristics : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
0 kB ( 0%) ggc
tree gimplify : 0.35 ( 1%) usr 0.03 ( 2%) sys 0.38 ( 1%) wall
48426 kB ( 4%) ggc
tree eh : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
tree CFG construction : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
11978 kB ( 1%) ggc
tree CFG cleanup : 0.68 ( 1%) usr 0.02 ( 1%) sys 0.64 ( 1%) wall
2484 kB ( 0%) ggc
tree VRP : 0.83 ( 1%) usr 0.02 ( 1%) sys 1.28 ( 2%) wall
64371 kB ( 6%) ggc
tree copy propagation : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
1267 kB ( 0%) ggc
tree find ref. vars : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
3806 kB ( 0%) ggc
tree PTA : 0.82 ( 1%) usr 0.00 ( 0%) sys 0.80 ( 1%) wall
5497 kB ( 0%) ggc
tree PHI insertion : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
3194 kB ( 0%) ggc
tree SSA rewrite : 0.23 ( 0%) usr 0.01 ( 1%) sys 0.21 ( 0%) wall
14021 kB ( 1%) ggc
tree SSA other : 0.06 ( 0%) usr 0.01 ( 1%) sys 0.09 ( 0%) wall
435 kB ( 0%) ggc
tree SSA incremental : 0.65 ( 1%) usr 0.02 ( 1%) sys 0.65 ( 1%) wall
6735 kB ( 1%) ggc
tree operand scan : 0.37 ( 1%) usr 0.14 ( 9%) sys 0.53 ( 1%) wall
47156 kB ( 4%) ggc
dominator optimization: 0.38 ( 1%) usr 0.02 ( 1%) sys 0.50 ( 1%) wall
6948 kB ( 1%) ggc
tree SRA : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree CCP : 0.93 ( 1%) usr 0.01 ( 1%) sys 1.02 ( 2%) wall
4975 kB ( 0%) ggc
tree PHI const/copy prop: 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
124 kB ( 0%) ggc
tree split crit edges : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
1743 kB ( 0%) ggc
tree reassociation : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.20 ( 0%) wall
5095 kB ( 0%) ggc
tree PRE : 0.64 ( 1%) usr 0.00 ( 0%) sys 0.64 ( 1%) wall
9790 kB ( 1%) ggc
tree FRE : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.31 ( 0%) wall
5410 kB ( 0%) ggc
tree code sinking : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
956 kB ( 0%) ggc
tree linearize phis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
tree forward propagate: 0.17 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
11005 kB ( 1%) ggc
tree phiprop : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree conservative DCE : 0.04 ( 0%) usr 0.02 ( 1%) sys 0.06 ( 0%) wall
944 kB ( 0%) ggc
tree aggressive DCE : 0.31 ( 0%) usr 0.03 ( 2%) sys 0.40 ( 1%) wall
15336 kB ( 1%) ggc
tree DSE : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
225 kB ( 0%) ggc
tree loop bounds : 0.16 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
6744 kB ( 1%) ggc
tree loop invariant motion: 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
485 kB ( 0%) ggc
tree canonical iv : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
3128 kB ( 0%) ggc
scev constant prop : 0.04 ( 0%) usr 0.01 ( 1%) sys 0.03 ( 0%) wall
1924 kB ( 0%) ggc
complete unrolling : 0.79 ( 1%) usr 0.05 ( 3%) sys 0.85 ( 1%) wall
91364 kB ( 8%) ggc
tree vectorization : 0.34 ( 1%) usr 0.00 ( 0%) sys 0.37 ( 1%) wall
25117 kB ( 2%) ggc
tree slp vectorization: 0.41 ( 1%) usr 0.00 ( 0%) sys 0.35 ( 1%) wall
19256 kB ( 2%) ggc
tree loop distribution: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
850 kB ( 0%) ggc
tree iv optimization : 11.14 (18%) usr 0.33 (22%) sys 12.24 (18%) wall
141300 kB (12%) ggc
predictive commoning : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
2696 kB ( 0%) ggc
tree loop init : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
1220 kB ( 0%) ggc
tree loop fini : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree copy headers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
1652 kB ( 0%) ggc
tree SSA uncprop : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree rename SSA copies: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
0 kB ( 0%) ggc
dominance frontiers : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
dominance computation : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.32 ( 0%) wall
0 kB ( 0%) ggc
control dependences : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
out of ssa : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
130 kB ( 0%) ggc
expand vars : 0.09 ( 0%) usr 0.01 ( 1%) sys 0.10 ( 0%) wall
9013 kB ( 1%) ggc
expand : 0.40 ( 1%) usr 0.01 ( 1%) sys 0.52 ( 1%) wall
57975 kB ( 5%) ggc
post expand cleanups : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
3355 kB ( 0%) ggc
lower subreg : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
forward prop : 0.33 ( 1%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall
11129 kB ( 1%) ggc
CSE : 0.97 ( 2%) usr 0.01 ( 1%) sys 0.99 ( 1%) wall
207 kB ( 0%) ggc
dead code elimination : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.32 ( 0%) wall
0 kB ( 0%) ggc
dead store elim1 : 0.49 ( 1%) usr 0.00 ( 0%) sys 0.45 ( 1%) wall
11519 kB ( 1%) ggc
dead store elim2 : 0.41 ( 1%) usr 0.00 ( 0%) sys 0.46 ( 1%) wall
13060 kB ( 1%) ggc
loop analysis : 0.03 ( 0%) usr 0.01 ( 1%) sys 0.02 ( 0%) wall
1626 kB ( 0%) ggc
loop invariant motion : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
505 kB ( 0%) ggc
loop unswitching : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
0 kB ( 0%) ggc
loop unrolling : 1.59 ( 3%) usr 0.02 ( 1%) sys 1.64 ( 2%) wall
102158 kB ( 9%) ggc
CPROP : 0.69 ( 1%) usr 0.02 ( 1%) sys 0.77 ( 1%) wall
13208 kB ( 1%) ggc
PRE : 0.58 ( 1%) usr 0.00 ( 0%) sys 0.58 ( 1%) wall
1030 kB ( 0%) ggc
web : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall
2961 kB ( 0%) ggc
CSE 2 : 0.87 ( 1%) usr 0.01 ( 1%) sys 1.08 ( 2%) wall
1246 kB ( 0%) ggc
branch prediction : 0.12 ( 0%) usr 0.02 ( 1%) sys 0.14 ( 0%) wall
6859 kB ( 1%) ggc
combiner : 1.75 ( 3%) usr 0.03 ( 2%) sys 1.70 ( 3%) wall
39971 kB ( 3%) ggc
if-conversion : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
1398 kB ( 0%) ggc
regmove : 0.26 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
0 kB ( 0%) ggc
integrated RA : 3.24 ( 5%) usr 0.01 ( 1%) sys 3.39 ( 5%) wall
24873 kB ( 2%) ggc
reload : 1.72 ( 3%) usr 0.00 ( 0%) sys 1.72 ( 3%) wall
8401 kB ( 1%) ggc
reload CSE regs : 1.93 ( 3%) usr 0.00 ( 0%) sys 1.75 ( 3%) wall
19943 kB ( 2%) ggc
load CSE after reload : 0.14 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
487 kB ( 0%) ggc
zee : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
31 kB ( 0%) ggc
thread pro- & epilogue: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
3614 kB ( 0%) ggc
combine stack adjustments: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
peephole 2 : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
1907 kB ( 0%) ggc
rename registers : 0.47 ( 1%) usr 0.00 ( 0%) sys 0.49 ( 1%) wall
2169 kB ( 0%) ggc
hard reg cprop : 0.45 ( 1%) usr 0.00 ( 0%) sys 0.45 ( 1%) wall
22 kB ( 0%) ggc
scheduling 2 : 4.47 ( 7%) usr 0.01 ( 1%) sys 4.35 ( 6%) wall
1114 kB ( 0%) ggc
machine dep reorg : 0.35 ( 1%) usr 0.00 ( 0%) sys 0.39 ( 1%) wall
22 kB ( 0%) ggc
reorder blocks : 0.27 ( 0%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall
3129 kB ( 0%) ggc
final : 0.82 ( 1%) usr 0.03 ( 2%) sys 0.83 ( 1%) wall
8473 kB ( 1%) ggc
symout : 0.33 ( 1%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall
53120 kB ( 5%) ggc
variable tracking : 1.34 ( 2%) usr 0.01 ( 1%) sys 1.42 ( 2%) wall
37182 kB ( 3%) ggc
var-tracking dataflow : 2.11 ( 3%) usr 0.00 ( 0%) sys 2.28 ( 3%) wall
0 kB ( 0%) ggc
var-tracking emit : 2.01 ( 3%) usr 0.00 ( 0%) sys 1.89 ( 3%) wall
18854 kB ( 2%) ggc
rest of compilation : 3.44 ( 5%) usr 0.33 (22%) sys 3.46 ( 5%) wall
8050 kB ( 1%) ggc
remove unused locals : 0.47 ( 1%) usr 0.01 ( 1%) sys 0.62 ( 1%) wall
0 kB ( 0%) ggc
address taken : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
unaccounted todo : 0.98 ( 2%) usr 0.08 ( 5%) sys 1.27 ( 2%) wall
8 kB ( 0%) ggc
repair loop structures: 0.07 ( 0%) usr 0.01 ( 1%) sys 0.07 ( 0%) wall
4127 kB ( 0%) ggc
TOTAL : 63.40 1.53 67.47
1152381 kB
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
From: xinliangli at gmail dot com @ 2011-01-21 21:01 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #31 from davidxl <xinliangli at gmail dot com> 2011-01-21 20:08:11 UTC ---
Comparing this timing with the 4.6 results (164s), it looks like many passes
other than ivopt became slower (e.g. IRA increases from 3.5s to 11s); ivopt
only accounts for a small part of the 110s increase.
David
(In reply to comment #18)
> FYI, these are the 4.5 branch timings:
>
> Execution times (seconds)
> garbage collection : 0.47 ( 1%) usr 0.00 ( 0%) sys 0.47 ( 1%) wall
> 0 kB ( 0%) ggc
> callgraph construction: 0.05 ( 0%) usr 0.01 ( 1%) sys 0.09 ( 0%) wall
> 5996 kB ( 1%) ggc
> callgraph optimization: 0.21 ( 0%) usr 0.02 ( 1%) sys 0.26 ( 0%) wall
> 606 kB ( 0%) ggc
> ipa cp : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
> 1381 kB ( 0%) ggc
> ipa reference : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
> 0 kB ( 0%) ggc
> ipa pure const : 0.06 ( 0%) usr 0.01 ( 1%) sys 0.09 ( 0%) wall
> 0 kB ( 0%) ggc
> cfg cleanup : 0.39 ( 1%) usr 0.00 ( 0%) sys 0.51 ( 1%) wall
> 2459 kB ( 0%) ggc
> trivially dead code : 0.34 ( 1%) usr 0.00 ( 0%) sys 0.30 ( 1%) wall
> 0 kB ( 0%) ggc
> df multiple defs : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
> 0 kB ( 0%) ggc
> df reaching defs : 0.33 ( 1%) usr 0.00 ( 0%) sys 0.27 ( 1%) wall
> 0 kB ( 0%) ggc
> df live regs : 2.08 ( 4%) usr 0.01 ( 1%) sys 2.19 ( 4%) wall
> 0 kB ( 0%) ggc
> df live&initialized regs: 0.98 ( 2%) usr 0.00 ( 0%) sys 0.92 ( 2%) wall
> 0 kB ( 0%) ggc
> df use-def / def-use chains: 0.24 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
> 0 kB ( 0%) ggc
> df reg dead/unused notes: 0.93 ( 2%) usr 0.00 ( 0%) sys 1.04 ( 2%) wall
> 5756 kB ( 1%) ggc
> register information : 0.51 ( 1%) usr 0.01 ( 1%) sys 0.39 ( 1%) wall
> 0 kB ( 0%) ggc
> alias analysis : 0.78 ( 1%) usr 0.01 ( 1%) sys 0.91 ( 2%) wall
> 22384 kB ( 3%) ggc
> alias stmt walking : 0.50 ( 1%) usr 0.03 ( 2%) sys 0.38 ( 1%) wall
> 5563 kB ( 1%) ggc
> register scan : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
> 0 kB ( 0%) ggc
> rebuild jump labels : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
> 0 kB ( 0%) ggc
> parser : 0.82 ( 2%) usr 0.13 ( 9%) sys 0.94 ( 2%) wall
> 55603 kB ( 6%) ggc
> inline heuristics : 0.20 ( 0%) usr 0.01 ( 1%) sys 0.16 ( 0%) wall
> 0 kB ( 0%) ggc
> tree gimplify : 0.38 ( 1%) usr 0.03 ( 2%) sys 0.40 ( 1%) wall
> 46588 kB ( 5%) ggc
> tree eh : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
> 0 kB ( 0%) ggc
> tree CFG construction : 0.04 ( 0%) usr 0.02 ( 1%) sys 0.05 ( 0%) wall
> 11964 kB ( 1%) ggc
> tree CFG cleanup : 0.47 ( 1%) usr 0.00 ( 0%) sys 0.79 ( 1%) wall
> 1829 kB ( 0%) ggc
> tree VRP : 1.46 ( 3%) usr 0.05 ( 4%) sys 1.27 ( 2%) wall
> 56376 kB ( 6%) ggc
> tree copy propagation : 0.09 ( 0%) usr 0.02 ( 1%) sys 0.22 ( 0%) wall
> 746 kB ( 0%) ggc
> tree find ref. vars : 0.09 ( 0%) usr 0.01 ( 1%) sys 0.07 ( 0%) wall
> 3806 kB ( 0%) ggc
> tree PTA : 0.30 ( 1%) usr 0.00 ( 0%) sys 0.33 ( 1%) wall
> 3836 kB ( 0%) ggc
> tree PHI insertion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
> 3194 kB ( 0%) ggc
> tree SSA rewrite : 0.24 ( 0%) usr 0.01 ( 1%) sys 0.29 ( 1%) wall
> 13860 kB ( 2%) ggc
> tree SSA other : 0.13 ( 0%) usr 0.02 ( 1%) sys 0.11 ( 0%) wall
> 418 kB ( 0%) ggc
> tree SSA incremental : 0.89 ( 2%) usr 0.06 ( 4%) sys 0.97 ( 2%) wall
> 6811 kB ( 1%) ggc
> tree operand scan : 0.34 ( 1%) usr 0.23 (17%) sys 0.59 ( 1%) wall
> 44776 kB ( 5%) ggc
> dominator optimization: 0.29 ( 1%) usr 0.01 ( 1%) sys 0.35 ( 1%) wall
> 5152 kB ( 1%) ggc
> tree CCP : 0.51 ( 1%) usr 0.02 ( 1%) sys 0.43 ( 1%) wall
> 4620 kB ( 1%) ggc
> tree PHI const/copy prop: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
> 106 kB ( 0%) ggc
> tree split crit edges : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
> 2019 kB ( 0%) ggc
> tree reassociation : 0.12 ( 0%) usr 0.01 ( 1%) sys 0.12 ( 0%) wall
> 2946 kB ( 0%) ggc
> tree PRE : 0.92 ( 2%) usr 0.00 ( 0%) sys 0.95 ( 2%) wall
> 7315 kB ( 1%) ggc
> tree FRE : 0.45 ( 1%) usr 0.04 ( 3%) sys 0.35 ( 1%) wall
> 5518 kB ( 1%) ggc
> tree code sinking : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
> 1400 kB ( 0%) ggc
> tree linearize phis : 0.02 ( 0%) usr 0.01 ( 1%) sys 0.01 ( 0%) wall
> 0 kB ( 0%) ggc
> tree forward propagate: 0.18 ( 0%) usr 0.02 ( 1%) sys 0.16 ( 0%) wall
> 10006 kB ( 1%) ggc
> tree conservative DCE : 0.05 ( 0%) usr 0.01 ( 1%) sys 0.13 ( 0%) wall
> 576 kB ( 0%) ggc
> tree aggressive DCE : 0.28 ( 1%) usr 0.01 ( 1%) sys 0.37 ( 1%) wall
> 8853 kB ( 1%) ggc
> tree buildin call DCE : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
> 0 kB ( 0%) ggc
> tree DSE : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
> 132 kB ( 0%) ggc
> PHI merge : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
> 37 kB ( 0%) ggc
> tree loop bounds : 0.22 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall
> 8266 kB ( 1%) ggc
> tree loop invariant motion: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
> 67 kB ( 0%) ggc
> tree canonical iv : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
> 4779 kB ( 1%) ggc
> scev constant prop : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
> 2345 kB ( 0%) ggc
> tree loop unswitching : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
> 573 kB ( 0%) ggc
> complete unrolling : 1.05 ( 2%) usr 0.11 ( 8%) sys 1.39 ( 3%) wall
> 98553 kB (11%) ggc
> tree vectorization : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
> 883 kB ( 0%) ggc
> tree slp vectorization: 0.61 ( 1%) usr 0.00 ( 0%) sys 0.60 ( 1%) wall
> 53236 kB ( 6%) ggc
> tree iv optimization : 5.80 (11%) usr 0.06 ( 4%) sys 5.94 (11%) wall
> 95356 kB (11%) ggc
> predictive commoning : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
> 1054 kB ( 0%) ggc
> tree loop init : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
> 1339 kB ( 0%) ggc
> tree copy headers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
> 1613 kB ( 0%) ggc
> tree SSA uncprop : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
> 0 kB ( 0%) ggc
> tree rename SSA copies: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
> 0 kB ( 0%) ggc
> dominance frontiers : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
> 0 kB ( 0%) ggc
> dominance computation : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall
> 0 kB ( 0%) ggc
> expand : 3.24 ( 6%) usr 0.07 ( 5%) sys 3.34 ( 6%) wall
> 69633 kB ( 8%) ggc
> lower subreg : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
> 0 kB ( 0%) ggc
> forward prop : 0.48 ( 1%) usr 0.01 ( 1%) sys 0.48 ( 1%) wall
> 9984 kB ( 1%) ggc
> CSE : 0.73 ( 1%) usr 0.00 ( 0%) sys 0.92 ( 2%) wall
> 248 kB ( 0%) ggc
> dead code elimination : 0.24 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 1%) wall
> 0 kB ( 0%) ggc
> dead store elim1 : 0.33 ( 1%) usr 0.01 ( 1%) sys 0.32 ( 1%) wall
> 5987 kB ( 1%) ggc
> dead store elim2 : 0.44 ( 1%) usr 0.02 ( 1%) sys 0.39 ( 1%) wall
> 7831 kB ( 1%) ggc
> loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
> 718 kB ( 0%) ggc
> loop invariant motion : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
> 305 kB ( 0%) ggc
> loop unswitching : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
> 0 kB ( 0%) ggc
> loop unrolling : 0.65 ( 1%) usr 0.00 ( 0%) sys 0.62 ( 1%) wall
> 32780 kB ( 4%) ggc
> CPROP : 0.70 ( 1%) usr 0.00 ( 0%) sys 0.60 ( 1%) wall
> 7825 kB ( 1%) ggc
> PRE : 0.32 ( 1%) usr 0.00 ( 0%) sys 0.33 ( 1%) wall
> 719 kB ( 0%) ggc
> web : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
> 594 kB ( 0%) ggc
> CSE 2 : 0.75 ( 1%) usr 0.01 ( 1%) sys 0.60 ( 1%) wall
> 470 kB ( 0%) ggc
> branch prediction : 0.19 ( 0%) usr 0.01 ( 1%) sys 0.14 ( 0%) wall
> 7344 kB ( 1%) ggc
> combiner : 1.19 ( 2%) usr 0.01 ( 1%) sys 1.33 ( 2%) wall
> 19980 kB ( 2%) ggc
> if-conversion : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
> 746 kB ( 0%) ggc
> regmove : 0.37 ( 1%) usr 0.01 ( 1%) sys 0.33 ( 1%) wall
> 0 kB ( 0%) ggc
> integrated RA : 3.51 ( 7%) usr 0.01 ( 1%) sys 3.74 ( 7%) wall
> 12746 kB ( 1%) ggc
> reload : 2.16 ( 4%) usr 0.02 ( 1%) sys 2.01 ( 4%) wall
> 7755 kB ( 1%) ggc
> reload CSE regs : 1.38 ( 3%) usr 0.00 ( 0%) sys 1.26 ( 2%) wall
> 12331 kB ( 1%) ggc
> load CSE after reload : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
> 162 kB ( 0%) ggc
> thread pro- & epilogue: 0.11 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
> 4370 kB ( 0%) ggc
> if-conversion 2 : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
> 357 kB ( 0%) ggc
> combine stack adjustments: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
> 0 kB ( 0%) ggc
> peephole 2 : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
> 1899 kB ( 0%) ggc
> rename registers : 0.46 ( 1%) usr 0.00 ( 0%) sys 0.55 ( 1%) wall
> 2237 kB ( 0%) ggc
> hard reg cprop : 0.37 ( 1%) usr 0.00 ( 0%) sys 0.48 ( 1%) wall
> 13 kB ( 0%) ggc
> scheduling 2 : 3.30 ( 6%) usr 0.04 ( 3%) sys 3.10 ( 6%) wall
> 1216 kB ( 0%) ggc
> machine dep reorg : 0.38 ( 1%) usr 0.00 ( 0%) sys 0.36 ( 1%) wall
> 11 kB ( 0%) ggc
> reorder blocks : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
> 1283 kB ( 0%) ggc
> final : 0.93 ( 2%) usr 0.07 ( 5%) sys 0.84 ( 2%) wall
> 6610 kB ( 1%) ggc
> symout : 0.30 ( 1%) usr 0.03 ( 2%) sys 0.34 ( 1%) wall
> 27006 kB ( 3%) ggc
> variable tracking : 3.86 ( 7%) usr 0.03 ( 2%) sys 3.99 ( 7%) wall
> 39804 kB ( 4%) ggc
> plugin execution : 0.00 ( 0%) usr 0.01 ( 1%) sys 0.05 ( 0%) wall
> 0 kB ( 0%) ggc
> rest of compilation : 0.00 ( 0%) usr 0.01 ( 1%) sys 0.00 ( 0%) wall
> 0 kB ( 0%) ggc
> TOTAL : 52.50 1.37 53.88
> 893901 kB
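A pass-by-pass comparison of two -ftime-report listings, like the one in comment #31, is easy to script. The sketch below is illustrative only. It parses just the usr column of entries shaped like the ones above, optionally quoted with "> ", and skips the wrapped kB lines. The sample data are four entries copied from the trunk report in comment #30 and the 4.5 report quoted here. These two runs may not come from identical machines or options. The function names are hypothetical.

```python
import re

# A pass entry such as "tree iv optimization : 11.14 (18%) usr ...".
ENTRY = re.compile(r"^(?:>\s*)?(?P<name>[^:]+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr")

def usr_times(report):
    """Map pass name -> user seconds; wrapped kB/ggc lines have no ':' and are skipped."""
    times = {}
    for line in report.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            times[m.group("name")] = float(m.group("usr"))
    return times

def biggest_growth(old_report, new_report, top=3):
    """Return the `top` passes whose user time grew most, largest first."""
    old, new = usr_times(old_report), usr_times(new_report)
    deltas = {name: new[name] - old.get(name, 0.0) for name in new}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]

OLD_45 = """\
> tree iv optimization : 5.80 (11%) usr 0.06 ( 4%) sys 5.94 (11%) wall
> 95356 kB (11%) ggc
> integrated RA : 3.51 ( 7%) usr 0.01 ( 1%) sys 3.74 ( 7%) wall
> scheduling 2 : 3.30 ( 6%) usr 0.04 ( 3%) sys 3.10 ( 6%) wall
> loop unrolling : 0.65 ( 1%) usr 0.00 ( 0%) sys 0.62 ( 1%) wall
"""
NEW_TRUNK = """\
tree iv optimization : 11.14 (18%) usr 0.33 (22%) sys 12.24 (18%) wall
141300 kB (12%) ggc
integrated RA : 3.24 ( 5%) usr 0.01 ( 1%) sys 3.39 ( 5%) wall
scheduling 2 : 4.47 ( 7%) usr 0.01 ( 1%) sys 4.35 ( 6%) wall
loop unrolling : 1.59 ( 3%) usr 0.02 ( 1%) sys 1.64 ( 2%) wall
"""

for name, delta in biggest_growth(OLD_45, NEW_TRUNK):
    print(f"{name}: {delta:+.2f} s")
# tree iv optimization: +5.34 s
# scheduling 2: +1.17 s
# loop unrolling: +0.94 s
```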
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
From: jakub at gcc dot gnu.org @ 2011-01-25 9:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #32 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-01-25 09:02:57 UTC ---
IMHO, for P1 purposes we should only look at compile-time regressions relative
to 4.5 at this point. On the #c1 testcase, with --enable-checking=release builds
of current trunk and the current 4.5 branch on x86_64-linux, I get:
4.6 x86_64 -m64 -O3 -fbounds-check -ftime-report
df live regs : 1.87 ( 3%) usr 0.02 ( 1%) sys 1.66 ( 3%) wall
0 kB ( 0%) ggc
parser : 1.04 ( 2%) usr 0.20 ( 9%) sys 1.24 ( 2%) wall
53425 kB ( 6%) ggc
tree VRP : 1.82 ( 3%) usr 0.09 ( 4%) sys 2.02 ( 3%) wall
63870 kB ( 8%) ggc
tree PTA : 1.02 ( 2%) usr 0.01 ( 0%) sys 0.98 ( 2%) wall
5498 kB ( 1%) ggc
tree SSA incremental : 1.23 ( 2%) usr 0.12 ( 6%) sys 1.11 ( 2%) wall
6733 kB ( 1%) ggc
tree CCP : 1.33 ( 2%) usr 0.03 ( 1%) sys 1.33 ( 2%) wall
4989 kB ( 1%) ggc
complete unrolling : 1.07 ( 2%) usr 0.16 ( 8%) sys 1.28 ( 2%) wall
88755 kB (11%) ggc
tree iv optimization : 10.99 (19%) usr 0.09 ( 4%) sys 11.09 (19%) wall
138994 kB (16%) ggc
CSE : 1.28 ( 2%) usr 0.01 ( 0%) sys 1.28 ( 2%) wall
229 kB ( 0%) ggc
combiner : 2.00 ( 3%) usr 0.00 ( 0%) sys 1.95 ( 3%) wall
31554 kB ( 4%) ggc
integrated RA : 3.68 ( 6%) usr 0.01 ( 0%) sys 3.78 ( 6%) wall
19906 kB ( 2%) ggc
reload : 2.04 ( 4%) usr 0.00 ( 0%) sys 2.18 ( 4%) wall
7106 kB ( 1%) ggc
reload CSE regs : 2.04 ( 4%) usr 0.02 ( 1%) sys 2.01 ( 3%) wall
12188 kB ( 1%) ggc
scheduling 2 : 2.55 ( 4%) usr 0.01 ( 0%) sys 2.61 ( 4%) wall
895 kB ( 0%) ggc
TOTAL : 57.47 2.11 59.60
845009 kB
4.5 x86_64 -m64 -O3 -fbounds-check -ftime-report
df live regs : 1.58 ( 4%) usr 0.00 ( 0%) sys 1.39 ( 3%) wall
0 kB ( 0%) ggc
parser : 1.02 ( 2%) usr 0.18 ( 9%) sys 1.21 ( 3%) wall
55472 kB ( 7%) ggc
tree VRP : 1.39 ( 3%) usr 0.13 ( 6%) sys 1.73 ( 4%) wall
56478 kB ( 8%) ggc
tree PRE : 1.03 ( 2%) usr 0.04 ( 2%) sys 1.24 ( 3%) wall
7286 kB ( 1%) ggc
complete unrolling : 1.32 ( 3%) usr 0.21 (10%) sys 1.55 ( 3%) wall
91137 kB (12%) ggc
tree iv optimization : 5.45 (12%) usr 0.09 ( 4%) sys 5.43 (12%) wall
95576 kB (13%) ggc
expand : 2.62 ( 6%) usr 0.16 ( 8%) sys 2.76 ( 6%) wall
58104 kB ( 8%) ggc
CSE : 1.18 ( 3%) usr 0.01 ( 0%) sys 0.94 ( 2%) wall
261 kB ( 0%) ggc
combiner : 1.53 ( 3%) usr 0.00 ( 0%) sys 1.48 ( 3%) wall
19953 kB ( 3%) ggc
integrated RA : 3.21 ( 7%) usr 0.00 ( 0%) sys 3.55 ( 8%) wall
11410 kB ( 2%) ggc
reload : 2.13 ( 5%) usr 0.04 ( 2%) sys 2.00 ( 4%) wall
7273 kB ( 1%) ggc
reload CSE regs : 1.67 ( 4%) usr 0.01 ( 0%) sys 1.55 ( 3%) wall
10032 kB ( 1%) ggc
scheduling 2 : 2.65 ( 6%) usr 0.02 ( 1%) sys 2.66 ( 6%) wall
1063 kB ( 0%) ggc
TOTAL : 44.55 2.05 46.62
747832 kB
4.6 x86_64 -m32 -O3 -fbounds-check -ftime-report
df live regs : 1.24 ( 2%) usr 0.02 ( 1%) sys 1.05 ( 2%) wall
0 kB ( 0%) ggc
parser : 1.05 ( 2%) usr 0.18 ( 9%) sys 1.23 ( 2%) wall
53861 kB ( 7%) ggc
tree VRP : 1.48 ( 3%) usr 0.05 ( 2%) sys 1.78 ( 3%) wall
52970 kB ( 7%) ggc
tree iv optimization : 9.92 (19%) usr 0.15 ( 7%) sys 9.98 (18%) wall
125735 kB (17%) ggc
CSE : 1.46 ( 3%) usr 0.00 ( 0%) sys 1.42 ( 3%) wall
329 kB ( 0%) ggc
combiner : 1.41 ( 3%) usr 0.01 ( 0%) sys 1.35 ( 2%) wall
20981 kB ( 3%) ggc
integrated RA : 2.89 ( 6%) usr 0.00 ( 0%) sys 2.83 ( 5%) wall
14083 kB ( 2%) ggc
reload : 2.59 ( 5%) usr 0.02 ( 1%) sys 2.58 ( 5%) wall
18918 kB ( 3%) ggc
reload CSE regs : 2.62 ( 5%) usr 0.00 ( 0%) sys 2.91 ( 5%) wall
13557 kB ( 2%) ggc
scheduling 2 : 2.49 ( 5%) usr 0.01 ( 0%) sys 2.45 ( 5%) wall
953 kB ( 0%) ggc
TOTAL : 52.36 2.02 54.39
744417 kB
4.5 x86_64 -m32 -O3 -fbounds-check -ftime-report
df live regs : 1.41 ( 3%) usr 0.02 ( 1%) sys 1.43 ( 3%) wall
0 kB ( 0%) ggc
parser : 1.02 ( 2%) usr 0.18 ( 9%) sys 1.19 ( 2%) wall
55913 kB ( 8%) ggc
tree VRP : 1.44 ( 3%) usr 0.14 ( 7%) sys 1.39 ( 3%) wall
54451 kB ( 8%) ggc
tree iv optimization : 7.76 (17%) usr 0.11 ( 5%) sys 8.02 (17%) wall
107362 kB (15%) ggc
expand : 2.66 ( 6%) usr 0.08 ( 4%) sys 2.73 ( 6%) wall
56088 kB ( 8%) ggc
CSE : 1.41 ( 3%) usr 0.00 ( 0%) sys 1.31 ( 3%) wall
480 kB ( 0%) ggc
integrated RA : 2.88 ( 6%) usr 0.00 ( 0%) sys 2.78 ( 6%) wall
9890 kB ( 1%) ggc
reload : 2.71 ( 6%) usr 0.05 ( 2%) sys 2.68 ( 6%) wall
20135 kB ( 3%) ggc
reload CSE regs : 1.98 ( 4%) usr 0.00 ( 0%) sys 2.00 ( 4%) wall
13166 kB ( 2%) ggc
scheduling 2 : 2.67 ( 6%) usr 0.04 ( 2%) sys 2.77 ( 6%) wall
840 kB ( 0%) ggc
TOTAL : 46.38 2.08 48.48
708175 kB
(Listing only lines with times >= 1 s.) For x86_64 -m32 it doesn't seem to be
a big deal, and even the 4.6 numbers are nowhere near the claimed 3x increase:
it is a 30% slowdown, and only half of the slowdown can actually be attributed
to ivopts. On the #c5 testcase, though, ivopts still takes > 50% of the
reported time. To me this sounds P2-ish, but I'll let Richard chime in...
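The slowdown figures can be checked against the TOTAL and "tree iv optimization" usr times listed above. Here is a small sketch of that arithmetic. The dictionary layout and function name are just for illustration.

```python
# (TOTAL usr seconds, "tree iv optimization" usr seconds) from the listings above.
RUNS = {
    "-m64": {"4.5": (44.55, 5.45), "4.6": (57.47, 10.99)},
    "-m32": {"4.5": (46.38, 7.76), "4.6": (52.36, 9.92)},
}

def slowdown(mode):
    """Return (relative slowdown of 4.6 vs 4.5, share of the extra time in ivopts)."""
    old_total, old_iv = RUNS[mode]["4.5"]
    new_total, new_iv = RUNS[mode]["4.6"]
    extra = new_total - old_total
    return extra / old_total, (new_iv - old_iv) / extra

for mode in RUNS:
    rel, iv_share = slowdown(mode)
    print(f"{mode}: {rel:.0%} slower, ivopts accounts for {iv_share:.0%} of it")
# -m64: 29% slower, ivopts accounts for 43% of it
# -m32: 13% slower, ivopts accounts for 36% of it
```

So the "30% slowdown, half of it ivopts" reading matches the -m64 numbers, while -m32 regresses considerably less.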
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (5 preceding siblings ...)
2011-01-25 9:47 ` jakub at gcc dot gnu.org
@ 2011-01-25 9:51 ` Joost.VandeVondele at pci dot uzh.ch
2011-01-25 10:03 ` jakub at gcc dot gnu.org
` (10 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2011-01-25 9:51 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #33 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-01-25 09:47:10 UTC ---
Note that the timings reported by David and Jakub were not obtained with the
compile options I originally reported.
With 4.6 (20110117) I now have:
gfortran -c -ftime-report -cpp -fbounds-check -g -O3 -ffast-math -funroll-loops \
-ftree-vectorize -march=native -ffree-form PR45422.F90
TOTAL : 102.15
whereas with the options used by David and Jakub I get timings similar to theirs:
gfortran -O3 -fbounds-check -ftime-report -c PR45422.F90
TOTAL : 42.87
With 4.5, timings remain ~44s.
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (6 preceding siblings ...)
2011-01-25 9:51 ` Joost.VandeVondele at pci dot uzh.ch
@ 2011-01-25 10:03 ` jakub at gcc dot gnu.org
2011-01-25 10:25 ` Joost.VandeVondele at pci dot uzh.ch
` (9 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-01-25 10:03 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #34 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-01-25 09:52:23 UTC ---
-march=native is ambiguous; please check with -v what is actually being used.
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (7 preceding siblings ...)
2011-01-25 10:03 ` jakub at gcc dot gnu.org
@ 2011-01-25 10:25 ` Joost.VandeVondele at pci dot uzh.ch
2011-01-25 17:58 ` xinliangli at gmail dot com
` (8 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: Joost.VandeVondele at pci dot uzh.ch @ 2011-01-25 10:25 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #35 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-01-25 10:03:02 UTC ---
(In reply to comment #34)
> -march=native is ambiguous, please see with -v what actually is being used.
This was mentioned in the initial comment:
-march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64
--param l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8
The latest timings are on a newer machine (the old one is gone now), which has:
-march=amdfam10 -mcx16 -msahf -mpopcnt -mabm --param l1-cache-size=64
--param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=amdfam10
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (8 preceding siblings ...)
2011-01-25 10:25 ` Joost.VandeVondele at pci dot uzh.ch
@ 2011-01-25 17:58 ` xinliangli at gmail dot com
2011-01-27 16:03 ` jakub at gcc dot gnu.org
` (7 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: xinliangli at gmail dot com @ 2011-01-25 17:58 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #36 from davidxl <xinliangli at gmail dot com> 2011-01-25 17:28:30 UTC ---
(In reply to comment #35)
I did use the options you originally posted: "-ftime-report -cpp -fbounds-check
-g -O3 -ffast-math -funroll-loops -ftree-vectorize -march=native -ffree-form".
The timing is consistently 58s on my 2.4GHz Core 2 box, and 42s on the 2.67GHz
Xeon machine.
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (9 preceding siblings ...)
2011-01-25 17:58 ` xinliangli at gmail dot com
@ 2011-01-27 16:03 ` jakub at gcc dot gnu.org
2011-01-27 16:17 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-01-27 16:03 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #37 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-01-27 15:55:57 UTC ---
/usr/src/gcc/objr/gcc/f951 -quiet -ftime-report -fbounds-check -g -O3
-ffast-math -funroll-loops -ftree-vectorize -march=amdfam10 pr45422.f90 2>&1 |
grep ':[ ]*[1-9]\|TOTAL'
garbage collection : 1.34 ( 1%) usr 0.00 ( 0%) sys 1.32 ( 1%) wall
0 kB ( 0%) ggc
cfg cleanup : 2.24 ( 2%) usr 0.01 ( 0%) sys 2.26 ( 2%) wall
7301 kB ( 0%) ggc
df reaching defs : 1.46 ( 1%) usr 0.02 ( 1%) sys 1.34 ( 1%) wall
0 kB ( 0%) ggc
df live regs : 8.28 ( 6%) usr 0.02 ( 1%) sys 8.49 ( 6%) wall
0 kB ( 0%) ggc
df live&initialized regs: 2.46 ( 2%) usr 0.00 ( 0%) sys 2.98 ( 2%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 1.31 ( 1%) usr 0.00 ( 0%) sys 1.13 ( 1%)
wall 0 kB ( 0%) ggc
df reg dead/unused notes: 4.01 ( 3%) usr 0.00 ( 0%) sys 4.03 ( 3%) wall
7770 kB ( 0%) ggc
register information : 1.48 ( 1%) usr 0.00 ( 0%) sys 1.53 ( 1%) wall
0 kB ( 0%) ggc
alias analysis : 1.86 ( 1%) usr 0.00 ( 0%) sys 1.89 ( 1%) wall
46655 kB ( 3%) ggc
tree VRP : 2.25 ( 2%) usr 0.08 ( 4%) sys 2.27 ( 2%) wall
74472 kB ( 4%) ggc
tree SSA incremental : 1.43 ( 1%) usr 0.25 (11%) sys 1.34 ( 1%) wall
7187 kB ( 0%) ggc
complete unrolling : 1.19 ( 1%) usr 0.14 ( 6%) sys 1.24 ( 1%) wall
91809 kB ( 5%) ggc
tree prefetching : 1.31 ( 1%) usr 0.12 ( 5%) sys 1.50 ( 1%) wall
92179 kB ( 5%) ggc
tree iv optimization : 15.43 (11%) usr 0.09 ( 4%) sys 15.62 (11%) wall
303704 kB (17%) ggc
expand : 1.11 ( 1%) usr 0.03 ( 1%) sys 1.11 ( 1%) wall
81489 kB ( 5%) ggc
forward prop : 1.17 ( 1%) usr 0.01 ( 0%) sys 1.19 ( 1%) wall
16030 kB ( 1%) ggc
CSE : 1.58 ( 1%) usr 0.01 ( 0%) sys 1.42 ( 1%) wall
667 kB ( 0%) ggc
dead code elimination : 1.24 ( 1%) usr 0.00 ( 0%) sys 1.30 ( 1%) wall
0 kB ( 0%) ggc
dead store elim1 : 1.37 ( 1%) usr 0.00 ( 0%) sys 1.31 ( 1%) wall
23509 kB ( 1%) ggc
dead store elim2 : 1.10 ( 1%) usr 0.00 ( 0%) sys 1.08 ( 1%) wall
22323 kB ( 1%) ggc
loop unrolling : 3.99 ( 3%) usr 0.03 ( 1%) sys 4.11 ( 3%) wall
185245 kB (11%) ggc
CPROP : 2.25 ( 2%) usr 0.01 ( 0%) sys 2.00 ( 1%) wall
25084 kB ( 1%) ggc
PRE : 1.20 ( 1%) usr 0.00 ( 0%) sys 1.13 ( 1%) wall
1576 kB ( 0%) ggc
web : 1.09 ( 1%) usr 0.00 ( 0%) sys 1.09 ( 1%) wall
8368 kB ( 0%) ggc
CSE 2 : 2.10 ( 2%) usr 0.01 ( 0%) sys 2.17 ( 2%) wall
2122 kB ( 0%) ggc
combiner : 3.97 ( 3%) usr 0.00 ( 0%) sys 3.96 ( 3%) wall
60594 kB ( 3%) ggc
integrated RA : 10.18 ( 7%) usr 0.01 ( 0%) sys 10.27 ( 7%) wall
44477 kB ( 3%) ggc
reload : 6.31 ( 5%) usr 0.01 ( 0%) sys 6.24 ( 4%) wall
10153 kB ( 1%) ggc
reload CSE regs : 4.39 ( 3%) usr 0.01 ( 0%) sys 4.17 ( 3%) wall
37354 kB ( 2%) ggc
rename registers : 1.13 ( 1%) usr 0.00 ( 0%) sys 1.18 ( 1%) wall
2500 kB ( 0%) ggc
scheduling 2 : 5.84 ( 4%) usr 0.02 ( 1%) sys 5.81 ( 4%) wall
1160 kB ( 0%) ggc
final : 4.29 ( 3%) usr 0.04 ( 2%) sys 4.66 ( 3%) wall
10463 kB ( 1%) ggc
variable tracking : 2.76 ( 2%) usr 0.01 ( 0%) sys 2.73 ( 2%) wall
64964 kB ( 4%) ggc
var-tracking dataflow : 3.86 ( 3%) usr 0.02 ( 1%) sys 3.90 ( 3%) wall
0 kB ( 0%) ggc
var-tracking emit : 3.89 ( 3%) usr 0.01 ( 0%) sys 3.85 ( 3%) wall
19488 kB ( 1%) ggc
rest of compilation : 2.27 ( 2%) usr 0.08 ( 4%) sys 2.28 ( 2%) wall
21438 kB ( 1%) ggc
remove unused locals : 1.02 ( 1%) usr 0.01 ( 0%) sys 0.92 ( 1%) wall
0 kB ( 0%) ggc
unaccounted todo : 1.21 ( 1%) usr 0.05 ( 2%) sys 1.19 ( 1%) wall
8 kB ( 0%) ggc
TOTAL : 137.09 2.28 139.39
1741129 kB
/usr/src/gcc-4.5/objr/gcc/f951 -quiet -ftime-report -fbounds-check -g -O3
-ffast-math -funroll-loops -ftree-vectorize -march=amdfam10 pr45422.f90 2>&1 |
grep ':[ ]*[1-9]\|TOTAL'
df live regs : 2.05 ( 4%) usr 0.00 ( 0%) sys 1.95 ( 4%) wall
0 kB ( 0%) ggc
tree VRP : 1.43 ( 3%) usr 0.15 ( 8%) sys 1.47 ( 3%) wall
56376 kB ( 6%) ggc
complete unrolling : 1.14 ( 2%) usr 0.18 (10%) sys 1.39 ( 3%) wall
98554 kB (11%) ggc
tree iv optimization : 5.31 (10%) usr 0.05 ( 3%) sys 5.40 (10%) wall
95356 kB (11%) ggc
expand : 2.98 ( 6%) usr 0.11 ( 6%) sys 3.29 ( 6%) wall
69642 kB ( 8%) ggc
combiner : 1.49 ( 3%) usr 0.00 ( 0%) sys 1.22 ( 2%) wall
19980 kB ( 2%) ggc
integrated RA : 3.60 ( 7%) usr 0.01 ( 1%) sys 3.56 ( 6%) wall
12746 kB ( 1%) ggc
reload : 2.21 ( 4%) usr 0.01 ( 1%) sys 2.18 ( 4%) wall
7748 kB ( 1%) ggc
reload CSE regs : 1.24 ( 2%) usr 0.01 ( 1%) sys 1.16 ( 2%) wall
12330 kB ( 1%) ggc
scheduling 2 : 2.73 ( 5%) usr 0.01 ( 1%) sys 2.88 ( 5%) wall
1218 kB ( 0%) ggc
final : 3.14 ( 6%) usr 0.03 ( 2%) sys 3.16 ( 6%) wall
7438 kB ( 1%) ggc
variable tracking : 4.18 ( 8%) usr 0.03 ( 2%) sys 4.25 ( 8%) wall
40204 kB ( 4%) ggc
TOTAL : 53.49 1.81 55.45
897516 kB
/usr/src/gcc/objr/gcc/f951 -quiet -ftime-report -fbounds-check -g -O3
-ffast-math -funroll-loops -ftree-vectorize -march=amdfam10 pr45422.f90
-fno-ivopts 2>&1 | grep ':[ ]*[1-9]\|TOTAL'
cfg cleanup : 1.83 ( 2%) usr 0.01 ( 0%) sys 1.71 ( 2%) wall
7191 kB ( 1%) ggc
df reaching defs : 1.25 ( 1%) usr 0.00 ( 0%) sys 1.33 ( 1%) wall
0 kB ( 0%) ggc
df live regs : 6.19 ( 6%) usr 0.06 ( 3%) sys 6.34 ( 6%) wall
0 kB ( 0%) ggc
df live&initialized regs: 2.47 ( 2%) usr 0.02 ( 1%) sys 2.06 ( 2%) wall
0 kB ( 0%) ggc
df reg dead/unused notes: 2.82 ( 3%) usr 0.01 ( 0%) sys 2.79 ( 3%) wall
9653 kB ( 1%) ggc
register information : 1.19 ( 1%) usr 0.00 ( 0%) sys 1.21 ( 1%) wall
0 kB ( 0%) ggc
alias analysis : 1.75 ( 2%) usr 0.00 ( 0%) sys 1.81 ( 2%) wall
48001 kB ( 3%) ggc
tree CFG cleanup : 1.08 ( 1%) usr 0.01 ( 0%) sys 1.10 ( 1%) wall
4079 kB ( 0%) ggc
tree VRP : 2.13 ( 2%) usr 0.05 ( 2%) sys 2.43 ( 2%) wall
76935 kB ( 5%) ggc
tree SSA incremental : 1.23 ( 1%) usr 0.20 ( 9%) sys 1.48 ( 1%) wall
7193 kB ( 1%) ggc
tree CCP : 1.00 ( 1%) usr 0.06 ( 3%) sys 1.12 ( 1%) wall
4975 kB ( 0%) ggc
complete unrolling : 1.15 ( 1%) usr 0.16 ( 7%) sys 1.12 ( 1%) wall
91888 kB ( 6%) ggc
tree prefetching : 1.45 ( 1%) usr 0.07 ( 3%) sys 1.38 ( 1%) wall
92015 kB ( 6%) ggc
expand : 1.30 ( 1%) usr 0.03 ( 1%) sys 1.23 ( 1%) wall
98494 kB ( 7%) ggc
forward prop : 1.19 ( 1%) usr 0.01 ( 0%) sys 1.05 ( 1%) wall
17136 kB ( 1%) ggc
CSE : 1.64 ( 2%) usr 0.00 ( 0%) sys 1.51 ( 1%) wall
683 kB ( 0%) ggc
dead store elim1 : 1.08 ( 1%) usr 0.00 ( 0%) sys 1.23 ( 1%) wall
24050 kB ( 2%) ggc
loop unrolling : 3.38 ( 3%) usr 0.06 ( 3%) sys 3.69 ( 3%) wall
165346 kB (12%) ggc
CPROP : 1.92 ( 2%) usr 0.02 ( 1%) sys 2.11 ( 2%) wall
23260 kB ( 2%) ggc
PRE : 1.23 ( 1%) usr 0.00 ( 0%) sys 1.06 ( 1%) wall
1979 kB ( 0%) ggc
CSE 2 : 1.98 ( 2%) usr 0.01 ( 0%) sys 1.94 ( 2%) wall
2631 kB ( 0%) ggc
combiner : 4.33 ( 4%) usr 0.00 ( 0%) sys 4.40 ( 4%) wall
76446 kB ( 5%) ggc
integrated RA : 8.83 ( 8%) usr 0.03 ( 1%) sys 9.07 ( 8%) wall
46246 kB ( 3%) ggc
reload : 5.16 ( 5%) usr 0.02 ( 1%) sys 5.12 ( 5%) wall
9244 kB ( 1%) ggc
reload CSE regs : 3.47 ( 3%) usr 0.01 ( 0%) sys 3.59 ( 3%) wall
34826 kB ( 2%) ggc
rename registers : 1.21 ( 1%) usr 0.00 ( 0%) sys 1.13 ( 1%) wall
2675 kB ( 0%) ggc
scheduling 2 : 5.54 ( 5%) usr 0.01 ( 0%) sys 5.46 ( 5%) wall
1216 kB ( 0%) ggc
final : 3.94 ( 4%) usr 0.08 ( 4%) sys 4.16 ( 4%) wall
9291 kB ( 1%) ggc
variable tracking : 2.23 ( 2%) usr 0.05 ( 2%) sys 2.42 ( 2%) wall
61607 kB ( 4%) ggc
var-tracking dataflow : 3.97 ( 4%) usr 0.00 ( 0%) sys 3.76 ( 3%) wall
0 kB ( 0%) ggc
var-tracking emit : 3.75 ( 3%) usr 0.01 ( 0%) sys 3.91 ( 4%) wall
21108 kB ( 1%) ggc
rest of compilation : 1.84 ( 2%) usr 0.08 ( 4%) sys 2.11 ( 2%) wall
16864 kB ( 1%) ggc
TOTAL : 107.95 2.25 110.22
1435716 kB
This shows that the ivopts slowdown still isn't that significant; on this
testcase the compiler just slowed down everywhere. Both f951 binaries are
--enable-checking=release. Surprisingly, -fno-whole-file on the trunk doesn't
make any visible difference in compile time.
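Comparing two -ftime-report outputs by eye, as above, is error-prone. Below is a minimal Python sketch that pulls the per-pass user times out of two reports and ranks the passes that grew the most. It assumes the whitespace-collapsed, line-wrapped format shown in this thread, and the helper names are hypothetical (not part of GCC).

```python
import re

# One pass line of -ftime-report output as it appears in this thread, e.g.
# "tree iv optimization : 15.43 (11%) usr 0.09 ( 4%) sys 15.62 (11%) wall"
PASS_RE = re.compile(r"^\s*(?P<name>[^:]+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr")
# The summary line, e.g. "TOTAL : 137.09 2.28 139.39"
TOTAL_RE = re.compile(r"^\s*TOTAL\s*:\s*(?P<usr>\d+\.\d+)")


def parse_time_report(text):
    """Map pass name -> user seconds; wrapped 'kB ... ggc' lines are skipped."""
    times = {}
    for line in text.splitlines():
        m = TOTAL_RE.match(line)
        if m:
            times["TOTAL"] = float(m.group("usr"))
            continue
        m = PASS_RE.match(line)
        if m:
            times[m.group("name")] = float(m.group("usr"))
    return times


def biggest_regressions(old, new, top=5):
    """Rank passes by growth in user time from `old` to `new`.

    A pass missing from one report (e.g. dropped by the grep for
    times >= 1s) counts as 0.0, so its delta is only approximate.
    """
    rows = []
    for name in set(old) | set(new):
        if name == "TOTAL":
            continue
        before, after = old.get(name, 0.0), new.get(name, 0.0)
        rows.append((after - before, name, before, after))
    rows.sort(reverse=True)
    return rows[:top]
```

Applied to the trunk and 4.5 reports from comment #37, this puts tree iv optimization, integrated RA and df live regs at the top. That matches the reading above: ivopts is the largest single contributor, but far from the whole difference.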
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (10 preceding siblings ...)
2011-01-27 16:03 ` jakub at gcc dot gnu.org
@ 2011-01-27 16:17 ` jakub at gcc dot gnu.org
2011-01-27 16:29 ` rguenth at gcc dot gnu.org
` (5 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-01-27 16:17 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #38 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-01-27 16:02:49 UTC ---
The *.gimple dump is roughly the same size in 4.5 and 4.6, but the resulting
assembly is 15MB in 4.5 and 23MB in 4.6 (varying by less than 100KB with
-fno-ivopts). On 4.6, -fno-inline helps neither compile time nor assembly
size, though.
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (11 preceding siblings ...)
2011-01-27 16:17 ` jakub at gcc dot gnu.org
@ 2011-01-27 16:29 ` rguenth at gcc dot gnu.org
2011-01-27 16:31 ` rguenth at gcc dot gnu.org
` (4 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-01-27 16:29 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #39 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-01-27 16:16:48 UTC ---
The size difference is likely from prefetching: without it (-O3 -fbounds-check
-ffast-math -funroll-loops) it's 1.5MB vs. 1.1MB. Prefetching usually causes
another set of loop copies, which are then unrolled at the RTL level. See PR44688.
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (12 preceding siblings ...)
2011-01-27 16:29 ` rguenth at gcc dot gnu.org
@ 2011-01-27 16:31 ` rguenth at gcc dot gnu.org
2011-01-27 16:40 ` jakub at gcc dot gnu.org
` (3 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-01-27 16:31 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #40 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-01-27 16:19:26 UTC ---
Btw, when I remove -fbounds-check the sizes are comparable (without
prefetching), so I guess we are just better at removing bounds checks in 4.6,
and that triggers size-costly loop opts such as vectorization and unrolling.
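Here is a conceptual Python sketch of why better bounds-check removal can expose more loops to vectorization and unrolling. If the first and last index of a counted loop are in range, every index in between is too, so one test before the loop can replace the per-iteration checks. The function names are hypothetical, and this is not how GCC implements it. The sketch only shows the transformation that leaves a plain, check-free loop body.

```python
def sum_checked(a, lo, hi):
    """Per-iteration bounds check, as -fbounds-check conceptually inserts."""
    total = 0
    for i in range(lo, hi):
        if not 0 <= i < len(a):  # tested on every iteration
            raise IndexError(f"index {i} out of bounds")
        total += a[i]
    return total


def sum_hoisted(a, lo, hi):
    """Same observable behaviour with the check hoisted out of the loop.

    Checking the first and last index once replaces the per-iteration
    tests, leaving a plain loop body.
    """
    if lo < hi and not (0 <= lo and hi <= len(a)):
        # Report the first index the checked loop would have rejected.
        bad = lo if not 0 <= lo < len(a) else len(a)
        raise IndexError(f"index {bad} out of bounds")
    total = 0
    for i in range(lo, hi):
        total += a[i]
    return total
```

Both functions return the same sums and raise the same error for the same out-of-range index. Only the second has a loop body free of branches, and that is the shape the vectorizer and unroller can transform.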
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (13 preceding siblings ...)
2011-01-27 16:31 ` rguenth at gcc dot gnu.org
@ 2011-01-27 16:40 ` jakub at gcc dot gnu.org
2011-01-27 16:51 ` rguenth at gcc dot gnu.org
` (2 subsequent siblings)
17 siblings, 0 replies; 28+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-01-27 16:40 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #41 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-01-27 16:28:49 UTC ---
With an additional -fno-prefetch-loop-arrays the TOTAL goes down from that 137s
to 92.23s, and judging from the tree dumps, 4.6 also vectorizes significantly more
than 4.5 (the 4.6 *.ifcvt dump is 4.7MB compared to 5.3MB for 4.5, yet the 4.6
*.vect dump grows to 8.3MB while the 4.5 one stays at 5.3MB).
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (14 preceding siblings ...)
2011-01-27 16:40 ` jakub at gcc dot gnu.org
@ 2011-01-27 16:51 ` rguenth at gcc dot gnu.org
2011-01-27 16:55 ` jakub at gcc dot gnu.org
2011-01-27 17:55 ` xinliangli at gmail dot com
17 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-01-27 16:51 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #42 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-01-27 16:30:52 UTC ---
Comparing -O3 -ffast-math -funroll-loops -fno-inline -fno-partial-inlining
(thus generic arch, without prefetching):
trunk:
df live regs : 4.22 ( 6%) usr 0.04 ( 2%) sys 4.11 ( 5%) wall
0 kB ( 0%) ggc
tree iv optimization : 3.92 ( 5%) usr 0.13 ( 5%) sys 4.29 ( 6%) wall
91066 kB (11%) ggc
integrated RA : 5.57 ( 8%) usr 0.10 ( 4%) sys 5.93 ( 8%) wall
26408 kB ( 3%) ggc
scheduling 2 : 3.73 ( 5%) usr 0.04 ( 2%) sys 3.85 ( 5%) wall
939 kB ( 0%) ggc
TOTAL : 73.68 2.37 76.91
852775 kB
4.5:
df live regs : 4.60 ( 7%) usr 0.02 ( 1%) sys 4.62 ( 6%) wall
0 kB ( 0%) ggc
expand : 3.94 ( 6%) usr 0.17 ( 8%) sys 3.94 ( 6%) wall
62218 kB ( 8%) ggc
integrated RA : 5.73 ( 8%) usr 0.02 ( 1%) sys 5.76 ( 8%) wall
22920 kB ( 3%) ggc
reload : 3.78 ( 5%) usr 0.08 ( 4%) sys 3.86 ( 5%) wall
9291 kB ( 1%) ggc
TOTAL : 68.98 2.01 71.22
828137 kB
It would be nice to confirm that we are indeed much better at optimizing
bounds-checking code. The prefetching issue is tracked as PR44688, so I'd
close this either as a dup or as WONTFIX (it's a feature that we optimize
loops with bounds checking).
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (15 preceding siblings ...)
2011-01-27 16:51 ` rguenth at gcc dot gnu.org
@ 2011-01-27 16:55 ` jakub at gcc dot gnu.org
2011-01-27 17:55 ` xinliangli at gmail dot com
17 siblings, 0 replies; 28+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-01-27 16:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |WONTFIX
--- Comment #43 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-01-27 16:43:17 UTC ---
Yeah, I agree.
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
[not found] <bug-45422-4@http.gcc.gnu.org/bugzilla/>
` (16 preceding siblings ...)
2011-01-27 16:55 ` jakub at gcc dot gnu.org
@ 2011-01-27 17:55 ` xinliangli at gmail dot com
17 siblings, 0 replies; 28+ messages in thread
From: xinliangli at gmail dot com @ 2011-01-27 17:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
--- Comment #44 from davidxl <xinliangli at gmail dot com> 2011-01-27 17:33:42 UTC ---
Nice triaging..
David
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
2010-08-26 18:33 [Bug middle-end/45422] New: [4.6 Regression] compile time increases 8x jv244 at cam dot ac dot uk
@ 2010-08-29 6:38 ` jv244 at cam dot ac dot uk
2010-08-29 9:26 ` rguenth at gcc dot gnu dot org
` (8 subsequent siblings)
9 siblings, 0 replies; 28+ messages in thread
From: jv244 at cam dot ac dot uk @ 2010-08-29 6:38 UTC (permalink / raw)
To: gcc-bugs
------- Comment #16 from jv244 at cam dot ac dot uk 2010-08-29 06:38 -------
adjust summary according to the last timings
--
jv244 at cam dot ac dot uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2010-08-29 05:31:37 |2010-08-29 06:38:26
date| |
Summary|[4.6 Regression] compile |[4.6 Regression] compile
|time increases 5x. |time increases 3x.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
2010-08-26 18:33 [Bug middle-end/45422] New: [4.6 Regression] compile time increases 8x jv244 at cam dot ac dot uk
2010-08-29 6:38 ` [Bug middle-end/45422] [4.6 Regression] compile time increases 3x jv244 at cam dot ac dot uk
@ 2010-08-29 9:26 ` rguenth at gcc dot gnu dot org
2010-08-29 15:07 ` jv244 at cam dot ac dot uk
` (7 subsequent siblings)
9 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-08-29 9:26 UTC (permalink / raw)
To: gcc-bugs
------- Comment #17 from rguenth at gcc dot gnu dot org 2010-08-29 09:25 -------
tree iv optimization : 32.57 (20%) usr 0.10 ( 5%) sys 32.73 (20%) wall
322095 kB (18%) ggc
20% is still completely unreasonable for IV optimization.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |NEW
Last reconfirmed|2010-08-29 06:38:26 |2010-08-29 09:25:52
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
2010-08-26 18:33 [Bug middle-end/45422] New: [4.6 Regression] compile time increases 8x jv244 at cam dot ac dot uk
2010-08-29 6:38 ` [Bug middle-end/45422] [4.6 Regression] compile time increases 3x jv244 at cam dot ac dot uk
2010-08-29 9:26 ` rguenth at gcc dot gnu dot org
@ 2010-08-29 15:07 ` jv244 at cam dot ac dot uk
2010-08-30 3:11 ` davidxl at gcc dot gnu dot org
` (6 subsequent siblings)
9 siblings, 0 replies; 28+ messages in thread
From: jv244 at cam dot ac dot uk @ 2010-08-29 15:07 UTC (permalink / raw)
To: gcc-bugs
------- Comment #18 from jv244 at cam dot ac dot uk 2010-08-29 15:07 -------
FYI, these are the 4.5 branch timings:
Execution times (seconds)
garbage collection : 0.47 ( 1%) usr 0.00 ( 0%) sys 0.47 ( 1%) wall
0 kB ( 0%) ggc
callgraph construction: 0.05 ( 0%) usr 0.01 ( 1%) sys 0.09 ( 0%) wall
5996 kB ( 1%) ggc
callgraph optimization: 0.21 ( 0%) usr 0.02 ( 1%) sys 0.26 ( 0%) wall
606 kB ( 0%) ggc
ipa cp : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
1381 kB ( 0%) ggc
ipa reference : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
0 kB ( 0%) ggc
ipa pure const : 0.06 ( 0%) usr 0.01 ( 1%) sys 0.09 ( 0%) wall
0 kB ( 0%) ggc
cfg cleanup : 0.39 ( 1%) usr 0.00 ( 0%) sys 0.51 ( 1%) wall
2459 kB ( 0%) ggc
trivially dead code : 0.34 ( 1%) usr 0.00 ( 0%) sys 0.30 ( 1%) wall
0 kB ( 0%) ggc
df multiple defs : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
0 kB ( 0%) ggc
df reaching defs : 0.33 ( 1%) usr 0.00 ( 0%) sys 0.27 ( 1%) wall
0 kB ( 0%) ggc
df live regs : 2.08 ( 4%) usr 0.01 ( 1%) sys 2.19 ( 4%) wall
0 kB ( 0%) ggc
df live&initialized regs: 0.98 ( 2%) usr 0.00 ( 0%) sys 0.92 ( 2%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 0.24 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%)
wall 0 kB ( 0%) ggc
df reg dead/unused notes: 0.93 ( 2%) usr 0.00 ( 0%) sys 1.04 ( 2%) wall
5756 kB ( 1%) ggc
register information : 0.51 ( 1%) usr 0.01 ( 1%) sys 0.39 ( 1%) wall
0 kB ( 0%) ggc
alias analysis : 0.78 ( 1%) usr 0.01 ( 1%) sys 0.91 ( 2%) wall
22384 kB ( 3%) ggc
alias stmt walking : 0.50 ( 1%) usr 0.03 ( 2%) sys 0.38 ( 1%) wall
5563 kB ( 1%) ggc
register scan : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
0 kB ( 0%) ggc
rebuild jump labels : 0.19 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
0 kB ( 0%) ggc
parser : 0.82 ( 2%) usr 0.13 ( 9%) sys 0.94 ( 2%) wall
55603 kB ( 6%) ggc
inline heuristics : 0.20 ( 0%) usr 0.01 ( 1%) sys 0.16 ( 0%) wall
0 kB ( 0%) ggc
tree gimplify : 0.38 ( 1%) usr 0.03 ( 2%) sys 0.40 ( 1%) wall
46588 kB ( 5%) ggc
tree eh : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
tree CFG construction : 0.04 ( 0%) usr 0.02 ( 1%) sys 0.05 ( 0%) wall
11964 kB ( 1%) ggc
tree CFG cleanup : 0.47 ( 1%) usr 0.00 ( 0%) sys 0.79 ( 1%) wall
1829 kB ( 0%) ggc
tree VRP : 1.46 ( 3%) usr 0.05 ( 4%) sys 1.27 ( 2%) wall
56376 kB ( 6%) ggc
tree copy propagation : 0.09 ( 0%) usr 0.02 ( 1%) sys 0.22 ( 0%) wall
746 kB ( 0%) ggc
tree find ref. vars : 0.09 ( 0%) usr 0.01 ( 1%) sys 0.07 ( 0%) wall
3806 kB ( 0%) ggc
tree PTA : 0.30 ( 1%) usr 0.00 ( 0%) sys 0.33 ( 1%) wall
3836 kB ( 0%) ggc
tree PHI insertion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
3194 kB ( 0%) ggc
tree SSA rewrite : 0.24 ( 0%) usr 0.01 ( 1%) sys 0.29 ( 1%) wall
13860 kB ( 2%) ggc
tree SSA other : 0.13 ( 0%) usr 0.02 ( 1%) sys 0.11 ( 0%) wall
418 kB ( 0%) ggc
tree SSA incremental : 0.89 ( 2%) usr 0.06 ( 4%) sys 0.97 ( 2%) wall
6811 kB ( 1%) ggc
tree operand scan : 0.34 ( 1%) usr 0.23 (17%) sys 0.59 ( 1%) wall
44776 kB ( 5%) ggc
dominator optimization: 0.29 ( 1%) usr 0.01 ( 1%) sys 0.35 ( 1%) wall
5152 kB ( 1%) ggc
tree CCP : 0.51 ( 1%) usr 0.02 ( 1%) sys 0.43 ( 1%) wall
4620 kB ( 1%) ggc
tree PHI const/copy prop: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
106 kB ( 0%) ggc
tree split crit edges : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
2019 kB ( 0%) ggc
tree reassociation : 0.12 ( 0%) usr 0.01 ( 1%) sys 0.12 ( 0%) wall
2946 kB ( 0%) ggc
tree PRE : 0.92 ( 2%) usr 0.00 ( 0%) sys 0.95 ( 2%) wall
7315 kB ( 1%) ggc
tree FRE : 0.45 ( 1%) usr 0.04 ( 3%) sys 0.35 ( 1%) wall
5518 kB ( 1%) ggc
tree code sinking : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
1400 kB ( 0%) ggc
tree linearize phis : 0.02 ( 0%) usr 0.01 ( 1%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree forward propagate: 0.18 ( 0%) usr 0.02 ( 1%) sys 0.16 ( 0%) wall
10006 kB ( 1%) ggc
tree conservative DCE : 0.05 ( 0%) usr 0.01 ( 1%) sys 0.13 ( 0%) wall
576 kB ( 0%) ggc
tree aggressive DCE : 0.28 ( 1%) usr 0.01 ( 1%) sys 0.37 ( 1%) wall
8853 kB ( 1%) ggc
tree buildin call DCE : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
tree DSE : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
132 kB ( 0%) ggc
PHI merge : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
37 kB ( 0%) ggc
tree loop bounds : 0.22 ( 0%) usr 0.00 ( 0%) sys 0.18 ( 0%) wall
8266 kB ( 1%) ggc
tree loop invariant motion: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%)
wall 67 kB ( 0%) ggc
tree canonical iv : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
4779 kB ( 1%) ggc
scev constant prop : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
2345 kB ( 0%) ggc
tree loop unswitching : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
573 kB ( 0%) ggc
complete unrolling : 1.05 ( 2%) usr 0.11 ( 8%) sys 1.39 ( 3%) wall
98553 kB (11%) ggc
tree vectorization : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
883 kB ( 0%) ggc
tree slp vectorization: 0.61 ( 1%) usr 0.00 ( 0%) sys 0.60 ( 1%) wall
53236 kB ( 6%) ggc
tree iv optimization : 5.80 (11%) usr 0.06 ( 4%) sys 5.94 (11%) wall
95356 kB (11%) ggc
predictive commoning : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
1054 kB ( 0%) ggc
tree loop init : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
1339 kB ( 0%) ggc
tree copy headers : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
1613 kB ( 0%) ggc
tree SSA uncprop : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree rename SSA copies: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
dominance frontiers : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
0 kB ( 0%) ggc
dominance computation : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.26 ( 0%) wall
0 kB ( 0%) ggc
expand : 3.24 ( 6%) usr 0.07 ( 5%) sys 3.34 ( 6%) wall
69633 kB ( 8%) ggc
lower subreg : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
forward prop : 0.48 ( 1%) usr 0.01 ( 1%) sys 0.48 ( 1%) wall
9984 kB ( 1%) ggc
CSE : 0.73 ( 1%) usr 0.00 ( 0%) sys 0.92 ( 2%) wall
248 kB ( 0%) ggc
dead code elimination : 0.24 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 1%) wall
0 kB ( 0%) ggc
dead store elim1 : 0.33 ( 1%) usr 0.01 ( 1%) sys 0.32 ( 1%) wall
5987 kB ( 1%) ggc
dead store elim2 : 0.44 ( 1%) usr 0.02 ( 1%) sys 0.39 ( 1%) wall
7831 kB ( 1%) ggc
loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
718 kB ( 0%) ggc
loop invariant motion : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
305 kB ( 0%) ggc
loop unswitching : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
loop unrolling : 0.65 ( 1%) usr 0.00 ( 0%) sys 0.62 ( 1%) wall
32780 kB ( 4%) ggc
CPROP : 0.70 ( 1%) usr 0.00 ( 0%) sys 0.60 ( 1%) wall
7825 kB ( 1%) ggc
PRE : 0.32 ( 1%) usr 0.00 ( 0%) sys 0.33 ( 1%) wall
719 kB ( 0%) ggc
web : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
594 kB ( 0%) ggc
CSE 2 : 0.75 ( 1%) usr 0.01 ( 1%) sys 0.60 ( 1%) wall
470 kB ( 0%) ggc
branch prediction : 0.19 ( 0%) usr 0.01 ( 1%) sys 0.14 ( 0%) wall
7344 kB ( 1%) ggc
combiner : 1.19 ( 2%) usr 0.01 ( 1%) sys 1.33 ( 2%) wall
19980 kB ( 2%) ggc
if-conversion : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
746 kB ( 0%) ggc
regmove : 0.37 ( 1%) usr 0.01 ( 1%) sys 0.33 ( 1%) wall
0 kB ( 0%) ggc
integrated RA : 3.51 ( 7%) usr 0.01 ( 1%) sys 3.74 ( 7%) wall
12746 kB ( 1%) ggc
reload : 2.16 ( 4%) usr 0.02 ( 1%) sys 2.01 ( 4%) wall
7755 kB ( 1%) ggc
reload CSE regs : 1.38 ( 3%) usr 0.00 ( 0%) sys 1.26 ( 2%) wall
12331 kB ( 1%) ggc
load CSE after reload : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
162 kB ( 0%) ggc
thread pro- & epilogue: 0.11 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
4370 kB ( 0%) ggc
if-conversion 2 : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
357 kB ( 0%) ggc
combine stack adjustments: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
peephole 2 : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
1899 kB ( 0%) ggc
rename registers : 0.46 ( 1%) usr 0.00 ( 0%) sys 0.55 ( 1%) wall
2237 kB ( 0%) ggc
hard reg cprop : 0.37 ( 1%) usr 0.00 ( 0%) sys 0.48 ( 1%) wall
13 kB ( 0%) ggc
scheduling 2 : 3.30 ( 6%) usr 0.04 ( 3%) sys 3.10 ( 6%) wall
1216 kB ( 0%) ggc
machine dep reorg : 0.38 ( 1%) usr 0.00 ( 0%) sys 0.36 ( 1%) wall
11 kB ( 0%) ggc
reorder blocks : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
1283 kB ( 0%) ggc
final : 0.93 ( 2%) usr 0.07 ( 5%) sys 0.84 ( 2%) wall
6610 kB ( 1%) ggc
symout : 0.30 ( 1%) usr 0.03 ( 2%) sys 0.34 ( 1%) wall
27006 kB ( 3%) ggc
variable tracking : 3.86 ( 7%) usr 0.03 ( 2%) sys 3.99 ( 7%) wall
39804 kB ( 4%) ggc
plugin execution : 0.00 ( 0%) usr 0.01 ( 1%) sys 0.05 ( 0%) wall
0 kB ( 0%) ggc
rest of compilation : 0.00 ( 0%) usr 0.01 ( 1%) sys 0.00 ( 0%) wall
0 kB ( 0%) ggc
TOTAL : 52.50 1.37 53.88
893901 kB
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
2010-08-26 18:33 [Bug middle-end/45422] New: [4.6 Regression] compile time increases 8x jv244 at cam dot ac dot uk
` (2 preceding siblings ...)
2010-08-29 15:07 ` jv244 at cam dot ac dot uk
@ 2010-08-30 3:11 ` davidxl at gcc dot gnu dot org
2010-08-30 3:19 ` davidxl at gcc dot gnu dot org
` (5 subsequent siblings)
9 siblings, 0 replies; 28+ messages in thread
From: davidxl at gcc dot gnu dot org @ 2010-08-30 3:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #20 from davidxl at gcc dot gnu dot org 2010-08-30 03:10 -------
(In reply to comment #16)
> adjust summary according to the last timings
>
I am surprised to see such big differences between trunk and previous releases.
Compiling this test case with those options on my core2 box (2.4GHz) took
only 56 seconds, which is comparable with the timing with a 4.4.3 compiler (with
google local patches including ivopt improvements).
David
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
2010-08-26 18:33 [Bug middle-end/45422] New: [4.6 Regression] compile time increases 8x jv244 at cam dot ac dot uk
` (3 preceding siblings ...)
2010-08-30 3:11 ` davidxl at gcc dot gnu dot org
@ 2010-08-30 3:19 ` davidxl at gcc dot gnu dot org
2010-08-30 7:12 ` rguenth at gcc dot gnu dot org
` (4 subsequent siblings)
9 siblings, 0 replies; 28+ messages in thread
From: davidxl at gcc dot gnu dot org @ 2010-08-30 3:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #21 from davidxl at gcc dot gnu dot org 2010-08-30 03:19 -------
(In reply to comment #17)
> tree iv optimization : 32.57 (20%) usr 0.10 ( 5%) sys 32.73 (20%) wall
> 322095 kB (18%) ggc
>
>
> 20% is still completely unreasonable for IV optimization.
>
There was a patch in trunk that may double the time spent in ivopts:
find_optimal_iv_set_1 is run twice, once starting from the original IV set and
once from the full set. This probably needs to be revisited.
David
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
2010-08-26 18:33 [Bug middle-end/45422] New: [4.6 Regression] compile time increases 8x jv244 at cam dot ac dot uk
` (4 preceding siblings ...)
2010-08-30 3:19 ` davidxl at gcc dot gnu dot org
@ 2010-08-30 7:12 ` rguenth at gcc dot gnu dot org
2010-08-30 7:12 ` rguenth at gcc dot gnu dot org
` (3 subsequent siblings)
9 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-08-30 7:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from rguenth at gcc dot gnu dot org 2010-08-30 07:12 -------
(In reply to comment #20)
> (In reply to comment #16)
> > adjust summary according to the last timings
> >
>
> I am surprised to see such big differences between trunk and previous releases.
> Compiling this test case with those options on my core2 box (2.4GHz) took
> only 56 seconds, which is comparable with the timing with a 4.4.3 compiler (with
> google local patches including ivopt improvements).
Of course - because the ivopt improvement patches are the problem.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
2010-08-26 18:33 [Bug middle-end/45422] New: [4.6 Regression] compile time increases 8x jv244 at cam dot ac dot uk
` (5 preceding siblings ...)
2010-08-30 7:12 ` rguenth at gcc dot gnu dot org
@ 2010-08-30 7:12 ` rguenth at gcc dot gnu dot org
2010-08-30 16:41 ` davidxl at gcc dot gnu dot org
` (2 subsequent siblings)
9 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-08-30 7:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from rguenth at gcc dot gnu dot org 2010-08-30 07:11 -------
(In reply to comment #22)
> Given the fact that the solution space is really large, M^N where M is the
> number of candidates and N is the number of uses (here M == 70 and N == 48),
> and the cost function is complicated, it will be challenging to come up with
> an algorithm that converges really fast and, most importantly, 'guarantees' an
> optimal solution.
Well, we can't guarantee an optimal solution. We have to take compile time
into account, which means that O(M^N) is not acceptable; we need to come
up with something that can complete in O((M+N) log (M+N)) time at most.
BTW, I doubt that the solution found is anywhere near optimal for 32-bit
x86: using 15 IVs instead of 2 can't be cheaper.
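To put the numbers from comment #22 in perspective: with M candidates and N uses there are M**N candidate-to-use assignments (70**48 here, an 89-digit number), and still 2**M - 1 non-empty candidate sets. The Python sketch below contrasts exhaustive subset search with a plain greedy heuristic on a toy cost model. The cost model and function names are hypothetical, and the greedy loop is a generic heuristic, not the algorithm in GCC's ivopts. Its worst case is on the order of M**3 * N table lookups, which is polynomial but still above the O((M+N) log (M+N)) bound asked for here.

```python
from itertools import combinations

# Size of the assignment space quoted in comment #22 (M == 70, N == 48).
ASSIGNMENTS = 70 ** 48


def set_cost(cands, use_costs, reg_cost):
    """Toy cost of an IV set (hypothetical model, not GCC's): each use picks
    its cheapest candidate in the set, plus a flat cost per IV kept live."""
    return reg_cost * len(cands) + sum(min(row[c] for c in cands) for row in use_costs)


def exhaustive(num_cands, use_costs, reg_cost):
    """Try every non-empty candidate subset: 2**M - 1 evaluations."""
    best = None
    for k in range(1, num_cands + 1):
        for s in combinations(range(num_cands), k):
            c = set_cost(s, use_costs, reg_cost)
            if best is None or c < best[1]:
                best = (frozenset(s), c)
    return best


def greedy(num_cands, use_costs, reg_cost):
    """Start empty and repeatedly add the single candidate that lowers the
    cost most; stop as soon as no addition helps."""
    chosen, best_cost = frozenset(), None
    while True:
        step = None
        for c in range(num_cands):
            if c in chosen:
                continue
            trial = chosen | {c}
            t = set_cost(trial, use_costs, reg_cost)
            if (best_cost is None or t < best_cost) and (step is None or t < step[1]):
                step = (trial, t)
        if step is None:
            return chosen, best_cost
        chosen, best_cost = step
```

The greedy result is never better than the exhaustive one, and can be worse. It trades optimality for bounded compile time, which is the trade-off argued for above.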
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
2010-08-26 18:33 [Bug middle-end/45422] New: [4.6 Regression] compile time increases 8x jv244 at cam dot ac dot uk
` (6 preceding siblings ...)
2010-08-30 7:12 ` rguenth at gcc dot gnu dot org
@ 2010-08-30 16:41 ` davidxl at gcc dot gnu dot org
2010-08-31 17:45 ` davidxl at gcc dot gnu dot org
2010-09-02 11:25 ` rguenth at gcc dot gnu dot org
9 siblings, 0 replies; 28+ messages in thread
From: davidxl at gcc dot gnu dot org @ 2010-08-30 16:41 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from davidxl at gcc dot gnu dot org 2010-08-30 16:41 -------
(In reply to comment #24)
> (In reply to comment #20)
> > (In reply to comment #16)
> > > adjust summary according to the last timings
> > >
> >
> > I am surprised to see such big differences between trunk and previous releases.
> > Compiling this test case with those options on my core2 box (2.4GHz) took
> > only 56 seconds, which is comparable with the timing with a 4.4.3 compiler (with
> > google local patches including ivopt improvements).
>
> Of course - because the ivopt improvement patches are the problem.
>
It is just that the total time difference in Joost's measurements can be
explained by the ivopt component alone.
David
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
2010-08-26 18:33 [Bug middle-end/45422] New: [4.6 Regression] compile time increases 8x jv244 at cam dot ac dot uk
` (7 preceding siblings ...)
2010-08-30 16:41 ` davidxl at gcc dot gnu dot org
@ 2010-08-31 17:45 ` davidxl at gcc dot gnu dot org
2010-09-02 11:25 ` rguenth at gcc dot gnu dot org
9 siblings, 0 replies; 28+ messages in thread
From: davidxl at gcc dot gnu dot org @ 2010-08-31 17:45 UTC
To: gcc-bugs
------- Comment #26 from davidxl at gcc dot gnu dot org 2010-08-31 17:45 -------
Good observation re. the number of IVs in the final set. This usually points to
some problem/bug in the cost function. I briefly looked at this case -- it
indeed exposes two more bugs in the cost model:
1) the computation costs of all the cost pairs in an assignment cannot simply
be added together, because many rewrite expressions can be commoned.
We now have a mechanism to account for common loop invariants in register
pressure estimation, and this mechanism needs to be extended to computation
cost.
2) the offset is not stripped when computing loop invariant expression ids;
this can cause register pressure to be overestimated. (The case arises more
often with loop unrolling.)
David
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
* [Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
2010-08-26 18:33 [Bug middle-end/45422] New: [4.6 Regression] compile time increases 8x jv244 at cam dot ac dot uk
` (8 preceding siblings ...)
2010-08-31 17:45 ` davidxl at gcc dot gnu dot org
@ 2010-09-02 11:25 ` rguenth at gcc dot gnu dot org
9 siblings, 0 replies; 28+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-09-02 11:25 UTC
To: gcc-bugs
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422