* Huge compile time & run time performance regression 3.3 -> HEAD
@ 2003-05-18 16:10 Richard Guenther
2003-05-18 19:22 ` Richard Guenther
2003-05-19 16:18 ` Matt Austern
0 siblings, 2 replies; 11+ messages in thread
From: Richard Guenther @ 2003-05-18 16:10 UTC (permalink / raw)
To: gcc
Hi!
As 3.3 is now out, I have started comparing 3.3 to HEAD wrt compile time
performance and performance of the resulting code. As always these
comparisons are for a POOMA based scientific application.
I experience a 100% compile time regression (673.50s -> 1284.48s) and
a 12% runtime performance regression (150s -> 171s) when comparing
gcc3.3 to HEAD.
Time reports follow; the most prominent regressions are expand, global CSE
(>300%!), loop analysis and branch prediction.
Compile options are -ftemplate-depth-80 -fno-exceptions -O2 -march=athlon
-funroll-loops -fomit-frame-pointer
Richard.
gcc-3.3:
Execution times (seconds)
garbage collection : 21.82 ( 3%) usr 0.01 ( 0%) sys 22.00 ( 3%)
cfg construction : 3.32 ( 1%) usr 0.08 ( 0%) sys 4.00 ( 1%)
cfg cleanup : 58.32 ( 9%) usr 0.20 ( 1%) sys 60.00 ( 9%)
trivially dead code : 9.11 ( 1%) usr 0.01 ( 0%) sys 9.00 ( 1%)
life analysis : 14.49 ( 2%) usr 0.60 ( 4%) sys 15.00 ( 2%)
life info update : 6.95 ( 1%) usr 0.53 ( 3%) sys 8.50 ( 1%)
preprocessing : 0.52 ( 0%) usr 0.24 ( 1%) sys 1.00 ( 0%)
lexical analysis : 0.31 ( 0%) usr 0.17 ( 1%) sys 1.50 ( 0%)
parser : 13.78 ( 2%) usr 0.55 ( 3%) sys 19.00 ( 3%)
name lookup : 6.88 ( 1%) usr 0.84 ( 5%) sys 4.50 ( 1%)
expand : 29.68 ( 5%) usr 1.63 (10%) sys 36.00 ( 5%)
varconst : 0.32 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%)
integration : 27.52 ( 4%) usr 0.74 ( 5%) sys 27.50 ( 4%)
jump : 17.88 ( 3%) usr 0.26 ( 2%) sys 17.50 ( 3%)
CSE : 23.11 ( 4%) usr 0.05 ( 0%) sys 25.00 ( 4%)
global CSE : 108.29 (16%) usr 1.30 ( 8%) sys 108.00 (16%)
loop analysis : 206.77 (31%) usr 8.24 (51%) sys 213.50 (32%)
CSE 2 : 9.69 ( 1%) usr 0.01 ( 0%) sys 10.00 ( 1%)
branch prediction : 8.94 ( 1%) usr 0.04 ( 0%) sys 9.00 ( 1%)
flow analysis : 1.56 ( 0%) usr 0.02 ( 0%) sys 2.00 ( 0%)
combiner : 5.52 ( 1%) usr 0.07 ( 0%) sys 3.50 ( 1%)
if-conversion : 0.75 ( 0%) usr 0.00 ( 0%) sys 0.50 ( 0%)
regmove : 6.41 ( 1%) usr 0.00 ( 0%) sys 6.00 ( 1%)
mode switching : 0.91 ( 0%) usr 0.00 ( 0%) sys 1.50 ( 0%)
local alloc : 24.06 ( 4%) usr 0.11 ( 1%) sys 23.00 ( 3%)
global alloc : 10.66 ( 2%) usr 0.11 ( 1%) sys 9.50 ( 1%)
reload CSE regs : 15.54 ( 2%) usr 0.05 ( 0%) sys 15.00 ( 2%)
flow 2 : 2.01 ( 0%) usr 0.01 ( 0%) sys 3.00 ( 0%)
if-conversion 2 : 0.24 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%)
peephole 2 : 0.62 ( 0%) usr 0.01 ( 0%) sys 0.50 ( 0%)
rename registers : 2.52 ( 0%) usr 0.03 ( 0%) sys 1.00 ( 0%)
scheduling 2 : 6.27 ( 1%) usr 0.14 ( 1%) sys 7.50 ( 1%)
reorder blocks : 0.21 ( 0%) usr 0.02 ( 0%) sys 0.50 ( 0%)
shorten branches : 1.76 ( 0%) usr 0.01 ( 0%) sys 1.00 ( 0%)
reg stack : 0.53 ( 0%) usr 0.01 ( 0%) sys 1.00 ( 0%)
final : 1.72 ( 0%) usr 0.15 ( 1%) sys 0.50 ( 0%)
rest of compilation : 7.72 ( 1%) usr 0.01 ( 0%) sys 6.50 ( 1%)
TOTAL : 656.71 16.26 673.50
gcc 3.4:
Execution times (seconds)
garbage collection : 25.93 ( 2%) usr 0.12 ( 0%) sys 26.04 ( 2%)
cfg construction : 4.14 ( 0%) usr 0.22 ( 0%) sys 4.06 ( 0%)
cfg cleanup : 8.45 ( 1%) usr 0.68 ( 0%) sys 9.17 ( 1%)
trivially dead code : 9.92 ( 1%) usr 0.16 ( 0%) sys 10.21 ( 1%)
life analysis : 17.45 ( 2%) usr 56.73 (31%) sys 74.21 ( 6%)
life info update : 8.87 ( 1%) usr 47.74 (26%) sys 56.72 ( 4%)
alias analysis : 18.90 ( 2%) usr 1.36 ( 1%) sys 19.57 ( 2%)
register scan : 3.63 ( 0%) usr 0.00 ( 0%) sys 3.80 ( 0%)
rebuild jump labels : 3.17 ( 0%) usr 0.00 ( 0%) sys 3.30 ( 0%)
preprocessing : 0.60 ( 0%) usr 0.24 ( 0%) sys 6.20 ( 0%)
parser : 13.69 ( 1%) usr 0.72 ( 0%) sys 14.31 ( 1%)
name lookup : 9.07 ( 1%) usr 0.72 ( 0%) sys 10.07 ( 1%)
expand : 190.76 (17%) usr 7.66 ( 4%) sys 198.53 (15%)
varconst : 0.30 ( 0%) usr 0.01 ( 0%) sys 0.31 ( 0%)
integration : 27.47 ( 3%) usr 2.63 ( 1%) sys 29.84 ( 2%)
jump : 15.55 ( 1%) usr 1.51 ( 1%) sys 17.02 ( 1%)
CSE : 20.92 ( 2%) usr 0.13 ( 0%) sys 21.00 ( 2%)
global CSE : 365.18 (33%) usr 3.40 ( 2%) sys 368.96 (29%)
loop analysis : 227.82 (21%) usr 57.81 (31%) sys 286.05 (22%)
bypass jumps : 4.69 ( 0%) usr 0.49 ( 0%) sys 5.05 ( 0%)
CSE 2 : 9.27 ( 1%) usr 0.06 ( 0%) sys 9.00 ( 1%)
branch prediction : 20.14 ( 2%) usr 1.57 ( 1%) sys 21.76 ( 2%)
flow analysis : 0.48 ( 0%) usr 0.00 ( 0%) sys 0.42 ( 0%)
combiner : 6.40 ( 1%) usr 0.22 ( 0%) sys 6.81 ( 1%)
if-conversion : 0.52 ( 0%) usr 0.01 ( 0%) sys 0.50 ( 0%)
regmove : 7.61 ( 1%) usr 0.01 ( 0%) sys 7.57 ( 1%)
mode switching : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%)
local alloc : 24.98 ( 2%) usr 0.27 ( 0%) sys 25.20 ( 2%)
global alloc : 12.88 ( 1%) usr 0.26 ( 0%) sys 13.24 ( 1%)
reload CSE regs : 5.70 ( 1%) usr 0.17 ( 0%) sys 5.82 ( 0%)
flow 2 : 1.18 ( 0%) usr 0.00 ( 0%) sys 1.27 ( 0%)
if-conversion 2 : 0.25 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%)
peephole 2 : 0.94 ( 0%) usr 0.00 ( 0%) sys 0.93 ( 0%)
rename registers : 2.86 ( 0%) usr 0.07 ( 0%) sys 2.88 ( 0%)
scheduling 2 : 6.90 ( 1%) usr 0.26 ( 0%) sys 7.24 ( 1%)
reorder blocks : 0.51 ( 0%) usr 0.01 ( 0%) sys 0.64 ( 0%)
shorten branches : 1.99 ( 0%) usr 0.11 ( 0%) sys 2.17 ( 0%)
reg stack : 1.02 ( 0%) usr 0.04 ( 0%) sys 1.18 ( 0%)
final : 2.44 ( 0%) usr 0.25 ( 0%) sys 2.54 ( 0%)
rest of compilation : 10.52 ( 1%) usr 0.07 ( 0%) sys 10.56 ( 1%)
TOTAL : 1093.17 185.71 1284.48
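The per-pass regressions named in the post can be checked against the two time reports above. The following is a minimal Python sketch, not part of the original message: the dictionary names and the `slowdown` and `report` helpers are made up for illustration, and only the third-column (wall clock) figures for the passes called out in the post, plus the TOTAL row, are copied in.

```python
# Wall-clock seconds (third column) copied from the two time reports above.
# Only the passes named in the post and the TOTAL row are included.
WALL_33 = {
    "expand": 36.00,
    "global CSE": 108.00,
    "loop analysis": 213.50,
    "branch prediction": 9.00,
    "TOTAL": 673.50,
}
WALL_HEAD = {
    "expand": 198.53,
    "global CSE": 368.96,
    "loop analysis": 286.05,
    "branch prediction": 21.76,
    "TOTAL": 1284.48,
}


def slowdown(old, new):
    """Return (ratio, percent increase) of `new` relative to `old`."""
    ratio = new / old
    return ratio, (ratio - 1.0) * 100.0


def report(old_times, new_times):
    """Build one (name, old, new, ratio, percent) row per pass in old_times."""
    rows = []
    for name, old in old_times.items():
        new = new_times[name]
        ratio, pct = slowdown(old, new)
        rows.append((name, old, new, ratio, pct))
    return rows


if __name__ == "__main__":
    for name, old, new, ratio, pct in report(WALL_33, WALL_HEAD):
        print(f"{name:18s} {old:8.2f} -> {new:8.2f}  x{ratio:.2f} ({pct:+.0f}%)")
```

With these figures the TOTAL row works out to a factor of about 1.9, i.e. roughly a 91% increase, which the post rounds to 100%.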
* Re: Huge compile time & run time performance regression 3.3 -> HEAD
2003-05-18 16:10 Huge compile time & run time performance regression 3.3 -> HEAD Richard Guenther
@ 2003-05-18 19:22 ` Richard Guenther
2003-05-18 19:31 ` Richard Guenther
2003-05-19 16:18 ` Matt Austern
1 sibling, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2003-05-18 19:22 UTC (permalink / raw)
To: gcc
On Sun, 18 May 2003, Richard Guenther wrote:
> Hi!
>
> As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
> performance and performance of the resulting code. As always these
> comparisons are for a POOMA based scientific application.
>
> I experience a 100% compile time regression (673.50s -> 1284.48s) and
> a 12% runtime performance regression (150s -> 171s) when comparing
> gcc3.3 to HEAD.
>
> Time reports follow, the most prominent regressions are expand, global CSE
> (>300%!), loop analysis and branch prediction.
Top of a profile of gcc3.4 is
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
7.47 79.18 79.18 906 0.09 0.09 compute_store_table
6.92 152.54 73.36 262785751 0.00 0.00 expr_equiv_p
6.76 224.21 71.67 170098 0.00 0.00 fixup_var_refs_insns
5.58 283.38 59.17 12865 0.00 0.00 loop_regs_scan
4.39 329.97 46.59 364279 0.00 0.00 compute_transp
4.10 373.46 43.49 308470718 0.00 0.00 splay_tree_splay_helper
3.27 408.15 34.69 201411415 0.00 0.00 fixup_var_refs_1
3.02 440.22 32.07 192083288 0.00 0.00 true_dependence
2.88 470.76 30.54 232168274 0.00 0.00 mems_in_disjoint_alias_sets_p
2.39 496.11 25.35 10203 0.00 0.00 loop_regs_update
2.36 521.17 25.06 686 0.04 0.04 build_store_vectors
2.00 542.36 21.19 3266323 0.00 0.00 invalid_mode_change_p
1.95 563.04 20.68 374046 0.00 0.00 alloc_page
1.86 582.79 19.75 180849685 0.00 0.00 find_loads
1.47 598.39 15.60 214475 0.00 0.00 ldst_entry
1.28 611.95 13.56 4111368 0.00 0.00 gt_ggc_mx_lang_tree_node
1.04 622.99 11.04 201411428 0.00 0.00 fixup_var_refs_insn
1.03 633.94 10.95 906 0.01 0.01 store_motion
1.03 644.86 10.92 18177 0.00 0.00 fixup_var_refs
1.01 655.57 10.71 1468353297 0.00 0.00 splay_tree_compare_ints
1.00 666.14 10.57 15850095 0.00 0.00 walk_tree
The full profile can be fetched from
http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/profile_34.gz
Richard.
* Re: Huge compile time & run time performance regression 3.3 -> HEAD
2003-05-18 19:22 ` Richard Guenther
@ 2003-05-18 19:31 ` Richard Guenther
2003-05-18 19:35 ` Richard Guenther
0 siblings, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2003-05-18 19:31 UTC (permalink / raw)
To: gcc
On Sun, 18 May 2003, Richard Guenther wrote:
> On Sun, 18 May 2003, Richard Guenther wrote:
>
> > Hi!
> >
> > As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
> > performance and performance of the resulting code. As always these
> > comparisons are for a POOMA based scientific application.
> >
> > I experience a 100% compile time regression (673.50s -> 1284.48s) and
> > a 12% runtime performance regression (150s -> 171s) when comparing
> > gcc3.3 to HEAD.
> >
> > Time reports follow, the most prominent regressions are expand, global CSE
> > (>300%!), loop analysis and branch prediction.
>
> Top of a profile of gcc3.4 is
>
> Flat profile:
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls s/call s/call name
> 7.47 79.18 79.18 906 0.09 0.09 compute_store_table
> 6.92 152.54 73.36 262785751 0.00 0.00 expr_equiv_p
> 6.76 224.21 71.67 170098 0.00 0.00 fixup_var_refs_insns
> 5.58 283.38 59.17 12865 0.00 0.00 loop_regs_scan
> 4.39 329.97 46.59 364279 0.00 0.00 compute_transp
> 4.10 373.46 43.49 308470718 0.00 0.00 splay_tree_splay_helper
> 3.27 408.15 34.69 201411415 0.00 0.00 fixup_var_refs_1
> 3.02 440.22 32.07 192083288 0.00 0.00 true_dependence
> 2.88 470.76 30.54 232168274 0.00 0.00 mems_in_disjoint_alias_sets_p
Compared to a 3.3 profile:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
8.08 45.30 45.30 272942 0.17 0.17 compute_transp
7.15 85.38 40.08 263012170 0.00 0.00 splay_tree_splay_helper
5.62 116.92 31.54 165540544 0.00 0.00 true_dependence
4.72 143.36 26.44 203228718 0.00 0.00 mems_in_disjoint_alias_sets_p
3.80 164.68 21.32 171556 0.12 0.12 clear_table
3.61 184.92 20.24 9131 2.22 2.69 loop_regs_update
3.46 204.34 19.42 3012195 0.01 0.01 invalid_mode_change_p
3.04 221.36 17.02 605912 0.03 0.04 htab_traverse
2.65 236.24 14.88 372632 0.04 0.04 alloc_page
2.05 247.74 11.50 3644058 0.00 0.01 gt_ggc_mx_lang_tree_node
1.81 257.88 10.14 4977 2.04 2.20 unroll_loop
1.80 267.97 10.09 1344607967 0.00 0.00 splay_tree_compare_ints
1.80 278.04 10.07 105168704 0.00 0.00 htab_find_slot_with_hash
1.59 286.94 8.90 5214165 0.00 0.00 loop_invariant_p
In the gcc3.4 profile,
> 6.92 152.54 73.36 262785751 0.00 0.00 expr_equiv_p
seems to be the culprit.
Full gcc3.3 profile at
http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/profile_33.gz
Richard.
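One way to compare the two flat profiles quoted in this message is to look for functions that are hot in one excerpt but missing from the other. The Python sketch below is an illustration added here, not something from the thread: `parse_flat_profile` and `new_hotspots` are hypothetical helper names, and the sample rows are a handful copied from the excerpts above. It assumes the gprof flat-profile layout seen here (seven whitespace-separated fields with self seconds in the third) and that each function name sits on the same line as its numbers. Absence from the quoted 3.3 excerpt does not prove a function is absent from the full 3.3 profile.

```python
# A few rows copied from the profile excerpts in this message.
PROFILE_34 = """
 7.47 79.18 79.18 906 0.09 0.09 compute_store_table
 6.92 152.54 73.36 262785751 0.00 0.00 expr_equiv_p
 4.39 329.97 46.59 364279 0.00 0.00 compute_transp
 3.02 440.22 32.07 192083288 0.00 0.00 true_dependence
"""
PROFILE_33 = """
 8.08 45.30 45.30 272942 0.17 0.17 compute_transp
 5.62 116.92 31.54 165540544 0.00 0.00 true_dependence
 3.80 164.68 21.32 171556 0.12 0.12 clear_table
"""


def parse_flat_profile(text):
    """Map function name -> self seconds for each well-formed profile row."""
    self_seconds = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # blank line, wrapped name, or other non-row text
        try:
            self_seconds[fields[6]] = float(fields[2])
        except ValueError:
            continue  # column-header row such as "time seconds seconds ..."
    return self_seconds


def new_hotspots(old, new):
    """Functions listed in `new` but not in `old`, hottest first."""
    only_new = [(name, secs) for name, secs in new.items() if name not in old]
    return sorted(only_new, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    old = parse_flat_profile(PROFILE_33)
    new = parse_flat_profile(PROFILE_34)
    for name, secs in new_hotspots(old, new):
        print(f"{name:24s} {secs:8.2f} s self time, not in the 3.3 excerpt")
```

On the sample rows this singles out `compute_store_table` and `expr_equiv_p`, the entries at the top of the 3.4 profile.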
* Re: Huge compile time & run time performance regression 3.3 -> HEAD
2003-05-18 19:31 ` Richard Guenther
@ 2003-05-18 19:35 ` Richard Guenther
0 siblings, 0 replies; 11+ messages in thread
From: Richard Guenther @ 2003-05-18 19:35 UTC (permalink / raw)
To: gcc
On Sun, 18 May 2003, Richard Guenther wrote:
> On Sun, 18 May 2003, Richard Guenther wrote:
>
> > Top of a profile of gcc3.4 is
> >
> > Flat profile:
> >
> > Each sample counts as 0.01 seconds.
> > % cumulative self self total
> > time seconds seconds calls s/call s/call name
> > 7.47 79.18 79.18 906 0.09 0.09 compute_store_table
> > 6.92 152.54 73.36 262785751 0.00 0.00 expr_equiv_p
> > 6.76 224.21 71.67 170098 0.00 0.00 fixup_var_refs_insns
> > 5.58 283.38 59.17 12865 0.00 0.00 loop_regs_scan
> > 4.39 329.97 46.59 364279 0.00 0.00 compute_transp
>
> Compared to a 3.3 profile:
>
> Flat profile:
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 8.08 45.30 45.30 272942 0.17 0.17 compute_transp
> 7.15 85.38 40.08 263012170 0.00 0.00 splay_tree_splay_helper
> 5.62 116.92 31.54 165540544 0.00 0.00 true_dependence
> 4.72 143.36 26.44 203228718 0.00 0.00 mems_in_disjoint_alias_sets_p
> 3.80 164.68 21.32 171556 0.12 0.12 clear_table
> 3.61 184.92 20.24 9131 2.22 2.69 loop_regs_update
Sorry for always talking to myself in public, but doesn't this look
like store motion is the culprit? Wasn't that re-enabled only in 3.4
(despite -fverbose-asm listing -fgcse-sm for both 3.3 and 3.4)?
Richard.
* Re: Huge compile time & run time performance regression 3.3 -> HEAD
2003-05-18 16:10 Huge compile time & run time performance regression 3.3 -> HEAD Richard Guenther
2003-05-18 19:22 ` Richard Guenther
@ 2003-05-19 16:18 ` Matt Austern
2003-05-19 19:01 ` Richard Guenther
2003-05-19 20:51 ` Richard Guenther
1 sibling, 2 replies; 11+ messages in thread
From: Matt Austern @ 2003-05-19 16:18 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc
On Sunday, May 18, 2003, at 08:45 AM, Richard Guenther wrote:
> Hi!
>
> As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
> performance and performance of the resulting code. As always these
> comparisons are for a POOMA based scientific application.
>
> I experience a 100% compile time regression (673.50s -> 1284.48s) and
> a 12% runtime performance regression (150s -> 171s) when comparing
> gcc3.3 to HEAD.
>
> Time reports follow, the most prominent regressions are expand, global
> CSE
> (>300%!), loop analysis and branch prediction.
>
> Compile options are -ftemplate-depth-80 -fno-exceptions -O2
> -march=athlon
> -funroll-loops -fomit-frame-pointer
Do you see any compile time regressions at -O0?
(I'm asking for the obvious reason: trying to find out how much work we
need to do in the front end as opposed to the back end.)
--Matt
* Re: Huge compile time & run time performance regression 3.3 -> HEAD
2003-05-19 16:18 ` Matt Austern
@ 2003-05-19 19:01 ` Richard Guenther
2003-05-19 20:51 ` Richard Guenther
1 sibling, 0 replies; 11+ messages in thread
From: Richard Guenther @ 2003-05-19 19:01 UTC (permalink / raw)
To: Matt Austern; +Cc: gcc
On Mon, 19 May 2003, Matt Austern wrote:
> On Sunday, May 18, 2003, at 08:45 AM, Richard Guenther wrote:
>
> > Hi!
> >
> > As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
> > performance and performance of the resulting code. As always these
> > comparisons are for a POOMA based scientific application.
> >
> > I experience a 100% compile time regression (673.50s -> 1284.48s) and
> > a 12% runtime performance regression (150s -> 171s) when comparing
> > gcc3.3 to HEAD.
> >
> > Time reports follow, the most prominent regressions are expand, global
> > CSE
> > (>300%!), loop analysis and branch prediction.
> >
> > Compile options are -ftemplate-depth-80 -fno-exceptions -O2
> > -march=athlon
> > -funroll-loops -fomit-frame-pointer
>
> Do you see any compile time regressions at -O0?
>
> (I'm asking for the obvious reason: trying to find out how much work we
> need to do in the front end as opposed to the back end.)
For 3.3 with -O0 I get:
Execution times (seconds)
garbage collection : 13.19 ( 2%) usr 0.00 ( 0%) sys 12.50 ( 2%) wall
cfg construction : 2.11 ( 0%) usr 0.02 ( 1%) sys 2.50 ( 0%) wall
cfg cleanup : 0.83 ( 0%) usr 0.00 ( 0%) sys 1.00 ( 0%) wall
trivially dead code : 1.68 ( 0%) usr 0.00 ( 0%) sys 2.00 ( 0%) wall
life analysis : 5.30 ( 1%) usr 0.01 ( 0%) sys 7.00 ( 1%) wall
life info update : 1.46 ( 0%) usr 0.00 ( 0%) sys 1.00 ( 0%) wall
preprocessing : 0.57 ( 0%) usr 0.24 ( 8%) sys 0.50 ( 0%) wall
lexical analysis : 0.44 ( 0%) usr 0.18 ( 6%) sys 0.50 ( 0%) wall
parser : 14.73 ( 2%) usr 0.54 (17%) sys 14.50 ( 2%) wall
name lookup : 7.38 ( 1%) usr 0.95 (30%) sys 7.50 ( 1%) wall
expand : 539.94 (81%) usr 0.54 (17%) sys 540.00 (81%) wall
varconst : 0.72 ( 0%) usr 0.01 ( 0%) sys 1.00 ( 0%) wall
integration : 19.82 ( 3%) usr 0.19 ( 6%) sys 19.50 ( 3%) wall
jump : 2.51 ( 0%) usr 0.00 ( 0%) sys 3.50 ( 1%) wall
flow analysis : 0.89 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
mode switching : 2.33 ( 0%) usr 0.01 ( 0%) sys 3.00 ( 0%) wall
scheduling : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
local alloc : 9.75 ( 1%) usr 0.01 ( 0%) sys 11.00 ( 2%) wall
global alloc : 23.63 ( 4%) usr 0.15 ( 5%) sys 22.50 ( 3%) wall
flow 2 : 1.59 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall
shorten branches : 2.15 ( 0%) usr 0.00 ( 0%) sys 1.50 ( 0%) wall
reg stack : 1.48 ( 0%) usr 0.00 ( 0%) sys 3.00 ( 0%) wall
final : 3.79 ( 1%) usr 0.31 (10%) sys 6.50 ( 1%) wall
rest of compilation : 10.17 ( 2%) usr 0.02 ( 1%) sys 10.00 ( 1%) wall
TOTAL : 666.48 3.19 670.50
For 3.4:
Execution times (seconds)
garbage collection : 12.27 ( 2%) usr 0.10 ( 0%) sys 12.38 ( 2%) wall
cfg construction : 2.25 ( 0%) usr 0.10 ( 0%) sys 2.30 ( 0%) wall
cfg cleanup : 0.99 ( 0%) usr 0.01 ( 0%) sys 0.89 ( 0%) wall
trivially dead code : 1.31 ( 0%) usr 0.00 ( 0%) sys 1.39 ( 0%) wall
life analysis : 5.39 ( 1%) usr 0.12 ( 1%) sys 5.65 ( 1%) wall
life info update : 1.42 ( 0%) usr 0.00 ( 0%) sys 1.49 ( 0%) wall
register scan : 1.34 ( 0%) usr 0.02 ( 0%) sys 1.33 ( 0%) wall
rebuild jump labels : 1.22 ( 0%) usr 0.00 ( 0%) sys 1.23 ( 0%) wall
preprocessing : 0.44 ( 0%) usr 0.17 ( 1%) sys 0.91 ( 0%) wall
parser : 14.65 ( 2%) usr 0.65 ( 3%) sys 14.82 ( 2%) wall
name lookup : 9.66 ( 1%) usr 0.77 ( 4%) sys 10.66 ( 1%) wall
expand : 555.99 (78%) usr 11.15 (54%) sys 567.86 (78%) wall
varconst : 0.73 ( 0%) usr 0.01 ( 0%) sys 0.71 ( 0%) wall
integration : 17.85 ( 3%) usr 6.29 (30%) sys 24.09 ( 3%) wall
jump : 0.79 ( 0%) usr 0.04 ( 0%) sys 0.86 ( 0%) wall
flow analysis : 0.66 ( 0%) usr 0.00 ( 0%) sys 0.85 ( 0%) wall
mode switching : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
scheduling : 0.01 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall
local alloc : 37.47 ( 5%) usr 0.03 ( 0%) sys 37.60 ( 5%) wall
global alloc : 24.17 ( 3%) usr 0.80 ( 4%) sys 24.83 ( 3%) wall
flow 2 : 1.53 ( 0%) usr 0.07 ( 0%) sys 1.56 ( 0%) wall
machine dep reorg : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
shorten branches : 2.60 ( 0%) usr 0.04 ( 0%) sys 2.79 ( 0%) wall
reg stack : 1.31 ( 0%) usr 0.00 ( 0%) sys 1.35 ( 0%) wall
final : 4.38 ( 1%) usr 0.22 ( 1%) sys 4.62 ( 1%) wall
symout : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall
rest of compilation : 10.13 ( 1%) usr 0.22 ( 1%) sys 9.96 ( 1%) wall
TOTAL : 708.60 20.83 730.22
That's about a 60s difference in wall clock time. Is it true that -finline is
set even at -O0?
Richard.
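To see where the roughly 60 s of extra wall clock time at -O0 goes, the per-pass wall figures from the two reports in this message can be subtracted. This is a small Python sketch added for illustration; the dictionary names and the `wall_deltas` helper are invented here, and only four passes plus the TOTAL row are copied from the reports.

```python
# Wall-clock seconds at -O0, copied from the two reports in this message.
WALL_O0_33 = {
    "expand": 540.00,
    "local alloc": 11.00,
    "integration": 19.50,
    "name lookup": 7.50,
    "TOTAL": 670.50,
}
WALL_O0_34 = {
    "expand": 567.86,
    "local alloc": 37.60,
    "integration": 24.09,
    "name lookup": 10.66,
    "TOTAL": 730.22,
}


def wall_deltas(old, new):
    """Return {pass name: new - old} in seconds, rounded to two decimals."""
    return {name: round(new[name] - old[name], 2) for name in old}


if __name__ == "__main__":
    deltas = wall_deltas(WALL_O0_33, WALL_O0_34)
    # Print the largest increases first.
    for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:14s} {delta:+7.2f} s")
```

On these numbers expand and local alloc each add a little over 26 s, which together is most of the difference in the TOTAL row.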
* Re: Huge compile time & run time performance regression 3.3 -> HEAD
2003-05-19 16:18 ` Matt Austern
2003-05-19 19:01 ` Richard Guenther
@ 2003-05-19 20:51 ` Richard Guenther
2003-05-19 21:04 ` Gabriel Dos Reis
1 sibling, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2003-05-19 20:51 UTC (permalink / raw)
To: Matt Austern; +Cc: gcc
On Mon, 19 May 2003, Matt Austern wrote:
> On Sunday, May 18, 2003, at 08:45 AM, Richard Guenther wrote:
>
> > Hi!
> >
> > As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
> > performance and performance of the resulting code. As always these
> > comparisons are for a POOMA based scientific application.
> >
> > I experience a 100% compile time regression (673.50s -> 1284.48s) and
> > a 12% runtime performance regression (150s -> 171s) when comparing
> > gcc3.3 to HEAD.
> >
> > Time reports follow, the most prominent regressions are expand, global
> > CSE
> > (>300%!), loop analysis and branch prediction.
> >
> > Compile options are -ftemplate-depth-80 -fno-exceptions -O2
> > -march=athlon
> > -funroll-loops -fomit-frame-pointer
>
> Do you see any compile time regressions at -O0?
>
> (I'm asking for the obvious reason: trying to find out how much work we
> need to do in the front end as opposed to the back end.)
After killing all forced inlining, I get almost the same timings from 3.3
and 3.4, namely 42.50 and 44.89 seconds.
Slowdown comes from
3.3: name lookup : 6.92 (17%) usr 0.90 (43%) sys 7.50 (18%) wall
3.4: name lookup : 9.06 (22%) usr 0.81 (39%) sys 10.29 (23%) wall
Richard.
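Rows like the two name lookup lines above can be pulled apart mechanically when comparing many reports. The following Python sketch is an added illustration, assuming the row layout shown in this thread (name, then usr, sys and wall seconds, each followed by a parenthesised percentage); the `ROW` pattern and `parse_row` helper are hypothetical names, and other GCC versions may print a different layout.

```python
import re

# Matches rows such as:
#   name lookup : 6.92 (17%) usr 0.90 (43%) sys 7.50 (18%) wall
# The TOTAL row has no percentages and is deliberately not matched.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)"
)


def parse_row(line):
    """Return (name, usr, sys, wall) for a per-pass row, or None otherwise."""
    match = ROW.match(line)
    if match is None:
        return None
    return (
        match.group("name"),
        float(match.group("usr")),
        float(match.group("sys")),
        float(match.group("wall")),
    )


if __name__ == "__main__":
    row_33 = parse_row("name lookup : 6.92 (17%) usr 0.90 (43%) sys 7.50 (18%) wall")
    row_34 = parse_row("name lookup : 9.06 (22%) usr 0.81 (39%) sys 10.29 (23%) wall")
    # Difference in wall seconds for the name lookup pass.
    print(f"name lookup wall delta: {row_34[3] - row_33[3]:+.2f} s")
    # Difference in the total times quoted in the message (42.50 and 44.89).
    print(f"total delta:            {44.89 - 42.50:+.2f} s")
```

The name lookup delta comes out slightly larger than the delta between the two quoted totals, which is consistent with name lookup being the main source of the remaining slowdown.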
* Re: Huge compile time & run time performance regression 3.3 -> HEAD
2003-05-19 20:51 ` Richard Guenther
@ 2003-05-19 21:04 ` Gabriel Dos Reis
2003-05-19 21:10 ` Richard Guenther
2003-05-19 21:24 ` Richard Guenther
0 siblings, 2 replies; 11+ messages in thread
From: Gabriel Dos Reis @ 2003-05-19 21:04 UTC (permalink / raw)
To: Richard Guenther; +Cc: Matt Austern, gcc
Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:
| > Do you see any compile time regressions at -O0?
| >
| > (I'm asking for the obvious reason: trying to find out how much work we
| > need to do in the front end as opposed to the back end.)
|
| After killing all forced inlining, I get almost the same timings from 3.3
| and 3.4, namely 42.50 and 44.89 seconds.
|
| Slowdown comes from
|
| 3.3: name lookup : 6.92 (17%) usr 0.90 (43%) sys 7.50 (18%) wall
| 3.4: name lookup : 9.06 (22%) usr 0.81 (39%) sys 10.29 (23%) wall
I've also noticed that name lookup time has increased from 3.3 to 3.4,
probably mostly because now we're doing things more correctly and
partly because we didn't really take care to optimize it. It would be
interesting if you could report numbers for name lookup for 3.4:
* before I applied the name lookup patch
* after I applied it (i.e. cvs as of this moment)
What I've noted (and I posted figures) was that the patch I applied
cut the name lookup time by about half on mainline, whereas I got at
least 20% on branch.
Thanks,
-- Gaby
* Re: Huge compile time & run time performance regression 3.3 -> HEAD
2003-05-19 21:04 ` Gabriel Dos Reis
@ 2003-05-19 21:10 ` Richard Guenther
2003-05-19 21:14 ` Gabriel Dos Reis
2003-05-19 21:24 ` Richard Guenther
1 sibling, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2003-05-19 21:10 UTC (permalink / raw)
To: Gabriel Dos Reis; +Cc: Matt Austern, gcc
On 19 May 2003, Gabriel Dos Reis wrote:
> Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:
>
> | > Do you see any compile time regressions at -O0?
> | >
> | > (I'm asking for the obvious reason: trying to find out how much work we
> | > need to do in the front end as opposed to the back end.)
> |
> | After killing all forced inlining, I get almost the same timings from 3.3
> | and 3.4, namely 42.50 and 44.89 seconds.
> |
> | Slowdown comes from
> |
> | 3.3: name lookup : 6.92 (17%) usr 0.90 (43%) sys 7.50 (18%) wall
> | 3.4: name lookup : 9.06 (22%) usr 0.81 (39%) sys 10.29 (23%) wall
>
> I've also noticed that name lookup time has increased from 3.3 to 3.4,
> probably mostly because now we're doing things more correctly and
> partly because we didn't really take care to optimize it. It would be
> interesting if you could report numbers for name lookup for 3.4:
>
> * before I applied the name lookup patch
> * after I applied it (i.e. cvs as of this moment)
>
> What I've noted (and I posted figures) was that the patch I applied
> cut the name lookup time by about half on mainline, whereas I got at
> least 20% on branch.
If you mean
2003-05-18 Gabriel Dos Reis <gdr@integrable-solutions.net>
* cp-tree.h (struct lang_type_class): Replace data member tags
with hash-table nested_udts.
(CLASSTYPE_NESTED_UTDS): Rename from CLASSTYPE_TAGS.
[...]
this is already included in the numbers. I.e. the 3.4 numbers are from CVS
about 2 hours ago.
Richard.
* Re: Huge compile time & run time performance regression 3.3 -> HEAD
2003-05-19 21:10 ` Richard Guenther
@ 2003-05-19 21:14 ` Gabriel Dos Reis
0 siblings, 0 replies; 11+ messages in thread
From: Gabriel Dos Reis @ 2003-05-19 21:14 UTC (permalink / raw)
To: Richard Guenther; +Cc: Matt Austern, gcc
Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:
| On 19 May 2003, Gabriel Dos Reis wrote:
|
| > Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:
| >
| > | > Do you see any compile time regressions at -O0?
| > | >
| > | > (I'm asking for the obvious reason: trying to find out how much work we
| > | > need to do in the front end as opposed to the back end.)
| > |
| > | After killing all forced inlining, I get almost the same timings from 3.3
| > | and 3.4, namely 42.50 and 44.89 seconds.
| > |
| > | Slowdown comes from
| > |
| > | 3.3: name lookup : 6.92 (17%) usr 0.90 (43%) sys 7.50 (18%) wall
| > | 3.4: name lookup : 9.06 (22%) usr 0.81 (39%) sys 10.29 (23%) wall
| >
| > I've also noticed that name lookup time has increased from 3.3 to 3.4,
| > probably mostly because now we're doing things more correctly and
| > partly because we didn't really take care to optimize it. It would be
| > interesting if you could report numbers for name lookup for 3.4:
| >
| > * before I applied the name lookup patch
| > * after I applied it (i.e. cvs as of this moment)
| >
| > What I've noted (and I posted figures) was that the patch I applied
| > cut the name lookup time by about half on mainline, whereas I got at
| > least 20% on branch.
|
| If you mean
|
| 2003-05-18 Gabriel Dos Reis <gdr@integrable-solutions.net>
|
| * cp-tree.h (struct lang_type_class): Replace data member tags
| with hash-table nested_udts.
| (CLASSTYPE_NESTED_UTDS): Rename from CLASSTYPE_TAGS.
| [...]
|
| this is already included in the numbers. I.e. the 3.4 numbers are from CVS
| about 2 hours ago.
Thanks. Do you have numbers for, say, mainline as of yesterday?
-- Gaby
* Re: Huge compile time & run time performance regression 3.3 -> HEAD
2003-05-19 21:04 ` Gabriel Dos Reis
2003-05-19 21:10 ` Richard Guenther
@ 2003-05-19 21:24 ` Richard Guenther
1 sibling, 0 replies; 11+ messages in thread
From: Richard Guenther @ 2003-05-19 21:24 UTC (permalink / raw)
To: Gabriel Dos Reis; +Cc: Richard Guenther, Matt Austern, gcc
On 19 May 2003, Gabriel Dos Reis wrote:
> Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:
>
> | > Do you see any compile time regressions at -O0?
> | >
> | > (I'm asking for the obvious reason: trying to find out how much work we
> | > need to do in the front end as opposed to the back end.)
> |
> | After killing all forced inlining, I get almost the same timings from 3.3
> | and 3.4, namely 42.50 and 44.89 seconds.
> |
> | Slowdown comes from
> |
> | 3.3: name lookup : 6.92 (17%) usr 0.90 (43%) sys 7.50 (18%) wall
> | 3.4: name lookup : 9.06 (22%) usr 0.81 (39%) sys 10.29 (23%) wall
>
> I've also noticed that name lookup time has increased from 3.3 to 3.4,
> probably mostly because now we're doing things more correctly and
> partly because we didn't really take care to optimize it. It would be
> interesting if you could report numbers for name lookup for 3.4:
>
> * before I applied the name lookup patch
> * after I applied it (i.e. cvs as of this moment)
With g++ (GCC) 3.4 20030505 (experimental) I get a total time of 46.37 and
name lookup : 9.42 (23%) usr 0.93 (40%) sys 10.20 (22%) wall
Richard.