* Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
@ 2004-08-31 9:58 Karel Gardas
2004-08-31 10:12 ` Steven Bosscher
2004-09-01 11:18 ` Giovanni Bajo
0 siblings, 2 replies; 18+ messages in thread
From: Karel Gardas @ 2004-08-31 9:58 UTC (permalink / raw)
To: GCC Mailing List
Hello,
as promised several times, here are finally the results obtained for
yesterday's main trunk with -O0/1/2 compilations (the whole table is below).
As I've already reported, -O0 is faster, which is great! But O1 and O2 are
slower by about 8.5% and 7%.
Interesting files seem to be:
1) typecode.cc: 40% regression on O1 while 7% speedup on O2
2) orb.cc: 10% speedup on O0, 16% regression on O1 and only 1.2%
regression on O2
3) basic_seq.cc: 10%, 20% and 33% regressions on O0/1/2
4) static.cc: 1, 24 and 27% regression on O0/1/2
5) valuetype_impl.cc: 12 and 23% regression on O1/2
So you can see that some files regress most on O1 and other files
on O2.
The biggest regressions (not counting the very short compilations of
uni_*.cc files) are:
-O0: 10% basic_seq.cc
-O1: 40% typecode.cc, then 24% for both static.cc and pi_impl.cc
-O2: 33% basic_seq.cc, followed by 27% static.cc
Is there anything else I should provide to help you with these issues?
In particular, please have a look at the table and choose your "interesting file
for preprocessing" candidate, which I will then upload to PR#13776.
Thanks, and especially thanks for the appreciable progress on O0!
Karel
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com
File              341-O0  350-O0  Delta%  341-O1  350-O1  Delta%  341-O2  350-O2  Delta%
os-unix.cc          4.14    4.09    1.22    4.47     4.7   -4.89    4.55    4.97   -8.45
dii.cc              12.8   11.76    8.84   13.97    15.7  -11.02      17   18.59   -8.55
typecode.cc         9.11    9.42   -3.29   13.16   22.06  -40.34   32.25   30.05    7.32
any.cc              6.88    6.69    2.84    9.14   10.91  -16.22   12.94   13.87   -6.71
codec.cc             5.9    5.74    2.79    7.45     8.6  -13.37    9.29    11.1  -16.31
buffer.cc           3.34    3.31    0.91    3.52    3.64    -3.3    3.62    3.93   -7.89
context.cc          3.51    3.57   -1.68    3.83    4.41  -13.15    4.16    4.77  -12.79
except.cc           4.34    4.25    2.12    4.97    5.12   -2.93    6.05    6.27   -3.51
dispatch.cc          4.4    4.46   -1.35    5.24     5.1    2.75    4.95    5.64  -12.23
string.cc           3.35    3.26    2.76     3.5    3.47    0.86     3.4     3.6   -5.56
object.cc           4.69    4.76   -1.47    5.87       7  -16.14    7.01    8.07  -13.14
address.cc          5.26    4.93    6.69    6.43    6.83   -5.86    7.22    7.63   -5.37
ior.cc             12.48   11.35    9.96   14.81   15.31   -3.27   16.99   17.46   -2.69
orb.cc             16.81    15.3    9.87   25.62   30.52  -16.06   37.07   37.52    -1.2
boa.cc              9.22    8.48    8.73   11.74   13.16  -10.79   14.11   15.87  -11.09
dsi.cc             10.31    9.13   12.92   11.69   11.73   -0.34   12.57   13.19    -4.7
transport.cc        4.06    3.96    2.53    4.35    4.33    0.46    4.47    4.64   -3.66
t..port/tcp.cc      4.02     3.9    3.08    4.37    4.26    2.58    4.39    4.55   -3.52
t..port/udp.cc      4.11    4.02    2.24    4.47    4.45    0.45    4.65    4.79   -2.92
t..port/unix.cc     4.06    3.89    4.37    4.31    4.21    2.38    4.31    4.51   -4.43
iop.cc             16.43   15.03    9.31   22.25   25.39  -12.37   29.03   32.78  -11.44
util.cc             5.97       6    -0.5    7.79   10.07  -22.64   10.06   11.94  -15.75
basic_seq.cc        3.77    4.21  -10.45    3.98    4.99  -20.24    3.82    5.72  -33.22
fast_array.cc       3.89    3.74    4.01    3.95    3.88     1.8    3.87    4.07   -4.91
ssl.cc              9.29    7.73   20.18    9.25    7.84   17.98    8.99    7.91   13.65
fixed.cc            3.75    3.73    0.54    4.08    4.34   -5.99    4.22    4.85  -12.99
intercept.cc       10.27     9.5    8.11   11.64   12.31   -5.44   12.24   14.19  -13.74
codeset.cc          5.96    5.72     4.2     7.3    8.37  -12.78    9.88   10.87   -9.11
queue.cc            4.35    4.53   -3.97    4.68    5.27   -11.2    4.97    5.84   -14.9
static.cc          20.26   20.63   -1.79   24.42   32.31  -24.42   29.12   40.06  -27.31
current.cc          8.91    7.39   20.57    8.78    7.49   17.22    8.67    7.56   14.68
policy_impl.cc      12.7   11.96    6.19   13.65   14.62   -6.63   15.43   16.76   -7.94
service_info.cc     8.84    7.33    20.6    8.87    7.48   18.58    8.51    7.55   12.72
ioptypes.cc        10.69    9.46      13   12.76   12.69    0.55   13.66   14.52   -5.92
ssliop.cc           9.01    7.57   19.02    9.11    7.62   19.55    8.62    7.64   12.83
value.cc           11.27    9.31   21.05   12.08   11.11    8.73   12.36   12.17    1.56
valuetype.cc        9.96    8.48   17.45   10.59     9.7    9.18   10.92   10.64    2.63
v..type_impl.cc    12.47   12.19     2.3   13.12   14.93  -12.12   13.43   17.46  -23.08
dynany_impl.cc     10.61   10.14    4.64   15.94   20.11  -20.74      23   25.82  -10.92
policy2.cc           9.1    7.62   19.42    9.14    7.85   16.43    9.01    7.91   13.91
tckind.cc           8.77    7.33   19.65    8.82    7.39   19.35    8.56    7.42   15.36
orb_excepts.cc      9.01    7.51   19.97    9.05    7.67   17.99    8.87    7.84   13.14
policy.cc           8.96    7.47   19.95    9.09    7.64   18.98    8.83    7.87    12.2
poa.cc             13.07   11.51   13.55   15.24   14.84     2.7   17.67   17.62    0.28
poa_base.cc        10.22    8.88   15.09   10.77   10.13    6.32   11.54   11.13    3.68
poa_impl.cc        17.42    16.2    7.53   22.82   25.91  -11.93   29.78   32.73   -9.01
dynany.cc          10.26    8.83   16.19   10.81   10.21    5.88   11.72   11.06    5.97
uni_base64.cc       0.12    0.12       0    0.17    0.21  -19.05    0.25    0.28  -10.71
uni_unicode.cc       0.2    0.21   -4.76    0.28    0.36  -22.22    0.43    0.51  -15.69
uni_fromuni.cc       0.4    0.43   -6.98    0.58    0.82  -29.27     1.1    1.32  -16.67
uni_touni.cc        0.43    0.47   -8.51    0.69    0.96  -28.13    1.21    1.41  -14.18
except2.cc          6.73    6.16    9.25   10.03   10.03       0   12.98   12.54    3.51
pi.cc              11.48    9.48    21.1   12.59   11.91    5.71   13.25    13.4   -1.12
pi_impl.cc         18.92   18.96   -0.21    23.3   30.73  -24.18   30.53   37.56  -18.72
typecode_seq.cc     9.15    8.15   12.27    9.56    8.64   10.65     9.3    9.02     3.1
timebase.cc         8.78    7.53    16.6    8.94    7.45      20    8.63    7.66   12.66
ir.cc              46.58   48.62    -4.2   70.96   87.47  -18.88   97.81  114.45  -14.54
ir_base.cc         11.57   10.14    14.1   13.49   15.37  -12.23   15.67   17.76  -11.77
imr.cc             14.34   13.85    3.54    18.6   20.62    -9.8   24.84   25.31   -1.86
mtdebug.cc          3.72    3.72       0    3.95    3.77    4.77    3.69    3.82    -3.4
Sum               530.42  494.11    7.35  636.03  696.01   -8.62  767.47  827.99   -7.31
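The message does not say how the Delta% column is computed. The figures are consistent with (t341 - t350) / t350 * 100, that is, the difference relative to the 3.5.0 time, with negative values meaning 3.5.0 is slower. The sketch below assumes that formula; the function names are made up for illustration.

```python
def delta_pct(t_old, t_new):
    """Percent change relative to the new compiler's time (assumed table formula).

    Positive: 3.5.0 is faster than 3.4.1. Negative: 3.5.0 is slower.
    """
    return (t_old - t_new) / t_new * 100.0


def slowdown_vs_old(t_old, t_new):
    """How much longer the new compiler takes, relative to the old time."""
    return (t_new - t_old) / t_old * 100.0


# typecode.cc at -O1: 13.16 s with 3.4.1, 22.06 s with 3.5.0
print(round(delta_pct(13.16, 22.06), 2))        # -40.34, as in the table
print(round(slowdown_vs_old(13.16, 22.06), 1))  # 67.6
```

Under this reading, the "40% regression" on typecode.cc means that 3.4.1 needed about 40% less time than 3.5.0; measured against the 3.4.1 time, 3.5.0 takes roughly 68% longer.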
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 9:58 Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources Karel Gardas
@ 2004-08-31 10:12 ` Steven Bosscher
2004-08-31 10:28 ` Karel Gardas
2004-09-01 11:18 ` Giovanni Bajo
1 sibling, 1 reply; 18+ messages in thread
From: Steven Bosscher @ 2004-08-31 10:12 UTC (permalink / raw)
To: Karel Gardas, GCC Mailing List
On Tuesday 31 August 2004 11:11, Karel Gardas wrote:
> Hello,
>
> several times promised here are finally the results obtained for
> yesterday's main-trunk and -O0/1/2 compilations (whole table is below)
>
> As I've already reported -O0 is better, which is great! And O1 and O2 are
> slower for about 8.5% and 7%.
>
> Interesting files seem to be:
>
> 1) typecode.cc: 40% regression on O1 while 7% speedup on O2
Can you show us the time report for the 40% regression?
Gr.
Steven
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 10:12 ` Steven Bosscher
@ 2004-08-31 10:28 ` Karel Gardas
2004-08-31 10:44 ` Paolo Bonzini
0 siblings, 1 reply; 18+ messages in thread
From: Karel Gardas @ 2004-08-31 10:28 UTC (permalink / raw)
To: Steven Bosscher; +Cc: GCC Mailing List
On Tue, 31 Aug 2004, Steven Bosscher wrote:
> On Tuesday 31 August 2004 11:11, Karel Gardas wrote:
> > Hello,
> >
> > several times promised here are finally the results obtained for
> > yesterday's main-trunk and -O0/1/2 compilations (whole table is below)
> >
> > As I've already reported -O0 is better, which is great! And O1 and O2 are
> > slower for about 8.5% and 7%.
> >
> > Interesting files seem to be:
> >
> > 1) typecode.cc: 40% regression on O1 while 7% speedup on O2
>
> Can you show us the time report for the 40% regression?
Here we go.
Execution times (seconds)
garbage collection : 0.52 ( 2%) usr 0.00 ( 0%) sys 0.53 ( 2%) wall
callgraph construction: 0.19 ( 1%) usr 0.00 ( 0%) sys 0.20 ( 1%) wall
callgraph optimization: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
cfg construction : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
cfg cleanup : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 1%) wall
trivially dead code : 0.12 ( 1%) usr 0.00 ( 0%) sys 0.14 ( 1%) wall
life analysis : 0.97 ( 4%) usr 0.00 ( 0%) sys 0.84 ( 3%) wall
life info update : 0.19 ( 1%) usr 0.00 ( 0%) sys 0.19 ( 1%) wall
alias analysis : 0.17 ( 1%) usr 0.01 ( 1%) sys 0.14 ( 1%) wall
register scan : 0.17 ( 1%) usr 0.00 ( 0%) sys 0.17 ( 1%) wall
rebuild jump labels : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
preprocessing : 0.48 ( 2%) usr 0.23 (12%) sys 0.57 ( 2%) wall
parser : 3.93 (17%) usr 0.58 (30%) sys 4.67 (18%) wall
name lookup : 1.09 ( 5%) usr 0.46 (24%) sys 1.79 ( 7%) wall
integration : 1.01 ( 4%) usr 0.06 ( 3%) sys 0.88 ( 3%) wall
tree gimplify : 0.60 ( 3%) usr 0.04 ( 2%) sys 0.60 ( 2%) wall
tree eh : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
tree CFG construction : 0.10 ( 0%) usr 0.03 ( 2%) sys 0.13 ( 0%) wall
tree CFG cleanup : 0.21 ( 1%) usr 0.01 ( 1%) sys 0.14 ( 1%) wall
tree PTA : 0.20 ( 1%) usr 0.00 ( 0%) sys 0.26 ( 1%) wall
tree alias analysis : 0.33 ( 1%) usr 0.00 ( 0%) sys 0.37 ( 1%) wall
tree PHI insertion : 0.42 ( 2%) usr 0.01 ( 1%) sys 0.50 ( 2%) wall
tree SSA rewrite : 0.58 ( 3%) usr 0.00 ( 0%) sys 0.71 ( 3%) wall
tree SSA other : 0.82 ( 4%) usr 0.12 ( 6%) sys 0.98 ( 4%) wall
tree operand scan : 0.59 ( 3%) usr 0.16 ( 8%) sys 0.98 ( 4%) wall
dominator optimization: 1.48 ( 6%) usr 0.02 ( 1%) sys 1.50 ( 6%) wall
tree SRA : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
tree CCP : 0.22 ( 1%) usr 0.00 ( 0%) sys 0.18 ( 1%) wall
tree split crit edges : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
tree PRE : 0.41 ( 2%) usr 0.01 ( 1%) sys 0.41 ( 2%) wall
tree forward propagate: 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
tree conservative DCE : 0.30 ( 1%) usr 0.01 ( 1%) sys 0.28 ( 1%) wall
tree aggressive DCE : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
tree DSE : 0.28 ( 1%) usr 0.00 ( 0%) sys 0.31 ( 1%) wall
loop invariant motion : 0.21 ( 1%) usr 0.00 ( 0%) sys 0.22 ( 1%) wall
tree copy headers : 0.06 ( 0%) usr 0.01 ( 1%) sys 0.04 ( 0%) wall
tree SSA to normal : 0.26 ( 1%) usr 0.01 ( 1%) sys 0.36 ( 1%) wall
tree rename SSA copies: 0.11 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
dominance frontiers : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
expand : 2.08 ( 9%) usr 0.07 ( 4%) sys 2.51 (10%) wall
varconst : 0.08 ( 0%) usr 0.02 ( 1%) sys 0.09 ( 0%) wall
jump : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
CSE : 0.58 ( 3%) usr 0.00 ( 0%) sys 0.55 ( 2%) wall
loop analysis : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
branch prediction : 0.15 ( 1%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
flow analysis : 0.02 ( 0%) usr 0.01 ( 1%) sys 0.04 ( 0%) wall
combiner : 0.55 ( 2%) usr 0.00 ( 0%) sys 0.64 ( 2%) wall
if-conversion : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
local alloc : 0.30 ( 1%) usr 0.00 ( 0%) sys 0.33 ( 1%) wall
global alloc : 1.16 ( 5%) usr 0.01 ( 1%) sys 1.34 ( 5%) wall
reload CSE regs : 0.31 ( 1%) usr 0.00 ( 0%) sys 0.28 ( 1%) wall
flow 2 : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
if-conversion 2 : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
rename registers : 0.18 ( 1%) usr 0.00 ( 0%) sys 0.18 ( 1%) wall
machine dep reorg : 0.22 ( 1%) usr 0.00 ( 0%) sys 0.17 ( 1%) wall
shorten branches : 0.15 ( 1%) usr 0.00 ( 0%) sys 0.17 ( 1%) wall
final : 0.26 ( 1%) usr 0.01 ( 1%) sys 0.26 ( 1%) wall
symout : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
rest of compilation : 0.14 ( 1%) usr 0.01 ( 1%) sys 0.19 ( 1%) wall
TOTAL : 23.12 1.91 26.21
# cc1plus 23.13 1.93
# as 0.34 0.02
Karel
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 10:28 ` Karel Gardas
@ 2004-08-31 10:44 ` Paolo Bonzini
2004-08-31 10:46 ` Karel Gardas
0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2004-08-31 10:44 UTC (permalink / raw)
To: Karel Gardas; +Cc: GCC Mailing List
>>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
>>
>>Can you show us the time report for the 40% regression?
Also for 3.4.1?
Paolo
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 10:44 ` Paolo Bonzini
@ 2004-08-31 10:46 ` Karel Gardas
2004-08-31 10:49 ` Steven Bosscher
2004-08-31 10:55 ` Steven Bosscher
0 siblings, 2 replies; 18+ messages in thread
From: Karel Gardas @ 2004-08-31 10:46 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: GCC Mailing List
On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> >>
> >>Can you show us the time report for the 40% regression?
>
> Also for 3.4.1?
Sure!
Execution times (seconds)
garbage collection : 0.79 ( 6%) usr 0.00 ( 0%) sys 0.84 ( 5%) wall
cfg construction : 0.09 ( 1%) usr 0.00 ( 0%) sys 0.11 ( 1%) wall
cfg cleanup : 0.18 ( 1%) usr 0.00 ( 0%) sys 0.16 ( 1%) wall
trivially dead code : 0.10 ( 1%) usr 0.01 ( 0%) sys 0.14 ( 1%) wall
life analysis : 0.80 ( 6%) usr 0.00 ( 0%) sys 0.85 ( 5%) wall
life info update : 0.08 ( 1%) usr 0.00 ( 0%) sys 0.15 ( 1%) wall
alias analysis : 0.16 ( 1%) usr 0.00 ( 0%) sys 0.21 ( 1%) wall
register scan : 0.13 ( 1%) usr 0.00 ( 0%) sys 0.18 ( 1%) wall
rebuild jump labels : 0.07 ( 0%) usr 0.01 ( 0%) sys 0.05 ( 0%) wall
preprocessing : 0.44 ( 3%) usr 0.21 (10%) sys 0.65 ( 4%) wall
parser : 4.41 (31%) usr 0.67 (31%) sys 5.22 (31%) wall
name lookup : 1.61 (11%) usr 1.17 (53%) sys 2.90 (17%) wall
expand : 0.79 ( 6%) usr 0.03 ( 1%) sys 0.78 ( 5%) wall
varconst : 0.04 ( 0%) usr 0.01 ( 0%) sys 0.09 ( 1%) wall
integration : 0.65 ( 5%) usr 0.00 ( 0%) sys 0.67 ( 4%) wall
jump : 0.05 ( 0%) usr 0.02 ( 1%) sys 0.02 ( 0%) wall
CSE : 0.49 ( 3%) usr 0.00 ( 0%) sys 0.46 ( 3%) wall
loop analysis : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
branch prediction : 0.19 ( 1%) usr 0.00 ( 0%) sys 0.16 ( 1%) wall
flow analysis : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
combiner : 0.34 ( 2%) usr 0.00 ( 0%) sys 0.42 ( 2%) wall
if-conversion : 0.09 ( 1%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
mode switching : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
local alloc : 0.34 ( 2%) usr 0.00 ( 0%) sys 0.28 ( 2%) wall
global alloc : 0.91 ( 6%) usr 0.02 ( 1%) sys 0.83 ( 5%) wall
reload CSE regs : 0.18 ( 1%) usr 0.00 ( 0%) sys 0.25 ( 1%) wall
flow 2 : 0.18 ( 1%) usr 0.01 ( 0%) sys 0.12 ( 1%) wall
if-conversion 2 : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
rename registers : 0.15 ( 1%) usr 0.00 ( 0%) sys 0.10 ( 1%) wall
machine dep reorg : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
shorten branches : 0.16 ( 1%) usr 0.00 ( 0%) sys 0.13 ( 1%) wall
final : 0.28 ( 2%) usr 0.01 ( 0%) sys 0.43 ( 3%) wall
symout : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
rest of compilation : 0.42 ( 3%) usr 0.02 ( 1%) sys 0.49 ( 3%) wall
TOTAL : 14.25 2.19 16.89
# cc1plus 14.26 2.21
# as 0.34 0.02
Karel
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 10:46 ` Karel Gardas
@ 2004-08-31 10:49 ` Steven Bosscher
2004-08-31 11:00 ` Paolo Bonzini
2004-08-31 10:55 ` Steven Bosscher
1 sibling, 1 reply; 18+ messages in thread
From: Steven Bosscher @ 2004-08-31 10:49 UTC (permalink / raw)
To: Karel Gardas, Paolo Bonzini; +Cc: GCC Mailing List
On Tuesday 31 August 2004 12:28, Karel Gardas wrote:
> On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> > >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> > >>
> > >>Can you show us the time report for the 40% regression?
> >
> > Also for 3.4.1?
3.4.1:
expand : 0.79 ( 6%) usr 0.03 ( 1%) sys 0.78 ( 5%) wall
3.5.0-HEAD:
expand : 2.08 ( 9%) usr 0.07 ( 4%) sys 2.51 (10%) wall
I wonder why this is. I would have expected it to be the other
way around...
Gr.
Steven
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 10:46 ` Karel Gardas
2004-08-31 10:49 ` Steven Bosscher
@ 2004-08-31 10:55 ` Steven Bosscher
2004-08-31 13:57 ` Karel Gardas
1 sibling, 1 reply; 18+ messages in thread
From: Steven Bosscher @ 2004-08-31 10:55 UTC (permalink / raw)
To: Karel Gardas, Paolo Bonzini; +Cc: GCC Mailing List
On Tuesday 31 August 2004 12:28, Karel Gardas wrote:
> On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> > >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> > >>
> > >>Can you show us the time report for the 40% regression?
> >
> > Also for 3.4.1?
>
> Sure!
Hmm... No obvious hot spots eh?
Looks like the tree optimizers are to blame. We spend roughly the same
amount of time in the post-GIMPLE passes, and we spend >7.5s in the tree
optimizers. The total slowdown you measured was ~8.9s. The other 1.4s
are spent in expand as shown in the previous message:
3.4.1: expand : 0.79 ( 6%) usr 0.03 ( 1%) sys 0.78 ( 5%) wall
3.5.0: expand : 2.08 ( 9%) usr 0.07 ( 4%) sys 2.51 (10%) wall
Hmm, we should probably disable at least flag_thread_jumps and
flag_loop_optimize at -O1, and perhaps consider disabling some
of the more expensive (parts of the) tree optimizers... And
of course see if it makes sense to disable a few RTL optimizers.
So, looks like a tuning problem to me, not really a slowdown that
indicates something algorithmic being really wrong.
Gr.
Steven
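The pass-by-pass comparison done by hand above can be automated. The following is a minimal sketch, assuming each report row has the shape `name : usr (N%) usr sys (N%) sys wall (N%) wall` as in the dumps in this thread; the helper names and the shortened sample reports are illustrative only and are not part of GCC.

```python
import re

# One -ftime-report row: "<pass> : <usr> (<pct>%) usr <sys> (<pct>%) sys <wall> (<pct>%) wall"
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall\s*$"
)


def parse_time_report(text):
    """Return {pass name: (usr, sys, wall)}; rows such as TOTAL that lack
    the usr/sys/wall labels are skipped."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m["name"]] = (float(m["usr"]), float(m["sys"]), float(m["wall"]))
    return rows


def usr_delta(old, new):
    """Per-pass change in user time (new - old); a pass missing on one side counts as 0."""
    names = sorted(set(old) | set(new))
    return {n: round(new.get(n, (0.0,))[0] - old.get(n, (0.0,))[0], 2) for n in names}


# Shortened excerpts of the two -O1 reports for typecode.cc posted in this thread.
REPORT_341_O1 = """\
parser : 4.41 (31%) usr 0.67 (31%) sys 5.22 (31%) wall
expand : 0.79 ( 6%) usr 0.03 ( 1%) sys 0.78 ( 5%) wall
TOTAL : 14.25 2.19 16.89
"""
REPORT_350_O1 = """\
parser : 3.93 (17%) usr 0.58 (30%) sys 4.67 (18%) wall
tree SSA other : 0.82 ( 4%) usr 0.12 ( 6%) sys 0.98 ( 4%) wall
dominator optimization: 1.48 ( 6%) usr 0.02 ( 1%) sys 1.50 ( 6%) wall
expand : 2.08 ( 9%) usr 0.07 ( 4%) sys 2.51 (10%) wall
TOTAL : 23.12 1.91 26.21
"""

deltas = usr_delta(parse_time_report(REPORT_341_O1), parse_time_report(REPORT_350_O1))
for name, d in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {d:+.2f}s")
```

Feeding the full reports instead of the excerpts lists every pass sorted by added user time, so passes that exist only in 3.5.0 (the tree optimizers) show up with their whole cost as the delta.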
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 10:49 ` Steven Bosscher
@ 2004-08-31 11:00 ` Paolo Bonzini
2004-08-31 11:24 ` Steven Bosscher
2004-08-31 12:48 ` Karel Gardas
0 siblings, 2 replies; 18+ messages in thread
From: Paolo Bonzini @ 2004-08-31 11:00 UTC (permalink / raw)
To: Steven Bosscher, Karel Gardas; +Cc: GCC Mailing List
> 3.4.1:
> expand : 0.79 ( 6%) usr 0.03 ( 1%) sys 0.78 ( 5%)
>
> 3.5.0-HEAD:
> expand : 2.08 ( 9%) usr 0.07 ( 4%) sys 2.51 (10%) wall
Also:
3.4.1:
integration : 0.65 ( 5%) usr 0.00 ( 0%) sys 0.67 ( 4%) wall
global alloc : 0.91 ( 6%) usr 0.02 ( 1%) sys 0.83 ( 5%) wall
3.5.0-HEAD:
integration : 1.01 ( 4%) usr 0.06 ( 3%) sys 0.88 ( 3%) wall
global alloc : 1.16 ( 5%) usr 0.01 ( 1%) sys 1.34 ( 5%) wall
This is overall +0.5 seconds, which is another 4%. And then:
DOM: 1.48 ( 6%) usr 0.02 ( 1%) sys 1.50 ( 6%) wall
There are quite high times for "tree SSA other", "tree conservative
DCE", "tree SSA rewrite" too.
Note that the parser and name lookup have indeed become faster which is
the result of Mark's work and part of the reason why -O0 is faster.
The -O2 times for 3.5 would help as well, I suspect -funit-at-a-time is
helping a lot.
Paolo
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 11:00 ` Paolo Bonzini
@ 2004-08-31 11:24 ` Steven Bosscher
2004-08-31 19:30 ` Mike Stump
2004-08-31 12:48 ` Karel Gardas
1 sibling, 1 reply; 18+ messages in thread
From: Steven Bosscher @ 2004-08-31 11:24 UTC (permalink / raw)
To: Paolo Bonzini, Karel Gardas; +Cc: GCC Mailing List
On Tuesday 31 August 2004 12:50, Paolo Bonzini wrote:
> > 3.4.1:
> > expand : 0.79 ( 6%) usr 0.03 ( 1%) sys 0.78 ( 5%)
> >
> > 3.5.0-HEAD:
> > expand : 2.08 ( 9%) usr 0.07 ( 4%) sys 2.51 (10%)
> > wall
>
> Also:
>
> 3.4.1:
> integration : 0.65 ( 5%) usr 0.00 ( 0%) sys 0.67 ( 4%) wall
> global alloc : 0.91 ( 6%) usr 0.02 ( 1%) sys 0.83 ( 5%) wall
>
> 3.5.0-HEAD:
> integration : 1.01 ( 4%) usr 0.06 ( 3%) sys 0.88 ( 3%) wall
> global alloc : 1.16 ( 5%) usr 0.01 ( 1%) sys 1.34 ( 5%) wall
>
> This is overall +0.5 seconds, which another 4%. And then:
This may also just be noise. Some passes run so fast that the
time vars are not accurate enough to record it. You'll see that
for bodies of code with many small functions, -ftime-report will
give a very different TOTAL than /usr/bin/time ;-)
> The -O2 times for 3.5 would help as well, I suspect -funit-at-a-time is
> helping a lot.
Rather the other way around, since GCC 3.5 has -funit-at-a-time
enabled for C++ at -O0.
Gr.
Steven
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 11:00 ` Paolo Bonzini
2004-08-31 11:24 ` Steven Bosscher
@ 2004-08-31 12:48 ` Karel Gardas
2004-09-01 7:18 ` Paolo Bonzini
1 sibling, 1 reply; 18+ messages in thread
From: Karel Gardas @ 2004-08-31 12:48 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Steven Bosscher, GCC Mailing List
On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> The -O2 times for 3.5 would help as well, I suspect -funit-at-a-time is
> helping a lot.
Here are reports for -O2 for both trunk and gcc3.4.1:
Trunk:
Execution times (seconds)
garbage collection : 1.21 ( 4%) usr 0.00 ( 0%) sys 1.23 ( 4%) wall
callgraph construction: 0.19 ( 1%) usr 0.01 ( 1%) sys 0.20 ( 1%) wall
callgraph optimization: 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
cfg construction : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
cfg cleanup : 0.28 ( 1%) usr 0.00 ( 0%) sys 0.31 ( 1%) wall
trivially dead code : 0.38 ( 1%) usr 0.01 ( 1%) sys 0.34 ( 1%) wall
life analysis : 0.76 ( 2%) usr 0.01 ( 1%) sys 0.68 ( 2%) wall
life info update : 0.35 ( 1%) usr 0.00 ( 0%) sys 0.44 ( 1%) wall
alias analysis : 0.38 ( 1%) usr 0.00 ( 0%) sys 0.46 ( 1%) wall
register scan : 0.31 ( 1%) usr 0.00 ( 0%) sys 0.35 ( 1%) wall
rebuild jump labels : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
preprocessing : 0.38 ( 1%) usr 0.17 ( 9%) sys 0.56 ( 2%) wall
parser : 3.89 (13%) usr 0.53 (27%) sys 4.81 (14%) wall
name lookup : 1.25 ( 4%) usr 0.58 (30%) sys 1.66 ( 5%) wall
integration : 0.86 ( 3%) usr 0.00 ( 0%) sys 0.95 ( 3%) wall
tree gimplify : 0.53 ( 2%) usr 0.03 ( 2%) sys 0.59 ( 2%) wall
tree eh : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
tree CFG construction : 0.16 ( 1%) usr 0.02 ( 1%) sys 0.16 ( 0%) wall
tree CFG cleanup : 0.17 ( 1%) usr 0.00 ( 0%) sys 0.25 ( 1%) wall
tree PTA : 0.23 ( 1%) usr 0.00 ( 0%) sys 0.19 ( 1%) wall
tree alias analysis : 0.43 ( 1%) usr 0.01 ( 1%) sys 0.51 ( 2%) wall
tree PHI insertion : 0.59 ( 2%) usr 0.02 ( 1%) sys 0.56 ( 2%) wall
tree SSA rewrite : 0.68 ( 2%) usr 0.00 ( 0%) sys 0.72 ( 2%) wall
tree SSA other : 1.14 ( 4%) usr 0.11 ( 6%) sys 1.41 ( 4%) wall
tree operand scan : 0.75 ( 2%) usr 0.15 ( 8%) sys 0.77 ( 2%) wall
dominator optimization: 1.79 ( 6%) usr 0.06 ( 3%) sys 1.69 ( 5%) wall
tree SRA : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
tree CCP : 0.24 ( 1%) usr 0.00 ( 0%) sys 0.23 ( 1%) wall
tree split crit edges : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
tree PRE : 0.42 ( 1%) usr 0.02 ( 1%) sys 0.52 ( 2%) wall
tree forward propagate: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
tree conservative DCE : 0.32 ( 1%) usr 0.01 ( 1%) sys 0.33 ( 1%) wall
tree aggressive DCE : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
tree DSE : 0.29 ( 1%) usr 0.00 ( 0%) sys 0.38 ( 1%) wall
loop invariant motion : 0.21 ( 1%) usr 0.00 ( 0%) sys 0.18 ( 1%) wall
tree copy headers : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
tree SSA to normal : 0.28 ( 1%) usr 0.02 ( 1%) sys 0.34 ( 1%) wall
tree rename SSA copies: 0.10 ( 0%) usr 0.01 ( 1%) sys 0.11 ( 0%) wall
dominance frontiers : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
control dependences : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
expand : 2.25 ( 7%) usr 0.02 ( 1%) sys 2.33 ( 7%) wall
varconst : 0.09 ( 0%) usr 0.03 ( 2%) sys 0.11 ( 0%) wall
jump : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
CSE : 1.72 ( 6%) usr 0.01 ( 1%) sys 1.86 ( 5%) wall
loop analysis : 0.17 ( 1%) usr 0.01 ( 1%) sys 0.13 ( 0%) wall
global CSE : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
CPROP 1 : 0.22 ( 1%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
PRE : 0.37 ( 1%) usr 0.01 ( 1%) sys 0.43 ( 1%) wall
CPROP 2 : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
LSM : 0.35 ( 1%) usr 0.01 ( 1%) sys 0.38 ( 1%) wall
bypass jumps : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 1%) wall
web : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
CSE 2 : 0.94 ( 3%) usr 0.00 ( 0%) sys 0.94 ( 3%) wall
branch prediction : 0.17 ( 1%) usr 0.01 ( 1%) sys 0.23 ( 1%) wall
flow analysis : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
combiner : 0.55 ( 2%) usr 0.00 ( 0%) sys 0.54 ( 2%) wall
if-conversion : 0.09 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
regmove : 0.19 ( 1%) usr 0.00 ( 0%) sys 0.19 ( 1%) wall
local alloc : 0.43 ( 1%) usr 0.01 ( 1%) sys 0.49 ( 1%) wall
global alloc : 1.16 ( 4%) usr 0.00 ( 0%) sys 1.14 ( 3%) wall
reload CSE regs : 0.53 ( 2%) usr 0.01 ( 1%) sys 0.46 ( 1%) wall
flow 2 : 0.11 ( 0%) usr 0.01 ( 1%) sys 0.13 ( 0%) wall
if-conversion 2 : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
peephole 2 : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
rename registers : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall
scheduling 2 : 0.81 ( 3%) usr 0.00 ( 0%) sys 0.97 ( 3%) wall
machine dep reorg : 0.24 ( 1%) usr 0.00 ( 0%) sys 0.18 ( 1%) wall
reorder blocks : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
shorten branches : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
final : 0.31 ( 1%) usr 0.03 ( 2%) sys 0.32 ( 1%) wall
symout : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
rest of compilation : 0.15 ( 0%) usr 0.01 ( 1%) sys 0.14 ( 0%) wall
TOTAL : 31.01 1.94 33.84
# cc1plus 31.02 1.97
# as 0.36 0.02
GCC 3.4.1:
Execution times (seconds)
garbage collection : 1.20 ( 4%) usr 0.00 ( 0%) sys 1.22 ( 3%) wall
callgraph construction: 0.14 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
callgraph optimization: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
cfg construction : 0.40 ( 1%) usr 0.00 ( 0%) sys 0.43 ( 1%) wall
cfg cleanup : 0.46 ( 1%) usr 0.02 ( 1%) sys 0.44 ( 1%) wall
trivially dead code : 0.48 ( 1%) usr 0.00 ( 0%) sys 0.53 ( 1%) wall
life analysis : 0.74 ( 2%) usr 0.00 ( 0%) sys 0.76 ( 2%) wall
life info update : 0.36 ( 1%) usr 0.00 ( 0%) sys 0.39 ( 1%) wall
alias analysis : 0.68 ( 2%) usr 0.00 ( 0%) sys 0.59 ( 2%) wall
register scan : 0.40 ( 1%) usr 0.01 ( 0%) sys 0.37 ( 1%) wall
rebuild jump labels : 0.19 ( 1%) usr 0.00 ( 0%) sys 0.19 ( 1%) wall
preprocessing : 0.47 ( 1%) usr 0.18 ( 8%) sys 0.66 ( 2%) wall
parser : 4.21 (13%) usr 0.76 (33%) sys 5.18 (14%) wall
name lookup : 1.41 ( 4%) usr 0.99 (42%) sys 2.34 ( 6%) wall
expand : 8.79 (26%) usr 0.01 ( 0%) sys 8.90 (24%) wall
varconst : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
integration : 1.54 ( 5%) usr 0.00 ( 0%) sys 1.78 ( 5%) wall
jump : 0.69 ( 2%) usr 0.11 ( 5%) sys 0.70 ( 2%) wall
CSE : 2.72 ( 8%) usr 0.00 ( 0%) sys 2.84 ( 8%) wall
global CSE : 2.13 ( 6%) usr 0.12 ( 5%) sys 2.41 ( 7%) wall
loop analysis : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
bypass jumps : 0.23 ( 1%) usr 0.02 ( 1%) sys 0.37 ( 1%) wall
CSE 2 : 0.80 ( 2%) usr 0.00 ( 0%) sys 0.78 ( 2%) wall
branch prediction : 0.46 ( 1%) usr 0.00 ( 0%) sys 0.54 ( 1%) wall
flow analysis : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
combiner : 0.44 ( 1%) usr 0.00 ( 0%) sys 0.43 ( 1%) wall
if-conversion : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
regmove : 0.19 ( 1%) usr 0.00 ( 0%) sys 0.14 ( 0%) wall
local alloc : 0.37 ( 1%) usr 0.02 ( 1%) sys 0.39 ( 1%) wall
global alloc : 0.88 ( 3%) usr 0.02 ( 1%) sys 0.95 ( 3%) wall
reload CSE regs : 0.41 ( 1%) usr 0.01 ( 0%) sys 0.49 ( 1%) wall
flow 2 : 0.09 ( 0%) usr 0.01 ( 0%) sys 0.11 ( 0%) wall
if-conversion 2 : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
peephole 2 : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
rename registers : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall
scheduling 2 : 0.67 ( 2%) usr 0.03 ( 1%) sys 0.64 ( 2%) wall
reorder blocks : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
shorten branches : 0.10 ( 0%) usr 0.01 ( 0%) sys 0.10 ( 0%) wall
final : 0.17 ( 1%) usr 0.00 ( 0%) sys 0.23 ( 1%) wall
symout : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
rest of compilation : 0.93 ( 3%) usr 0.01 ( 0%) sys 0.90 ( 2%) wall
TOTAL : 33.50 2.33 36.75
# cc1plus 33.50 2.35
# as 0.29 0.02
Karel
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 10:55 ` Steven Bosscher
@ 2004-08-31 13:57 ` Karel Gardas
0 siblings, 0 replies; 18+ messages in thread
From: Karel Gardas @ 2004-08-31 13:57 UTC (permalink / raw)
To: Steven Bosscher; +Cc: Paolo Bonzini, GCC Mailing List
On Tue, 31 Aug 2004, Steven Bosscher wrote:
> On Tuesday 31 August 2004 12:28, Karel Gardas wrote:
> > On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> > > >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> > > >>
> > > >>Can you show us the time report for the 40% regression?
> > >
> > > Also for 3.4.1?
> >
> > Sure!
>
> Hmm... No obvious hot spots eh?
>
BTW: gcc3.4.1 consumes about 66MB of RAM to compile this file, while trunk
consumes about 98MB. The testing box is also a mobile PIII with
only 256KB of cache, so the higher memory usage might also contribute to the
regression...
Karel
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 11:24 ` Steven Bosscher
@ 2004-08-31 19:30 ` Mike Stump
0 siblings, 0 replies; 18+ messages in thread
From: Mike Stump @ 2004-08-31 19:30 UTC (permalink / raw)
To: Steven Bosscher; +Cc: Paolo Bonzini, Karel Gardas, GCC Mailing List
On Aug 31, 2004, at 3:49 AM, Steven Bosscher wrote:
> This may also just be noise. Some passes run so fast that the
> time vars are not accurate enough to record it.
We at Apple use ~10ns clocks to record times... works better... Only
problem, getting user/sys time out of the kernel into user space, so
that merely grabbing time isn't costly. :-( Not really a problem, as
wall is the only thing you can trust, and what matters the most anyway.
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 12:48 ` Karel Gardas
@ 2004-09-01 7:18 ` Paolo Bonzini
0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2004-09-01 7:18 UTC (permalink / raw)
To: Karel Gardas; +Cc: Steven Bosscher, GCC Mailing List
3.4.1 seems to have a problem with the expander at -O2:
> expand : 8.79 (26%) usr 0.01 ( 0%) sys 8.90 (24%) wall
So this 3.4.1 bug is somehow worked around by gimplification and tree
optimization, and this masks the 40% difference (which would still be
there at -O2 if it weren't for the improvement in expander time).
Paolo
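The masking effect can be checked with the user-time figures from the two -O2 reports earlier in the thread. This is only a rough sketch: it subtracts the "expand" row from each TOTAL and reuses the sign convention that the Delta% column appears to follow (negative means 3.5.0 is slower); the variable and function names are illustrative.

```python
# usr seconds from the two -O2 time reports for typecode.cc in this thread
total_341, expand_341 = 33.50, 8.79   # GCC 3.4.1
total_350, expand_350 = 31.01, 2.25   # trunk (3.5.0)


def delta_pct(t_old, t_new):
    """Assumed Delta% convention: negative means the new compiler is slower."""
    return (t_old - t_new) / t_new * 100


rest_341 = total_341 - expand_341     # everything except expand
rest_350 = total_350 - expand_350

print(f"whole compile : {delta_pct(total_341, total_350):+.1f}%")  # +8.0%
print(f"without expand: {delta_pct(rest_341, rest_350):+.1f}%")    # -14.1%
```

With expand left out, trunk goes from about 8% faster to about 14% slower on these numbers, which matches the direction of the argument above even though the gap is smaller than the 40% seen at -O1.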
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-08-31 9:58 Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources Karel Gardas
2004-08-31 10:12 ` Steven Bosscher
@ 2004-09-01 11:18 ` Giovanni Bajo
2004-09-02 9:41 ` Karel Gardas
2004-09-02 9:44 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Karel Gardas
1 sibling, 2 replies; 18+ messages in thread
From: Giovanni Bajo @ 2004-09-01 11:18 UTC (permalink / raw)
To: Karel Gardas; +Cc: gcc
Karel Gardas wrote:
> 1) typecode.cc: 40% regression on O1 while 7% speedup on O2
Can you please file a new bug report for this -O1 regression, attaching the
preprocessed testcase and the time reports to it? Also link Steven's message in
it: http://gcc.gnu.org/ml/gcc/2004-08/msg01602.html, which contains the
analysis of this.
Then we can set the new bug to block PR 13776.
I think it is better to track these issues with different PRs, and connect
them to PR 13776 (which is quite confusing at this point) only through
the Bugzilla relationships.
> -O2: 33% basic_seq.cc and following with 27% static.cc
Can you also open a new bug report about the regression of basic_seq.cc, which
regresses at all optimization levels? Again, attach preprocessed testcases, a
comparison with 3.4 for all optimization levels, and the relevant time reports.
Actually, I should also note that at this point we probably cannot do much
about compile time regressions at -O1/2/3. GCC 3.5 features more than 60 new
optimization passes, so it is already half a miracle that we don't regress
everywhere. Code generation is also improved of course, so we have to lose a
little somewhere. Of course, big regressions (>20% on files of non-trivial size)
could probably still be analyzed a little to see if we find obvious offenders.
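As a rough illustration of that ">20% on files of non-trivial size" filter,
here is a minimal sketch over the -O1 rows quoted from the table in the first
message. The function name `big_regressions` and the 3-second cut-off standing
in for "non-trivial size" are assumptions made for this example, not something
agreed on in the thread.

```python
# -O1 rows copied from the table in the first message:
# (file, 3.4.1 seconds, 3.5.0 seconds, Delta%)
O1_ROWS = [
    ("os-unix.cc", 4.47, 4.7, -4.89),
    ("dii.cc", 13.97, 15.7, -11.02),
    ("typecode.cc", 13.16, 22.06, -40.34),
    ("any.cc", 9.14, 10.91, -16.22),
    ("codec.cc", 7.45, 8.6, -13.37),
    ("buffer.cc", 3.52, 3.64, -3.3),
    ("context.cc", 3.83, 4.41, -13.15),
]


def big_regressions(rows, min_regression_pct=20.0, min_seconds=3.0):
    """Return files whose Delta% is worse than -min_regression_pct.

    min_seconds is a hypothetical stand-in for "non-trivial size": files
    that 3.4.1 compiles faster than this are ignored as too noisy.
    """
    return [
        name
        for name, old_time, _new_time, delta in rows
        if delta < -min_regression_pct and old_time >= min_seconds
    ]


print(big_regressions(O1_ROWS))
```

With these seven rows only typecode.cc passes the filter; lowering
min_regression_pct to 15 would also pick up any.cc.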
Thank you for doing this, it is of great help!
Giovanni Bajo
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-09-01 11:18 ` Giovanni Bajo
@ 2004-09-02 9:41 ` Karel Gardas
2004-09-02 20:32 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Giovanni Bajo
2004-09-02 9:44 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Karel Gardas
1 sibling, 1 reply; 18+ messages in thread
From: Karel Gardas @ 2004-09-02 9:41 UTC (permalink / raw)
To: Giovanni Bajo, Steven Bosscher, Paolo Bonzini; +Cc: GCC Mailing List
Giovanni,
I'm working on submitting the bug reports right now, but I have observed one
fact that I find very interesting. When I compile files preprocessed by 3.5.0
with gcc3.4.1, I get slower compile times, which means the regression(s) are
not that dramatic. For example, for typecode.cc the regression drops from 40%
to 30%. Also, for basic_seq.cc, which should regress at all optimization
levels, I now get _no_ regression at all! In fact I get speedups! Look at the
following tables:
Table (1), file not preprocessed:
File            341-O0  350-O0  Delta%  341-O1  350-O1  Delta%  341-O2  350-O2  Delta%
basic_seq.cc      3.77    4.21  -10.45    3.98    4.99  -20.24    3.82    5.72  -33.22
Table (2), file preprocessed by GCC 3.4.1:
File            341-O0  350-O0  Delta%  341-O1  350-O1  Delta%  341-O2  350-O2  Delta%
basic_seq.cc      3.69    3.31   11.48    3.91    3.47   12.68    3.78    3.65    3.56
Table (3), file preprocessed by GCC 3.5.0:
File            341-O0  350-O0  Delta%  341-O1  350-O1  Delta%  341-O2  350-O2  Delta%
basic_seq.cc      4.61    4.15   11.08    5.28    4.83    9.32    5.62    5.57     0.9
So it seems 3.5.0 is _always_ faster than 3.4.1 on a preprocessed file! So
either 3.5.0's libstdc++ library is bigger or 3.5.0's cpp is slower.
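For readers checking the Delta% columns: the figures in the three tables above
are consistent with (old - new) / new * 100, so a negative value means 3.5.0 is
slower. That formula is inferred from the numbers, not stated anywhere in the
thread, and `delta_percent` is just a name chosen for this sketch.

```python
def delta_percent(t_old, t_new):
    """Percent change of the 3.4.1 time relative to the 3.5.0 time.

    Negative means the newer compiler is slower. The formula is inferred
    from the posted tables, not taken from the script that produced them.
    """
    return round((t_old - t_new) / t_new * 100, 2)


# (3.4.1 seconds, 3.5.0 seconds) at -O0, -O1, -O2, copied from the tables.
timings = {
    "not preprocessed": [(3.77, 4.21), (3.98, 4.99), (3.82, 5.72)],
    "preprocessed by 3.4.1": [(3.69, 3.31), (3.91, 3.47), (3.78, 3.65)],
    "preprocessed by 3.5.0": [(4.61, 4.15), (5.28, 4.83), (5.62, 5.57)],
}

for label, pairs in timings.items():
    print(label, [delta_percent(old, new) for old, new in pairs])
```

Running it reproduces every Delta% value in the three tables, which is a quick
sanity check that the columns were not mixed up when the tables were copied.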
The size comparison of these two files looks like this:
$ ls -la basic_seq.*.ii
-rw-rw-r-- 1 karel karel 1223628 Sep 2 11:13 basic_seq.341.ii
-rw-rw-r-- 1 karel karel 1243090 Sep 2 11:01 basic_seq.350.ii
I hope you understand that I'm reluctant to submit a regression bug report
in this case. :-) I have also noted this in PR c++/17278, which is
for the typecode.cc regression...
Comparing table (1) 341-O0 with table (2) 341-O0 gives 3.77 - 3.69 == 0.08
seconds spent in 3.4.1's cpp.
The same for 3.5.0 is table (1) 350-O0 - table (3) 350-O0 == 4.21 - 4.15 == 0.06
seconds, so 3.5.0's cpp should even be a bit faster. So it seems the
culprit should be libstdc++ in 3.5.0, but is it possible that a size
difference of about 19 kB, i.e. roughly 1.6%, could make such a big
difference in compilation speed?
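The arithmetic above can be spelled out as a small sketch. All numbers are
copied from this message; the variable names are made up for the example. The
last loop compares each compiler against itself on the two preprocessed
inputs, which isolates the effect of the input file from the effect of the
compiler.

```python
# -O0 timings in seconds and .ii sizes in bytes, copied from this message.
unpreprocessed = {"341": 3.77, "350": 4.21}   # table (1)
own_ii = {"341": 3.69, "350": 4.15}           # each compiler on its own .ii
on_341_ii = {"341": 3.69, "350": 3.31}        # table (2)
on_350_ii = {"341": 4.61, "350": 4.15}        # table (3)
ii_size = {"341": 1223628, "350": 1243090}

# Time attributable to the preprocessor: full compile minus compile of .ii.
cpp_cost = {v: round(unpreprocessed[v] - own_ii[v], 2) for v in ("341", "350")}
print("cpp seconds:", cpp_cost)

# How much bigger is the 3.5.0-preprocessed file?
size_growth = ii_size["350"] - ii_size["341"]
size_growth_pct = round(size_growth / ii_size["341"] * 100, 2)
print("size growth:", size_growth, "bytes,", size_growth_pct, "%")

# Same compiler, different input: slowdown caused by the input alone.
input_slowdown_pct = {
    v: round((on_350_ii[v] - on_341_ii[v]) / on_341_ii[v] * 100, 2)
    for v in ("341", "350")
}
print("slowdown from 3.5.0-preprocessed input (%):", input_slowdown_pct)
```

Both compilers slow down by about a quarter on the 3.5.0-preprocessed file
even though it is under 2% larger, so raw byte count alone seems unlikely to
explain the difference; what the extra headers contain presumably matters more.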
Thanks,
Karel
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-09-01 11:18 ` Giovanni Bajo
2004-09-02 9:41 ` Karel Gardas
@ 2004-09-02 9:44 ` Karel Gardas
1 sibling, 0 replies; 18+ messages in thread
From: Karel Gardas @ 2004-09-02 9:44 UTC (permalink / raw)
To: Giovanni Bajo; +Cc: gcc
On Wed, 1 Sep 2004, Giovanni Bajo wrote:
> Actually, I should also note that at this point we probably cannot do much
> about compile time regressions at -O1/2/3. GCC 3.5 features more than 60 new
> optimization passes, so it is already half a miracle that we don't regress
> everywhere.
Yes, I'm also surprised that 3.5 looks so good even though so much stuff was
added.
> Code generation is also improved of course, so we have to lose a
> little somewhere. Of course, big regressions (>20% on files of non-trivial size)
> could probably still be analyzed a little to see if we find obvious offenders.
>
> Thank you for doing this, it is of great help!
You are welcome! Now, in light of the observation described in my last
email, I'm thinking about how to combine 3.5.0 with 3.4.1's libstdc++ to get
the best of both in one experimental compiler. :-)
Cheers,
Karel
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-09-02 9:41 ` Karel Gardas
@ 2004-09-02 20:32 ` Giovanni Bajo
2004-09-04 7:35 ` Karel Gardas
0 siblings, 1 reply; 18+ messages in thread
From: Giovanni Bajo @ 2004-09-02 20:32 UTC (permalink / raw)
To: Karel Gardas, Steven Bosscher, Paolo Bonzini; +Cc: GCC Mailing List
Karel Gardas wrote:
> Also for basic_seq.cc which should regress on all optimization
> levels, I now got _no_ regression at all! In fact I got speedups!
> Look at following table:
>
> Not preprocessed file:
> File 341-O0 350-O0 Delta% 341-O1 350-O1 Delta% 341-O2 350-O2 Delta%
> basic_seq.cc 3.77 4.21 -10.45 3.98 4.99 -20.24 3.82 5.72 -33.22
>
> File preprocessed by GCC 3.4.1:
> File 341-O0 350-O0 Delta% 341-O1 350-O1 Delta% 341-O2 350-O2 Delta%
> basic_seq.cc 3.69 3.31 11.48 3.91 3.47 12.68 3.78 3.65 3.56
>
> File preprocessed by GCC 3.5.0:
> File 341-O0 350-O0 Delta% 341-O1 350-O1 Delta% 341-O2 350-O2 Delta%
> basic_seq.cc 4.61 4.15 11.08 5.28 4.83 9.32 5.62 5.57 0.9
This is very interesting. Can you please file a bug report about this issue?
You can attach the unpreprocessed basic_seq.cc and the two preprocessed
files, from 3.4.1 and 3.5.0, and include all the timings you took. CC me on it,
please.
I'll try reproducing these numbers, and check if it's really a problem with v3
code, or something else.
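For anyone else wanting to reproduce the numbers, the experiment amounts to a
small matrix: preprocess with each compiler, then compile each preprocessed
file with both compilers at each optimization level. The sketch below only
builds the command lines and does not run them. The install paths are
hypothetical, and the include and define options that the MICO build needs are
deliberately left out because they are not given in this thread. `-E`, `-c`,
`-o` and `-ftime-report` are standard GCC options.

```python
from itertools import product

# Hypothetical install locations; adjust to where each compiler lives.
COMPILERS = {
    "341": "/opt/gcc-3.4.1/bin/g++",
    "350": "/opt/gcc-3.5.0/bin/g++",
}
OPT_LEVELS = ["-O0", "-O1", "-O2"]


def preprocess_cmd(cpp_ver, source):
    """Command that preprocesses `source` into e.g. basic_seq.341.ii."""
    stem = source.rsplit(".", 1)[0]
    out = f"{stem}.{cpp_ver}.ii"
    return [COMPILERS[cpp_ver], "-E", source, "-o", out], out


def compile_cmd(cc_ver, opt, ii_file):
    """Command that compiles a preprocessed file and prints a time report."""
    return [COMPILERS[cc_ver], "-c", opt, "-ftime-report", ii_file,
            "-o", "/dev/null"]


def experiment_matrix(source):
    """All commands: 2 preprocess runs plus 2 inputs x 2 compilers x 3 levels."""
    cmds = []
    for cpp_ver in COMPILERS:
        pre, ii_file = preprocess_cmd(cpp_ver, source)
        cmds.append(pre)
        for cc_ver, opt in product(COMPILERS, OPT_LEVELS):
            cmds.append(compile_cmd(cc_ver, opt, ii_file))
    return cmds


for cmd in experiment_matrix("basic_seq.cc"):
    print(" ".join(cmd))
```

Printing the commands rather than executing them keeps the sketch harmless;
each line can be pasted into a shell under `time` once the paths are adjusted.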
Thanks again,
Giovanni Bajo
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
2004-09-02 20:32 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Giovanni Bajo
@ 2004-09-04 7:35 ` Karel Gardas
0 siblings, 0 replies; 18+ messages in thread
From: Karel Gardas @ 2004-09-04 7:35 UTC (permalink / raw)
To: Giovanni Bajo; +Cc: Steven Bosscher, Paolo Bonzini, GCC Mailing List
Giovanni,
I've just created the bug report, but I forgot to add you to the CC
list. Please have a look at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17315 and edit it accordingly,
since I really do not know how to describe this issue better.
Thanks for looking into this!
Karel
--
Karel Gardas kgardas@objectsecurity.com
ObjectSecurity Ltd. http://www.objectsecurity.com
Thread overview: 18+ messages
2004-08-31 9:58 Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources Karel Gardas
2004-08-31 10:12 ` Steven Bosscher
2004-08-31 10:28 ` Karel Gardas
2004-08-31 10:44 ` Paolo Bonzini
2004-08-31 10:46 ` Karel Gardas
2004-08-31 10:49 ` Steven Bosscher
2004-08-31 11:00 ` Paolo Bonzini
2004-08-31 11:24 ` Steven Bosscher
2004-08-31 19:30 ` Mike Stump
2004-08-31 12:48 ` Karel Gardas
2004-09-01 7:18 ` Paolo Bonzini
2004-08-31 10:55 ` Steven Bosscher
2004-08-31 13:57 ` Karel Gardas
2004-09-01 11:18 ` Giovanni Bajo
2004-09-02 9:41 ` Karel Gardas
2004-09-02 20:32 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Giovanni Bajo
2004-09-04 7:35 ` Karel Gardas
2004-09-02 9:44 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Karel Gardas