public inbox for gcc@gcc.gnu.org
* Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
@ 2004-08-31  9:58 Karel Gardas
  2004-08-31 10:12 ` Steven Bosscher
  2004-09-01 11:18 ` Giovanni Bajo
  0 siblings, 2 replies; 18+ messages in thread
From: Karel Gardas @ 2004-08-31  9:58 UTC (permalink / raw)
  To: GCC Mailing List


Hello,

As promised here several times, here are finally the results for
yesterday's main trunk compiled at -O0/1/2 (the whole table is below).

As I've already reported, -O0 is faster, which is great! -O1 and -O2,
however, are slower by about 8.5% and 7%.

Interesting files seem to be:

1) typecode.cc: 40% regression on O1 while 7% speedup on O2
2) orb.cc: 10% speedup on O0, 16% regression on O1 and only 1.2%
           regression on O2
3) basic_seq.cc: 10%, 20% and 33% regressions on O0/1/2
4) static.cc: 1, 24 and 27% regression on O0/1/2
5) valuetype_impl.cc: 12 and 23% regression on O1/2

So you can see that some files regress most on O1 and others regress most
on O2.

Also, the biggest regressions are (not counting the very short compilations
of the uni_*.cc files):

-O0: 10% basic_seq.cc
-O1: 40% typecode.cc, then 24% each for static.cc and pi_impl.cc
-O2: 33% basic_seq.cc, followed by 27% static.cc

Is there anything else I should provide to help you with these issues?
In particular, please have a look at the table and pick your "interesting
file for preprocessing" candidate, which I will then upload to PR#13776.

Thanks, and especially thanks for the appreciable progress on -O0!

Karel
--
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com


File		341-O0	350-O0	Delta%	341-O1	350-O1	Delta%	341-O2	350-O2	Delta%

os-unix.cc	4.14	4.09	1.22	4.47	4.7	-4.89	4.55	4.97	-8.45
dii.cc		12.8	11.76	8.84	13.97	15.7	-11.02	17	18.59	-8.55
typecode.cc	9.11	9.42	-3.29	13.16	22.06	-40.34	32.25	30.05	7.32
any.cc		6.88	6.69	2.84	9.14	10.91	-16.22	12.94	13.87	-6.71
codec.cc	5.9	5.74	2.79	7.45	8.6	-13.37	9.29	11.1	-16.31
buffer.cc	3.34	3.31	0.91	3.52	3.64	-3.3	3.62	3.93	-7.89
context.cc	3.51	3.57	-1.68	3.83	4.41	-13.15	4.16	4.77	-12.79
except.cc	4.34	4.25	2.12	4.97	5.12	-2.93	6.05	6.27	-3.51
dispatch.cc	4.4	4.46	-1.35	5.24	5.1	2.75	4.95	5.64	-12.23
string.cc	3.35	3.26	2.76	3.5	3.47	0.86	3.4	3.6	-5.56
object.cc	4.69	4.76	-1.47	5.87	7	-16.14	7.01	8.07	-13.14
address.cc	5.26	4.93	6.69	6.43	6.83	-5.86	7.22	7.63	-5.37
ior.cc		12.48	11.35	9.96	14.81	15.31	-3.27	16.99	17.46	-2.69
orb.cc		16.81	15.3	9.87	25.62	30.52	-16.06	37.07	37.52	-1.2
boa.cc		9.22	8.48	8.73	11.74	13.16	-10.79	14.11	15.87	-11.09
dsi.cc		10.31	9.13	12.92	11.69	11.73	-0.34	12.57	13.19	-4.7
transport.cc	4.06	3.96	2.53	4.35	4.33	0.46	4.47	4.64	-3.66
t..port/tcp.cc	4.02	3.9	3.08	4.37	4.26	2.58	4.39	4.55	-3.52
t..port/udp.cc	4.11	4.02	2.24	4.47	4.45	0.45	4.65	4.79	-2.92
t..port/unix.cc	4.06	3.89	4.37	4.31	4.21	2.38	4.31	4.51	-4.43
iop.cc		16.43	15.03	9.31	22.25	25.39	-12.37	29.03	32.78	-11.44
util.cc		5.97	6	-0.5	7.79	10.07	-22.64	10.06	11.94	-15.75
basic_seq.cc	3.77	4.21	-10.45	3.98	4.99	-20.24	3.82	5.72	-33.22
fast_array.cc	3.89	3.74	4.01	3.95	3.88	1.8	3.87	4.07	-4.91
ssl.cc		9.29	7.73	20.18	9.25	7.84	17.98	8.99	7.91	13.65
fixed.cc	3.75	3.73	0.54	4.08	4.34	-5.99	4.22	4.85	-12.99
intercept.cc	10.27	9.5	8.11	11.64	12.31	-5.44	12.24	14.19	-13.74
codeset.cc	5.96	5.72	4.2	7.3	8.37	-12.78	9.88	10.87	-9.11
queue.cc	4.35	4.53	-3.97	4.68	5.27	-11.2	4.97	5.84	-14.9
static.cc	20.26	20.63	-1.79	24.42	32.31	-24.42	29.12	40.06	-27.31
current.cc	8.91	7.39	20.57	8.78	7.49	17.22	8.67	7.56	14.68
policy_impl.cc	12.7	11.96	6.19	13.65	14.62	-6.63	15.43	16.76	-7.94
service_info.cc	8.84	7.33	20.6	8.87	7.48	18.58	8.51	7.55	12.72
ioptypes.cc	10.69	9.46	13	12.76	12.69	0.55	13.66	14.52	-5.92
ssliop.cc	9.01	7.57	19.02	9.11	7.62	19.55	8.62	7.64	12.83
value.cc	11.27	9.31	21.05	12.08	11.11	8.73	12.36	12.17	1.56
valuetype.cc	9.96	8.48	17.45	10.59	9.7	9.18	10.92	10.64	2.63
v..type_impl.cc	12.47	12.19	2.3	13.12	14.93	-12.12	13.43	17.46	-23.08
dynany_impl.cc	10.61	10.14	4.64	15.94	20.11	-20.74	23	25.82	-10.92
policy2.cc	9.1	7.62	19.42	9.14	7.85	16.43	9.01	7.91	13.91
tckind.cc	8.77	7.33	19.65	8.82	7.39	19.35	8.56	7.42	15.36
orb_excepts.cc	9.01	7.51	19.97	9.05	7.67	17.99	8.87	7.84	13.14
policy.cc	8.96	7.47	19.95	9.09	7.64	18.98	8.83	7.87	12.2
poa.cc		13.07	11.51	13.55	15.24	14.84	2.7	17.67	17.62	0.28
poa_base.cc	10.22	8.88	15.09	10.77	10.13	6.32	11.54	11.13	3.68
poa_impl.cc	17.42	16.2	7.53	22.82	25.91	-11.93	29.78	32.73	-9.01
dynany.cc	10.26	8.83	16.19	10.81	10.21	5.88	11.72	11.06	5.97
uni_base64.cc	0.12	0.12	0	0.17	0.21	-19.05	0.25	0.28	-10.71
uni_unicode.cc	0.2	0.21	-4.76	0.28	0.36	-22.22	0.43	0.51	-15.69
uni_fromuni.cc	0.4	0.43	-6.98	0.58	0.82	-29.27	1.1	1.32	-16.67
uni_touni.cc	0.43	0.47	-8.51	0.69	0.96	-28.13	1.21	1.41	-14.18
except2.cc	6.73	6.16	9.25	10.03	10.03	0	12.98	12.54	3.51
pi.cc		11.48	9.48	21.1	12.59	11.91	5.71	13.25	13.4	-1.12
pi_impl.cc	18.92	18.96	-0.21	23.3	30.73	-24.18	30.53	37.56	-18.72
typecode_seq.cc	9.15	8.15	12.27	9.56	8.64	10.65	9.3	9.02	3.1
timebase.cc	8.78	7.53	16.6	8.94	7.45	20	8.63	7.66	12.66
ir.cc		46.58	48.62	-4.2	70.96	87.47	-18.88	97.81	114.45	-14.54
ir_base.cc	11.57	10.14	14.1	13.49	15.37	-12.23	15.67	17.76	-11.77
imr.cc		14.34	13.85	3.54	18.6	20.62	-9.8	24.84	25.31	-1.86
mtdebug.cc	3.72	3.72	0	3.95	3.77	4.77	3.69	3.82	-3.4

Sum		530.42	494.11	7.35	636.03	696.01	-8.62	767.47	827.99	-7.31
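
The Delta% columns above appear to be computed relative to the 3.5.0 time, that is (old - new) / new * 100, so a negative value means 3.5.0 is slower. The following is a minimal sketch of that arithmetic, inferred from the table rows rather than taken from the script that produced them; the function names are hypothetical. It also shows the same change expressed relative to the 3.4.1 time, which gives a larger figure for the same measurement.

```python
def delta_percent(old_seconds, new_seconds):
    """Delta% as the table appears to compute it: (old - new) / new * 100.

    Negative values mean the newer compiler took longer.
    """
    return (old_seconds - new_seconds) / new_seconds * 100.0


def slowdown_percent(old_seconds, new_seconds):
    """The same change expressed relative to the older compiler.

    Positive values mean the newer compiler took longer.
    """
    return (new_seconds - old_seconds) / old_seconds * 100.0


if __name__ == "__main__":
    # typecode.cc at -O1: 13.16 s with 3.4.1, 22.06 s with 3.5.0 (from the table).
    print(f"Delta%   = {delta_percent(13.16, 22.06):.2f}")
    print(f"slowdown = {slowdown_percent(13.16, 22.06):.2f}")
```

Under this reading, the "40% regression" for typecode.cc at -O1 corresponds to roughly a two-thirds increase in compile time when measured against 3.4.1.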





* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31  9:58 Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources Karel Gardas
@ 2004-08-31 10:12 ` Steven Bosscher
  2004-08-31 10:28   ` Karel Gardas
  2004-09-01 11:18 ` Giovanni Bajo
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Bosscher @ 2004-08-31 10:12 UTC (permalink / raw)
  To: Karel Gardas, GCC Mailing List

On Tuesday 31 August 2004 11:11, Karel Gardas wrote:
> Hello,
>
> several times promised here are finally the results obtained for
> yesterday's main-trunk and -O0/1/2 compilations (whole table is below)
>
> As I've already reported -O0 is better, which is great! And O1 and O2 are
> slower for about 8.5% and 7%.
>
> Interesting files seem to be:
>
> 1) typecode.cc: 40% regression on O1 while 7% speedup on O2

Can you show us the time report for the 40% regression?

Gr.
Steven


* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31 10:12 ` Steven Bosscher
@ 2004-08-31 10:28   ` Karel Gardas
  2004-08-31 10:44     ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Karel Gardas @ 2004-08-31 10:28 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: GCC Mailing List

On Tue, 31 Aug 2004, Steven Bosscher wrote:

> On Tuesday 31 August 2004 11:11, Karel Gardas wrote:
> > Hello,
> >
> > several times promised here are finally the results obtained for
> > yesterday's main-trunk and -O0/1/2 compilations (whole table is below)
> >
> > As I've already reported -O0 is better, which is great! And O1 and O2 are
> > slower for about 8.5% and 7%.
> >
> > Interesting files seem to be:
> >
> > 1) typecode.cc: 40% regression on O1 while 7% speedup on O2
>
> Can you show us the time report for the 40% regression?

Here we go.

Execution times (seconds)
 garbage collection    :   0.52 ( 2%) usr   0.00 ( 0%) sys   0.53 ( 2%) wall
 callgraph construction:   0.19 ( 1%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall
 callgraph optimization:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 cfg construction      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 cfg cleanup           :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall
 trivially dead code   :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall
 life analysis         :   0.97 ( 4%) usr   0.00 ( 0%) sys   0.84 ( 3%) wall
 life info update      :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall
 alias analysis        :   0.17 ( 1%) usr   0.01 ( 1%) sys   0.14 ( 1%) wall
 register scan         :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 rebuild jump labels   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 preprocessing         :   0.48 ( 2%) usr   0.23 (12%) sys   0.57 ( 2%) wall
 parser                :   3.93 (17%) usr   0.58 (30%) sys   4.67 (18%) wall
 name lookup           :   1.09 ( 5%) usr   0.46 (24%) sys   1.79 ( 7%) wall
 integration           :   1.01 ( 4%) usr   0.06 ( 3%) sys   0.88 ( 3%) wall
 tree gimplify         :   0.60 ( 3%) usr   0.04 ( 2%) sys   0.60 ( 2%) wall
 tree eh               :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 tree CFG construction :   0.10 ( 0%) usr   0.03 ( 2%) sys   0.13 ( 0%) wall
 tree CFG cleanup      :   0.21 ( 1%) usr   0.01 ( 1%) sys   0.14 ( 1%) wall
 tree PTA              :   0.20 ( 1%) usr   0.00 ( 0%) sys   0.26 ( 1%) wall
 tree alias analysis   :   0.33 ( 1%) usr   0.00 ( 0%) sys   0.37 ( 1%) wall
 tree PHI insertion    :   0.42 ( 2%) usr   0.01 ( 1%) sys   0.50 ( 2%) wall
 tree SSA rewrite      :   0.58 ( 3%) usr   0.00 ( 0%) sys   0.71 ( 3%) wall
 tree SSA other        :   0.82 ( 4%) usr   0.12 ( 6%) sys   0.98 ( 4%) wall
 tree operand scan     :   0.59 ( 3%) usr   0.16 ( 8%) sys   0.98 ( 4%) wall
 dominator optimization:   1.48 ( 6%) usr   0.02 ( 1%) sys   1.50 ( 6%) wall
 tree SRA              :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 tree CCP              :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall
 tree split crit edges :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 tree PRE              :   0.41 ( 2%) usr   0.01 ( 1%) sys   0.41 ( 2%) wall
 tree forward propagate:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 tree conservative DCE :   0.30 ( 1%) usr   0.01 ( 1%) sys   0.28 ( 1%) wall
 tree aggressive DCE   :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 tree DSE              :   0.28 ( 1%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall
 loop invariant motion :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.22 ( 1%) wall
 tree copy headers     :   0.06 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall
 tree SSA to normal    :   0.26 ( 1%) usr   0.01 ( 1%) sys   0.36 ( 1%) wall
 tree rename SSA copies:   0.11 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 dominance frontiers   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall
 varconst              :   0.08 ( 0%) usr   0.02 ( 1%) sys   0.09 ( 0%) wall
 jump                  :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 CSE                   :   0.58 ( 3%) usr   0.00 ( 0%) sys   0.55 ( 2%) wall
 loop analysis         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 branch prediction     :   0.15 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 flow analysis         :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall
 combiner              :   0.55 ( 2%) usr   0.00 ( 0%) sys   0.64 ( 2%) wall
 if-conversion         :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 local alloc           :   0.30 ( 1%) usr   0.00 ( 0%) sys   0.33 ( 1%) wall
 global alloc          :   1.16 ( 5%) usr   0.01 ( 1%) sys   1.34 ( 5%) wall
 reload CSE regs       :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.28 ( 1%) wall
 flow 2                :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 if-conversion 2       :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 rename registers      :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall
 machine dep reorg     :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 shorten branches      :   0.15 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 final                 :   0.26 ( 1%) usr   0.01 ( 1%) sys   0.26 ( 1%) wall
 symout                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 rest of compilation   :   0.14 ( 1%) usr   0.01 ( 1%) sys   0.19 ( 1%) wall
 TOTAL                 :  23.12             1.91            26.21
# cc1plus 23.13 1.93
# as 0.34 0.02


Karel
--
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com



* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0  2004-08-30 on MICO sources
  2004-08-31 10:28   ` Karel Gardas
@ 2004-08-31 10:44     ` Paolo Bonzini
  2004-08-31 10:46       ` Karel Gardas
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2004-08-31 10:44 UTC (permalink / raw)
  To: Karel Gardas; +Cc: GCC Mailing List

>>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
>>
>>Can you show us the time report for the 40% regression?

Also for 3.4.1?

Paolo


* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31 10:44     ` Paolo Bonzini
@ 2004-08-31 10:46       ` Karel Gardas
  2004-08-31 10:49         ` Steven Bosscher
  2004-08-31 10:55         ` Steven Bosscher
  0 siblings, 2 replies; 18+ messages in thread
From: Karel Gardas @ 2004-08-31 10:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: GCC Mailing List

On Tue, 31 Aug 2004, Paolo Bonzini wrote:

> >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> >>
> >>Can you show us the time report for the 40% regression?
>
> Also for 3.4.1?

Sure!

Execution times (seconds)
 garbage collection    :   0.79 ( 6%) usr   0.00 ( 0%) sys   0.84 ( 5%) wall
 cfg construction      :   0.09 ( 1%) usr   0.00 ( 0%) sys   0.11 ( 1%) wall
 cfg cleanup           :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.16 ( 1%) wall
 trivially dead code   :   0.10 ( 1%) usr   0.01 ( 0%) sys   0.14 ( 1%) wall
 life analysis         :   0.80 ( 6%) usr   0.00 ( 0%) sys   0.85 ( 5%) wall
 life info update      :   0.08 ( 1%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall
 alias analysis        :   0.16 ( 1%) usr   0.00 ( 0%) sys   0.21 ( 1%) wall
 register scan         :   0.13 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall
 rebuild jump labels   :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall
 preprocessing         :   0.44 ( 3%) usr   0.21 (10%) sys   0.65 ( 4%) wall
 parser                :   4.41 (31%) usr   0.67 (31%) sys   5.22 (31%) wall
 name lookup           :   1.61 (11%) usr   1.17 (53%) sys   2.90 (17%) wall
 expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%) wall
 varconst              :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.09 ( 1%) wall
 integration           :   0.65 ( 5%) usr   0.00 ( 0%) sys   0.67 ( 4%) wall
 jump                  :   0.05 ( 0%) usr   0.02 ( 1%) sys   0.02 ( 0%) wall
 CSE                   :   0.49 ( 3%) usr   0.00 ( 0%) sys   0.46 ( 3%) wall
 loop analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 branch prediction     :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.16 ( 1%) wall
 flow analysis         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 combiner              :   0.34 ( 2%) usr   0.00 ( 0%) sys   0.42 ( 2%) wall
 if-conversion         :   0.09 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 mode switching        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 local alloc           :   0.34 ( 2%) usr   0.00 ( 0%) sys   0.28 ( 2%) wall
 global alloc          :   0.91 ( 6%) usr   0.02 ( 1%) sys   0.83 ( 5%) wall
 reload CSE regs       :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.25 ( 1%) wall
 flow 2                :   0.18 ( 1%) usr   0.01 ( 0%) sys   0.12 ( 1%) wall
 if-conversion 2       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 rename registers      :   0.15 ( 1%) usr   0.00 ( 0%) sys   0.10 ( 1%) wall
 machine dep reorg     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 shorten branches      :   0.16 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall
 final                 :   0.28 ( 2%) usr   0.01 ( 0%) sys   0.43 ( 3%) wall
 symout                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 rest of compilation   :   0.42 ( 3%) usr   0.02 ( 1%) sys   0.49 ( 3%) wall
 TOTAL                 :  14.25             2.19            16.89
# cc1plus 14.26 2.21
# as 0.34 0.02


Karel
--
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com



* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31 10:46       ` Karel Gardas
@ 2004-08-31 10:49         ` Steven Bosscher
  2004-08-31 11:00           ` Paolo Bonzini
  2004-08-31 10:55         ` Steven Bosscher
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Bosscher @ 2004-08-31 10:49 UTC (permalink / raw)
  To: Karel Gardas, Paolo Bonzini; +Cc: GCC Mailing List

On Tuesday 31 August 2004 12:28, Karel Gardas wrote:
> On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> > >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> > >>
> > >>Can you show us the time report for the 40% regression?
> >
> > Also for 3.4.1?

3.4.1:
expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%) wall

3.5.0-HEAD:
expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall

I wonder why this is.  I would have expected it to be the other
way around...

Gr.
Steven


* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31 10:46       ` Karel Gardas
  2004-08-31 10:49         ` Steven Bosscher
@ 2004-08-31 10:55         ` Steven Bosscher
  2004-08-31 13:57           ` Karel Gardas
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Bosscher @ 2004-08-31 10:55 UTC (permalink / raw)
  To: Karel Gardas, Paolo Bonzini; +Cc: GCC Mailing List

On Tuesday 31 August 2004 12:28, Karel Gardas wrote:
> On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> > >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> > >>
> > >>Can you show us the time report for the 40% regression?
> >
> > Also for 3.4.1?
>
> Sure!

Hmm...  No obvious hot spots eh?

Looks like the tree optimizers are to blame.  We spend roughly the same
amount of time in the post-GIMPLE passes, and we spend >7.5s in the tree
optimizers.  The total slowdown you measured was ~8.9s.  The other 1.4s
are spent in expand as shown in the previous message:

3.4.1:  expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%) wall
3.5.0:  expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall

Hmm, we should probably disable at least flag_thread_jumps and
flag_loop_optimize at -O1, and perhaps consider disabling some
of the more expensive (parts of the) tree optimizers...   And
of course see if it makes sense to disable a few RTL optimizers.

So, looks like a tuning problem to me, not really a slowdown that
indicates something algorithmic being really wrong.

Gr.
Steven
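
The comparison above can be reproduced mechanically. The following is a rough sketch, not a tool used in this thread: it parses the per-pass "usr" column of two -ftime-report listings and prints the per-pass difference. The helper names are hypothetical, and the embedded data is only a short excerpt of the two -O1 reports posted earlier.

```python
import re

# Matches per-pass lines such as
#   " expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall"
# The TOTAL line has no percentages, so it does not match and is skipped.
PASS_LINE = re.compile(r"^\s*(?P<name>.+?)\s*:\s+(?P<usr>\d+\.\d+) \(\s*\d+%\) usr")


def parse_time_report(text):
    """Map pass name -> user seconds for every per-pass line in the report."""
    times = {}
    for line in text.splitlines():
        match = PASS_LINE.match(line)
        if match:
            times[match.group("name")] = float(match.group("usr"))
    return times


def usr_deltas(old, new):
    """Per-pass change in user seconds (new - old); a missing pass counts as 0."""
    names = set(old) | set(new)
    return {n: round(new.get(n, 0.0) - old.get(n, 0.0), 2) for n in names}


# Excerpts only; the full reports are in the earlier messages.
GCC_341_O1 = """\
 parser                :   4.41 (31%) usr   0.67 (31%) sys   5.22 (31%) wall
 name lookup           :   1.61 (11%) usr   1.17 (53%) sys   2.90 (17%) wall
 expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%) wall
 integration           :   0.65 ( 5%) usr   0.00 ( 0%) sys   0.67 ( 4%) wall
 global alloc          :   0.91 ( 6%) usr   0.02 ( 1%) sys   0.83 ( 5%) wall
 TOTAL                 :  14.25             2.19            16.89
"""

GCC_350_O1 = """\
 parser                :   3.93 (17%) usr   0.58 (30%) sys   4.67 (18%) wall
 name lookup           :   1.09 ( 5%) usr   0.46 (24%) sys   1.79 ( 7%) wall
 integration           :   1.01 ( 4%) usr   0.06 ( 3%) sys   0.88 ( 3%) wall
 dominator optimization:   1.48 ( 6%) usr   0.02 ( 1%) sys   1.50 ( 6%) wall
 expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall
 global alloc          :   1.16 ( 5%) usr   0.01 ( 1%) sys   1.34 ( 5%) wall
 TOTAL                 :  23.12             1.91            26.21
"""

if __name__ == "__main__":
    deltas = usr_deltas(parse_time_report(GCC_341_O1), parse_time_report(GCC_350_O1))
    # Largest slowdowns first.
    for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:24s}{delta:+.2f}")
```

Passes that exist in only one compiler, such as the tree passes, show up with their full time as the delta, which is how they would dominate the list when the complete reports are fed in.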



* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31 10:49         ` Steven Bosscher
@ 2004-08-31 11:00           ` Paolo Bonzini
  2004-08-31 11:24             ` Steven Bosscher
  2004-08-31 12:48             ` Karel Gardas
  0 siblings, 2 replies; 18+ messages in thread
From: Paolo Bonzini @ 2004-08-31 11:00 UTC (permalink / raw)
  To: Steven Bosscher, Karel Gardas; +Cc: GCC Mailing List

> 3.4.1:
> expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%)
> 
> 3.5.0-HEAD:
> expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall


Also:

3.4.1:
integration    :   0.65 ( 5%) usr   0.00 ( 0%) sys   0.67 ( 4%) wall
global alloc   :   0.91 ( 6%) usr   0.02 ( 1%) sys   0.83 ( 5%) wall

3.5.0-HEAD:
integration    :   1.01 ( 4%) usr   0.06 ( 3%) sys   0.88 ( 3%) wall
global alloc   :   1.16 ( 5%) usr   0.01 ( 1%) sys   1.34 ( 5%) wall

This is +0.5 seconds overall, which is another 4%.  And then:

DOM:   1.48 ( 6%) usr   0.02 ( 1%) sys   1.50 ( 6%) wall

There are quite high times for "tree SSA other", "tree conservative 
DCE", "tree SSA rewrite" too.

Note that the parser and name lookup have indeed become faster, which is 
the result of Mark's work and part of the reason why -O0 is faster.

The -O2 times for 3.5 would help as well; I suspect -funit-at-a-time is 
helping a lot.

Paolo


* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31 11:00           ` Paolo Bonzini
@ 2004-08-31 11:24             ` Steven Bosscher
  2004-08-31 19:30               ` Mike Stump
  2004-08-31 12:48             ` Karel Gardas
  1 sibling, 1 reply; 18+ messages in thread
From: Steven Bosscher @ 2004-08-31 11:24 UTC (permalink / raw)
  To: Paolo Bonzini, Karel Gardas; +Cc: GCC Mailing List

On Tuesday 31 August 2004 12:50, Paolo Bonzini wrote:
> > 3.4.1:
> > expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%)
> >
> > 3.5.0-HEAD:
> > expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%)
> > wall
>
> Also:
>
> 3.4.1:
> integration    :   0.65 ( 5%) usr   0.00 ( 0%) sys   0.67 ( 4%) wall
> global alloc   :   0.91 ( 6%) usr   0.02 ( 1%) sys   0.83 ( 5%) wall
>
> 3.5.0-HEAD:
> integration    :   1.01 ( 4%) usr   0.06 ( 3%) sys   0.88 ( 3%) wall
> global alloc   :   1.16 ( 5%) usr   0.01 ( 1%) sys   1.34 ( 5%) wall
>
> This is overall +0.5 seconds, which another 4%.  And then:

This may also just be noise.  Some passes run so fast that the
time vars are not accurate enough to record it.  You'll see that
for bodies of code with many small functions, -ftime-report will
give a very different TOTAL than /usr/bin/time  ;-)

> The -O2 times for 3.5 would help as well, I suspect -funit-at-a-time is
> helping a lot.

Rather the other way around, since GCC 3.5 has -funit-at-a-time
enabled for C++ at -O0.

Gr.
Steven
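
The point about timer accuracy can be illustrated with a toy model. This is a simplified sketch and not a description of how GCC's timevars are implemented: it assumes a clock that only advances in fixed ticks and a compiler that runs two passes back to back over many small functions. The tick size, pass names and durations are made up for illustration.

```python
def measured_per_pass(schedule, tick_us):
    """Attribute time to passes using a clock that only advances in tick_us steps.

    schedule is a list of (pass_name, duration_us) entries run back to back.
    Each entry is charged the difference between the clock readings taken
    at its start and at its end.
    """
    totals = {}
    now = 0
    for name, duration in schedule:
        start_reading = (now // tick_us) * tick_us
        now += duration
        end_reading = (now // tick_us) * tick_us
        totals[name] = totals.get(name, 0) + (end_reading - start_reading)
    return totals


# 1000 small functions, each spending 1 ms in pass A and then 1 ms in pass B.
SCHEDULE = [("A", 1000), ("B", 1000)] * 1000

if __name__ == "__main__":
    print("10 ms clock:", measured_per_pass(SCHEDULE, tick_us=10_000))
    print(" 1 us clock:", measured_per_pass(SCHEDULE, tick_us=1))
```

In this model the overall sum is preserved, but with the coarse clock every tick happens to land in pass B, so pass A is reported as taking no time at all. Differences of a few tenths of a second between individual passes can therefore be an artifact of the clock rather than a real change.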



* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31 11:00           ` Paolo Bonzini
  2004-08-31 11:24             ` Steven Bosscher
@ 2004-08-31 12:48             ` Karel Gardas
  2004-09-01  7:18               ` Paolo Bonzini
  1 sibling, 1 reply; 18+ messages in thread
From: Karel Gardas @ 2004-08-31 12:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Steven Bosscher, GCC Mailing List

On Tue, 31 Aug 2004, Paolo Bonzini wrote:

> The -O2 times for 3.5 would help as well, I suspect -funit-at-a-time is
> helping a lot.

Here are reports for -O2 for both trunk and gcc3.4.1:

Trunk:

Execution times (seconds)
 garbage collection    :   1.21 ( 4%) usr   0.00 ( 0%) sys   1.23 ( 4%) wall
 callgraph construction:   0.19 ( 1%) usr   0.01 ( 1%) sys   0.20 ( 1%) wall
 callgraph optimization:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 cfg construction      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 cfg cleanup           :   0.28 ( 1%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall
 trivially dead code   :   0.38 ( 1%) usr   0.01 ( 1%) sys   0.34 ( 1%) wall
 life analysis         :   0.76 ( 2%) usr   0.01 ( 1%) sys   0.68 ( 2%) wall
 life info update      :   0.35 ( 1%) usr   0.00 ( 0%) sys   0.44 ( 1%) wall
 alias analysis        :   0.38 ( 1%) usr   0.00 ( 0%) sys   0.46 ( 1%) wall
 register scan         :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 1%) wall
 rebuild jump labels   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 preprocessing         :   0.38 ( 1%) usr   0.17 ( 9%) sys   0.56 ( 2%) wall
 parser                :   3.89 (13%) usr   0.53 (27%) sys   4.81 (14%) wall
 name lookup           :   1.25 ( 4%) usr   0.58 (30%) sys   1.66 ( 5%) wall
 integration           :   0.86 ( 3%) usr   0.00 ( 0%) sys   0.95 ( 3%) wall
 tree gimplify         :   0.53 ( 2%) usr   0.03 ( 2%) sys   0.59 ( 2%) wall
 tree eh               :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 tree CFG construction :   0.16 ( 1%) usr   0.02 ( 1%) sys   0.16 ( 0%) wall
 tree CFG cleanup      :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.25 ( 1%) wall
 tree PTA              :   0.23 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall
 tree alias analysis   :   0.43 ( 1%) usr   0.01 ( 1%) sys   0.51 ( 2%) wall
 tree PHI insertion    :   0.59 ( 2%) usr   0.02 ( 1%) sys   0.56 ( 2%) wall
 tree SSA rewrite      :   0.68 ( 2%) usr   0.00 ( 0%) sys   0.72 ( 2%) wall
 tree SSA other        :   1.14 ( 4%) usr   0.11 ( 6%) sys   1.41 ( 4%) wall
 tree operand scan     :   0.75 ( 2%) usr   0.15 ( 8%) sys   0.77 ( 2%) wall
 dominator optimization:   1.79 ( 6%) usr   0.06 ( 3%) sys   1.69 ( 5%) wall
 tree SRA              :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 tree CCP              :   0.24 ( 1%) usr   0.00 ( 0%) sys   0.23 ( 1%) wall
 tree split crit edges :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 tree PRE              :   0.42 ( 1%) usr   0.02 ( 1%) sys   0.52 ( 2%) wall
 tree forward propagate:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 tree conservative DCE :   0.32 ( 1%) usr   0.01 ( 1%) sys   0.33 ( 1%) wall
 tree aggressive DCE   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 tree DSE              :   0.29 ( 1%) usr   0.00 ( 0%) sys   0.38 ( 1%) wall
 loop invariant motion :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall
 tree copy headers     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 tree SSA to normal    :   0.28 ( 1%) usr   0.02 ( 1%) sys   0.34 ( 1%) wall
 tree rename SSA copies:   0.10 ( 0%) usr   0.01 ( 1%) sys   0.11 ( 0%) wall
 dominance frontiers   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 control dependences   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 expand                :   2.25 ( 7%) usr   0.02 ( 1%) sys   2.33 ( 7%) wall
 varconst              :   0.09 ( 0%) usr   0.03 ( 2%) sys   0.11 ( 0%) wall
 jump                  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 CSE                   :   1.72 ( 6%) usr   0.01 ( 1%) sys   1.86 ( 5%) wall
 loop analysis         :   0.17 ( 1%) usr   0.01 ( 1%) sys   0.13 ( 0%) wall
 global CSE            :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 CPROP 1               :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 PRE                   :   0.37 ( 1%) usr   0.01 ( 1%) sys   0.43 ( 1%) wall
 CPROP 2               :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 LSM                   :   0.35 ( 1%) usr   0.01 ( 1%) sys   0.38 ( 1%) wall
 bypass jumps          :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 1%) wall
 web                   :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 CSE 2                 :   0.94 ( 3%) usr   0.00 ( 0%) sys   0.94 ( 3%) wall
 branch prediction     :   0.17 ( 1%) usr   0.01 ( 1%) sys   0.23 ( 1%) wall
 flow analysis         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 combiner              :   0.55 ( 2%) usr   0.00 ( 0%) sys   0.54 ( 2%) wall
 if-conversion         :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 regmove               :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall
 local alloc           :   0.43 ( 1%) usr   0.01 ( 1%) sys   0.49 ( 1%) wall
 global alloc          :   1.16 ( 4%) usr   0.00 ( 0%) sys   1.14 ( 3%) wall
 reload CSE regs       :   0.53 ( 2%) usr   0.01 ( 1%) sys   0.46 ( 1%) wall
 flow 2                :   0.11 ( 0%) usr   0.01 ( 1%) sys   0.13 ( 0%) wall
 if-conversion 2       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 peephole 2            :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 rename registers      :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall
 scheduling 2          :   0.81 ( 3%) usr   0.00 ( 0%) sys   0.97 ( 3%) wall
 machine dep reorg     :   0.24 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall
 reorder blocks        :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 shorten branches      :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 final                 :   0.31 ( 1%) usr   0.03 ( 2%) sys   0.32 ( 1%) wall
 symout                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 rest of compilation   :   0.15 ( 0%) usr   0.01 ( 1%) sys   0.14 ( 0%) wall
 TOTAL                 :  31.01             1.94            33.84
# cc1plus 31.02 1.97
# as 0.36 0.02


GCC 3.4.1:

Execution times (seconds)
 garbage collection    :   1.20 ( 4%) usr   0.00 ( 0%) sys   1.22 ( 3%) wall
 callgraph construction:   0.14 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall
 callgraph optimization:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 cfg construction      :   0.40 ( 1%) usr   0.00 ( 0%) sys   0.43 ( 1%) wall
 cfg cleanup           :   0.46 ( 1%) usr   0.02 ( 1%) sys   0.44 ( 1%) wall
 trivially dead code   :   0.48 ( 1%) usr   0.00 ( 0%) sys   0.53 ( 1%) wall
 life analysis         :   0.74 ( 2%) usr   0.00 ( 0%) sys   0.76 ( 2%) wall
 life info update      :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.39 ( 1%) wall
 alias analysis        :   0.68 ( 2%) usr   0.00 ( 0%) sys   0.59 ( 2%) wall
 register scan         :   0.40 ( 1%) usr   0.01 ( 0%) sys   0.37 ( 1%) wall
 rebuild jump labels   :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall
 preprocessing         :   0.47 ( 1%) usr   0.18 ( 8%) sys   0.66 ( 2%) wall
 parser                :   4.21 (13%) usr   0.76 (33%) sys   5.18 (14%) wall
 name lookup           :   1.41 ( 4%) usr   0.99 (42%) sys   2.34 ( 6%) wall
 expand                :   8.79 (26%) usr   0.01 ( 0%) sys   8.90 (24%) wall
 varconst              :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 integration           :   1.54 ( 5%) usr   0.00 ( 0%) sys   1.78 ( 5%) wall
 jump                  :   0.69 ( 2%) usr   0.11 ( 5%) sys   0.70 ( 2%) wall
 CSE                   :   2.72 ( 8%) usr   0.00 ( 0%) sys   2.84 ( 8%) wall
 global CSE            :   2.13 ( 6%) usr   0.12 ( 5%) sys   2.41 ( 7%) wall
 loop analysis         :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
 bypass jumps          :   0.23 ( 1%) usr   0.02 ( 1%) sys   0.37 ( 1%) wall
 CSE 2                 :   0.80 ( 2%) usr   0.00 ( 0%) sys   0.78 ( 2%) wall
 branch prediction     :   0.46 ( 1%) usr   0.00 ( 0%) sys   0.54 ( 1%) wall
 flow analysis         :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 combiner              :   0.44 ( 1%) usr   0.00 ( 0%) sys   0.43 ( 1%) wall
 if-conversion         :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
 regmove               :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
 local alloc           :   0.37 ( 1%) usr   0.02 ( 1%) sys   0.39 ( 1%) wall
 global alloc          :   0.88 ( 3%) usr   0.02 ( 1%) sys   0.95 ( 3%) wall
 reload CSE regs       :   0.41 ( 1%) usr   0.01 ( 0%) sys   0.49 ( 1%) wall
 flow 2                :   0.09 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall
 if-conversion 2       :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 peephole 2            :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 rename registers      :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 scheduling 2          :   0.67 ( 2%) usr   0.03 ( 1%) sys   0.64 ( 2%) wall
 reorder blocks        :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 shorten branches      :   0.10 ( 0%) usr   0.01 ( 0%) sys   0.10 ( 0%) wall
 final                 :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.23 ( 1%) wall
 symout                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 rest of compilation   :   0.93 ( 3%) usr   0.01 ( 0%) sys   0.90 ( 2%) wall
 TOTAL                 :  33.50             2.33            36.75
# cc1plus 33.50 2.35
# as 0.29 0.02


Karel
--
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com


* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31 10:55         ` Steven Bosscher
@ 2004-08-31 13:57           ` Karel Gardas
  0 siblings, 0 replies; 18+ messages in thread
From: Karel Gardas @ 2004-08-31 13:57 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Paolo Bonzini, GCC Mailing List

On Tue, 31 Aug 2004, Steven Bosscher wrote:

> On Tuesday 31 August 2004 12:28, Karel Gardas wrote:
> > On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> > > >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> > > >>
> > > >>Can you show us the time report for the 40% regression?
> > >
> > > Also for 3.4.1?
> >
> > Sure!
>
> Hmm...  No obvious hot spots eh?
>

BTW: gcc3.4.1 consumes about 66MB of RAM to compile this file, while trunk
consumes about 98MB. The testing box is also a mobile PIII with only 256kB
of cache, so the higher memory usage might also add something to the
regression...

Karel
--
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31 11:24             ` Steven Bosscher
@ 2004-08-31 19:30               ` Mike Stump
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Stump @ 2004-08-31 19:30 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Paolo Bonzini, Karel Gardas, GCC Mailing List

On Aug 31, 2004, at 3:49 AM, Steven Bosscher wrote:
> This may also just be noise.  Some passes run so fast that the
> time vars are not accurate enough to record it.

We at apple use ~10ns clocks to record times...  works better...  Only 
problem, getting user/sys time out of the kernel into user space, so 
that merely grabbing time isn't costly.  :-(  Not really a problem, as 
wall is the only thing you can trust, and what matters the most anyway.
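
To illustrate the distinction between wall time and user/sys time drawn above, here is a minimal Python sketch that records all three for one child process. It is not the mechanism GCC or Apple use; the helper name `time_command` and the placeholder command are assumptions made for illustration only.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (wall, user, sys) seconds for the child."""
    before = os.times()
    start = time.perf_counter()          # high-resolution wall clock
    subprocess.run(argv, check=True)     # waits for the child to exit
    wall = time.perf_counter() - start
    after = os.times()
    # CPU time accumulated by waited-for children, as reported by the kernel.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return wall, user, system


# Placeholder command; substitute the real compiler invocation to measure.
wall, user, system = time_command([sys.executable, "-c", "sum(range(100000))"])
print(f"wall {wall:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

For very short runs the user and sys figures are limited by the kernel's accounting granularity, while the wall clock keeps its full resolution, which is one reason short passes can show up as noise.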

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0  2004-08-30 on MICO sources
  2004-08-31 12:48             ` Karel Gardas
@ 2004-09-01  7:18               ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2004-09-01  7:18 UTC (permalink / raw)
  To: Karel Gardas; +Cc: Steven Bosscher, GCC Mailing List

3.4.1 seems to have a problem with the expander at -O2:

>  expand                :   8.79 (26%) usr   0.01 ( 0%) sys   8.90 (24%) wall

So this 3.4.1 bug is somehow worked around by gimplification and tree 
optimization, and this masks the 40% difference (which would still be 
there at -O2 if it weren't for the improvement in expander time).

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-08-31  9:58 Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources Karel Gardas
  2004-08-31 10:12 ` Steven Bosscher
@ 2004-09-01 11:18 ` Giovanni Bajo
  2004-09-02  9:41   ` Karel Gardas
  2004-09-02  9:44   ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Karel Gardas
  1 sibling, 2 replies; 18+ messages in thread
From: Giovanni Bajo @ 2004-09-01 11:18 UTC (permalink / raw)
  To: Karel Gardas; +Cc: gcc

Karel Gardas wrote:

> 1) typecode.cc: 40% regression on O1 while 7% speedup on O2

Can you please file a new bug report for this -O1 regression, attaching this
preprocessed testcase and the time reports to it? Also link Steven's message in
it: http://gcc.gnu.org/ml/gcc/2004-08/msg01602.html, which contains the
analysis of this.
Then we can set the new bug to block PR 13776.

I think it is better to track these issues in separate PRs, and connect them
to PR 13776 (which is quite confusing at this point) only through the
Bugzilla relationships.

> -O2: 33% basic_seq.cc and following with 27% static.cc

Can you also open a new bug report about the regression of basic_seq.cc, which
regresses at all optimization levels? Again, attach preprocessed testcases, a
comparison with 3.4 for all optimization levels, and the respective time reports.

Actually, I should also note that at this point we probably cannot do much
about compile time regressions at -O1/2/3. GCC 3.5 features more than 60 new
optimization passes, so it is already a half miracle we don't regress
everywhere. Code generation is also improved of course, so we have to lose a
little somewhere. Of course, big regressions (>20% on files of non-trivial size)
could probably still be analyzed a little to see if we find obvious offenders.

Thank you for doing this, it is of great help!

Giovanni Bajo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-09-01 11:18 ` Giovanni Bajo
@ 2004-09-02  9:41   ` Karel Gardas
  2004-09-02 20:32     ` Compilation performance comparison of gcc3.4.1 and gcc3.5.02004-08-30 " Giovanni Bajo
  2004-09-02  9:44   ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Karel Gardas
  1 sibling, 1 reply; 18+ messages in thread
From: Karel Gardas @ 2004-09-02  9:41 UTC (permalink / raw)
  To: Giovanni Bajo, Steven Bosscher, Paolo Bonzini; +Cc: GCC Mailing List


Giovanni,

I'm working on submitting the bug reports right now, but I have observed one
fact that is very interesting to me. When I compile files preprocessed by
3.5.0 with gcc3.4.1 I get slower compile times, which means the
regression(s) are not that dramatic. For example, for typecode.cc the
regression drops from 40% to 30%. Also for basic_seq.cc, which should
regress on all optimization levels, I now get _no_ regression at all! In
fact I get speedups! Look at the following table:

Not preprocessed file:
File		341-O0	350-O0	Delta%	341-O1	350-O1	Delta%	341-O2	350-O2	Delta%
basic_seq.cc	3.77	4.21	-10.45	3.98	4.99	-20.24	3.82	5.72	-33.22

File preprocessed by GCC 3.4.1:
File		341-O0	350-O0	Delta%	341-O1	350-O1	Delta%	341-O2	350-O2	Delta%
basic_seq.cc	3.69	3.31	11.48	3.91	3.47	12.68	3.78	3.65	3.56

File preprocessed by GCC 3.5.0:
File		341-O0	350-O0	Delta%	341-O1	350-O1	Delta%	341-O2	350-O2	Delta%
basic_seq.cc	4.61	4.15	11.08	5.28	4.83	9.32	5.62	5.57	0.9


So it seems 3.5.0 is _always_ faster than 3.4.1 on a preprocessed file! So
either 3.5.0's libstdc++ library is bigger or 3.5.0's cpp is slower.
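
The Delta% columns in the three tables above appear to be the relative difference with the 3.5.0 time as the base, so a positive value means 3.5.0 is faster. That formula is inferred from the numbers and is not stated in the message; the short Python sketch below recomputes the columns under that assumption, and the function name `delta_percent` is made up for illustration.

```python
def delta_percent(t_341, t_350):
    """Relative difference in percent, with the 3.5.0 time as the base.

    Positive: 3.5.0 is faster. Negative: 3.5.0 is slower (a regression).
    """
    return (t_341 - t_350) / t_350 * 100.0


# (3.4.1 seconds, 3.5.0 seconds) at -O0, -O1, -O2 for basic_seq.cc,
# copied from the three tables above.
rows = {
    "not preprocessed": [(3.77, 4.21), (3.98, 4.99), (3.82, 5.72)],
    "preprocessed by 3.4.1": [(3.69, 3.31), (3.91, 3.47), (3.78, 3.65)],
    "preprocessed by 3.5.0": [(4.61, 4.15), (5.28, 4.83), (5.62, 5.57)],
}

deltas = {
    label: [round(delta_percent(old, new), 2) for old, new in pairs]
    for label, pairs in rows.items()
}

for label, values in deltas.items():
    print(f"{label:24s}", values)
```

The recomputed values agree with the Delta% figures in the tables to two decimal places.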

The size comparison of these two files looks like this:

$ ls -la basic_seq.*.ii
-rw-rw-r--    1 karel    karel     1223628 Sep  2 11:13 basic_seq.341.ii
-rw-rw-r--    1 karel    karel     1243090 Sep  2 11:01 basic_seq.350.ii

I hope you understand that I'm reluctant to submit a regression bugreport
in this case. :-) I have also noted this thing in PR c++/17278 -- which is
for typecode regression...

When I compare table (1) 341-O0 with table (2) 341-O0, I get 3.77 - 3.69 ==
0.08 seconds spent in 3.4.1's cpp.
The same for 3.5.0 is table (1) 350-O0 - table (3) 350-O0 == 4.21 - 4.15 ==
0.06 seconds, so 3.5.0's cpp should even be a bit faster. So it seems the
culprit should be libstdc++ in 3.5.0, but is it possible that a size
difference of about 20kB, i.e. roughly 1.6%, makes such a big difference in
compilation speed?
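
The arithmetic behind this question can be written out as a small Python sketch. It uses only the -O0 timings and the file sizes quoted in this message; the variable names are made up for illustration, and the subtraction is just the rough estimate of cpp time described above, not a direct measurement.

```python
# -O0 timings in seconds, copied from the three tables in this message.
full_341, pre_341 = 3.77, 3.69   # 3.4.1: original source vs. its own .ii
full_350, pre_350 = 4.21, 4.15   # 3.5.0: original source vs. its own .ii

# Rough estimate of time spent preprocessing: full compile minus the
# compile of the already preprocessed file.
cpp_341 = full_341 - pre_341
cpp_350 = full_350 - pre_350

# Sizes in bytes of the two preprocessed files, from the ls output.
size_341, size_350 = 1223628, 1243090
growth_bytes = size_350 - size_341
growth_percent = growth_bytes / size_341 * 100.0

# 3.5.0 at -O0 compiling the 3.5.0 .ii (4.15s) vs. the 3.4.1 .ii (3.31s).
slowdown_percent = (4.15 - 3.31) / 3.31 * 100.0

print(f"cpp estimate: 3.4.1 {cpp_341:.2f}s, 3.5.0 {cpp_350:.2f}s")
print(f"preprocessed size growth: {growth_bytes} bytes ({growth_percent:.2f}%)")
print(f"3.5.0 -O0 slowdown on its own .ii: {slowdown_percent:.1f}%")
```

Under these numbers the preprocessed file grows by well under 2% while the same compiler takes about a quarter longer on it, which is why the size difference alone looks like an unlikely explanation.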

Thanks,

Karel

--
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-09-01 11:18 ` Giovanni Bajo
  2004-09-02  9:41   ` Karel Gardas
@ 2004-09-02  9:44   ` Karel Gardas
  1 sibling, 0 replies; 18+ messages in thread
From: Karel Gardas @ 2004-09-02  9:44 UTC (permalink / raw)
  To: Giovanni Bajo; +Cc: gcc

On Wed, 1 Sep 2004, Giovanni Bajo wrote:

> Actually, I should also note that at this point we probably cannot do much
> about compile time regressions at -O1/2/3. GCC 3.5 features more than 60 new
> optimization passes, so it is already a half miracle we don't regress
> everywhere.

Yes, I'm also surprised that 3.5 looks so good even though so much stuff was
added.

> Code generation is also improved of course, so we have to lose a
> little somewhere. Of course, big regressions (>20% on files of non-trivial size)
> could probably still be analyzed a little to see if we find obvious offenders.
>
> Thank you for doing this, it is of great help!

You are welcome! Now, in light of the observation described in my last
email, I'm thinking about how to mix 3.5.0 with 3.4.1's libstdc++ to get
the best of both in one experimental compiler. :-)

Cheers,

Karel
--
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.02004-08-30 on MICO sources
  2004-09-02  9:41   ` Karel Gardas
@ 2004-09-02 20:32     ` Giovanni Bajo
  2004-09-04  7:35       ` Karel Gardas
  0 siblings, 1 reply; 18+ messages in thread
From: Giovanni Bajo @ 2004-09-02 20:32 UTC (permalink / raw)
  To: Karel Gardas, Steven Bosscher, Paolo Bonzini; +Cc: GCC Mailing List

Karel Gardas wrote:

> Also for basic_seq.cc which should regress on all optimization
> levels, I now got _no_ regression at all! In fact I got speedups!
> Look at following table:
>
> Not preprocessed file:
> File 341-O0 350-O0 Delta% 341-O1 350-O1 Delta% 341-O2 350-O2 Delta%
> basic_seq.cc 3.77 4.21 -10.45 3.98 4.99 -20.24 3.82 5.72 -33.22
>
> File preprocessed by GCC 3.4.1:
> File 341-O0 350-O0 Delta% 341-O1 350-O1 Delta% 341-O2 350-O2 Delta%
> basic_seq.cc 3.69 3.31 11.48 3.91 3.47 12.68 3.78 3.65 3.56
>
> File preprocessed by GCC 3.5.0:
> File 341-O0 350-O0 Delta% 341-O1 350-O1 Delta% 341-O2 350-O2 Delta%
> basic_seq.cc 4.61 4.15 11.08 5.28 4.83 9.32 5.62 5.57 0.9

This is very interesting. Can you please file a bug report about this issue?
You can attach the unpreprocessed basic_seq.cc, and the two preprocessed
files, with 3.4.1 and 3.5.0, and include all the timings you did. CC me on it,
please.

I'll try reproducing these numbers, and check if it's really a problem with v3
code, or something else.

Thanks again,
Giovanni Bajo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.02004-08-30 on MICO sources
  2004-09-02 20:32     ` Compilation performance comparison of gcc3.4.1 and gcc3.5.02004-08-30 " Giovanni Bajo
@ 2004-09-04  7:35       ` Karel Gardas
  0 siblings, 0 replies; 18+ messages in thread
From: Karel Gardas @ 2004-09-04  7:35 UTC (permalink / raw)
  To: Giovanni Bajo; +Cc: Steven Bosscher, Paolo Bonzini, GCC Mailing List


Giovanni,

I've just created the bug report, but I forgot to add you to the cc
list. Please have a look at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17315 and edit it accordingly,
since I really do not know how to describe this issue better.

Thanks for looking into this!

Karel

--
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2004-09-04  7:35 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-08-31  9:58 Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources Karel Gardas
2004-08-31 10:12 ` Steven Bosscher
2004-08-31 10:28   ` Karel Gardas
2004-08-31 10:44     ` Paolo Bonzini
2004-08-31 10:46       ` Karel Gardas
2004-08-31 10:49         ` Steven Bosscher
2004-08-31 11:00           ` Paolo Bonzini
2004-08-31 11:24             ` Steven Bosscher
2004-08-31 19:30               ` Mike Stump
2004-08-31 12:48             ` Karel Gardas
2004-09-01  7:18               ` Paolo Bonzini
2004-08-31 10:55         ` Steven Bosscher
2004-08-31 13:57           ` Karel Gardas
2004-09-01 11:18 ` Giovanni Bajo
2004-09-02  9:41   ` Karel Gardas
2004-09-02 20:32     ` Compilation performance comparison of gcc3.4.1 and gcc3.5.02004-08-30 " Giovanni Bajo
2004-09-04  7:35       ` Karel Gardas
2004-09-02  9:44   ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Karel Gardas
