* Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Karel Gardas @ 2004-08-31 9:58 UTC
To: GCC Mailing List

Hello,

as promised several times, here are finally the results for yesterday's mainline (trunk) at -O0/-O1/-O2 (the whole table is below).

As I've already reported, -O0 is faster, which is great! -O1 and -O2, however, are slower by about 8.5% and 7%, respectively.

Interesting files seem to be:

1) typecode.cc: 40% regression at -O1, but a 7% speedup at -O2
2) orb.cc: 10% speedup at -O0, 16% regression at -O1 and only a 1.2% regression at -O2
3) basic_seq.cc: 10%, 20% and 33% regressions at -O0/1/2
4) static.cc: 2%, 24% and 27% regressions at -O0/1/2
5) valuetype_impl.cc: 12% and 23% regressions at -O1/2

So some files regress most at -O1 and others at -O2. The biggest regressions (not counting the very short compilations of the uni_*.cc files) are:

-O0: 10% basic_seq.cc
-O1: 40% typecode.cc, 24% static.cc and 24% pi_impl.cc
-O2: 33% basic_seq.cc, followed by 27% static.cc

Is there anything else I should provide to help you with these issues? In particular, please have a look at the table and choose your "interesting file for preprocessing" candidate, which I will then upload to PR#13776.

Thanks, and especially thanks for the appreciable progress on -O0!

Karel
-- 
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.
http://www.objectsecurity.com

Times are in seconds; 341-Ox and 350-Ox are GCC 3.4.1 and the 2004-08-30 GCC 3.5.0 trunk at -Ox.
Delta% = (3.4.1 - 3.5.0) / 3.5.0 * 100, so a negative value means 3.5.0 is slower.

File              341-O0  350-O0  Delta%  341-O1  350-O1  Delta%  341-O2  350-O2  Delta%
os-unix.cc          4.14    4.09    1.22    4.47     4.7   -4.89    4.55    4.97   -8.45
dii.cc              12.8   11.76    8.84   13.97    15.7  -11.02      17   18.59   -8.55
typecode.cc         9.11    9.42   -3.29   13.16   22.06  -40.34   32.25   30.05    7.32
any.cc              6.88    6.69    2.84    9.14   10.91  -16.22   12.94   13.87   -6.71
codec.cc             5.9    5.74    2.79    7.45     8.6  -13.37    9.29    11.1  -16.31
buffer.cc           3.34    3.31    0.91    3.52    3.64    -3.3    3.62    3.93   -7.89
context.cc          3.51    3.57   -1.68    3.83    4.41  -13.15    4.16    4.77  -12.79
except.cc           4.34    4.25    2.12    4.97    5.12   -2.93    6.05    6.27   -3.51
dispatch.cc          4.4    4.46   -1.35    5.24     5.1    2.75    4.95    5.64  -12.23
string.cc           3.35    3.26    2.76     3.5    3.47    0.86     3.4     3.6   -5.56
object.cc           4.69    4.76   -1.47    5.87       7  -16.14    7.01    8.07  -13.14
address.cc          5.26    4.93    6.69    6.43    6.83   -5.86    7.22    7.63   -5.37
ior.cc             12.48   11.35    9.96   14.81   15.31   -3.27   16.99   17.46   -2.69
orb.cc             16.81    15.3    9.87   25.62   30.52  -16.06   37.07   37.52    -1.2
boa.cc              9.22    8.48    8.73   11.74   13.16  -10.79   14.11   15.87  -11.09
dsi.cc             10.31    9.13   12.92   11.69   11.73   -0.34   12.57   13.19    -4.7
transport.cc        4.06    3.96    2.53    4.35    4.33    0.46    4.47    4.64   -3.66
t..port/tcp.cc      4.02     3.9    3.08    4.37    4.26    2.58    4.39    4.55   -3.52
t..port/udp.cc      4.11    4.02    2.24    4.47    4.45    0.45    4.65    4.79   -2.92
t..port/unix.cc     4.06    3.89    4.37    4.31    4.21    2.38    4.31    4.51   -4.43
iop.cc             16.43   15.03    9.31   22.25   25.39  -12.37   29.03   32.78  -11.44
util.cc             5.97       6    -0.5    7.79   10.07  -22.64   10.06   11.94  -15.75
basic_seq.cc        3.77    4.21  -10.45    3.98    4.99  -20.24    3.82    5.72  -33.22
fast_array.cc       3.89    3.74    4.01    3.95    3.88     1.8    3.87    4.07   -4.91
ssl.cc              9.29    7.73   20.18    9.25    7.84   17.98    8.99    7.91   13.65
fixed.cc            3.75    3.73    0.54    4.08    4.34   -5.99    4.22    4.85  -12.99
intercept.cc       10.27     9.5    8.11   11.64   12.31   -5.44   12.24   14.19  -13.74
codeset.cc          5.96    5.72     4.2     7.3    8.37  -12.78    9.88   10.87   -9.11
queue.cc            4.35    4.53   -3.97    4.68    5.27   -11.2    4.97    5.84   -14.9
static.cc          20.26   20.63   -1.79   24.42   32.31  -24.42   29.12   40.06  -27.31
current.cc          8.91    7.39   20.57    8.78    7.49   17.22    8.67    7.56   14.68
policy_impl.cc      12.7   11.96    6.19   13.65   14.62   -6.63   15.43   16.76   -7.94
service_info.cc     8.84    7.33    20.6    8.87    7.48   18.58    8.51    7.55   12.72
ioptypes.cc        10.69    9.46      13   12.76   12.69    0.55   13.66   14.52   -5.92
ssliop.cc           9.01    7.57   19.02    9.11    7.62   19.55    8.62    7.64   12.83
value.cc           11.27    9.31   21.05   12.08   11.11    8.73   12.36   12.17    1.56
valuetype.cc        9.96    8.48   17.45   10.59     9.7    9.18   10.92   10.64    2.63
v..type_impl.cc    12.47   12.19     2.3   13.12   14.93  -12.12   13.43   17.46  -23.08
dynany_impl.cc     10.61   10.14    4.64   15.94   20.11  -20.74      23   25.82  -10.92
policy2.cc           9.1    7.62   19.42    9.14    7.85   16.43    9.01    7.91   13.91
tckind.cc           8.77    7.33   19.65    8.82    7.39   19.35    8.56    7.42   15.36
orb_excepts.cc      9.01    7.51   19.97    9.05    7.67   17.99    8.87    7.84   13.14
policy.cc           8.96    7.47   19.95    9.09    7.64   18.98    8.83    7.87    12.2
poa.cc             13.07   11.51   13.55   15.24   14.84     2.7   17.67   17.62    0.28
poa_base.cc        10.22    8.88   15.09   10.77   10.13    6.32   11.54   11.13    3.68
poa_impl.cc        17.42    16.2    7.53   22.82   25.91  -11.93   29.78   32.73   -9.01
dynany.cc          10.26    8.83   16.19   10.81   10.21    5.88   11.72   11.06    5.97
uni_base64.cc       0.12    0.12       0    0.17    0.21  -19.05    0.25    0.28  -10.71
uni_unicode.cc       0.2    0.21   -4.76    0.28    0.36  -22.22    0.43    0.51  -15.69
uni_fromuni.cc       0.4    0.43   -6.98    0.58    0.82  -29.27     1.1    1.32  -16.67
uni_touni.cc        0.43    0.47   -8.51    0.69    0.96  -28.13    1.21    1.41  -14.18
except2.cc          6.73    6.16    9.25   10.03   10.03       0   12.98   12.54    3.51
pi.cc              11.48    9.48    21.1   12.59   11.91    5.71   13.25    13.4   -1.12
pi_impl.cc         18.92   18.96   -0.21    23.3   30.73  -24.18   30.53   37.56  -18.72
typecode_seq.cc     9.15    8.15   12.27    9.56    8.64   10.65     9.3    9.02     3.1
timebase.cc         8.78    7.53    16.6    8.94    7.45      20    8.63    7.66   12.66
ir.cc              46.58   48.62    -4.2   70.96   87.47  -18.88   97.81  114.45  -14.54
ir_base.cc         11.57   10.14    14.1   13.49   15.37  -12.23   15.67   17.76  -11.77
imr.cc             14.34   13.85    3.54    18.6   20.62    -9.8   24.84   25.31   -1.86
mtdebug.cc          3.72    3.72       0    3.95    3.77    4.77    3.69    3.82    -3.4
Sum               530.42  494.11    7.35  636.03  696.01   -8.62  767.47  827.99   -7.31
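The mail does not define the Delta% columns, but checking a few rows suggests they are computed relative to the 3.5.0 time: (3.4.1 - 3.5.0) / 3.5.0 * 100. A minimal Python sketch, assuming that inferred formula, reproduces the printed values; the function names are made up for illustration. It also shows that under the other common convention (slowdown measured against the old 3.4.1 time) the same typecode.cc -O1 row would read as a roughly 67.6% slowdown rather than 40%.

```python
def delta_pct(old: float, new: float) -> float:
    """Delta% as the table appears to compute it (inferred, not stated).

    old = GCC 3.4.1 time, new = GCC 3.5.0 time; negative means 3.5.0 is slower.
    """
    return (old - new) / new * 100.0


def slowdown_vs_old(old: float, new: float) -> float:
    """Alternative convention: how much slower 3.5.0 is relative to 3.4.1."""
    return (new - old) / old * 100.0


# (3.4.1 time, 3.5.0 time, Delta% as printed) for a few cells of the table
rows = {
    "os-unix.cc -O0": (4.14, 4.09, 1.22),
    "typecode.cc -O1": (13.16, 22.06, -40.34),
    "basic_seq.cc -O2": (3.82, 5.72, -33.22),
    "Sum -O0": (530.42, 494.11, 7.35),
}

for name, (old, new, printed) in rows.items():
    print(f"{name:18} computed {delta_pct(old, new):7.2f}  printed {printed:7.2f}")

print(f"typecode.cc -O1 vs 3.4.1: {slowdown_vs_old(13.16, 22.06):.1f}% slower")
```

Measuring against the new time keeps every Delta% above -100%, but it makes regressions look smaller than the usual "percent slower than before" figure.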
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Steven Bosscher @ 2004-08-31 10:12 UTC
To: Karel Gardas, GCC Mailing List

On Tuesday 31 August 2004 11:11, Karel Gardas wrote:
> Hello,
>
> several times promised here are finally the results obtained for
> yesterday's main-trunk and -O0/1/2 compilations (whole table is below)
>
> As I've already reported -O0 is better, which is great! And O1 and O2 are
> slower for about 8.5% and 7%.
>
> Interesting files seem to be:
>
> 1) typecode.cc: 40% regression on O1 while 7% speedup on O2

Can you show us the time report for the 40% regression?

Gr.
Steven
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Karel Gardas @ 2004-08-31 10:28 UTC
To: Steven Bosscher
Cc: GCC Mailing List

On Tue, 31 Aug 2004, Steven Bosscher wrote:

> On Tuesday 31 August 2004 11:11, Karel Gardas wrote:
> > Hello,
> >
> > several times promised here are finally the results obtained for
> > yesterday's main-trunk and -O0/1/2 compilations (whole table is below)
> >
> > As I've already reported -O0 is better, which is great! And O1 and O2 are
> > slower for about 8.5% and 7%.
> >
> > Interesting files seem to be:
> >
> > 1) typecode.cc: 40% regression on O1 while 7% speedup on O2
>
> Can you show us the time report for the 40% regression?

Here we go (trunk, -O1, typecode.cc):

Execution times (seconds)
 garbage collection    :   0.52 ( 2%) usr   0.00 ( 0%) sys   0.53 ( 2%) wall
 callgraph construction:   0.19 ( 1%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall
 callgraph optimization:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 cfg construction      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 cfg cleanup           :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall
 trivially dead code   :   0.12 ( 1%) usr   0.00 ( 0%) sys   0.14 ( 1%) wall
 life analysis         :   0.97 ( 4%) usr   0.00 ( 0%) sys   0.84 ( 3%) wall
 life info update      :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall
 alias analysis        :   0.17 ( 1%) usr   0.01 ( 1%) sys   0.14 ( 1%) wall
 register scan         :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 rebuild jump labels   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 preprocessing         :   0.48 ( 2%) usr   0.23 (12%) sys   0.57 ( 2%) wall
 parser                :   3.93 (17%) usr   0.58 (30%) sys   4.67 (18%) wall
 name lookup           :   1.09 ( 5%) usr   0.46 (24%) sys   1.79 ( 7%) wall
 integration           :   1.01 ( 4%) usr   0.06 ( 3%) sys   0.88 ( 3%) wall
 tree gimplify         :   0.60 ( 3%) usr   0.04 ( 2%) sys   0.60 ( 2%) wall
 tree eh               :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 tree CFG construction :   0.10 ( 0%) usr   0.03 ( 2%) sys   0.13 ( 0%) wall
 tree CFG cleanup      :   0.21 ( 1%) usr   0.01 ( 1%) sys   0.14 ( 1%) wall
 tree PTA              :   0.20 ( 1%) usr   0.00 ( 0%) sys   0.26 ( 1%) wall
 tree alias analysis   :   0.33 ( 1%) usr   0.00 ( 0%) sys   0.37 ( 1%) wall
 tree PHI insertion    :   0.42 ( 2%) usr   0.01 ( 1%) sys   0.50 ( 2%) wall
 tree SSA rewrite      :   0.58 ( 3%) usr   0.00 ( 0%) sys   0.71 ( 3%) wall
 tree SSA other        :   0.82 ( 4%) usr   0.12 ( 6%) sys   0.98 ( 4%) wall
 tree operand scan     :   0.59 ( 3%) usr   0.16 ( 8%) sys   0.98 ( 4%) wall
 dominator optimization:   1.48 ( 6%) usr   0.02 ( 1%) sys   1.50 ( 6%) wall
 tree SRA              :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 tree CCP              :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall
 tree split crit edges :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 tree PRE              :   0.41 ( 2%) usr   0.01 ( 1%) sys   0.41 ( 2%) wall
 tree forward propagate:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 tree conservative DCE :   0.30 ( 1%) usr   0.01 ( 1%) sys   0.28 ( 1%) wall
 tree aggressive DCE   :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 tree DSE              :   0.28 ( 1%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall
 loop invariant motion :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.22 ( 1%) wall
 tree copy headers     :   0.06 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall
 tree SSA to normal    :   0.26 ( 1%) usr   0.01 ( 1%) sys   0.36 ( 1%) wall
 tree rename SSA copies:   0.11 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 dominance frontiers   :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall
 varconst              :   0.08 ( 0%) usr   0.02 ( 1%) sys   0.09 ( 0%) wall
 jump                  :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 CSE                   :   0.58 ( 3%) usr   0.00 ( 0%) sys   0.55 ( 2%) wall
 loop analysis         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 branch prediction     :   0.15 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 flow analysis         :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall
 combiner              :   0.55 ( 2%) usr   0.00 ( 0%) sys   0.64 ( 2%) wall
 if-conversion         :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 local alloc           :   0.30 ( 1%) usr   0.00 ( 0%) sys   0.33 ( 1%) wall
 global alloc          :   1.16 ( 5%) usr   0.01 ( 1%) sys   1.34 ( 5%) wall
 reload CSE regs       :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.28 ( 1%) wall
 flow 2                :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 if-conversion 2       :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
 rename registers      :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall
 machine dep reorg     :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 shorten branches      :   0.15 ( 1%) usr   0.00 ( 0%) sys   0.17 ( 1%) wall
 final                 :   0.26 ( 1%) usr   0.01 ( 1%) sys   0.26 ( 1%) wall
 symout                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 rest of compilation   :   0.14 ( 1%) usr   0.01 ( 1%) sys   0.19 ( 1%) wall
 TOTAL                 :  23.12             1.91            26.21
# cc1plus 23.13 1.93
# as 0.34 0.02

Karel
-- 
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Paolo Bonzini @ 2004-08-31 10:44 UTC
To: Karel Gardas
Cc: GCC Mailing List

>>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
>>
>>Can you show us the time report for the 40% regression?

Also for 3.4.1?

Paolo
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Karel Gardas @ 2004-08-31 10:46 UTC
To: Paolo Bonzini
Cc: GCC Mailing List

On Tue, 31 Aug 2004, Paolo Bonzini wrote:

> >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> >>
> >>Can you show us the time report for the 40% regression?
>
> Also for 3.4.1?

Sure! (GCC 3.4.1, -O1, typecode.cc)

Execution times (seconds)
 garbage collection    :   0.79 ( 6%) usr   0.00 ( 0%) sys   0.84 ( 5%) wall
 cfg construction      :   0.09 ( 1%) usr   0.00 ( 0%) sys   0.11 ( 1%) wall
 cfg cleanup           :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.16 ( 1%) wall
 trivially dead code   :   0.10 ( 1%) usr   0.01 ( 0%) sys   0.14 ( 1%) wall
 life analysis         :   0.80 ( 6%) usr   0.00 ( 0%) sys   0.85 ( 5%) wall
 life info update      :   0.08 ( 1%) usr   0.00 ( 0%) sys   0.15 ( 1%) wall
 alias analysis        :   0.16 ( 1%) usr   0.00 ( 0%) sys   0.21 ( 1%) wall
 register scan         :   0.13 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall
 rebuild jump labels   :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall
 preprocessing         :   0.44 ( 3%) usr   0.21 (10%) sys   0.65 ( 4%) wall
 parser                :   4.41 (31%) usr   0.67 (31%) sys   5.22 (31%) wall
 name lookup           :   1.61 (11%) usr   1.17 (53%) sys   2.90 (17%) wall
 expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%) wall
 varconst              :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.09 ( 1%) wall
 integration           :   0.65 ( 5%) usr   0.00 ( 0%) sys   0.67 ( 4%) wall
 jump                  :   0.05 ( 0%) usr   0.02 ( 1%) sys   0.02 ( 0%) wall
 CSE                   :   0.49 ( 3%) usr   0.00 ( 0%) sys   0.46 ( 3%) wall
 loop analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 branch prediction     :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.16 ( 1%) wall
 flow analysis         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 combiner              :   0.34 ( 2%) usr   0.00 ( 0%) sys   0.42 ( 2%) wall
 if-conversion         :   0.09 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 mode switching        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 local alloc           :   0.34 ( 2%) usr   0.00 ( 0%) sys   0.28 ( 2%) wall
 global alloc          :   0.91 ( 6%) usr   0.02 ( 1%) sys   0.83 ( 5%) wall
 reload CSE regs       :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.25 ( 1%) wall
 flow 2                :   0.18 ( 1%) usr   0.01 ( 0%) sys   0.12 ( 1%) wall
 if-conversion 2       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 rename registers      :   0.15 ( 1%) usr   0.00 ( 0%) sys   0.10 ( 1%) wall
 machine dep reorg     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 shorten branches      :   0.16 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall
 final                 :   0.28 ( 2%) usr   0.01 ( 0%) sys   0.43 ( 3%) wall
 symout                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 rest of compilation   :   0.42 ( 3%) usr   0.02 ( 1%) sys   0.49 ( 3%) wall
 TOTAL                 :  14.25             2.19            16.89
# cc1plus 14.26 2.21
# as 0.34 0.02

Karel
-- 
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com
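Comparing two -ftime-report outputs phase by phase is easier with a small script. The following is a minimal Python sketch, assuming the line format shown in the reports in this thread (phase name, colon, then usr/sys/wall seconds with percentages); other GCC versions may print something different. The helper names are hypothetical and not part of GCC, and the embedded excerpts are copied from the two typecode.cc -O1 reports.

```python
import re

# One phase line of -ftime-report output in the format quoted in this thread.
PHASE_RE = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall"
)


def parse_time_report(text: str) -> dict:
    """Map phase name -> (usr, sys, wall) seconds; TOTAL and other lines are skipped."""
    phases = {}
    for line in text.splitlines():
        m = PHASE_RE.match(line)
        if m:
            phases[m["name"]] = (float(m["usr"]), float(m["sys"]), float(m["wall"]))
    return phases


def usr_deltas(old: dict, new: dict) -> list:
    """(phase, new usr - old usr), largest increase first.

    A phase missing from one report counts as 0.0 s there.
    """
    names = set(old) | set(new)
    deltas = [(n, new.get(n, (0.0,))[0] - old.get(n, (0.0,))[0]) for n in names]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


gcc341 = """
 expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%) wall
 integration           :   0.65 ( 5%) usr   0.00 ( 0%) sys   0.67 ( 4%) wall
 global alloc          :   0.91 ( 6%) usr   0.02 ( 1%) sys   0.83 ( 5%) wall
"""
gcc350 = """
 expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall
 integration           :   1.01 ( 4%) usr   0.06 ( 3%) sys   0.88 ( 3%) wall
 global alloc          :   1.16 ( 5%) usr   0.01 ( 1%) sys   1.34 ( 5%) wall
 dominator optimization:   1.48 ( 6%) usr   0.02 ( 1%) sys   1.50 ( 6%) wall
"""

for name, d in usr_deltas(parse_time_report(gcc341), parse_time_report(gcc350)):
    print(f"{name:24} {d:+.2f} s usr")
```

On these excerpts it ranks dominator optimization (absent from 3.4.1, +1.48 s) ahead of expand (+1.29 s), and integration plus global alloc together add 0.61 s of user time.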
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Steven Bosscher @ 2004-08-31 10:49 UTC
To: Karel Gardas, Paolo Bonzini
Cc: GCC Mailing List

On Tuesday 31 August 2004 12:28, Karel Gardas wrote:
> On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> > >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> > >>
> > >>Can you show us the time report for the 40% regression?
> >
> > Also for 3.4.1?

3.4.1:
 expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%)

3.5.0-HEAD:
 expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall

I wonder why this is. I would have expected it to be the other way around...

Gr.
Steven
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Paolo Bonzini @ 2004-08-31 11:00 UTC
To: Steven Bosscher, Karel Gardas
Cc: GCC Mailing List

> 3.4.1:
> expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%)
>
> 3.5.0-HEAD:
> expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall

Also:

3.4.1:
 integration           :   0.65 ( 5%) usr   0.00 ( 0%) sys   0.67 ( 4%) wall
 global alloc          :   0.91 ( 6%) usr   0.02 ( 1%) sys   0.83 ( 5%) wall

3.5.0-HEAD:
 integration           :   1.01 ( 4%) usr   0.06 ( 3%) sys   0.88 ( 3%) wall
 global alloc          :   1.16 ( 5%) usr   0.01 ( 1%) sys   1.34 ( 5%) wall

This is roughly +0.5 seconds overall, i.e. another 4%. And then:

DOM: 1.48 ( 6%) usr 0.02 ( 1%) sys 1.50 ( 6%) wall

There are quite high times for "tree SSA other", "tree conservative DCE" and "tree SSA rewrite" too.

Note that the parser and name lookup have indeed become faster, which is the result of Mark's work and part of the reason why -O0 is faster.

The -O2 times for 3.5 would help as well; I suspect -funit-at-a-time is helping a lot.

Paolo
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Steven Bosscher @ 2004-08-31 11:24 UTC
To: Paolo Bonzini, Karel Gardas
Cc: GCC Mailing List

On Tuesday 31 August 2004 12:50, Paolo Bonzini wrote:
> > 3.4.1:
> > expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%)
> >
> > 3.5.0-HEAD:
> > expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall
>
> Also:
>
> 3.4.1:
> integration           :   0.65 ( 5%) usr   0.00 ( 0%) sys   0.67 ( 4%) wall
> global alloc          :   0.91 ( 6%) usr   0.02 ( 1%) sys   0.83 ( 5%) wall
>
> 3.5.0-HEAD:
> integration           :   1.01 ( 4%) usr   0.06 ( 3%) sys   0.88 ( 3%) wall
> global alloc          :   1.16 ( 5%) usr   0.01 ( 1%) sys   1.34 ( 5%) wall
>
> This is overall +0.5 seconds, which another 4%.  And then:

This may also just be noise. Some passes run so fast that the time vars are not accurate enough to record them. You'll see that for bodies of code with many small functions, -ftime-report will give a very different TOTAL than /usr/bin/time ;-)

> The -O2 times for 3.5 would help as well, I suspect -funit-at-a-time is
> helping a lot.

Rather the other way around, since GCC 3.5 has -funit-at-a-time enabled for C++ at -O0.

Gr.
Steven
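Steven's point about very fast passes can be illustrated with a toy model. The sketch below assumes a clock that only advances in whole 10 ms ticks (an illustrative number; the thread does not state the timer resolution of the test box) and ten hypothetical 3 ms passes run back to back. The function name and the numbers are made up for the illustration.

```python
TICK_MS = 10  # assumed clock granularity in milliseconds (illustrative only)


def measured_ms(start_ms: int, duration_ms: int, tick_ms: int = TICK_MS) -> int:
    """Time charged to an interval when the clock only advances in whole ticks."""
    return ((start_ms + duration_ms) // tick_ms - start_ms // tick_ms) * tick_ms


# Ten hypothetical passes of 3 ms each, run back to back starting at t = 0.
durations = [3] * 10
starts = [sum(durations[:i]) for i in range(len(durations))]
charged = [measured_ms(s, d) for s, d in zip(starts, durations)]

print(charged)       # [0, 0, 0, 10, 0, 0, 10, 0, 0, 10]
print(sum(charged))  # 30
```

Every pass really takes 3 ms, yet seven are charged 0 ms and three are charged 10 ms; only the total comes out right here, because the passes are contiguous. A real operating system's CPU-time accounting may work differently, so treat this as an intuition aid for why per-pass numbers of a few hundredths of a second are unreliable.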
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Mike Stump @ 2004-08-31 19:30 UTC
To: Steven Bosscher
Cc: Paolo Bonzini, Karel Gardas, GCC Mailing List

On Aug 31, 2004, at 3:49 AM, Steven Bosscher wrote:
> This may also just be noise.  Some passes run so fast that the
> time vars are not accurate enough to record it.

We at Apple use ~10 ns clocks to record times, which works better. The only problem is getting user/sys time out of the kernel into user space cheaply enough that merely grabbing the time isn't costly. :-( Not really a problem, though, as wall time is the only thing you can trust, and it's what matters most anyway.
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Karel Gardas @ 2004-08-31 12:48 UTC
To: Paolo Bonzini
Cc: Steven Bosscher, GCC Mailing List

On Tue, 31 Aug 2004, Paolo Bonzini wrote:

> The -O2 times for 3.5 would help as well, I suspect -funit-at-a-time is
> helping a lot.

Here are the -O2 reports (typecode.cc) for both trunk and GCC 3.4.1:

Trunk:

Execution times (seconds)
 garbage collection    :   1.21 ( 4%) usr   0.00 ( 0%) sys   1.23 ( 4%) wall
 callgraph construction:   0.19 ( 1%) usr   0.01 ( 1%) sys   0.20 ( 1%) wall
 callgraph optimization:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 cfg construction      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 cfg cleanup           :   0.28 ( 1%) usr   0.00 ( 0%) sys   0.31 ( 1%) wall
 trivially dead code   :   0.38 ( 1%) usr   0.01 ( 1%) sys   0.34 ( 1%) wall
 life analysis         :   0.76 ( 2%) usr   0.01 ( 1%) sys   0.68 ( 2%) wall
 life info update      :   0.35 ( 1%) usr   0.00 ( 0%) sys   0.44 ( 1%) wall
 alias analysis        :   0.38 ( 1%) usr   0.00 ( 0%) sys   0.46 ( 1%) wall
 register scan         :   0.31 ( 1%) usr   0.00 ( 0%) sys   0.35 ( 1%) wall
 rebuild jump labels   :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 preprocessing         :   0.38 ( 1%) usr   0.17 ( 9%) sys   0.56 ( 2%) wall
 parser                :   3.89 (13%) usr   0.53 (27%) sys   4.81 (14%) wall
 name lookup           :   1.25 ( 4%) usr   0.58 (30%) sys   1.66 ( 5%) wall
 integration           :   0.86 ( 3%) usr   0.00 ( 0%) sys   0.95 ( 3%) wall
 tree gimplify         :   0.53 ( 2%) usr   0.03 ( 2%) sys   0.59 ( 2%) wall
 tree eh               :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 tree CFG construction :   0.16 ( 1%) usr   0.02 ( 1%) sys   0.16 ( 0%) wall
 tree CFG cleanup      :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.25 ( 1%) wall
 tree PTA              :   0.23 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall
 tree alias analysis   :   0.43 ( 1%) usr   0.01 ( 1%) sys   0.51 ( 2%) wall
 tree PHI insertion    :   0.59 ( 2%) usr   0.02 ( 1%) sys   0.56 ( 2%) wall
 tree SSA rewrite      :   0.68 ( 2%) usr   0.00 ( 0%) sys   0.72 ( 2%) wall
 tree SSA other        :   1.14 ( 4%) usr   0.11 ( 6%) sys   1.41 ( 4%) wall
 tree operand scan     :   0.75 ( 2%) usr   0.15 ( 8%) sys   0.77 ( 2%) wall
 dominator optimization:   1.79 ( 6%) usr   0.06 ( 3%) sys   1.69 ( 5%) wall
 tree SRA              :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 tree CCP              :   0.24 ( 1%) usr   0.00 ( 0%) sys   0.23 ( 1%) wall
 tree split crit edges :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 tree PRE              :   0.42 ( 1%) usr   0.02 ( 1%) sys   0.52 ( 2%) wall
 tree forward propagate:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 tree conservative DCE :   0.32 ( 1%) usr   0.01 ( 1%) sys   0.33 ( 1%) wall
 tree aggressive DCE   :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 tree DSE              :   0.29 ( 1%) usr   0.00 ( 0%) sys   0.38 ( 1%) wall
 loop invariant motion :   0.21 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall
 tree copy headers     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 tree SSA to normal    :   0.28 ( 1%) usr   0.02 ( 1%) sys   0.34 ( 1%) wall
 tree rename SSA copies:   0.10 ( 0%) usr   0.01 ( 1%) sys   0.11 ( 0%) wall
 dominance frontiers   :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 control dependences   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 expand                :   2.25 ( 7%) usr   0.02 ( 1%) sys   2.33 ( 7%) wall
 varconst              :   0.09 ( 0%) usr   0.03 ( 2%) sys   0.11 ( 0%) wall
 jump                  :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 CSE                   :   1.72 ( 6%) usr   0.01 ( 1%) sys   1.86 ( 5%) wall
 loop analysis         :   0.17 ( 1%) usr   0.01 ( 1%) sys   0.13 ( 0%) wall
 global CSE            :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 CPROP 1               :   0.22 ( 1%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 PRE                   :   0.37 ( 1%) usr   0.01 ( 1%) sys   0.43 ( 1%) wall
 CPROP 2               :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 LSM                   :   0.35 ( 1%) usr   0.01 ( 1%) sys   0.38 ( 1%) wall
 bypass jumps          :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 1%) wall
 web                   :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 CSE 2                 :   0.94 ( 3%) usr   0.00 ( 0%) sys   0.94 ( 3%) wall
 branch prediction     :   0.17 ( 1%) usr   0.01 ( 1%) sys   0.23 ( 1%) wall
 flow analysis         :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 combiner              :   0.55 ( 2%) usr   0.00 ( 0%) sys   0.54 ( 2%) wall
 if-conversion         :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 regmove               :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall
 local alloc           :   0.43 ( 1%) usr   0.01 ( 1%) sys   0.49 ( 1%) wall
 global alloc          :   1.16 ( 4%) usr   0.00 ( 0%) sys   1.14 ( 3%) wall
 reload CSE regs       :   0.53 ( 2%) usr   0.01 ( 1%) sys   0.46 ( 1%) wall
 flow 2                :   0.11 ( 0%) usr   0.01 ( 1%) sys   0.13 ( 0%) wall
 if-conversion 2       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 peephole 2            :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall
 rename registers      :   0.13 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall
 scheduling 2          :   0.81 ( 3%) usr   0.00 ( 0%) sys   0.97 ( 3%) wall
 machine dep reorg     :   0.24 ( 1%) usr   0.00 ( 0%) sys   0.18 ( 1%) wall
 reorder blocks        :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall
 shorten branches      :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall
 final                 :   0.31 ( 1%) usr   0.03 ( 2%) sys   0.32 ( 1%) wall
 symout                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 rest of compilation   :   0.15 ( 0%) usr   0.01 ( 1%) sys   0.14 ( 0%) wall
 TOTAL                 :  31.01             1.94            33.84
# cc1plus 31.02 1.97
# as 0.36 0.02

GCC 3.4.1:

Execution times (seconds)
 garbage collection    :   1.20 ( 4%) usr   0.00 ( 0%) sys   1.22 ( 3%) wall
 callgraph construction:   0.14 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall
 callgraph optimization:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 cfg construction      :   0.40 ( 1%) usr   0.00 ( 0%) sys   0.43 ( 1%) wall
 cfg cleanup           :   0.46 ( 1%) usr   0.02 ( 1%) sys   0.44 ( 1%) wall
 trivially dead code   :   0.48 ( 1%) usr   0.00 ( 0%) sys   0.53 ( 1%) wall
 life analysis         :   0.74 ( 2%) usr   0.00 ( 0%) sys   0.76 ( 2%) wall
 life info update      :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.39 ( 1%) wall
 alias analysis        :   0.68 ( 2%) usr   0.00 ( 0%) sys   0.59 ( 2%) wall
 register scan         :   0.40 ( 1%) usr   0.01 ( 0%) sys   0.37 ( 1%) wall
 rebuild jump labels   :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall
 preprocessing         :   0.47 ( 1%) usr   0.18 ( 8%) sys   0.66 ( 2%) wall
 parser                :   4.21 (13%) usr   0.76 (33%) sys   5.18 (14%) wall
 name lookup           :   1.41 ( 4%) usr   0.99 (42%) sys   2.34 ( 6%) wall
 expand                :   8.79 (26%) usr   0.01 ( 0%) sys   8.90 (24%) wall
 varconst              :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
 integration           :   1.54 ( 5%) usr   0.00 ( 0%) sys   1.78 ( 5%) wall
 jump                  :   0.69 ( 2%) usr   0.11 ( 5%) sys   0.70 ( 2%) wall
 CSE                   :   2.72 ( 8%) usr   0.00 ( 0%) sys   2.84 ( 8%) wall
 global CSE            :   2.13 ( 6%) usr   0.12 ( 5%) sys   2.41 ( 7%) wall
 loop analysis         :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
 bypass jumps          :   0.23 ( 1%) usr   0.02 ( 1%) sys   0.37 ( 1%) wall
 CSE 2                 :   0.80 ( 2%) usr   0.00 ( 0%) sys   0.78 ( 2%) wall
 branch prediction     :   0.46 ( 1%) usr   0.00 ( 0%) sys   0.54 ( 1%) wall
 flow analysis         :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 combiner              :   0.44 ( 1%) usr   0.00 ( 0%) sys   0.43 ( 1%) wall
 if-conversion         :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
 regmove               :   0.19 ( 1%) usr   0.00 ( 0%) sys   0.14 ( 0%) wall
 local alloc           :   0.37 ( 1%) usr   0.02 ( 1%) sys   0.39 ( 1%) wall
 global alloc          :   0.88 ( 3%) usr   0.02 ( 1%) sys   0.95 ( 3%) wall
 reload CSE regs       :   0.41 ( 1%) usr   0.01 ( 0%) sys   0.49 ( 1%) wall
 flow 2                :   0.09 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall
 if-conversion 2       :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 peephole 2            :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 rename registers      :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall
 scheduling 2          :   0.67 ( 2%) usr   0.03 ( 1%) sys   0.64 ( 2%) wall
 reorder blocks        :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall
 shorten branches      :   0.10 ( 0%) usr   0.01 ( 0%) sys   0.10 ( 0%) wall
 final                 :   0.17 ( 1%) usr   0.00 ( 0%) sys   0.23 ( 1%) wall
 symout                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall
 rest of compilation   :   0.93 ( 3%) usr   0.01 ( 0%) sys   0.90 ( 2%) wall
 TOTAL                 :  33.50             2.33            36.75
# cc1plus 33.50 2.35
# as 0.29 0.02

Karel
-- 
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Paolo Bonzini @ 2004-09-01 7:18 UTC
To: Karel Gardas
Cc: Steven Bosscher, GCC Mailing List

3.4.1 seems to have a problem with the expander at -O2:

> expand                :   8.79 (26%) usr   0.01 ( 0%) sys   8.90 (24%) wall

So this 3.4.1 bug is somehow worked around by gimplification and tree optimization, and this masks the 40% difference (which would still be there at -O2 if it weren't for the improvement in expander time).

Paolo
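Paolo's masking argument can be made concrete with a rough what-if on the typecode.cc -O2 report totals. The sketch assumes, as a simplification, that giving 3.4.1 an expander as fast as trunk's would leave every other 3.4.1 phase unchanged, and it uses the same Delta% convention as the table at the top of the thread (negative means 3.5.0 is slower). The variable names are made up for illustration.

```python
def delta_pct(old: float, new: float) -> float:
    """(3.4.1 - 3.5.0) / 3.5.0 * 100; negative means 3.5.0 is slower."""
    return (old - new) / new * 100.0


# usr TOTAL and expand times from the typecode.cc -O2 reports
gcc341_total, gcc341_expand = 33.50, 8.79
trunk_total, trunk_expand = 31.01, 2.25

actual = delta_pct(gcc341_total, trunk_total)

# Hypothetical 3.4.1 whose expand phase is as fast as trunk's, all else unchanged
what_if = delta_pct(gcc341_total - gcc341_expand + trunk_expand, trunk_total)

print(f"actual:  {actual:+.2f}%")
print(f"what-if: {what_if:+.2f}%")
```

Under that assumption the roughly 8% speedup of trunk turns into a roughly 13% regression. The simplification ignores any interaction between phases, so this is only a rough indication of the effect Paolo describes.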
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Steven Bosscher @ 2004-08-31 10:55 UTC
To: Karel Gardas, Paolo Bonzini
Cc: GCC Mailing List

On Tuesday 31 August 2004 12:28, Karel Gardas wrote:
> On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> > >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> > >>
> > >>Can you show us the time report for the 40% regression?
> >
> > Also for 3.4.1?
>
> Sure!

Hmm... No obvious hot spots, eh?

Looks like the tree optimizers are to blame. We spend roughly the same amount of time in the post-GIMPLE passes, and we spend >7.5 s in the tree optimizers. The total slowdown you measured was ~8.9 s. The other 1.4 s is spent in expand, as shown in my previous message:

3.4.1:
 expand                :   0.79 ( 6%) usr   0.03 ( 1%) sys   0.78 ( 5%) wall
3.5.0:
 expand                :   2.08 ( 9%) usr   0.07 ( 4%) sys   2.51 (10%) wall

Hmm, we should probably disable at least flag_thread_jumps and flag_loop_optimize at -O1, and perhaps consider disabling some of the more expensive tree optimizers (or parts of them)... And of course see if it makes sense to disable a few RTL optimizers.

So it looks like a tuning problem to me, not a slowdown that indicates something algorithmically wrong.

Gr.
Steven
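Steven's >7.5 s figure can be checked by summing the tree-level phases of the trunk -O1 report; the 3.4.1 report has no such phases, so all of that time is new. A minimal sketch with the numbers copied from the two typecode.cc -O1 reports; which phases count as "tree optimizers" (here including gimplification, dominator optimization, loop invariant motion and dominance frontiers) is an assumption of this sketch, not something spelled out in the mail.

```python
# usr seconds of the tree-level phases in the trunk (3.5.0) -O1 typecode.cc report.
# The grouping is an assumption; 3.4.1 has none of these phases.
TREE_PHASES_350_O1 = {
    "tree gimplify": 0.60,
    "tree eh": 0.10,
    "tree CFG construction": 0.10,
    "tree CFG cleanup": 0.21,
    "tree PTA": 0.20,
    "tree alias analysis": 0.33,
    "tree PHI insertion": 0.42,
    "tree SSA rewrite": 0.58,
    "tree SSA other": 0.82,
    "tree operand scan": 0.59,
    "dominator optimization": 1.48,
    "tree SRA": 0.08,
    "tree CCP": 0.22,
    "tree split crit edges": 0.03,
    "tree PRE": 0.41,
    "tree forward propagate": 0.04,
    "tree conservative DCE": 0.30,
    "tree aggressive DCE": 0.09,
    "tree DSE": 0.28,
    "loop invariant motion": 0.21,
    "tree copy headers": 0.06,
    "tree SSA to normal": 0.26,
    "tree rename SSA copies": 0.11,
    "dominance frontiers": 0.03,
}

TOTAL_USR = {"3.4.1": 14.25, "3.5.0": 23.12}  # TOTAL lines of the two reports
EXPAND_USR = {"3.4.1": 0.79, "3.5.0": 2.08}

tree_time = sum(TREE_PHASES_350_O1.values())
slowdown = TOTAL_USR["3.5.0"] - TOTAL_USR["3.4.1"]
expand_growth = EXPAND_USR["3.5.0"] - EXPAND_USR["3.4.1"]

print(f"tree phases (3.5.0 only): {tree_time:.2f} s")
print(f"total slowdown:           {slowdown:.2f} s")
print(f"expand growth:            {expand_growth:.2f} s")
```

With this grouping the tree phases come to 7.55 s of user time against a measured slowdown of 8.87 s, and expand grows by 1.29 s.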
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Karel Gardas @ 2004-08-31 13:57 UTC
To: Steven Bosscher
Cc: Paolo Bonzini, GCC Mailing List

On Tue, 31 Aug 2004, Steven Bosscher wrote:

> On Tuesday 31 August 2004 12:28, Karel Gardas wrote:
> > On Tue, 31 Aug 2004, Paolo Bonzini wrote:
> > > >>>1) typecode.cc: 40% regression on O1 while 7% speedup on O2
> > > >>
> > > >>Can you show us the time report for the 40% regression?
> > >
> > > Also for 3.4.1?
> >
> > Sure!
>
> Hmm... No obvious hot spots eh?

BTW: GCC 3.4.1 needs about 66 MB of RAM to compile this file, while trunk needs about 98 MB. Also, the test box is a mobile Pentium III with only 256 KB of cache, so the higher memory usage might also contribute to the regression...

Karel
-- 
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
From: Giovanni Bajo @ 2004-09-01 11:18 UTC
To: Karel Gardas
Cc: gcc

Karel Gardas wrote:

> 1) typecode.cc: 40% regression on O1 while 7% speedup on O2

Can you please file a new bug report for this -O1 regression, attaching this preprocessed testcase and the time reports to it? Also link Steven's message, http://gcc.gnu.org/ml/gcc/2004-08/msg01602.html, which contains the analysis of this. Then we can mark the new bug as blocking PR 13776.

I think it is better to track these issues in separate PRs and connect them to PR 13776 (which is quite confusing at this point) only through Bugzilla relationships.

> -O2: 33% basic_seq.cc and following with 27% static.cc

Can you also open a new bug report about the basic_seq.cc regression, which shows up at all optimization levels? Again, attach preprocessed testcases, a comparison with 3.4 for all optimization levels, and the corresponding time reports.

Actually, I should also note that at this point we probably cannot do much about compile-time regressions at -O1/2/3. GCC 3.5 features more than 60 new optimization passes, so it is already half a miracle that we don't regress everywhere. Code generation is also improved, of course, so we have to lose a little somewhere. Still, big regressions (>20% on files of non-trivial size) could probably be analyzed a little to see whether there are obvious offenders.

Thank you for doing this, it is of great help!

Giovanni Bajo
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-09-01 11:18 ` Giovanni Bajo
@ 2004-09-02 9:41 ` Karel Gardas
  2004-09-02 20:32 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Giovanni Bajo
  2004-09-02 9:44 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Karel Gardas
  1 sibling, 1 reply; 18+ messages in thread
From: Karel Gardas @ 2004-09-02 9:41 UTC
To: Giovanni Bajo, Steven Bosscher, Paolo Bonzini; +Cc: GCC Mailing List

Giovanni,

I'm working on submitting the bug reports right now, but I have observed one fact that I find very interesting. When I compile files preprocessed by 3.5.0 with gcc 3.4.1, I get slower compile times, which means the regression(s) are not that dramatic. For example, for typecode.cc the regression drops from 40% to 30%.

Also for basic_seq.cc, which should regress at all optimization levels, I now get _no_ regression at all! In fact I get speedups! Look at the following tables:

Not preprocessed file:
File         341-O0 350-O0  Delta% 341-O1 350-O1  Delta% 341-O2 350-O2  Delta%
basic_seq.cc   3.77   4.21  -10.45   3.98   4.99  -20.24   3.82   5.72  -33.22

File preprocessed by GCC 3.4.1:
File         341-O0 350-O0  Delta% 341-O1 350-O1  Delta% 341-O2 350-O2  Delta%
basic_seq.cc   3.69   3.31   11.48   3.91   3.47   12.68   3.78   3.65    3.56

File preprocessed by GCC 3.5.0:
File         341-O0 350-O0  Delta% 341-O1 350-O1  Delta% 341-O2 350-O2  Delta%
basic_seq.cc   4.61   4.15   11.08   5.28   4.83    9.32   5.62   5.57     0.9

So it seems 3.5.0 is _always_ faster than 3.4.1 on a preprocessed file! So either 3.5.0's libstdc++ is bigger or 3.5.0's cpp is slower. Here is a size comparison of the two preprocessed files:

$ ls -la basic_seq.*.ii
-rw-rw-r-- 1 karel karel 1223628 Sep 2 11:13 basic_seq.341.ii
-rw-rw-r-- 1 karel karel 1243090 Sep 2 11:01 basic_seq.350.ii

I hope you understand that I'm reluctant to submit a regression bug report in this case. :-) I have also noted this in PR c++/17278, which is for the typecode regression...
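The Delta% columns in these tables appear to be computed as (t_341 - t_350) / t_350 * 100, so a negative value means 3.5.0 is slower. That formula is inferred from the numbers, not stated in the thread. A minimal Python sketch under that assumption recomputes the basic_seq.cc rows above and the size difference between the two .ii files from the ls output:

```python
# Timings (seconds) for basic_seq.cc copied from the tables above,
# as (gcc 3.4.1, gcc 3.5.0) pairs for -O0, -O1 and -O2.
TABLES = {
    "not preprocessed": [(3.77, 4.21), (3.98, 4.99), (3.82, 5.72)],
    "preprocessed by 3.4.1": [(3.69, 3.31), (3.91, 3.47), (3.78, 3.65)],
    "preprocessed by 3.5.0": [(4.61, 4.15), (5.28, 4.83), (5.62, 5.57)],
}


def delta_percent(t341, t350):
    """Assumed Delta% formula (inferred): positive means 3.5.0 is faster."""
    return (t341 - t350) / t350 * 100.0


for name, rows in TABLES.items():
    deltas = [round(delta_percent(a, b), 2) for a, b in rows]
    print(f"{name}: {deltas}")

# Sizes in bytes of the two preprocessed files, from the ls output above.
size_341 = 1223628
size_350 = 1243090
diff = size_350 - size_341
print(f"size difference: {diff} bytes ({diff / size_341 * 100:.2f}%)")
```

With this formula every Delta% value in the three tables is reproduced to two decimals, and the two .ii files differ by 19462 bytes, roughly 1.6% of the 3.4.1 output.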
When I compare table (1) 341-O0 with table (2) 341-O0, I get 3.77 - 3.69 = 0.08 seconds spent in 3.4.1's cpp. The same for 3.5.0 is table (1) 350-O0 - table (3) 350-O0 = 4.21 - 4.15 = 0.06 seconds, so 3.5.0's cpp even seems to be a bit faster. So it seems the culprit should be libstdc++ in 3.5.0, but is it possible that a size difference of about 20 kB, i.e. about 1.6%, makes such a big difference in compilation speed?

Thanks,
Karel

On Wed, 1 Sep 2004, Giovanni Bajo wrote:

> Karel Gardas wrote:
>
> > 1) typecode.cc: 40% regression on O1 while 7% speedup on O2
>
> Can you please file a new bug report for this -O1 regression, attaching the preprocessed testcase and the time reports to it? Also link Steven's message in it: http://gcc.gnu.org/ml/gcc/2004-08/msg01602.html, which contains the analysis of this. Then we can mark the new bug as blocking PR 13776.
>
> I think it is better to track these issues with separate PRs, and just connect them to PR 13776 (which is quite confusing at this point) with Bugzilla relationships.
>
> > -O2: 33% basic_seq.cc and following with 27% static.cc
>
> Can you also open a new bug report about the regression of basic_seq.cc, which regresses at all optimization levels? Again, attach preprocessed testcases, a comparison with 3.4 for all optimization levels, and the corresponding time reports.
>
> Actually, I should also note that at this point we probably cannot do much about compile time regressions at -O1/2/3. GCC 3.5 features more than 60 new optimization passes, so it is already half a miracle that we don't regress everywhere. Code generation is also improved, of course, so we have to lose a little somewhere. Of course, big regressions (>20% on files of non-trivial size) could probably still be analyzed a little to see if we find obvious offenders.
>
> Thank you for doing this, it is of great help!
>
> Giovanni Bajo

-- 
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com
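Karel's estimate of the preprocessing cost subtracts a compiler's -O0 time on its own preprocessed output from its -O0 time on the original source. The Python sketch below repeats that arithmetic and, as an illustrative extra step, compares each compiler on the two .ii files. It assumes basic_seq.341.ii and basic_seq.350.ii are the inputs of the "preprocessed by GCC 3.4.1" and "preprocessed by GCC 3.5.0" tables, and treats each timing as a single measurement.

```python
# -O0 timings (seconds) for basic_seq.cc from the three tables in this thread.
# Keys are (compiler, input file); mapping the .ii names to the tables is assumed.
O0 = {
    ("3.4.1", "basic_seq.cc"): 3.77,
    ("3.5.0", "basic_seq.cc"): 4.21,
    ("3.4.1", "basic_seq.341.ii"): 3.69,
    ("3.5.0", "basic_seq.341.ii"): 3.31,
    ("3.4.1", "basic_seq.350.ii"): 4.61,
    ("3.5.0", "basic_seq.350.ii"): 4.15,
}

# Estimated cpp time: original source minus the file this compiler preprocessed.
cpp_341 = O0[("3.4.1", "basic_seq.cc")] - O0[("3.4.1", "basic_seq.341.ii")]
cpp_350 = O0[("3.5.0", "basic_seq.cc")] - O0[("3.5.0", "basic_seq.350.ii")]
print(f"estimated cpp time: 3.4.1 {cpp_341:.2f} s, 3.5.0 {cpp_350:.2f} s")

# Same compiler, different preprocessed input: extra time spent on the file
# preprocessed with 3.5.0's headers instead of 3.4.1's.
for cc in ("3.4.1", "3.5.0"):
    extra = O0[(cc, "basic_seq.350.ii")] - O0[(cc, "basic_seq.341.ii")]
    print(f"{cc} on 350.ii vs 341.ii: +{extra:.2f} s")
```

With these inputs both compilers spend roughly 0.9 s more on the file preprocessed by 3.5.0 (0.92 s for 3.4.1, 0.84 s for 3.5.0), which is consistent with Karel's suggestion that the 3.5.0 libstdc++ headers, rather than cpp, account for the gap.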
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-09-02 9:41 ` Karel Gardas
@ 2004-09-02 20:32 ` Giovanni Bajo
  2004-09-04 7:35 ` Karel Gardas
  0 siblings, 1 reply; 18+ messages in thread
From: Giovanni Bajo @ 2004-09-02 20:32 UTC
To: Karel Gardas, Steven Bosscher, Paolo Bonzini; +Cc: GCC Mailing List

Karel Gardas wrote:

> Also for basic_seq.cc, which should regress at all optimization levels, I now get _no_ regression at all! In fact I get speedups! Look at the following tables:
>
> Not preprocessed file:
> File         341-O0 350-O0  Delta% 341-O1 350-O1  Delta% 341-O2 350-O2  Delta%
> basic_seq.cc   3.77   4.21  -10.45   3.98   4.99  -20.24   3.82   5.72  -33.22
>
> File preprocessed by GCC 3.4.1:
> File         341-O0 350-O0  Delta% 341-O1 350-O1  Delta% 341-O2 350-O2  Delta%
> basic_seq.cc   3.69   3.31   11.48   3.91   3.47   12.68   3.78   3.65    3.56
>
> File preprocessed by GCC 3.5.0:
> File         341-O0 350-O0  Delta% 341-O1 350-O1  Delta% 341-O2 350-O2  Delta%
> basic_seq.cc   4.61   4.15   11.08   5.28   4.83    9.32   5.62   5.57     0.9

This is very interesting. Can you please file a bug report about this issue? You can attach the unpreprocessed basic_seq.cc and the two preprocessed files, from 3.4.1 and 3.5.0, and include all the timings you did. CC me on it, please.

I'll try reproducing these numbers, and check whether it's really a problem with the v3 (libstdc++) code, or something else.

Thanks again,
Giovanni Bajo
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-09-02 20:32 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Giovanni Bajo
@ 2004-09-04 7:35 ` Karel Gardas
  0 siblings, 0 replies; 18+ messages in thread
From: Karel Gardas @ 2004-09-04 7:35 UTC
To: Giovanni Bajo; +Cc: Steven Bosscher, Paolo Bonzini, GCC Mailing List

Giovanni,

I've just created the bug report, but I forgot to add you to the CC list. Please have a look at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17315 and edit it accordingly, since I really do not know how to describe this issue better.

Thanks for looking into this!

Karel

On Thu, 2 Sep 2004, Giovanni Bajo wrote:

> Karel Gardas wrote:
>
> > Also for basic_seq.cc, which should regress at all optimization levels, I now get _no_ regression at all! In fact I get speedups! Look at the following tables:
> >
> > Not preprocessed file:
> > File         341-O0 350-O0  Delta% 341-O1 350-O1  Delta% 341-O2 350-O2  Delta%
> > basic_seq.cc   3.77   4.21  -10.45   3.98   4.99  -20.24   3.82   5.72  -33.22
> >
> > File preprocessed by GCC 3.4.1:
> > File         341-O0 350-O0  Delta% 341-O1 350-O1  Delta% 341-O2 350-O2  Delta%
> > basic_seq.cc   3.69   3.31   11.48   3.91   3.47   12.68   3.78   3.65    3.56
> >
> > File preprocessed by GCC 3.5.0:
> > File         341-O0 350-O0  Delta% 341-O1 350-O1  Delta% 341-O2 350-O2  Delta%
> > basic_seq.cc   4.61   4.15   11.08   5.28   4.83    9.32   5.62   5.57     0.9
>
> This is very interesting. Can you please file a bug report about this issue? You can attach the unpreprocessed basic_seq.cc and the two preprocessed files, from 3.4.1 and 3.5.0, and include all the timings you did. CC me on it, please.
>
> I'll try reproducing these numbers, and check whether it's really a problem with the v3 (libstdc++) code, or something else.
>
> Thanks again,
> Giovanni Bajo

-- 
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com
* Re: Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources
  2004-09-01 11:18 ` Giovanni Bajo
  2004-09-02 9:41 ` Karel Gardas
@ 2004-09-02 9:44 ` Karel Gardas
  1 sibling, 0 replies; 18+ messages in thread
From: Karel Gardas @ 2004-09-02 9:44 UTC
To: Giovanni Bajo; +Cc: gcc

On Wed, 1 Sep 2004, Giovanni Bajo wrote:

> Actually, I should also note that at this point we probably cannot do much about compile time regressions at -O1/2/3. GCC 3.5 features more than 60 new optimization passes, so it is already half a miracle that we don't regress everywhere.

Yes, I'm also surprised that 3.5 looks so good even though so much stuff was added.

> Code generation is also improved, of course, so we have to lose a little somewhere. Of course, big regressions (>20% on files of non-trivial size) could probably still be analyzed a little to see if we find obvious offenders.
>
> Thank you for doing this, it is of great help!

You are welcome! Now, in light of the observation described in my last email, I'm thinking about how to combine 3.5.0 with 3.4.1's libstdc++ to get the best of both in one experimental compiler. :-)

Cheers,
Karel
-- 
Karel Gardas                  kgardas@objectsecurity.com
ObjectSecurity Ltd.           http://www.objectsecurity.com
Thread overview: 18+ messages
2004-08-31 9:58 Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 on MICO sources Karel Gardas
2004-08-31 10:12 ` Steven Bosscher
2004-08-31 10:28 ` Karel Gardas
2004-08-31 10:44 ` Paolo Bonzini
2004-08-31 10:46 ` Karel Gardas
2004-08-31 10:49 ` Steven Bosscher
2004-08-31 11:00 ` Paolo Bonzini
2004-08-31 11:24 ` Steven Bosscher
2004-08-31 19:30 ` Mike Stump
2004-08-31 12:48 ` Karel Gardas
2004-09-01 7:18 ` Paolo Bonzini
2004-08-31 10:55 ` Steven Bosscher
2004-08-31 13:57 ` Karel Gardas
2004-09-01 11:18 ` Giovanni Bajo
2004-09-02 9:41 ` Karel Gardas
2004-09-02 20:32 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Giovanni Bajo
2004-09-04 7:35 ` Karel Gardas
2004-09-02 9:44 ` Compilation performance comparison of gcc3.4.1 and gcc3.5.0 2004-08-30 " Karel Gardas