public inbox for gcc@gcc.gnu.org
* Huge compile time & run time performance regression 3.3 -> HEAD
@ 2003-05-18 16:10 Richard Guenther
  2003-05-18 19:22 ` Richard Guenther
  2003-05-19 16:18 ` Matt Austern
  0 siblings, 2 replies; 11+ messages in thread
From: Richard Guenther @ 2003-05-18 16:10 UTC (permalink / raw)
  To: gcc

Hi!

As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
performance and performance of the resulting code. As always these
comparisons are for a POOMA based scientific application.

I experience a 100% compile time regression (673.50s -> 1284.48s) and
a 12% runtime performance regression (150s -> 171s) when comparing
gcc3.3 to HEAD.

Time reports follow, the most prominent regressions are expand, global CSE
(>300%!), loop analysis and branch prediction.

Compile options are -ftemplate-depth-80 -fno-exceptions -O2 -march=athlon
-funroll-loops -fomit-frame-pointer

Richard.

gcc-3.3:

Execution times (seconds)
 garbage collection    :  21.82 ( 3%) usr   0.01 ( 0%) sys  22.00 ( 3%)
 cfg construction      :   3.32 ( 1%) usr   0.08 ( 0%) sys   4.00 ( 1%)
 cfg cleanup           :  58.32 ( 9%) usr   0.20 ( 1%) sys  60.00 ( 9%)
 trivially dead code   :   9.11 ( 1%) usr   0.01 ( 0%) sys   9.00 ( 1%)
 life analysis         :  14.49 ( 2%) usr   0.60 ( 4%) sys  15.00 ( 2%)
 life info update      :   6.95 ( 1%) usr   0.53 ( 3%) sys   8.50 ( 1%)
 preprocessing         :   0.52 ( 0%) usr   0.24 ( 1%) sys   1.00 ( 0%)
 lexical analysis      :   0.31 ( 0%) usr   0.17 ( 1%) sys   1.50 ( 0%)
 parser                :  13.78 ( 2%) usr   0.55 ( 3%) sys  19.00 ( 3%)
 name lookup           :   6.88 ( 1%) usr   0.84 ( 5%) sys   4.50 ( 1%)
 expand                :  29.68 ( 5%) usr   1.63 (10%) sys  36.00 ( 5%)
 varconst              :   0.32 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%)
 integration           :  27.52 ( 4%) usr   0.74 ( 5%) sys  27.50 ( 4%)
 jump                  :  17.88 ( 3%) usr   0.26 ( 2%) sys  17.50 ( 3%)
 CSE                   :  23.11 ( 4%) usr   0.05 ( 0%) sys  25.00 ( 4%)
 global CSE            : 108.29 (16%) usr   1.30 ( 8%) sys 108.00 (16%)
 loop analysis         : 206.77 (31%) usr   8.24 (51%) sys 213.50 (32%)
 CSE 2                 :   9.69 ( 1%) usr   0.01 ( 0%) sys  10.00 ( 1%)
 branch prediction     :   8.94 ( 1%) usr   0.04 ( 0%) sys   9.00 ( 1%)
 flow analysis         :   1.56 ( 0%) usr   0.02 ( 0%) sys   2.00 ( 0%)
 combiner              :   5.52 ( 1%) usr   0.07 ( 0%) sys   3.50 ( 1%)
 if-conversion         :   0.75 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%)
 regmove               :   6.41 ( 1%) usr   0.00 ( 0%) sys   6.00 ( 1%)
 mode switching        :   0.91 ( 0%) usr   0.00 ( 0%) sys   1.50 ( 0%)
 local alloc           :  24.06 ( 4%) usr   0.11 ( 1%) sys  23.00 ( 3%)
 global alloc          :  10.66 ( 2%) usr   0.11 ( 1%) sys   9.50 ( 1%)
 reload CSE regs       :  15.54 ( 2%) usr   0.05 ( 0%) sys  15.00 ( 2%)
 flow 2                :   2.01 ( 0%) usr   0.01 ( 0%) sys   3.00 ( 0%)
 if-conversion 2       :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%)
 peephole 2            :   0.62 ( 0%) usr   0.01 ( 0%) sys   0.50 ( 0%)
 rename registers      :   2.52 ( 0%) usr   0.03 ( 0%) sys   1.00 ( 0%)
 scheduling 2          :   6.27 ( 1%) usr   0.14 ( 1%) sys   7.50 ( 1%)
 reorder blocks        :   0.21 ( 0%) usr   0.02 ( 0%) sys   0.50 ( 0%)
 shorten branches      :   1.76 ( 0%) usr   0.01 ( 0%) sys   1.00 ( 0%)
 reg stack             :   0.53 ( 0%) usr   0.01 ( 0%) sys   1.00 ( 0%)
 final                 :   1.72 ( 0%) usr   0.15 ( 1%) sys   0.50 ( 0%)
 rest of compilation   :   7.72 ( 1%) usr   0.01 ( 0%) sys   6.50 ( 1%)
 TOTAL                 : 656.71            16.26           673.50

gcc 3.4:

Execution times (seconds)
 garbage collection    :  25.93 ( 2%) usr   0.12 ( 0%) sys  26.04 ( 2%)
 cfg construction      :   4.14 ( 0%) usr   0.22 ( 0%) sys   4.06 ( 0%)
 cfg cleanup           :   8.45 ( 1%) usr   0.68 ( 0%) sys   9.17 ( 1%)
 trivially dead code   :   9.92 ( 1%) usr   0.16 ( 0%) sys  10.21 ( 1%)
 life analysis         :  17.45 ( 2%) usr  56.73 (31%) sys  74.21 ( 6%)
 life info update      :   8.87 ( 1%) usr  47.74 (26%) sys  56.72 ( 4%)
 alias analysis        :  18.90 ( 2%) usr   1.36 ( 1%) sys  19.57 ( 2%)
 register scan         :   3.63 ( 0%) usr   0.00 ( 0%) sys   3.80 ( 0%)
 rebuild jump labels   :   3.17 ( 0%) usr   0.00 ( 0%) sys   3.30 ( 0%)
 preprocessing         :   0.60 ( 0%) usr   0.24 ( 0%) sys   6.20 ( 0%)
 parser                :  13.69 ( 1%) usr   0.72 ( 0%) sys  14.31 ( 1%)
 name lookup           :   9.07 ( 1%) usr   0.72 ( 0%) sys  10.07 ( 1%)
 expand                : 190.76 (17%) usr   7.66 ( 4%) sys 198.53 (15%)
 varconst              :   0.30 ( 0%) usr   0.01 ( 0%) sys   0.31 ( 0%)
 integration           :  27.47 ( 3%) usr   2.63 ( 1%) sys  29.84 ( 2%)
 jump                  :  15.55 ( 1%) usr   1.51 ( 1%) sys  17.02 ( 1%)
 CSE                   :  20.92 ( 2%) usr   0.13 ( 0%) sys  21.00 ( 2%)
 global CSE            : 365.18 (33%) usr   3.40 ( 2%) sys 368.96 (29%)
 loop analysis         : 227.82 (21%) usr  57.81 (31%) sys 286.05 (22%)
 bypass jumps          :   4.69 ( 0%) usr   0.49 ( 0%) sys   5.05 ( 0%)
 CSE 2                 :   9.27 ( 1%) usr   0.06 ( 0%) sys   9.00 ( 1%)
 branch prediction     :  20.14 ( 2%) usr   1.57 ( 1%) sys  21.76 ( 2%)
 flow analysis         :   0.48 ( 0%) usr   0.00 ( 0%) sys   0.42 ( 0%)
 combiner              :   6.40 ( 1%) usr   0.22 ( 0%) sys   6.81 ( 1%)
 if-conversion         :   0.52 ( 0%) usr   0.01 ( 0%) sys   0.50 ( 0%)
 regmove               :   7.61 ( 1%) usr   0.01 ( 0%) sys   7.57 ( 1%)
 mode switching        :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%)
 local alloc           :  24.98 ( 2%) usr   0.27 ( 0%) sys  25.20 ( 2%)
 global alloc          :  12.88 ( 1%) usr   0.26 ( 0%) sys  13.24 ( 1%)
 reload CSE regs       :   5.70 ( 1%) usr   0.17 ( 0%) sys   5.82 ( 0%)
 flow 2                :   1.18 ( 0%) usr   0.00 ( 0%) sys   1.27 ( 0%)
 if-conversion 2       :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.25 ( 0%)
 peephole 2            :   0.94 ( 0%) usr   0.00 ( 0%) sys   0.93 ( 0%)
 rename registers      :   2.86 ( 0%) usr   0.07 ( 0%) sys   2.88 ( 0%)
 scheduling 2          :   6.90 ( 1%) usr   0.26 ( 0%) sys   7.24 ( 1%)
 reorder blocks        :   0.51 ( 0%) usr   0.01 ( 0%) sys   0.64 ( 0%)
 shorten branches      :   1.99 ( 0%) usr   0.11 ( 0%) sys   2.17 ( 0%)
 reg stack             :   1.02 ( 0%) usr   0.04 ( 0%) sys   1.18 ( 0%)
 final                 :   2.44 ( 0%) usr   0.25 ( 0%) sys   2.54 ( 0%)
 rest of compilation   :  10.52 ( 1%) usr   0.07 ( 0%) sys  10.56 ( 1%)
 TOTAL                 :1093.17           185.71          1284.48
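
The two reports above can also be compared mechanically. The Python below is a minimal sketch and not part of the original message. It assumes the row layout seen in these reports (phase name, a colon, then usr, sys and wall seconds with optional percentage columns). The helper names parse_time_report and rank_regressions are hypothetical, and only four phases plus TOTAL are copied in to keep it short.

```python
import re

def parse_time_report(text):
    """Map each phase name to its wall-clock seconds (the third time column)."""
    phases = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        rest = re.sub(r"\([^)]*\)", "", rest)      # drop "( 3%)" style columns
        numbers = re.findall(r"\d+\.\d+", rest)
        if len(numbers) == 3:                      # usr, sys, wall
            phases[name.strip()] = float(numbers[2])
    return phases

def rank_regressions(old, new):
    """Return (wall increase, phase) pairs, largest increase first."""
    rows = [(new[p] - old.get(p, 0.0), p) for p in new if p != "TOTAL"]
    return sorted(rows, reverse=True)

# Rows copied from the gcc-3.3 report in this message.
GCC_33 = """
 expand                :  29.68 ( 5%) usr   1.63 (10%) sys  36.00 ( 5%)
 global CSE            : 108.29 (16%) usr   1.30 ( 8%) sys 108.00 (16%)
 loop analysis         : 206.77 (31%) usr   8.24 (51%) sys 213.50 (32%)
 branch prediction     :   8.94 ( 1%) usr   0.04 ( 0%) sys   9.00 ( 1%)
 TOTAL                 : 656.71            16.26           673.50
"""

# Rows copied from the gcc 3.4 (HEAD) report in this message.
GCC_34 = """
 expand                : 190.76 (17%) usr   7.66 ( 4%) sys 198.53 (15%)
 global CSE            : 365.18 (33%) usr   3.40 ( 2%) sys 368.96 (29%)
 loop analysis         : 227.82 (21%) usr  57.81 (31%) sys 286.05 (22%)
 branch prediction     :  20.14 ( 2%) usr   1.57 ( 1%) sys  21.76 ( 2%)
 TOTAL                 :1093.17           185.71          1284.48
"""

old = parse_time_report(GCC_33)
new = parse_time_report(GCC_34)
ranking = rank_regressions(old, new)
total_ratio = new["TOTAL"] / old["TOTAL"]

if __name__ == "__main__":
    for delta, phase in ranking:
        print(f"{phase:20s} +{delta:7.2f}s")
    print(f"total wall ratio: {total_ratio:.2f}")
```

On these excerpts the ranking puts global CSE first, followed by expand, loop analysis and branch prediction, and the TOTAL wall times give a ratio of about 1.91.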



* Re: Huge compile time & run time performance regression 3.3 -> HEAD
  2003-05-18 16:10 Huge compile time & run time performance regression 3.3 -> HEAD Richard Guenther
@ 2003-05-18 19:22 ` Richard Guenther
  2003-05-18 19:31   ` Richard Guenther
  2003-05-19 16:18 ` Matt Austern
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2003-05-18 19:22 UTC (permalink / raw)
  To: gcc

On Sun, 18 May 2003, Richard Guenther wrote:

> Hi!
>
> As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
> performance and performance of the resulting code. As always these
> comparisons are for a POOMA based scientific application.
>
> I experience a 100% compile time regression (673.50s -> 1284.48s) and
> a 12% runtime performance regression (150s -> 171s) when comparing
> gcc3.3 to HEAD.
>
> Time reports follow, the most prominent regressions are expand, global CSE
> (>300%!), loop analysis and branch prediction.

Top of a profile of gcc3.4 is

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  7.47     79.18    79.18      906     0.09     0.09  compute_store_table
  6.92    152.54    73.36 262785751     0.00     0.00  expr_equiv_p
  6.76    224.21    71.67   170098     0.00     0.00  fixup_var_refs_insns
  5.58    283.38    59.17    12865     0.00     0.00  loop_regs_scan
  4.39    329.97    46.59   364279     0.00     0.00  compute_transp
  4.10    373.46    43.49 308470718     0.00     0.00  splay_tree_splay_helper
  3.27    408.15    34.69 201411415     0.00     0.00  fixup_var_refs_1
  3.02    440.22    32.07 192083288     0.00     0.00  true_dependence
  2.88    470.76    30.54 232168274     0.00     0.00  mems_in_disjoint_alias_sets_p
  2.39    496.11    25.35    10203     0.00     0.00  loop_regs_update
  2.36    521.17    25.06      686     0.04     0.04  build_store_vectors
  2.00    542.36    21.19  3266323     0.00     0.00  invalid_mode_change_p
  1.95    563.04    20.68   374046     0.00     0.00  alloc_page
  1.86    582.79    19.75 180849685     0.00     0.00  find_loads
  1.47    598.39    15.60   214475     0.00     0.00  ldst_entry
  1.28    611.95    13.56  4111368     0.00     0.00  gt_ggc_mx_lang_tree_node
  1.04    622.99    11.04 201411428     0.00     0.00  fixup_var_refs_insn
  1.03    633.94    10.95      906     0.01     0.01  store_motion
  1.03    644.86    10.92    18177     0.00     0.00  fixup_var_refs
  1.01    655.57    10.71 1468353297     0.00     0.00  splay_tree_compare_ints
  1.00    666.14    10.57 15850095     0.00     0.00  walk_tree

The full profile can be fetched from

http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/profile_34.gz

Richard.


* Re: Huge compile time & run time performance regression 3.3 -> HEAD
  2003-05-18 19:22 ` Richard Guenther
@ 2003-05-18 19:31   ` Richard Guenther
  2003-05-18 19:35     ` Richard Guenther
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2003-05-18 19:31 UTC (permalink / raw)
  To: gcc

On Sun, 18 May 2003, Richard Guenther wrote:

> On Sun, 18 May 2003, Richard Guenther wrote:
>
> > Hi!
> >
> > As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
> > performance and performance of the resulting code. As always these
> > comparisons are for a POOMA based scientific application.
> >
> > I experience a 100% compile time regression (673.50s -> 1284.48s) and
> > a 12% runtime performance regression (150s -> 171s) when comparing
> > gcc3.3 to HEAD.
> >
> > Time reports follow, the most prominent regressions are expand, global CSE
> > (>300%!), loop analysis and branch prediction.
>
> Top of a profile of gcc3.4 is
>
> Flat profile:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>   7.47     79.18    79.18      906     0.09     0.09  compute_store_table
>   6.92    152.54    73.36 262785751     0.00     0.00  expr_equiv_p
>   6.76    224.21    71.67   170098     0.00     0.00  fixup_var_refs_insns
>   5.58    283.38    59.17    12865     0.00     0.00  loop_regs_scan
>   4.39    329.97    46.59   364279     0.00     0.00  compute_transp
>   4.10    373.46    43.49 308470718     0.00     0.00  splay_tree_splay_helper
>   3.27    408.15    34.69 201411415     0.00     0.00  fixup_var_refs_1
>   3.02    440.22    32.07 192083288     0.00     0.00  true_dependence
>   2.88    470.76    30.54 232168274     0.00     0.00  mems_in_disjoint_alias_sets_p

Compared to a 3.3 profile:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  8.08     45.30    45.30   272942     0.17     0.17  compute_transp
  7.15     85.38    40.08 263012170     0.00     0.00  splay_tree_splay_helper
  5.62    116.92    31.54 165540544     0.00     0.00  true_dependence
  4.72    143.36    26.44 203228718     0.00     0.00  mems_in_disjoint_alias_sets_p
  3.80    164.68    21.32   171556     0.12     0.12  clear_table
  3.61    184.92    20.24     9131     2.22     2.69  loop_regs_update
  3.46    204.34    19.42  3012195     0.01     0.01  invalid_mode_change_p
  3.04    221.36    17.02   605912     0.03     0.04  htab_traverse
  2.65    236.24    14.88   372632     0.04     0.04  alloc_page
  2.05    247.74    11.50  3644058     0.00     0.01  gt_ggc_mx_lang_tree_node
  1.81    257.88    10.14     4977     2.04     2.20  unroll_loop
  1.80    267.97    10.09 1344607967     0.00     0.00  splay_tree_compare_ints
  1.80    278.04    10.07 105168704     0.00     0.00  htab_find_slot_with_hash
  1.59    286.94     8.90  5214165     0.00     0.00  loop_invariant_p


In the gcc3.4 profile,
>   6.92    152.54    73.36 262785751     0.00     0.00  expr_equiv_p
seems to be the culprit.

Full gcc3.3 profile at

http://www.tat.physik.uni-tuebingen.de/~rguenth/gcc/profile_33.gz
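
A comparison like the one above can be done by script. The Python below is a minimal sketch and not part of the original message. It assumes gprof flat-profile rows with six numeric columns followed by the function name, and it tolerates a name that has been pushed onto the following line. The function names parse_flat_profile and only_in_new are hypothetical. Only a few rows from each profile are copied in, so a function missing from the 3.3 excerpt is not proof that it is missing from the full 3.3 profile.

```python
def parse_flat_profile(text):
    """Map function name -> self seconds (third column of a gprof flat profile)."""
    self_seconds = {}
    pending = None                      # self seconds waiting for a wrapped name
    for line in text.splitlines():
        fields = line.split()
        if pending is not None and len(fields) == 1:
            self_seconds[fields[0]] = pending
            pending = None
            continue
        pending = None
        try:
            numbers = [float(f) for f in fields[:6]]
        except ValueError:
            continue                    # header or prose line
        if len(numbers) < 6:
            continue
        if len(fields) >= 7:
            self_seconds[fields[6]] = numbers[2]
        else:
            pending = numbers[2]        # the name follows on the next line
    return self_seconds

def only_in_new(old, new):
    """Functions seen in the new profile but not the old, heaviest first."""
    return sorted(((s, n) for n, s in new.items() if n not in old), reverse=True)

# Rows copied from the gcc3.4 profile quoted in this thread.
PROFILE_34 = """
  7.47     79.18    79.18      906     0.09     0.09  compute_store_table
  6.92    152.54    73.36 262785751     0.00     0.00  expr_equiv_p
  4.39    329.97    46.59   364279     0.00     0.00  compute_transp
  4.10    373.46    43.49 308470718     0.00     0.00
splay_tree_splay_helper
"""

# Rows copied from the gcc3.3 profile quoted in this thread.
PROFILE_33 = """
  8.08     45.30    45.30   272942     0.17     0.17  compute_transp
  7.15     85.38    40.08 263012170     0.00     0.00
splay_tree_splay_helper
"""

p33 = parse_flat_profile(PROFILE_33)
p34 = parse_flat_profile(PROFILE_34)
new_hot_spots = only_in_new(p33, p34)

if __name__ == "__main__":
    for seconds, name in new_hot_spots:
        print(f"{seconds:8.2f}s  {name}")
```

With these excerpts the script lists compute_store_table and expr_equiv_p as the entries that show up only on the 3.4 side.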

Richard.


* Re: Huge compile time & run time performance regression 3.3 -> HEAD
  2003-05-18 19:31   ` Richard Guenther
@ 2003-05-18 19:35     ` Richard Guenther
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Guenther @ 2003-05-18 19:35 UTC (permalink / raw)
  To: gcc

On Sun, 18 May 2003, Richard Guenther wrote:

> On Sun, 18 May 2003, Richard Guenther wrote:
>
> > Top of a profile of gcc3.4 is
> >
> > Flat profile:
> >
> > Each sample counts as 0.01 seconds.
> >   %   cumulative   self              self     total
> >  time   seconds   seconds    calls   s/call   s/call  name
> >   7.47     79.18    79.18      906     0.09     0.09  compute_store_table
> >   6.92    152.54    73.36 262785751     0.00     0.00  expr_equiv_p
> >   6.76    224.21    71.67   170098     0.00     0.00  fixup_var_refs_insns
> >   5.58    283.38    59.17    12865     0.00     0.00  loop_regs_scan
> >   4.39    329.97    46.59   364279     0.00     0.00  compute_transp
>
> Compared to a 3.3 profile:
>
> Flat profile:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>   8.08     45.30    45.30   272942     0.17     0.17  compute_transp
>   7.15     85.38    40.08 263012170     0.00     0.00  splay_tree_splay_helper
>   5.62    116.92    31.54 165540544     0.00     0.00  true_dependence
>   4.72    143.36    26.44 203228718     0.00     0.00  mems_in_disjoint_alias_sets_p
>   3.80    164.68    21.32   171556     0.12     0.12  clear_table
>   3.61    184.92    20.24     9131     2.22     2.69  loop_regs_update

Sorry for always talking to myself in public, but doesn't this look
like store-motion being the culprit? Wasn't that re-enabled in 3.4 only
(despite -fverbose-asm listing -fgcse-sm for both 3.3 and 3.4)?
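
The store-motion suspicion can be given a rough size from the profile numbers already posted. The Python below is a minimal sketch and not part of the original message. Grouping functions by the substring "store" in their name is an assumption made here for illustration; other entries in the profile (for example find_loads or ldst_entry) may also belong to store motion and are not counted. The total run time is estimated from the first profile row (7.47% corresponds to 79.18 s). A direct check would be to rebuild with -fno-gcse-sm and compare the time reports.

```python
# Self seconds copied from the gcc3.4 flat profile quoted earlier in the thread.
PROFILE_34 = {
    "compute_store_table": 79.18,
    "expr_equiv_p": 73.36,
    "fixup_var_refs_insns": 71.67,
    "loop_regs_scan": 59.17,
    "compute_transp": 46.59,
    "build_store_vectors": 25.06,
    "store_motion": 10.95,
}

# 7.47% of the profiled run is 79.18 s, which gives an estimate of the total.
ESTIMATED_TOTAL = 79.18 / 0.0747

def seconds_matching(profile, keyword):
    """Sum the self seconds of every function whose name contains keyword."""
    return sum(sec for name, sec in profile.items() if keyword in name)

store_seconds = seconds_matching(PROFILE_34, "store")
store_share = store_seconds / ESTIMATED_TOTAL

if __name__ == "__main__":
    print(f"functions with 'store' in the name: {store_seconds:.2f}s")
    print(f"share of estimated total: {store_share:.1%}")
```

Under that naming assumption the three matching functions account for about 115 seconds of self time, which is a lower bound if further helpers belong to the same pass.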

Richard.


* Re: Huge compile time & run time performance regression 3.3 -> HEAD
  2003-05-18 16:10 Huge compile time & run time performance regression 3.3 -> HEAD Richard Guenther
  2003-05-18 19:22 ` Richard Guenther
@ 2003-05-19 16:18 ` Matt Austern
  2003-05-19 19:01   ` Richard Guenther
  2003-05-19 20:51   ` Richard Guenther
  1 sibling, 2 replies; 11+ messages in thread
From: Matt Austern @ 2003-05-19 16:18 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc

On Sunday, May 18, 2003, at 08:45  AM, Richard Guenther wrote:

> Hi!
>
> As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
> performance and performance of the resulting code. As always these
> comparisons are for a POOMA based scientific application.
>
> I experience a 100% compile time regression (673.50s -> 1284.48s) and
> a 12% runtime performance regression (150s -> 171s) when comparing
> gcc3.3 to HEAD.
>
> Time reports follow, the most prominent regressions are expand, global CSE
> (>300%!), loop analysis and branch prediction.
>
> Compile options are -ftemplate-depth-80 -fno-exceptions -O2 -march=athlon
> -funroll-loops -fomit-frame-pointer

Do you see any compile time regressions at -O0?

(I'm asking for the obvious reason: trying to find out how much work we
need to do in the front end as opposed to the back end.)

			--Matt


* Re: Huge compile time & run time performance regression 3.3 -> HEAD
  2003-05-19 16:18 ` Matt Austern
@ 2003-05-19 19:01   ` Richard Guenther
  2003-05-19 20:51   ` Richard Guenther
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Guenther @ 2003-05-19 19:01 UTC (permalink / raw)
  To: Matt Austern; +Cc: gcc

On Mon, 19 May 2003, Matt Austern wrote:

> On Sunday, May 18, 2003, at 08:45  AM, Richard Guenther wrote:
>
> > Hi!
> >
> > As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
> > performance and performance of the resulting code. As always these
> > comparisons are for a POOMA based scientific application.
> >
> > I experience a 100% compile time regression (673.50s -> 1284.48s) and
> > a 12% runtime performance regression (150s -> 171s) when comparing
> > gcc3.3 to HEAD.
> >
> > Time reports follow, the most prominent regressions are expand, global CSE
> > (>300%!), loop analysis and branch prediction.
> >
> > Compile options are -ftemplate-depth-80 -fno-exceptions -O2 -march=athlon
> > -funroll-loops -fomit-frame-pointer
>
> Do you see any compile time regressions at -O0?
>
> (I'm asking for the obvious reason: trying to find out how much work we
> need to do in the front end as opposed to the back end.)

For 3.3 with -O0 I get:

Execution times (seconds)
 garbage collection    :  13.19 ( 2%) usr   0.00 ( 0%) sys  12.50 ( 2%) wall
 cfg construction      :   2.11 ( 0%) usr   0.02 ( 1%) sys   2.50 ( 0%) wall
 cfg cleanup           :   0.83 ( 0%) usr   0.00 ( 0%) sys   1.00 ( 0%) wall
 trivially dead code   :   1.68 ( 0%) usr   0.00 ( 0%) sys   2.00 ( 0%) wall
 life analysis         :   5.30 ( 1%) usr   0.01 ( 0%) sys   7.00 ( 1%) wall
 life info update      :   1.46 ( 0%) usr   0.00 ( 0%) sys   1.00 ( 0%) wall
 preprocessing         :   0.57 ( 0%) usr   0.24 ( 8%) sys   0.50 ( 0%) wall
 lexical analysis      :   0.44 ( 0%) usr   0.18 ( 6%) sys   0.50 ( 0%) wall
 parser                :  14.73 ( 2%) usr   0.54 (17%) sys  14.50 ( 2%) wall
 name lookup           :   7.38 ( 1%) usr   0.95 (30%) sys   7.50 ( 1%) wall
 expand                : 539.94 (81%) usr   0.54 (17%) sys 540.00 (81%) wall
 varconst              :   0.72 ( 0%) usr   0.01 ( 0%) sys   1.00 ( 0%) wall
 integration           :  19.82 ( 3%) usr   0.19 ( 6%) sys  19.50 ( 3%) wall
 jump                  :   2.51 ( 0%) usr   0.00 ( 0%) sys   3.50 ( 1%) wall
 flow analysis         :   0.89 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 mode switching        :   2.33 ( 0%) usr   0.01 ( 0%) sys   3.00 ( 0%) wall
 scheduling            :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 local alloc           :   9.75 ( 1%) usr   0.01 ( 0%) sys  11.00 ( 2%) wall
 global alloc          :  23.63 ( 4%) usr   0.15 ( 5%) sys  22.50 ( 3%) wall
 flow 2                :   1.59 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall
 shorten branches      :   2.15 ( 0%) usr   0.00 ( 0%) sys   1.50 ( 0%) wall
 reg stack             :   1.48 ( 0%) usr   0.00 ( 0%) sys   3.00 ( 0%) wall
 final                 :   3.79 ( 1%) usr   0.31 (10%) sys   6.50 ( 1%) wall
 rest of compilation   :  10.17 ( 2%) usr   0.02 ( 1%) sys  10.00 ( 1%) wall
 TOTAL                 : 666.48             3.19           670.50

For 3.4:

Execution times (seconds)
 garbage collection    :  12.27 ( 2%) usr   0.10 ( 0%) sys  12.38 ( 2%) wall
 cfg construction      :   2.25 ( 0%) usr   0.10 ( 0%) sys   2.30 ( 0%) wall
 cfg cleanup           :   0.99 ( 0%) usr   0.01 ( 0%) sys   0.89 ( 0%) wall
 trivially dead code   :   1.31 ( 0%) usr   0.00 ( 0%) sys   1.39 ( 0%) wall
 life analysis         :   5.39 ( 1%) usr   0.12 ( 1%) sys   5.65 ( 1%) wall
 life info update      :   1.42 ( 0%) usr   0.00 ( 0%) sys   1.49 ( 0%) wall
 register scan         :   1.34 ( 0%) usr   0.02 ( 0%) sys   1.33 ( 0%) wall
 rebuild jump labels   :   1.22 ( 0%) usr   0.00 ( 0%) sys   1.23 ( 0%) wall
 preprocessing         :   0.44 ( 0%) usr   0.17 ( 1%) sys   0.91 ( 0%) wall
 parser                :  14.65 ( 2%) usr   0.65 ( 3%) sys  14.82 ( 2%) wall
 name lookup           :   9.66 ( 1%) usr   0.77 ( 4%) sys  10.66 ( 1%) wall
 expand                : 555.99 (78%) usr  11.15 (54%) sys 567.86 (78%) wall
 varconst              :   0.73 ( 0%) usr   0.01 ( 0%) sys   0.71 ( 0%) wall
 integration           :  17.85 ( 3%) usr   6.29 (30%) sys  24.09 ( 3%) wall
 jump                  :   0.79 ( 0%) usr   0.04 ( 0%) sys   0.86 ( 0%) wall
 flow analysis         :   0.66 ( 0%) usr   0.00 ( 0%) sys   0.85 ( 0%) wall
 mode switching        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
 scheduling            :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall
 local alloc           :  37.47 ( 5%) usr   0.03 ( 0%) sys  37.60 ( 5%) wall
 global alloc          :  24.17 ( 3%) usr   0.80 ( 4%) sys  24.83 ( 3%) wall
 flow 2                :   1.53 ( 0%) usr   0.07 ( 0%) sys   1.56 ( 0%) wall
 machine dep reorg     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 shorten branches      :   2.60 ( 0%) usr   0.04 ( 0%) sys   2.79 ( 0%) wall
 reg stack             :   1.31 ( 0%) usr   0.00 ( 0%) sys   1.35 ( 0%) wall
 final                 :   4.38 ( 1%) usr   0.22 ( 1%) sys   4.62 ( 1%) wall
 symout                :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall
 rest of compilation   :  10.13 ( 1%) usr   0.22 ( 1%) sys   9.96 ( 1%) wall
 TOTAL                 : 708.60            20.83           730.22

That's about 60s difference in wall clock time. Is it true that even at
-O0 -finline is set?
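
To see where those 60 seconds go, the wall-clock columns of the two -O0 reports can be subtracted phase by phase. The Python below is a minimal sketch and not part of the original message. It copies in only the four phases with the largest increases plus the totals; the dictionary layout and the name wall_deltas are hypothetical.

```python
# (3.3 wall seconds, 3.4 wall seconds) copied from the -O0 reports above.
WALL_O0 = {
    "expand": (540.00, 567.86),
    "local alloc": (11.00, 37.60),
    "integration": (19.50, 24.09),
    "name lookup": (7.50, 10.66),
}
TOTAL_O0 = (670.50, 730.22)

def wall_deltas(table):
    """Return (increase in seconds, phase) pairs, largest increase first."""
    return sorted(((new - old, phase) for phase, (old, new) in table.items()),
                  reverse=True)

deltas = wall_deltas(WALL_O0)
total_delta = TOTAL_O0[1] - TOTAL_O0[0]
top_two = deltas[0][0] + deltas[1][0]

if __name__ == "__main__":
    for delta, phase in deltas:
        print(f"{phase:12s} +{delta:6.2f}s")
    print(f"total        +{total_delta:6.2f}s")
    print(f"top two phases cover {top_two / total_delta:.0%} of the increase")
```

On these numbers expand and local alloc together account for most of the increase, with integration and name lookup contributing a few seconds each.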

Richard.


* Re: Huge compile time & run time performance regression 3.3 -> HEAD
  2003-05-19 16:18 ` Matt Austern
  2003-05-19 19:01   ` Richard Guenther
@ 2003-05-19 20:51   ` Richard Guenther
  2003-05-19 21:04     ` Gabriel Dos Reis
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2003-05-19 20:51 UTC (permalink / raw)
  To: Matt Austern; +Cc: gcc

On Mon, 19 May 2003, Matt Austern wrote:

> On Sunday, May 18, 2003, at 08:45  AM, Richard Guenther wrote:
>
> > Hi!
> >
> > As 3.3 is now out, I start comparing 3.3 to HEAD wrt compile time
> > performance and performance of the resulting code. As always these
> > comparisons are for a POOMA based scientific application.
> >
> > I experience a 100% compile time regression (673.50s -> 1284.48s) and
> > a 12% runtime performance regression (150s -> 171s) when comparing
> > gcc3.3 to HEAD.
> >
> > Time reports follow, the most prominent regressions are expand, global CSE
> > (>300%!), loop analysis and branch prediction.
> >
> > Compile options are -ftemplate-depth-80 -fno-exceptions -O2 -march=athlon
> > -funroll-loops -fomit-frame-pointer
>
> Do you see any compile time regressions at -O0?
>
> (I'm asking for the obvious reason: trying to find out how much work we
> need to do in the front end as opposed to the back end.)

After killing all forced inlining, I get almost the same timings from 3.3
and 3.4, namely 42.50 and 44.89 seconds.

Slowdown comes from

3.3: name lookup           :   6.92 (17%) usr   0.90 (43%) sys   7.50 (18%) wall
3.4: name lookup           :   9.06 (22%) usr   0.81 (39%) sys  10.29 (23%) wall
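
As a quick cross-check of that attribution, the name lookup increase can be set against the increase in total time. The Python below is a minimal sketch and not part of the original message. It assumes the totals 42.50 and 44.89 are wall-clock seconds, which the message does not state explicitly; the variable names are hypothetical.

```python
# Totals from the message above (assumed to be wall-clock seconds).
TOTAL_WALL = {"3.3": 42.50, "3.4": 44.89}
# Wall-clock seconds from the two name lookup rows above.
NAME_LOOKUP_WALL = {"3.3": 7.50, "3.4": 10.29}

total_delta = TOTAL_WALL["3.4"] - TOTAL_WALL["3.3"]
lookup_delta = NAME_LOOKUP_WALL["3.4"] - NAME_LOOKUP_WALL["3.3"]

# A ratio above 1.0 means name lookup grew by more than the total did,
# so the remaining phases must have become slightly faster in aggregate.
explained = lookup_delta / total_delta

if __name__ == "__main__":
    print(f"total increase:       {total_delta:.2f}s")
    print(f"name lookup increase: {lookup_delta:.2f}s")
    print(f"ratio:                {explained:.2f}")
```

Under that assumption the name lookup increase is slightly larger than the overall increase, which is consistent with name lookup being the only phase that slowed down noticeably.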

Richard.


* Re: Huge compile time & run time performance regression 3.3 -> HEAD
  2003-05-19 20:51   ` Richard Guenther
@ 2003-05-19 21:04     ` Gabriel Dos Reis
  2003-05-19 21:10       ` Richard Guenther
  2003-05-19 21:24       ` Richard Guenther
  0 siblings, 2 replies; 11+ messages in thread
From: Gabriel Dos Reis @ 2003-05-19 21:04 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Matt Austern, gcc

Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:

| > Do you see any compile time regressions at -O0?
| >
| > (I'm asking for the obvious reason: trying to find out how much work we
| > need to do in the front end as opposed to the back end.)
| 
| After killing all forced inlining, I get almost the same timings from 3.3
| and 3.4, namely 42.50 and 44.89 seconds.
| 
| Slowdown comes from
| 
| 3.3: name lookup           :   6.92 (17%) usr   0.90 (43%) sys   7.50 (18%) wall
| 3.4: name lookup           :   9.06 (22%) usr   0.81 (39%) sys  10.29 (23%) wall

I've also noticed that name lookup time has increased from 3.3 to 3.4,
probably mostly because now we're doing things more correctly and
partly because we didn't really take care to optimize it.  It would be
interesting if you could report numbers for name lookup for 3.4:

  * before I applied the name lookup patch
  * after I applied it (i.e. cvs as of this moment)

What I've noted (and I posted figures) was that the patch I applied
cut the name lookup time by about half on mainline, whereas I got at
least 20% on the branch.

Thanks,

-- Gaby


* Re: Huge compile time & run time performance regression 3.3 -> HEAD
  2003-05-19 21:04     ` Gabriel Dos Reis
@ 2003-05-19 21:10       ` Richard Guenther
  2003-05-19 21:14         ` Gabriel Dos Reis
  2003-05-19 21:24       ` Richard Guenther
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Guenther @ 2003-05-19 21:10 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: Matt Austern, gcc

On 19 May 2003, Gabriel Dos Reis wrote:

> Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:
>
> | > Do you see any compile time regressions at -O0?
> | >
> | > (I'm asking for the obvious reason: trying to find out how much work we
> | > need to do in the front end as opposed to the back end.)
> |
> | After killing all forced inlining, I get almost the same timings from 3.3
> | and 3.4, namely 42.50 and 44.89 seconds.
> |
> | Slowdown comes from
> |
> | 3.3: name lookup           :   6.92 (17%) usr   0.90 (43%) sys   7.50 (18%) wall
> | 3.4: name lookup           :   9.06 (22%) usr   0.81 (39%) sys  10.29 (23%) wall
>
> I've also noticed that name lookup time has increased from 3.3 to 3.4,
> probably mostly because now we're doing things more correctly and
> partly because we didn't really take care to optimize it.  It would be
> interesting if you could report numbers for name lookup for 3.4:
>
>   * before I applied the name lookup patch
>   * after I applied it (i.e. cvs as of this moment)
>
> What I've noted (and I posted figures) was that the patch I applied
> cut the name lookup time by about half on mainline, whereas I got at
> least 20% on the branch.

If you mean

2003-05-18  Gabriel Dos Reis  <gdr@integrable-solutions.net>

        * cp-tree.h (struct lang_type_class): Replace data member tags
        with hash-table nested_udts.
        (CLASSTYPE_NESTED_UTDS): Rename from CLASSTYPE_TAGS.
        [...]

this is already included in the numbers. I.e. the 3.4 numbers are from CVS
about 2 hours ago.

Richard.


* Re: Huge compile time & run time performance regression 3.3 -> HEAD
  2003-05-19 21:10       ` Richard Guenther
@ 2003-05-19 21:14         ` Gabriel Dos Reis
  0 siblings, 0 replies; 11+ messages in thread
From: Gabriel Dos Reis @ 2003-05-19 21:14 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Matt Austern, gcc

Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:

| On 19 May 2003, Gabriel Dos Reis wrote:
| 
| > Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:
| >
| > | > Do you see any compile time regressions at -O0?
| > | >
| > | > (I'm asking for the obvious reason: trying to find out how much work we
| > | > need to do in the front end as opposed to the back end.)
| > |
| > | After killing all forced inlining, I get almost the same timings from 3.3
| > | and 3.4, namely 42.50 and 44.89 seconds.
| > |
| > | Slowdown comes from
| > |
| > | 3.3: name lookup           :   6.92 (17%) usr   0.90 (43%) sys   7.50 (18%) wall
| > | 3.4: name lookup           :   9.06 (22%) usr   0.81 (39%) sys  10.29 (23%) wall
| >
| > I've also noticed that name lookup time has increased from 3.3 to 3.4,
| > probably mostly because now we're doing things more correctly and
| > partly because we didn't really take care to optimize it.  It would be
| > interesting if you could report numbers for name lookup for 3.4:
| >
| >   * before I applied the name lookup patch
| >   * after I applied it (i.e. cvs as of this moment)
| >
| > What I've noted (and I posted figures) was that the patch I applied
| > cut the name lookup time by about half on mainline, whereas I got at
| > least 20% on the branch.
| 
| If you mean
| 
| 2003-05-18  Gabriel Dos Reis  <gdr@integrable-solutions.net>
| 
|         * cp-tree.h (struct lang_type_class): Replace data member tags
|         with hash-table nested_udts.
|         (CLASSTYPE_NESTED_UTDS): Rename from CLASSTYPE_TAGS.
|         [...]
| 
| this is already included in the numbers. I.e. the 3.4 numbers are from CVS
| about 2 hours ago.

Thanks.  Do you have numbers for say, mainline as of yesterday?

-- Gaby


* Re: Huge compile time & run time performance regression 3.3 -> HEAD
  2003-05-19 21:04     ` Gabriel Dos Reis
  2003-05-19 21:10       ` Richard Guenther
@ 2003-05-19 21:24       ` Richard Guenther
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Guenther @ 2003-05-19 21:24 UTC (permalink / raw)
  To: Gabriel Dos Reis; +Cc: Richard Guenther, Matt Austern, gcc

On 19 May 2003, Gabriel Dos Reis wrote:

> Richard Guenther <rguenth@tat.physik.uni-tuebingen.de> writes:
>
> | > Do you see any compile time regressions at -O0?
> | >
> | > (I'm asking for the obvious reason: trying to find out how much work we
> | > need to do in the front end as opposed to the back end.)
> |
> | After killing all forced inlining, I get almost the same timings from 3.3
> | and 3.4, namely 42.50 and 44.89 seconds.
> |
> | Slowdown comes from
> |
> | 3.3: name lookup           :   6.92 (17%) usr   0.90 (43%) sys   7.50 (18%) wall
> | 3.4: name lookup           :   9.06 (22%) usr   0.81 (39%) sys  10.29 (23%) wall
>
> I've also noticed that name lookup time has increased from 3.3 to 3.4,
> probably mostly because now we're doing things more correctly and
> partly because we didn't really take care to optimize it.  It would be
> interesting if you could report numbers for name lookup for 3.4:
>
>   * before I applied the name lookup patch
>   * after I applied it (i.e. cvs as of this moment)

With g++ (GCC) 3.4 20030505 (experimental) I get a total time of 46.37 and

 name lookup           :   9.42 (23%) usr   0.93 (40%) sys  10.20 (22%) wall

Richard.




Thread overview: 11+ messages
2003-05-18 16:10 Huge compile time & run time performance regression 3.3 -> HEAD Richard Guenther
2003-05-18 19:22 ` Richard Guenther
2003-05-18 19:31   ` Richard Guenther
2003-05-18 19:35     ` Richard Guenther
2003-05-19 16:18 ` Matt Austern
2003-05-19 19:01   ` Richard Guenther
2003-05-19 20:51   ` Richard Guenther
2003-05-19 21:04     ` Gabriel Dos Reis
2003-05-19 21:10       ` Richard Guenther
2003-05-19 21:14         ` Gabriel Dos Reis
2003-05-19 21:24       ` Richard Guenther
