public inbox for gcc@gcc.gnu.org
* Where does the time go?
@ 2010-05-20 15:55 Steven Bosscher
From: Steven Bosscher @ 2010-05-20 15:55 UTC (permalink / raw)
  To: GCC Mailing List

Hello,

For some time now, I've wanted to see where compile time goes in a
typical GCC build, because nobody really seems to know what the
compiler spends its time on. Published impressions of GCC usually
convey at least a feeling that GCC is not getting faster, and that
parts of the compiler are unreasonably slow. I was hoping to shed
some light on which parts those could be.

What I've done is this:
* Build GCC 4.6.0 (trunk r159624) with --enable-checking=release and
-O2, and install it
* Build GCC 4.6.0 (trunk r159624) again, with the installed compiler
and with "-O2 -g3 -ftime-report". The time reports (along with
everything else on stderr) are piped to an output file
* Extract, sum, and sort time consumed per timevar
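
The extraction script itself isn't shown here. A minimal Python sketch of that
step, assuming each -ftime-report phase line looks like
" tree PRE   :  11.42 ( 3%) usr ...", that the usr column is the one being
summed, and that whitespace in phase names becomes underscores (the exact name
mangling used for the table below is not known). The helper names are
hypothetical:

```python
import re
import sys
from collections import defaultdict

# Assumed shape of one -ftime-report phase line (GCC 4.6 era), e.g.
#   " tree PRE               :  11.42 ( 3%) usr   0.01 ( 0%) sys  11.44 ( 3%) wall"
PHASE_RE = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\b"
)


def sum_timevars(lines):
    """Sum the usr seconds per timevar over every -ftime-report block in `lines`."""
    totals = defaultdict(float)
    for line in lines:
        m = PHASE_RE.match(line)
        if m is None:
            continue  # diagnostics, headers, anything not shaped like a phase line
        name = re.sub(r"\s+", "_", m.group("name"))
        if name == "TOTAL":
            continue  # defensive: the grand total is recomputed from the sums
        totals[name] += float(m.group("usr"))
    return dict(totals)


def report(totals):
    """Return (name, seconds, percent) rows, sorted by ascending time."""
    grand = sum(totals.values())
    rows = sorted(totals.items(), key=lambda kv: (kv[1], kv[0].lower()))
    return [(name, secs, 100.0 * secs / grand if grand else 0.0) for name, secs in rows]


def print_report(totals, out=sys.stdout):
    """Print one "name  seconds  percent" row per timevar."""
    for name, secs, pct in report(totals):
        print(f"{name:<48}{secs:<8.2f}{pct:.6g}%", file=out)
```

For example, print_report(sum_timevars(open("build.log"))), where build.log
stands for the captured stderr of the build. Recomputing the total from the
per-timevar sums avoids depending on the exact layout of the TOTAL line.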

Host was cfarm gcc14 (8 x 3GHz Xeon). Target was
x86_64-unknown-linux-gnu. "Build" means non-bootstrap.

Results at the bottom of this mail.

Conclusions:

* There are quite a few timevars for parts of the compiler that have
been removed: TV_SEQABSTR, TV_GLOBAL_ALLOC, TV_LOCAL_ALLOC are the
ones I've spotted so far.  I will go through the whole list, remove
all timevars that are unused, and submit a patch.

* The "slow" parts of the compiler are not exactly news: tree-PRE,
scheduling, and register allocation.

* Variable tracking costs ~7.6% of compile time. This is more than the
cost of register allocation (IRA+reload, ~7.3%).

* The C front end (preprocessing+lexing+parsing) costs ~17%. For an
optimizing compiler with so many passes, this is quite a lot.

* The GIMPLE optimizers (the timevars selected with egrep
"tree|dominator_opt|alias_stmt_walking|alias_analysis|inline_heuristics|PHI_merge")
together cost ~16%.

* Subtracting the above numbers from the total, the rest of the compiler,
which is mostly the RTL parts, still accounts for 100-17-16-8=59% of
the total compile time. This was the most surprising result for me.
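
Some of these percentages can be rechecked directly from the table below. A
small Python sketch, assuming only a handful of rows copied from that table
(seconds, out of the TOTAL of 382.97); it uses re.search because egrep matches
the pattern anywhere in a name. The dictionary and helper are illustrative, and
the GIMPLE filter is shown on this subset only, not on the full table:

```python
import re

# A few rows copied from the table at the end of this mail (seconds).
TOTAL = 382.97
timevars = {
    "preprocessing": 27.59,
    "lexical_analysis": 6.65,
    "parser": 31.53,
    "variable_tracking": 29.17,
    "integrated_RA": 16.31,
    "reload": 11.70,
    "tree_PRE": 11.42,
    "dominator_optimization": 3.49,
    "alias_analysis": 3.41,
    "expand": 24.18,
}

# The egrep pattern from the GIMPLE bullet; egrep matches substrings, so use re.search.
GIMPLE_RE = re.compile(
    r"tree|dominator_opt|alias_stmt_walking|alias_analysis|inline_heuristics|PHI_merge"
)


def share(names):
    """Percentage of TOTAL spent in the given timevars."""
    return 100.0 * sum(timevars[n] for n in names) / TOTAL


front_end = share(["preprocessing", "lexical_analysis", "parser"])
var_tracking = share(["variable_tracking"])
reg_alloc = share(["integrated_RA", "reload"])
gimple_names = [n for n in timevars if GIMPLE_RE.search(n)]

print(f"front end         {front_end:.2f}%")     # about 17.2%
print(f"variable tracking {var_tracking:.2f}%")  # about 7.6%
print(f"IRA + reload      {reg_alloc:.2f}%")     # about 7.3%
print("GIMPLE timevars among these rows:", gimple_names)
```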

Ciao!
Steven


timevar                                         time(s) % of TOTAL
auto_inc_dec                                    0.00    0%
callgraph_verifier                              0.00    0%
cfg_construction                                0.00    0%
CFG_verifier                                    0.00    0%
delay_branch_sched                              0.00    0%
df_live_byte_regs                               0.00    0%
df_scan_insns                                   0.00    0%
df_uninitialized_regs_2                         0.00    0%
dump_files                                      0.00    0%
global_alloc                                    0.00    0%
Graphite_code_generation                        0.00    0%
Graphite_data_dep_analysis                      0.00    0%
Graphite_loop_transforms                        0.00    0%
ipa_free_lang_data                              0.00    0%
ipa_lto_cgraph_IO                               0.00    0%
ipa_lto_cgraph_merge                            0.00    0%
ipa_lto_decl_init_IO                            0.00    0%
ipa_lto_decl_IO                                 0.00    0%
ipa_lto_decl_merge                              0.00    0%
ipa_lto_gimple_IO                               0.00    0%
ipa_points_to                                   0.00    0%
ipa_profile                                     0.00    0%
ipa_type_escape                                 0.00    0%
life_analysis                                   0.00    0%
life_info_update                                0.00    0%
load_CSE_after_reload                           0.00    0%
local_alloc                                     0.00    0%
loop_doloop                                     0.00    0%
loop_unrolling                                  0.00    0%
loop_unswitching                                0.00    0%
LSM                                             0.00    0%
lto                                             0.00    0%
name_lookup                                     0.00    0%
overload_resolution                             0.00    0%
PCH_main_state_restore                          0.00    0%
PCH_main_state_save                             0.00    0%
PCH_pointer_reallocation                        0.00    0%
PCH_pointer_sort                                0.00    0%
PCH_preprocessor_state_restore                  0.00    0%
PCH_preprocessor_state_save                     0.00    0%
plugin_execution                                0.00    0%
plugin_initialization                           0.00    0%
predictive_commoning                            0.00    0%
reg_stack                                       0.00    0%
rename_registers                                0.00    0%
rest_of_compilation                             0.00    0%
sequence_abstraction                            0.00    0%
shorten_branches                                0.00    0%
sms_modulo_scheduling                           0.00    0%
template_instantiation                          0.00    0%
total_time                                      0.00    0%
tracer                                          0.00    0%
tree_check_data_dependences                     0.00    0%
tree_loop_distribution                          0.00    0%
tree_loop_linear                                0.00    0%
tree_loop_optimization                          0.00    0%
tree_loop_unswitching                           0.00    0%
tree_parallelize_loops                          0.00    0%
tree_prefetching                                0.00    0%
tree_redundant_PHIs                             0.00    0%
tree_slp_vectorization                          0.00    0%
tree_SSA_to_normal                              0.00    0%
tree_SSA_verifier                               0.00    0%
tree_STMT_verifier                              0.00    0%
tree_STORE_CCP                                  0.00    0%
tree_store_copy_prop                            0.00    0%
tree_vectorization                              0.00    0%
value_profile_opts                              0.00    0%
web                                             0.00    0%
whopr_ltrans                                    0.00    0%
whopr_wpa                                       0.00    0%
whopr_wpa_IO                                    0.00    0%
whopr_wpa_ltrans                                0.00    0%
mode_switching                                  0.01    0.00261117%
tree_NRV_optimization                           0.01    0.00261117%
tree_loop_fini                                  0.03    0.00783351%
tree_switch_initialization_conversion           0.03    0.00783351%
lower_subreg                                    0.04    0.0104447%
tree_buildin_call_DCE                           0.05    0.0130559%
code_hoisting                                   0.06    0.015667%
ipa_reference                                   0.06    0.015667%
tree_canonical_iv                               0.06    0.015667%
tree_if_combine                                 0.06    0.015667%
PHI_merge                                       0.07    0.0182782%
tree_phiprop                                    0.07    0.0182782%
uninit_var_anaysis                              0.07    0.0182782%
control_dependences                             0.08    0.0208894%
varconst                                        0.09    0.0235005%
tree_PHI_const_copy_prop                        0.16    0.0417787%
tree_eh                                         0.19    0.0496122%
tree_split_crit_edges                           0.19    0.0496122%
scev_constant_prop                              0.20    0.0522234%
tree_PHI_insertion                              0.20    0.0522234%
tree_copy_headers                               0.23    0.0600569%
tree_loop_bounds                                0.24    0.0626681%
tree_loop_invariant_motion                      0.27    0.0705016%
variable_output                                 0.27    0.0705016%
combine_stack_adjustments                       0.28    0.0731128%
garbage_collection                              0.28    0.0731128%
loop_analysis                                   0.28    0.0731128%
tree_SSA_uncprop                                0.28    0.0731128%
tree_SRA                                        0.30    0.0783351%
ipa_cp                                          0.34    0.0887798%
tree_linearize_phis                             0.34    0.0887798%
tree_DSE                                        0.39    0.101836%
tree_find_ref._vars                             0.39    0.101836%
varpool_construction                            0.39    0.101836%
tree_rename_SSA_copies                          0.47    0.122725%
complete_unrolling                              0.50    0.130559%
tree_SSA_other                                  0.53    0.138392%
tree_loop_init                                  0.57    0.148837%
tree_CFG_construction                           0.59    0.154059%
tree_code_sinking                               0.60    0.15667%
zee                                             0.62    0.161893%
dominance_frontiers                             0.65    0.169726%
loop_invariant_motion                           0.66    0.172337%
register_scan                                   0.67    0.174948%
ipa_pure_const                                  0.71    0.185393%
tree_reassociation                              0.72    0.188004%
callgraph_construction                          0.73    0.190615%
if_conversion_2                                 0.74    0.193227%
tree_forward_propagate                          0.77    0.20106%
ipa_SRA                                         0.91    0.237617%
peephole_2                                      0.95    0.248061%
tree_conservative_DCE                           0.96    0.250672%
regmove                                         1.02    0.266339%
thread_pro_and_epilogue                         1.28    0.33423%
tree_iv_optimization                            1.31    0.342063%
tree_operand_scan                               1.32    0.344675%
rebuild_jump_labels                             1.33    0.347286%
jump                                            1.34    0.349897%
branch_prediction                               1.35    0.352508%
machine_dep_reorg                               1.36    0.355119%
inline_heuristics                               1.42    0.370786%
df_multiple_defs                                1.50    0.391676%
dead_code_elimination                           1.74    0.454344%
tree_SSA_rewrite                                1.80    0.470011%
df_use_def_def_use_chains                       1.86    0.485678%
trivially_dead_code                             2.00    0.522234%
reorder_blocks                                  2.07    0.540512%
hard_reg_cprop                                  2.10    0.548346%
alias_stmt_walking                              2.15    0.561402%
tree_copy_propagation                           2.19    0.571846%
register_information                            2.27    0.592736%
tree_aggressive_DCE                             2.29    0.597958%
dead_store_elim1                                2.38    0.621459%
dead_store_elim2                                2.40    0.626681%
integration                                     2.73    0.71285%
if_conversion                                   2.80    0.731128%
tree_CCP                                        2.89    0.754628%
tree_gimplify                                   3.03    0.791185%
callgraph_optimization                          3.22    0.840797%
forward_prop                                    3.25    0.84863%
alias_analysis                                  3.41    0.890409%
df_reaching_defs                                3.44    0.898243%
dominator_optimization                          3.49    0.911299%
tree_SSA_incremental                            3.50    0.91391%
tree_FRE                                        3.90    1.01836%
CSE_2                                           4.71    1.22986%
tree_PTA                                        4.80    1.25336%
CPROP                                           4.98    1.30036%
reload_CSE_regs                                 5.23    1.36564%
final                                           5.26    1.37348%
dominance_computation                           5.44    1.42048%
df_reg_dead_unused_notes                        5.62    1.46748%
tree_CFG_cleanup                                5.69    1.48576%
cfg_cleanup                                     6.28    1.63982%
PRE                                             6.64    1.73382%
lexical_analysis                                6.65    1.73643%
CSE                                             8.16    2.13072%
tree_VRP                                        8.36    2.18294%
symout                                          8.94    2.33439%
combiner                                        10.17   2.65556%
tree_PRE                                        11.42   2.98196%
scheduling_2                                    11.44   2.98718%
reload                                          11.70   3.05507%
df_live_initialized_regs                        12.92   3.37363%
integrated_RA                                   16.31   4.25882%
df_live_regs                                    17.52   4.57477%
expand                                          24.18   6.31381%
preprocessing                                   27.59   7.20422%
variable_tracking                               29.17   7.61678%
parser                                          31.53   8.23302%
TOTAL                                           382.97  100%

* Re: Where does the time go?
@ 2010-05-20 21:28 Bradley Lucier
From: Bradley Lucier @ 2010-05-20 21:28 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Bradley Lucier, GCC Mailing List

On my codes, pre-RA instruction scheduling on x86-64 (a) improves run
times by roughly 10%, and (b) costs a lot of compile time.

The first scheduling pass (-fschedule-insns) didn't seem to be on in your
time tests; your table lists only scheduling_2. (I think it's not on by
default on that architecture at -O2.)

Brad


Thread overview: 26+ messages
2010-05-20 15:55 Where does the time go? Steven Bosscher
2010-05-20 19:16 ` Vladimir Makarov
2010-05-20 19:57   ` Toon Moene
2010-05-20 20:36   ` Steven Bosscher
2010-05-20 20:54     ` Duncan Sands
2010-05-20 21:14       ` Steven Bosscher
2010-05-23 19:09         ` Joseph S. Myers
2010-05-24 17:00           ` Mark Mitchell
2010-05-24 21:07             ` Steven Bosscher
2010-05-24 23:22               ` Mark Mitchell
2010-05-25  1:20                 ` Joseph S. Myers
2010-05-20 21:09     ` Ian Lance Taylor
2010-05-20 21:14       ` Xinliang David Li
2010-05-20 21:18         ` Steven Bosscher
2010-05-20 21:21           ` Xinliang David Li
2010-05-21 10:54             ` Richard Guenther
2010-05-21 13:26               ` Jan Hubicka
2010-05-21 15:06                 ` Richard Guenther
2010-05-21 15:49                   ` Jan Hubicka
2010-05-21 17:06               ` Xinliang David Li
2010-05-21 17:07                 ` Richard Guenther
2010-05-20 19:36 ` Joseph S. Myers
2010-05-20 20:35 ` Eric Botcazou
2010-05-20 20:42   ` Eric Botcazou
2010-05-21 20:43 ` Diego Novillo
2010-05-20 21:28 Bradley Lucier
