From mboxrd@z Thu Jan 1 00:00:00 1970
From: Brad Lucier
To: jh@suse.cz (Jan Hubicka)
Cc: lucier@math.purdue.edu (Brad Lucier), jh@suse.cz (Jan Hubicka), gcc-patches@gcc.gnu.org, rth@cygnus.com, gcc@gcc.gnu.org
Subject: Re: Timing information for CFG manipulations
Date: Tue, 16 Oct 2001 16:57:00 -0000
Message-id: <200110162357.f9GNveR26305@banach.math.purdue.edu>
References: <20011016234526.B19140@atrey.karlin.mff.cuni.cz>
X-SW-Source: 2001-10/msg00934.html

> Just curious, how does the time compare to the older gcc versions?

Well, 3.0.1 compiles this file in about 1/3 the time:

dino01% /soft/parallelisme/linux/gcc-3.0.1/lib/gcc-lib/i686-pc-linux-gnu/3.0.1/cc1 -fpic -fomit-frame-pointer -O1 -fno-math-errno -fno-strict-aliasing -mcpu=athlon -march=athlon _num.i
 __sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt
 sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l
 floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh
 asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl
 logb logbf logbl drem dremf dreml __finite ___H__20___num
 {GC 23738k -> 7627k} {GC 11539k -> 7534k} {GC 9859k -> 7972k}
 {GC 11631k -> 8916k} {GC 14125k -> 9023k} ___init_proc ____20___num

Execution times (seconds)
 garbage collection    :    0.61 ( 1%) usr    0.00 ( 0%) sys    0.64 ( 1%) wall
 preprocessing         :    0.14 ( 0%) usr    0.13 (14%) sys    0.27 ( 0%) wall
 lexical analysis      :    0.34 ( 1%) usr    0.23 (24%) sys    0.56 ( 1%) wall
 parser                :    1.10 ( 2%) usr    0.15 (16%) sys    1.27 ( 2%) wall
 varconst              :    0.06 ( 0%) usr    0.00 ( 0%) sys    0.06 ( 0%) wall
 integration           :    0.01 ( 0%) usr    0.00 ( 0%) sys    0.01 ( 0%) wall
 jump                  :    4.04 ( 6%) usr    0.17 (18%) sys    4.36 ( 6%) wall
 CSE                   :    0.75 ( 1%) usr    0.00 ( 0%) sys    0.77 ( 1%) wall
 loop analysis         :    0.01 ( 0%) usr    0.00 ( 0%) sys    0.01 ( 0%) wall
 CSE 2                 :    0.00 ( 0%) usr    0.00 ( 0%) sys    0.00 ( 0%) wall
 flow analysis         :   11.21 (17%) usr    0.08 ( 8%) sys   12.47 (16%) wall
 combiner              :    1.07 ( 2%) usr    0.01 ( 1%) sys    1.12 ( 1%) wall
 if-conversion         :    1.12 ( 2%) usr    0.03 ( 3%) sys    1.22 ( 2%) wall
 local alloc           :    0.41 ( 1%) usr    0.03 ( 3%) sys    0.56 ( 1%) wall
 global alloc          :    2.58 ( 4%) usr    0.03 ( 3%) sys    3.14 ( 4%) wall
 reload CSE regs       :    9.85 (15%) usr    0.01 ( 1%) sys   12.76 (16%) wall
 flow 2                :   13.72 (21%) usr    0.01 ( 1%) sys   19.63 (25%) wall
 if-conversion 2       :    0.96 ( 1%) usr    0.01 ( 1%) sys    1.19 ( 2%) wall
 shorten branches      :    0.15 ( 0%) usr    0.00 ( 0%) sys    0.16 ( 0%) wall
 reg stack             :   15.94 (24%) usr    0.05 ( 5%) sys   17.05 (22%) wall
 final                 :    0.97 ( 1%) usr    0.00 ( 0%) sys    0.97 ( 1%) wall
 symout                :    0.00 ( 0%) usr    0.00 ( 0%) sys    0.00 ( 0%) wall
 rest of compilation   :    0.45 ( 1%) usr    0.00 ( 0%) sys    0.46 ( 1%) wall
 TOTAL                 :   65.49              0.96             78.82

> I wonder if we can't speed up the flow.c pass considerably.
> What other functions (except for bitmap_operation) does have more than 10
> millions of calls? Do we run into problems with too much RTL traversal
> or it is purely dominated by the dataflow bitmaps?

Here is some detailed information from gprof:

/u/lucier/local/gcc-3.1/lib/gcc-lib/i686-pc-linux-gnu/3.1/cc1 -fpic -fomit-frame-pointer -O1 -fno-math-errno -fno-strict-aliasing -mcpu=athlon -march=athlon _num.i
 __sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt
 sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l
 floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh
 asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl
 logb logbf logbl drem dremf dreml __finite ___H__20___num
 {GC 25431k -> 7824k} {GC 10943k -> 7882k} {GC 10372k -> 7769k}
 {GC 13951k -> 8583k} {GC 14265k -> 9195k} ___init_proc
 {GC 12103k -> 9289k} ____20___num

Execution times (seconds)
 garbage collection    :    1.10 ( 1%) usr    0.00 ( 0%) sys    1.09 ( 1%) wall
 cfg construction      :    8.09 ( 4%) usr    0.40 ( 8%) sys    8.56 ( 4%) wall
 cfg cleanup           :   52.37 (25%) usr    0.03 ( 1%) sys   52.47 (24%) wall
 preprocessing         :    0.48 ( 0%) usr    0.10 ( 2%) sys    0.50 ( 0%) wall
 lexical analysis      :    0.73 ( 0%) usr    0.22 ( 5%) sys    0.94 ( 0%) wall
 parser                :    2.57 ( 1%) usr    0.21 ( 4%) sys    2.94 ( 1%) wall
 varconst              :    0.11 ( 0%) usr    0.01 ( 0%) sys    0.16 ( 0%) wall
 jump                  :    0.76 ( 0%) usr    0.02 ( 0%) sys    0.72 ( 0%) wall
 CSE                   :    1.76 ( 1%) usr    0.00 ( 0%) sys    1.75 ( 1%) wall
 global CSE            :   41.29 (20%) usr    0.53 (11%) sys   41.88 (19%) wall
 loop analysis         :    0.01 ( 0%) usr    0.00 ( 0%) sys    0.03 ( 0%) wall
 flow analysis         :   24.23 (11%) usr    0.16 ( 3%) sys   24.38 (11%) wall
 combiner              :    1.76 ( 1%) usr    0.00 ( 0%) sys    1.78 ( 1%) wall
 if-conversion         :    1.19 ( 1%) usr    0.04 ( 1%) sys    1.25 ( 1%) wall
 local alloc           :    0.67 ( 0%) usr    0.00 ( 0%) sys    0.69 ( 0%) wall
 global alloc          :    4.69 ( 2%) usr    0.05 ( 1%) sys    4.75 ( 2%) wall
 reload CSE regs       :    9.90 ( 5%) usr    0.01 ( 0%) sys    9.88 ( 5%) wall
 flow 2                :   33.92 (16%) usr    0.10 ( 2%) sys   34.00 (16%) wall
 if-conversion 2       :    0.98 ( 0%) usr    0.03 ( 1%) sys    1.03 ( 0%) wall
 shorten branches      :    0.23 ( 0%) usr    0.00 ( 0%) sys    0.22 ( 0%) wall
 reg stack             :   22.45 (11%) usr    2.86 (60%) sys   25.31 (12%) wall
 final                 :    0.75 ( 0%) usr    0.01 ( 0%) sys    0.78 ( 0%) wall
 rest of compilation   :    1.30 ( 1%) usr    0.00 ( 0%) sys    1.38 ( 1%) wall
 TOTAL                 :  211.35              4.78            216.50

Flat profile:

Each sample counts as 0.01 seconds.
     % cumulative     self               self    total
  time    seconds  seconds     calls  ms/call  ms/call  name
 16.66      29.10    29.10  72698858     0.00     0.00  bitmap_operation
 12.43      50.81    21.71        13  1670.00  4145.64  calculate_global_regs_live
  9.86      68.04    17.23   9305997     0.00     0.00  cached_make_edge
  5.57      77.77     9.73     67331     0.14     0.38  try_crossjump_bb
  4.03      84.81     7.04                              htab_traverse
  2.99      90.03     5.22     27855     0.19     0.19  sbitmap_intersection_of_succs
  2.60      94.57     4.54    207472     0.02     0.02  try_forward_edges
  2.31      98.61     4.04         6   673.33   976.87  compute_laterin
  2.21     102.47     3.86      6270     0.62     0.62  expunge_block
  2.18     106.28     3.81       191    19.95    19.95  sbitmap_vector_alloc
  2.15     110.04     3.76   6047223     0.00     0.00  rtx_renumbered_equal_p
  2.12     113.74     3.70        31   119.35   119.35  find_unreachable_blocks
  1.72     116.75     3.01  61978208     0.00     0.00  active_insn_p
  1.67     119.66     2.91   9272730     0.00     0.00  make_label_edge
  1.67     122.57     2.91       451     6.45     6.45  propagate_freq
  1.35     124.93     2.36        15   157.33   158.67  calc_idoms
  1.33     127.26     2.33 101420319     0.00     0.00  bitmap_element_link
  1.33     129.59     2.33     29217     0.08     0.08  sbitmap_intersection_of_preds
  1.11     131.53     1.94  25030880     0.00     0.00  forwarder_block_p
  1.01     133.30     1.77   5845902     0.00     0.00  flow_find_cross_jump
  0.97     134.99     1.69      9385     0.18     0.20  convert_regs_1
  0.95     136.65     1.66        15   110.67   110.67  calc_dfs_tree_nonrec
  0.94     138.29     1.64       120    13.67   201.19  make_edges
  0.87     139.81     1.52  11128643     0.00     0.00  sbitmap_a_and_b
  0.81     141.23     1.42   6407969     0.00     0.00  try_crossjump_to_edge
  0.78     142.60     1.37      5627     0.24     0.24  can_delete_label_p
  0.69     143.81     1.21         6   201.67   281.03  compute_insert_delete
  0.68     144.99     1.18         6   196.67   328.86  compute_earliest
  0.64     146.11     1.12         6   186.67   186.67  mark_dfs_back_edges
  0.58     147.13     1.02  11904476     0.00     0.00  onlyjump_p
  0.56     148.10     0.97        10    97.00   106.99  clear_edges
  0.54     149.05     0.95   7458208     0.00     0.00  sbitmap_difference
  0.54     150.00     0.95         3   316.67   701.77  flow_loops_find
  0.50     150.87     0.87         6   145.00  1027.93  compute_antinout_edge
  0.46     151.68     0.81         6   135.00   135.00  create_edge_list
  0.42     152.42     0.74  13479620     0.00     0.00  find_reg_note
  0.35     153.03     0.61         5   122.00  4570.84  commit_edge_insertions
  0.34     153.62     0.59     10292     0.06     0.06  remove_edge
  0.33     154.19     0.57         6    95.00   496.89  compute_available
  0.33     154.76     0.57         3   190.00  1449.86  estimate_bb_frequencies
  0.30     155.29     0.53  11543061     0.00     0.00  find_reg_equal_equiv_note
  0.30     155.82     0.53     24708     0.02     0.04  try_combine
  0.30     156.34     0.52  12898539     0.00     0.00  side_effects_p
  0.26     156.80     0.46    170126     0.00     0.00  find_reloads
  0.24     157.22     0.42   4200455     0.00     0.00  ggc_set_mark
  0.24     157.64     0.42         1   420.00  2322.97  convert_regs_2
  0.22     158.02     0.38     18788     0.02     0.02  remove_fake_successors
  0.22     158.40     0.38         6    63.33    71.66  thread_jumps
  0.21     158.77     0.37         3   123.33 14103.27  optimize_mode_switching
  0.19     159.10     0.33         3   110.00   110.00  compute_alignments
  0.18     159.42     0.32  11698103     0.00     0.00  single_set_2
...

-----------------------------------------------
              21.71  32.18  13/13                    update_life_info [8]
[9]     30.9  21.71  32.18  13                   calculate_global_regs_live [9]
              28.90   2.36  72205551/72698858        bitmap_operation [14]
               0.03   0.82  56808/132182             propagate_block [55]
               0.00   0.02  56808/56808              bitmap_equal_p [376]
               0.02   0.00  100467/219660            bitmap_copy [279]
               0.01   0.00  328395/1374187           bitmap_clear [304]
               0.00   0.00  255232/1933812           bitmap_set_bit [282]
               0.00   0.00  112039/518311            bitmap_initialize [420]
               0.00   0.00  1828/101420319           bitmap_element_link [52]
               0.00   0.00  1/1125177                sbitmap_zero [225]
-----------------------------------------------
               0.37  41.94  3/3                      rest_of_compilation [7]
[10]    24.2   0.37  41.94  3                    optimize_mode_switching [10]
               0.00  20.56  6/6                      pre_edge_lcm [19]
               0.01  16.47  3/11                     update_life_info [8]
               0.12   4.45  1/5                      commit_edge_insertions [17]
               0.24   0.00  12/191                   sbitmap_vector_alloc [38]
               0.01   0.01  3/21                     sbitmap_vector_ones [177]
               0.01   0.01  37484/2679415            note_stores [115]
               0.00   0.01  37243/80229              get_attr_type [327]
               0.01   0.01  12/137                   sbitmap_vector_zero [171]
               0.01   0.00  48/48                    make_preds_opaque [581]
               0.00   0.00  3/10                     allocate_reg_life_data [457]
               0.00   0.00  23837/460580             reg_set_to_hard_reg_set [244]
               0.00   0.00  14620/91501              gen_sequence [405]
               0.00   0.00  3528/60871               recog_memoized_1 [290]
               0.00   0.00  28568/474001             asm_noperands [359]
               0.00   0.00  18576/3738386            sbitmap_not [184]
               0.00   0.00  14606/14838              emit_insn_before [991]
               0.00   0.00  5/5                      emit_i387_cw_initialization [1113]
               0.00   0.00  14/123                   insert_insn_on_edge [1058]
               0.00   0.00  10/68                    assign_386_stack_local [1079]
               0.00   0.00  5/2679415                emit_move_insn [451]
               0.00   0.00  24525/24525              reg_dies [1398]
               0.00   0.00  14634/108004             end_sequence [1322]
               0.00   0.00  14620/108004             start_sequence [1323]
               0.00   0.00  9292/9292                new_seginfo [1466]
               0.00   0.00  9292/9292                add_seginfo [1465]
               0.00   0.00  6/6                      free_edge_list [1968]
-----------------------------------------------
               0.00   5.32  3/22                     update_life_info [8]
               0.00  33.69  19/22                    rest_of_compilation [7]
[11]    22.3   0.00  39.01  22                   cleanup_cfg [11]
               0.08  35.19  22/22                    try_optimize_cfg [13]
               0.02   3.72  31/31                    delete_unreachable_blocks [40]
               0.00   0.00  22/1239368               timevar_push [153]
               0.00   0.00  22/1239368               timevar_pop [164]
               0.00   0.00  44/136318                free_EXPR_LIST_list [1315]
-----------------------------------------------
               0.00   5.54  1/7                      reg_to_stack [23]
               0.00  33.26  6/7                      rest_of_compilation [7]
[12]    22.2   0.00  38.81  7                    life_analysis [12]
               0.03  38.42  7/11                     update_life_info [8]
               0.05   0.14  6/16                     init_alias_analysis [95]
               0.04   0.06  7/10                     delete_noop_moves [179]
               0.00   0.03  7/73                     free_basic_block_vars [119]
               0.01   0.02  3/3                      notice_stack_pointer_modification [365]
               0.01   0.01  7/10                     allocate_reg_life_data [457]
               0.00   0.00  7/7                      allocate_bb_life_data [716]
               0.00   0.00  7/7                      mark_regs_live_at_end [1213]
               0.00   0.00  6/16                     end_alias_analysis [1922]
-----------------------------------------------
               0.08  35.19  22/22                    cleanup_cfg [11]
[13]    20.2   0.08  35.19  22                   try_optimize_cfg [13]
               9.73  15.62  67331/67331              try_crossjump_bb [15]
               4.54   0.07  207472/207472            try_forward_edges [33]
               0.00   2.40  3138/5348                flow_delete_block [36]
               0.06   1.72  207472/207472            try_simplify_condjump [59]
               0.00   0.41  1099/1099                merge_blocks [107]
               0.00   0.38  6/6                      remove_fake_edges [111]
               0.00   0.18  2011/12100               delete_insn_chain [73]
               0.00   0.05  74089/81227              redirect_edge_and_branch [271]
               0.01   0.01  95491/25030880           forwarder_block_p [32]
               0.01   0.00  85868/11904476           onlyjump_p [64]
               0.00   0.00  350/7709                 redirect_edge_succ_nodup [604]
               0.00   0.00  2011/157599              reg_mentioned_p [422]
               0.00   0.00  6/6                      add_noreturn_fake_exit_edges [1965]
-----------------------------------------------
               0.00   0.00  5/72698858               find_if_case_1 [620]
               0.00   0.00  270/72698858             dead_or_predicable [710]
               0.01   0.00  18572/72698858           update_equiv_regs [159]
               0.02   0.00  56808/72698858           bitmap_equal_p [376]
               0.17   0.01  417652/72698858          finish_spills [114]
              28.90   2.36  72205551/72698858        calculate_global_regs_live [9]
[14]    18.0  29.10   2.38  72698858             bitmap_operation [14]
               2.31   0.00  100601217/101420319      bitmap_element_link [52]
               0.07   0.00  3360247/4560586          bitmap_element_allocate [214]
-----------------------------------------------
               9.73  15.62  67331/67331              try_optimize_cfg [13]
[15]    14.5   9.73  15.62  67331                try_crossjump_bb [15]
               1.42  14.20  6407969/6407969          try_crossjump_to_edge [21]
-----------------------------------------------
               0.14   1.88  10/120                   find_basic_blocks [46]
               1.50  20.63  110/120                  find_sub_basic_blocks [18]
[16]    13.8   1.64  22.50  120                  make_edges [16]
              17.22   0.00  9301682/9305997          cached_make_edge [20]
               2.91   0.00  9272730/9272730          make_label_edge [47]
               2.19   0.00  110/191                  sbitmap_vector_alloc [38]
               0.07   0.06  110/137                  sbitmap_vector_zero [171]
               0.00   0.02  44026/97363              returnjump_p [317]
               0.01   0.00  47889/50845              computed_jump_p [493]
               0.01   0.00  75082/93306              next_nonnote_insn [556]
               0.00   0.00  47889/13479620           find_reg_note [83]
-----------------------------------------------
               0.12   4.45  1/5                      optimize_mode_switching [10]
               0.12   4.45  1/5                      convert_regs [27]
               0.37  13.35  3/5                      thread_prologue_and_epilogue_insns [22]
[17]    13.1   0.61  22.24  5                    commit_edge_insertions [17]
               0.27  21.96  109/110                  find_sub_basic_blocks [18]
               0.00   0.02  109/109                  commit_one_edge_insertion [472]
-----------------------------------------------
               0.00   0.20  1/110                    split_all_insns [96]
               0.27  21.96  109/110                  commit_edge_insertions [17]
[18]    12.8   0.27  22.16  110                  find_sub_basic_blocks [18]
               1.50  20.63  110/120                  make_edges [16]
               0.00   0.03  362/362                  find_bb_boundaries [339]
               0.00   0.00  362/9683                 purge_dead_edges [684]
               0.00   0.00  544/13479620             find_reg_note [83]
-----------------------------------------------
               0.00  20.56  6/6                      optimize_mode_switching [10]
[19]    11.8   0.00  20.56  6                    pre_edge_lcm [19]
               0.87   5.30  6/6                      compute_antinout_edge [29]
               4.04   1.82  6/6                      compute_laterin [30]
               0.57   2.41  6/6                      compute_available [43]
               1.18   0.79  6/6                      compute_earliest [56]
               1.21   0.48  6/6                      compute_insert_delete [60]
               1.08   0.00  54/191                   sbitmap_vector_alloc [38]
               0.81   0.00  6/6                      create_edge_list [79]
-----------------------------------------------
               0.01   0.00  4315/9305997             make_edge [613]
              17.22   0.00  9301682/9305997          make_edges [16]
[20]     9.9  17.23   0.00  9305997              cached_make_edge [20]
-----------------------------------------------
               1.42  14.20  6407969/6407969          try_crossjump_bb [15]
[21]     8.9   1.42  14.20  6407969              try_crossjump_to_edge [21]
               1.77   6.66  5845902/5845902          flow_find_cross_jump [24]
               1.90   2.93  24480556/25030880        forwarder_block_p [32]
               0.00   0.33  3663/12100               delete_insn_chain [73]
               0.21   0.01  6407841/6407841          outgoing_edges_match [148]
               0.21   0.00  3729/10292               remove_edge [93]
               0.00   0.18  517/597                  split_block [154]
               0.00   0.01  3663/4298                make_single_succ_edge [614]
               0.00   0.00  3663/10353               gen_jump [689]
               0.00   0.00  3663/3736                emit_jump_insn_after [783]
               0.00   0.00  3663/93306               next_nonnote_insn [556]
               0.00   0.00  3663/13479620            find_reg_note [83]
               0.00   0.00  3729/7450804             free_edge [203]
               0.00   0.00  3663/9503                block_label [961]
               0.00   0.00  66/153                   emit_barrier_after [1036]
               0.00   0.00  66/254995                gen_rtx_CONST_INT [519]
-----------------------------------------------
               0.00  13.72  3/3                      rest_of_compilation [7]
[22]     7.9   0.00  13.72  3                    thread_prologue_and_epilogue_insns [22]
               0.37  13.35  3/5                      commit_edge_insertions [17]
               0.00   0.00  3/3                      gen_prologue [699]
               0.00   0.00  3/3                      gen_epilogue [1092]
               0.00   0.00  6/91501                  gen_sequence [405]
               0.00   0.00  6/129874                 emit_insn [386]
               0.00   0.00  6/123                    insert_insn_on_edge [1058]
               0.00   0.00  6/51353                  emit_note [663]
               0.00   0.00  3/17313                  emit_jump_insn [703]
               0.00   0.00  12/108004                end_sequence [1322]
               0.00   0.00  6/108004                 start_sequence [1323]
               0.00   0.00  6/6                      record_insns [1976]
               0.00   0.00  3/3                      ix86_can_use_return_insn_p [2039]
-----------------------------------------------
               0.26  12.66  3/3                      rest_of_compilation [7]
[23]     7.4   0.26  12.66  3                    reg_to_stack [23]
               0.00   6.92  1/1                      convert_regs [27]
               0.00   5.54  1/7                      life_analysis [12]
               0.19   0.00  1/6                      mark_dfs_back_edges [72]
               0.01   0.00  1/5                      count_or_remove_death_notes [309]
               0.00   0.00  1/7                      delete_dead_jumptables [351]
               0.00   0.00  1/4                      alloc_aux_for_blocks [582]
               0.00   0.00  105/152060               gen_raw_REG [366]
               0.00   0.00  105/138338               gen_rtx_REG [1314]
               0.00   0.00  1/9317                   get_max_uid [1463]
               0.00   0.00  1/262                    varray_init [1700]
-----------------------------------------------
               1.77   6.66  5845902/5845902          try_crossjump_to_edge [21]
[24]     4.8   1.77   6.66  5845902              flow_find_cross_jump [24]
               3.59   0.00  5771506/6047223          rtx_renumbered_equal_p [39]
               0.53   0.96  11543012/11543061        find_reg_equal_equiv_note [65]
               1.00   0.47  11691804/11904476        onlyjump_p [64]
               0.05   0.04  616039/682505            stack_regs_mentioned [205]
               0.00   0.02  46872/97363              returnjump_p [317]
               0.00   0.00  188/1264901              rtx_equal_p [442]
               0.00   0.00  7/39                     remove_note [1892]
-----------------------------------------------
               0.02   7.72  9/9                      rest_of_compilation [7]
[25]     4.4   0.02   7.72  9                    if_convert [25]
               0.00   5.49  1/11                     update_life_info [8]
               0.00   1.65  6/15                     calculate_dominance_info [35]
               0.01   0.40  28166/28166              find_if_header [106]
               0.12   0.00  6/191                    sbitmap_vector_alloc [38]
               0.00   0.04  9/73                     free_basic_block_vars [119]
               0.01   0.00  1/5                      count_or_remove_death_notes [309]
               0.00   0.00  1/26                     allocate_reg_info [432]
               0.00   0.00  1/1125177                sbitmap_zero [225]
               0.00   0.00  2/60023                  max_reg_num [1344]
               0.00   0.00  1/662                    sbitmap_alloc [1612]
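
As a rough check on the "about 1/3 the time" figure, the TOTAL rows of the two timing reports can be compared directly. The Python sketch below is only an illustration: the dictionary and function names are invented for this note, and the numbers are the "usr" seconds copied from the 3.0.1 and 3.1 reports in this message.

```python
# "usr" seconds copied from the two "Execution times" reports in this message.
# Only the TOTAL row and a few of the larger passes are included.
usr_301 = {
    "TOTAL": 65.49,
    "flow analysis": 11.21,
    "flow 2": 13.72,
    "reg stack": 15.94,
}
usr_31 = {
    "TOTAL": 211.35,
    "cfg cleanup": 52.37,
    "global CSE": 41.29,
    "flow analysis": 24.23,
    "flow 2": 33.92,
    "reg stack": 22.45,
}


def share(times, name):
    """Return the percentage of the TOTAL usr time spent in one pass."""
    return 100.0 * times[name] / times["TOTAL"]


# Hypothetical helper value: how long 3.0.1 takes relative to 3.1.
ratio_usr = usr_301["TOTAL"] / usr_31["TOTAL"]

if __name__ == "__main__":
    print(f"3.0.1 / 3.1 usr time: {ratio_usr:.2f}")
    for name in ("cfg cleanup", "global CSE", "flow 2", "flow analysis"):
        print(f"3.1 {name}: {share(usr_31, name):.0f}% of usr time")
```

The ratio comes out a little above 0.3, which agrees with the "about 1/3" estimate, and the per-pass shares reproduce the rounded percentages printed in the 3.1 report.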