From mboxrd@z Thu Jan 1 00:00:00 1970 From: lucier@math.purdue.edu To: gcc-gnats@gcc.gnu.org Subject: optimization/4313: Inordinately long time in CFG manipulation Date: Wed, 12 Sep 2001 22:06:00 -0000 Message-id: <20010913050529.4160.qmail@sourceware.cygnus.com> X-SW-Source: 2001-09/msg00261.html List-Id: >Number: 4313 >Category: optimization >Synopsis: Inordinately long time in CFG manipulation >Confidential: no >Severity: serious >Priority: medium >Responsible: unassigned >State: open >Class: sw-bug >Submitter-Id: net >Arrival-Date: Wed Sep 12 22:06:01 PDT 2001 >Closed-Date: >Last-Modified: >Originator: B. Lucier >Release: gcc version 3.1 20010912 (experimental) >Organization: >Environment: Red Hat Linux 7.1 on 350 MHz PII Configured with: ../configure --enable-languages=c --enable-checking=no Thread model: single gcc version 3.1 20010912 (experimental) built with make bootstrap BOOT_CFLAGS='-O2 -g -pg' BOOT_LDFLAGS='-g -pg' >Description: Here are profiled timings for the following command applied to _num.i, which may be at http://www.math.purdue.edu/~lucier/_num.i.gz (the server is down right now, I can't check): [lucier@curie ~]$ time programs/gcc/gcc-3.1/objdir/gcc/cc1 -fno-math-errno -O1 -fschedule-insns2 -fomit-frame-pointer -D___SINGLE_HOST _num.i __sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl logb logbf logbl drem dremf dreml __finite ___H__20___num {GC 25002k -> 7637k} {GC 10676k -> 7660k} {GC 11856k -> 7923k} {GC 11786k -> 8417k} {GC 13063k -> 9614k} {GC 15281k -> 10232k} ___init_proc ____20___num Execution times (seconds) garbage collection : 3.52 ( 0%) usr 0.01 ( 0%) sys 3.67 ( 0%) wall cfg construction : 643.65 (31%) usr 0.84 ( 0%) sys 663.76 (23%) wall cfg cleanup : 127.71 ( 6%) usr 0.05 ( 0%) sys 136.16 ( 5%) wall preprocessing : 2.43 ( 0%) usr 0.40 ( 0%) sys 2.90 ( 0%) wall lexical analysis : 2.87 ( 0%) usr 0.71 ( 0%) sys 3.57 ( 0%) wall parser : 10.21 ( 0%) usr 0.39 ( 0%) sys 11.01 ( 0%) wall varconst : 0.41 ( 0%) usr 0.02 ( 0%) sys 0.43 ( 0%) wall integration : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall jump : 3.99 ( 0%) usr 0.00 ( 0%) sys 4.04 ( 0%) wall CSE : 6.49 ( 0%) usr 0.00 ( 0%) sys 6.64 ( 0%) wall global CSE : 69.06 ( 3%) usr 1.58 ( 0%) sys 72.31 ( 3%) wall loop analysis : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall flow analysis : 40.20 ( 2%) usr 0.35 ( 0%) sys 41.43 ( 1%) wall combiner : 5.08 ( 0%) usr 0.00 ( 0%) sys 5.27 ( 0%) wall if-conversion : 1.91 ( 0%) usr 0.07 ( 0%) sys 2.06 ( 0%) wall local alloc : 1.80 ( 0%) usr 0.00 ( 0%) sys 1.82 ( 0%) wall global alloc : 11.02 ( 1%) usr 0.06 ( 0%) sys 11.32 ( 0%) wall reload CSE regs : 25.63 ( 1%) usr 0.00 ( 0%) sys 26.21 ( 1%) wall flow 2 : 864.58 (41%) usr 677.04 (98%) sys1595.09 (55%) wall if-conversion 2 : 1.58 ( 0%) usr 0.06 ( 0%) sys 1.66 ( 0%) wall scheduling 2 : 13.27 ( 1%) usr 0.06 ( 0%) sys 13.52 ( 0%) wall shorten branches : 0.67 ( 0%) usr 0.00 ( 0%) sys 0.69 ( 0%) wall reg stack : 37.86 ( 2%) usr 7.50 ( 1%) sys 45.86 ( 2%) wall final : 213.11 (10%) usr 0.18 ( 0%) sys 232.41 ( 8%) wall rest of compilation : 3.34 ( 0%) usr 0.00 ( 0%) sys 3.36 ( 0%) wall TOTAL :2090.54 689.32 2885.34 2090.670u 689.440s 48:05.77 96.3% 0+0k 0+0io 1094pf+0w This file is compiled in a few minutes with gcc-3.0. Here are reports of the top times with gprof: Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 29.24 199.93 199.93 9319742 0.02 0.02 cached_make_edge 26.59 381.78 181.85 7449858 0.02 0.02 remove_edge 6.21 424.22 42.44 72443984 0.00 0.00 bitmap_operation 4.68 456.23 32.01 13 2462.31 5891.02 calculate_global_regs_live 2.77 475.14 18.91 67665 0.28 0.71 try_crossjump_bb 2.52 492.36 17.22 9668 1.78 1.78 sbitmap_vector_alloc 2.24 507.68 15.32 htab_traverse 1.62 518.77 11.09 90441614 0.00 0.00 sbitmap_zero ... ----------------------------------------------- 0.00 0.25 10/9594 find_basic_blocks [14] 2.65 240.51 9584/9594 find_sub_basic_blocks [9] [8] 35.6 2.65 240.76 9594 make_edges [8] 199.86 0.00 9316700/9319742 cached_make_edge [11] 9.74 11.05 9584/9614 sbitmap_vector_zero [26] 17.07 0.00 9584/9668 sbitmap_vector_alloc [27] 2.89 0.00 9277825/9277825 make_label_edge [69] 0.02 0.05 44343/95703 returnjump_p [286] 0.01 0.05 48208/51142 computed_jump_p [412] 0.01 0.00 75759/433825 next_nonnote_insn [541] 0.00 0.00 48208/13306482 find_reg_note [98] ----------------------------------------------- 0.00 2.77 109/9584 commit_edge_insertions [56] 0.06 240.51 9475/9584 split_all_insns [10] [9] 35.6 0.06 243.28 9584 find_sub_basic_blocks [9] 2.65 240.51 9584/9594 make_edges [8] 0.00 0.07 96/434 split_block [205] 0.04 0.01 9584/19000 purge_dead_edges [370] 0.00 0.00 96/7449858 remove_edge [13] 0.00 0.00 9748/13306482 find_reg_note [98] ----------------------------------------------- 0.06 241.28 12/12 rest_of_compilation [7] [10] 35.3 0.06 241.28 12 split_all_insns [10] 0.06 240.51 9475/9584 find_sub_basic_blocks [9] 0.06 0.64 252984/252984 split_insn [134] 0.01 0.00 1/34 compute_bb_for_insn [229] 0.00 0.00 12/90441614 sbitmap_zero [30] 0.00 0.00 12/565 sbitmap_alloc [1663] 0.00 0.00 1/9536 get_max_uid [1509] ----------------------------------------------- 0.07 0.00 3042/9319742 make_edge [411] 199.86 0.00 9316700/9319742 make_edges [8] [11] 29.2 199.93 0.00 9319742 cached_make_edge [11] ----------------------------------------------- 0.28 54.49 3/10 free_basic_block_vars [20] 0.65 127.14 7/10 find_basic_blocks [14] [12] 26.7 0.93 181.63 10 clear_edges [12] 181.63 0.00 7440953/7449858 remove_edge [13] ----------------------------------------------- 0.00 0.00 3/7449858 redirect_edge_succ_nodup [809] 0.00 0.00 18/7449858 purge_dead_edges [370] 0.00 0.00 36/7449858 try_redirect_by_replacing_jump [337] 0.00 0.00 96/7449858 find_sub_basic_blocks [9] 0.02 0.00 703/7449858 merge_blocks_nomove [325] 0.06 0.00 2620/7449858 try_crossjump_to_edge [24] 0.13 0.00 5429/7449858 flow_delete_block [41] 181.63 0.00 7440953/7449858 clear_edges [12] [13] 26.6 181.85 0.00 7449858 remove_edge [13] ----------------------------------------------- 0.01 128.33 10/10 rest_of_compilation [7] [14] 18.8 0.01 128.33 10 find_basic_blocks [14] 0.65 127.14 7/10 clear_edges [12] 0.00 0.25 10/9594 make_edges [8] 0.09 0.02 10/10 find_basic_blocks_1 [333] 0.08 0.00 10/34 compute_bb_for_insn [229] 0.06 0.00 10/10 count_basic_blocks [425] 0.03 0.01 10/14 tidy_fallthru_edges [437] 0.00 0.00 10/1239490 timevar_push [201] 0.00 0.00 10/254 varray_init [1743] 0.00 0.00 10/9536 get_max_uid [1509] ----------------------------------------------- 0.00 97.09 3/3 rest_of_compilation [7] [15] 14.2 0.00 97.09 3 schedule_insns [15] 0.12 89.56 9485/9496 update_life_info [16] 0.00 2.83 9485/9485 schedule_region [72] 0.00 2.13 3/3 init_regions [82] 2.13 0.00 9485/18975 count_or_remove_death_notes [55] 0.04 0.24 3/3 sched_init [222] 0.02 0.00 3/34 compute_bb_for_insn [229] 0.01 0.00 3/13 allocate_reg_life_data [485] 0.00 0.00 9485/90441614 sbitmap_zero [30] 0.00 0.00 3/3 reposition_prologue_and_epilogue_notes [1365] 0.00 0.00 3/3796403 sbitmap_ones [242] 0.00 0.00 6/565 sbitmap_alloc [1663] 0.00 0.00 3/9536 get_max_uid [1509] 0.00 0.00 3/3 sched_finish [2063] 0.00 0.00 3/35859 get_insns [1454] ----------------------------------------------- 0.00 0.01 1/9496 if_convert [31] 0.00 0.03 3/9496 optimize_mode_switching [23] 0.00 0.07 7/9496 life_analysis [45] 0.12 89.56 9485/9496 schedule_insns [15] [16] 13.1 0.12 89.67 9496 update_life_info [16] 32.01 44.57 13/13 calculate_global_regs_live [17] 0.00 9.83 3/22 cleanup_cfg [18] 0.18 3.04 84954/143280 propagate_block [48] 0.03 0.00 84954/229107 bitmap_copy [362] 0.00 0.01 9485/9485 verify_local_live_at_start [845] 0.00 0.00 9496/1465878 bitmap_clear [330] 0.00 0.00 3/18975 count_or_remove_death_notes [55] 0.00 0.00 9496/568566 bitmap_initialize [487] ----------------------------------------------- 32.01 44.57 13/13 update_life_info [16] [17] 11.2 32.01 44.57 13 calculate_global_regs_live [17] 42.25 0.00 72111186/72443984 bitmap_operation [22] 0.12 2.06 57752/143280 propagate_block [48] 0.04 0.00 101235/229107 bitmap_copy [362] 0.04 0.00 203195/2166730 bitmap_set_bit [179] 0.00 0.03 57752/67237 bitmap_equal_p [501] 0.02 0.00 328090/1465878 bitmap_clear [330] 0.01 0.00 113217/568566 bitmap_initialize [487] 0.00 0.00 1/90441614 sbitmap_zero [30] ----------------------------------------------- 0.00 9.83 3/22 update_life_info [16] 0.01 62.24 19/22 rest_of_compilation [7] [18] 10.5 0.01 72.07 22 cleanup_cfg [18] 0.09 64.62 22/22 try_optimize_cfg [19] 0.03 7.33 31/31 delete_unreachable_blocks [37] 0.00 0.00 22/1239490 timevar_pop [197] 0.00 0.00 22/1239490 timevar_push [201] 0.00 0.00 44/166659 free_EXPR_LIST_list [995] ----------------------------------------------- 0.09 64.62 22/22 cleanup_cfg [18] [19] 9.5 0.09 64.62 22 try_optimize_cfg [19] 18.91 28.89 67665/67665 try_crossjump_bb [21] 9.23 0.11 208787/208787 try_forward_edges [33] 0.00 3.82 3146/5417 flow_delete_block [41] 0.02 2.75 208787/208787 try_simplify_condjump [73] 0.00 0.37 890/890 merge_blocks [192] 0.00 0.32 2015/10723 flow_delete_insn_chain [86] 0.02 0.12 74300/81241 redirect_edge_and_branch [278] 0.01 0.02 96070/27562210 forwarder_block_p [35] 0.02 0.01 86632/12775689 onlyjump_p [58] 0.00 0.00 2015/138502 reg_mentioned_p [346] 0.00 0.00 353/7724 redirect_edge_succ_nodup [809] ----------------------------------------------- 0.00 5.25 7/73 life_analysis [45] 0.00 6.75 9/73 if_convert [31] 0.00 42.76 57/73 rest_of_compilation [7] [20] 8.0 0.00 54.77 73 free_basic_block_vars [20] 0.28 54.49 3/10 clear_edges [12] ----------------------------------------------- 18.91 28.89 67665/67665 try_optimize_cfg [19] [21] 7.0 18.91 28.89 67665 try_crossjump_bb [21] 2.50 26.39 7261292/7261292 try_crossjump_to_edge [24] ----------------------------------------------- 0.00 0.00 9/72443984 find_if_case_1 [533] 0.00 0.00 14/72443984 find_if_case_2 [642] 0.00 0.00 420/72443984 dead_or_predicable [813] 0.01 0.00 18758/72443984 update_equiv_regs [208] 0.04 0.00 67237/72443984 bitmap_equal_p [501] 0.14 0.00 246360/72443984 finish_spills [181] 42.25 0.00 72111186/72443984 calculate_global_regs_live [17] [22] 6.2 42.44 0.00 72443984 bitmap_operation [22] ----------------------------------------------- 0.88 29.45 3/3 rest_of_compilation [7] [23] 4.4 0.88 29.45 3 optimize_mode_switching [23] 0.00 28.35 6/6 pre_edge_lcm [25] 0.25 0.57 1/5 commit_edge_insertions [56] 0.02 0.03 3/21 sbitmap_vector_ones [188] 0.01 0.03 36043/275175 get_attr_type [207] 0.02 0.02 36284/2348999 note_stores [140] 0.00 0.03 3/9496 update_life_info [16] 0.01 0.01 12/9614 sbitmap_vector_zero [26] 0.02 0.00 23930/276241 reg_set_to_hard_reg_set [228] 0.02 0.00 12/9668 sbitmap_vector_alloc [27] 0.00 0.01 3541/62459 recog_memoized_1 [240] 0.01 0.00 3/13 allocate_reg_life_data [485] 0.01 0.00 27290/384090 asm_noperands [331] 0.00 0.00 14620/86781 gen_sequence [482] 0.00 0.00 14634/102819 end_sequence [728] 0.00 0.00 18762/3738844 sbitmap_not [265] 0.00 0.00 14535/14825 emit_insn_before [1120] 0.00 0.00 5/5 emit_i387_cw_initialization [1235] 0.00 0.00 10/47 assign_386_stack_local [1203] 0.00 0.00 14/123 insert_insn_on_edge [1175] 0.00 0.00 5/2348999 emit_move_insn [460] 0.00 0.00 71/198 emit_block_insn_before [1383] 0.00 0.00 23420/23420 reg_dies [1471] 0.00 0.00 14620/102819 start_sequence [1426] 0.00 0.00 9385/9385 new_seginfo [1512] 0.00 0.00 9385/9385 add_seginfo [1511] 0.00 0.00 48/48 make_preds_opaque [1895] 0.00 0.00 6/6 free_edge_list [1987] ----------------------------------------------- 2.50 26.39 7261292/7261292 try_crossjump_bb [21] [24] 4.2 2.50 26.39 7261292 try_crossjump_to_edge [24] 2.97 14.21 6289011/6289011 flow_find_cross_jump [28] 2.41 5.63 27025588/27562210 forwarder_block_p [35] 0.00 0.40 2549/10723 flow_delete_insn_chain [86] 0.35 0.02 7261156/7261156 outgoing_edges_match [195] 0.01 0.25 338/434 split_block [205] 0.06 0.00 2620/7449858 remove_edge [13] 0.01 0.05 2549/3021 make_single_succ_edge [387] 0.01 0.00 2549/2631 set_block_for_new_insns [789] 0.00 0.00 2549/9171 gen_jump [689] 0.00 0.00 2549/2596 emit_jump_insn_before [949] 0.00 0.00 2549/13306482 find_reg_note [98] 0.00 0.00 2549/433825 next_nonnote_insn [541] 0.00 0.00 2549/8290 block_label [1100] 0.00 0.00 71/176 emit_barrier_after [1166] 0.00 0.00 71/165738 gen_rtx_CONST_INT [686] ----------------------------------------------- 0.00 28.35 6/6 optimize_mode_switching [23] [25] 4.1 0.00 28.35 6 pre_edge_lcm [25] 5.85 4.21 6/6 compute_laterin [32] 1.41 5.29 6/6 compute_antinout_edge [40] 0.88 4.41 6/6 compute_available [49] 1.50 1.51 6/6 compute_earliest [66] 1.16 0.68 6/6 compute_insert_delete [85] 1.35 0.00 6/6 create_edge_list [100] 0.10 0.00 54/9668 sbitmap_vector_alloc [27] ----------------------------------------------- 0.00 0.00 3/9614 init_dependency_caches [698] 0.01 0.01 12/9614 optimize_mode_switching [23] 0.02 0.02 15/9614 idoms_to_doms [248] 9.74 11.05 9584/9614 make_edges [8] [26] 3.1 9.77 11.09 9614 sbitmap_vector_zero [26] 11.09 0.00 90422081/90441614 sbitmap_zero [30] ----------------------------------------------- 0.01 0.00 3/9668 flow_loops_find [63] 0.01 0.00 3/9668 init_dependency_caches [698] 0.01 0.00 6/9668 estimate_probability [34] 0.01 0.00 6/9668 if_convert [31] 0.02 0.00 12/9668 optimize_mode_switching [23] 0.10 0.00 54/9668 pre_edge_lcm [25] 17.07 0.00 9584/9668 make_edges [8] [27] 2.5 17.22 0.00 9668 sbitmap_vector_alloc [27] ----------------------------------------------- 2.97 14.21 6289011/6289011 try_crossjump_to_edge [24] [28] 2.5 2.97 14.21 6289011 flow_find_cross_jump [28] 7.32 0.00 5587015/5643903 rtx_renumbered_equal_p [36] 2.31 1.30 12578022/12775689 onlyjump_p [58] 0.74 2.36 11164966/11164997 find_reg_equal_equiv_note [65] 0.03 0.08 440578/508111 stack_regs_mentioned [316] 0.02 0.05 41435/95703 returnjump_p [286] 0.00 0.00 196/2223309 rtx_equal_p [239] ----------------------------------------------- [29] 2.5 15.32 1.60 htab_traverse [29] 0.00 0.87 43785/43785 rtx_htab_mark_1 [116] 0.20 0.53 460865/460865 cselib_invalidate_mem_1 [127] 0.00 0.00 1170/1170 mark_hash_entry [1067] 0.00 0.00 3386/3386 discard_useless_locs [1565] 0.00 0.00 1631/1631 discard_useless_values [1598] ----------------------------------------------- 0.00 0.00 1/90441614 calculate_global_regs_live [17] 0.00 0.00 1/90441614 flow_depth_first_order_compute [160] 0.00 0.00 1/90441614 if_convert [31] 0.00 0.00 2/90441614 flow_loops_find [63] 0.00 0.00 3/90441614 combine_instructions [74] 0.00 0.00 6/90441614 mark_dfs_back_edges [80] 0.00 0.00 6/90441614 compute_antinout_edge [40] 0.00 0.00 6/90441614 compute_earliest [66] 0.00 0.00 6/90441614 compute_available [49] 0.00 0.00 12/90441614 split_all_insns [10] 0.00 0.00 519/90441614 flow_loop_nodes_find [1183] 0.00 0.00 9485/90441614 init_regions [82] 0.00 0.00 9485/90441614 schedule_insns [15] 11.09 0.00 90422081/90441614 sbitmap_vector_zero [26] [30] 1.6 11.09 0.00 90441614 sbitmap_zero [30] ----------------------------------------------- The enormous amount of system time in flow2 is from mmapping and unmmapping a large bitmap, Daniel Berlin provided the analysis previously. I wanted to send this timing data now because of the large changes being made to the cfg code, with the hope that there can be some improvement. Brad Lucier >How-To-Repeat: >Fix: >Release-Note: >Audit-Trail: >Unformatted: