* Timing information for CFG manipulations
@ 2001-10-13 20:33 Brad Lucier
2001-10-13 21:53 ` Zack Weinberg
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Brad Lucier @ 2001-10-13 20:33 UTC (permalink / raw)
To: jh; +Cc: Brad Lucier, gcc
Jan:
Here is timing information for today's CVS version of 3.1 configured
with --enable-languages='c' --enable-checking=no and built with
'make bootstrap BOOT_CFLAGS="-O2 -g -pg" BOOT_LDFLAGS="-O2 -g -pg"'
when called on the file at
http://www.math.purdue.edu/~lucier/_num.i.gz
dino01% /u/lucier/local/gcc-3.1/lib/gcc-lib/i686-pc-linux-gnu/3.1/cc1 -fpic -fomit-frame-pointer -O1 -fno-math-errno -fno-strict-aliasing -mcpu=athlon -march=athlon _num.i
__sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl logb logbf logbl drem dremf dreml __finite ___H__20___num {GC 25431k -> 7824k} {GC 10943k -> 7882k} {GC 10372k -> 7769k} {GC 13951k -> 8583k} {GC 14265k -> 9195k} ___init_proc {GC 12103k -> 9289k} ____20___num
Execution times (seconds)
garbage collection : 1.11 ( 0%) usr 0.00 ( 0%) sys 1.12 ( 0%) wall
cfg construction : 269.67 (34%) usr 0.32 ( 0%) sys 270.02 (26%) wall
cfg cleanup : 51.19 ( 6%) usr 0.01 ( 0%) sys 51.20 ( 5%) wall
preprocessing : 0.57 ( 0%) usr 0.10 ( 0%) sys 0.75 ( 0%) wall
lexical analysis : 0.75 ( 0%) usr 0.16 ( 0%) sys 0.83 ( 0%) wall
parser : 2.61 ( 0%) usr 0.13 ( 0%) sys 2.73 ( 0%) wall
varconst : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
jump : 0.78 ( 0%) usr 0.00 ( 0%) sys 0.80 ( 0%) wall
CSE : 1.75 ( 0%) usr 0.00 ( 0%) sys 1.77 ( 0%) wall
global CSE : 41.63 ( 5%) usr 0.50 ( 0%) sys 42.22 ( 4%) wall
loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
flow analysis : 24.36 ( 3%) usr 0.16 ( 0%) sys 24.52 ( 2%) wall
combiner : 1.76 ( 0%) usr 0.00 ( 0%) sys 1.77 ( 0%) wall
if-conversion : 1.18 ( 0%) usr 0.04 ( 0%) sys 1.20 ( 0%) wall
local alloc : 0.65 ( 0%) usr 0.01 ( 0%) sys 0.67 ( 0%) wall
global alloc : 4.71 ( 1%) usr 0.06 ( 0%) sys 4.81 ( 0%) wall
reload CSE regs : 10.35 ( 1%) usr 0.01 ( 0%) sys 10.36 ( 1%) wall
flow 2 : 267.66 (34%) usr 251.44 (98%) sys 519.08 (49%) wall
if-conversion 2 : 0.99 ( 0%) usr 0.02 ( 0%) sys 1.02 ( 0%) wall
shorten branches : 0.24 ( 0%) usr 0.01 ( 0%) sys 0.95 ( 0%) wall
reg stack : 22.24 ( 3%) usr 2.84 ( 1%) sys 25.08 ( 2%) wall
final : 87.78 (11%) usr 0.02 ( 0%) sys 87.81 ( 8%) wall
rest of compilation : 1.26 ( 0%) usr 0.00 ( 0%) sys 1.25 ( 0%) wall
TOTAL : 793.36 255.83 1050.08
...
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
44.47 348.50 348.50 7450804 0.05 0.05 remove_edge
32.36 602.16 253.66 9556 26.54 26.54 sbitmap_vector_alloc
3.79 631.85 29.69 72698858 0.00 0.00 bitmap_operation
2.79 653.73 21.88 13 1683.08 4184.80 calculate_global_regs_live
2.15 670.57 16.84 9320420 0.00 0.00 cached_make_edge
1.28 680.62 10.05 67331 0.15 0.37 try_crossjump_bb
0.99 688.41 7.79 88484235 0.00 0.00 sbitmap_zero
0.92 695.63 7.22 htab_traverse
0.66 700.82 5.19 27855 0.19 0.19 sbitmap_intersection_of_succs
0.56 705.23 4.41 207472 0.02 0.02 try_forward_edges
0.51 709.26 4.03 6 671.67 993.14 compute_laterin
0.47 712.95 3.69 6270 0.59 0.59 expunge_block
0.47 716.63 3.68 31 118.71 118.71 find_unreachable_blocks
0.44 720.09 3.46 9502 0.36 1.18 sbitmap_vector_zero
0.42 723.39 3.30 6047223 0.00 0.00 rtx_renumbered_equal_p
0.39 726.44 3.05 9277592 0.00 0.00 make_label_edge
0.38 729.41 2.97 61978208 0.00 0.00 active_insn_p
0.38 732.38 2.97 451 6.59 6.59 propagate_freq
0.33 734.93 2.55 29217 0.09 0.09 sbitmap_intersection_of_preds
0.31 737.35 2.42 15 161.33 162.00 calc_idoms
0.26 739.35 2.00 101420319 0.00 0.00 bitmap_element_link
0.26 741.35 2.00 9485 0.21 30.01 make_edges
...
-----------------------------------------------
0.00 0.00 3/7450804 redirect_edge_succ_nodup [627]
0.00 0.00 18/7450804 purge_dead_edges [321]
0.00 0.00 36/7450804 try_redirect_by_replacing_jump [284]
0.00 0.00 80/7450804 find_sub_basic_blocks [11]
0.05 0.00 1066/7450804 merge_blocks_nomove [88]
0.17 0.00 3729/7450804 try_crossjump_to_edge [27]
0.25 0.00 5360/7450804 flow_delete_block [41]
348.02 0.00 7440512/7450804 clear_edges [9]
[8] 44.5 348.50 0.00 7450804 remove_edge [8]
-----------------------------------------------
0.13 104.41 3/10 free_basic_block_vars [15]
0.30 243.61 7/10 find_basic_blocks [14]
[9] 44.5 0.43 348.02 10 clear_edges [9]
348.02 0.00 7440512/7450804 remove_edge [8]
-----------------------------------------------
0.00 0.30 10/9485 find_basic_blocks [14]
2.00 282.37 9475/9485 find_sub_basic_blocks [11]
[10] 36.3 2.00 282.67 9485 make_edges [10]
251.51 0.00 9475/9556 sbitmap_vector_alloc [13]
16.83 0.00 9316105/9320420 cached_make_edge [26]
3.45 7.77 9475/9502 sbitmap_vector_zero [28]
3.05 0.00 9277592/9277592 make_label_edge [48]
0.00 0.02 44110/100771 returnjump_p [260]
0.01 0.01 47975/50931 computed_jump_p [399]
0.01 0.00 75239/93463 next_nonnote_insn [555]
0.00 0.00 47975/13489246 find_reg_note [82]
-----------------------------------------------
0.00 3.27 109/9475 commit_edge_insertions [42]
0.04 281.15 9366/9475 split_all_insns [12]
[11] 36.3 0.04 284.42 9475 find_sub_basic_blocks [11]
2.00 282.37 9475/9485 make_edges [10]
0.00 0.03 80/597 split_block [154]
0.02 0.00 9475/18796 purge_dead_edges [321]
0.00 0.00 80/7450804 remove_edge [8]
0.00 0.00 9731/13489246 find_reg_note [82]
0.00 0.00 87/2981 can_throw_internal [943]
-----------------------------------------------
0.02 281.37 9/9 rest_of_compilation [7]
[12] 35.9 0.02 281.37 9 split_all_insns [12]
0.04 281.15 9366/9475 find_sub_basic_blocks [11]
0.00 0.19 188995/188995 split_insn [163]
0.00 0.00 9/88484235 sbitmap_zero [31]
0.00 0.00 9/553 sbitmap_alloc [1629]
-----------------------------------------------
0.08 0.00 3/9556 flow_loops_find [59]
0.16 0.00 6/9556 estimate_probability [33]
0.16 0.00 6/9556 if_convert [25]
0.32 0.00 12/9556 optimize_mode_switching [19]
1.43 0.00 54/9556 pre_edge_lcm [24]
251.51 0.00 9475/9556 make_edges [10]
[13] 32.4 253.66 0.00 9556 sbitmap_vector_alloc [13]
-----------------------------------------------
0.00 244.39 10/10 rest_of_compilation [7]
[14] 31.2 0.00 244.39 10 find_basic_blocks [14]
0.30 243.61 7/10 clear_edges [9]
0.00 0.30 10/9485 make_edges [10]
0.06 0.01 10/10 find_basic_blocks_1 [237]
0.05 0.00 10/10 compute_bb_for_insn [279]
0.04 0.00 10/10 count_basic_blocks [307]
0.00 0.01 10/14 tidy_fallthru_edges [446]
0.00 0.00 10/1239368 timevar_push [150]
0.00 0.00 10/262 varray_init [1703]
0.00 0.00 10/9317 get_max_uid [1467]
-----------------------------------------------
0.00 10.02 7/73 life_analysis [18]
0.00 12.89 9/73 if_convert [25]
0.00 81.62 57/73 rest_of_compilation [7]
[15] 13.3 0.00 104.53 73 free_basic_block_vars [15]
0.13 104.41 3/10 clear_edges [9]
-----------------------------------------------
0.00 5.54 1/11 if_convert [25]
0.00 16.61 3/11 optimize_mode_switching [19]
0.00 38.75 7/11 life_analysis [18]
[16] 7.8 0.00 60.89 11 update_life_info [16]
21.88 32.52 13/13 calculate_global_regs_live [17]
0.00 5.27 3/22 cleanup_cfg [20]
0.08 1.09 74687/132182 propagate_block [61]
0.02 0.00 3/5 count_or_remove_death_notes [312]
0.01 0.00 74687/219660 bitmap_copy [270]
0.00 0.00 11/518311 bitmap_initialize [339]
0.00 0.00 11/1374187 bitmap_clear [258]
-----------------------------------------------
21.88 32.52 13/13 update_life_info [16]
[17] 6.9 21.88 32.52 13 calculate_global_regs_live [17]
29.49 2.06 72205551/72698858 bitmap_operation [22]
0.06 0.83 56808/132182 propagate_block [61]
0.00 0.02 56808/56808 bitmap_equal_p [383]
0.02 0.00 100467/219660 bitmap_copy [270]
0.01 0.00 255232/1933812 bitmap_set_bit [198]
0.01 0.00 328395/1374187 bitmap_clear [258]
0.01 0.00 112039/518311 bitmap_initialize [339]
0.00 0.00 1828/101420319 bitmap_element_link [62]
0.00 0.00 1/88484235 sbitmap_zero [31]
-----------------------------------------------
0.00 7.01 1/7 reg_to_stack [29]
0.00 42.07 6/7 rest_of_compilation [7]
[18] 6.3 0.00 49.08 7 life_analysis [18]
0.00 38.75 7/11 update_life_info [16]
0.00 10.02 7/73 free_basic_block_vars [15]
0.07 0.12 6/16 init_alias_analysis [99]
0.04 0.06 7/10 delete_noop_moves [191]
0.00 0.02 3/3 notice_stack_pointer_modification [457]
0.00 0.00 7/7 allocate_bb_life_data [700]
0.00 0.00 7/10 allocate_reg_life_data [688]
0.00 0.00 7/7 mark_regs_live_at_end [1221]
0.00 0.00 6/16 end_alias_analysis [1920]
-----------------------------------------------
0.31 39.43 3/3 rest_of_compilation [7]
[19] 5.1 0.31 39.43 3 optimize_mode_switching [19]
0.00 21.63 6/6 pre_edge_lcm [24]
0.00 16.61 3/11 update_life_info [16]
0.11 0.66 1/5 commit_edge_insertions [42]
0.32 0.00 12/9556 sbitmap_vector_alloc [13]
0.01 0.02 3/21 sbitmap_vector_ones [151]
0.01 0.01 37484/2679415 note_stores <cycle 7> [133]
0.00 0.01 12/9502 sbitmap_vector_zero [28]
0.00 0.01 37243/80229 get_attr_type [371]
0.01 0.00 23837/460580 reg_set_to_hard_reg_set [147]
0.01 0.00 9292/9292 add_seginfo [591]
0.00 0.00 14620/91501 gen_sequence [370]
0.00 0.00 28568/474001 asm_noperands [354]
0.00 0.00 3528/60871 recog_memoized_1 [384]
0.00 0.00 14620/108004 start_sequence [625]
0.00 0.00 3/10 allocate_reg_life_data [688]
0.00 0.00 18576/3738386 sbitmap_not [186]
0.00 0.00 14606/14838 emit_insn_before [935]
0.00 0.00 5/5 emit_i387_cw_initialization [1111]
0.00 0.00 14/123 insert_insn_on_edge [1029]
0.00 0.00 10/68 assign_386_stack_local [1098]
0.00 0.00 5/2679415 emit_move_insn <cycle 7> [450]
0.00 0.00 24525/24525 reg_dies [1412]
0.00 0.00 14634/108004 end_sequence [1347]
0.00 0.00 9292/9292 new_seginfo [1469]
0.00 0.00 48/48 make_preds_opaque [1877]
0.00 0.00 6/6 free_edge_list [1966]
-----------------------------------------------
0.00 5.27 3/22 update_life_info [16]
0.02 33.37 19/22 rest_of_compilation [7]
[20] 4.9 0.02 38.63 22 cleanup_cfg [20]
0.08 34.85 22/22 try_optimize_cfg [21]
0.00 3.70 31/31 delete_unreachable_blocks [43]
0.00 0.00 22/1239368 timevar_push [150]
0.00 0.00 22/1239368 timevar_pop [160]
0.00 0.00 44/136318 free_EXPR_LIST_list [1340]
-----------------------------------------------
0.08 34.85 22/22 cleanup_cfg [20]
[21] 4.5 0.08 34.85 22 try_optimize_cfg [21]
10.05 15.13 67331/67331 try_crossjump_bb [23]
4.41 0.07 207472/207472 try_forward_edges [38]
0.00 2.34 3138/5348 flow_delete_block [41]
0.07 1.69 207472/207472 try_simplify_condjump [67]
0.00 0.39 6/6 remove_fake_edges [112]
0.00 0.39 1099/1099 merge_blocks [113]
0.00 0.22 2011/12100 delete_insn_chain [74]
0.00 0.06 74089/81227 redirect_edge_and_branch [254]
0.01 0.01 95491/25030880 forwarder_block_p [37]
0.01 0.00 85868/11904476 onlyjump_p [68]
0.00 0.00 2011/157599 reg_mentioned_p [274]
0.00 0.00 350/7709 redirect_edge_succ_nodup [627]
0.00 0.00 6/6 add_noreturn_fake_exit_edges [1962]
-----------------------------------------------
0.00 0.00 5/72698858 find_if_case_1 [470]
0.00 0.00 270/72698858 dead_or_predicable [729]
0.01 0.00 18572/72698858 update_equiv_regs [164]
0.02 0.00 56808/72698858 bitmap_equal_p [383]
0.17 0.01 417652/72698858 finish_spills [106]
29.49 2.06 72205551/72698858 calculate_global_regs_live [17]
[22] 4.1 29.69 2.07 72698858 bitmap_operation [22]
1.98 0.00 100601217/101420319 bitmap_element_link [62]
0.09 0.00 3360247/4560586 bitmap_element_allocate [193]
-----------------------------------------------
10.05 15.13 67331/67331 try_optimize_cfg [21]
[23] 3.2 10.05 15.13 67331 try_crossjump_bb [23]
1.31 13.82 6407969/6407969 try_crossjump_to_edge [27]
-----------------------------------------------
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-13 20:33 Timing information for CFG manipulations Brad Lucier
@ 2001-10-13 21:53 ` Zack Weinberg
2001-10-15 11:58 ` Brad Lucier
2001-10-15 12:54 ` Brad Lucier
2001-10-14 1:18 ` Jan Hubicka
` (2 subsequent siblings)
3 siblings, 2 replies; 21+ messages in thread
From: Zack Weinberg @ 2001-10-13 21:53 UTC (permalink / raw)
To: Brad Lucier; +Cc: jh, gcc
On Sat, Oct 13, 2001 at 10:32:06PM -0500, Brad Lucier wrote:
> Jan:
>
> Here is timing information for today's CVS version of 3.1 configured
> with --enable-languages='c' --enable-checking=no and built with
> 'make bootstrap BOOT_CFLAGS="-O2 -g -pg" BOOT_LDFLAGS="-O2 -g -pg"'
> when called on the file at
>
> http://www.math.purdue.edu/~lucier/_num.i.gz
...
> 44.47 348.50 348.50 7450804 0.05 0.05 remove_edge
> 32.36 602.16 253.66 9556 26.54 26.54 sbitmap_vector_alloc
...
> 0.08 0.00 3/9556 flow_loops_find [59]
> 0.16 0.00 6/9556 estimate_probability [33]
> 0.16 0.00 6/9556 if_convert [25]
> 0.32 0.00 12/9556 optimize_mode_switching [19]
> 1.43 0.00 54/9556 pre_edge_lcm [24]
> 251.51 0.00 9475/9556 make_edges [10]
> [13] 32.4 253.66 0.00 9556 sbitmap_vector_alloc [13]
Two things:
- Can you repeat this test with cc1 linked against a profile-
instrumented libc, and libiberty re-built with CFLAGS="-O2 -g -pg"?
I'm suspicious that lots of time is being spent inside malloc.
- Have you tried Daniel Berlin's ebitmap patches?
zw
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-13 20:33 Timing information for CFG manipulations Brad Lucier
2001-10-13 21:53 ` Zack Weinberg
@ 2001-10-14 1:18 ` Jan Hubicka
2001-10-14 8:46 ` Brad Lucier
2001-10-16 7:22 ` Jan Hubicka
2001-10-16 8:01 ` Jan Hubicka
3 siblings, 1 reply; 21+ messages in thread
From: Jan Hubicka @ 2001-10-14 1:18 UTC (permalink / raw)
To: Brad Lucier; +Cc: jh, gcc
Thanks,
> cfg construction : 269.67 (34%) usr 0.32 ( 0%) sys 270.02 (26%) wall
This is new problem. I guess it is related to the CFG freeing code...
> flow 2 : 267.66 (34%) usr 251.44 (98%) sys 519.08 (49%) wall
I will care this one next week (I am at vacation right now).
> 44.47 348.50 348.50 7450804 0.05 0.05 remove_edge
This probably makes clear where we spend all the time. We are now deallocating
each edge in the CFG before rebuild and this may cause problems on walking the
lists.
I will need to add function like "free outgiong edges"...
> 32.36 602.16 253.66 9556 26.54 26.54 sbitmap_vector_alloc
Do you have any idea, in what state is the ebitmap stuff?
Thanks for the infromation!
Honza
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-14 1:18 ` Jan Hubicka
@ 2001-10-14 8:46 ` Brad Lucier
2001-10-14 9:21 ` Daniel Berlin
0 siblings, 1 reply; 21+ messages in thread
From: Brad Lucier @ 2001-10-14 8:46 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Brad Lucier, jh, gcc
> > 32.36 602.16 253.66 9556 26.54 26.54 sbitmap_vector_alloc
> Do you have any idea, in what state is the ebitmap stuff?
No.
Brad
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-14 8:46 ` Brad Lucier
@ 2001-10-14 9:21 ` Daniel Berlin
0 siblings, 0 replies; 21+ messages in thread
From: Daniel Berlin @ 2001-10-14 9:21 UTC (permalink / raw)
To: Brad Lucier; +Cc: Jan Hubicka, gcc
Brad Lucier <lucier@math.purdue.edu> writes:
>> > 32.36 602.16 253.66 9556 26.54 26.54 sbitmap_vector_alloc
>> Do you have any idea, in what state is the ebitmap stuff?
>
> No.
Don't bother with the ebitmaps. I'm going to submit the new bitmap
stuff today.
>
> Brad
--
"I was walking down the street and saw a sign on a post. It
said: "Lost -- $50. If found, just keep it."
"-Steven Wright
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-13 21:53 ` Zack Weinberg
@ 2001-10-15 11:58 ` Brad Lucier
2001-10-16 21:15 ` Zack Weinberg
2001-10-15 12:54 ` Brad Lucier
1 sibling, 1 reply; 21+ messages in thread
From: Brad Lucier @ 2001-10-15 11:58 UTC (permalink / raw)
To: Zack Weinberg; +Cc: Brad Lucier, jh, gcc
>
>
> > 44.47 348.50 348.50 7450804 0.05 0.05 remove_edge
> > 32.36 602.16 253.66 9556 26.54 26.54 sbitmap_vector_alloc
>
> ...
>
>
> > 0.08 0.00 3/9556 flow_loops_find [59]
> > 0.16 0.00 6/9556 estimate_probability [33]
> > 0.16 0.00 6/9556 if_convert [25]
> > 0.32 0.00 12/9556 optimize_mode_switching [19]
> > 1.43 0.00 54/9556 pre_edge_lcm [24]
> > 251.51 0.00 9475/9556 make_edges [10]
> > [13] 32.4 253.66 0.00 9556 sbitmap_vector_alloc [13]
>
> Two things:
>
> - Can you repeat this test with cc1 linked against a profile-
> instrumented libc,
Unfortunately, on this box libc_p.a seems to be significantly older than
libc.a and there are link errors.
> and libiberty re-built with CFLAGS="-O2 -g -pg"?
I tried this, too, and it links, but I can't get a gmon.out file from
running the resulting cc1. Perhaps I just don't know how to do such
things properly.
Brad
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-13 21:53 ` Zack Weinberg
2001-10-15 11:58 ` Brad Lucier
@ 2001-10-15 12:54 ` Brad Lucier
2001-10-15 14:18 ` Daniel Berlin
1 sibling, 1 reply; 21+ messages in thread
From: Brad Lucier @ 2001-10-15 12:54 UTC (permalink / raw)
To: Zack Weinberg; +Cc: Brad Lucier, jh, gcc
> - Have you tried Daniel Berlin's ebitmap patches?
Sorry, I missed this the first time, the answer is no.
Dan sent a message yesterday saying he would resubmit them; if I
don't hear from him in a day or so I'll try the previous versions.
Brad
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-15 12:54 ` Brad Lucier
@ 2001-10-15 14:18 ` Daniel Berlin
0 siblings, 0 replies; 21+ messages in thread
From: Daniel Berlin @ 2001-10-15 14:18 UTC (permalink / raw)
To: Brad Lucier; +Cc: Zack Weinberg, jh, gcc
On Monday, October 15, 2001, at 03:54 PM, Brad Lucier wrote:
>> - Have you tried Daniel Berlin's ebitmap patches?
>
> Sorry, I missed this the first time, the answer is no.
>
> Dan sent a message yesterday saying he would resubmit them; if I
> don't hear from him in a day or so I'll try the previous versions.
Sorry, i've been splitting up the patch, since it requires changes to
sbitmaps as well.
Should be on the way to gcc-patches now, however.
The previous version would probably be slower and take more memory,
slower because of it taking more memory.
At least, on large cases.
Cache thrashing fun.
>
> Brad
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-13 20:33 Timing information for CFG manipulations Brad Lucier
2001-10-13 21:53 ` Zack Weinberg
2001-10-14 1:18 ` Jan Hubicka
@ 2001-10-16 7:22 ` Jan Hubicka
2001-10-16 8:25 ` Brad Lucier
2001-10-16 12:46 ` Richard Henderson
2001-10-16 8:01 ` Jan Hubicka
3 siblings, 2 replies; 21+ messages in thread
From: Jan Hubicka @ 2001-10-16 7:22 UTC (permalink / raw)
To: Brad Lucier, gcc-patches; +Cc: jh, gcc
> cfg construction : 269.67 (34%) usr 0.32 ( 0%) sys 270.02 (26%) wall
Attached patch should solve this problem.
It avoids the worst case quadratic scenario on removing edges.
Bootstrapped/regtested i386.
Honza
Index: cfg.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cfg.c,v
retrieving revision 1.10
diff -c -3 -p -r1.10 cfg.c
*** cfg.c 2001/10/11 03:15:24 1.10
--- cfg.c 2001/10/16 14:20:55
*************** struct basic_block_def entry_exit_blocks
*** 117,122 ****
--- 117,123 ----
};
void debug_flow_info PARAMS ((void));
+ static void free_edge PARAMS ((edge));
\f
/* Called once at intialization time. */
*************** init_flow ()
*** 142,164 ****
}
}
\f
/* Free the memory associated with the edge structures. */
void
clear_edges ()
{
int i;
for (i = 0; i < n_basic_blocks; ++i)
{
basic_block bb = BASIC_BLOCK (i);
! while (bb->succ)
! remove_edge (bb->succ);
}
! while (ENTRY_BLOCK_PTR->succ)
! remove_edge (ENTRY_BLOCK_PTR->succ);
if (n_edges)
abort ();
--- 143,195 ----
}
}
\f
+ /* Helper function for remove_edge and clear_edges. Frees edge structure
+ without actually unlinking it from the pred/succ lists. */
+
+ static void
+ free_edge (e)
+ edge e;
+ {
+ n_edges--;
+ memset (e, 0, sizeof (*e));
+ e->succ_next = first_deleted_edge;
+ first_deleted_edge = e;
+ }
+
/* Free the memory associated with the edge structures. */
void
clear_edges ()
{
int i;
+ edge e;
for (i = 0; i < n_basic_blocks; ++i)
{
basic_block bb = BASIC_BLOCK (i);
+ edge e = bb->succ;
! while (e)
! {
! edge next = e->succ_next;
!
! free_edge (e);
! e = next;
! }
! bb->succ = NULL;
! bb->pred = NULL;
}
! e = ENTRY_BLOCK_PTR->succ;
! while (e)
! {
! edge next = e->succ_next;
!
! free_edge (e);
! e = next;
! }
! EXIT_BLOCK_PTR->pred = NULL;
! ENTRY_BLOCK_PTR->succ = NULL;
if (n_edges)
abort ();
*************** remove_edge (e)
*** 335,344 ****
else
dest->pred = e->pred_next;
! n_edges--;
! memset (e, 0, sizeof (*e));
! e->succ_next = first_deleted_edge;
! first_deleted_edge = e;
}
/* Redirect an edge's successor from one block to another. */
--- 366,372 ----
else
dest->pred = e->pred_next;
! free_edge (e);
}
/* Redirect an edge's successor from one block to another. */
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-13 20:33 Timing information for CFG manipulations Brad Lucier
` (2 preceding siblings ...)
2001-10-16 7:22 ` Jan Hubicka
@ 2001-10-16 8:01 ` Jan Hubicka
2001-10-16 12:39 ` Brad Lucier
` (2 more replies)
3 siblings, 3 replies; 21+ messages in thread
From: Jan Hubicka @ 2001-10-16 8:01 UTC (permalink / raw)
To: Brad Lucier, gcc-patches, rth; +Cc: jh, gcc
Hi,
this patch should track down the second problem:
> flow 2 : 267.66 (34%) usr 251.44 (98%) sys 519.08 (49%) wall
> 32.36 602.16 253.66 9556 26.54 26.54 sbitmap_vector_alloc
It makes find_sub_basic_blocks to operate at bitmap of blocks, instead of just
one and thus it avoids multiple initializations of edge cache structure.
The possible problem is that I now need to benerate bitmap when calling from
recog.c. I hope it won't cause perofrmance regression as the bitmap is
relativly small (compared to edge cache). If it will, I can use bitmap,
instead of sbitmap, that will add some overhead at split_all_insns side.
I don't have time to finish bootstrap, but the patch appears to work.
OK assuming the bootstrapping/regtesting i386 passes fluently?
Honza
Tue Oct 16 16:56:33 CEST 2001 Jan Hubicka <jh@suse.cz>
* basic-block.h (find_sub_basic_blocks): Use sbitmap parameter.
* cfgbuild.c (find_bb_boundaries): Break out from ...
(find_sub_basic_blocks): ... here; reorganize to handle multiple
basic blocks at once.
* cfgrtl.c (commite_one_edge_insertion): Create an bitmap to pass
to find_sub_basic_blocks.
* recog.c (split_all_insns): Update find_sub_basic_blocks call.
Index: basic-block.h
===================================================================
RCS file: /cvs/gcc/egcs/gcc/basic-block.h,v
retrieving revision 1.123
diff -c -3 -p -r1.123 basic-block.h
*** basic-block.h 2001/10/11 12:43:38 1.123
--- basic-block.h 2001/10/16 14:54:27
*************** extern rtx block_label PARAMS ((basic_
*** 639,645 ****
extern bool forwarder_block_p PARAMS ((basic_block));
extern bool purge_all_dead_edges PARAMS ((void));
extern bool purge_dead_edges PARAMS ((basic_block));
! extern void find_sub_basic_blocks PARAMS ((basic_block));
extern bool can_fallthru PARAMS ((basic_block, basic_block));
extern void flow_nodes_print PARAMS ((const char *, const sbitmap,
FILE *));
--- 639,645 ----
extern bool forwarder_block_p PARAMS ((basic_block));
extern bool purge_all_dead_edges PARAMS ((void));
extern bool purge_dead_edges PARAMS ((basic_block));
! extern void find_sub_basic_blocks PARAMS ((sbitmap));
extern bool can_fallthru PARAMS ((basic_block, basic_block));
extern void flow_nodes_print PARAMS ((const char *, const sbitmap,
FILE *));
Index: cfgbuild.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cfgbuild.c,v
retrieving revision 1.7
diff -c -3 -p -r1.7 cfgbuild.c
*** cfgbuild.c 2001/10/11 03:15:24 1.7
--- cfgbuild.c 2001/10/16 14:54:28
*************** static void make_edges PARAMS ((rtx, i
*** 55,60 ****
--- 55,61 ----
static void make_label_edge PARAMS ((sbitmap *, basic_block,
rtx, int));
static void make_eh_edge PARAMS ((sbitmap *, basic_block, rtx));
+ static void find_bb_boundaries PARAMS ((basic_block));
/* Count the basic blocks of the function. */
*************** make_edges (label_value_list, min, max,
*** 246,252 ****
}
/* By nature of the way these get numbered, block 0 is always the entry. */
! cached_make_edge (edge_cache, ENTRY_BLOCK_PTR, BASIC_BLOCK (0), EDGE_FALLTHRU);
for (i = min; i <= max; ++i)
{
--- 247,254 ----
}
/* By nature of the way these get numbered, block 0 is always the entry. */
! if (min == 0)
! cached_make_edge (edge_cache, ENTRY_BLOCK_PTR, BASIC_BLOCK (0), EDGE_FALLTHRU);
for (i = min; i <= max; ++i)
{
*************** find_basic_blocks (f, nregs, file)
*** 663,680 ****
timevar_pop (TV_CFG);
}
\f
! /* Assume that someone emitted code with control flow instructions to the
! basic block. Update the data structure. */
! void
! find_sub_basic_blocks (bb)
basic_block bb;
{
rtx insn = bb->head;
rtx end = bb->end;
rtx flow_transfer_insn = NULL_RTX;
edge fallthru = NULL;
- basic_block first_bb = bb;
- int i;
if (insn == bb->end)
return;
--- 665,691 ----
timevar_pop (TV_CFG);
}
\f
! /* State of basic block as seen by find_sub_basic_blocks. */
! enum state
! {
! BLOCK_NEW = 0,
! BLOCK_ORIGINAL,
! BLOCK_TO_SPLIT
! };
! #define STATE(bb) (enum state)(size_t)(bb)->aux
! #define SET_STATE(bb, state) (bb)->aux = (void *)(state)
!
! /* Scan basic block BB for possible BB boundaries inside the block
! and create new basic blocks in the progress. */
!
! static void
! find_bb_boundaries (bb)
basic_block bb;
{
rtx insn = bb->head;
rtx end = bb->end;
rtx flow_transfer_insn = NULL_RTX;
edge fallthru = NULL;
if (insn == bb->end)
return;
*************** find_sub_basic_blocks (bb)
*** 694,700 ****
abort ();
break;
! /* On code label, split current basic block. */
case CODE_LABEL:
fallthru = split_block (bb, PREV_INSN (insn));
if (flow_transfer_insn)
--- 705,711 ----
abort ();
break;
! /* On code label, split current basic block. */
case CODE_LABEL:
fallthru = split_block (bb, PREV_INSN (insn));
if (flow_transfer_insn)
*************** find_sub_basic_blocks (bb)
*** 725,731 ****
{
if (GET_CODE (PATTERN (insn)) == ADDR_VEC
|| GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
! abort();
flow_transfer_insn = insn;
}
else if (code == CALL_INSN)
--- 736,742 ----
{
if (GET_CODE (PATTERN (insn)) == ADDR_VEC
|| GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
! abort ();
flow_transfer_insn = insn;
}
else if (code == CALL_INSN)
*************** find_sub_basic_blocks (bb)
*** 764,781 ****
followed by cleanup at fallthru edge, so the outgoing edges may
be dead. */
purge_dead_edges (bb);
/* Now re-scan and wire in all edges. This expect simple (conditional)
jumps at the end of each new basic blocks. */
! make_edges (NULL, first_bb->index, bb->index, 1);
/* Update branch probabilities. Expect only (un)conditional jumps
to be created with only the forward edges. */
! for (i = first_bb->index; i <= bb->index; i++)
{
edge e,f;
basic_block b = BASIC_BLOCK (i);
! if (b != first_bb)
{
b->count = 0;
b->frequency = 0;
--- 775,828 ----
followed by cleanup at fallthru edge, so the outgoing edges may
be dead. */
purge_dead_edges (bb);
+ }
+
+ /* Assume that someone emitted code with control flow instructions to the
+ basic block. Update the data structure. */
+ void
+ find_sub_basic_blocks (blocks)
+ sbitmap blocks;
+ {
+ int i;
+ int min, max;
+
+ for (i = 0; i < n_basic_blocks; i++)
+ {
+ SET_STATE (BASIC_BLOCK (i),
+ TEST_BIT (blocks, i)
+ ? BLOCK_TO_SPLIT : BLOCK_ORIGINAL);
+ }
+
+ for (i = 0; i < n_basic_blocks; i++)
+ {
+ basic_block bb = BASIC_BLOCK (i);
+ if (STATE (bb) == BLOCK_TO_SPLIT)
+ find_bb_boundaries (bb);
+ }
+
+ for (i = 0; i < n_basic_blocks; i++)
+ if (STATE (BASIC_BLOCK (i)) != BLOCK_ORIGINAL)
+ break;
+ min = max = i;
+ for (; i < n_basic_blocks; i++)
+ if (STATE (BASIC_BLOCK (i)) != BLOCK_ORIGINAL)
+ max = i;
+
/* Now re-scan and wire in all edges. This expect simple (conditional)
jumps at the end of each new basic blocks. */
! make_edges (NULL, min, max, 1);
/* Update branch probabilities. Expect only (un)conditional jumps
to be created with only the forward edges. */
! for (i = min; i <= max; i++)
{
edge e,f;
basic_block b = BASIC_BLOCK (i);
!
! if (STATE (b) == BLOCK_ORIGINAL)
! continue;
! if (STATE (b) == BLOCK_NEW)
{
b->count = 0;
b->frequency = 0;
*************** find_sub_basic_blocks (bb)
*** 810,813 ****
--- 857,862 ----
e->count = b->count;
}
}
+ for (i = 0; i < n_basic_blocks; i++)
+ SET_STATE (BASIC_BLOCK (i), 0);
}
Index: cfgrtl.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cfgrtl.c,v
retrieving revision 1.4
diff -c -3 -p -r1.4 cfgrtl.c
*** cfgrtl.c 2001/10/11 03:15:24 1.4
--- cfgrtl.c 2001/10/16 14:54:28
*************** commit_one_edge_insertion (e)
*** 1221,1226 ****
--- 1221,1227 ----
{
rtx before = NULL_RTX, after = NULL_RTX, insns, tmp, last;
basic_block bb;
+ sbitmap b;
/* Pull the insns off the edge now since the edge might go away. */
insns = e->insns;
*************** commit_one_edge_insertion (e)
*** 1314,1320 ****
}
else if (GET_CODE (last) == JUMP_INSN)
abort ();
! find_sub_basic_blocks (bb);
}
/* Update the CFG for all queued instructions. */
--- 1315,1325 ----
}
else if (GET_CODE (last) == JUMP_INSN)
abort ();
!
! b = sbitmap_alloc (n_basic_blocks);
! sbitmap_zero (b);
! SET_BIT (b, bb->index);
! find_sub_basic_blocks (b);
}
/* Update the CFG for all queued instructions. */
Index: recog.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/recog.c,v
retrieving revision 1.128
diff -c -3 -p -r1.128 recog.c
*** recog.c 2001/10/16 04:19:26 1.128
--- recog.c 2001/10/16 14:54:31
*************** split_all_insns (upd_life)
*** 2778,2785 ****
if (changed)
{
! for (i = 0; i < n_basic_blocks; i++)
! find_sub_basic_blocks (BASIC_BLOCK (i));
}
if (changed && upd_life)
--- 2778,2784 ----
if (changed)
{
! find_sub_basic_blocks (blocks);
}
if (changed && upd_life)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-16 7:22 ` Jan Hubicka
@ 2001-10-16 8:25 ` Brad Lucier
2001-10-16 12:46 ` Richard Henderson
1 sibling, 0 replies; 21+ messages in thread
From: Brad Lucier @ 2001-10-16 8:25 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, jh, gcc
>
> > cfg construction : 269.67 (34%) usr 0.32 ( 0%) sys 270.02 (26%) wall
>
> Attached patch should solve this problem.
> It avoids the worst case quadratic scenario on removing edges.
This is great! Here's the new time
cfg construction : 8.18 ( 2%) usr 0.32 ( 0%) sys 8.62 ( 1%) wall
> Bootstrapped/regtested i386.
Great! Now on to testing the next patch.
Brad
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-16 8:01 ` Jan Hubicka
@ 2001-10-16 12:39 ` Brad Lucier
2001-10-16 14:45 ` Jan Hubicka
2001-10-17 12:43 ` Brad Lucier
2001-10-17 13:38 ` Richard Henderson
2 siblings, 1 reply; 21+ messages in thread
From: Brad Lucier @ 2001-10-16 12:39 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, rth, jh, gcc
>
>
> Hi,
> this patch should track down the second problem:
>
> > flow 2 : 267.66 (34%) usr 251.44 (98%) sys 519.08 (49%) wall
> > 32.36 602.16 253.66 9556 26.54 26.54 sbitmap_vector_alloc
>
> It makes find_sub_basic_blocks to operate at bitmap of blocks, instead of just
> one and thus it avoids multiple initializations of edge cache structure.
>
> The possible problem is that I now need to benerate bitmap when calling from
> recog.c. I hope it won't cause perofrmance regression as the bitmap is
> relativly small (compared to edge cache). If it will, I can use bitmap,
> instead of sbitmap, that will add some overhead at split_all_insns side.
>
> I don't have time to finish bootstrap, but the patch appears to work.
> OK assuming the bootstrapping/regtesting i386 passes fluently?
Bootstrapped and regtested on i686-pc-linux-gnu, flow2 is now
flow 2 : 33.92 (16%) usr 0.10 ( 2%) sys 34.00 (16%) wall
So, two home runs in one day! (Two goals in one day? ...)
Here are the new timing details:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
16.66 29.10 29.10 72698858 0.00 0.00 bitmap_operation
12.43 50.81 21.71 13 1670.00 4145.64 calculate_global_regs_live
9.86 68.04 17.23 9305997 0.00 0.00 cached_make_edge
5.57 77.77 9.73 67331 0.14 0.38 try_crossjump_bb
4.03 84.81 7.04 htab_traverse
2.99 90.03 5.22 27855 0.19 0.19 sbitmap_intersection_of_su
ccs
Brad
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-16 7:22 ` Jan Hubicka
2001-10-16 8:25 ` Brad Lucier
@ 2001-10-16 12:46 ` Richard Henderson
1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2001-10-16 12:46 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, gcc
On Tue, Oct 16, 2001 at 04:22:46PM +0200, Jan Hubicka wrote:
> > cfg construction : 269.67 (34%) usr 0.32 ( 0%) sys 270.02 (26%) wall
>
> Attached patch should solve this problem.
> It avoids the worst case quadratic scenario on removing edges.
Ok.
r~
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-16 12:39 ` Brad Lucier
@ 2001-10-16 14:45 ` Jan Hubicka
2001-10-16 16:57 ` Brad Lucier
0 siblings, 1 reply; 21+ messages in thread
From: Jan Hubicka @ 2001-10-16 14:45 UTC (permalink / raw)
To: Brad Lucier; +Cc: Jan Hubicka, gcc-patches, rth, gcc
> Bootstrapped and regtested on i686-pc-linux-gnu, flow2 is now
>
> flow 2 : 33.92 (16%) usr 0.10 ( 2%) sys 34.00 (16%) wall
>
> So, two home runs in one day! (Two goals in one day? ...)
Good news. I've planed those two changes for a while, so don't expect another
two hours tomorrow :)
Just curious, how does the time compare to the older gcc versions?
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 16.66 29.10 29.10 72698858 0.00 0.00 bitmap_operation
> 12.43 50.81 21.71 13 1670.00 4145.64 calculate_global_regs_live
For some purpose, the liveness analyzis by df module appears to work faster than
flow.c in non-patological cases (such as combine.c, where flow liveness takes
about 5-7%, but df.c in my webyzing pass did take about 1%).
I wonder if we can't speed up the flow.c pass considerably.
What other functions (except for bitmap_operation) does have more than 10
millions of calls? Do we run into problems with too much RTL traversal
or it is purely dominated by the dataflow bitmaps?
Honza
> 9.86 68.04 17.23 9305997 0.00 0.00 cached_make_edge
> 5.57 77.77 9.73 67331 0.14 0.38 try_crossjump_bb
> 4.03 84.81 7.04 htab_traverse
> 2.99 90.03 5.22 27855 0.19 0.19 sbitmap_intersection_of_su
> ccs
>
> Brad
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-16 14:45 ` Jan Hubicka
@ 2001-10-16 16:57 ` Brad Lucier
0 siblings, 0 replies; 21+ messages in thread
From: Brad Lucier @ 2001-10-16 16:57 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Brad Lucier, Jan Hubicka, gcc-patches, rth, gcc
> Just curious, how does the time compare to the older gcc versions?
Well, 3.0.1 compiles this file in about 1/3 the time:
dino01% /soft/parallelisme/linux/gcc-3.0.1/lib/gcc-lib/i686-pc-linux-gnu/3.0.1/cc1 -fpic -fomit-frame-pointer -O1 -fno-math-errno -fno-strict-aliasing -mcpu=athlon -march=athlon _num.i
__sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl logb logbf logbl drem dremf dreml __finite ___H__20___num {GC 23738k -> 7627k} {GC 11539k -> 7534k} {GC 9859k -> 7972k} {GC 11631k -> 8916k} {GC 14125k -> 9023k} ___init_proc ____20___num
Execution times (seconds)
garbage collection : 0.61 ( 1%) usr 0.00 ( 0%) sys 0.64 ( 1%) wall
preprocessing : 0.14 ( 0%) usr 0.13 (14%) sys 0.27 ( 0%) wall
lexical analysis : 0.34 ( 1%) usr 0.23 (24%) sys 0.56 ( 1%) wall
parser : 1.10 ( 2%) usr 0.15 (16%) sys 1.27 ( 2%) wall
varconst : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
integration : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
jump : 4.04 ( 6%) usr 0.17 (18%) sys 4.36 ( 6%) wall
CSE : 0.75 ( 1%) usr 0.00 ( 0%) sys 0.77 ( 1%) wall
loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
CSE 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
flow analysis : 11.21 (17%) usr 0.08 ( 8%) sys 12.47 (16%) wall
combiner : 1.07 ( 2%) usr 0.01 ( 1%) sys 1.12 ( 1%) wall
if-conversion : 1.12 ( 2%) usr 0.03 ( 3%) sys 1.22 ( 2%) wall
local alloc : 0.41 ( 1%) usr 0.03 ( 3%) sys 0.56 ( 1%) wall
global alloc : 2.58 ( 4%) usr 0.03 ( 3%) sys 3.14 ( 4%) wall
reload CSE regs : 9.85 (15%) usr 0.01 ( 1%) sys 12.76 (16%) wall
flow 2 : 13.72 (21%) usr 0.01 ( 1%) sys 19.63 (25%) wall
if-conversion 2 : 0.96 ( 1%) usr 0.01 ( 1%) sys 1.19 ( 2%) wall
shorten branches : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.16 ( 0%) wall
reg stack : 15.94 (24%) usr 0.05 ( 5%) sys 17.05 (22%) wall
final : 0.97 ( 1%) usr 0.00 ( 0%) sys 0.97 ( 1%) wall
symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
rest of compilation : 0.45 ( 1%) usr 0.00 ( 0%) sys 0.46 ( 1%) wall
TOTAL : 65.49 0.96 78.82
> I wonder if we can't speed up the flow.c pass considerably.
> What other functions (except for bitmap_operation) does have more than 10
> millions of calls? Do we run into problems with too much RTL traversal
> or it is purely dominated by the dataflow bitmaps?
Here is some detailed information from gprof:
/u/lucier/local/gcc-3.1/lib/gcc-lib/i686-pc-linux-gnu/3.1/cc1 -fpic -fomit-frame-pointer -O1 -fno-math-errno -fno-strict-aliasing -mcpu=athlon -march=athlon _num.i
__sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl logb logbf logbl drem dremf dreml __finite ___H__20___num {GC 25431k -> 7824k} {GC 10943k -> 7882k} {GC 10372k -> 7769k} {GC 13951k -> 8583k} {GC 14265k -> 9195k} ___init_proc {GC 12103k -> 9289k} ____20___num
Execution times (seconds)
garbage collection : 1.10 ( 1%) usr 0.00 ( 0%) sys 1.09 ( 1%) wall
cfg construction : 8.09 ( 4%) usr 0.40 ( 8%) sys 8.56 ( 4%) wall
cfg cleanup : 52.37 (25%) usr 0.03 ( 1%) sys 52.47 (24%) wall
preprocessing : 0.48 ( 0%) usr 0.10 ( 2%) sys 0.50 ( 0%) wall
lexical analysis : 0.73 ( 0%) usr 0.22 ( 5%) sys 0.94 ( 0%) wall
parser : 2.57 ( 1%) usr 0.21 ( 4%) sys 2.94 ( 1%) wall
varconst : 0.11 ( 0%) usr 0.01 ( 0%) sys 0.16 ( 0%) wall
jump : 0.76 ( 0%) usr 0.02 ( 0%) sys 0.72 ( 0%) wall
CSE : 1.76 ( 1%) usr 0.00 ( 0%) sys 1.75 ( 1%) wall
global CSE : 41.29 (20%) usr 0.53 (11%) sys 41.88 (19%) wall
loop analysis : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
flow analysis : 24.23 (11%) usr 0.16 ( 3%) sys 24.38 (11%) wall
combiner : 1.76 ( 1%) usr 0.00 ( 0%) sys 1.78 ( 1%) wall
if-conversion : 1.19 ( 1%) usr 0.04 ( 1%) sys 1.25 ( 1%) wall
local alloc : 0.67 ( 0%) usr 0.00 ( 0%) sys 0.69 ( 0%) wall
global alloc : 4.69 ( 2%) usr 0.05 ( 1%) sys 4.75 ( 2%) wall
reload CSE regs : 9.90 ( 5%) usr 0.01 ( 0%) sys 9.88 ( 5%) wall
flow 2 : 33.92 (16%) usr 0.10 ( 2%) sys 34.00 (16%) wall
if-conversion 2 : 0.98 ( 0%) usr 0.03 ( 1%) sys 1.03 ( 0%) wall
shorten branches : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
reg stack : 22.45 (11%) usr 2.86 (60%) sys 25.31 (12%) wall
final : 0.75 ( 0%) usr 0.01 ( 0%) sys 0.78 ( 0%) wall
rest of compilation : 1.30 ( 1%) usr 0.00 ( 0%) sys 1.38 ( 1%) wall
TOTAL : 211.35 4.78 216.50
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
16.66 29.10 29.10 72698858 0.00 0.00 bitmap_operation
12.43 50.81 21.71 13 1670.00 4145.64 calculate_global_regs_live
9.86 68.04 17.23 9305997 0.00 0.00 cached_make_edge
5.57 77.77 9.73 67331 0.14 0.38 try_crossjump_bb
4.03 84.81 7.04 htab_traverse
2.99 90.03 5.22 27855 0.19 0.19 sbitmap_intersection_of_su
ccs
2.60 94.57 4.54 207472 0.02 0.02 try_forward_edges
2.31 98.61 4.04 6 673.33 976.87 compute_laterin
2.21 102.47 3.86 6270 0.62 0.62 expunge_block
2.18 106.28 3.81 191 19.95 19.95 sbitmap_vector_alloc
2.15 110.04 3.76 6047223 0.00 0.00 rtx_renumbered_equal_p
2.12 113.74 3.70 31 119.35 119.35 find_unreachable_blocks
1.72 116.75 3.01 61978208 0.00 0.00 active_insn_p
1.67 119.66 2.91 9272730 0.00 0.00 make_label_edge
1.67 122.57 2.91 451 6.45 6.45 propagate_freq
1.35 124.93 2.36 15 157.33 158.67 calc_idoms
1.33 127.26 2.33 101420319 0.00 0.00 bitmap_element_link
1.33 129.59 2.33 29217 0.08 0.08 sbitmap_intersection_of_pr
eds
1.11 131.53 1.94 25030880 0.00 0.00 forwarder_block_p
1.01 133.30 1.77 5845902 0.00 0.00 flow_find_cross_jump
0.97 134.99 1.69 9385 0.18 0.20 convert_regs_1
0.95 136.65 1.66 15 110.67 110.67 calc_dfs_tree_nonrec
0.94 138.29 1.64 120 13.67 201.19 make_edges
0.87 139.81 1.52 11128643 0.00 0.00 sbitmap_a_and_b
0.81 141.23 1.42 6407969 0.00 0.00 try_crossjump_to_edge
0.78 142.60 1.37 5627 0.24 0.24 can_delete_label_p
0.69 143.81 1.21 6 201.67 281.03 compute_insert_delete
0.68 144.99 1.18 6 196.67 328.86 compute_earliest
0.64 146.11 1.12 6 186.67 186.67 mark_dfs_back_edges
0.58 147.13 1.02 11904476 0.00 0.00 onlyjump_p
0.56 148.10 0.97 10 97.00 106.99 clear_edges
0.54 149.05 0.95 7458208 0.00 0.00 sbitmap_difference
0.54 150.00 0.95 3 316.67 701.77 flow_loops_find
0.50 150.87 0.87 6 145.00 1027.93 compute_antinout_edge
0.46 151.68 0.81 6 135.00 135.00 create_edge_list
0.42 152.42 0.74 13479620 0.00 0.00 find_reg_note
0.35 153.03 0.61 5 122.00 4570.84 commit_edge_insertions
0.34 153.62 0.59 10292 0.06 0.06 remove_edge
0.33 154.19 0.57 6 95.00 496.89 compute_available
0.33 154.76 0.57 3 190.00 1449.86 estimate_bb_frequencies
0.30 155.29 0.53 11543061 0.00 0.00 find_reg_equal_equiv_note
0.30 155.82 0.53 24708 0.02 0.04 try_combine
0.30 156.34 0.52 12898539 0.00 0.00 side_effects_p
0.26 156.80 0.46 170126 0.00 0.00 find_reloads
0.24 157.22 0.42 4200455 0.00 0.00 ggc_set_mark
0.24 157.64 0.42 1 420.00 2322.97 convert_regs_2
0.22 158.02 0.38 18788 0.02 0.02 remove_fake_successors
0.22 158.40 0.38 6 63.33 71.66 thread_jumps
0.21 158.77 0.37 3 123.33 14103.27 optimize_mode_switching
0.19 159.10 0.33 3 110.00 110.00 compute_alignments
0.18 159.42 0.32 11698103 0.00 0.00 single_set_2
...
-----------------------------------------------
21.71 32.18 13/13 update_life_info [8]
[9] 30.9 21.71 32.18 13 calculate_global_regs_live [9]
28.90 2.36 72205551/72698858 bitmap_operation [14]
0.03 0.82 56808/132182 propagate_block [55]
0.00 0.02 56808/56808 bitmap_equal_p [376]
0.02 0.00 100467/219660 bitmap_copy [279]
0.01 0.00 328395/1374187 bitmap_clear [304]
0.00 0.00 255232/1933812 bitmap_set_bit [282]
0.00 0.00 112039/518311 bitmap_initialize [420]
0.00 0.00 1828/101420319 bitmap_element_link [52]
0.00 0.00 1/1125177 sbitmap_zero [225]
-----------------------------------------------
0.37 41.94 3/3 rest_of_compilation [7]
[10] 24.2 0.37 41.94 3 optimize_mode_switching [10]
0.00 20.56 6/6 pre_edge_lcm [19]
0.01 16.47 3/11 update_life_info [8]
0.12 4.45 1/5 commit_edge_insertions [17]
0.24 0.00 12/191 sbitmap_vector_alloc [38]
0.01 0.01 3/21 sbitmap_vector_ones [177]
0.01 0.01 37484/2679415 note_stores <cycle 7> [115]
0.00 0.01 37243/80229 get_attr_type [327]
0.01 0.01 12/137 sbitmap_vector_zero [171]
0.01 0.00 48/48 make_preds_opaque [581]
0.00 0.00 3/10 allocate_reg_life_data [457]
0.00 0.00 23837/460580 reg_set_to_hard_reg_set [244]
0.00 0.00 14620/91501 gen_sequence [405]
0.00 0.00 3528/60871 recog_memoized_1 [290]
0.00 0.00 28568/474001 asm_noperands [359]
0.00 0.00 18576/3738386 sbitmap_not [184]
0.00 0.00 14606/14838 emit_insn_before [991]
0.00 0.00 5/5 emit_i387_cw_initialization [1113]
0.00 0.00 14/123 insert_insn_on_edge [1058]
0.00 0.00 10/68 assign_386_stack_local [1079]
0.00 0.00 5/2679415 emit_move_insn <cycle 7> [451]
0.00 0.00 24525/24525 reg_dies [1398]
0.00 0.00 14634/108004 end_sequence [1322]
0.00 0.00 14620/108004 start_sequence [1323]
0.00 0.00 9292/9292 new_seginfo [1466]
0.00 0.00 9292/9292 add_seginfo [1465]
0.00 0.00 6/6 free_edge_list [1968]
-----------------------------------------------
0.00 5.32 3/22 update_life_info [8]
0.00 33.69 19/22 rest_of_compilation [7]
[11] 22.3 0.00 39.01 22 cleanup_cfg [11]
0.08 35.19 22/22 try_optimize_cfg [13]
0.02 3.72 31/31 delete_unreachable_blocks [40]
0.00 0.00 22/1239368 timevar_push [153]
0.00 0.00 22/1239368 timevar_pop [164]
0.00 0.00 44/136318 free_EXPR_LIST_list [1315]
-----------------------------------------------
0.00 5.54 1/7 reg_to_stack [23]
0.00 33.26 6/7 rest_of_compilation [7]
[12] 22.2 0.00 38.81 7 life_analysis [12]
0.03 38.42 7/11 update_life_info [8]
0.05 0.14 6/16 init_alias_analysis [95]
0.04 0.06 7/10 delete_noop_moves [179]
0.00 0.03 7/73 free_basic_block_vars [119]
0.01 0.02 3/3 notice_stack_pointer_modification [365]
0.01 0.01 7/10 allocate_reg_life_data [457]
0.00 0.00 7/7 allocate_bb_life_data [716]
0.00 0.00 7/7 mark_regs_live_at_end [1213]
0.00 0.00 6/16 end_alias_analysis [1922]
-----------------------------------------------
0.08 35.19 22/22 cleanup_cfg [11]
[13] 20.2 0.08 35.19 22 try_optimize_cfg [13]
9.73 15.62 67331/67331 try_crossjump_bb [15]
4.54 0.07 207472/207472 try_forward_edges [33]
0.00 2.40 3138/5348 flow_delete_block [36]
0.06 1.72 207472/207472 try_simplify_condjump [59]
0.00 0.41 1099/1099 merge_blocks [107]
0.00 0.38 6/6 remove_fake_edges [111]
0.00 0.18 2011/12100 delete_insn_chain [73]
0.00 0.05 74089/81227 redirect_edge_and_branch [271]
0.01 0.01 95491/25030880 forwarder_block_p [32]
0.01 0.00 85868/11904476 onlyjump_p [64]
0.00 0.00 350/7709 redirect_edge_succ_nodup [604]
0.00 0.00 2011/157599 reg_mentioned_p [422]
0.00 0.00 6/6 add_noreturn_fake_exit_edges [1965]
-----------------------------------------------
0.00 0.00 5/72698858 find_if_case_1 [620]
0.00 0.00 270/72698858 dead_or_predicable [710]
0.01 0.00 18572/72698858 update_equiv_regs [159]
0.02 0.00 56808/72698858 bitmap_equal_p [376]
0.17 0.01 417652/72698858 finish_spills [114]
28.90 2.36 72205551/72698858 calculate_global_regs_live [9]
[14] 18.0 29.10 2.38 72698858 bitmap_operation [14]
2.31 0.00 100601217/101420319 bitmap_element_link [52]
0.07 0.00 3360247/4560586 bitmap_element_allocate [214]
-----------------------------------------------
9.73 15.62 67331/67331 try_optimize_cfg [13]
[15] 14.5 9.73 15.62 67331 try_crossjump_bb [15]
1.42 14.20 6407969/6407969 try_crossjump_to_edge [21]
-----------------------------------------------
0.14 1.88 10/120 find_basic_blocks [46]
1.50 20.63 110/120 find_sub_basic_blocks [18]
[16] 13.8 1.64 22.50 120 make_edges [16]
17.22 0.00 9301682/9305997 cached_make_edge [20]
2.91 0.00 9272730/9272730 make_label_edge [47]
2.19 0.00 110/191 sbitmap_vector_alloc [38]
0.07 0.06 110/137 sbitmap_vector_zero [171]
0.00 0.02 44026/97363 returnjump_p [317]
0.01 0.00 47889/50845 computed_jump_p [493]
0.01 0.00 75082/93306 next_nonnote_insn [556]
0.00 0.00 47889/13479620 find_reg_note [83]
-----------------------------------------------
0.12 4.45 1/5 optimize_mode_switching [10]
0.12 4.45 1/5 convert_regs [27]
0.37 13.35 3/5 thread_prologue_and_epilogue_insns [22]
[17] 13.1 0.61 22.24 5 commit_edge_insertions [17]
0.27 21.96 109/110 find_sub_basic_blocks [18]
0.00 0.02 109/109 commit_one_edge_insertion [472]
-----------------------------------------------
0.00 0.20 1/110 split_all_insns [96]
0.27 21.96 109/110 commit_edge_insertions [17]
[18] 12.8 0.27 22.16 110 find_sub_basic_blocks [18]
1.50 20.63 110/120 make_edges [16]
0.00 0.03 362/362 find_bb_boundaries [339]
0.00 0.00 362/9683 purge_dead_edges [684]
0.00 0.00 544/13479620 find_reg_note [83]
-----------------------------------------------
0.00 20.56 6/6 optimize_mode_switching [10]
[19] 11.8 0.00 20.56 6 pre_edge_lcm [19]
0.87 5.30 6/6 compute_antinout_edge [29]
4.04 1.82 6/6 compute_laterin [30]
0.57 2.41 6/6 compute_available [43]
1.18 0.79 6/6 compute_earliest [56]
1.21 0.48 6/6 compute_insert_delete [60]
1.08 0.00 54/191 sbitmap_vector_alloc [38]
0.81 0.00 6/6 create_edge_list [79]
-----------------------------------------------
0.01 0.00 4315/9305997 make_edge [613]
17.22 0.00 9301682/9305997 make_edges [16]
[20] 9.9 17.23 0.00 9305997 cached_make_edge [20]
-----------------------------------------------
1.42 14.20 6407969/6407969 try_crossjump_bb [15]
[21] 8.9 1.42 14.20 6407969 try_crossjump_to_edge [21]
1.77 6.66 5845902/5845902 flow_find_cross_jump [24]
1.90 2.93 24480556/25030880 forwarder_block_p [32]
0.00 0.33 3663/12100 delete_insn_chain [73]
0.21 0.01 6407841/6407841 outgoing_edges_match [148]
0.21 0.00 3729/10292 remove_edge [93]
0.00 0.18 517/597 split_block [154]
0.00 0.01 3663/4298 make_single_succ_edge [614]
0.00 0.00 3663/10353 gen_jump [689]
0.00 0.00 3663/3736 emit_jump_insn_after [783]
0.00 0.00 3663/93306 next_nonnote_insn [556]
0.00 0.00 3663/13479620 find_reg_note [83]
0.00 0.00 3729/7450804 free_edge [203]
0.00 0.00 3663/9503 block_label [961]
0.00 0.00 66/153 emit_barrier_after [1036]
0.00 0.00 66/254995 gen_rtx_CONST_INT [519]
-----------------------------------------------
0.00 13.72 3/3 rest_of_compilation [7]
[22] 7.9 0.00 13.72 3 thread_prologue_and_epilogue_insns [22]
0.37 13.35 3/5 commit_edge_insertions [17]
0.00 0.00 3/3 gen_prologue [699]
0.00 0.00 3/3 gen_epilogue [1092]
0.00 0.00 6/91501 gen_sequence [405]
0.00 0.00 6/129874 emit_insn [386]
0.00 0.00 6/123 insert_insn_on_edge [1058]
0.00 0.00 6/51353 emit_note [663]
0.00 0.00 3/17313 emit_jump_insn [703]
0.00 0.00 12/108004 end_sequence [1322]
0.00 0.00 6/108004 start_sequence [1323]
0.00 0.00 6/6 record_insns [1976]
0.00 0.00 3/3 ix86_can_use_return_insn_p [2039]
-----------------------------------------------
0.26 12.66 3/3 rest_of_compilation [7]
[23] 7.4 0.26 12.66 3 reg_to_stack [23]
0.00 6.92 1/1 convert_regs [27]
0.00 5.54 1/7 life_analysis [12]
0.19 0.00 1/6 mark_dfs_back_edges [72]
0.01 0.00 1/5 count_or_remove_death_notes [309]
0.00 0.00 1/7 delete_dead_jumptables [351]
0.00 0.00 1/4 alloc_aux_for_blocks [582]
0.00 0.00 105/152060 gen_raw_REG [366]
0.00 0.00 105/138338 gen_rtx_REG [1314]
0.00 0.00 1/9317 get_max_uid [1463]
0.00 0.00 1/262 varray_init [1700]
-----------------------------------------------
1.77 6.66 5845902/5845902 try_crossjump_to_edge [21]
[24] 4.8 1.77 6.66 5845902 flow_find_cross_jump [24]
3.59 0.00 5771506/6047223 rtx_renumbered_equal_p [39]
0.53 0.96 11543012/11543061 find_reg_equal_equiv_note [65]
1.00 0.47 11691804/11904476 onlyjump_p [64]
0.05 0.04 616039/682505 stack_regs_mentioned [205]
0.00 0.02 46872/97363 returnjump_p [317]
0.00 0.00 188/1264901 rtx_equal_p [442]
0.00 0.00 7/39 remove_note [1892]
-----------------------------------------------
0.02 7.72 9/9 rest_of_compilation [7]
[25] 4.4 0.02 7.72 9 if_convert [25]
0.00 5.49 1/11 update_life_info [8]
0.00 1.65 6/15 calculate_dominance_info [35]
0.01 0.40 28166/28166 find_if_header [106]
0.12 0.00 6/191 sbitmap_vector_alloc [38]
0.00 0.04 9/73 free_basic_block_vars [119]
0.01 0.00 1/5 count_or_remove_death_notes [309]
0.00 0.00 1/26 allocate_reg_info [432]
0.00 0.00 1/1125177 sbitmap_zero [225]
0.00 0.00 2/60023 max_reg_num [1344]
0.00 0.00 1/662 sbitmap_alloc [1612]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-15 11:58 ` Brad Lucier
@ 2001-10-16 21:15 ` Zack Weinberg
0 siblings, 0 replies; 21+ messages in thread
From: Zack Weinberg @ 2001-10-16 21:15 UTC (permalink / raw)
To: Brad Lucier; +Cc: jh, gcc
On Mon, Oct 15, 2001 at 01:58:18PM -0500, Brad Lucier wrote:
> >
> > - Can you repeat this test with cc1 linked against a profile-
> > instrumented libc,
>
> Unfortunately, on this box libc_p.a seems to be significantly older than
> libc.a and there are link errors.
Weird. What system is this?
> > and libiberty re-built with CFLAGS="-O2 -g -pg"?
>
> I tried this, too, and it links, but I can't get a gmon.out file from
> running the resulting cc1. Perhaps I just don't know how to do such
> things properly.
Hmm, the only thing I can think of is if you forgot to spec CFLAGS
including -pg when re-linking cc1; leave that out and gcrt1.o isn't
loaded, and the profiler never gets started.
zw
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-16 8:01 ` Jan Hubicka
2001-10-16 12:39 ` Brad Lucier
@ 2001-10-17 12:43 ` Brad Lucier
2001-10-17 13:38 ` Richard Henderson
2 siblings, 0 replies; 21+ messages in thread
From: Brad Lucier @ 2001-10-17 12:43 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, rth, jh, gcc
> I don't have time to finish bootstrap, but the patch appears to work.
> OK assuming the bootstrapping/regtesting i386 passes fluently?
>
> Honza
>
> Tue Oct 16 16:56:33 CEST 2001 Jan Hubicka <jh@suse.cz>
> * basic-block.h (find_sub_basic_blocks): Use sbitmap parameter.
> * cfgbuild.c (find_bb_boundaries): Break out from ...
> (find_sub_basic_blocks): ... here; reorganize to handle multiple
> basic blocks at once.
> * cfgrtl.c (commite_one_edge_insertion): Create an bitmap to pass
> to find_sub_basic_blocks.
> * recog.c (split_all_insns): Update find_sub_basic_blocks call.
I haven't seen this patch approved or acknowledged yet, but if it helps, I
bootstrapped and regtested it on i686-pc-linux-gnu, see
http://gcc.gnu.org/ml/gcc-testresults/2001-10/msg00245.html
Brad
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-16 8:01 ` Jan Hubicka
2001-10-16 12:39 ` Brad Lucier
2001-10-17 12:43 ` Brad Lucier
@ 2001-10-17 13:38 ` Richard Henderson
2001-10-17 14:00 ` Jan Hubicka
2001-10-17 15:38 ` Jan Hubicka
2 siblings, 2 replies; 21+ messages in thread
From: Richard Henderson @ 2001-10-17 13:38 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, gcc
On Tue, Oct 16, 2001 at 05:01:37PM +0200, Jan Hubicka wrote:
> * cfgrtl.c (commite_one_edge_insertion): Create an bitmap to pass
> to find_sub_basic_blocks.
I don't especially like the feature that commite_one_edge_insertion
has to do all sorts of extra work to update one block. Perhaps you
should create a find_many_sub_basic_blocks that takes the sbitmap,
but find_sub_basic_blocks continues to update one block.
r~
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-17 13:38 ` Richard Henderson
@ 2001-10-17 14:00 ` Jan Hubicka
2001-10-17 15:38 ` Jan Hubicka
1 sibling, 0 replies; 21+ messages in thread
From: Jan Hubicka @ 2001-10-17 14:00 UTC (permalink / raw)
To: Richard Henderson, Jan Hubicka, Brad Lucier, gcc-patches, gcc
> On Tue, Oct 16, 2001 at 05:01:37PM +0200, Jan Hubicka wrote:
> > * cfgrtl.c (commite_one_edge_insertion): Create an bitmap to pass
> > to find_sub_basic_blocks.
>
> I don't especially like the feature that commite_one_edge_insertion
> has to do all sorts of extra work to update one block. Perhaps you
Agreed.
> should create a find_many_sub_basic_blocks that takes the sbitmap,
> but find_sub_basic_blocks continues to update one block.
Hmm, there will be some code duplication mess :(, but probably still better.
Honza
>
>
> r~
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-17 13:38 ` Richard Henderson
2001-10-17 14:00 ` Jan Hubicka
@ 2001-10-17 15:38 ` Jan Hubicka
2001-10-17 16:10 ` Richard Henderson
1 sibling, 1 reply; 21+ messages in thread
From: Jan Hubicka @ 2001-10-17 15:38 UTC (permalink / raw)
To: Richard Henderson, Jan Hubicka, Brad Lucier, gcc-patches, gcc
> On Tue, Oct 16, 2001 at 05:01:37PM +0200, Jan Hubicka wrote:
> > * cfgrtl.c (commite_one_edge_insertion): Create an bitmap to pass
> > to find_sub_basic_blocks.
>
> I don't especially like the feature that commite_one_edge_insertion
> has to do all sorts of extra work to update one block. Perhaps you
> should create a find_many_sub_basic_blocks that takes the sbitmap,
> but find_sub_basic_blocks continues to update one block.
>
Done.
Thu Oct 18 01:47:20 CEST 2001 Jan Hubicka <jh@suse.cz>
* basic-block.h (find_sub_basic_blocks): Use sbitmap parameter.
* cfgbuild.c (find_bb_boundaries, compute_outgoing_frequencies):
Break out from ...
(find_sub_basic_blocks): ... here;
(find_many_sub_basic_blocks): New.
* recog.c (split_all_insns): Update find_sub_basic_blocks call.
*** cfgbuild.c.b Thu Oct 18 01:34:54 2001
--- cfgbuild.c Thu Oct 18 01:46:25 2001
*************** static void make_edges PARAMS ((rtx, i
*** 55,60 ****
--- 55,62 ----
static void make_label_edge PARAMS ((sbitmap *, basic_block,
rtx, int));
static void make_eh_edge PARAMS ((sbitmap *, basic_block, rtx));
+ static void find_bb_boundaries PARAMS ((basic_block));
+ static void compute_outgoing_frequencies PARAMS ((basic_block));
/* Count the basic blocks of the function. */
*************** make_edges (label_value_list, min, max,
*** 246,252 ****
}
/* By nature of the way these get numbered, block 0 is always the entry. */
! cached_make_edge (edge_cache, ENTRY_BLOCK_PTR, BASIC_BLOCK (0), EDGE_FALLTHRU);
for (i = min; i <= max; ++i)
{
--- 248,255 ----
}
/* By nature of the way these get numbered, block 0 is always the entry. */
! if (min == 0)
! cached_make_edge (edge_cache, ENTRY_BLOCK_PTR, BASIC_BLOCK (0), EDGE_FALLTHRU);
for (i = min; i <= max; ++i)
{
*************** find_basic_blocks (f, nregs, file)
*** 663,680 ****
timevar_pop (TV_CFG);
}
\f
! /* Assume that someone emitted code with control flow instructions to the
! basic block. Update the data structure. */
! void
! find_sub_basic_blocks (bb)
basic_block bb;
{
rtx insn = bb->head;
rtx end = bb->end;
rtx flow_transfer_insn = NULL_RTX;
edge fallthru = NULL;
- basic_block first_bb = bb;
- int i;
if (insn == bb->end)
return;
--- 666,692 ----
timevar_pop (TV_CFG);
}
\f
! /* State of basic block as seen by find_sub_basic_blocks. */
! enum state
! {
! BLOCK_NEW = 0,
! BLOCK_ORIGINAL,
! BLOCK_TO_SPLIT
! };
! #define STATE(bb) (enum state)(size_t)(bb)->aux
! #define SET_STATE(bb, state) (bb)->aux = (void *)(state)
!
! /* Scan basic block BB for possible BB boundaries inside the block
! and create new basic blocks in the progress. */
!
! static void
! find_bb_boundaries (bb)
basic_block bb;
{
rtx insn = bb->head;
rtx end = bb->end;
rtx flow_transfer_insn = NULL_RTX;
edge fallthru = NULL;
if (insn == bb->end)
return;
*************** find_sub_basic_blocks (bb)
*** 694,700 ****
abort ();
break;
! /* On code label, split current basic block. */
case CODE_LABEL:
fallthru = split_block (bb, PREV_INSN (insn));
if (flow_transfer_insn)
--- 706,712 ----
abort ();
break;
! /* On code label, split current basic block. */
case CODE_LABEL:
fallthru = split_block (bb, PREV_INSN (insn));
if (flow_transfer_insn)
*************** find_sub_basic_blocks (bb)
*** 725,731 ****
{
if (GET_CODE (PATTERN (insn)) == ADDR_VEC
|| GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
! abort();
flow_transfer_insn = insn;
}
else if (code == CALL_INSN)
--- 737,743 ----
{
if (GET_CODE (PATTERN (insn)) == ADDR_VEC
|| GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
! abort ();
flow_transfer_insn = insn;
}
else if (code == CALL_INSN)
*************** find_sub_basic_blocks (bb)
*** 764,781 ****
followed by cleanup at fallthru edge, so the outgoing edges may
be dead. */
purge_dead_edges (bb);
/* Now re-scan and wire in all edges. This expect simple (conditional)
jumps at the end of each new basic blocks. */
! make_edges (NULL, first_bb->index, bb->index, 1);
/* Update branch probabilities. Expect only (un)conditional jumps
to be created with only the forward edges. */
! for (i = first_bb->index; i <= bb->index; i++)
{
! edge e,f;
basic_block b = BASIC_BLOCK (i);
! if (b != first_bb)
{
b->count = 0;
b->frequency = 0;
--- 776,862 ----
followed by cleanup at fallthru edge, so the outgoing edges may
be dead. */
purge_dead_edges (bb);
+ }
+
+ /* Assume that frequency of basic block B is known. Compute frequencies
+ and probabilities of outgoing edges. */
+
+ static void
+ compute_outgoing_frequencies (b)
+ basic_block b;
+ {
+ edge e, f;
+ if (b->succ && b->succ->succ_next && !b->succ->succ_next->succ_next)
+ {
+ rtx note = find_reg_note (b->end, REG_BR_PROB, NULL);
+ int probability;
+
+ if (!note)
+ return;
+ probability = INTVAL (XEXP (find_reg_note (b->end,
+ REG_BR_PROB, NULL), 0));
+ e = BRANCH_EDGE (b);
+ e->probability = probability;
+ e->count = ((b->count * probability + REG_BR_PROB_BASE / 2)
+ / REG_BR_PROB_BASE);
+ f = FALLTHRU_EDGE (b);
+ f->probability = REG_BR_PROB_BASE - probability;
+ f->count = b->count - e->count;
+ }
+ if (b->succ && !b->succ->succ_next)
+ {
+ e = b->succ;
+ e->probability = REG_BR_PROB_BASE;
+ e->count = b->count;
+ }
+ }
+
+ /* Assume that someone emitted code with control flow instructions to the
+ basic block. Update the data structure. */
+
+ void
+ find_many_sub_basic_blocks (blocks)
+ sbitmap blocks;
+ {
+ int i;
+ int min, max;
+
+ for (i = 0; i < n_basic_blocks; i++)
+ {
+ SET_STATE (BASIC_BLOCK (i),
+ TEST_BIT (blocks, i)
+ ? BLOCK_TO_SPLIT : BLOCK_ORIGINAL);
+ }
+
+ for (i = 0; i < n_basic_blocks; i++)
+ {
+ basic_block bb = BASIC_BLOCK (i);
+ if (STATE (bb) == BLOCK_TO_SPLIT)
+ find_bb_boundaries (bb);
+ }
+
+ for (i = 0; i < n_basic_blocks; i++)
+ if (STATE (BASIC_BLOCK (i)) != BLOCK_ORIGINAL)
+ break;
+ min = max = i;
+ for (; i < n_basic_blocks; i++)
+ if (STATE (BASIC_BLOCK (i)) != BLOCK_ORIGINAL)
+ max = i;
/* Now re-scan and wire in all edges. This expect simple (conditional)
jumps at the end of each new basic blocks. */
! make_edges (NULL, min, max, 1);
/* Update branch probabilities. Expect only (un)conditional jumps
to be created with only the forward edges. */
! for (i = min; i <= max; i++)
{
! edge e;
basic_block b = BASIC_BLOCK (i);
!
! if (STATE (b) == BLOCK_ORIGINAL)
! continue;
! if (STATE (b) == BLOCK_NEW)
{
b->count = 0;
b->frequency = 0;
*************** find_sub_basic_blocks (bb)
*** 785,813 ****
b->frequency += EDGE_FREQUENCY (e);
}
}
! if (b->succ && b->succ->succ_next && !b->succ->succ_next->succ_next)
! {
! rtx note = find_reg_note (b->end, REG_BR_PROB, NULL);
! int probability;
! if (!note)
! continue;
! probability = INTVAL (XEXP (find_reg_note (b->end,
! REG_BR_PROB,
! NULL), 0));
! e = BRANCH_EDGE (b);
! e->probability = probability;
! e->count = ((b->count * probability + REG_BR_PROB_BASE / 2)
! / REG_BR_PROB_BASE);
! f = FALLTHRU_EDGE (b);
! f->probability = REG_BR_PROB_BASE - probability;
! f->count = b->count - e->count;
! }
! if (b->succ && !b->succ->succ_next)
{
! e = b->succ;
! e->probability = REG_BR_PROB_BASE;
! e->count = b->count;
}
}
}
--- 866,913 ----
b->frequency += EDGE_FREQUENCY (e);
}
}
! compute_outgoing_frequencies (b);
! }
! for (i = 0; i < n_basic_blocks; i++)
! SET_STATE (BASIC_BLOCK (i), 0);
! }
! /* Like above but for single basic block only. */
!
! void
! find_sub_basic_blocks (bb)
! basic_block bb;
! {
! int i;
! int min, max;
! basic_block next = (bb->index == n_basic_blocks - 1
! ? NULL : BASIC_BLOCK (bb->index + 1));
!
! min = bb->index;
! find_bb_boundaries (bb);
! max = (next ? next->index : n_basic_blocks) - 1;
!
! /* Now re-scan and wire in all edges. This expect simple (conditional)
! jumps at the end of each new basic blocks. */
! make_edges (NULL, min, max, 1);
!
! /* Update branch probabilities. Expect only (un)conditional jumps
! to be created with only the forward edges. */
! for (i = min; i <= max; i++)
! {
! edge e;
! basic_block b = BASIC_BLOCK (i);
!
! if (i != min)
{
! b->count = 0;
! b->frequency = 0;
! for (e = b->pred; e; e=e->pred_next)
! {
! b->count += e->count;
! b->frequency += EDGE_FREQUENCY (e);
! }
}
+ compute_outgoing_frequencies (b);
}
}
*** basic-block.h.b Thu Oct 18 01:35:12 2001
--- basic-block.h Thu Oct 18 01:35:24 2001
*************** extern bool forwarder_block_p PARAMS ((
*** 642,647 ****
--- 642,648 ----
extern bool purge_all_dead_edges PARAMS ((void));
extern bool purge_dead_edges PARAMS ((basic_block));
extern void find_sub_basic_blocks PARAMS ((basic_block));
+ extern void find_many_sub_basic_blocks PARAMS ((sbitmap));
extern bool can_fallthru PARAMS ((basic_block, basic_block));
extern void flow_nodes_print PARAMS ((const char *, const sbitmap,
FILE *));
*** recog.c.b Thu Oct 18 01:34:59 2001
--- recog.c Thu Oct 18 01:35:04 2001
*************** split_all_insns (upd_life)
*** 2778,2785 ****
if (changed)
{
! for (i = 0; i < n_basic_blocks; i++)
! find_sub_basic_blocks (BASIC_BLOCK (i));
}
if (changed && upd_life)
--- 2778,2784 ----
if (changed)
{
! find_many_sub_basic_blocks (blocks);
}
if (changed && upd_life)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Timing information for CFG manipulations
2001-10-17 15:38 ` Jan Hubicka
@ 2001-10-17 16:10 ` Richard Henderson
0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2001-10-17 16:10 UTC (permalink / raw)
To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, gcc
On Thu, Oct 18, 2001 at 12:37:55AM +0200, Jan Hubicka wrote:
> * basic-block.h (find_sub_basic_blocks): Use sbitmap parameter.
> * cfgbuild.c (find_bb_boundaries, compute_outgoing_frequencies):
> Break out from ...
> (find_sub_basic_blocks): ... here;
> (find_many_sub_basic_blocks): New.
> * recog.c (split_all_insns): Update find_sub_basic_blocks call.
Ok.
r~
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2001-10-17 16:10 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-13 20:33 Timing information for CFG manipulations Brad Lucier
2001-10-13 21:53 ` Zack Weinberg
2001-10-15 11:58 ` Brad Lucier
2001-10-16 21:15 ` Zack Weinberg
2001-10-15 12:54 ` Brad Lucier
2001-10-15 14:18 ` Daniel Berlin
2001-10-14 1:18 ` Jan Hubicka
2001-10-14 8:46 ` Brad Lucier
2001-10-14 9:21 ` Daniel Berlin
2001-10-16 7:22 ` Jan Hubicka
2001-10-16 8:25 ` Brad Lucier
2001-10-16 12:46 ` Richard Henderson
2001-10-16 8:01 ` Jan Hubicka
2001-10-16 12:39 ` Brad Lucier
2001-10-16 14:45 ` Jan Hubicka
2001-10-16 16:57 ` Brad Lucier
2001-10-17 12:43 ` Brad Lucier
2001-10-17 13:38 ` Richard Henderson
2001-10-17 14:00 ` Jan Hubicka
2001-10-17 15:38 ` Jan Hubicka
2001-10-17 16:10 ` Richard Henderson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).