public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Timing information for CFG manipulations
@ 2001-10-13 20:33 Brad Lucier
  2001-10-13 21:53 ` Zack Weinberg
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Brad Lucier @ 2001-10-13 20:33 UTC (permalink / raw)
  To: jh; +Cc: Brad Lucier, gcc

Jan:

Here is timing information for today's CVS version of 3.1 configured
with --enable-languages='c' --enable-checking=no and built with
'make bootstrap BOOT_CFLAGS="-O2 -g -pg" BOOT_LDFLAGS="-O2 -g -pg"'
when called on the file at

http://www.math.purdue.edu/~lucier/_num.i.gz

dino01% /u/lucier/local/gcc-3.1/lib/gcc-lib/i686-pc-linux-gnu/3.1/cc1 -fpic -fomit-frame-pointer -O1 -fno-math-errno -fno-strict-aliasing -mcpu=athlon -march=athlon _num.i
 __sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl logb logbf logbl drem dremf dreml __finite ___H__20___num {GC 25431k -> 7824k} {GC 10943k -> 7882k} {GC 10372k -> 7769k} {GC 13951k -> 8583k} {GC 14265k -> 9195k} ___init_proc {GC 12103k -> 9289k} ____20___num
Execution times (seconds)
 garbage collection    :   1.11 ( 0%) usr   0.00 ( 0%) sys   1.12 ( 0%) wall
 cfg construction      : 269.67 (34%) usr   0.32 ( 0%) sys 270.02 (26%) wall
 cfg cleanup           :  51.19 ( 6%) usr   0.01 ( 0%) sys  51.20 ( 5%) wall
 preprocessing         :   0.57 ( 0%) usr   0.10 ( 0%) sys   0.75 ( 0%) wall
 lexical analysis      :   0.75 ( 0%) usr   0.16 ( 0%) sys   0.83 ( 0%) wall
 parser                :   2.61 ( 0%) usr   0.13 ( 0%) sys   2.73 ( 0%) wall
 varconst              :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall
 jump                  :   0.78 ( 0%) usr   0.00 ( 0%) sys   0.80 ( 0%) wall
 CSE                   :   1.75 ( 0%) usr   0.00 ( 0%) sys   1.77 ( 0%) wall
 global CSE            :  41.63 ( 5%) usr   0.50 ( 0%) sys  42.22 ( 4%) wall
 loop analysis         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 flow analysis         :  24.36 ( 3%) usr   0.16 ( 0%) sys  24.52 ( 2%) wall
 combiner              :   1.76 ( 0%) usr   0.00 ( 0%) sys   1.77 ( 0%) wall
 if-conversion         :   1.18 ( 0%) usr   0.04 ( 0%) sys   1.20 ( 0%) wall
 local alloc           :   0.65 ( 0%) usr   0.01 ( 0%) sys   0.67 ( 0%) wall
 global alloc          :   4.71 ( 1%) usr   0.06 ( 0%) sys   4.81 ( 0%) wall
 reload CSE regs       :  10.35 ( 1%) usr   0.01 ( 0%) sys  10.36 ( 1%) wall
 flow 2                : 267.66 (34%) usr 251.44 (98%) sys 519.08 (49%) wall
 if-conversion 2       :   0.99 ( 0%) usr   0.02 ( 0%) sys   1.02 ( 0%) wall
 shorten branches      :   0.24 ( 0%) usr   0.01 ( 0%) sys   0.95 ( 0%) wall
 reg stack             :  22.24 ( 3%) usr   2.84 ( 1%) sys  25.08 ( 2%) wall
 final                 :  87.78 (11%) usr   0.02 ( 0%) sys  87.81 ( 8%) wall
 rest of compilation   :   1.26 ( 0%) usr   0.00 ( 0%) sys   1.25 ( 0%) wall
 TOTAL                 : 793.36           255.83          1050.08

...
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 44.47    348.50   348.50  7450804     0.05     0.05  remove_edge
 32.36    602.16   253.66     9556    26.54    26.54  sbitmap_vector_alloc
  3.79    631.85    29.69 72698858     0.00     0.00  bitmap_operation
  2.79    653.73    21.88       13  1683.08  4184.80  calculate_global_regs_live
  2.15    670.57    16.84  9320420     0.00     0.00  cached_make_edge
  1.28    680.62    10.05    67331     0.15     0.37  try_crossjump_bb
  0.99    688.41     7.79 88484235     0.00     0.00  sbitmap_zero
  0.92    695.63     7.22                             htab_traverse
  0.66    700.82     5.19    27855     0.19     0.19  sbitmap_intersection_of_succs
  0.56    705.23     4.41   207472     0.02     0.02  try_forward_edges
  0.51    709.26     4.03        6   671.67   993.14  compute_laterin
  0.47    712.95     3.69     6270     0.59     0.59  expunge_block
  0.47    716.63     3.68       31   118.71   118.71  find_unreachable_blocks
  0.44    720.09     3.46     9502     0.36     1.18  sbitmap_vector_zero
  0.42    723.39     3.30  6047223     0.00     0.00  rtx_renumbered_equal_p
  0.39    726.44     3.05  9277592     0.00     0.00  make_label_edge
  0.38    729.41     2.97 61978208     0.00     0.00  active_insn_p
  0.38    732.38     2.97      451     6.59     6.59  propagate_freq
  0.33    734.93     2.55    29217     0.09     0.09  sbitmap_intersection_of_preds
  0.31    737.35     2.42       15   161.33   162.00  calc_idoms
  0.26    739.35     2.00 101420319     0.00     0.00  bitmap_element_link
  0.26    741.35     2.00     9485     0.21    30.01  make_edges

...

-----------------------------------------------
                0.00    0.00       3/7450804     redirect_edge_succ_nodup [627]
                0.00    0.00      18/7450804     purge_dead_edges [321]
                0.00    0.00      36/7450804     try_redirect_by_replacing_jump [284]
                0.00    0.00      80/7450804     find_sub_basic_blocks [11]
                0.05    0.00    1066/7450804     merge_blocks_nomove [88]
                0.17    0.00    3729/7450804     try_crossjump_to_edge [27]
                0.25    0.00    5360/7450804     flow_delete_block [41]
              348.02    0.00 7440512/7450804     clear_edges [9]
[8]     44.5  348.50    0.00 7450804         remove_edge [8]
-----------------------------------------------
                0.13  104.41       3/10          free_basic_block_vars [15]
                0.30  243.61       7/10          find_basic_blocks [14]
[9]     44.5    0.43  348.02      10         clear_edges [9]
              348.02    0.00 7440512/7450804     remove_edge [8]
-----------------------------------------------
                0.00    0.30      10/9485        find_basic_blocks [14]
                2.00  282.37    9475/9485        find_sub_basic_blocks [11]
[10]    36.3    2.00  282.67    9485         make_edges [10]
              251.51    0.00    9475/9556        sbitmap_vector_alloc [13]
               16.83    0.00 9316105/9320420     cached_make_edge [26]
                3.45    7.77    9475/9502        sbitmap_vector_zero [28]
                3.05    0.00 9277592/9277592     make_label_edge [48]
                0.00    0.02   44110/100771      returnjump_p [260]
                0.01    0.01   47975/50931       computed_jump_p [399]
                0.01    0.00   75239/93463       next_nonnote_insn [555]
                0.00    0.00   47975/13489246     find_reg_note [82]
-----------------------------------------------
                0.00    3.27     109/9475        commit_edge_insertions [42]
                0.04  281.15    9366/9475        split_all_insns [12]
[11]    36.3    0.04  284.42    9475         find_sub_basic_blocks [11]
                2.00  282.37    9475/9485        make_edges [10]
                0.00    0.03      80/597         split_block [154]
                0.02    0.00    9475/18796       purge_dead_edges [321]
                0.00    0.00      80/7450804     remove_edge [8]
                0.00    0.00    9731/13489246     find_reg_note [82]
                0.00    0.00      87/2981        can_throw_internal [943]
-----------------------------------------------
                0.02  281.37       9/9           rest_of_compilation [7]
[12]    35.9    0.02  281.37       9         split_all_insns [12]
                0.04  281.15    9366/9475        find_sub_basic_blocks [11]
                0.00    0.19  188995/188995      split_insn [163]
                0.00    0.00       9/88484235     sbitmap_zero [31]
                0.00    0.00       9/553         sbitmap_alloc [1629]
-----------------------------------------------
                0.08    0.00       3/9556        flow_loops_find [59]
                0.16    0.00       6/9556        estimate_probability [33]
                0.16    0.00       6/9556        if_convert [25]
                0.32    0.00      12/9556        optimize_mode_switching [19]
                1.43    0.00      54/9556        pre_edge_lcm [24]
              251.51    0.00    9475/9556        make_edges [10]
[13]    32.4  253.66    0.00    9556         sbitmap_vector_alloc [13]
-----------------------------------------------
                0.00  244.39      10/10          rest_of_compilation [7]
[14]    31.2    0.00  244.39      10         find_basic_blocks [14]
                0.30  243.61       7/10          clear_edges [9]
                0.00    0.30      10/9485        make_edges [10]
                0.06    0.01      10/10          find_basic_blocks_1 [237]
                0.05    0.00      10/10          compute_bb_for_insn [279]
                0.04    0.00      10/10          count_basic_blocks [307]
                0.00    0.01      10/14          tidy_fallthru_edges [446]
                0.00    0.00      10/1239368     timevar_push [150]
                0.00    0.00      10/262         varray_init [1703]
                0.00    0.00      10/9317        get_max_uid [1467]
-----------------------------------------------
                0.00   10.02       7/73          life_analysis [18]
                0.00   12.89       9/73          if_convert [25]
                0.00   81.62      57/73          rest_of_compilation [7]
[15]    13.3    0.00  104.53      73         free_basic_block_vars [15]
                0.13  104.41       3/10          clear_edges [9]
-----------------------------------------------
                0.00    5.54       1/11          if_convert [25]
                0.00   16.61       3/11          optimize_mode_switching [19]
                0.00   38.75       7/11          life_analysis [18]
[16]     7.8    0.00   60.89      11         update_life_info [16]
               21.88   32.52      13/13          calculate_global_regs_live [17]
                0.00    5.27       3/22          cleanup_cfg [20]
                0.08    1.09   74687/132182      propagate_block [61]
                0.02    0.00       3/5           count_or_remove_death_notes [312]
                0.01    0.00   74687/219660      bitmap_copy [270]
                0.00    0.00      11/518311      bitmap_initialize [339]
                0.00    0.00      11/1374187     bitmap_clear [258]
-----------------------------------------------
               21.88   32.52      13/13          update_life_info [16]
[17]     6.9   21.88   32.52      13         calculate_global_regs_live [17]
               29.49    2.06 72205551/72698858     bitmap_operation [22]
                0.06    0.83   56808/132182      propagate_block [61]
                0.00    0.02   56808/56808       bitmap_equal_p [383]
                0.02    0.00  100467/219660      bitmap_copy [270]
                0.01    0.00  255232/1933812     bitmap_set_bit [198]
                0.01    0.00  328395/1374187     bitmap_clear [258]
                0.01    0.00  112039/518311      bitmap_initialize [339]
                0.00    0.00    1828/101420319     bitmap_element_link [62]
                0.00    0.00       1/88484235     sbitmap_zero [31]
-----------------------------------------------
                0.00    7.01       1/7           reg_to_stack [29]
                0.00   42.07       6/7           rest_of_compilation [7]
[18]     6.3    0.00   49.08       7         life_analysis [18]
                0.00   38.75       7/11          update_life_info [16]
                0.00   10.02       7/73          free_basic_block_vars [15]
                0.07    0.12       6/16          init_alias_analysis [99]
                0.04    0.06       7/10          delete_noop_moves [191]
                0.00    0.02       3/3           notice_stack_pointer_modification [457]
                0.00    0.00       7/7           allocate_bb_life_data [700]
                0.00    0.00       7/10          allocate_reg_life_data [688]
                0.00    0.00       7/7           mark_regs_live_at_end [1221]
                0.00    0.00       6/16          end_alias_analysis [1920]
-----------------------------------------------
                0.31   39.43       3/3           rest_of_compilation [7]
[19]     5.1    0.31   39.43       3         optimize_mode_switching [19]
                0.00   21.63       6/6           pre_edge_lcm [24]
                0.00   16.61       3/11          update_life_info [16]
                0.11    0.66       1/5           commit_edge_insertions [42]
                0.32    0.00      12/9556        sbitmap_vector_alloc [13]
                0.01    0.02       3/21          sbitmap_vector_ones [151]
                0.01    0.01   37484/2679415     note_stores <cycle 7> [133]
                0.00    0.01      12/9502        sbitmap_vector_zero [28]
                0.00    0.01   37243/80229       get_attr_type [371]
                0.01    0.00   23837/460580      reg_set_to_hard_reg_set [147]
                0.01    0.00    9292/9292        add_seginfo [591]
                0.00    0.00   14620/91501       gen_sequence [370]
                0.00    0.00   28568/474001      asm_noperands [354]
                0.00    0.00    3528/60871       recog_memoized_1 [384]
                0.00    0.00   14620/108004      start_sequence [625]
                0.00    0.00       3/10          allocate_reg_life_data [688]
                0.00    0.00   18576/3738386     sbitmap_not [186]
                0.00    0.00   14606/14838       emit_insn_before [935]
                0.00    0.00       5/5           emit_i387_cw_initialization [1111]
                0.00    0.00      14/123         insert_insn_on_edge [1029]
                0.00    0.00      10/68          assign_386_stack_local [1098]
                0.00    0.00       5/2679415     emit_move_insn <cycle 7> [450]
                0.00    0.00   24525/24525       reg_dies [1412]
                0.00    0.00   14634/108004      end_sequence [1347]
                0.00    0.00    9292/9292        new_seginfo [1469]
                0.00    0.00      48/48          make_preds_opaque [1877]
                0.00    0.00       6/6           free_edge_list [1966]
-----------------------------------------------
                0.00    5.27       3/22          update_life_info [16]
                0.02   33.37      19/22          rest_of_compilation [7]
[20]     4.9    0.02   38.63      22         cleanup_cfg [20]
                0.08   34.85      22/22          try_optimize_cfg [21]
                0.00    3.70      31/31          delete_unreachable_blocks [43]
                0.00    0.00      22/1239368     timevar_push [150]
                0.00    0.00      22/1239368     timevar_pop [160]
                0.00    0.00      44/136318      free_EXPR_LIST_list [1340]
-----------------------------------------------
                0.08   34.85      22/22          cleanup_cfg [20]
[21]     4.5    0.08   34.85      22         try_optimize_cfg [21]
               10.05   15.13   67331/67331       try_crossjump_bb [23]
                4.41    0.07  207472/207472      try_forward_edges [38]
                0.00    2.34    3138/5348        flow_delete_block [41]
                0.07    1.69  207472/207472      try_simplify_condjump [67]
                0.00    0.39       6/6           remove_fake_edges [112]
                0.00    0.39    1099/1099        merge_blocks [113]
                0.00    0.22    2011/12100       delete_insn_chain [74]
                0.00    0.06   74089/81227       redirect_edge_and_branch [254]
                0.01    0.01   95491/25030880     forwarder_block_p [37]
                0.01    0.00   85868/11904476     onlyjump_p [68]
                0.00    0.00    2011/157599      reg_mentioned_p [274]
                0.00    0.00     350/7709        redirect_edge_succ_nodup [627]
                0.00    0.00       6/6           add_noreturn_fake_exit_edges [1962]
-----------------------------------------------
                0.00    0.00       5/72698858     find_if_case_1 [470]
                0.00    0.00     270/72698858     dead_or_predicable [729]
                0.01    0.00   18572/72698858     update_equiv_regs [164]
                0.02    0.00   56808/72698858     bitmap_equal_p [383]
                0.17    0.01  417652/72698858     finish_spills [106]
               29.49    2.06 72205551/72698858     calculate_global_regs_live [17]
[22]     4.1   29.69    2.07 72698858         bitmap_operation [22]
                1.98    0.00 100601217/101420319     bitmap_element_link [62]
                0.09    0.00 3360247/4560586     bitmap_element_allocate [193]
-----------------------------------------------
               10.05   15.13   67331/67331       try_optimize_cfg [21]
[23]     3.2   10.05   15.13   67331         try_crossjump_bb [23]
                1.31   13.82 6407969/6407969     try_crossjump_to_edge [27]
-----------------------------------------------

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-13 20:33 Timing information for CFG manipulations Brad Lucier
@ 2001-10-13 21:53 ` Zack Weinberg
  2001-10-15 11:58   ` Brad Lucier
  2001-10-15 12:54   ` Brad Lucier
  2001-10-14  1:18 ` Jan Hubicka
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 21+ messages in thread
From: Zack Weinberg @ 2001-10-13 21:53 UTC (permalink / raw)
  To: Brad Lucier; +Cc: jh, gcc

On Sat, Oct 13, 2001 at 10:32:06PM -0500, Brad Lucier wrote:
> Jan:
> 
> Here is timing information for today's CVS version of 3.1 configured
> with --enable-languages='c' --enable-checking=no and built with
> 'make bootstrap BOOT_CFLAGS="-O2 -g -pg" BOOT_LDFLAGS="-O2 -g -pg"'
> when called on the file at
> 
> http://www.math.purdue.edu/~lucier/_num.i.gz
...


>  44.47    348.50   348.50  7450804     0.05     0.05  remove_edge
>  32.36    602.16   253.66     9556    26.54    26.54  sbitmap_vector_alloc

...


>                 0.08    0.00       3/9556        flow_loops_find [59]
>                 0.16    0.00       6/9556        estimate_probability [33]
>                 0.16    0.00       6/9556        if_convert [25]
>                 0.32    0.00      12/9556        optimize_mode_switching [19]
>                 1.43    0.00      54/9556        pre_edge_lcm [24]
>               251.51    0.00    9475/9556        make_edges [10]
> [13]    32.4  253.66    0.00    9556         sbitmap_vector_alloc [13]

Two things:

- Can you repeat this test with cc1 linked against a profile-
instrumented libc, and libiberty re-built with CFLAGS="-O2 -g -pg"?
I'm suspicious that lots of time is being spent inside malloc.

- Have you tried Daniel Berlin's ebitmap patches?

zw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-13 20:33 Timing information for CFG manipulations Brad Lucier
  2001-10-13 21:53 ` Zack Weinberg
@ 2001-10-14  1:18 ` Jan Hubicka
  2001-10-14  8:46   ` Brad Lucier
  2001-10-16  7:22 ` Jan Hubicka
  2001-10-16  8:01 ` Jan Hubicka
  3 siblings, 1 reply; 21+ messages in thread
From: Jan Hubicka @ 2001-10-14  1:18 UTC (permalink / raw)
  To: Brad Lucier; +Cc: jh, gcc

Thanks,
>  cfg construction      : 269.67 (34%) usr   0.32 ( 0%) sys 270.02 (26%) wall

This is new problem. I guess it is related to the CFG freeing code...

>  flow 2                : 267.66 (34%) usr 251.44 (98%) sys 519.08 (49%) wall

I will care this one next week (I am at vacation right now).

>  44.47    348.50   348.50  7450804     0.05     0.05  remove_edge

This probably makes clear where we spend all the time.  We are now deallocating
each edge in the CFG before rebuild and this may cause problems on walking the
lists.

I will need to add function like "free outgiong edges"...

>  32.36    602.16   253.66     9556    26.54    26.54  sbitmap_vector_alloc
Do you have any idea, in what state is the ebitmap stuff?

Thanks for the infromation!
Honza

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-14  1:18 ` Jan Hubicka
@ 2001-10-14  8:46   ` Brad Lucier
  2001-10-14  9:21     ` Daniel Berlin
  0 siblings, 1 reply; 21+ messages in thread
From: Brad Lucier @ 2001-10-14  8:46 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, jh, gcc

> >  32.36    602.16   253.66     9556    26.54    26.54  sbitmap_vector_alloc
> Do you have any idea, in what state is the ebitmap stuff?

No.

Brad

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-14  8:46   ` Brad Lucier
@ 2001-10-14  9:21     ` Daniel Berlin
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Berlin @ 2001-10-14  9:21 UTC (permalink / raw)
  To: Brad Lucier; +Cc: Jan Hubicka, gcc

Brad Lucier <lucier@math.purdue.edu> writes:

>> >  32.36    602.16   253.66     9556    26.54    26.54  sbitmap_vector_alloc
>> Do you have any idea, in what state is the ebitmap stuff?
> 
> No.

Don't bother with the ebitmaps. I'm going to submit the new bitmap
stuff today.

> 
> Brad

-- 
"I was walking down the street and saw a sign on a post.  It
said:  "Lost -- $50.  If found, just keep it."
"-Steven Wright

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-13 21:53 ` Zack Weinberg
@ 2001-10-15 11:58   ` Brad Lucier
  2001-10-16 21:15     ` Zack Weinberg
  2001-10-15 12:54   ` Brad Lucier
  1 sibling, 1 reply; 21+ messages in thread
From: Brad Lucier @ 2001-10-15 11:58 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Brad Lucier, jh, gcc

> 
> 
> >  44.47    348.50   348.50  7450804     0.05     0.05  remove_edge
> >  32.36    602.16   253.66     9556    26.54    26.54  sbitmap_vector_alloc
> 
> ...
> 
> 
> >                 0.08    0.00       3/9556        flow_loops_find [59]
> >                 0.16    0.00       6/9556        estimate_probability [33]
> >                 0.16    0.00       6/9556        if_convert [25]
> >                 0.32    0.00      12/9556        optimize_mode_switching [19]
> >                 1.43    0.00      54/9556        pre_edge_lcm [24]
> >               251.51    0.00    9475/9556        make_edges [10]
> > [13]    32.4  253.66    0.00    9556         sbitmap_vector_alloc [13]
> 
> Two things:
> 
> - Can you repeat this test with cc1 linked against a profile-
> instrumented libc,

Unfortunately, on this box libc_p.a seems to be significantly older than
libc.a and there are link errors.

> and libiberty re-built with CFLAGS="-O2 -g -pg"?

I tried this, too, and it links, but I can't get a gmon.out file from
running the resulting cc1.  Perhaps I just don't know how to do such
things properly.

Brad

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-13 21:53 ` Zack Weinberg
  2001-10-15 11:58   ` Brad Lucier
@ 2001-10-15 12:54   ` Brad Lucier
  2001-10-15 14:18     ` Daniel Berlin
  1 sibling, 1 reply; 21+ messages in thread
From: Brad Lucier @ 2001-10-15 12:54 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Brad Lucier, jh, gcc

> - Have you tried Daniel Berlin's ebitmap patches?

Sorry, I missed this the first time, the answer is no.

Dan sent a message yesterday saying he would resubmit them; if I
don't hear from him in a day or so I'll try the previous versions.

Brad

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-15 12:54   ` Brad Lucier
@ 2001-10-15 14:18     ` Daniel Berlin
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Berlin @ 2001-10-15 14:18 UTC (permalink / raw)
  To: Brad Lucier; +Cc: Zack Weinberg, jh, gcc

On Monday, October 15, 2001, at 03:54  PM, Brad Lucier wrote:

>> - Have you tried Daniel Berlin's ebitmap patches?
>
> Sorry, I missed this the first time, the answer is no.
>
> Dan sent a message yesterday saying he would resubmit them; if I
> don't hear from him in a day or so I'll try the previous versions.

Sorry, i've been splitting up the patch, since it requires changes to 
sbitmaps as well.
Should be on the way to gcc-patches now, however.

The previous version would probably be slower and take more memory, 
slower because of it taking more memory.
At least, on large cases.
Cache thrashing fun.
>
> Brad

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-13 20:33 Timing information for CFG manipulations Brad Lucier
  2001-10-13 21:53 ` Zack Weinberg
  2001-10-14  1:18 ` Jan Hubicka
@ 2001-10-16  7:22 ` Jan Hubicka
  2001-10-16  8:25   ` Brad Lucier
  2001-10-16 12:46   ` Richard Henderson
  2001-10-16  8:01 ` Jan Hubicka
  3 siblings, 2 replies; 21+ messages in thread
From: Jan Hubicka @ 2001-10-16  7:22 UTC (permalink / raw)
  To: Brad Lucier, gcc-patches; +Cc: jh, gcc

>  cfg construction      : 269.67 (34%) usr   0.32 ( 0%) sys 270.02 (26%) wall

Attached patch should solve this problem.
It avoids the worst case quadratic scenario on removing edges.

Bootstrapped/regtested i386.

Honza

Index: cfg.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cfg.c,v
retrieving revision 1.10
diff -c -3 -p -r1.10 cfg.c
*** cfg.c	2001/10/11 03:15:24	1.10
--- cfg.c	2001/10/16 14:20:55
*************** struct basic_block_def entry_exit_blocks
*** 117,122 ****
--- 117,123 ----
  };
  
  void debug_flow_info			PARAMS ((void));
+ static void free_edge			PARAMS ((edge));
  \f
  /* Called once at intialization time.  */
  
*************** init_flow ()
*** 142,164 ****
      }
  }
  \f
  /* Free the memory associated with the edge structures.  */
  
  void
  clear_edges ()
  {
    int i;
  
    for (i = 0; i < n_basic_blocks; ++i)
      {
        basic_block bb = BASIC_BLOCK (i);
  
!       while (bb->succ)
! 	remove_edge (bb->succ);
      }
  
!   while (ENTRY_BLOCK_PTR->succ)
!     remove_edge (ENTRY_BLOCK_PTR->succ);
  
    if (n_edges)
      abort ();
--- 143,195 ----
      }
  }
  \f
+ /* Helper function for remove_edge and clear_edges.  Frees edge structure
+    without actually unlinking it from the pred/succ lists.  */
+ 
+ static void
+ free_edge (e)
+      edge e;
+ {
+   n_edges--;
+   memset (e, 0, sizeof (*e));
+   e->succ_next = first_deleted_edge;
+   first_deleted_edge = e;
+ }
+ 
  /* Free the memory associated with the edge structures.  */
  
  void
  clear_edges ()
  {
    int i;
+   edge e;
  
    for (i = 0; i < n_basic_blocks; ++i)
      {
        basic_block bb = BASIC_BLOCK (i);
+       edge e = bb->succ;
  
!       while (e)
! 	{
! 	  edge next = e->succ_next;
! 
! 	  free_edge (e);
! 	  e = next;
! 	}
!       bb->succ = NULL;
!       bb->pred = NULL;
      }
  
!   e = ENTRY_BLOCK_PTR->succ;
!   while (e)
!     {
!       edge next = e->succ_next;
! 
!       free_edge (e);
!       e = next;
!     }
!   EXIT_BLOCK_PTR->pred = NULL;
!   ENTRY_BLOCK_PTR->succ = NULL;
  
    if (n_edges)
      abort ();
*************** remove_edge (e)
*** 335,344 ****
    else
      dest->pred = e->pred_next;
  
!   n_edges--;
!   memset (e, 0, sizeof (*e));
!   e->succ_next = first_deleted_edge;
!   first_deleted_edge = e;
  }
  
  /* Redirect an edge's successor from one block to another.  */
--- 366,372 ----
    else
      dest->pred = e->pred_next;
  
!   free_edge (e);
  }
  
  /* Redirect an edge's successor from one block to another.  */

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-13 20:33 Timing information for CFG manipulations Brad Lucier
                   ` (2 preceding siblings ...)
  2001-10-16  7:22 ` Jan Hubicka
@ 2001-10-16  8:01 ` Jan Hubicka
  2001-10-16 12:39   ` Brad Lucier
                     ` (2 more replies)
  3 siblings, 3 replies; 21+ messages in thread
From: Jan Hubicka @ 2001-10-16  8:01 UTC (permalink / raw)
  To: Brad Lucier, gcc-patches, rth; +Cc: jh, gcc

Hi,
this patch should track down the second problem:

>  flow 2                : 267.66 (34%) usr 251.44 (98%) sys 519.08 (49%) wall
>  32.36    602.16   253.66     9556    26.54    26.54  sbitmap_vector_alloc

It makes find_sub_basic_blocks to operate at bitmap of blocks, instead of just
one and thus it avoids multiple initializations of edge cache structure.

The possible problem is that I now need to benerate bitmap when calling from
recog.c.  I hope it won't cause perofrmance regression as the bitmap is
relativly small (compared to edge cache).  If it will, I can use bitmap,
instead of sbitmap, that will add some overhead at split_all_insns side.

I don't have time to finish bootstrap, but the patch appears to work.
OK assuming the bootstrapping/regtesting i386 passes fluently?

Honza

Tue Oct 16 16:56:33 CEST 2001  Jan Hubicka  <jh@suse.cz>
	* basic-block.h (find_sub_basic_blocks): Use sbitmap parameter.
	* cfgbuild.c (find_bb_boundaries): Break out from ...
	(find_sub_basic_blocks): ... here; reorganize to handle multiple
	basic blocks at once.
	* cfgrtl.c (commite_one_edge_insertion): Create an bitmap to pass
	to find_sub_basic_blocks.
	* recog.c (split_all_insns): Update find_sub_basic_blocks call.


Index: basic-block.h
===================================================================
RCS file: /cvs/gcc/egcs/gcc/basic-block.h,v
retrieving revision 1.123
diff -c -3 -p -r1.123 basic-block.h
*** basic-block.h	2001/10/11 12:43:38	1.123
--- basic-block.h	2001/10/16 14:54:27
*************** extern rtx block_label			PARAMS ((basic_
*** 639,645 ****
  extern bool forwarder_block_p		PARAMS ((basic_block));
  extern bool purge_all_dead_edges	PARAMS ((void));
  extern bool purge_dead_edges		PARAMS ((basic_block));
! extern void find_sub_basic_blocks	PARAMS ((basic_block));
  extern bool can_fallthru		PARAMS ((basic_block, basic_block));
  extern void flow_nodes_print		PARAMS ((const char *, const sbitmap,
  						 FILE *));
--- 639,645 ----
  extern bool forwarder_block_p		PARAMS ((basic_block));
  extern bool purge_all_dead_edges	PARAMS ((void));
  extern bool purge_dead_edges		PARAMS ((basic_block));
! extern void find_sub_basic_blocks	PARAMS ((sbitmap));
  extern bool can_fallthru		PARAMS ((basic_block, basic_block));
  extern void flow_nodes_print		PARAMS ((const char *, const sbitmap,
  						 FILE *));
Index: cfgbuild.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cfgbuild.c,v
retrieving revision 1.7
diff -c -3 -p -r1.7 cfgbuild.c
*** cfgbuild.c	2001/10/11 03:15:24	1.7
--- cfgbuild.c	2001/10/16 14:54:28
*************** static void make_edges			PARAMS ((rtx, i
*** 55,60 ****
--- 55,61 ----
  static void make_label_edge		PARAMS ((sbitmap *, basic_block,
  						 rtx, int));
  static void make_eh_edge		PARAMS ((sbitmap *, basic_block, rtx));
+ static void find_bb_boundaries		PARAMS ((basic_block));
  
  /* Count the basic blocks of the function.  */
  
*************** make_edges (label_value_list, min, max, 
*** 246,252 ****
      }
  
    /* By nature of the way these get numbered, block 0 is always the entry.  */
!   cached_make_edge (edge_cache, ENTRY_BLOCK_PTR, BASIC_BLOCK (0), EDGE_FALLTHRU);
  
    for (i = min; i <= max; ++i)
      {
--- 247,254 ----
      }
  
    /* By nature of the way these get numbered, block 0 is always the entry.  */
!   if (min == 0)
!     cached_make_edge (edge_cache, ENTRY_BLOCK_PTR, BASIC_BLOCK (0), EDGE_FALLTHRU);
  
    for (i = min; i <= max; ++i)
      {
*************** find_basic_blocks (f, nregs, file)
*** 663,680 ****
    timevar_pop (TV_CFG);
  }
  \f
! /* Assume that someone emitted code with control flow instructions to the
!    basic block.  Update the data structure.  */
! void
! find_sub_basic_blocks (bb)
       basic_block bb;
  {
    rtx insn = bb->head;
    rtx end = bb->end;
    rtx flow_transfer_insn = NULL_RTX;
    edge fallthru = NULL;
-   basic_block first_bb = bb;
-   int i;
  
    if (insn == bb->end)
      return;
--- 665,691 ----
    timevar_pop (TV_CFG);
  }
  \f
! /* State of basic block as seen by find_sub_basic_blocks.  */
! enum state
!   {
!     BLOCK_NEW = 0,
!     BLOCK_ORIGINAL,
!     BLOCK_TO_SPLIT
!   };
! #define STATE(bb) (enum state)(size_t)(bb)->aux
! #define SET_STATE(bb, state) (bb)->aux = (void *)(state)
! 
! /* Scan basic block BB for possible BB boundaries inside the block
!    and create new basic blocks in the progress.  */
! 
! static void
! find_bb_boundaries (bb)
       basic_block bb;
  {
    rtx insn = bb->head;
    rtx end = bb->end;
    rtx flow_transfer_insn = NULL_RTX;
    edge fallthru = NULL;
  
    if (insn == bb->end)
      return;
*************** find_sub_basic_blocks (bb)
*** 694,700 ****
  	    abort ();
  	  break;
  
! 	/* On code label, split current basic block.  */
  	case CODE_LABEL:
  	  fallthru = split_block (bb, PREV_INSN (insn));
  	  if (flow_transfer_insn)
--- 705,711 ----
  	    abort ();
  	  break;
  
! 	  /* On code label, split current basic block.  */
  	case CODE_LABEL:
  	  fallthru = split_block (bb, PREV_INSN (insn));
  	  if (flow_transfer_insn)
*************** find_sub_basic_blocks (bb)
*** 725,731 ****
  	    {
  	      if (GET_CODE (PATTERN (insn)) == ADDR_VEC
  		  || GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
! 		abort();
  	      flow_transfer_insn = insn;
  	    }
  	  else if (code == CALL_INSN)
--- 736,742 ----
  	    {
  	      if (GET_CODE (PATTERN (insn)) == ADDR_VEC
  		  || GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
! 		abort ();
  	      flow_transfer_insn = insn;
  	    }
  	  else if (code == CALL_INSN)
*************** find_sub_basic_blocks (bb)
*** 764,781 ****
       followed by cleanup at fallthru edge, so the outgoing edges may
       be dead.  */
    purge_dead_edges (bb);
  
    /* Now re-scan and wire in all edges.  This expect simple (conditional)
       jumps at the end of each new basic blocks.  */
!   make_edges (NULL, first_bb->index, bb->index, 1);
  
    /* Update branch probabilities.  Expect only (un)conditional jumps
       to be created with only the forward edges.  */
!   for (i = first_bb->index; i <= bb->index; i++)
      {
        edge e,f;
        basic_block b = BASIC_BLOCK (i);
!       if (b != first_bb)
  	{
  	  b->count = 0;
  	  b->frequency = 0;
--- 775,828 ----
       followed by cleanup at fallthru edge, so the outgoing edges may
       be dead.  */
    purge_dead_edges (bb);
+ }
+ 
+ /* Assume that someone emitted code with control flow instructions to the
+    basic block.  Update the data structure.  */
  
+ void
+ find_sub_basic_blocks (blocks)
+      sbitmap blocks;
+ {
+   int i;
+   int min, max;
+ 
+   for (i = 0; i < n_basic_blocks; i++)
+     {
+       SET_STATE (BASIC_BLOCK (i),
+ 		 TEST_BIT (blocks, i)
+ 		 ? BLOCK_TO_SPLIT : BLOCK_ORIGINAL);
+     }
+ 
+   for (i = 0; i < n_basic_blocks; i++)
+     {
+       basic_block bb = BASIC_BLOCK (i);
+       if (STATE (bb) == BLOCK_TO_SPLIT)
+ 	find_bb_boundaries (bb);
+     }
+ 
+   for (i = 0; i < n_basic_blocks; i++)
+     if (STATE (BASIC_BLOCK (i)) != BLOCK_ORIGINAL)
+       break;
+   min = max = i;
+   for (; i < n_basic_blocks; i++)
+     if (STATE (BASIC_BLOCK (i)) != BLOCK_ORIGINAL)
+       max = i;
+ 
    /* Now re-scan and wire in all edges.  This expect simple (conditional)
       jumps at the end of each new basic blocks.  */
!   make_edges (NULL, min, max, 1);
  
    /* Update branch probabilities.  Expect only (un)conditional jumps
       to be created with only the forward edges.  */
!   for (i = min; i <= max; i++)
      {
        edge e,f;
        basic_block b = BASIC_BLOCK (i);
! 
!       if (STATE (b) == BLOCK_ORIGINAL)
! 	continue;
!       if (STATE (b) == BLOCK_NEW)
  	{
  	  b->count = 0;
  	  b->frequency = 0;
*************** find_sub_basic_blocks (bb)
*** 810,813 ****
--- 857,862 ----
  	  e->count = b->count;
  	}
      }
+   for (i = 0; i < n_basic_blocks; i++)
+     SET_STATE (BASIC_BLOCK (i), 0);
  }
Index: cfgrtl.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cfgrtl.c,v
retrieving revision 1.4
diff -c -3 -p -r1.4 cfgrtl.c
*** cfgrtl.c	2001/10/11 03:15:24	1.4
--- cfgrtl.c	2001/10/16 14:54:28
*************** commit_one_edge_insertion (e)
*** 1221,1226 ****
--- 1221,1227 ----
  {
    rtx before = NULL_RTX, after = NULL_RTX, insns, tmp, last;
    basic_block bb;
+   sbitmap b;
  
    /* Pull the insns off the edge now since the edge might go away.  */
    insns = e->insns;
*************** commit_one_edge_insertion (e)
*** 1314,1320 ****
      }
    else if (GET_CODE (last) == JUMP_INSN)
      abort ();
!   find_sub_basic_blocks (bb);
  }
  
  /* Update the CFG for all queued instructions.  */
--- 1315,1325 ----
      }
    else if (GET_CODE (last) == JUMP_INSN)
      abort ();
! 
!   b = sbitmap_alloc (n_basic_blocks);
!   sbitmap_zero (b);
!   SET_BIT (b, bb->index);
!   find_sub_basic_blocks (b);
  }
  
  /* Update the CFG for all queued instructions.  */
Index: recog.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/recog.c,v
retrieving revision 1.128
diff -c -3 -p -r1.128 recog.c
*** recog.c	2001/10/16 04:19:26	1.128
--- recog.c	2001/10/16 14:54:31
*************** split_all_insns (upd_life)
*** 2778,2785 ****
  
    if (changed)
      {
!       for (i = 0; i < n_basic_blocks; i++)
! 	find_sub_basic_blocks (BASIC_BLOCK (i));
      }
  
    if (changed && upd_life)
--- 2778,2784 ----
  
    if (changed)
      {
!       find_sub_basic_blocks (blocks);
      }
  
    if (changed && upd_life)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-16  7:22 ` Jan Hubicka
@ 2001-10-16  8:25   ` Brad Lucier
  2001-10-16 12:46   ` Richard Henderson
  1 sibling, 0 replies; 21+ messages in thread
From: Brad Lucier @ 2001-10-16  8:25 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, jh, gcc

> 
> >  cfg construction      : 269.67 (34%) usr   0.32 ( 0%) sys 270.02 (26%) wall
> 
> Attached patch should solve this problem.
> It avoids the worst case quadratic scenario on removing edges.

This is great! Here's the new time

 cfg construction      :   8.18 ( 2%) usr   0.32 ( 0%) sys   8.62 ( 1%) wall

> Bootstrapped/regtested i386.

Great!  Now on to testing the next patch.

Brad

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-16  8:01 ` Jan Hubicka
@ 2001-10-16 12:39   ` Brad Lucier
  2001-10-16 14:45     ` Jan Hubicka
  2001-10-17 12:43   ` Brad Lucier
  2001-10-17 13:38   ` Richard Henderson
  2 siblings, 1 reply; 21+ messages in thread
From: Brad Lucier @ 2001-10-16 12:39 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, rth, jh, gcc

> 
> 
> Hi,
> this patch should track down the second problem:
> 
> >  flow 2                : 267.66 (34%) usr 251.44 (98%) sys 519.08 (49%) wall
> >  32.36    602.16   253.66     9556    26.54    26.54  sbitmap_vector_alloc
> 
> It makes find_sub_basic_blocks to operate at bitmap of blocks, instead of just
> one and thus it avoids multiple initializations of edge cache structure.
> 
> The possible problem is that I now need to benerate bitmap when calling from
> recog.c.  I hope it won't cause perofrmance regression as the bitmap is
> relativly small (compared to edge cache).  If it will, I can use bitmap,
> instead of sbitmap, that will add some overhead at split_all_insns side.
> 
> I don't have time to finish bootstrap, but the patch appears to work.
> OK assuming the bootstrapping/regtesting i386 passes fluently?

Bootstrapped and regtested on i686-pc-linux-gnu, flow2 is now

 flow 2                :  33.92 (16%) usr   0.10 ( 2%) sys  34.00 (16%) wall

So, two home runs in one day!  (Two goals in one day? ...)

Here are the new timing details:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 16.66     29.10    29.10 72698858     0.00     0.00  bitmap_operation
 12.43     50.81    21.71       13  1670.00  4145.64  calculate_global_regs_live
  9.86     68.04    17.23  9305997     0.00     0.00  cached_make_edge
  5.57     77.77     9.73    67331     0.14     0.38  try_crossjump_bb
  4.03     84.81     7.04                             htab_traverse
  2.99     90.03     5.22    27855     0.19     0.19  sbitmap_intersection_of_su
ccs

Brad

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-16  7:22 ` Jan Hubicka
  2001-10-16  8:25   ` Brad Lucier
@ 2001-10-16 12:46   ` Richard Henderson
  1 sibling, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2001-10-16 12:46 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, gcc

On Tue, Oct 16, 2001 at 04:22:46PM +0200, Jan Hubicka wrote:
> >  cfg construction      : 269.67 (34%) usr   0.32 ( 0%) sys 270.02 (26%) wall
> 
> Attached patch should solve this problem.
> It avoids the worst case quadratic scenario on removing edges.

Ok.


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-16 12:39   ` Brad Lucier
@ 2001-10-16 14:45     ` Jan Hubicka
  2001-10-16 16:57       ` Brad Lucier
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Hubicka @ 2001-10-16 14:45 UTC (permalink / raw)
  To: Brad Lucier; +Cc: Jan Hubicka, gcc-patches, rth, gcc

> Bootstrapped and regtested on i686-pc-linux-gnu, flow2 is now
> 
>  flow 2                :  33.92 (16%) usr   0.10 ( 2%) sys  34.00 (16%) wall
> 
> So, two home runs in one day!  (Two goals in one day? ...)
Good news.  I've planed those two changes for a while, so don't expect another
two hours tomorrow :)
Just curious, how does the time compare to the older gcc versions?

> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  16.66     29.10    29.10 72698858     0.00     0.00  bitmap_operation
>  12.43     50.81    21.71       13  1670.00  4145.64  calculate_global_regs_live

For some purpose, the liveness analyzis by df module appears to work faster than
flow.c in non-patological cases (such as combine.c, where flow liveness takes
about 5-7%, but df.c in my webyzing pass did take about 1%).

I wonder if we can't speed up the flow.c pass considerably.
What other functions (except for bitmap_operation) does have more than 10
millions of calls? Do we run into problems with too much RTL traversal
or it is purely dominated by the dataflow bitmaps?

Honza
>   9.86     68.04    17.23  9305997     0.00     0.00  cached_make_edge
>   5.57     77.77     9.73    67331     0.14     0.38  try_crossjump_bb
>   4.03     84.81     7.04                             htab_traverse
>   2.99     90.03     5.22    27855     0.19     0.19  sbitmap_intersection_of_su
> ccs
> 
> Brad

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-16 14:45     ` Jan Hubicka
@ 2001-10-16 16:57       ` Brad Lucier
  0 siblings, 0 replies; 21+ messages in thread
From: Brad Lucier @ 2001-10-16 16:57 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, Jan Hubicka, gcc-patches, rth, gcc

> Just curious, how does the time compare to the older gcc versions?

Well, 3.0.1 compiles this file in about 1/3 the time:

dino01% /soft/parallelisme/linux/gcc-3.0.1/lib/gcc-lib/i686-pc-linux-gnu/3.0.1/cc1  -fpic -fomit-frame-pointer -O1 -fno-math-errno -fno-strict-aliasing -mcpu=athlon -march=athlon _num.i
 __sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl logb logbf logbl drem dremf dreml __finite ___H__20___num {GC 23738k -> 7627k} {GC 11539k -> 7534k} {GC 9859k -> 7972k} {GC 11631k -> 8916k} {GC 14125k -> 9023k} ___init_proc ____20___num
Execution times (seconds)
 garbage collection    :   0.61 ( 1%) usr   0.00 ( 0%) sys   0.64 ( 1%) wall
 preprocessing         :   0.14 ( 0%) usr   0.13 (14%) sys   0.27 ( 0%) wall
 lexical analysis      :   0.34 ( 1%) usr   0.23 (24%) sys   0.56 ( 1%) wall
 parser                :   1.10 ( 2%) usr   0.15 (16%) sys   1.27 ( 2%) wall
 varconst              :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 integration           :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 jump                  :   4.04 ( 6%) usr   0.17 (18%) sys   4.36 ( 6%) wall
 CSE                   :   0.75 ( 1%) usr   0.00 ( 0%) sys   0.77 ( 1%) wall
 loop analysis         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall
 CSE 2                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 flow analysis         :  11.21 (17%) usr   0.08 ( 8%) sys  12.47 (16%) wall
 combiner              :   1.07 ( 2%) usr   0.01 ( 1%) sys   1.12 ( 1%) wall
 if-conversion         :   1.12 ( 2%) usr   0.03 ( 3%) sys   1.22 ( 2%) wall
 local alloc           :   0.41 ( 1%) usr   0.03 ( 3%) sys   0.56 ( 1%) wall
 global alloc          :   2.58 ( 4%) usr   0.03 ( 3%) sys   3.14 ( 4%) wall
 reload CSE regs       :   9.85 (15%) usr   0.01 ( 1%) sys  12.76 (16%) wall
 flow 2                :  13.72 (21%) usr   0.01 ( 1%) sys  19.63 (25%) wall
 if-conversion 2       :   0.96 ( 1%) usr   0.01 ( 1%) sys   1.19 ( 2%) wall
 shorten branches      :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall
 reg stack             :  15.94 (24%) usr   0.05 ( 5%) sys  17.05 (22%) wall
 final                 :   0.97 ( 1%) usr   0.00 ( 0%) sys   0.97 ( 1%) wall
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 rest of compilation   :   0.45 ( 1%) usr   0.00 ( 0%) sys   0.46 ( 1%) wall
 TOTAL                 :  65.49             0.96            78.82

> I wonder if we can't speed up the flow.c pass considerably.
> What other functions (except for bitmap_operation) does have more than 10
> millions of calls? Do we run into problems with too much RTL traversal
> or it is purely dominated by the dataflow bitmaps?

Here is some detailed information from gprof:

/u/lucier/local/gcc-3.1/lib/gcc-lib/i686-pc-linux-gnu/3.1/cc1 -fpic -fomit-frame-pointer -O1 -fno-math-errno -fno-strict-aliasing -mcpu=athlon -march=athlon _num.i
 __sgn __sgnf __sgnl atan2 atan2f atan2l __atan2l fmod fmodf fmodl sqrt sqrtf sqrtl __sqrtl fabs fabsf fabsl __fabsl atan atanf atanl __sgn1l floor floorf floorl ceil ceilf ceill ldexp log1p log1pf log1pl asinh asinhf asinhl acosh acoshf acoshl atanh atanhf atanhl hypot hypotf hypotl logb logbf logbl drem dremf dreml __finite ___H__20___num {GC 25431k -> 7824k} {GC 10943k -> 7882k} {GC 10372k -> 7769k} {GC 13951k -> 8583k} {GC 14265k -> 9195k} ___init_proc {GC 12103k -> 9289k} ____20___num
Execution times (seconds)
 garbage collection    :   1.10 ( 1%) usr   0.00 ( 0%) sys   1.09 ( 1%) wall
 cfg construction      :   8.09 ( 4%) usr   0.40 ( 8%) sys   8.56 ( 4%) wall
 cfg cleanup           :  52.37 (25%) usr   0.03 ( 1%) sys  52.47 (24%) wall
 preprocessing         :   0.48 ( 0%) usr   0.10 ( 2%) sys   0.50 ( 0%) wall
 lexical analysis      :   0.73 ( 0%) usr   0.22 ( 5%) sys   0.94 ( 0%) wall
 parser                :   2.57 ( 1%) usr   0.21 ( 4%) sys   2.94 ( 1%) wall
 varconst              :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall
 jump                  :   0.76 ( 0%) usr   0.02 ( 0%) sys   0.72 ( 0%) wall
 CSE                   :   1.76 ( 1%) usr   0.00 ( 0%) sys   1.75 ( 1%) wall
 global CSE            :  41.29 (20%) usr   0.53 (11%) sys  41.88 (19%) wall
 loop analysis         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
 flow analysis         :  24.23 (11%) usr   0.16 ( 3%) sys  24.38 (11%) wall
 combiner              :   1.76 ( 1%) usr   0.00 ( 0%) sys   1.78 ( 1%) wall
 if-conversion         :   1.19 ( 1%) usr   0.04 ( 1%) sys   1.25 ( 1%) wall
 local alloc           :   0.67 ( 0%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall
 global alloc          :   4.69 ( 2%) usr   0.05 ( 1%) sys   4.75 ( 2%) wall
 reload CSE regs       :   9.90 ( 5%) usr   0.01 ( 0%) sys   9.88 ( 5%) wall
 flow 2                :  33.92 (16%) usr   0.10 ( 2%) sys  34.00 (16%) wall
 if-conversion 2       :   0.98 ( 0%) usr   0.03 ( 1%) sys   1.03 ( 0%) wall
 shorten branches      :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall
 reg stack             :  22.45 (11%) usr   2.86 (60%) sys  25.31 (12%) wall
 final                 :   0.75 ( 0%) usr   0.01 ( 0%) sys   0.78 ( 0%) wall
 rest of compilation   :   1.30 ( 1%) usr   0.00 ( 0%) sys   1.38 ( 1%) wall
 TOTAL                 : 211.35             4.78           216.50


Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 16.66     29.10    29.10 72698858     0.00     0.00  bitmap_operation
 12.43     50.81    21.71       13  1670.00  4145.64  calculate_global_regs_live
  9.86     68.04    17.23  9305997     0.00     0.00  cached_make_edge
  5.57     77.77     9.73    67331     0.14     0.38  try_crossjump_bb
  4.03     84.81     7.04                             htab_traverse
  2.99     90.03     5.22    27855     0.19     0.19  sbitmap_intersection_of_su
ccs
  2.60     94.57     4.54   207472     0.02     0.02  try_forward_edges
  2.31     98.61     4.04        6   673.33   976.87  compute_laterin
  2.21    102.47     3.86     6270     0.62     0.62  expunge_block
  2.18    106.28     3.81      191    19.95    19.95  sbitmap_vector_alloc
  2.15    110.04     3.76  6047223     0.00     0.00  rtx_renumbered_equal_p
  2.12    113.74     3.70       31   119.35   119.35  find_unreachable_blocks
  1.72    116.75     3.01 61978208     0.00     0.00  active_insn_p
  1.67    119.66     2.91  9272730     0.00     0.00  make_label_edge
  1.67    122.57     2.91      451     6.45     6.45  propagate_freq
  1.35    124.93     2.36       15   157.33   158.67  calc_idoms
  1.33    127.26     2.33 101420319     0.00     0.00  bitmap_element_link
  1.33    129.59     2.33    29217     0.08     0.08  sbitmap_intersection_of_pr
eds
  1.11    131.53     1.94 25030880     0.00     0.00  forwarder_block_p
  1.01    133.30     1.77  5845902     0.00     0.00  flow_find_cross_jump
  0.97    134.99     1.69     9385     0.18     0.20  convert_regs_1
  0.95    136.65     1.66       15   110.67   110.67  calc_dfs_tree_nonrec
  0.94    138.29     1.64      120    13.67   201.19  make_edges
  0.87    139.81     1.52 11128643     0.00     0.00  sbitmap_a_and_b
  0.81    141.23     1.42  6407969     0.00     0.00  try_crossjump_to_edge
  0.78    142.60     1.37     5627     0.24     0.24  can_delete_label_p
  0.69    143.81     1.21        6   201.67   281.03  compute_insert_delete
  0.68    144.99     1.18        6   196.67   328.86  compute_earliest
  0.64    146.11     1.12        6   186.67   186.67  mark_dfs_back_edges
  0.58    147.13     1.02 11904476     0.00     0.00  onlyjump_p
  0.56    148.10     0.97       10    97.00   106.99  clear_edges
  0.54    149.05     0.95  7458208     0.00     0.00  sbitmap_difference
  0.54    150.00     0.95        3   316.67   701.77  flow_loops_find
  0.50    150.87     0.87        6   145.00  1027.93  compute_antinout_edge
  0.46    151.68     0.81        6   135.00   135.00  create_edge_list
  0.42    152.42     0.74 13479620     0.00     0.00  find_reg_note
  0.35    153.03     0.61        5   122.00  4570.84  commit_edge_insertions
  0.34    153.62     0.59    10292     0.06     0.06  remove_edge
  0.33    154.19     0.57        6    95.00   496.89  compute_available
  0.33    154.76     0.57        3   190.00  1449.86  estimate_bb_frequencies
  0.30    155.29     0.53 11543061     0.00     0.00  find_reg_equal_equiv_note
  0.30    155.82     0.53    24708     0.02     0.04  try_combine
  0.30    156.34     0.52 12898539     0.00     0.00  side_effects_p
  0.26    156.80     0.46   170126     0.00     0.00  find_reloads
  0.24    157.22     0.42  4200455     0.00     0.00  ggc_set_mark
  0.24    157.64     0.42        1   420.00  2322.97  convert_regs_2
  0.22    158.02     0.38    18788     0.02     0.02  remove_fake_successors
  0.22    158.40     0.38        6    63.33    71.66  thread_jumps
  0.21    158.77     0.37        3   123.33 14103.27  optimize_mode_switching
  0.19    159.10     0.33        3   110.00   110.00  compute_alignments
  0.18    159.42     0.32 11698103     0.00     0.00  single_set_2
...
-----------------------------------------------
               21.71   32.18      13/13          update_life_info [8]
[9]     30.9   21.71   32.18      13         calculate_global_regs_live [9]
               28.90    2.36 72205551/72698858     bitmap_operation [14]
                0.03    0.82   56808/132182      propagate_block [55]
                0.00    0.02   56808/56808       bitmap_equal_p [376]
                0.02    0.00  100467/219660      bitmap_copy [279]
                0.01    0.00  328395/1374187     bitmap_clear [304]
                0.00    0.00  255232/1933812     bitmap_set_bit [282]
                0.00    0.00  112039/518311      bitmap_initialize [420]
                0.00    0.00    1828/101420319     bitmap_element_link [52]
                0.00    0.00       1/1125177     sbitmap_zero [225]
-----------------------------------------------
                0.37   41.94       3/3           rest_of_compilation [7]
[10]    24.2    0.37   41.94       3         optimize_mode_switching [10]
                0.00   20.56       6/6           pre_edge_lcm [19]
                0.01   16.47       3/11          update_life_info [8]
                0.12    4.45       1/5           commit_edge_insertions [17]
                0.24    0.00      12/191         sbitmap_vector_alloc [38]
                0.01    0.01       3/21          sbitmap_vector_ones [177]
                0.01    0.01   37484/2679415     note_stores <cycle 7> [115]
                0.00    0.01   37243/80229       get_attr_type [327]
                0.01    0.01      12/137         sbitmap_vector_zero [171]
                0.01    0.00      48/48          make_preds_opaque [581]
                0.00    0.00       3/10          allocate_reg_life_data [457]
                0.00    0.00   23837/460580      reg_set_to_hard_reg_set [244]
                0.00    0.00   14620/91501       gen_sequence [405]
                0.00    0.00    3528/60871       recog_memoized_1 [290]
                0.00    0.00   28568/474001      asm_noperands [359]
                0.00    0.00   18576/3738386     sbitmap_not [184]
                0.00    0.00   14606/14838       emit_insn_before [991]
                0.00    0.00       5/5           emit_i387_cw_initialization [1113]
                0.00    0.00      14/123         insert_insn_on_edge [1058]
                0.00    0.00      10/68          assign_386_stack_local [1079]
                0.00    0.00       5/2679415     emit_move_insn <cycle 7> [451]
                0.00    0.00   24525/24525       reg_dies [1398]
                0.00    0.00   14634/108004      end_sequence [1322]
                0.00    0.00   14620/108004      start_sequence [1323]
                0.00    0.00    9292/9292        new_seginfo [1466]
                0.00    0.00    9292/9292        add_seginfo [1465]
                0.00    0.00       6/6           free_edge_list [1968]
-----------------------------------------------
                0.00    5.32       3/22          update_life_info [8]
                0.00   33.69      19/22          rest_of_compilation [7]
[11]    22.3    0.00   39.01      22         cleanup_cfg [11]
                0.08   35.19      22/22          try_optimize_cfg [13]
                0.02    3.72      31/31          delete_unreachable_blocks [40]
                0.00    0.00      22/1239368     timevar_push [153]
                0.00    0.00      22/1239368     timevar_pop [164]
                0.00    0.00      44/136318      free_EXPR_LIST_list [1315]
-----------------------------------------------
                0.00    5.54       1/7           reg_to_stack [23]
                0.00   33.26       6/7           rest_of_compilation [7]
[12]    22.2    0.00   38.81       7         life_analysis [12]
                0.03   38.42       7/11          update_life_info [8]
                0.05    0.14       6/16          init_alias_analysis [95]
                0.04    0.06       7/10          delete_noop_moves [179]
                0.00    0.03       7/73          free_basic_block_vars [119]
                0.01    0.02       3/3           notice_stack_pointer_modification [365]
                0.01    0.01       7/10          allocate_reg_life_data [457]
                0.00    0.00       7/7           allocate_bb_life_data [716] 
                0.00    0.00       7/7           mark_regs_live_at_end [1213]
                0.00    0.00       6/16          end_alias_analysis [1922]
-----------------------------------------------
                0.08   35.19      22/22          cleanup_cfg [11]
[13]    20.2    0.08   35.19      22         try_optimize_cfg [13]
                9.73   15.62   67331/67331       try_crossjump_bb [15]
                4.54    0.07  207472/207472      try_forward_edges [33]
                0.00    2.40    3138/5348        flow_delete_block [36]
                0.06    1.72  207472/207472      try_simplify_condjump [59]
                0.00    0.41    1099/1099        merge_blocks [107]
                0.00    0.38       6/6           remove_fake_edges [111]
                0.00    0.18    2011/12100       delete_insn_chain [73]
                0.00    0.05   74089/81227       redirect_edge_and_branch [271]
                0.01    0.01   95491/25030880     forwarder_block_p [32]
                0.01    0.00   85868/11904476     onlyjump_p [64] 
                0.00    0.00     350/7709        redirect_edge_succ_nodup [604]
                0.00    0.00    2011/157599      reg_mentioned_p [422]
                0.00    0.00       6/6           add_noreturn_fake_exit_edges [1965]
-----------------------------------------------
                0.00    0.00       5/72698858     find_if_case_1 [620]
                0.00    0.00     270/72698858     dead_or_predicable [710]
                0.01    0.00   18572/72698858     update_equiv_regs [159]
                0.02    0.00   56808/72698858     bitmap_equal_p [376]
                0.17    0.01  417652/72698858     finish_spills [114]
               28.90    2.36 72205551/72698858     calculate_global_regs_live [9]
[14]    18.0   29.10    2.38 72698858         bitmap_operation [14]
                2.31    0.00 100601217/101420319     bitmap_element_link [52]
                0.07    0.00 3360247/4560586     bitmap_element_allocate [214]
-----------------------------------------------
                9.73   15.62   67331/67331       try_optimize_cfg [13]
[15]    14.5    9.73   15.62   67331         try_crossjump_bb [15]
                1.42   14.20 6407969/6407969     try_crossjump_to_edge [21]
-----------------------------------------------
                0.14    1.88      10/120         find_basic_blocks [46]
                1.50   20.63     110/120         find_sub_basic_blocks [18]
[16]    13.8    1.64   22.50     120         make_edges [16]
               17.22    0.00 9301682/9305997     cached_make_edge [20]
                2.91    0.00 9272730/9272730     make_label_edge [47]
                2.19    0.00     110/191         sbitmap_vector_alloc [38]
                0.07    0.06     110/137         sbitmap_vector_zero [171]
                0.00    0.02   44026/97363       returnjump_p [317]
                0.01    0.00   47889/50845       computed_jump_p [493]
                0.01    0.00   75082/93306       next_nonnote_insn [556]
                0.00    0.00   47889/13479620     find_reg_note [83]
-----------------------------------------------
                0.12    4.45       1/5           optimize_mode_switching [10]
                0.12    4.45       1/5           convert_regs [27]
                0.37   13.35       3/5           thread_prologue_and_epilogue_insns [22]
[17]    13.1    0.61   22.24       5         commit_edge_insertions [17]
                0.27   21.96     109/110         find_sub_basic_blocks [18]
                0.00    0.02     109/109         commit_one_edge_insertion [472]
-----------------------------------------------
                0.00    0.20       1/110         split_all_insns [96]
                0.27   21.96     109/110         commit_edge_insertions [17]
[18]    12.8    0.27   22.16     110         find_sub_basic_blocks [18]
                1.50   20.63     110/120         make_edges [16]
                0.00    0.03     362/362         find_bb_boundaries [339]
                0.00    0.00     362/9683        purge_dead_edges [684]
                0.00    0.00     544/13479620     find_reg_note [83]
-----------------------------------------------
                0.00   20.56       6/6           optimize_mode_switching [10]
[19]    11.8    0.00   20.56       6         pre_edge_lcm [19]
                0.87    5.30       6/6           compute_antinout_edge [29]
                4.04    1.82       6/6           compute_laterin [30]
                0.57    2.41       6/6           compute_available [43]
                1.18    0.79       6/6           compute_earliest [56]
                1.21    0.48       6/6           compute_insert_delete [60]
                1.08    0.00      54/191         sbitmap_vector_alloc [38]
                0.81    0.00       6/6           create_edge_list [79]
-----------------------------------------------
                0.01    0.00    4315/9305997     make_edge [613]
               17.22    0.00 9301682/9305997     make_edges [16]
[20]     9.9   17.23    0.00 9305997         cached_make_edge [20]
-----------------------------------------------
                1.42   14.20 6407969/6407969     try_crossjump_bb [15]
[21]     8.9    1.42   14.20 6407969         try_crossjump_to_edge [21]
                1.77    6.66 5845902/5845902     flow_find_cross_jump [24]
                1.90    2.93 24480556/25030880     forwarder_block_p [32]
                0.00    0.33    3663/12100       delete_insn_chain [73]
                0.21    0.01 6407841/6407841     outgoing_edges_match [148]
                0.21    0.00    3729/10292       remove_edge [93]
                0.00    0.18     517/597         split_block [154]
                0.00    0.01    3663/4298        make_single_succ_edge [614]
                0.00    0.00    3663/10353       gen_jump [689]
                0.00    0.00    3663/3736        emit_jump_insn_after [783]
                0.00    0.00    3663/93306       next_nonnote_insn [556]
                0.00    0.00    3663/13479620     find_reg_note [83]
                0.00    0.00    3729/7450804     free_edge [203]
                0.00    0.00    3663/9503        block_label [961]
                0.00    0.00      66/153         emit_barrier_after [1036]
                0.00    0.00      66/254995      gen_rtx_CONST_INT [519]
-----------------------------------------------
                0.00   13.72       3/3           rest_of_compilation [7]
[22]     7.9    0.00   13.72       3         thread_prologue_and_epilogue_insns [22]
                0.37   13.35       3/5           commit_edge_insertions [17]
                0.00    0.00       3/3           gen_prologue [699]
                0.00    0.00       3/3           gen_epilogue [1092]
                0.00    0.00       6/91501       gen_sequence [405]
                0.00    0.00       6/129874      emit_insn [386]
                0.00    0.00       6/123         insert_insn_on_edge [1058]
                0.00    0.00       6/51353       emit_note [663]
                0.00    0.00       3/17313       emit_jump_insn [703]
                0.00    0.00      12/108004      end_sequence [1322]
                0.00    0.00       6/108004      start_sequence [1323]
                0.00    0.00       6/6           record_insns [1976]
                0.00    0.00       3/3           ix86_can_use_return_insn_p [2039]
-----------------------------------------------
                0.26   12.66       3/3           rest_of_compilation [7]
[23]     7.4    0.26   12.66       3         reg_to_stack [23]
                0.00    6.92       1/1           convert_regs [27]
                0.00    5.54       1/7           life_analysis [12]
                0.19    0.00       1/6           mark_dfs_back_edges [72]
                0.01    0.00       1/5           count_or_remove_death_notes [309]
                0.00    0.00       1/7           delete_dead_jumptables [351]
                0.00    0.00       1/4           alloc_aux_for_blocks [582]
                0.00    0.00     105/152060      gen_raw_REG [366]
                0.00    0.00     105/138338      gen_rtx_REG [1314]
                0.00    0.00       1/9317        get_max_uid [1463]
                0.00    0.00       1/262         varray_init [1700]
-----------------------------------------------
                1.77    6.66 5845902/5845902     try_crossjump_to_edge [21]
[24]     4.8    1.77    6.66 5845902         flow_find_cross_jump [24]
                3.59    0.00 5771506/6047223     rtx_renumbered_equal_p [39]
                0.53    0.96 11543012/11543061     find_reg_equal_equiv_note [65]
                1.00    0.47 11691804/11904476     onlyjump_p [64]
                0.05    0.04  616039/682505      stack_regs_mentioned [205]
                0.00    0.02   46872/97363       returnjump_p [317]
                0.00    0.00     188/1264901     rtx_equal_p [442]
                0.00    0.00       7/39          remove_note [1892]
-----------------------------------------------
                0.02    7.72       9/9           rest_of_compilation [7]
[25]     4.4    0.02    7.72       9         if_convert [25]
                0.00    5.49       1/11          update_life_info [8]
                0.00    1.65       6/15          calculate_dominance_info [35]
                0.01    0.40   28166/28166       find_if_header [106]
                0.12    0.00       6/191         sbitmap_vector_alloc [38]
                0.00    0.04       9/73          free_basic_block_vars [119]
                0.01    0.00       1/5           count_or_remove_death_notes [309]
                0.00    0.00       1/26          allocate_reg_info [432]
                0.00    0.00       1/1125177     sbitmap_zero [225]
                0.00    0.00       2/60023       max_reg_num [1344]
                0.00    0.00       1/662         sbitmap_alloc [1612]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-15 11:58   ` Brad Lucier
@ 2001-10-16 21:15     ` Zack Weinberg
  0 siblings, 0 replies; 21+ messages in thread
From: Zack Weinberg @ 2001-10-16 21:15 UTC (permalink / raw)
  To: Brad Lucier; +Cc: jh, gcc

On Mon, Oct 15, 2001 at 01:58:18PM -0500, Brad Lucier wrote:
> > 
> > - Can you repeat this test with cc1 linked against a profile-
> > instrumented libc,
> 
> Unfortunately, on this box libc_p.a seems to be significantly older than
> libc.a and there are link errors.

Weird.  What system is this?

> > and libiberty re-built with CFLAGS="-O2 -g -pg"?
> 
> I tried this, too, and it links, but I can't get a gmon.out file from
> running the resulting cc1.  Perhaps I just don't know how to do such
> things properly.

Hmm, the only thing I can think of is if you forgot to spec CFLAGS
including -pg when re-linking cc1; leave that out and gcrt1.o isn't
loaded, and the profiler never gets started.

zw

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-16  8:01 ` Jan Hubicka
  2001-10-16 12:39   ` Brad Lucier
@ 2001-10-17 12:43   ` Brad Lucier
  2001-10-17 13:38   ` Richard Henderson
  2 siblings, 0 replies; 21+ messages in thread
From: Brad Lucier @ 2001-10-17 12:43 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, rth, jh, gcc

> I don't have time to finish bootstrap, but the patch appears to work.
> OK assuming the bootstrapping/regtesting i386 passes fluently?
> 
> Honza
> 
> Tue Oct 16 16:56:33 CEST 2001  Jan Hubicka  <jh@suse.cz>
> 	* basic-block.h (find_sub_basic_blocks): Use sbitmap parameter.
> 	* cfgbuild.c (find_bb_boundaries): Break out from ...
> 	(find_sub_basic_blocks): ... here; reorganize to handle multiple
> 	basic blocks at once.
> 	* cfgrtl.c (commite_one_edge_insertion): Create an bitmap to pass
> 	to find_sub_basic_blocks.
> 	* recog.c (split_all_insns): Update find_sub_basic_blocks call.

I haven't seen this patch approved or acknowledged yet, but if it helps, I
bootstrapped and regtested it on i686-pc-linux-gnu, see

http://gcc.gnu.org/ml/gcc-testresults/2001-10/msg00245.html

Brad

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-16  8:01 ` Jan Hubicka
  2001-10-16 12:39   ` Brad Lucier
  2001-10-17 12:43   ` Brad Lucier
@ 2001-10-17 13:38   ` Richard Henderson
  2001-10-17 14:00     ` Jan Hubicka
  2001-10-17 15:38     ` Jan Hubicka
  2 siblings, 2 replies; 21+ messages in thread
From: Richard Henderson @ 2001-10-17 13:38 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, gcc

On Tue, Oct 16, 2001 at 05:01:37PM +0200, Jan Hubicka wrote:
> 	* cfgrtl.c (commite_one_edge_insertion): Create an bitmap to pass
> 	to find_sub_basic_blocks.

I don't especially like the feature that commite_one_edge_insertion
has to do all sorts of extra work to update one block.  Perhaps you
should create a find_many_sub_basic_blocks that takes the sbitmap,
but find_sub_basic_blocks continues to update one block.


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-17 13:38   ` Richard Henderson
@ 2001-10-17 14:00     ` Jan Hubicka
  2001-10-17 15:38     ` Jan Hubicka
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Hubicka @ 2001-10-17 14:00 UTC (permalink / raw)
  To: Richard Henderson, Jan Hubicka, Brad Lucier, gcc-patches, gcc

> On Tue, Oct 16, 2001 at 05:01:37PM +0200, Jan Hubicka wrote:
> > 	* cfgrtl.c (commite_one_edge_insertion): Create an bitmap to pass
> > 	to find_sub_basic_blocks.
> 
> I don't especially like the feature that commite_one_edge_insertion
> has to do all sorts of extra work to update one block.  Perhaps you
Agreed.
> should create a find_many_sub_basic_blocks that takes the sbitmap,
> but find_sub_basic_blocks continues to update one block.
Hmm, there will be some code duplication mess :(, but probably still better.

Honza
> 
> 
> r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-17 13:38   ` Richard Henderson
  2001-10-17 14:00     ` Jan Hubicka
@ 2001-10-17 15:38     ` Jan Hubicka
  2001-10-17 16:10       ` Richard Henderson
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Hubicka @ 2001-10-17 15:38 UTC (permalink / raw)
  To: Richard Henderson, Jan Hubicka, Brad Lucier, gcc-patches, gcc

> On Tue, Oct 16, 2001 at 05:01:37PM +0200, Jan Hubicka wrote:
> > 	* cfgrtl.c (commite_one_edge_insertion): Create an bitmap to pass
> > 	to find_sub_basic_blocks.
> 
> I don't especially like the feature that commite_one_edge_insertion
> has to do all sorts of extra work to update one block.  Perhaps you
> should create a find_many_sub_basic_blocks that takes the sbitmap,
> but find_sub_basic_blocks continues to update one block.
> 
Done.

Thu Oct 18 01:47:20 CEST 2001  Jan Hubicka  <jh@suse.cz>
	* basic-block.h (find_sub_basic_blocks): Use sbitmap parameter.
	* cfgbuild.c (find_bb_boundaries, compute_outgoing_frequencies):
	Break out from ...
	(find_sub_basic_blocks): ... here;
	(find_many_sub_basic_blocks): New.
	* recog.c (split_all_insns): Update find_sub_basic_blocks call.

*** cfgbuild.c.b	Thu Oct 18 01:34:54 2001
--- cfgbuild.c	Thu Oct 18 01:46:25 2001
*************** static void make_edges			PARAMS ((rtx, i
*** 55,60 ****
--- 55,62 ----
  static void make_label_edge		PARAMS ((sbitmap *, basic_block,
  						 rtx, int));
  static void make_eh_edge		PARAMS ((sbitmap *, basic_block, rtx));
+ static void find_bb_boundaries		PARAMS ((basic_block));
+ static void compute_outgoing_frequencies PARAMS ((basic_block));
  
  /* Count the basic blocks of the function.  */
  
*************** make_edges (label_value_list, min, max, 
*** 246,252 ****
      }
  
    /* By nature of the way these get numbered, block 0 is always the entry.  */
!   cached_make_edge (edge_cache, ENTRY_BLOCK_PTR, BASIC_BLOCK (0), EDGE_FALLTHRU);
  
    for (i = min; i <= max; ++i)
      {
--- 248,255 ----
      }
  
    /* By nature of the way these get numbered, block 0 is always the entry.  */
!   if (min == 0)
!     cached_make_edge (edge_cache, ENTRY_BLOCK_PTR, BASIC_BLOCK (0), EDGE_FALLTHRU);
  
    for (i = min; i <= max; ++i)
      {
*************** find_basic_blocks (f, nregs, file)
*** 663,680 ****
    timevar_pop (TV_CFG);
  }
  \f
! /* Assume that someone emitted code with control flow instructions to the
!    basic block.  Update the data structure.  */
! void
! find_sub_basic_blocks (bb)
       basic_block bb;
  {
    rtx insn = bb->head;
    rtx end = bb->end;
    rtx flow_transfer_insn = NULL_RTX;
    edge fallthru = NULL;
-   basic_block first_bb = bb;
-   int i;
  
    if (insn == bb->end)
      return;
--- 666,692 ----
    timevar_pop (TV_CFG);
  }
  \f
! /* State of basic block as seen by find_sub_basic_blocks.  */
! enum state
!   {
!     BLOCK_NEW = 0,
!     BLOCK_ORIGINAL,
!     BLOCK_TO_SPLIT
!   };
! #define STATE(bb) (enum state)(size_t)(bb)->aux
! #define SET_STATE(bb, state) (bb)->aux = (void *)(state)
! 
! /* Scan basic block BB for possible BB boundaries inside the block
!    and create new basic blocks in the progress.  */
! 
! static void
! find_bb_boundaries (bb)
       basic_block bb;
  {
    rtx insn = bb->head;
    rtx end = bb->end;
    rtx flow_transfer_insn = NULL_RTX;
    edge fallthru = NULL;
  
    if (insn == bb->end)
      return;
*************** find_sub_basic_blocks (bb)
*** 694,700 ****
  	    abort ();
  	  break;
  
! 	/* On code label, split current basic block.  */
  	case CODE_LABEL:
  	  fallthru = split_block (bb, PREV_INSN (insn));
  	  if (flow_transfer_insn)
--- 706,712 ----
  	    abort ();
  	  break;
  
! 	  /* On code label, split current basic block.  */
  	case CODE_LABEL:
  	  fallthru = split_block (bb, PREV_INSN (insn));
  	  if (flow_transfer_insn)
*************** find_sub_basic_blocks (bb)
*** 725,731 ****
  	    {
  	      if (GET_CODE (PATTERN (insn)) == ADDR_VEC
  		  || GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
! 		abort();
  	      flow_transfer_insn = insn;
  	    }
  	  else if (code == CALL_INSN)
--- 737,743 ----
  	    {
  	      if (GET_CODE (PATTERN (insn)) == ADDR_VEC
  		  || GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
! 		abort ();
  	      flow_transfer_insn = insn;
  	    }
  	  else if (code == CALL_INSN)
*************** find_sub_basic_blocks (bb)
*** 764,781 ****
       followed by cleanup at fallthru edge, so the outgoing edges may
       be dead.  */
    purge_dead_edges (bb);
  
    /* Now re-scan and wire in all edges.  This expect simple (conditional)
       jumps at the end of each new basic blocks.  */
!   make_edges (NULL, first_bb->index, bb->index, 1);
  
    /* Update branch probabilities.  Expect only (un)conditional jumps
       to be created with only the forward edges.  */
!   for (i = first_bb->index; i <= bb->index; i++)
      {
!       edge e,f;
        basic_block b = BASIC_BLOCK (i);
!       if (b != first_bb)
  	{
  	  b->count = 0;
  	  b->frequency = 0;
--- 776,862 ----
       followed by cleanup at fallthru edge, so the outgoing edges may
       be dead.  */
    purge_dead_edges (bb);
+ }
+ 
+ /*  Assume that frequency of basic block B is known.  Compute frequencies
+     and probabilities of outgoing edges.  */
+ 
+ static void
+ compute_outgoing_frequencies (b)
+      basic_block b;
+ {
+   edge e, f;
+   if (b->succ && b->succ->succ_next && !b->succ->succ_next->succ_next)
+     {
+       rtx note = find_reg_note (b->end, REG_BR_PROB, NULL);
+       int probability;
+ 
+       if (!note)
+ 	return;
+       probability = INTVAL (XEXP (find_reg_note (b->end,
+ 						 REG_BR_PROB, NULL), 0));
+       e = BRANCH_EDGE (b);
+       e->probability = probability;
+       e->count = ((b->count * probability + REG_BR_PROB_BASE / 2)
+ 		  / REG_BR_PROB_BASE);
+       f = FALLTHRU_EDGE (b);
+       f->probability = REG_BR_PROB_BASE - probability;
+       f->count = b->count - e->count;
+     }
+   if (b->succ && !b->succ->succ_next)
+     {
+       e = b->succ;
+       e->probability = REG_BR_PROB_BASE;
+       e->count = b->count;
+     }
+ }
+ 
+ /* Assume that someone emitted code with control flow instructions to the
+    basic block.  Update the data structure.  */
+ 
+ void
+ find_many_sub_basic_blocks (blocks)
+      sbitmap blocks;
+ {
+   int i;
+   int min, max;
+ 
+   for (i = 0; i < n_basic_blocks; i++)
+     {
+       SET_STATE (BASIC_BLOCK (i),
+ 		 TEST_BIT (blocks, i)
+ 		 ? BLOCK_TO_SPLIT : BLOCK_ORIGINAL);
+     }
+ 
+   for (i = 0; i < n_basic_blocks; i++)
+     {
+       basic_block bb = BASIC_BLOCK (i);
+       if (STATE (bb) == BLOCK_TO_SPLIT)
+ 	find_bb_boundaries (bb);
+     }
+ 
+   for (i = 0; i < n_basic_blocks; i++)
+     if (STATE (BASIC_BLOCK (i)) != BLOCK_ORIGINAL)
+       break;
+   min = max = i;
+   for (; i < n_basic_blocks; i++)
+     if (STATE (BASIC_BLOCK (i)) != BLOCK_ORIGINAL)
+       max = i;
  
    /* Now re-scan and wire in all edges.  This expect simple (conditional)
       jumps at the end of each new basic blocks.  */
!   make_edges (NULL, min, max, 1);
  
    /* Update branch probabilities.  Expect only (un)conditional jumps
       to be created with only the forward edges.  */
!   for (i = min; i <= max; i++)
      {
!       edge e;
        basic_block b = BASIC_BLOCK (i);
! 
!       if (STATE (b) == BLOCK_ORIGINAL)
! 	continue;
!       if (STATE (b) == BLOCK_NEW)
  	{
  	  b->count = 0;
  	  b->frequency = 0;
*************** find_sub_basic_blocks (bb)
*** 785,813 ****
  	      b->frequency += EDGE_FREQUENCY (e);
  	    }
  	}
!       if (b->succ && b->succ->succ_next && !b->succ->succ_next->succ_next)
! 	{
! 	  rtx note = find_reg_note (b->end, REG_BR_PROB, NULL);
! 	  int probability;
  
! 	  if (!note)
! 	    continue;
! 	  probability = INTVAL (XEXP (find_reg_note (b->end,
! 						     REG_BR_PROB,
! 						     NULL), 0));
! 	  e = BRANCH_EDGE (b);
! 	  e->probability = probability;
! 	  e->count = ((b->count * probability + REG_BR_PROB_BASE / 2)
! 		      / REG_BR_PROB_BASE);
! 	  f = FALLTHRU_EDGE (b);
! 	  f->probability = REG_BR_PROB_BASE - probability;
! 	  f->count = b->count - e->count;
! 	}
!       if (b->succ && !b->succ->succ_next)
  	{
! 	  e = b->succ;
! 	  e->probability = REG_BR_PROB_BASE;
! 	  e->count = b->count;
  	}
      }
  }
--- 866,913 ----
  	      b->frequency += EDGE_FREQUENCY (e);
  	    }
  	}
!       compute_outgoing_frequencies (b);
!     }
!   for (i = 0; i < n_basic_blocks; i++)
!     SET_STATE (BASIC_BLOCK (i), 0);
! }
  
! /* Like above but for single basic block only.  */
! 
! void
! find_sub_basic_blocks (bb)
!     basic_block bb;
! {
!   int i;
!   int min, max;
!   basic_block next = (bb->index == n_basic_blocks - 1
! 		      ? NULL : BASIC_BLOCK (bb->index + 1));
! 
!   min = bb->index;
!   find_bb_boundaries (bb);
!   max = (next ? next->index : n_basic_blocks) - 1;
! 
!   /* Now re-scan and wire in all edges.  This expect simple (conditional)
!      jumps at the end of each new basic blocks.  */
!   make_edges (NULL, min, max, 1);
! 
!   /* Update branch probabilities.  Expect only (un)conditional jumps
!      to be created with only the forward edges.  */
!   for (i = min; i <= max; i++)
!     {
!       edge e;
!       basic_block b = BASIC_BLOCK (i);
! 
!       if (i != min)
  	{
! 	  b->count = 0;
! 	  b->frequency = 0;
! 	  for (e = b->pred; e; e=e->pred_next)
! 	    {
! 	      b->count += e->count;
! 	      b->frequency += EDGE_FREQUENCY (e);
! 	    }
  	}
+       compute_outgoing_frequencies (b);
      }
  }
*** basic-block.h.b	Thu Oct 18 01:35:12 2001
--- basic-block.h	Thu Oct 18 01:35:24 2001
*************** extern bool forwarder_block_p		PARAMS ((
*** 642,647 ****
--- 642,648 ----
  extern bool purge_all_dead_edges	PARAMS ((void));
  extern bool purge_dead_edges		PARAMS ((basic_block));
  extern void find_sub_basic_blocks	PARAMS ((basic_block));
+ extern void find_many_sub_basic_blocks	PARAMS ((sbitmap));
  extern bool can_fallthru		PARAMS ((basic_block, basic_block));
  extern void flow_nodes_print		PARAMS ((const char *, const sbitmap,
  						 FILE *));
*** recog.c.b	Thu Oct 18 01:34:59 2001
--- recog.c	Thu Oct 18 01:35:04 2001
*************** split_all_insns (upd_life)
*** 2778,2785 ****
  
    if (changed)
      {
!       for (i = 0; i < n_basic_blocks; i++)
! 	find_sub_basic_blocks (BASIC_BLOCK (i));
      }
  
    if (changed && upd_life)
--- 2778,2784 ----
  
    if (changed)
      {
!       find_many_sub_basic_blocks (blocks);
      }
  
    if (changed && upd_life)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Timing information for CFG manipulations
  2001-10-17 15:38     ` Jan Hubicka
@ 2001-10-17 16:10       ` Richard Henderson
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Henderson @ 2001-10-17 16:10 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, gcc-patches, gcc

On Thu, Oct 18, 2001 at 12:37:55AM +0200, Jan Hubicka wrote:
> 	* basic-block.h (find_sub_basic_blocks): Use sbitmap parameter.
> 	* cfgbuild.c (find_bb_boundaries, compute_outgoing_frequencies):
> 	Break out from ...
> 	(find_sub_basic_blocks): ... here;
> 	(find_many_sub_basic_blocks): New.
> 	* recog.c (split_all_insns): Update find_sub_basic_blocks call.

Ok.


r~

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2001-10-17 16:10 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-13 20:33 Timing information for CFG manipulations Brad Lucier
2001-10-13 21:53 ` Zack Weinberg
2001-10-15 11:58   ` Brad Lucier
2001-10-16 21:15     ` Zack Weinberg
2001-10-15 12:54   ` Brad Lucier
2001-10-15 14:18     ` Daniel Berlin
2001-10-14  1:18 ` Jan Hubicka
2001-10-14  8:46   ` Brad Lucier
2001-10-14  9:21     ` Daniel Berlin
2001-10-16  7:22 ` Jan Hubicka
2001-10-16  8:25   ` Brad Lucier
2001-10-16 12:46   ` Richard Henderson
2001-10-16  8:01 ` Jan Hubicka
2001-10-16 12:39   ` Brad Lucier
2001-10-16 14:45     ` Jan Hubicka
2001-10-16 16:57       ` Brad Lucier
2001-10-17 12:43   ` Brad Lucier
2001-10-17 13:38   ` Richard Henderson
2001-10-17 14:00     ` Jan Hubicka
2001-10-17 15:38     ` Jan Hubicka
2001-10-17 16:10       ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).