From mboxrd@z Thu Jan  1 00:00:00 1970
From: lucier@math.purdue.edu
To: gcc-gnats@gcc.gnu.org
Subject: optimization/4121: split_all_insns performance bottleneck
Date: Fri, 24 Aug 2001 17:06:00 -0000
Message-id: <20010824235922.28496.qmail@sourceware.cygnus.com>
X-SW-Source: 2001-08/msg00652.html
List-Id:

>Number:         4121
>Category:       optimization
>Synopsis:       split_all_insns performance bottleneck
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 24 17:06:01 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     B. Lucier
>Release:        3.1 20010824 (experimental)
>Organization:
>Environment:
sparc-sun-solaris28
>Description:
The following PR can probably be closed now:

http://gcc.gnu.org/ml/gcc-prs/2001-08/msg00184.html

The current problem when compiling

http://www.math.purdue.edu/~lucier/all.i.gz

seems to be that split_all_insns calls find_sub_basic_block each time
the CFG is altered.  Here is the timing and profiling data with

banach-109% /pkgs/gcc-2.96/bin/gcc -v
Reading specs from /pkgs/gcc-2.96/lib/gcc-lib/sparc-sun-solaris2.8/3.1/specs
Configured with: ../configure --prefix=/pkgs/gcc-2.96 --enable-checking=no --enable-languages=c
Thread model: posix
gcc version 3.1 20010824 (experimental)

and the calling options

/pkgs/gcc-2.96/lib/gcc-lib/sparc-sun-solaris2.8/3.1//cc1 -fPIC -O1 -fschedule-insns2 -fno-math-errno -fno-strict-aliasing -mcpu=supersparc -mtune=ultrasparc -Wall -W -Wno-unused all.i

 ___H__20_all {GC 72513k -> 24052k} {GC 32111k -> 25325k} {GC 33960k -> 25286k} {GC 40663k -> 24398k} {GC 37453k -> 27411k} {GC 50777k -> 30006k} ___init_proc ____20_all

Execution times (seconds)
 garbage collection    :   6.33 ( 0%) usr   0.00 ( 0%) sys   6.44 ( 0%) wall
 cfg construction      : 184.32 ( 6%) usr  14.80 (49%) sys 199.06 ( 6%) wall
 cfg cleanup           : 529.64 (16%) usr   0.00 ( 0%) sys 529.69 (16%) wall
 preprocessing         :   2.26 ( 0%) usr   3.03 (10%) sys   4.88 ( 0%) wall
 lexical analysis      :   2.28 ( 0%) usr   6.28 (21%) sys   8.94 ( 0%) wall
 parser                :  16.05 ( 0%) usr   4.14 (14%) sys  20.25 ( 1%) wall
 varconst              :   0.74 ( 0%) usr   0.01 ( 0%) sys   0.62 ( 0%) wall
 jump                  :  13.81 ( 0%) usr   0.03 ( 0%) sys  13.88 ( 0%) wall
 CSE                   :  12.64 ( 0%) usr   0.00 ( 0%) sys  12.75 ( 0%) wall
 loop analysis         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 flow analysis         : 180.06 ( 5%) usr   1.16 ( 4%) sys 181.25 ( 5%) wall
 combiner              :  13.89 ( 0%) usr   0.01 ( 0%) sys  13.94 ( 0%) wall
 if-conversion         :   9.96 ( 0%) usr   0.01 ( 0%) sys  10.00 ( 0%) wall
 local alloc           :   6.35 ( 0%) usr   0.00 ( 0%) sys   6.38 ( 0%) wall
 global alloc          :  23.79 ( 1%) usr   0.62 ( 2%) sys  24.44 ( 1%) wall
 reload CSE regs       :  87.27 ( 3%) usr   0.00 ( 0%) sys  87.25 ( 3%) wall
 flow 2                :2150.64 (65%) usr   0.00 ( 0%) sys2150.69 (64%) wall
 if-conversion 2       :  10.33 ( 0%) usr   0.00 ( 0%) sys  10.31 ( 0%) wall
 scheduling 2          :  41.51 ( 1%) usr   0.00 ( 0%) sys  41.50 ( 1%) wall
 delay branch sched    :   6.23 ( 0%) usr   0.00 ( 0%) sys   6.25 ( 0%) wall
 shorten branches      :   0.72 ( 0%) usr   0.00 ( 0%) sys   0.75 ( 0%) wall
 final                 :  16.81 ( 1%) usr   0.01 ( 0%) sys  16.75 ( 0%) wall
 symout                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
 rest of compilation   :  12.87 ( 0%) usr   0.00 ( 0%) sys  12.88 ( 0%) wall
 TOTAL                 :3328.58            30.15          3359.19
3328.68u 30.31s 56:00.70 99.9%

Flat profile:

Each sample counts as 0.01 seconds.
     % cumulative     self               self    total
  time    seconds  seconds     calls  ms/call  ms/call  name
 43.06     370.73   370.73  69535200     0.01     0.01  make_edge
 12.23     476.05   105.32                              internal_mcount
  8.84     552.15    76.10                              htab_traverse
  3.80     584.88    32.73        29  1128.62  1128.62  mark_critical_edges
  3.57     615.66    30.78      2474    12.44    12.44  propagate_freq
  2.52     637.35    21.69        15  1446.00  1449.33  calc_idoms
  2.51     658.99    21.64    258060     0.08     0.08  try_forward_edges
  2.49     680.42    21.43 120405144     0.00     0.00  bitmap_operation
  2.12     698.68    18.26        15  1217.33  1217.33  calc_dfs_tree_nonrec
  2.08     716.57    17.89         8  2236.25  4988.53  calculate_global_regs_live
  1.93     733.17    16.60        25   664.00   664.00  find_unreachable_blocks
  1.88     749.38    16.21         3  5403.33  9329.03  flow_loops_find
  1.29     760.50    11.12  69475595     0.00     0.00  make_label_edge
  1.12     770.15     9.65         5  1930.00  1930.00  mark_dfs_back_edges
  0.89     777.83     7.68         3  2560.00 15154.13  estimate_bb_frequencies
...

-----------------------------------------------
                0.00    0.28        10/13720          find_basic_blocks [34]
                3.87  386.48     13710/13720          find_sub_basic_blocks [10]
[8]     45.4    3.87  386.76     13720            make_edges [8]
              370.69    0.00  69527073/69535200       make_edge [11]
               11.12    0.00  69475595/69475595       make_label_edge [36]
                3.24    0.00     13712/13730          sbitmap_vector_alloc [53]
                0.05    1.56     13712/13730          sbitmap_vector_zero [71]
                0.03    0.04     57675/877995         for_each_rtx [77]
                0.01    0.02     70795/76239          computed_jump_p [470]
                0.01    0.00    107911/923704         next_nonnote_insn [400]
                0.00    0.00     70795/66160938       find_reg_note [165]
                0.00    0.00     57675/295634         returnjump_p [1163]
-----------------------------------------------
                0.00  390.55         9/9              rest_of_compilation [7]
[9]     45.4    0.00  390.55         9            split_all_insns [9]
                0.03  390.27     13707/13710          find_sub_basic_blocks [10]
                0.03    0.20    387759/508164         split_insn [174]
                0.01    0.00         1/34             compute_bb_for_insn [172]
                0.00    0.00         9/188062848      sbitmap_zero [72]
                0.00    0.00         9/2519           sbitmap_alloc [1383]
                0.00    0.00         1/13762          get_max_uid [1303]
-----------------------------------------------
                0.00    0.09         3/13710          commit_edge_insertions [275]
                0.03  390.27     13707/13710          split_all_insns [9]
[10]    45.4    0.03  390.36     13710            find_sub_basic_blocks [10]
                3.87  386.48     13710/13720          make_edges [8]
                0.01    0.01     13710/27426          purge_dead_edges [458]
                0.00    0.00     13156/66160938       find_reg_note [165]
-----------------------------------------------
                0.04    0.00      8127/69535200       try_crossjump_to_edge [59]
              370.69    0.00  69527073/69535200       make_edges [8]
[11]    43.1  370.73    0.00  69535200            make_edge [11]
-----------------------------------------------
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
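
As a rough illustration of why redoing the edge work on every CFG change
can dominate, here is a toy cost model in Python.  It is not GCC code:
the function names are hypothetical, and it assumes that each
find_sub_basic_blocks call triggers an edge-building pass of roughly
constant size.  The only figures taken from the call graph in this
report are the call counts (13710 calls into make_edges from
find_sub_basic_blocks, and 69527073 make_edge calls spread over 13720
make_edges calls).  The "deferred" variant is only a point of
comparison, not a proposed patch.

```python
# Toy cost model (hypothetical names, not GCC internals).

# Call counts copied from the gprof call graph in this report.
MAKE_EDGE_CALLS_FROM_MAKE_EDGES = 69527073
MAKE_EDGES_CALLS = 13720
REBUILDS_FROM_FIND_SUB_BASIC_BLOCKS = 13710


def average_edges_per_rebuild(edge_calls, rebuild_calls):
    """Average number of make_edge-style calls per edge-building pass."""
    return edge_calls // rebuild_calls


def cost_rebuild_each_time(num_changes, edges_per_rebuild):
    """Edge work when every CFG change triggers a full edge-building pass."""
    total = 0
    for _ in range(num_changes):
        total += edges_per_rebuild  # one complete pass per change
    return total


def cost_deferred(num_changes, edges_per_rebuild):
    """Edge work when changes are only recorded and edges are built once.

    Assumption: a single pass at the end would be enough, which may not
    hold for the real compiler.
    """
    changed = set()
    for block in range(num_changes):
        changed.add(block)  # bookkeeping only, no edge work here
    return edges_per_rebuild if changed else 0


if __name__ == "__main__":
    per_pass = average_edges_per_rebuild(
        MAKE_EDGE_CALLS_FROM_MAKE_EDGES, MAKE_EDGES_CALLS)
    print("average edge calls per pass:", per_pass)
    print("rebuild on every change:",
          cost_rebuild_each_time(REBUILDS_FROM_FIND_SUB_BASIC_BLOCKS, per_pass))
    print("deferred single rebuild:",
          cost_deferred(REBUILDS_FROM_FIND_SUB_BASIC_BLOCKS, per_pass))
```

Under these assumptions the per-change variant grows as (number of
changes) x (edges per pass), while the deferred variant stays at one
pass, which is consistent with make_edge accounting for 43% of the
profiled time.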