From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 12346 invoked by alias); 22 Jul 2006 19:30:59 -0000
Received: (qmail 12276 invoked by alias); 22 Jul 2006 19:30:44 -0000
Date: Sat, 22 Jul 2006 19:30:00 -0000
Message-ID: <20060722193044.12275.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug rtl-optimization/28071] [4.1/4.2 regression] A file that can not be compiled in reasonable time/space
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "hubicka at ucw dot cz"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2006-07/txt/msg01713.txt.bz2
List-Id:

------- Comment #14 from hubicka at ucw dot cz 2006-07-22 19:30 -------
Subject: Re: [4.1/4.2 regression] A file that can not be compiled in reasonable time/space

Hi,
with the attached patch I can cure the regmove quadratic behaviour and the
time report is not so unreasonable now:

 gnu_dev_major gnu_dev_minor gnu_dev_makedev max min f fx fy fz add addl addr sub subl subr mul mull mulr divl ipow fi
Analyzing compilation unitPerforming intraprocedural optimizations
Assembling functions:
 max min add addl addr sub subl subr mul mull mulr divl ipow fz fy fx f fi {GC 126177k -> 85112k} {GC 327625k -> 39474k}
Execution times (seconds)
 garbage collection    :   0.83 ( 0%) usr   0.00 ( 0%) sys   0.82 ( 0%) wall       0 kB ( 0%) ggc
 callgraph construction:   0.16 ( 0%) usr   0.02 ( 1%) sys   0.16 ( 0%) wall    1147 kB ( 0%) ggc
 callgraph optimization:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     533 kB ( 0%) ggc
 ipa reference         :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa type escape       :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 trivially dead code   :   0.45 ( 0%) usr   0.00 ( 0%) sys   0.42 ( 0%) wall       0 kB ( 0%) ggc
 life analysis         :  21.38 ( 3%) usr   0.02 ( 1%) sys  21.39 ( 3%) wall    1120 kB ( 0%) ggc
 life info update      :   0.54 ( 0%) usr   0.00 ( 0%) sys   0.61 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis        :   0.87 ( 0%) usr   0.00 ( 0%) sys   0.89 ( 0%) wall    4266 kB ( 1%) ggc
 register scan         :   0.42 ( 0%) usr   0.00 ( 0%) sys   0.40 ( 0%) wall     150 kB ( 0%) ggc
 rebuild jump labels   :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing         :   0.27 ( 0%) usr   0.06 ( 2%) sys   0.36 ( 0%) wall     471 kB ( 0%) ggc
 lexical analysis      :   0.04 ( 0%) usr   0.05 ( 2%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 parser                :   0.12 ( 0%) usr   0.03 ( 1%) sys   0.17 ( 0%) wall    3207 kB ( 1%) ggc
 inline heuristics     :  15.14 ( 2%) usr   0.01 ( 0%) sys  15.26 ( 2%) wall    1486 kB ( 0%) ggc
 integration           :  21.35 ( 3%) usr   0.12 ( 4%) sys  21.71 ( 3%) wall   33445 kB ( 8%) ggc
 tree gimplify         :   0.18 ( 0%) usr   0.01 ( 0%) sys   0.19 ( 0%) wall    3341 kB ( 1%) ggc
 tree eh               :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree CFG construction :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1338 kB ( 0%) ggc
 tree CFG cleanup      :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      20 kB ( 0%) ggc
 tree VRP              :   0.38 ( 0%) usr   0.01 ( 0%) sys   0.42 ( 0%) wall      11 kB ( 0%) ggc
 tree copy propagation :   0.23 ( 0%) usr   0.01 ( 0%) sys   0.28 ( 0%) wall     222 kB ( 0%) ggc
 tree store copy prop  :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.14 ( 0%) wall       4 kB ( 0%) ggc
 tree find ref. vars   :   0.10 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall    8137 kB ( 2%) ggc
 tree PTA              :   1.29 ( 0%) usr   0.04 ( 1%) sys   1.36 ( 0%) wall      57 kB ( 0%) ggc
 tree alias analysis   :   1.89 ( 0%) usr   0.20 ( 7%) sys   2.10 ( 0%) wall       0 kB ( 0%) ggc
 tree PHI insertion    :   1.68 ( 0%) usr   0.01 ( 0%) sys   1.70 ( 0%) wall      18 kB ( 0%) ggc
 tree SSA rewrite      :   0.62 ( 0%) usr   0.04 ( 1%) sys   0.65 ( 0%) wall   17084 kB ( 4%) ggc
 tree SSA other        :   0.48 ( 0%) usr   0.08 ( 3%) sys   0.56 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA incremental  :   1.20 ( 0%) usr   0.00 ( 0%) sys   1.24 ( 0%) wall       0 kB ( 0%) ggc
 tree operand scan     :   1.48 ( 0%) usr   0.34 (11%) sys   1.93 ( 0%) wall   15634 kB ( 4%) ggc
 dominator optimization:   1.05 ( 0%) usr   0.05 ( 2%) sys   1.05 ( 0%) wall    2698 kB ( 1%) ggc
 tree SRA              :   1.05 ( 0%) usr   0.09 ( 3%) sys   1.15 ( 0%) wall   24835 kB ( 6%) ggc
 tree STORE-CCP        :   0.09 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall       4 kB ( 0%) ggc
 tree CCP              :   0.51 ( 0%) usr   0.02 ( 1%) sys   0.56 ( 0%) wall     154 kB ( 0%) ggc
 tree reassociation    :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.11 ( 0%) wall       0 kB ( 0%) ggc
 tree PRE              : 296.46 (45%) usr   0.49 (16%) sys 298.81 (45%) wall   19481 kB ( 5%) ggc
 tree FRE              :   0.96 ( 0%) usr   0.05 ( 2%) sys   1.00 ( 0%) wall    7991 kB ( 2%) ggc
 tree forward propagate:   0.04 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree conservative DCE :   0.54 ( 0%) usr   0.00 ( 0%) sys   0.54 ( 0%) wall       0 kB ( 0%) ggc
 tree aggressive DCE   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE              :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       8 kB ( 0%) ggc
 tree SSA uncprop      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree SSA to normal    :  27.19 ( 4%) usr   0.01 ( 0%) sys  27.33 ( 4%) wall      22 kB ( 0%) ggc
 tree rename SSA copies:   0.15 ( 0%) usr   0.01 ( 0%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 expand                :   2.96 ( 0%) usr   0.09 ( 3%) sys   3.05 ( 0%) wall   24095 kB ( 6%) ggc
 jump                  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 CSE                   :   1.87 ( 0%) usr   0.00 ( 0%) sys   1.88 ( 0%) wall     118 kB ( 0%) ggc
 global CSE            :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall       0 kB ( 0%) ggc
 CPROP 1               :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall    1620 kB ( 0%) ggc
 PRE                   :  21.36 ( 3%) usr   0.01 ( 0%) sys  21.41 ( 3%) wall     200 kB ( 0%) ggc
 CPROP 2               :   0.31 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall     390 kB ( 0%) ggc
 bypass jumps          :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.37 ( 0%) wall     389 kB ( 0%) ggc
 CSE 2                 :   1.05 ( 0%) usr   0.00 ( 0%) sys   1.07 ( 0%) wall      72 kB ( 0%) ggc
 branch prediction     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       1 kB ( 0%) ggc
 flow analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 combiner              :   0.87 ( 0%) usr   0.01 ( 0%) sys   0.88 ( 0%) wall    1745 kB ( 0%) ggc
 if-conversion         :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       3 kB ( 0%) ggc
 regmove               :  21.69 ( 3%) usr   0.02 ( 1%) sys  21.78 ( 3%) wall       2 kB ( 0%) ggc
 local alloc           :   7.60 ( 1%) usr   0.00 ( 0%) sys   7.62 ( 1%) wall    1480 kB ( 0%) ggc
 global alloc          :  16.47 ( 2%) usr   0.35 (12%) sys  16.91 ( 3%) wall   16915 kB ( 4%) ggc
 reload CSE regs       : 107.52 (16%) usr   0.15 ( 5%) sys 108.55 (16%) wall    4783 kB ( 1%) ggc
 flow 2                :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall     225 kB ( 0%) ggc
 peephole 2            :   0.20 ( 0%) usr   0.00 ( 0%) sys   0.20 ( 0%) wall       0 kB ( 0%) ggc
 rename registers      :   0.41 ( 0%) usr   0.00 ( 0%) sys   0.39 ( 0%) wall       0 kB ( 0%) ggc
 scheduling 2          :  75.09 (11%) usr   0.53 (18%) sys  76.86 (12%) wall  206227 kB (51%) ggc
 machine dep reorg     :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.35 ( 0%) wall       0 kB ( 0%) ggc
 reorder blocks        :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall      15 kB ( 0%) ggc
 reg stack             :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall      37 kB ( 0%) ggc
 final                 :   0.66 ( 0%) usr   0.02 ( 1%) sys   0.74 ( 0%) wall    1156 kB ( 0%) ggc
 TOTAL                 : 659.57             2.99           668.06             407297 kB

PRE is somewhat slow, but I will leave this to Danny. For scheduling the
situation is quite clear - we have huge basic blocks and produce a huge
number of dependencies.
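
A minimal sketch of why huge basic blocks hurt the scheduler, assuming a naive dependence builder in which every instruction may depend on every earlier instruction in the same block. The function name and the block sizes are hypothetical; this is not the GCC scheduler's code, only an illustration of the growth rate:

```python
def count_dependence_pairs(block_sizes):
    """Worst-case number of dependence pairs, summed over basic blocks.

    Assumption: within one block of n instructions, each instruction can
    depend on each earlier one, giving n * (n - 1) / 2 pairs.
    """
    return sum(n * (n - 1) // 2 for n in block_sizes)


# The same 100,000 instructions, split into blocks in two different ways.
many_small_blocks = [100] * 1000
one_huge_block = [100_000]

small_pairs = count_dependence_pairs(many_small_blocks)
huge_pairs = count_dependence_pairs(one_huge_block)

print("1000 blocks of 100 instructions:", small_pairs)
print("1 block of 100000 instructions: ", huge_pairs)
print("ratio:", huge_pairs // small_pairs)
```

Under this assumption the worst case grows with the square of the block size, so the number of dependence records depends far more on how large the blocks are than on how many instructions there are in total.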
For reload, I am also not really surprised, since the code produced is a
regalloc nightmare and reload manages to create very large bitmaps, which
results in quadratic behaviour.

Since Danny asked for allocpools:

Alloc-pool Kind        Pools  Allocated      Peak        Leak
-------------------------------------------------------------
Value sets                18    2230608   1929200           0
Bitmap sets               18       9504      8432           0
Value set nodes           18    2032208   1768488           0
Binary tree nodes         18    1291320    783992           0
value                     48    3875872   1246744           0
et_occ pool              127     238144     48040           0
et_node pool             127     159680     36024           0
Reference tree nodes      18    1430880   1437864           0
Expression tree nodes     18     426240    428840           0
elt_list                  48    3639816    397672           0
List tree nodes           18     511488    516880           0
elt_loc_list              48   14186784    975240           0
Comparison tree nodes     18       4520      4832           0
original_copy             26         48        88           0
Constraint pool          108    4335432   1501136           0
Unary tree nodes          18         96       968           0
Variable info pool       108   12261704   4550848           0
Constraint edges         108       2112       496           0
operand entry pool        36        512       248           0
cselib_val_list           48   11627616    974144           0
-------------------------------------------------------------
Total                    994   58264584

Memory consumption is now dominated by scheduler's dependency info:

ggc-common.c:193 (ggc_calloc)              6303224: 1.9%    5139976:12.3%    1863696: 8.8%    1073688:21.8%        530
gimplify.c:453 (create_tmp_var_raw)        7325032: 2.2%          0: 0.0%     889240: 4.2%          0: 0.0%      93344
genrtl.c:17 (gen_rtx_fmt_ee)               9819384: 2.9%          0: 0.0%     138900: 0.7%          0: 0.0%     829857
tree-dfa.c:186 (create_stmt_ann)           9970168: 2.9%     763932: 1.8%       3692: 0.0%          0: 0.0%     206496
tree-ssanames.c:147 (make_ssa_name)        9740544: 2.9%          0: 0.0%    2373936:11.2%          0: 0.0%     252385
bitmap.c:139 (bitmap_element_allocate)    18876340: 5.6%          0: 0.0%          0: 0.0%          0: 0.0%     674155
genrtl.c:32 (gen_rtx_fmt_ue)             193579104:57.2%          0: 0.0%          0: 0.0%          0: 0.0%   16131592
Total                                    338496482         41839722         21146495          4929007         22457179

I am now looking into the -O3 compilation, which crashes at into-ssa because
of an overly large stack.
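
A quick arithmetic check on the memory statistics above. The three constants are copied from the gen_rtx_fmt_ue and Total rows; reading the last column as the number of allocations is an assumption, since the dump's column header is not included in the message, and the helper names are hypothetical:

```python
# Figures copied from the statistics above.
GEN_RTX_FMT_UE_BYTES = 193579104   # first column of the gen_rtx_fmt_ue row
GEN_RTX_FMT_UE_COUNT = 16131592    # last column (assumed: allocation count)
TOTAL_BYTES = 338496482            # first column of the Total row


def share_percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100.0 * part / total, 1)


def bytes_per_allocation(total_bytes, count):
    """Average size of one allocation, under the column assumption above."""
    return total_bytes / count


print("share of total:", share_percent(GEN_RTX_FMT_UE_BYTES, TOTAL_BYTES), "%")
print("bytes per allocation:",
      bytes_per_allocation(GEN_RTX_FMT_UE_BYTES, GEN_RTX_FMT_UE_COUNT))
```

The computed share matches the 57.2% printed in that row. If the column assumption holds, the memory goes into a very large number of small, equally sized objects, which fits the remark about the scheduler's dependency info.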

Honza


------- Comment #15 from hubicka at ucw dot cz 2006-07-22 19:30 -------
Created an attachment (id=11920)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11920&action=view)


--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28071