From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (qmail 14296 invoked by alias); 25 Apr 2002 07:18:33 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 14281 invoked from network); 25 Apr 2002 07:18:26 -0000
Received: from unknown (HELO vexpert.dbai.tuwien.ac.at) (128.130.111.12)
  by sources.redhat.com with SMTP; 25 Apr 2002 07:18:26 -0000
Received: from pulcherrima (pulcherrima [128.130.111.23])
  by vexpert.dbai.tuwien.ac.at (8.11.6/8.11.6) with ESMTP id g3P7IPW12568;
  Thu, 25 Apr 2002 09:18:25 +0200 (MET DST)
Date: Thu, 25 Apr 2002 00:21:00 -0000
From: Gerald Pfeifer
To: Kurt Garloff
cc: gcc@gcc.gnu.org
Subject: Re: inliner in gcc-3.1
In-Reply-To: <20020424132314.B27120@gum01m.etpnet.phys.tue.nl>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-SW-Source: 2002-04/txt/msg01292.txt.bz2

On Wed, 24 Apr 2002, Kurt Garloff wrote:
> I was browsing the gcc ML archives (I'm not subscribed) and found
> that the inliner may still not be tuned optimally in gcc-3.1.
> [...]
> * I have adapted my inliner patch (v3) to 3.1 (CVS 2002-04-23)
>   and it still works ...
>   I do believe it's somewhat saner than v1, but the benefits are
>   actually small. (up to 3% in some benchmarks, 0 in others, no
>   pessimizations found).

Bad news: This patch increases compilation time (for DLV, the package
I've been using to test performance) quite a bit:

  2.95.3                     4:01   4430752
  3.0                       23:54   6295044
  3.0.3                      3:58   3948444
  3.1-20020422               4:38   3996096
  3.1-20020424+kurtpatch     5:35   4102432
  3.1-20020422+limit=800     6:37   4177344
  3.1-20020422+limit=1200   16:50   4597888
  3.2-20020422               5:15   4003276

And excellent news: This patch really improves the quality of the
generated code, and quite significantly so in several cases (much more
than those 3% you claimed)!

        Times in [s] | 2.95.3| 3.1-20020422  |3.1-.-kurtpatch|
--------------------+-------+---------------+---------------+
      STRATCOMP1-ALL|  3.57 |  96.67 (0.10) |  24.93 (0.01) |
   STRATCOMP-770.2-Q|  0.73 |   0.94 (0.01) |   0.79 (0.00) |
               2QBF1| 19.08 |  22.26 (0.01) |  20.94 (0.01) |
          PRIMEIMPL2| 10.74 |  12.91 (0.01) |  10.59 (0.01) |
            ANCESTOR|  8.88 |   9.53 (0.01) |   9.45 (0.01) |
       3COL-SIMPLEX1|  6.30 |   7.16 (0.00) |   6.86 (0.00) |
        3COL-LADDER1| 36.24 |  42.33 (0.04) |  40.05 (0.02) |
      3COL-N-LADDER1| 19.81 |  22.47 (0.16) |  20.44 (0.04) |
        3COL-RANDOM1| 10.69 |  12.23 (0.01) |  11.17 (0.01) |
          HP-RANDOM1| 13.16 |  14.82 (0.03) |  14.43 (0.03) |
       HAMCYCLE-FREE|  1.18 |   1.71 (0.00) |   1.64 (0.00) |
             DECOMP2| 21.91 |  24.02 (0.01) |  25.03 (0.02) |
        BW-P4-Esra-a| 91.71 |  99.28 (0.01) |  95.02 (0.04) |
        BW-P5-nopush|  6.96 |   7.43 (0.00) |   7.11 (0.00) |
       BW-P5-pushbin|  6.20 |   6.50 (0.00) |   6.16 (0.00) |
     BW-P5-nopushbin|  1.94 |   2.07 (0.01) |   1.98 (0.00) |
              3SAT-1| 32.92 |  38.48 (0.02) |  32.80 (0.01) |
   3SAT-1-CONSTRAINT| 17.46 |  20.64 (0.00) |  18.67 (0.00) |
        HANOI-Towers|  4.73 |   4.95 (0.00) |   4.93 (0.01) |
              RAMSEY|  8.00 |   8.66 (0.00) |   8.42 (0.01) |
             CRISTAL| 11.07 |  13.38 (0.01) |  11.74 (0.01) |
             HANOI-K| 33.41 |  38.76 (0.02) |  34.86 (0.01) |
           21-QUEENS|  9.66 |  10.41 (0.01) |   9.58 (0.00) |
   MSTDir[V=13,A=40]| 25.71 |  21.09 (0.00) |  19.93 (0.00) |
   MSTDir[V=15,A=40]| 25.81 |  21.14 (0.00) |  19.97 (0.00) |
 MSTUndir[V=13,A=40]| 12.86 |  11.46 (0.01) |  10.74 (0.00) |
 MSTUndir[V=15,A=40]|214.87 | 188.57 (0.00) | 177.02 (0.03) |
         TIMETABLING|  9.63 |  10.63 (0.01) |  10.21 (0.02) |
--------------------+-------+---------------+---------------+

This would be very nice to have in GCC 3.1, if it were not for the
longer compile time.

However, I'd really like to apply it to mainline as soon as possible,
because it (finally) compensates for most of the code quality regressions
we have been seeing since GCC 2.95.

Gerald

2002-04-23  Kurt Garloff

        * tree-inline.c: Improve heuristics by using a smoother function to
        cut down allowable inlinable size.

--- gcc/tree-inline.c.orig      Tue Apr 23 22:54:17 2002
+++ gcc/tree-inline.c   Tue Apr 23 23:24:57 2002
@@ -706,14 +706,32 @@
 
   /* Even if this function is not itself too big to inline, it might
      be that we've done so much inlining already that we don't want to
-     risk too much inlining any more and thus halve the acceptable
-     size.  */
+     risk too much inlining any more */
   if (! (*lang_hooks.tree_inlining.disregard_inline_limits) (fn)
       && ((DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0)) * INSNS_PER_STMT
-          > MAX_INLINE_INSNS)
-      && DECL_NUM_STMTS (fn) * INSNS_PER_STMT > MAX_INLINE_INSNS / 4)
-    inlinable = 0;
-
+          > MAX_INLINE_INSNS * 128))
+    inlinable = 0;
+  /* If we did not hit the extreme limit 128*MAX_INLINE_INSNS by recursion,
+     and we did not hit the limit for a single function (MAX_INLINE_INSNS/2)
+     but we are above the recursive throttling threshold (MAX_INLINE_INSNS),
+     we use a limit that decreases linearly with the already inlined
+     code. We always allow very small functions (13 statements) to be inlined.
+     Value (13*INSNS_PER_STMT) found by numerous experiments in 3.0.x with
+     C++ code */
+  else if (! (*lang_hooks.tree_inlining.disregard_inline_limits) (fn)
+           && ((DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0))
+               * INSNS_PER_STMT > MAX_INLINE_INSNS)
+           && DECL_NUM_STMTS (fn) > 13) {
+    /* Use a linear function with a slope of -0.03125
+       we could also use an int approx. of sqrt or similar things */
+    signed int max_curr = MAX_INLINE_INSNS/2
+      - (( DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0))
+         * INSNS_PER_STMT - MAX_INLINE_INSNS) / 32;
+
+    if ((signed int)(DECL_NUM_STMTS (fn) * INSNS_PER_STMT) > max_curr)
+      inlinable = 0;
+  }
+
   if (inlinable && (*lang_hooks.tree_inlining.cannot_inline_tree_fn) (&fn))
     inlinable = 0;
 
@@ -968,7 +986,8 @@
 
       /* Our function now has more statements than it did before.  */
       DECL_NUM_STMTS (VARRAY_TREE (id->fns, 0)) += DECL_NUM_STMTS (fn);
-      id->inlined_stmts += DECL_NUM_STMTS (fn);
+      /* For accounting, subtract one for the saved call/ret */
+      id->inlined_stmts += DECL_NUM_STMTS (fn) - 1;
 
       /* Recurse into the body of the just inlined function.  */
       expand_calls_inline (inlined_body, id);
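
For anyone who wants to play with the throttling rule outside of GCC, the
first hunk of the patch can be sketched as a standalone C function. This is
only an illustration, not the code in tree-inline.c: inlinable_p and its two
parameters are hypothetical names, the values given to MAX_INLINE_INSNS and
INSNS_PER_STMT below are assumed for the example, and the
disregard_inline_limits hook and the other checks in the real function are
left out.

```c
/* Illustrative stand-ins for the GCC macros of the same name.  The
   values are assumptions chosen for this sketch.  */
#define MAX_INLINE_INSNS 600
#define INSNS_PER_STMT 10

/* Hypothetical helper: return 1 if a function of FN_STMTS statements
   may still be inlined once ALREADY_INLINED statements have been
   inlined into the caller, 0 otherwise.  */
int
inlinable_p (int fn_stmts, int already_inlined)
{
  int total_insns = (fn_stmts + already_inlined) * INSNS_PER_STMT;
  int fn_insns = fn_stmts * INSNS_PER_STMT;

  /* Extreme limit: refuse everything once the accumulated size
     exceeds 128 * MAX_INLINE_INSNS.  */
  if (total_insns > MAX_INLINE_INSNS * 128)
    return 0;

  /* Throttling region: above MAX_INLINE_INSNS the per-function limit
     starts at MAX_INLINE_INSNS / 2 and shrinks by 1/32 insn for every
     insn of excess.  Functions of 13 statements or fewer are exempt.  */
  if (total_insns > MAX_INLINE_INSNS && fn_stmts > 13)
    {
      int max_curr = MAX_INLINE_INSNS / 2
                     - (total_insns - MAX_INLINE_INSNS) / 32;

      if (fn_insns > max_curr)
        return 0;
    }

  return 1;
}
```

With these assumed values, a 20-statement function is still accepted after
50 statements have been inlined (the limit is 300 - 100/32 = 297 insns
against a size of 200), but rejected after 400 (the limit has dropped to
300 - 3600/32 = 188). A 13-statement function keeps being accepted until the
128 * MAX_INLINE_INSNS cap is reached.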