From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-50640-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 14296 invoked by alias); 25 Apr 2002 07:18:33 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 14281 invoked from network); 25 Apr 2002 07:18:26 -0000
Received: from unknown (HELO vexpert.dbai.tuwien.ac.at) (128.130.111.12)
  by sources.redhat.com with SMTP; 25 Apr 2002 07:18:26 -0000
Received: from pulcherrima (pulcherrima [128.130.111.23])
	by vexpert.dbai.tuwien.ac.at (8.11.6/8.11.6) with ESMTP id g3P7IPW12568;
	Thu, 25 Apr 2002 09:18:25 +0200 (MET DST)
Date: Thu, 25 Apr 2002 00:21:00 -0000
From: Gerald Pfeifer <pfeifer@dbai.tuwien.ac.at>
To: Kurt Garloff <garloff@suse.de>
cc: gcc@gcc.gnu.org
Subject: Re: inliner in gcc-3.1
In-Reply-To: <20020424132314.B27120@gum01m.etpnet.phys.tue.nl>
Message-ID: <Pine.BSF.4.44.0204250908090.79997-100000@pulcherrima.dbai.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-SW-Source: 2002-04/txt/msg01292.txt.bz2

On Wed, 24 Apr 2002, Kurt Garloff wrote:
> I was browsing the gcc ML archives (I'm not subscribed) and found
> that the inliner may still not be tuned optimally in gcc-3.1.
> [...]
> * I have adapted my inliner patch (v3) to 3.1 (CVS 2002-04-23)
>   and it still works ...
>   I do believe it's somewhat saner than v1, but the benefits are
>   actually small (up to 3% in some benchmarks, 0 in others, no
>   pessimizations found).

Bad news: This patch increases compilation time (for DLV, the package
I've been using to test performance) quite a bit, from 4:38 for an
unpatched 3.1 snapshot to 5:35 with your patch:

  Version                  Compile time   Size
  2.95.3                           4:01   4430752
  3.0                             23:54   6295044
  3.0.3                            3:58   3948444
  3.1-20020422                     4:38   3996096
  3.1-20020424+kurtpatch           5:35   4102432
  3.1-20020422+limit=800           6:37   4177344
  3.1-20020422+limit=1200         16:50   4597888
  3.2-20020422                     5:15   4003276

And excellent news: This patch really improves the quality of the
generated code, and quite significantly so in several cases, much
more than the 3% you claimed (STRATCOMP1-ALL, for instance, drops
from 96.67 s to 24.93 s)!

Times in [s]        | 2.95.3|  3.1-20020422 |3.1-.-kurtpatch|
--------------------+-------+---------------+---------------+
      STRATCOMP1-ALL|  3.57 |  96.67 (0.10) |  24.93 (0.01) |
   STRATCOMP-770.2-Q|  0.73 |   0.94 (0.01) |   0.79 (0.00) |
               2QBF1| 19.08 |  22.26 (0.01) |  20.94 (0.01) |
          PRIMEIMPL2| 10.74 |  12.91 (0.01) |  10.59 (0.01) |
            ANCESTOR|  8.88 |   9.53 (0.01) |   9.45 (0.01) |
       3COL-SIMPLEX1|  6.30 |   7.16 (0.00) |   6.86 (0.00) |
        3COL-LADDER1| 36.24 |  42.33 (0.04) |  40.05 (0.02) |
      3COL-N-LADDER1| 19.81 |  22.47 (0.16) |  20.44 (0.04) |
        3COL-RANDOM1| 10.69 |  12.23 (0.01) |  11.17 (0.01) |
          HP-RANDOM1| 13.16 |  14.82 (0.03) |  14.43 (0.03) |
       HAMCYCLE-FREE|  1.18 |   1.71 (0.00) |   1.64 (0.00) |
             DECOMP2| 21.91 |  24.02 (0.01) |  25.03 (0.02) |
        BW-P4-Esra-a| 91.71 |  99.28 (0.01) |  95.02 (0.04) |
        BW-P5-nopush|  6.96 |   7.43 (0.00) |   7.11 (0.00) |
       BW-P5-pushbin|  6.20 |   6.50 (0.00) |   6.16 (0.00) |
     BW-P5-nopushbin|  1.94 |   2.07 (0.01) |   1.98 (0.00) |
              3SAT-1| 32.92 |  38.48 (0.02) |  32.80 (0.01) |
   3SAT-1-CONSTRAINT| 17.46 |  20.64 (0.00) |  18.67 (0.00) |
        HANOI-Towers|  4.73 |   4.95 (0.00) |   4.93 (0.01) |
              RAMSEY|  8.00 |   8.66 (0.00) |   8.42 (0.01) |
             CRISTAL| 11.07 |  13.38 (0.01) |  11.74 (0.01) |
             HANOI-K| 33.41 |  38.76 (0.02) |  34.86 (0.01) |
           21-QUEENS|  9.66 |  10.41 (0.01) |   9.58 (0.00) |
   MSTDir[V=13,A=40]| 25.71 |  21.09 (0.00) |  19.93 (0.00) |
   MSTDir[V=15,A=40]| 25.81 |  21.14 (0.00) |  19.97 (0.00) |
 MSTUndir[V=13,A=40]| 12.86 |  11.46 (0.01) |  10.74 (0.00) |
 MSTUndir[V=15,A=40]|214.87 | 188.57 (0.00) | 177.02 (0.03) |
         TIMETABLING|  9.63 |  10.63 (0.01) |  10.21 (0.02) |
--------------------+-------+---------------+---------------+

This would be very nice to have in GCC 3.1, if it were not for the longer
compile time.

However, I'd really like to apply it to mainline as soon as possible,
because it (finally) compensates for most of the code quality regressions
we have been seeing since GCC 2.95.

Gerald


2002-04-23  Kurt Garloff <garloff@suse.de>

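	* tree-inline.c (inlinable_function_p): Improve heuristics by using
	a smoother function to cut down allowable inlinable size.
	(expand_call_inline): Subtract one statement for the saved call
	when updating inlined_stmts.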

--- gcc/tree-inline.c.orig	Tue Apr 23 22:54:17 2002
+++ gcc/tree-inline.c	Tue Apr 23 23:24:57 2002
@@ -706,14 +706,32 @@

   /* Even if this function is not itself too big to inline, it might
      be that we've done so much inlining already that we don't want to
-     risk too much inlining any more and thus halve the acceptable
-     size.  */
+     risk too much inlining any more.  */
   if (! (*lang_hooks.tree_inlining.disregard_inline_limits) (fn)
       && ((DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0)) * INSNS_PER_STMT
-	  > MAX_INLINE_INSNS)
-      && DECL_NUM_STMTS (fn) * INSNS_PER_STMT > MAX_INLINE_INSNS / 4)
-    inlinable = 0;
-
+	  > MAX_INLINE_INSNS * 128))
+    inlinable = 0;
+  /* If we did not hit the extreme limit 128*MAX_INLINE_INSNS by recursion,
+     and we did not hit the limit for a single function (MAX_INLINE_INSNS/2)
+     but we are above the recursive throttling threshold (MAX_INLINE_INSNS),
+     we use a limit that decreases linearly with the already inlined
+     code.  Very small functions (13 statements or fewer) are exempt
+     from this limit; the value (13*INSNS_PER_STMT) was found by numerous
+     experiments with 3.0.x on C++ code.  */
+  else if (! (*lang_hooks.tree_inlining.disregard_inline_limits) (fn)
+	   && ((DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0))
+	       * INSNS_PER_STMT > MAX_INLINE_INSNS)
+	   && DECL_NUM_STMTS (fn) > 13) {
+     /* Use a linear function with a slope of -1/32 (-0.03125);
+        an integer approximation of sqrt or similar would also work.  */
+     signed int max_curr = MAX_INLINE_INSNS/2
+       - (( DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0))
+          * INSNS_PER_STMT - MAX_INLINE_INSNS) / 32;
+
+     if ((signed int)(DECL_NUM_STMTS (fn) * INSNS_PER_STMT) > max_curr)
+	inlinable = 0;
+  }
+
   if (inlinable && (*lang_hooks.tree_inlining.cannot_inline_tree_fn) (&fn))
     inlinable = 0;

@@ -968,7 +986,8 @@

   /* Our function now has more statements than it did before.  */
   DECL_NUM_STMTS (VARRAY_TREE (id->fns, 0)) += DECL_NUM_STMTS (fn);
-  id->inlined_stmts += DECL_NUM_STMTS (fn);
+  /* For accounting, subtract one for the saved call/return.  */
+  id->inlined_stmts += DECL_NUM_STMTS (fn) - 1;

   /* Recurse into the body of the just inlined function.  */
   expand_calls_inline (inlined_body, id);