inliner in gcc-3.1

public inbox for gcc@gcc.gnu.org

* inliner in gcc-3.1
From: Kurt Garloff @ 2002-04-24  4:37 UTC
  To: gcc

Hi,

I was browsing the gcc ML archives (I'm not subscribed) and found
that the inliner may still not be tuned optimally in gcc-3.1.
http://gcc.gnu.org/ml/gcc/2002-04/msg01168.html
The problem seems to be mainly with C++ functions, where you want small
accessor functions to be reliably inlined and where the inliner heuristics
sometimes fail.

I've done some work on this before and got a patch of mine merged (between
3.0.1 and 3.0.3) that fixed the heuristics which led to ridiculous compile-time
resource requirements (gcc-3.0.0) and ridiculous performance (gcc-3.0.1),
respectively, according to my own and Gerald Pfeifer's benchmarks. See
http://www.garloff.de/kurt/freesoft/gcc/

To give a very short summary: Before gcc-3.0, some accounting for
inlining was introduced to keep recursive inlining from producing huge
functions. Unfortunately, inlining starts at the root of the call tree, so
in the end the leaves would no longer be inlined, even though they are very
often the most performance-critical functions.
This was fixed with a simplistic patch that just halves the acceptable size
for inlining a single function and, once recursion has reached the (full)
limit, halves it again. (This is what I refer to as v1 of my inliner patch.)
That patch fixed the most serious problems and got merged.
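The v1 rule described above can be sketched as a small C predicate. This is a hypothetical reconstruction, not the actual tree-inline.c code: the name `v1_inlinable` is made up, and the constants assume the defaults quoted later in the thread (a 600-insn recursive budget, 10 insns per statement).

```c
#include <stdbool.h>

/* Assumed defaults (quoted later in the thread): a 600-insn recursive
   budget and 10 insns per statement. */
#define MAX_INLINE_INSNS 600
#define INSNS_PER_STMT   10

/* Hypothetical sketch of the v1 heuristic, not the real tree-inline.c code.
   fn_stmts:      size of the candidate callee, in statements.
   inlined_stmts: statements already inlined into the current caller. */
static bool v1_inlinable(int fn_stmts, int inlined_stmts)
{
    int fn_insns    = fn_stmts * INSNS_PER_STMT;
    int total_insns = (fn_stmts + inlined_stmts) * INSNS_PER_STMT;

    /* A single function may use at most half of the budget. */
    if (fn_insns > MAX_INLINE_INSNS / 2)
        return false;

    /* Once recursion has exceeded the full budget, halve the limit again. */
    if (total_insns > MAX_INLINE_INSNS && fn_insns > MAX_INLINE_INSNS / 4)
        return false;

    return true;
}
```

With these numbers, a 20-statement function is accepted into a fresh caller but rejected once 50 statements have already been inlined, because the limit has dropped from 300 to 150 insns.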

Later I did some more fine-tuning (patches v2, v3) and used a linear
function to limit the acceptable inlinable size of single functions. This
seems somewhat saner, but it gave only slightly better results in
benchmarks.
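A standalone version of the v3 rule, following the tree-inline.c hunk quoted later in the thread, might look like the sketch below. It is a hypothetical reconstruction: the single-function check is assumed to stay as in v1, the defaults (600 insns, 10 insns per statement) are the ones quoted later, and `v3_inlinable` and `SMALL_FN_STMTS` are made-up names.

```c
#include <stdbool.h>

/* Assumed defaults: 600-insn recursive budget, 10 insns per statement. */
#define MAX_INLINE_INSNS 600
#define INSNS_PER_STMT   10
/* Functions this small (in statements) skip the linear throttling. */
#define SMALL_FN_STMTS   13

/* Hypothetical sketch of the v3 heuristic. */
static bool v3_inlinable(int fn_stmts, int inlined_stmts)
{
    int fn_insns    = fn_stmts * INSNS_PER_STMT;
    int total_insns = (fn_stmts + inlined_stmts) * INSNS_PER_STMT;

    /* Single-function limit, assumed unchanged from v1. */
    if (fn_insns > MAX_INLINE_INSNS / 2)
        return false;

    /* Hard stop far beyond the normal budget (guards against runaway recursion). */
    if (total_insns > MAX_INLINE_INSNS * 128)
        return false;

    /* Above the recursive threshold, shrink the limit linearly (slope -1/32);
       very small functions are always accepted. */
    if (total_insns > MAX_INLINE_INSNS && fn_stmts > SMALL_FN_STMTS) {
        int max_curr = MAX_INLINE_INSNS / 2
                       - (total_insns - MAX_INLINE_INSNS) / 32;
        if (fn_insns > max_curr)
            return false;
    }
    return true;
}
```

With these defaults, a 29-statement function still fits after 40 statements have been inlined, while a 30-statement one no longer does; functions of up to 13 statements keep being accepted until the 128x hard stop.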

I'd like to add a few comments again:
* I was expecting to find the AST inliner in gcc-3.1.
  I do believe it's saner to start inlining from the leaves of
  a call tree than from the root, so I would have expected that approach
  (if tuned well) to win all benchmarks.
* I have adapted my inliner patch (v3) to 3.1 (CVS 2002-04-23)
  and it still works ...
  I do believe it's somewhat saner than v1, but the benefits are
  actually small (up to 3% in some benchmarks, 0 in others, no
  pessimizations found).

It would be nice if C++ people would try it to see whether it helps them.
I would certainly appreciate it if it got merged. (Yes, the FSF has a signed
copyright assignment ...)

I made another patch for 3.0.1 and adapted it to 3.1, which is probably much
more interesting: The -O3 (-finline-functions) fix.

In the code I use, I put the inline keyword at all places where it was
obvious to my eyes that inlining was a good idea. When I compile that code
with -O3 (aka -finline-functions), performance drops by 5--10%.

The reason is that comparatively large functions get automatically inlined,
so the recursive inline limit is hit earlier, and the important leaf
functions sometimes do not get inlined any more. The problem is that all
functions are treated as if they were declared inline, so the keyword gets
completely ignored. I changed this so that automatically inlined functions
may only be half the size of functions declared inline by the programmer.
This reduced the performance drop incurred by -finline-functions to 0.5--1.5%.
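The -finline-functions change boils down to choosing the size limits based on whether the programmer wrote `inline`. A minimal sketch, assuming the 300/600-insn defaults and the halving for automatically inlined functions that are described later in the thread; the function names are hypothetical, and the actual patch may wire this up differently.

```c
#include <stdbool.h>

/* Assumed default recursive budget, in insns. */
#define MAX_INLINE_INSNS 600

/* Single-function limit: 300 insns for functions declared inline,
   half of that for functions picked by -finline-functions. */
static int single_fn_limit(bool declared_inline)
{
    int limit = MAX_INLINE_INSNS / 2;
    return declared_inline ? limit : limit / 2;
}

/* Recursive limit: 600 insns for declared-inline functions,
   half of that for automatically inlined ones. */
static int recursive_limit(bool declared_inline)
{
    int limit = MAX_INLINE_INSNS;
    return declared_inline ? limit : limit / 2;
}
```

The design choice is simply that the `inline` keyword keeps some weight: automatic candidates compete for the same budget, but at a smaller size cap.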

It would be nice if this patch
http://www.garloff.de/kurt/freesoft/gcc/gcc310-inline-func-acct-v1.diff
were tested by more people and integrated into 3.1.
I'd like to know whether it helps get the SpecCPU/FP peak value above
the base value for more benchmarks.
It probably needs some review by people who know the gcc code better than I
do, so I would certainly appreciate it if somebody picked up the patch and
polished it ... (and e.g. ported it to languages other than C and C++).

Regards,
-- 
Kurt Garloff                   <kurt@garloff.de>         [Eindhoven, NL]
Physics: Plasma simulations  <K.Garloff@Phys.TUE.NL>  [TU Eindhoven, NL]
Linux: SCSI, Security          <garloff@suse.de>    [SuSE Nuernberg, DE]
 (See mail header or public key servers for PGP2 and GPG public keys.)


* Re: inliner in gcc-3.1
From: Daniel Berlin @ 2002-04-24 16:31 UTC
  To: Kurt Garloff; +Cc: gcc

On Wed, 24 Apr 2002, Kurt Garloff wrote:

> [...]
> I'd like to add a few comments again:
> * I was expecting to find the AST inliner in gcc-3.1.
>   I do believe it's more sane starting to inline from the leaves of
>   a call tree than from the root, so I would have expected that approach
>   (if tuned well) to win all benchmarks.

Do you have anything to back up that statement?

Are there any commercial compilers that do it that way?

Intel's, for instance, has only a few differences from what we have.
This is what they do:

Start at root.

At most 2000 Intel intermediate code statements are inlined into the
caller.

1. Functions with the following substrings in their name are not inlined: 
(various names signifying aborts, exits, fails, and warns, as well as 
alloca)

2. Focus on callers containing loops, and callees containing
loops.  (This is probably most important).

3. Don't inline functions with more than a certain number of statements.
The default is 230 Intel intermediate code statements.

4. Stop when you detect direct recursion.

5. All functions with fewer than a certain number of statements are inlined
(because inlining them is cheaper than doing the argument setup).
For IA32 the threshold is 7 statements; for IA64 it's 15.
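The rules listed above can be restated as one decision function. This is a hypothetical restatement of the description, not Intel's code: the order of the checks, the assumption that tiny functions bypass the 2000-statement cap, and the struct fields are all assumptions. Rule 2 (prefer loops) is a prioritization and is not modeled, and the name filter is a precomputed flag because the exact substrings are not given.

```c
#include <stdbool.h>

/* Numbers from the description above, in Intel IL statements. */
#define MAX_INLINED_INTO_CALLER 2000
#define MAX_CALLEE_STMTS        230
#define TINY_STMTS_IA32         7
#define TINY_STMTS_IA64         15

/* Hypothetical description of one call site. */
struct call_site {
    int  callee_stmts;     /* size of the callee */
    int  caller_inlined;   /* statements already inlined into the caller */
    bool name_excluded;    /* name matched an abort/exit/fail/warn/alloca-style pattern */
    bool direct_recursion; /* callee is the caller itself */
};

/* Rule numbers refer to the list above. */
static bool should_inline(const struct call_site *cs, bool ia64)
{
    int tiny = ia64 ? TINY_STMTS_IA64 : TINY_STMTS_IA32;

    if (cs->name_excluded)                     /* rule 1 */
        return false;
    if (cs->direct_recursion)                  /* rule 4 */
        return false;
    if (cs->callee_stmts < tiny)               /* rule 5: cheaper than arg setup */
        return true;
    if (cs->callee_stmts > MAX_CALLEE_STMTS)   /* rule 3 */
        return false;
    /* Overall cap on what gets inlined into one caller. */
    return cs->caller_inlined + cs->callee_stmts <= MAX_INLINED_INTO_CALLER;
}
```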



* Re: inliner in gcc-3.1
From: Gerald Pfeifer @ 2002-04-25  0:21 UTC
  To: Kurt Garloff; +Cc: gcc

On Wed, 24 Apr 2002, Kurt Garloff wrote:
> I was browsing the gcc ML archives (I'm not subscribed) and found
> that the inliner may still not be tuned optimally in gcc-3.1.
> [...]
> * I have adapted my inliner patch (v3) to 3.1 (CVS 2002-04-23)
>   and it still works ...
>   I do believe it's somewhat saner than v1, but the benefits are
>   actually small. (up to 3% in some benchmarks, 0 in others, no
>   pessimizations found).

Bad news: This patch increases compilation time (for DLV, the package
I've been using to test performance) quite a bit:

                     build time   binary size
  2.95.3                   4:01   4430752
  3.0                     23:54   6295044
  3.0.3                    3:58   3948444
  3.1-20020422             4:38   3996096
  3.1-20020424+kurtpatch   5:35   4102432
  3.1-20020422+limit=800   6:37   4177344
  3.1-20020422+limit=1200 16:50   4597888
  3.2-20020422             5:15   4003276

And excellent news: This patch really improves the quality of the
generated code, and quite significantly so in several cases (much
more than those 3% you claimed)!

Times in [s]        | 2.95.3|  3.1-20020422 |3.1-.-kurtpatch|
--------------------+-------+---------------+---------------+
      STRATCOMP1-ALL|  3.57 |  96.67 (0.10) |  24.93 (0.01) |
   STRATCOMP-770.2-Q|  0.73 |   0.94 (0.01) |   0.79 (0.00) |
               2QBF1| 19.08 |  22.26 (0.01) |  20.94 (0.01) |
          PRIMEIMPL2| 10.74 |  12.91 (0.01) |  10.59 (0.01) |
            ANCESTOR|  8.88 |   9.53 (0.01) |   9.45 (0.01) |
       3COL-SIMPLEX1|  6.30 |   7.16 (0.00) |   6.86 (0.00) |
        3COL-LADDER1| 36.24 |  42.33 (0.04) |  40.05 (0.02) |
      3COL-N-LADDER1| 19.81 |  22.47 (0.16) |  20.44 (0.04) |
        3COL-RANDOM1| 10.69 |  12.23 (0.01) |  11.17 (0.01) |
          HP-RANDOM1| 13.16 |  14.82 (0.03) |  14.43 (0.03) |
       HAMCYCLE-FREE|  1.18 |   1.71 (0.00) |   1.64 (0.00) |
             DECOMP2| 21.91 |  24.02 (0.01) |  25.03 (0.02) |
        BW-P4-Esra-a| 91.71 |  99.28 (0.01) |  95.02 (0.04) |
        BW-P5-nopush|  6.96 |   7.43 (0.00) |   7.11 (0.00) |
       BW-P5-pushbin|  6.20 |   6.50 (0.00) |   6.16 (0.00) |
     BW-P5-nopushbin|  1.94 |   2.07 (0.01) |   1.98 (0.00) |
              3SAT-1| 32.92 |  38.48 (0.02) |  32.80 (0.01) |
   3SAT-1-CONSTRAINT| 17.46 |  20.64 (0.00) |  18.67 (0.00) |
        HANOI-Towers|  4.73 |   4.95 (0.00) |   4.93 (0.01) |
              RAMSEY|  8.00 |   8.66 (0.00) |   8.42 (0.01) |
             CRISTAL| 11.07 |  13.38 (0.01) |  11.74 (0.01) |
             HANOI-K| 33.41 |  38.76 (0.02) |  34.86 (0.01) |
           21-QUEENS|  9.66 |  10.41 (0.01) |   9.58 (0.00) |
   MSTDir[V=13,A=40]| 25.71 |  21.09 (0.00) |  19.93 (0.00) |
   MSTDir[V=15,A=40]| 25.81 |  21.14 (0.00) |  19.97 (0.00) |
 MSTUndir[V=13,A=40]| 12.86 |  11.46 (0.01) |  10.74 (0.00) |
 MSTUndir[V=15,A=40]|214.87 | 188.57 (0.00) | 177.02 (0.03) |
         TIMETABLING|  9.63 |  10.63 (0.01) |  10.21 (0.02) |
--------------------+-------+---------------+---------------+

This would be very nice to have in GCC 3.1, if it were not for the longer
compile time.

However, I'd really like to apply it to mainline as soon as possible,
because it (finally) compensates for most of the code quality regressions
we have been seeing since GCC 2.95.

Gerald


2002-04-23  Kurt Garloff <garloff@suse.de>

	* tree-inline.c: Improve heuristics by using a smoother
	function to cut down allowable inlinable size.

--- gcc/tree-inline.c.orig	Tue Apr 23 22:54:17 2002
+++ gcc/tree-inline.c	Tue Apr 23 23:24:57 2002
@@ -706,14 +706,32 @@

   /* Even if this function is not itself too big to inline, it might
      be that we've done so much inlining already that we don't want to
-     risk too much inlining any more and thus halve the acceptable
-     size.  */
+     risk too much inlining any more */
   if (! (*lang_hooks.tree_inlining.disregard_inline_limits) (fn)
       && ((DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0)) * INSNS_PER_STMT
-	  > MAX_INLINE_INSNS)
-      && DECL_NUM_STMTS (fn) * INSNS_PER_STMT > MAX_INLINE_INSNS / 4)
-    inlinable = 0;
-
+	  > MAX_INLINE_INSNS * 128))
+      inlinable = 0;
+  /* If we did not hit the extreme limit 128*MAX_INLINE_INSNS by recursion,
+     and we did not hit the limit for a single function (MAX_INLINE_INSNS/2)
+     but we are above the recursive throttling threshold (MAX_INLINE_INSNS),
+     we use a limit that decreases linearly with the already inlined
+     code. We always allow very small functions (13 statements) to be inlined.
+     Value (13*INSNS_PER_STMT) found by numerous experiments in 3.0.x with
+     C++ code */
+  else if (! (*lang_hooks.tree_inlining.disregard_inline_limits) (fn)
+	   && ((DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0))
+	       * INSNS_PER_STMT > MAX_INLINE_INSNS)
+	   && DECL_NUM_STMTS (fn) > 13) {
+     /* Use a linear function with a slope of -0.03125
+        we could also use an int approx. of sqrt or similar things */
+     signed int max_curr = MAX_INLINE_INSNS/2
+       - (( DECL_NUM_STMTS (fn) + (id ? id->inlined_stmts : 0))
+          * INSNS_PER_STMT - MAX_INLINE_INSNS) / 32;
+
+     if ((signed int)(DECL_NUM_STMTS (fn) * INSNS_PER_STMT) > max_curr)
+	inlinable = 0;
+  }
+
   if (inlinable && (*lang_hooks.tree_inlining.cannot_inline_tree_fn) (&fn))
     inlinable = 0;

@@ -968,7 +986,8 @@

   /* Our function now has more statements than it did before.  */
   DECL_NUM_STMTS (VARRAY_TREE (id->fns, 0)) += DECL_NUM_STMTS (fn);
-  id->inlined_stmts += DECL_NUM_STMTS (fn);
+  /* For accounting, subtract one for the saved call/ret */
+  id->inlined_stmts += DECL_NUM_STMTS (fn) - 1;

   /* Recurse into the body of the just inlined function.  */
   expand_calls_inline (inlined_body, id);

* Re: inliner in gcc-3.1
From: Richard Henderson @ 2002-04-25  0:48 UTC
  To: Gerald Pfeifer; +Cc: Kurt Garloff, gcc

On Thu, Apr 25, 2002 at 09:18:24AM +0200, Gerald Pfeifer wrote:
> However, I'd really like to apply it to mainline as soon as possible,
> because it (finally) compensates most of the code quality regressions
> we have been seeing since GCC 2.95.

I'm willing to entertain this, modulo the code style problems.


r~

* Re: inliner in gcc-3.1
From: Gerald Pfeifer @ 2002-04-25  9:41 UTC
  To: Kurt Garloff; +Cc: gcc, Andreas Jaeger

On Wed, 24 Apr 2002, Kurt Garloff wrote:
> I made another patch for 3.0.1 and adapted it to 3.1, which is probably much
> more interesting: The -O3 (-finline-functions) fix.
>
> In the code I use, I put the inline keyword at all places where it was
> obvious to my eyes that inlining was a good idea. When I compile that code
> with -O3 (aka -finline-functions), performance drops by 5--10%.
>
> The reason is that comparably large functions get automatically inlined, so
> the recursive inline limit is hit earlier, and the important leaf functions
> sometimes do not get inlined any more. The problem is that all functions
> are treated as if they were declared inline, so the keyword gets completely
> ignored. I changed this and allow automatically inlined functions only half
> the size of functions declared inline by the programmer.
> This reduced the performance drop incurred by -finline-functions to 0.5--1.5%.
>
> It would be nice if this patch
> http://www.garloff.de/kurt/freesoft/gcc/gcc310-inline-func-acct-v1.diff
> would be tested by more people and integrated into 3.1.

This second patch (partially) fixes a very bad regression we've been
having since GCC 3.0; build time and binary size seem to be fine, though
we seem to degrade slightly for some of the other benchmarks.

I'd really like to see what this does to SPEC -- Andreas, could you give
it a try?

(This is due to a C module in the C++ application, and your fix seems
to be critical for many C applications in general.)

                     build time   binary size
  2.95.3                   4:01   4430752
  3.0                     23:54   6295044
  3.0.3                    3:58   3948444
  3.1-20020422             4:38   3996096  <-- without patch
  3.1-20020424+kurt-v3     5:35   4102432
  3.1.20020425+kurt-finl.  4:32   3912640  <-- this is with the patch
  3.1-20020422+limit=800   6:37   4177344
  3.1-20020422+limit=1200 16:50   4597888
  3.2-20020422             5:15   4003276

Times in [s]        | 2.95.3|       3.1      | 3.1-with-patch|
--------------------+-------+----------------+---------------+
      STRATCOMP1-ALL|  3.57 |   96.26 (0.01) |  23.74 (0.01) | <-- REGRESS!
   STRATCOMP-770.2-Q|  0.73 |    0.93 (0.01) |   0.83 (0.00) |
               2QBF1| 19.08 |   22.25 (0.01) |  21.33 (0.00) |
          PRIMEIMPL2| 10.74 |   12.90 (0.02) |  12.85 (0.00) |
            ANCESTOR|  8.88 |    9.54 (0.01) |   9.61 (0.01) |
       3COL-SIMPLEX1|  6.30 |    7.16 (0.00) |   7.30 (0.01) |
        3COL-LADDER1| 36.24 |   42.28 (0.01) |  41.71 (0.03) |
      3COL-N-LADDER1| 19.81 |   22.57 (0.06) |  22.96 (0.00) |
        3COL-RANDOM1| 10.69 |   12.23 (0.01) |  12.87 (0.01) |
          HP-RANDOM1| 13.16 |   14.82 (0.02) |  14.85 (0.01) |
       HAMCYCLE-FREE|  1.18 |    1.71 (0.00) |   1.71 (0.00) |
             DECOMP2| 21.91 |   24.04 (0.02) |  25.75 (0.03) |
        BW-P4-Esra-a| 91.71 |   99.28 (0.03) | 100.19 (0.05) |
        BW-P5-nopush|  6.96 |    7.43 (0.01) |   7.48 (0.00) |
       BW-P5-pushbin|  6.20 |    6.49 (0.00) |   6.57 (0.00) |
     BW-P5-nopushbin|  1.94 |    2.08 (0.00) |   2.13 (0.00) |
              3SAT-1| 32.92 |   38.48 (0.04) |  39.84 (0.00) |
   3SAT-1-CONSTRAINT| 17.46 |   20.63 (0.00) |  21.37 (0.00) |
        HANOI-Towers|  4.73 |    4.94 (0.01) |   5.24 (0.00) |
              RAMSEY|  8.00 |    8.66 (0.00) |   9.01 (0.00) |
             CRISTAL| 11.07 |   13.39 (0.01) |  12.17 (0.02) |
             HANOI-K| 33.41 |   38.73 (0.02) |  38.84 (0.00) |
           21-QUEENS|  9.66 |   10.40 (0.00) |  11.06 (0.01) |
   MSTDir[V=13,A=40]| 25.71 |   21.08 (0.01) |  21.00 (0.01) |
   MSTDir[V=15,A=40]| 25.81 |   21.16 (0.00) |  21.06 (0.01) |
 MSTUndir[V=13,A=40]| 12.86 |   11.47 (0.00) |  11.31 (0.01) |
 MSTUndir[V=15,A=40]|214.87 |  188.61 (0.01) | 185.37 (0.01) |
         TIMETABLING|  9.63 |   10.63 (0.00) |  10.98 (0.01) |
--------------------+-------+----------------+---------------+

Gerald

* Re: inliner in gcc-3.1
From: Kurt Garloff @ 2002-04-25 15:06 UTC
  To: Daniel Berlin; +Cc: gcc

Hi Daniel,

On Wed, Apr 24, 2002 at 06:24:05PM -0400, Daniel Berlin wrote:
> On Wed, 24 Apr 2002, Kurt Garloff wrote:
> > I'd like to add a few comments again:
> > * I was expecting to find the AST inliner in gcc-3.1.
> >   I do believe it's more sane starting to inline from the leaves of
> >   a call tree than from the root, so I would have expected that approach
> >   (if tuned well) to win all benchmarks.
> 
> Do you have anything to back up that statement?

If you mean backing it up with numbers: no. I have neither implemented this
nor really investigated what is done in the AST branch.

I do have some reasoning, though: just common sense, nothing more. Well, and
maybe the fact that it did not completely fail the last time I applied it to
the inlining heuristics.

> Are there any commercial compilers that do it that way?

I don't know.

Let me just give my reasoning:
What you should make sure to inline are the fast functions that are called
most often, preferably from not too many different places. Agreed?
In a call tree, you will have one root function (main) and normally several
leaf functions.
Normally, you will have a function F somewhere which calls more than one
function, say N functions, and/or calls other functions from within a loop
(with, say, M iterations). This will be the case for most non-trivial and/or
time-consuming programs.

                                        /-> J
                      F -> (N times) G *--> H
                                        \-> I -> K

This is where you win by starting to inline from the leaves:
If you inline the function itself (because you started from the root)
and then run out of your recursive inlining budget, you have done just one
inline operation, saving one call. If you start from the leaves, you save
N and/or M calls. (This holds whether you inline the function G itself or
the functions called below G, assuming those calls are unconditional.)

I think this makes some sense, no?
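The difference between the two strategies is the order in which call sites are visited. A minimal sketch on the call tree from the diagram, assuming the tree is known up front; the enum, tables and function names are hypothetical, and neither the budget nor the N/M repetition is modeled, only the visiting order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The call tree from the diagram: F calls G; G calls J, H and I; I calls K. */
enum node { NODE_F, NODE_G, NODE_H, NODE_I, NODE_J, NODE_K, NUM_NODES };

static const char names[NUM_NODES] = { 'F', 'G', 'H', 'I', 'J', 'K' };

/* Callees of each node, terminated by -1. */
static const int callees[NUM_NODES][4] = {
    [NODE_F] = { NODE_G, -1 },
    [NODE_G] = { NODE_J, NODE_H, NODE_I, -1 },
    [NODE_H] = { -1 },
    [NODE_I] = { NODE_K, -1 },
    [NODE_J] = { -1 },
    [NODE_K] = { -1 },
};

/* Root-first (pre-order): a caller is considered before its callees. */
static size_t root_first(int n, char *out, size_t len)
{
    out[len++] = names[n];
    for (int i = 0; callees[n][i] != -1; i++)
        len = root_first(callees[n][i], out, len);
    return len;
}

/* Leaves-first (post-order): all callees are considered before their caller. */
static size_t leaves_first(int n, char *out, size_t len)
{
    for (int i = 0; callees[n][i] != -1; i++)
        len = leaves_first(callees[n][i], out, len);
    out[len++] = names[n];
    return len;
}

/* Returns the visiting order starting at F, as a NUL-terminated string. */
static const char *visit_order(bool from_leaves)
{
    static char buf[NUM_NODES + 1];
    memset(buf, 0, sizeof buf);
    if (from_leaves)
        leaves_first(NODE_F, buf, 0);
    else
        root_first(NODE_F, buf, 0);
    return buf;
}
```

Leaves-first yields "JHKIGF", so J, H and K are considered while the budget is still untouched; root-first yields "FGJHIK", where the leaves come last.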

Now, for C++ there's another reason: You often have accessor functions like
inline T  Vector<T>::operator () (const int i) const { return data[i]; }
inline T& Vector<T>::operator () (const int i)       { return data[i]; }

Those are very small functions and they are called very often from loops.
They are called much more often than most other functions. So inlining them
is paramount.
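To make the accessor case concrete, here is a hypothetical C analogue of the `Vector<T>::operator()` accessor (the names `struct vector`, `vec_at` and `dot` are made up): a one-statement body called once per loop iteration, so unless it is inlined, the call overhead is paid on every iteration.

```c
#include <stddef.h>

/* Hypothetical C analogue of the C++ Vector<T> class. */
struct vector {
    size_t        n;
    const double *data;
};

/* One-statement accessor, like Vector<T>::operator() const. */
static inline double vec_at(const struct vector *v, size_t i)
{
    return v->data[i];
}

/* Called from a loop: without inlining, n iterations cost 2*n calls,
   each of which does a single load. */
static double dot(const struct vector *a, const struct vector *b)
{
    double sum = 0.0;
    for (size_t i = 0; i < a->n; i++)
        sum += vec_at(a, i) * vec_at(b, i);
    return sum;
}
```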

> Intel's, for instance, has only a few differences from what we have.
> This is what they do:
> 
> Start at root.

I would not do that.

> At most 2000 intel intermediate code statements are inlined into the 
> caller.
> 
> 1. Functions with the following substrings in their name are not inlined: 
> (various names signifying aborts, exits, fails, and warns, as well as 
> alloca)

Exception handling in general does not need to be tuned much for performance.

> 2. Focus on callers containing loops, and callees  containing 
> loops.  (This is probably most important).

This makes a lot of sense. But this information is not available in gcc
when the inlining decision is taken. At least not in a way I could find ...

> 3. Don't inline functions > a certain number of statements. They 
> default to 230 Intel intermediate code statements.

So the recursive limit is 2000 and the single fn. limit is 230?

> 4. Stop when you detect direct recursion.

Hopefully, you can transform those into iterations and unroll them a few
times ...

> 5. All  functions < a certain number of statements are inlined. (This is 
> because it's cheaper to inline than do the arg setup).
> For IA32, the number of statements is 7, for IA64, it's 15.

We should do that as well. Actually, in my v3 patch, I do something similar:
Once we have reached the recursive inlining limit (default 600 INSNS, aka 60
STMTS; the single-function limit is 300), I use a linear function to impose
more and more severe limits.
But I don't go down to 0, only to 13 STMTS (130 INSNS), a value I found by
experiment on i386: it led to the smallest (or close to the smallest) code
size. Only much, much later do I completely shut down inlining, to prevent
infinite recursion ...
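Plugging these defaults into the linear limit from the v3 patch shows where the throttling bites. Here $M$ is MAX_INLINE_INSNS (600) and $T$ is the statement count of the candidate plus what has already been inlined, times 10 insns per statement; integer rounding is ignored, and the symbols are introduced only for this illustration.

```latex
\begin{aligned}
\mathrm{max\_curr}(T) &= \frac{M}{2} - \frac{T - M}{32}, \qquad M = 600,\\
\mathrm{max\_curr}(T) = 130 &\iff \frac{T - 600}{32} = 170 \iff T = 6040,\\
\mathrm{max\_curr}(T) = 0 &\iff T = 600 + 32 \cdot 300 = 10200,\\
\text{hard stop:}\quad T &> 128\,M = 76800.
\end{aligned}
```

So by about $T \approx 6000$ insns the limit has fallen to 13 statements, and from then on only the functions of at most 13 statements, which skip the linear check, are still inlined, until the hard stop at $128\,M$.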

A word about the Intel C++ (6.0) compiler:
In most of my tests (numerical C++ code), gcc performs slightly
better. But then, the code has been tuned for gcc for years and I
know from experience which optimization options work best with it ...
So I probably still have some room for improvement with the Intel compiler.

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE Linux AG, Nuernberg, DE                            SCSI, Security


* Re: inliner in gcc-3.1
From: Kurt Garloff @ 2002-04-25 15:30 UTC
  To: Gerald Pfeifer; +Cc: gcc, Andreas Jaeger

Hi Gerald,

thanks for testing!

On Thu, Apr 25, 2002 at 06:36:10PM +0200, Gerald Pfeifer wrote:
> On Wed, 24 Apr 2002, Kurt Garloff wrote:
> > It would be nice if this patch
> > http://www.garloff.de/kurt/freesoft/gcc/gcc310-inline-func-acct-v1.diff
> > would be tested by more people and integrated into 3.1.
> 
> This second patch (partially) fixes a very bad regression we've been
> having since GCC 3.0; build time and binary size seem to be fine, though
> we seem to degrade slightly for some of the other benchmarks.
> 
> I'd really like to see what this does to SPEC -- Andreas, could you give
> it a try?

Actually, maybe we should wait a second.

When I developed the patch on 3.0.1, I did not see the degradation you
see now for some benchmarks. So I must have made some mistake when porting
it from 3.0.1 to 3.1, or some assumption I made is no longer true.
I've done some more benchmarks with this inline accounting patch and
have also found a case with a performance regression against v3, even
without -finline-functions, so I think something must have gone wrong.

What I try to do in the patch is to improve -O3 (-finline-functions)
performance. The idea is to record whether a function was declared
inline or got inlined only by virtue of automatic inlining. I may have
screwed this up ... and set the flag when it should not be set, so that
functions declared inline actually get the lower automatic limit.
(a) The declared inline fn would get the full 300/600 limit whereas the
    automatically inlined ones only half. [The idea is that we still think
    the programmer has a bit of a clue. Maybe the factor 0.5 is a bit too
    strict.]

The patch also does two other things that have proven useful on 2.95.3
(b) better tune -Os (again tested on i386)
(c) give leaf functions a bonus (*3/2) in the RTL inliner
    [is it still used at all?]

Maybe one should separate those issues and verify that each does what I
intended.

I would certainly appreciate it if somebody with more knowledge of the
compiler internals had a look.
Any takers?

> (This is due to a C module in the C++ application, and your fix seems
> to be critical for many C applications in general.)
> 
>                      build time   binary size
>   2.95.3                   4:01   4430752
>   3.0                     23:54   6295044
>   3.0.3                    3:58   3948444
>   3.1-20020422             4:38   3996096  <-- without patch
>   3.1-20020424+kurt-v3     5:35   4102432
>   3.1.20020425+kurt-finl.  4:32   3912640  <-- this is with the patch

This is with both patches I assume.

I'm astonished we beat plain 0422 at build time and binary size.

But the benchmark results do not really look attractive to me. We only 
win one benchmark against -v3, and even against plain we sometimes lose.

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE Linux AG, Nuernberg, DE                            SCSI, Security


* Re: inliner in gcc-3.1
From: Kurt Garloff @ 2002-04-25 15:51 UTC
  To: Gerald Pfeifer; +Cc: gcc

Hi Gerald,

thanks for your testing!

On Thu, Apr 25, 2002 at 09:18:24AM +0200, Gerald Pfeifer wrote:
> On Wed, 24 Apr 2002, Kurt Garloff wrote:
> > I was browsing the gcc ML archives (I'm not subscribed) and found
> > that the inliner may still not be tuned optimally in gcc-3.1.
> > [...]
> > * I have adapted my inliner patch (v3) to 3.1 (CVS 2002-04-23)
> >   and it still works ...
> Bad news: This patch increases compilation time (for DLV, the package
> I've been using to test performance) quite a bit:
> 
>   2.95.3			 4:01	4430752
>   3.0				23:54	6295044
>   3.0.3				 3:58	3948444
>   3.1-20020422			 4:38	3996096
>   3.1-20020424+kurtpatch	 5:35   4102432
>   3.1-20020422+limit=800	 6:37	4177344

OK, so in compile time and binary size, kurtv3 is somewhere around
-finline-limit=700 ...

[...]
> And excellent news: This patch really improves the quality of the
> generated code, and quite significantly so in several cases (much
> more than those 3% you claimed)!

Amazing. I just also compared to your -800 results and we still win
significantly more than half of the benchmarks.

[...]
> This would be very nice to have in GCC 3.1, if it were not for the longer
> compile time.

One could try -finline-limit=540 or so and see whether we get compilation
times similar to 3.1 and still win benchmarks. If so, one option for 3.1
would be to incorporate the patch and lower the default inline limit.

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE Linux AG, Nuernberg, DE                            SCSI, Security


* Re: inliner in gcc-3.1
From: Daniel Berlin @ 2002-04-25 18:25 UTC
  To: Kurt Garloff; +Cc: gcc

On Fri, 26 Apr 2002, Kurt Garloff wrote:

> Hi Daniel,
> 
> On Wed, Apr 24, 2002 at 06:24:05PM -0400, Daniel Berlin wrote:
> > On Wed, 24 Apr 2002, Kurt Garloff wrote:
> > > I'd like to add a few comments again:
> > > * I was expecting to find the AST inliner in gcc-3.1.
> > >   I do believe it's more sane starting to inline from the leaves of
> > >   a call tree than from the root, so I would have expected that approach
> > >   (if tuned well) to win all benchmarks.
> > 
> > Do you have anything to back up that statement?
> 
> If you refer to backing it up by numbers: No. I did not implement this nor
> did I try to do a real investigation of what is done in the AST branch.
> 
> I do have some reasoning so. Just common sense, nothing more. Well, maybe
> the fact that it did not completely fail the last time I applied it to the
> inlining heuristics.
> 
> > Are there any commercial compilers that do it that way?
> 
> I don't know.
> 
> Let me just give my reasoning: 
> What you should make sure to inline, is the fast functions which are called
> the most often, preferably from not so many different places. Agreed?

Yes. When you have profile info.
When you don't have profile info to determine the most frequently used 
call sites, it's already a guessing game.


> In a call tree, you will have one root function (main) and normally several
> leaf functions.
> Normally, you will have a function F somewhere which calls  more than one
> function, say N functions and/or calls other functions from within a loop
> (with say M iterations). This will be the case for most non-trivial and/or
> time-consuming programs.
> 
>                                         /-> J
>                       F -> (N times) G *--> H
> 		                        \-> I -> K 
> 
> Which is where you win if you started inlining from the leaves:
> If you inlined the function itself (because you started from the root)
> and ran out of your recursive inlining budget, you have just done one
> inline operation, resulting in saving one call.

Buzz.
This assumes you don't inline small functions *anyway* if they are < a 
certain number of statements.

> If you started from the
> leaves, you would save N and/or M calls. (This happens whether you inline
> the function G itself or the functions that are called below G, assuming
> those calls are unconditional.)
> 
> I think this makes some sense, no?

No more sense than anything else.

If the called functions are small enough, they will be inlined regardless.
Otherwise, they aren't better candidates than any *other* functions we
choose, unless they contain loops, etc.


> 
> Now, for C++ there's another reason: You often have accessor functions like
> template <class T>
> inline T  Vector<T>::operator () (const int i) const { return data[i]; }
> template <class T>
> inline T& Vector<T>::operator () (const int i)       { return data[i]; }
> 
> Those are very small functions and they are called very often from loops.
> They are called much more often than most other functions. So inlining them
> is paramount.
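For reference, a self-contained version of the accessor pattern quoted above. Only the two `operator()` accessors come from the mail; the `Vector` class body, its `size()` member, the `std::vector` storage and the `dot()` loop are hypothetical scaffolding added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal Vector<T>; only the two accessors mirror the mail.
template <typename T>
class Vector {
public:
    explicit Vector(std::size_t n) : data(n) {}

    // Read accessor: returns a copy of element i.
    inline T operator()(const int i) const { return data[i]; }
    // Write accessor: returns a reference to element i.
    inline T& operator()(const int i) { return data[i]; }

    std::size_t size() const { return data.size(); }

private:
    std::vector<T> data;  // storage type is an assumption
};

// A typical hot loop: every iteration makes two accessor calls, so each
// accessor call that is not inlined costs a call/return per element.
template <typename T>
T dot(const Vector<T>& a, const Vector<T>& b) {
    assert(a.size() == b.size());
    T sum = T();
    for (int i = 0; i < static_cast<int>(a.size()); ++i)
        sum += a(i) * b(i);
    return sum;
}
```

Members defined inside the class body are implicitly inline in C++, so the explicit `inline` keyword here is redundant; whether gcc actually inlines these calls still depends on its heuristics and flags.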

Yes, then you choose to always inline functions < x statements.
You also prefer functions that contain loops, or that call functions with
loops, over any other function.
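A minimal sketch of that ordering, assuming a hypothetical `Candidate` summary per callee (none of these fields are GCC data structures): tiny functions first, then functions that contain loops or call functions with loops, then everything else. Breaking ties by size within a class is our assumption, not something stated in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-callee summary used only for this illustration.
struct Candidate {
    std::string name;
    int stmts;            // size estimate in statements
    bool has_loop;        // callee contains a loop
    bool calls_loopy_fn;  // callee calls a function that contains a loop
};

// 0 = always inline (smaller than x), 1 = loop-related, 2 = everything else.
int priority_class(const Candidate& c, int always_inline_stmts) {
    if (c.stmts < always_inline_stmts) return 0;
    if (c.has_loop || c.calls_loopy_fn) return 1;
    return 2;
}

// Order candidates by class; within a class, smaller callees first (assumed).
void order_candidates(std::vector<Candidate>& cs, int always_inline_stmts) {
    assert(always_inline_stmts >= 0);
    std::stable_sort(cs.begin(), cs.end(),
        [always_inline_stmts](const Candidate& a, const Candidate& b) {
            int pa = priority_class(a, always_inline_stmts);
            int pb = priority_class(b, always_inline_stmts);
            if (pa != pb) return pa < pb;
            return a.stmts < b.stmts;
        });
}
```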

> 
> > Intel's, for instance, has only a few differences from what we have.
> > This is what they do:
> > 
> > Start at root.
> 
> I would not do that.

Every paper I've read on inlining (there are only a few), and every
compiler I've seen with good performance, does *not* start at the leaves.

> 
> > At most 2000 intel intermediate code statements are inlined into the 
> > caller.
> > 
> > 1. Functions with the following substrings in their name are not inlined: 
> > (various names signifying aborts, exits, fails, and warns, as well as 
> > alloca)
> 
> Exception handling in general does not need to be tuned much for performance.
> 
> > 2. Focus on callers containing loops, and callees containing
> > loops.  (This is probably most important).
> 
> This makes a lot of sense. But in gcc, this information is not available
> when taking an inlining decision. At least not in a way I could find ...

Um, I could pretty easily make it available.
At least, the loop part is rather easy to do.
Mainly because we don't care about anything besides "does it have loops"
and "does it call functions in a loop".

This is a rather trivial application of walk_tree_without_duplicates.
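Both questions can be answered in a single tree walk. This sketch uses a hypothetical toy `Node` type instead of GCC trees and only mimics the idea behind walk_tree_without_duplicates with a visited set; it is not GCC's API. Keying the set on (node, inside-a-loop) instead of the node alone keeps a shared subtree that is first reached outside a loop from hiding a call that also appears inside one.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy IR: a statement, a loop, or a call, with child pointers.
enum class Kind { Stmt, Loop, Call };

struct Node {
    Kind kind;
    std::vector<const Node*> kids;  // subtrees may be shared
};

struct LoopInfo {
    bool has_loop = false;       // "does it have loops"
    bool calls_in_loop = false;  // "does it call functions in a loop"
};

using Visited = std::set<std::pair<const Node*, bool>>;

// Visit each (node, in_loop) pair at most once.
static void walk(const Node* n, bool in_loop, LoopInfo& info, Visited& seen) {
    if (!seen.insert({n, in_loop}).second) return;  // duplicate: skip
    if (n->kind == Kind::Loop) {
        info.has_loop = true;
        in_loop = true;  // everything below is inside a loop
    }
    if (n->kind == Kind::Call && in_loop) info.calls_in_loop = true;
    for (const Node* k : n->kids) walk(k, in_loop, info, seen);
}

LoopInfo analyze(const Node* body) {
    assert(body != nullptr);
    LoopInfo info;
    Visited seen;
    walk(body, false, info, seen);
    return info;
}
```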

> 
> > 3. Don't inline functions > a certain number of statements. They 
> > default to 230 Intel intermediate code statements.
> 
> So the recursive limit is 2000 and the single fn. limit is 230?

If you mean "don't ever inline, into the caller, a callee larger than 230 
statements", then yes, the single fn. limit is 230.

> 
> > 4. Stop when you detect direct recursion.
> 
> Hopefully, you can transform those into iterations and unroll them a few
> times ...

> 
> > 5. All  functions < a certain number of statements are inlined. (This is 
> > because it's cheaper to inline than do the arg setup).
> > For IA32, the number of statements is 7, for IA64, it's 15.
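Putting the quoted Intel rules together, a per-call-site decision might look like the sketch below. The thresholds (2000, 230, 7, 15) are the ones quoted in the thread; the `Call` struct, the blocked name substrings, and the order in which the rules are checked (in particular that tiny callees bypass the caller budget) are assumptions. Rule 2 (focus on loops) is a priority rather than a yes/no test, so it is not modeled here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Thresholds quoted in the thread, in Intel intermediate-code statements.
constexpr int kCallerBudget = 2000;  // max statements inlined into one caller
constexpr int kSingleLimit = 230;    // never inline a larger callee
constexpr int kAlwaysIA32 = 7;       // smaller callees are always inlined
constexpr int kAlwaysIA64 = 15;

// Hypothetical call-site summary.
struct Call {
    std::string callee_name;
    int callee_stmts;
    bool direct_recursion;  // callee is the caller itself
};

// Hypothetical substrings; the mail only says the names signify aborts,
// exits, fails and warns, plus alloca.
bool has_blocked_name(const std::string& name) {
    static const std::vector<std::string> blocked = {
        "abort", "exit", "fail", "warn", "alloca"};
    for (const auto& s : blocked)
        if (name.find(s) != std::string::npos) return true;
    return false;
}

// inlined_so_far is the running total already inlined into this caller.
bool should_inline(const Call& c, int inlined_so_far, bool ia64) {
    assert(inlined_so_far >= 0);
    if (c.direct_recursion) return false;               // rule 4
    if (has_blocked_name(c.callee_name)) return false;  // rule 1
    if (c.callee_stmts < (ia64 ? kAlwaysIA64 : kAlwaysIA32))
        return true;                                    // rule 5 (assumed to bypass budget)
    if (c.callee_stmts > kSingleLimit) return false;    // rule 3
    return inlined_so_far + c.callee_stmts <= kCallerBudget;
}
```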
> 
> We should do that as well. Actually, in my v3 patch, I do something similar:
> once we reach the recursive inlining limit (default 600 INSNS aka 60
> STMTS; the single fn limit is 300), we use a linear function to impose
> increasingly severe limits.
> But I don't go down to 0, only to 13 STMTS (130 INSNS), a value I found by
> experiment on i386: it led to the smallest (or close to smallest) code
> size. Only much, much later do I completely shut down inlining, to prevent
> infinite recursion ...
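The v3 throttling can be sketched as follows. The 600/300/130 INSNS values come from the mail; the exact shape of the linear function and the `slope` value are assumptions, loosely modeled on the max-inline-slope and min-inline-insns parameters mentioned later in the thread. The final hard cutoff against infinite recursion is not modeled.

```cpp
#include <algorithm>
#include <cassert>

// Limits in INSNS. The first three values are from the mail; slope is a
// hypothetical parameter: allow one INSN less per `slope` INSNS over the limit.
struct InlineParams {
    int max_inline_insns = 600;  // recursive inlining limit
    int max_single = 300;        // single-function limit
    int min_insns = 130;         // floor (13 STMTS)
    int slope = 32;              // assumption
};

// Largest callee (in INSNS) still allowed, given how much was inlined so far.
int allowed_callee_insns(int inlined_so_far, const InlineParams& p) {
    assert(p.slope > 0);
    if (inlined_so_far <= p.max_inline_insns) return p.max_single;
    int over = inlined_so_far - p.max_inline_insns;
    // Shrink linearly past the limit, but never below the floor.
    return std::max(p.min_insns, p.max_single - over / p.slope);
}
```

The floor is what keeps very small functions (such as the accessors discussed above) inlinable even deep into recursive inlining.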
> 
> A word about the Intel C++ (6.0) compiler:
> In most of my tests (numerical C++ code), gcc performs slightly
> better. But then, the code has been tuned for gcc for years and I
> know from experience which optimization options work best ...
> So I probably still have some room for improvement with Intel.
> 

In every piece of numeric C code I've got, Intel's 6.0 blows gcc out of 
the water.
Partly because it's *much* better at vectorizing than 5.0.
Partly because I always at least turn on single-file interprocedural 
optimizations.
Intel also does not default to strict aliasing, unless I'm reading the 
manual wrong.

> Regards,
> 


* Re: inliner in gcc-3.1
  2002-04-25 15:30   ` Kurt Garloff
@ 2002-04-26  4:19     ` Gerald Pfeifer
  2002-04-26  8:20       ` Kurt Garloff
  0 siblings, 1 reply; 14+ messages in thread
From: Gerald Pfeifer @ 2002-04-26  4:19 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: gcc, Andreas Jaeger

On Fri, 26 Apr 2002, Kurt Garloff wrote:
> The patch also does two other things that have proven useful on 2.95.3
> (b) better tune -Os (again tested on i386)
> (c) give leaf functions a bonus (*3/2) in the RTL inliner
>     [is it still used at all?]

C++ and C now both use the tree based inliner.

> Maybe one should separate those issues and verify that they do what I
> intended.

Yes, that might make sense.

>>                      build time   binary size
>>   2.95.3                   4:01   4430752
>>   3.0                     23:54   6295044
>>   3.0.3                    3:58   3948444
>>   3.1-20020422             4:38   3996096  <-- without patch
>>   3.1-20020424+kurt-v3     5:35   4102432
>>   3.1.20020425+kurt-finl.  4:32   3912640  <-- this is with the patch
> This is with both patches I assume.

No, the first one is just your v3 patch, and the second one just the
finline patch.

> I'm astonished we beat plain 0422 at build time and binary size.

Me too. ;-)

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/


* Re: inliner in gcc-3.1
  2002-04-26  4:19     ` Gerald Pfeifer
@ 2002-04-26  8:20       ` Kurt Garloff
  0 siblings, 0 replies; 14+ messages in thread
From: Kurt Garloff @ 2002-04-26  8:20 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: gcc, Andreas Jaeger

[-- Attachment #1: Type: text/plain, Size: 2339 bytes --]

Hi Gerald,

On Fri, Apr 26, 2002 at 01:10:18PM +0200, Gerald Pfeifer wrote:
> C++ and C now both use the tree based inliner.

Yes, but are there languages that still use the RTL inliner?

> > Maybe one should separate those issues and verify that they do what I
> > intended.
> 
> Yes, that might make sense.

I'll put three patches up then ...

> >>                      build time   binary size
> >>   2.95.3                   4:01   4430752
> >>   3.0                     23:54   6295044
> >>   3.0.3                    3:58   3948444
> >>   3.1-20020422             4:38   3996096  <-- without patch
> >>   3.1-20020424+kurt-v3     5:35   4102432
> >>   3.1.20020425+kurt-finl.  4:32   3912640  <-- this is with the patch
> > This is with both patches I assume.
> 
> No, the first one is just your v3 patch, and the second one just the
> finline patch.

Oh well, then the better build time and binary size are not so astonishing,
at least if you compile with -O3 (or -O2 -finline-functions).

The main thing the patch does is remember whether a function was marked
inline by the keyword or selected by the -finline-functions option.
Functions in the latter group only get half the allowable size, so we do
less inlining and thus get shorter compile times and smaller binaries.
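A minimal sketch of this accounting, assuming the halved limit is the single-function size limit (the mail does not say which limit is halved). `InlineReason`, `size_limit` and `may_inline` are hypothetical names, not GCC code.

```cpp
#include <cassert>

// Why a function is an inlining candidate (hypothetical).
enum class InlineReason { Keyword, AutoInline };

// Keyword-inline functions get the full limit; functions picked up only
// by -finline-functions get half of it.
int size_limit(InlineReason why, int max_inline_insns_single) {
    assert(max_inline_insns_single >= 0);
    return why == InlineReason::Keyword ? max_inline_insns_single
                                        : max_inline_insns_single / 2;
}

bool may_inline(int callee_insns, InlineReason why, int max_inline_insns_single) {
    return callee_insns <= size_limit(why, max_inline_insns_single);
}
```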

Actually, compared to plain 3.1, the finline patch results are not so bad;
I had compared them to -v3.

For my own benchmarks, I see the best performance with gcc-3.1 plus the -v3
patch and the options -O3 -fno-inline-functions.
If I use -O3, performance drops by 5--10%. With the finline patch (on top of
the -v3 patch), I see only a small drop when using -O3. But unfortunately,
I also see a very small drop with -O3 -fno-inline-functions compared to
plain -v3, and I don't quite understand why.
Maybe a function that is implicitly declared inline by being defined in the
class declaration is marked as automatically inlined in decl.c, so we limit
the inliner on those as well? I'll try to find out ...

> > I'm astonished we beat plain 0422 at build time and binary size.

I'm not any more.

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE Linux AG, Nuernberg, DE                            SCSI, Security



* Re: inliner in gcc-3.1
  2002-04-25  9:41 ` Gerald Pfeifer
  2002-04-25 15:30   ` Kurt Garloff
@ 2002-04-27  9:49   ` Kurt Garloff
  2002-04-30  7:49     ` Gerald Pfeifer
  1 sibling, 1 reply; 14+ messages in thread
From: Kurt Garloff @ 2002-04-27  9:49 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: gcc, Andreas Jaeger

[-- Attachment #1: Type: text/plain, Size: 4588 bytes --]

Hi Gerald, Andreas,

On Thu, Apr 25, 2002 at 06:36:10PM +0200, Gerald Pfeifer wrote:
> On Wed, 24 Apr 2002, Kurt Garloff wrote:
> > It would be nice if this patch
> > http://www.garloff.de/kurt/freesoft/gcc/gcc310-inline-func-acct-v1.diff
> > would be tested by more people and integrated into 3.1.
> 
> This second patch (partially) fixes a very bad regression we've been
> having since GCC 3.0; build time and binary size seem to be fine, though
> we seem to degrade slightly for some of the other benchmarks.
> 
> I'd really like to see what this does to SPEC -- Andreas, could you give
> it a try?

I created a new inline accounting patch, which should prevent -O3
(-finline-functions) from delivering worse performance than -O2 for code
that already has the most important functions marked inline.

As it turned out, limiting the RTL inlining (integrate.c) for functions
selected by -finline-functions is not a good idea. For the tree inliner it
is very useful, as the tree inliner cuts off inlining after a certain amount
of repeated inlining in order to limit compile-time resource requirements.
Maybe some more experiments are needed here.

The patch is at
http://www.garloff.de/kurt/freesoft/gcc/gcc310-inline-func-acct-v1.2.diff
and has been diffed against a 3.1-20020422 with my inline heuristics patch
v3.6 applied.
http://www.garloff.de/kurt/freesoft/gcc/g++310-rec-inline-heuristics-v3.6.diff

Here are my benchmark results.
(Tests performed on 2xpIII-1GHz, Linux-2.4.18, glibc-2.2.5; I left
 max-inline-slope and min-inline-insns alone.)

                                    libbench_double libbench_cplx_double
g++       max-inl     opt   build     run    binary     run       binary
          [+single]        (times: user+sys, in s)
3.1       600         -O2   27.52   16.00     82579   18.97        95909
+3.6      600         -O2   29.02   15.96     82431   18.90+       95780
+3.6+1.2  600         -O2   29.17   16.02     82431   18.87+       95780

3.1       2500        -O2   48.32   15.97     86017   18.96       111912
+3.6      2500+1250   -O2   48.12   15.98     86049   19.01       111944
+3.6+1.2  2500+1250   -O2   48.50   15.98     86049   18.99       111944
+3.6      2500+300    -O2   37.33   15.99     83395   18.88+      105127
+3.6+1.2  2500+300    -O2   37.41   15.94     83395   18.88+      105127

3.1       600         -O3   23.88   16.65-    82667   18.98        94805
+3.6      600         -O3   28.67   16.65-    84809   19.04        96097
+3.6+1.2  600         -O3   30.40   16.62-    99900   19.02       112262

3.1       2500        -O3  136.88   15.78+   137523   19.08       165986
+3.6      2500+1250   -O3  145.06   15.80+   139550   19.21-      168431
+3.6+1.2  2500+1250   -O3   64.15   15.82+    98138   18.92       128517
+3.6      2500+300    -O3   38.07   16.64-    85845   19.04       108405
+3.6+1.2  2500+300    -O3   37.46   16.70-    94113   19.00       117715

This chart does give some unexpected results.

It seems the cplx_double benchmark is almost unaffected by the patch and by
the increased inlining: all times are around 19.0. For -O3 with 2500+1250
(max-inline-insns + max-inline-insns-single) and the v3.6 patch, we are
clearly over the top. The v1.2 patch fixes that: compile time is reduced to
a reasonable number again and performance is good. Some results are
around 18.9 (v3.6-600-O2, v3.6-2500-300-O2, v3.6+v1.2-2500-300-O2,
v3.6+v1.2-2500-1250-O3).

Looking at the double results, we have three groups: 15.8, 16.0, and 16.6.
The worst results are for a low inline limit (600) with -O3, independent of
the patches applied. With v3.6 (with or without 1.2), -O3, a small single-fn
limit and a large overall one (2500+300), we get the same bad score.
The best results are for a lot of inlining (2500 or 2500+1250) and -O3.
Among those, build times and binary sizes differ considerably: with both
patches applied, only half the compile time is needed and the binary is
1.4 times smaller.
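The "half the compile time" and "1.4 times smaller" figures can be checked against the -O3 rows of the table. Which rows are being compared is our reading rather than something the mail states: the sketch compares the 2500+1250 row with both patches against plain 3.1 at 2500, and against v3.6 alone. The `Run` struct is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical helper holding one table row's build time and binary size.
struct Run {
    double build_s;  // build time, user+sys seconds
    double binary;   // libbench_double binary size
};

// Values copied from the -O3 rows of the table.
constexpr Run kPlain31 = {136.88, 137523.0};  // 3.1,      2500
constexpr Run kV36     = {145.06, 139550.0};  // +3.6,     2500+1250
constexpr Run kV36V12  = {64.15, 98138.0};    // +3.6+1.2, 2500+1250

// Fraction of the reference build time the patched compiler needs.
constexpr double build_fraction(const Run& ref, const Run& patched) {
    return patched.build_s / ref.build_s;
}

// How many times smaller the patched binary is than the reference one.
constexpr double shrink_factor(const Run& ref, const Run& patched) {
    return ref.binary / patched.binary;
}
```

Against plain 3.1 this gives a build-time fraction of about 0.47 and a size factor of about 1.40; against v3.6 alone, about 0.44 and 1.42, so both readings match the rounded figures.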

The binary sizes are quite surprising. As expected, the v1.2 patch does not
change anything at -O2. At -O3 it does limit the tree inlining. Funnily
enough, for small max-inline-insns-single values this leads to _larger_
binaries! Apparently the smaller chunks later get inlined by the RTL inliner
(integrate.c), leading to more inlining overall.
For larger single fn inlining limits, the effects of the v1.2 patch are
closer to what one would expect.

I'd be curious what other people get.

Regards,
--
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE Linux AG, Nuernberg, DE                            SCSI, Security



* Re: inliner in gcc-3.1
  2002-04-27  9:49   ` Kurt Garloff
@ 2002-04-30  7:49     ` Gerald Pfeifer
  2002-04-30  7:59       ` Daniel Berlin
  0 siblings, 1 reply; 14+ messages in thread
From: Gerald Pfeifer @ 2002-04-30  7:49 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: gcc, Andreas Jaeger

On Sat, 27 Apr 2002, Kurt Garloff wrote:
> The patch is at
> http://www.garloff.de/kurt/freesoft/gcc/gcc310-inline-func-acct-v1.2.diff
> and has been diffed against a 3.1-20020422 with my inline heuristics patch
> v3.6 applied.
> http://www.garloff.de/kurt/freesoft/gcc/g++310-rec-inline-heuristics-v3.6.diff

I applied both patches and made some tests:

  GCC version            build time   binary size
  2.95.3                       4:01       4430752
  3.0                         23:54       6295044
  3.0.3                        3:58       3948444
  3.1-20020427                 4:35       3992256
  3.1-20020427+2patches        5:27       4102432

                    | 2.95.3| 3.0.3 |  3.1  |3.1+2pat|
--------------------+-------+-------+-------+--------+
      STRATCOMP1-ALL|  3.57 | 71.43 | 23.07 | 25.01  |
   STRATCOMP-770.2-Q|  0.73 |  0.89 |  0.80 |  0.79  |
               2QBF1| 19.03 | 22.94 | 22.53 | 20.89  |
          PRIMEIMPL2| 10.73 | 12.91 | 11.91 | 10.60  |
            ANCESTOR|  8.90 |  9.64 |  9.19 |  9.45  |
       3COL-SIMPLEX1|  6.30 |  7.15 |  7.06 |  6.85  |
        3COL-LADDER1| 36.20 | 44.40 | 42.80 | 40.08  |
      3COL-N-LADDER1| 19.75 | 21.23 | 23.33 | 20.35  |
        3COL-RANDOM1| 10.68 | 12.70 | 11.92 | 11.20  |
          HP-RANDOM1| 13.12 | 14.55 | 14.58 | 14.42  |
       HAMCYCLE-FREE|  1.18 |  1.70 |  1.75 |  1.63  |
             DECOMP2| 21.95 | 24.20 | 23.58 | 25.03  | <-- (Let's ignore this one, it often gives "interesting" results)
        BW-P4-Esra-a| 91.71 | 97.63 | 98.74 | 94.81  |
        BW-P5-nopush|  6.97 |  7.41 |  7.35 |  7.12  |
       BW-P5-pushbin|  6.19 |  6.50 |  6.40 |  6.13  |
     BW-P5-nopushbin|  1.94 |  2.09 |  2.05 |  1.97  |
              3SAT-1| 32.85 | 37.65 | 36.66 | 32.82  |
   3SAT-1-CONSTRAINT| 17.37 | 21.96 | 20.26 | 18.67  |
        HANOI-Towers|  4.75 |  5.03 |  4.99 |  4.95  |
              RAMSEY|  8.15 |  9.04 |  8.66 |  8.46  |
             CRISTAL| 11.07 | 13.17 | 11.87 | 11.76  |
             HANOI-K| 33.32 | 38.40 | 37.74 | 34.77  |
           21-QUEENS|  9.66 | 10.87 | 10.28 |  9.56  |
   MSTDir[V=13,A=40]| 25.62 | 24.44 | 20.71 | 19.91  |
   MSTDir[V=15,A=40]| 25.70 | 24.47 | 20.72 | 19.93  |
 MSTUndir[V=13,A=40]| 12.89 | 12.82 | 11.15 | 10.72  |
 MSTUndir[V=15,A=40]|214.53 |210.60 |182.74 |176.79  |
         TIMETABLING|  9.61 | 10.71 | 10.51 | 10.29  |

That is, code quality improves measurably overall, and this would be a
patchset that would be nice to have on mainline once the first one (which
RTH approved in principle and which just needs minor polishing) has
gone in.

> I'd be curious what other people get.

Regardless of all these benchmarks, I believe one major problem the
current inliner has is that it's way too slow and thus forces us to
default to inline limits which are too small to generate really efficient
code for deeply nested C++ structures.

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/


* Re: inliner in gcc-3.1
  2002-04-30  7:49     ` Gerald Pfeifer
@ 2002-04-30  7:59       ` Daniel Berlin
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Berlin @ 2002-04-30  7:59 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: Kurt Garloff, gcc, Andreas Jaeger

>    MSTDir[V=13,A=40]| 25.62 | 24.44 | 20.71 | 19.91  |
>    MSTDir[V=15,A=40]| 25.70 | 24.47 | 20.72 | 19.93  |
>  MSTUndir[V=13,A=40]| 12.89 | 12.82 | 11.15 | 10.72  |
>  MSTUndir[V=15,A=40]|214.53 |210.60 |182.74 |176.79  |
>          TIMETABLING|  9.61 | 10.71 | 10.51 | 10.29  |
> 
> That is, code quality improves measurably overall, and this would be a
> patchset that would be nice to have on mainline once the first one (which
> RTH approved in principle and which just needs minor polishing) has
> gone in.
> 
> > I'd be curious what other people get.
> 
> Regardless of all these benchmarks, I believe one major problem the
> current inliner has is that it's way too slow and thus forces us to
> default to inline limits which are too small to generate really efficient
> code for deeply nested C++ structures.

It's not the inliner itself, it's the expanders and other passes that 
aren't particularly quick, like CSE.

But, I actually wanted to see if other compilers had the same issue (if 
you kick up the inline limits, they become very slow), and sure enough, 
they do.

If I kick the inline limits up on Intel's C++ compiler, it starts to take 
5-10 minutes on code it could compile in 10 seconds.

This is with both C and C++.

It also just makes the binaries bigger, not faster.

I did this about a week ago, just out of curiosity, so I didn't sit there 
and make tables or anything.

I could, if anyone cares.

> 
> Gerald
> 


end of thread

Thread overview: 14+ messages
2002-04-24  4:37 inliner in gcc-3.1 Kurt Garloff
2002-04-24 16:31 ` Daniel Berlin
2002-04-25 15:06   ` Kurt Garloff
2002-04-25 18:25     ` Daniel Berlin
2002-04-25  0:21 ` Gerald Pfeifer
2002-04-25  0:48   ` Richard Henderson
2002-04-25 15:51   ` Kurt Garloff
2002-04-25  9:41 ` Gerald Pfeifer
2002-04-25 15:30   ` Kurt Garloff
2002-04-26  4:19     ` Gerald Pfeifer
2002-04-26  8:20       ` Kurt Garloff
2002-04-27  9:49   ` Kurt Garloff
2002-04-30  7:49     ` Gerald Pfeifer
2002-04-30  7:59       ` Daniel Berlin
