public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Daniel Berlin <dberlin@dberlin.org>
To: Kurt Garloff <garloff@suse.de>
Cc: gcc@gcc.gnu.org
Subject: Re: inliner in gcc-3.1
Date: Thu, 25 Apr 2002 18:25:00 -0000	[thread overview]
Message-ID: <Pine.LNX.4.44.0204252045250.8013-100000@dberlin.org> (raw)
In-Reply-To: <20020426000424.B14090@gum01m.etpnet.phys.tue.nl>

On Fri, 26 Apr 2002, Kurt Garloff wrote:

> Hi Daniel,
> 
> On Wed, Apr 24, 2002 at 06:24:05PM -0400, Daniel Berlin wrote:
> > On Wed, 24 Apr 2002, Kurt Garloff wrote:
> > > I'd like to add a few comments again:
> > > * I was expecting to find the AST inliner in gcc-3.1.
> > >   I do believe it's more sane starting to inline from the leaves of
> > >   a call tree than from the root, so I would have expected that approach
> > >   (if tuned well) to win all benchmarks.
> > 
> > Do you have anything to back up that statement?
> 
> If you refer to backing it up by numbers: No. I did not implement this nor
> did I try to do a real investigation of what is done in the AST branch.
> 
> I do have some reasoning so. Just common sense, nothing more. Well, maybe
> the fact that it did not completely fail the last time I applied it to the
> inlining heuristics.
> 
> > Are there any commercial compilers that do it that way?
> 
> I don't know.
> 
> Let me just give my reasoning: 
> What you should make sure to inline, is the fast functions which are called
> the most often, preferably from not so many different places. Agreed?

Yes. When you have profile info.
When you don't have profile info to determine the most frequently used 
call sites, it's already a guessing game.


> In a call tree, you will have one root function (main) and normally several
> leaf functions.
> Normally, you will have a function F somewhere which calls  more than one
> function, say N functions and/or calls other functions from within a loop
> (with say M iterations). This will be the case for most non-trivial and/or
> time-consuming programs.
> 
>                                         /-> J
>                       F -> (N times) G *--> H
> 		                        \-> I -> K 
> 
> Which is where you win if you started inlining from the leaves:
> If you inlined the function itself (because you started from the root)
> and ran out of your recursive inlining budget, you have just done one
> inline operation, resulting in saving one call.

Buzz.
This assumes you don't inline small functions *anyway* if they are < a 
certain number of statements.
>  If you started from the
> leaves, you would save N and/or M calls. (This happens whether you inline
> the functions G itself or the functions that are called on below G, assuming
> those calls are unconditional.)
> 
> I think this makes some sense, no?
No more sense than anything else.

If the called functions are small enough, they will be inlined regardless.
Otherwise, they aren't better than any *other* functions we choose, unless 
they contain loops/etc.


> 
> Now, for C++ there's another reason: You often have accessor functions like
> inline T  Vector<T>::operator () (const int i) const { return data[i]; }
> inline T& Vector<T>::operator () (const int i)       { return data[i]; }
> 
> Those are very small functions and they are called very often from loops.
> They are called much more often than most other functions. So inlining them
> is paramount.

Yes, then you choose to always inline functions < x statements.
You also choose functions with loops or call functions with loops, over 
any other function.

> 
> > Intel's, for instance, has only a few differences from what we have.
> > This is what they do:
> > 
> > Start at root.
> 
> I would not do that.

Every paper i've read on inlining (there are only a few), and every 
compiler i've seen with good performance, does *not* start at the leaves.

> 
> > At most 2000 intel intermediate code statements are inlined into the 
> > caller.
> > 
> > 1. Functions with the following substrings in their name are not inlined: 
> > (various names signifying aborts, exits, fails, and warns, as well as 
> > alloca)
> 
> Exception handling in general does not need to be tuned much for performance.
> 
> > 2. Focus on callers containing loops, and callees  containing 
> > loops.  (This is probably most important).
> 
> This makes a lot of sense. But we don't have this information when taking an
> inlining decision in gcc available. At least not in a way I could find it ...

Um, I could pretty easily make it available.
At least, the loop part is rather easy to do.
Mainly because we don't care about anything besides "does it have loops" 
and "does it call functions in loop".

This is a rather trivial application of walk_tree_without_duplicates.

> 
> > 3. Don't inline functions > a certain number of statements. They 
> > default to 230 Intel intermediate code statements.
> 
> So the recursive limit is 2000 and the single fn. limit is 230?

If you mean "don't ever inline, into the caller, a callee larger than 230 
statements", then yes, the single fn. limit is 230.

> 
> > 4. Stop when you detect direct recursion.
> 
> Hopefully, you can transform those into iterations and unroll it a few times
> ...

> 
> > 5. All  functions < a certain number of statements are inlined. (This is 
> > because it's cheaper to inline than do the arg setup).
> > For IA32, the number of statements is 7, for IA64, it's 15.
> 
> We should do that as well. Actually, in my v3 patch, I do something similar:
> Once we reached the recursive inlining limit (default 600 INSNS aka 60
> STMTS, single fn limit is 300), we use some linear fn to put more and more
> severe limits. 
> But I don't go to 0 but to 13 STMTS (130 INSNS), which I found by experiment
> on i386. It was a number leading to the smallest (or close to smallest) code
> size. Only much much later, I completely shut down inlining to prevent
> infinite recursions ...
> 
> Some word about the intel C++ (6.0) compiler:
> In most of my tests (numerical C++ code), gcc performs slightly
> better. But then, the code has been tuned for gcc for years and I
> have some experience which are the best optimization options ...
> So probably I do have some improvement possibilities on intel still.
> 

In every piece of numeric C code i've got, Intel's 6.0 blows gcc out of 
the water.
Partially because it's *much* better at vectorizing than 5.0.
Partially because I always at least turn on single-file interprocedural 
optimizations.
Intel does not default to strict aliasing, as well, unless i'm reading the 
manual wrong.

> Regards,
> 

  reply	other threads:[~2002-04-26  0:55 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-04-24  4:37 Kurt Garloff
2002-04-24 16:31 ` Daniel Berlin
2002-04-25 15:06   ` Kurt Garloff
2002-04-25 18:25     ` Daniel Berlin [this message]
2002-04-25  0:21 ` Gerald Pfeifer
2002-04-25  0:48   ` Richard Henderson
2002-04-25 15:51   ` Kurt Garloff
2002-04-25  9:41 ` Gerald Pfeifer
2002-04-25 15:30   ` Kurt Garloff
2002-04-26  4:19     ` Gerald Pfeifer
2002-04-26  8:20       ` Kurt Garloff
2002-04-27  9:49   ` Kurt Garloff
2002-04-30  7:49     ` Gerald Pfeifer
2002-04-30  7:59       ` Daniel Berlin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.44.0204252045250.8013-100000@dberlin.org \
    --to=dberlin@dberlin.org \
    --cc=garloff@suse.de \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).