Hi Daniel,

On Wed, Apr 24, 2002 at 06:24:05PM -0400, Daniel Berlin wrote:
> On Wed, 24 Apr 2002, Kurt Garloff wrote:
> > I'd like to add a few comments again:
> > * I was expecting to find the AST inliner in gcc-3.1.
> >   I do believe it's more sane starting to inline from the leaves of
> >   a call tree than from the root, so I would have expected that approach
> >   (if tuned well) to win all benchmarks.
>
> Do you have anything to back up that statement?

If you refer to backing it up with numbers: No. I did not implement this,
nor did I really investigate what is done on the AST branch.
I do have some reasoning, though. Just common sense, nothing more.
Well, maybe also the fact that it did not completely fail the last time
I applied it to the inlining heuristics.

> Are there any commercial compilers that do it that way?

I don't know.

Let me just give my reasoning:
What you should make sure to inline are the fast functions which are
called most often, preferably from not too many different places.
Agreed?

In a call tree, you will have one root function (main) and normally
several leaf functions. Normally, you will have a function F somewhere
which calls more than one function, say N functions, and/or calls other
functions from within a loop (with say M iterations). This will be the
case for most non-trivial and/or time-consuming programs.

                  /-> J
F -> (N times) G *--> H
                  \-> I -> K

This is where you win if you start inlining from the leaves:
If you inlined F itself (because you started from the root) and then ran
out of your recursive inlining budget, you have done just one inline
operation, saving one call. If you started from the leaves, you would
save N and/or M calls. (This holds whether you inline G itself or the
functions called below G, assuming those calls are unconditional.)

I think this makes some sense, no?

Now, for C++ there's another reason: You often have accessor functions
like

inline T  Vector::operator () (const int i) const { return data[i]; }
inline T& Vector::operator () (const int i)       { return data[i]; }

Those are very small functions, and they are called very often from
loops, much more often than most other functions. So inlining them is
paramount.

> Intel's, for instance, has only a few differences from what we have.
> This is what they do:
>
> Start at root.

I would not do that.

> At most 2000 Intel intermediate code statements are inlined into the
> caller.
>
> 1. Functions with the following substrings in their name are not inlined:
>    (various names signifying aborts, exits, fails, and warns, as well as
>    alloca)

Exception handling in general does not need to be tuned much for
performance.

> 2. Focus on callers containing loops, and callees containing
>    loops. (This is probably most important).

This makes a lot of sense. But we don't have this information available
in gcc when taking an inlining decision. At least not in a way I could
find it ...

> 3. Don't inline functions > a certain number of statements. They
>    default to 230 Intel intermediate code statements.

So the recursive limit is 2000 and the single-function limit is 230?

> 4. Stop when you detect direct recursion.

Hopefully, you can transform those into iterations and unroll them a
few times ...

> 5. All functions < a certain number of statements are inlined. (This is
>    because it's cheaper to inline than do the arg setup).
>    For IA32, the number of statements is 7, for IA64, it's 15.

We should do that as well.
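To make rules 2, 3 and 5 above concrete, here is a minimal sketch of such a
threshold-based inlining decision. It only illustrates the numbers quoted
above; the names (call_site, should_inline) and the exact checks are
hypothetical, not gcc's or Intel's actual code:

    /* Minimal sketch of a threshold-based inlining decision, roughly
       following rules 2, 3 and 5 above.  All names and the exact checks
       are hypothetical illustrations.  */
    struct call_site
    {
      int  callee_stmts;     /* estimated size of the callee     */
      bool callee_has_loop;  /* the callee contains a loop       */
      bool caller_has_loop;  /* the call site sits inside a loop */
    };

    bool
    should_inline (const call_site &cs, int budget_left)
    {
      const int always_inline_stmts = 7;    /* rule 5: cheaper than arg setup */
      const int single_fn_limit     = 230;  /* rule 3: upper size limit       */

      if (cs.callee_stmts <= always_inline_stmts)
        return true;
      if (cs.callee_stmts > single_fn_limit || cs.callee_stmts > budget_left)
        return false;
      /* Rule 2: prefer call sites where loops are involved.  */
      return cs.caller_has_loop || cs.callee_has_loop;
    }

A driver loop would then subtract the callee size from budget_left after each
successful inline, which is roughly how the 2000-statement recursive limit
would interact with the per-function limits.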
Actually, in my v3 patch, I do something similar: Once we reach the
recursive inlining limit (default 600 INSNS aka 60 STMTS; the single
function limit is 300), we use a linear function to impose more and more
severe limits. But I don't go down to 0, only to 13 STMTS (130 INSNS),
a number I found by experiment on i386 to lead to the smallest (or close
to smallest) code size. Only much, much later do I shut down inlining
completely, to prevent infinite recursion ...

A word about the Intel C++ (6.0) compiler: In most of my tests
(numerical C++ code), gcc performs slightly better. But then, the code
has been tuned for gcc for years, and I have some experience with which
optimization options work best ... So there is probably still room for
me to improve the Intel results.

Regards,
--
Kurt Garloff                                       Eindhoven, NL
GPG key: See mail header, key servers      Linux kernel development
SuSE Linux AG, Nuernberg, DE                       SCSI, Security