From mboxrd@z Thu Jan  1 00:00:00 1970
From: Carlo Wood <carlo@alinoe.com>
To: Daniel Berlin <dan@cgsoftware.com>
Cc: gcc@gcc.gnu.org
Subject: Re: Inlining heuristics for C++
Date: Mon, 09 Jul 2001 21:05:00 -0000
Message-id: <20010710060551.C30081@alinoe.com>
References: <87r8vpo8rw.fsf@cgsoftware.com> <20010710045607.B30081@alinoe.com> <87d779o3y5.fsf@cgsoftware.com>
X-SW-Source: 2001-07/msg00697.html

On Mon, Jul 09, 2001 at 11:31:14PM -0400, Daniel Berlin wrote:
> > And suppose that f1() is *only* called from within f2().
> > Then does the size of f1 matter?
> 
> You are missing something important.
> 
> Whatever calls f2 will recursively inline f1 back into f2, for that
> particular inlining.
> 
> We recursively inline.

I understood that.  Thats the whole reason why I say that the size
of f1 doesn't matter: all you lose is one call-overhead, without
growing the size of the code at all.  Fergus also said:

"There is an exception to this.  Functions which are only called once,
and which are defined in this translation unit and not exported, should
almost always be inlined (after which the original copy of the function
is dead code and can be eliminated)"

To which you also disagreed.  There must be a misunderstanding here :),
I think Fergus is perfectly right and actually better worded what I wanted
to say.

> > Despite the fact that we have to compare "one million times a call overhead"
> > with the different object code growths, the sizes of f1, f2 and f3 relative
> > to eachother are totally irrelevant.
> 
> No, they aren't.
> 
> Once again, we recursively inline starting at some base function.
> 
> We will inline f3 and into main.
> f2 won't make the cut off for inlining into the f3 inline that goes
> into main.

That makes, thus, no sense.  Because inlining f2 into the f3 inline gains
*again* one million call-overheads.  It is true that it makes the code
effectively grow 1000 lines, but that is an absolute number and not something
that is relevant to compare relatively (with the size of f3 in this case,
or with the size of main+f3 for that matter).  The only thing that matters
is the actual size of f2 (1000 lines) and the number of times it is called
(1,000,000 times).  The sizes of main and f3 don't come into the picture,
and hence the proposed rule can't be correct.

> However, we'll inline f1 into f2 when we generate the non-inline code
> for f2 (which we do for all non-statics).
> 
> f2 is a big enough function that inlining into f3 is going to increase
> code size and processing time, but not buy you performance.

That depends on how often f3 is called.

> 
> People forget we recursively inline.

Please note that I did not forget that.  There must be some other misunderstanding
by me or you (heh - is *this* the case were one puts 'me' upfront instead of
second? ;).

> So if they were in relative size compared to main, we'd now (with my
> heuristic) inline main->f3->f2->f1 (IE main would call nothing
> anymore).

This decision should be based on the fact that f3 is inside a loop, and not
on the fact that main() has a considerable size, however.

> The problem is then that we also currently inline
> 
> f3->f2->f1
> f2->f1

I understand.  In this case one can easily do the following reasoning:
Knowing that f3 is _called_ (not inlined), it was apparently not important
enough to inline it.  Hence we are not inside a loop, and thus it is not
needed to inline f2 into f3.  Again however, I disagree that the size of
f2 *relative* to f3 matters; the only thing that matters in this case is
the absolute size of f2.  I'd change your heuristic rule to look at the
size of function in absolute terms, not relative compared to the function
that it is called from.

> When generating code for those.
> 
> Think of all the hidden functions.
> 
> So if your constructor calls a function, which calls two functions, we
> end up with
> inlined:
> main->your constructor->the function->two large functions->whatever they
> main->call->till we hit 10000 insns or run out of stuff to inline
> 
> Then we inline, when outputting the constructor
> 
> constructor->the function->two functions->etc
> 
> then 
> the function->two functions->etc
> 
> 
> Consider if more than just the constructor calls the function.
> 
> Everything that calls it, as long as we don't hit 10000 insns for the
> function at the top of the tree, will inline the the function->two functions->etc,
> whether it makes sense or not.
> 
> 
> What my heuristics would do is say
> 
> main->your constructor->... is fine
> your constructor->two large functions->... by itself doesn't make much
> sense.

I agree.  But based on the fact that it is unlikely that a constructor
that calls two large functions is within a loop to begin with.  Fortunately
here you talk about "a large function", instead of "a function large compared
to the constructor size".  That is a difference that matter and that should
be reflected in the patch.

> Whatever calls the constructor outside the compilation unit is
> incurring enough call overhead anyway that inlining these two large
> functions into the constructor just increases code size, not performance. 
> 
> each of the two large functions->... is fine

But... now I see something that I didn't see before :).
Perhaps the call-overhead is much less important then additional
optimizations that can be done AFTER inlining.  By not inlining we
might miss important optimizations.  This wouldn't speed up compile
time, but what about to TRY inlining and then make the decision
based on code size reduction that follows from it?  While a call-
overhead is rather constant (a few assembly instructions), reductions
as a result of actual inlining can be rather HUGE (also mentioned by
Timothy J. Wood (Nice last name) in his example of calling large
functions with constant parameters that cause large code blocks to
drop out).  Ok, that is too much asked... Lets say, we do what your
patch does but 1) look at the absolute size of the to-inline function,
2) inline like we inline now when any parameter passed is a constant?

-- 
Carlo Wood <carlo@alinoe.com>