public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: Inlining heuristics for C++
@ 2001-07-10 14:53 mike stump
  2001-07-10 15:04 ` Daniel Berlin
  0 siblings, 1 reply; 28+ messages in thread
From: mike stump @ 2001-07-10 14:53 UTC (permalink / raw)
  To: dan, gcc

> To: gcc@gcc.gnu.org
> From: Daniel Berlin <dan@cgsoftware.com>
> Date: Mon, 09 Jul 2001 21:46:59 -0400

> Right now, they are horrific.

You should have seen it before we limited the growth of the function
being inlined into.  :-(

Anyway, would be nice if someone could take a pass through the
literature and find the good stuff that describes when to do it and
when not to and give us pointers to it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-10 14:53 Inlining heuristics for C++ mike stump
@ 2001-07-10 15:04 ` Daniel Berlin
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Berlin @ 2001-07-10 15:04 UTC (permalink / raw)
  To: mike stump; +Cc: dan, gcc

mike stump <mrs@windriver.com> writes:

>> To: gcc@gcc.gnu.org
>> From: Daniel Berlin <dan@cgsoftware.com>
>> Date: Mon, 09 Jul 2001 21:46:59 -0400
> 
>> Right now, they are horrific.
> 
> You should have seen it before we limited the growth of the function
> being inlined into.  :-(
> 
> Anyway, would be nice if someone could take a pass through the
> literature and find the good stuff that describes when to do it and
> when not to and give us pointers to it.

I already did this.


Most of the literature is actually pretty concise.

You want to look at "Aggressive Inlining"  (It's a desccription of
what HP's compilers do. I posted the author names in a seperate
message, it's frmo PLDI '97), "Training compilers for better inlining
Decisions" (Dean and Chambers, 1993), and "Fast and Effective
Procedure Inlining" (Waddell, Dybvig, 1997).

As soon as the basic block on trees stuff goes in, i'll implement
profile directed cloning and inlining, and we'll be just like all the
other commercial compilers.


-- 
"My neighbor has a circular driveway...  He can't get out.
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-10 14:16     ` Hartmut Schirmer
@ 2001-07-11 14:27       ` Fergus Henderson
  0 siblings, 0 replies; 28+ messages in thread
From: Fergus Henderson @ 2001-07-11 14:27 UTC (permalink / raw)
  To: Hartmut Schirmer; +Cc: gcc

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 551 bytes --]

On 10-Jul-2001, Hartmut Schirmer <hartmut.schirmer@arcormail.de> wrote:
> We also shouldn´t inline functions in branches that aren´t
> predicted to be excecuted, at least take the __buildin_expect
> into account here.

That reminds me of another heuristic: don't inline functions
marked __attribute__((noreturn)).

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: < http://www.cs.mu.oz.au/~fjh >  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-10  2:42   ` Andi Kleen
@ 2001-07-10 14:16     ` Hartmut Schirmer
  2001-07-11 14:27       ` Fergus Henderson
  0 siblings, 1 reply; 28+ messages in thread
From: Hartmut Schirmer @ 2001-07-10 14:16 UTC (permalink / raw)
  To: gcc

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1778 bytes --]

On Tue, 10 Jul 2001, Andi Kleen wrote:
>Fergus Henderson <fjh@cs.mu.oz.au> writes:
>> 
>> There is an exception to this.  Functions which are only called once,
>> and which are defined in this translation unit and not exported, should
>> almost always be inlined (after which the original copy of the function
>> is dead code and can be eliminated), even if they are much larger than
>> the function into which they are being inlined.
>
>This has some bad interactions with gcc's current register allocator though.
>It cannot split register variable livetimes currently. When bigger functions
>are inlined then the variables in the outer function (with their scope 
>extending over the function call) have a much smaller chance to get a 
>register because they're usually already used up by the inlined bigger
>function. With a function call gcc can save/restore them around the 
>function; with inline it tends to use stack variables for the "outside"
>variables which often hurts.
>
>Of course it would be best to fix the register allocator to still do a 
>bettre job. Short term I suspect e.g. on register poor machines
>like the x86 which also has fast CALL it's probably best to avoid bigger 
>inlines at least when there are many variables alive in the caller or 
>the call happens in a loop.

Inlining may increase the required stack space. Without a virtual
growing stack this effect can kill your application.
Better to stop inlining if the amount of local storage required
is increased by some (small) factor.

We also shouldn´t inline functions in branches that aren´t
predicted to be excecuted, at least take the __buildin_expect
into account here.

Is there a chance to get the RTL inliner handling all those
cases the tree inliner didn´t catch ?

Hartmut

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 23:26 ` Mark Mitchell
                     ` (4 preceding siblings ...)
  2001-07-10  0:30   ` Daniel Berlin
@ 2001-07-10  2:47   ` Nathan Sidwell
  5 siblings, 0 replies; 28+ messages in thread
From: Nathan Sidwell @ 2001-07-10  2:47 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Mark Mitchell, gcc

Mark Mitchell wrote:
> 
> --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin
> <dan@cgsoftware.com> wrote:

> I think your ideas are reasonable.  Nathan Sidwell has been thinking
> about these issues, too; you should coordinate with him to try to
> get some decent ideas and some decent measurements.
Yes, and I'm making good progress. I have significantly reduced the
compile time time and memory requirements. One of Geralds test cases now
takes 112MB and 87 seconds instead of 279MB and 1133 seconds (that went
into swap, hence the huge time dilation). The produced executable is also
much smaller. The downside is that the resultant executable does not
always go faster.

Reading some inlining papers suggests that the 'inline until too big'
heuristic is not sensible. A % increase might be better. I'm trying
the following heuristics,
	inline until too big (current one)
	inline all smaller than X
	inline until too expanded
	inline all smaller than X% of target fn

Note that my algorithm does a bottom up inlining, so we don't get problems
with unbounded inlining of small functions.

I don't delay inlining until the end of the compilation unit. Doing that
would not be difficult, and then one could generate the complete call
graph and potentially do a better job (it would not invalidate the work
I'm doing, which is good).

I'd guess I'll have something postable by the end of the week. Dan, if you
can wait 'til then, that'd be great!

nathan
-- 
Dr Nathan Sidwell   ::   http://www.codesourcery.com   ::   CodeSourcery LLC
         'But that's a lie.' - 'Yes it is. What's your point?'
nathan@codesourcery.com : http://www.cs.bris.ac.uk/~nathan/ : nathan@acm.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
       [not found] ` <20010710122244.A14584@hg.cs.mu.oz.au.suse.lists.egcs>
@ 2001-07-10  2:42   ` Andi Kleen
  2001-07-10 14:16     ` Hartmut Schirmer
  0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2001-07-10  2:42 UTC (permalink / raw)
  To: Fergus Henderson; +Cc: dan, gcc

Fergus Henderson <fjh@cs.mu.oz.au> writes:
> 
> There is an exception to this.  Functions which are only called once,
> and which are defined in this translation unit and not exported, should
> almost always be inlined (after which the original copy of the function
> is dead code and can be eliminated), even if they are much larger than
> the function into which they are being inlined.

This has some bad interactions with gcc's current register allocator though.
It cannot split register variable livetimes currently. When bigger functions
are inlined then the variables in the outer function (with their scope 
extending over the function call) have a much smaller chance to get a 
register because they're usually already used up by the inlined bigger
function. With a function call gcc can save/restore them around the 
function; with inline it tends to use stack variables for the "outside"
variables which often hurts.

Of course it would be best to fix the register allocator to still do a 
bettre job. Short term I suspect e.g. on register poor machines
like the x86 which also has fast CALL it's probably best to avoid bigger 
inlines at least when there are many variables alive in the caller or 
the call happens in a loop.

-Andi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 23:26 ` Mark Mitchell
                     ` (3 preceding siblings ...)
  2001-07-10  0:20   ` Daniel Berlin
@ 2001-07-10  0:30   ` Daniel Berlin
  2001-07-10  2:47   ` Nathan Sidwell
  5 siblings, 0 replies; 28+ messages in thread
From: Daniel Berlin @ 2001-07-10  0:30 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Daniel Berlin, gcc

Mark Mitchell <mark@codesourcery.com> writes:

> --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin
> <dan@cgsoftware.com> wrote:
> 
>> Right now, they are horrific.
> 
> Hey, thanks a lot. :-)  They are, actually, the same ones we had in
> the RTL inliner, just about -- except that we can inline so much more!
> 
> I think your ideas are reasonable.  Nathan Sidwell has been thinking
> about these issues, too; you should coordinate with him to try to
> get some decent ideas and some decent measurements.
> 
> One long-term challenge is that we would like to inline when somehow
> that allows major simplifications.  For example, if there is a giant
> function involving tons of calcuation, but we know that the argument
> is `3', and that means we can fold all the calculations, then we
> should do the inlining, even if the inlined function is nominally
> giant.  I have no idea how to do this, though.  It's probably not worth
> bothering about.

The whole method defaulting to inline thing also explains why v3 was
hit so hard.
It's mostly headers, and thus, all the code is readily available to be
inlined, into any translation unit.

If I -fno-default-inline, i can't get 15k insns to appear on the
example. even at --param max-code-growth=500

 TOTAL                 :   8.10 ....
zsh: exit 1     ./cc1plus --param max-code-growth=500 -O3 -fno-default-inline test3.ii
./cc1plus --param max-code-growth=500 -O3 -fno-default-inline test3.ii
 8.10s user  ...

So definitely, methods were being all marked inline and inlined.

This is fine, as long as we control code growth.
As the timing vs what i posted earlier shows, we don't save more than a second default-inline
on or off with a normal code growth control factor.
(%:/buildspace/egcs/build/gcc)- time  ./cc1plus -O3 -fno-default-inline test3.ii 2>/dev/null
./cc1plus -O3 -fno-default-inline test3.ii 2> /dev/null  8.12s user 0.28s system 100% cpu 8.398 total
(dberlin@debian)(241/vc)(03:28am:07/10/01)-
(%:/buildspace/egcs/build/gcc)- time  ./cc1plus -O3 test3.ii 2>/dev/null
./cc1plus -O3 test3.ii 2> /dev/null  8.12s user 0.27s system 99% cpu 8.471 total
(dberlin@debian)(242/vc)(03:28am:07/10/01)-
(%:/buildspace/egcs/build/gcc)-


It's only what we have now that makes it such a problem.

In fact, i don't have a vote, but if i did, i'd vote for leaving
flag_default_inline as is, and just installing the new heuristic i'll
post a patch for.

We should get better code overall that way, and never worse.
--Dan

> 
> -- 
> Mark Mitchell                mark@codesourcery.com
> CodeSourcery, LLC            http://www.codesourcery.com

-- 
"What's another word for Thesaurus?
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 23:26 ` Mark Mitchell
                     ` (2 preceding siblings ...)
  2001-07-10  0:09   ` Daniel Berlin
@ 2001-07-10  0:20   ` Daniel Berlin
  2001-07-10  0:30   ` Daniel Berlin
  2001-07-10  2:47   ` Nathan Sidwell
  5 siblings, 0 replies; 28+ messages in thread
From: Daniel Berlin @ 2001-07-10  0:20 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Daniel Berlin, gcc

Mark Mitchell <mark@codesourcery.com> writes:

> --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin
> <dan@cgsoftware.com> wrote:
> 
>> Right now, they are horrific.
> 
> Hey, thanks a lot. :-)  They are, actually, the same ones we had in
> the RTL inliner, just about -- except that we can inline so much more!
> 
It's not your fault, it's because someone set flag_default_inline to
1, so all methods are  inline.
and since  DECL_INLINE, < 10k insns and actually possible to inline were the
inliner's criteria,  we ended up with a lot of 10k insn
functions, even when the original function was say, one stmt that was
a call. :)

We effectively did a breadth first search of the call tree for a function, inlining
everything possible until we hit 10k insns for that function.

The only case where we didn't hit 10k insns is if we *couldn't find
that many functions to inline* :).


> I think your ideas are reasonable.  Nathan Sidwell has been thinking
> about these issues, too; you should coordinate with him to try to
> get some decent ideas and some decent measurements.
:)

I really also don't think we will exacerbate the memory problems.  The
memory problems are from every method being inline, and every inline
up to 10k insns being done.
So we generate a *lot* of copies.
And we *were* deferring just about everything anyway, since they were
all marked inline.

So compared to 3.0 we can reduce memory usage just by calming the
inliner.
And still be deferring everything. And inlining where we won't
increase code growth. And reduce compile time by a factor of of 4.
Neat, eh?

Our problem was so big, that we look like geniuses for fixing it.

I think we should "lose" the mail archives, and pretend we spent
months fixing it.
:)

> 
> One long-term challenge is that we would like to inline when somehow
> that allows major simplifications.  For example, if there is a giant
> function involving tons of calcuation, but we know that the argument
> is `3', and that means we can fold all the calculations, then we
> should do the inlining, even if the inlined function is nominally
> giant.  I have no idea how to do this, though.  It's probably not worth
> bothering about.
> 
> -- 
> Mark Mitchell                mark@codesourcery.com
> CodeSourcery, LLC            http://www.codesourcery.com

-- 
"I went to the cinema, and the prices were:  Adults $5.00,
children $2.50.  So I said, "Give me two boys and a girl."
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-10  0:00   ` Fergus Henderson
@ 2001-07-10  0:13     ` Daniel Berlin
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Berlin @ 2001-07-10  0:13 UTC (permalink / raw)
  To: Fergus Henderson; +Cc: gcc

Fergus Henderson <fjh@cs.mu.oz.au> writes:

> On 09-Jul-2001, Mark Mitchell <mark@codesourcery.com> wrote:
>> 
>> One long-term challenge is that we would like to inline when somehow
>> that allows major simplifications.  For example, if there is a giant
>> function involving tons of calcuation, but we know that the argument
>> is `3', and that means we can fold all the calculations, then we
>> should do the inlining, even if the inlined function is nominally
>> giant.  I have no idea how to do this, though.  It's probably not worth
>> bothering about.
> 
> Another, similar case when you really want to inline are when the body
> of the function contains a call to a virtual method (or other indirect
> function call) and the type of the object and hence its vtable
> (or the value of the function pointer) is known in the caller.
> That's handy because you can then inline the virtual method
> (or indirect function call), which may lead to further opportunities
> for simplification.

Real Devirtualization can be done with region based frameworks, or
interprocedural analysis.
No need for inlining all virtuals.

> 
> For some applications this can really pay off significantly,
> so it is worth doing in the long run, although right now
> it's probably more important on just fixing the most egregariously
> bad heuristics rather than trying 
> 
> We do some of this kind of stuff in the Mercury compiler.  For example,
> we specialize function calls where one of the parameters is a known
> higher-order term (a.k.a. closure; think of it as a known function
> pointer).  

Yup, this is procedure cloning (or so all the papers and books i have
call it, besides closure).  It's part of most aggressive inliner
frameworks, but you really need either to defer everything again.
> We also do "deforestation" optimization, which is
> related to this idea of specializing calls where the caller has
> information about a decision (e.g. a switch) in the callee.
> There's a paper [1] on our web page which describes those and
> some of the other optimizations that we do.
COol.


-- 
"Are there any questions?
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 23:26 ` Mark Mitchell
  2001-07-09 23:37   ` Daniel Jacobowitz
  2001-07-10  0:00   ` Fergus Henderson
@ 2001-07-10  0:09   ` Daniel Berlin
  2001-07-10  0:20   ` Daniel Berlin
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Daniel Berlin @ 2001-07-10  0:09 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Daniel Berlin, gcc

Mark Mitchell <mark@codesourcery.com> writes:

> --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin
> <dan@cgsoftware.com> wrote:
> 
>> Right now, they are horrific.
> 
> Hey, thanks a lot. :-)  They are, actually, the same ones we had in
> the RTL inliner, just about -- except that we can inline so much more!
> 
> I think your ideas are reasonable.  Nathan Sidwell has been thinking
> about these issues, too; you should coordinate with him to try to
> get some decent ideas and some decent measurements.
> 
> One long-term challenge is that we would like to inline when somehow
> that allows major simplifications.  For example, if there is a giant
> function involving tons of calcuation, but we know that the argument
> is `3', and that means we can fold all the calculations, then we
> should do the inlining, even if the inlined function is nominally
> giant.  I have no idea how to do this, though.  It's probably not worth
> bothering about.

This is procedure cloning.
The cheap way is to clone the procedure (with whatever arguments are
constant now omitted from the call), optimize it, see if it
helped relative to the original procedure, if so, change the cloned
name, and the call to match the new cloned name/arguments.

If we find somewhere else compatible with the new arguments, reuse the
cloned procedures.
Picking which procedures to clone is tricky.

See "Aggressive inlining" (Ayers, Gottlieb, Schooler) for more details
(They are HP compiler people, the paper goes into stats about how much
procedure cloning alone vs inlining alone vs both combined helps).
--Dan
> 
> -- 
> Mark Mitchell                mark@codesourcery.com
> CodeSourcery, LLC            http://www.codesourcery.com

-- 
"Is it weird in here, or is it just me?
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 23:26 ` Mark Mitchell
  2001-07-09 23:37   ` Daniel Jacobowitz
@ 2001-07-10  0:00   ` Fergus Henderson
  2001-07-10  0:13     ` Daniel Berlin
  2001-07-10  0:09   ` Daniel Berlin
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Fergus Henderson @ 2001-07-10  0:00 UTC (permalink / raw)
  To: gcc

On 09-Jul-2001, Mark Mitchell <mark@codesourcery.com> wrote:
> 
> One long-term challenge is that we would like to inline when somehow
> that allows major simplifications.  For example, if there is a giant
> function involving tons of calcuation, but we know that the argument
> is `3', and that means we can fold all the calculations, then we
> should do the inlining, even if the inlined function is nominally
> giant.  I have no idea how to do this, though.  It's probably not worth
> bothering about.

Another, similar case when you really want to inline are when the body
of the function contains a call to a virtual method (or other indirect
function call) and the type of the object and hence its vtable
(or the value of the function pointer) is known in the caller.
That's handy because you can then inline the virtual method
(or indirect function call), which may lead to further opportunities
for simplification.

For some applications this can really pay off significantly,
so it is worth doing in the long run, although right now
it's probably more important on just fixing the most egregariously
bad heuristics rather than trying 

We do some of this kind of stuff in the Mercury compiler.  For example,
we specialize function calls where one of the parameters is a known
higher-order term (a.k.a. closure; think of it as a known function
pointer).  We also do "deforestation" optimization, which is
related to this idea of specializing calls where the caller has
information about a decision (e.g. a switch) in the callee.
There's a paper [1] on our web page which describes those and
some of the other optimizations that we do.

[1] "Optimization of Mercury programs", Simon Taylor,
Honours Report, The University of Melbourne, 1998.
< http://www.cs.mu.oz.au/research/mercury/information/papers/stayl_hons.ps.gz >

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: < http://www.cs.mu.oz.au/~fjh >  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 23:26 ` Mark Mitchell
@ 2001-07-09 23:37   ` Daniel Jacobowitz
  2001-07-10  0:00   ` Fergus Henderson
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Daniel Jacobowitz @ 2001-07-09 23:37 UTC (permalink / raw)
  To: gcc

On Mon, Jul 09, 2001 at 11:06:31PM -0700, Mark Mitchell wrote:
> 
> 
> --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin 
> <dan@cgsoftware.com> wrote:
> 
> > Right now, they are horrific.
> 
> Hey, thanks a lot. :-)  They are, actually, the same ones we had in
> the RTL inliner, just about -- except that we can inline so much more!
> 
> I think your ideas are reasonable.  Nathan Sidwell has been thinking
> about these issues, too; you should coordinate with him to try to
> get some decent ideas and some decent measurements.
> 
> One long-term challenge is that we would like to inline when somehow
> that allows major simplifications.  For example, if there is a giant
> function involving tons of calcuation, but we know that the argument
> is `3', and that means we can fold all the calculations, then we
> should do the inlining, even if the inlined function is nominally
> giant.  I have no idea how to do this, though.  It's probably not worth
> bothering about.

To do that, you need basically a backtracking framework - try inlining
compulsively for some amount of time and see if the resulting code size
of the inlined function is substantially smaller than the size of the
inlined function standalone.  Except, of course, you really want likely
execution time rather than code size, but code size is a decent first
approximation :)  Unless of course we enable some additional loop
unrolling.

This is something that would, as Dan said, be nice to have - some day,
when we have the framework to do it.

-- 
Daniel Jacobowitz                           Carnegie Mellon University
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 19:22 ` Fergus Henderson
  2001-07-09 20:18   ` Daniel Berlin
@ 2001-07-09 23:30   ` Mark Mitchell
  1 sibling, 0 replies; 28+ messages in thread
From: Mark Mitchell @ 2001-07-09 23:30 UTC (permalink / raw)
  To: Fergus Henderson, Daniel Berlin; +Cc: gcc

> For this to work, you need to do parse and analyze the whole translation
> unit before emitting code, to get information about call counts, and
> to enable elimination of dead functions.

Doing this in the C/C++ front-ends would be approximately trivial, by
the way.  But, it would exacerbate our memory usage problems even more;
until we get them under control, I think we have to stick with just
doing a function at a time.  Some compilers write out the bodies of
function to disk and read them back; that makes it easier to keep the
entire translation unit in memory without having a giant memory
image.

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 18:47 Daniel Berlin
                   ` (2 preceding siblings ...)
  2001-07-09 20:05 ` Timothy J. Wood
@ 2001-07-09 23:26 ` Mark Mitchell
  2001-07-09 23:37   ` Daniel Jacobowitz
                     ` (5 more replies)
  3 siblings, 6 replies; 28+ messages in thread
From: Mark Mitchell @ 2001-07-09 23:26 UTC (permalink / raw)
  To: Daniel Berlin, gcc

--On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin 
<dan@cgsoftware.com> wrote:

> Right now, they are horrific.

Hey, thanks a lot. :-)  They are, actually, the same ones we had in
the RTL inliner, just about -- except that we can inline so much more!

I think your ideas are reasonable.  Nathan Sidwell has been thinking
about these issues, too; you should coordinate with him to try to
get some decent ideas and some decent measurements.

One long-term challenge is that we would like to inline when somehow
that allows major simplifications.  For example, if there is a giant
function involving tons of calcuation, but we know that the argument
is `3', and that means we can fold all the calculations, then we
should do the inlining, even if the inlined function is nominally
giant.  I have no idea how to do this, though.  It's probably not worth
bothering about.

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 21:49 dewar
@ 2001-07-09 22:17 ` Daniel Berlin
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Berlin @ 2001-07-09 22:17 UTC (permalink / raw)
  To: dewar; +Cc: carlo, dan, gcc

dewar@gnat.com writes:

> <<to thank you for writing this (I mean that).  I am reassured too, that
> functions marked as inline still WILL be inlined.  If you add point 2)
>>>
> 
> Personally, I don't feel that such reassurance is necessary. Obviously
> a compiler is not required to inline things that are marked as inlined
> (since basically inlining is semantically neutral). It is perfectly
> reasonable for a compiler to take an inlining request as basically
> a request to inline *if* time performance is improved, I think it is
> always fine for a compiler to ignore an inlining request if inlining
> would be detrimental to both time and space performance.

Hopefully, Diego's tree optimizer work will start to clear the way
towards doing smart inlining (based on static or real profiling info).
And for the loop cases, we can do procedure cloning anyway (clone the
procedure to a new name, with the constant arguments set constant, and
make the loop/anything else with those arguments call the new
procedure).




-- 
"I watched the Indy 500, and I was thinking that if they left
earlier they wouldn't have to go so fast.
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
@ 2001-07-09 21:49 dewar
  2001-07-09 22:17 ` Daniel Berlin
  0 siblings, 1 reply; 28+ messages in thread
From: dewar @ 2001-07-09 21:49 UTC (permalink / raw)
  To: carlo, dan; +Cc: gcc

<<to thank you for writing this (I mean that).  I am reassured too, that
functions marked as inline still WILL be inlined.  If you add point 2)
>>

Personally, I don't feel that such reassurance is necessary. Obviously
a compiler is not required to inline things that are marked as inlined
(since basically inlining is semantically neutral). It is perfectly
reasonable for a compiler to take an inlining request as basically
a request to inline *if* time performance is improved, I think it is
always fine for a compiler to ignore an inlining request if inlining
would be detrimental to both time and space performance.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 21:26       ` Daniel Berlin
@ 2001-07-09 21:45         ` Carlo Wood
  0 siblings, 0 replies; 28+ messages in thread
From: Carlo Wood @ 2001-07-09 21:45 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

On Tue, Jul 10, 2001 at 12:25:58AM -0400, Daniel Berlin wrote:
> > I agree.  But based on the fact that it is unlikely that a constructor
> > that calls two large functions is within a loop to begin with.  Fortunately
> > here you talk about "a large function", instead of "a function large compared
> > to the constructor size".  That is a difference that matter and that should
> > be reflected in the patch.
> I don't understand, please paint me a picture if i don't get it in the
> morning. I'm slow at midnight.
> :)

It's 6:38 am here :p.

Anyway - perhaps I did let myself go too much.  Having an "analysis-IQ" of 180 is great
for seeing errors in other peoples logic and I've successfully pissed of people
numerous of times by constantly (and only) break down what they come up with by
pointing out errors.  However, this is not important, it is just a minor difference
in looking at things :/.  I appologize to everyone on the list.

> > patch does but 1) look at the absolute size of the to-inline function,
> > 2) inline like we inline now when any parameter passed is a
> >    constant?
> 
> Sure, i'm happy to do this, once i understand #1.
> #2 is easy.
> 
> We also inline when it says inline, so that's, once again, always a
> fall back if the heuristic is too stupid.
> 
> Right now we inline *way* too much. I'd rather have people have to add
> inline.
> 
> Theres no way to say "noinline"
> :)

Before going to bed, allow me to summarize what IS important, I think:
I basically agree with everything you said, and I don't have a big problem
with the patch.  It certainly is better than what we have now and I'd like
to thank you for writing this (I mean that).  I am reassured too, that
functions marked as inline still WILL be inlined.  If you add point 2)
above I will be honoured more than satisfied ;).  I'll try to explain
point 1) tomorrow without wanting to make a big point out of it: I have
the feeling that in practise there won't be much difference in the result
anyway - its just that in my mind it's a-logical, and that itched.

Night,

-- 
Carlo Wood <carlo@alinoe.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 21:05     ` Carlo Wood
@ 2001-07-09 21:26       ` Daniel Berlin
  2001-07-09 21:45         ` Carlo Wood
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Berlin @ 2001-07-09 21:26 UTC (permalink / raw)
  To: Carlo Wood; +Cc: Daniel Berlin, gcc

Carlo Wood <carlo@alinoe.com> writes:

> On Mon, Jul 09, 2001 at 11:31:14PM -0400, Daniel Berlin wrote:
>> > And suppose that f1() is *only* called from within f2().
>> > Then does the size of f1 matter?
>> 
>> You are missing something important.
>> 
>> Whatever calls f2 will recursively inline f1 back into f2, for that
>> particular inlining.
>> 
>> We recursively inline.
> 
> I understood that.  Thats the whole reason why I say that the size
> of f1 doesn't matter: all you lose is one call-overhead, without
> growing the size of the code at all.  Fergus also said:
> 
> "There is an exception to this.  Functions which are only called once,
> and which are defined in this translation unit and not exported, should
> almost always be inlined (after which the original copy of the function
> is dead code and can be eliminated)"
> 
> To which you also disagreed.  There must be a misunderstanding here :),
> I think Fergus is perfectly right and actually better worded what I wanted
> to say.
> 
>> > Despite the fact that we have to compare "one million times a call overhead"
>> > with the different object code growths, the sizes of f1, f2 and f3 relative
>> > to eachother are totally irrelevant.
>> 
>> No, they aren't.
>> 
>> Once again, we recursively inline starting at some base function.
>> 
>> We will inline f3 and into main.
>> f2 won't make the cut off for inlining into the f3 inline that goes
>> into main.
> 
> That makes, thus, no sense.  Because inlining f2 into the f3 inline gains
> *again* one million call-overheads.

However, since f1 was inlined into f2, the call overhead just became
less important.

>  It is true that it makes the code
> effectively grow 1000 lines, but that is an absolute number and not something
> that is relevant to compare relatively (with the size of f3 in this case,
> or with the size of main+f3 for that matter).

You can't do it perfectly no matter what you try, because we don't
have the architecture.

>  The only thing that matters
> is the actual size of f2 (1000 lines) and the number of times it is called
> (1,000,000 times). 
We don't know how many times it's called at this point. 

> The sizes of main and f3 don't come into the picture,
> and hence the proposed rule can't be correct.
Sure, it's not a *rule*, it's a heuristic.
When we have the architecture to determine the above, i'll change it
to take the call counts, loop depth, etc, into account.

> 
>> However, we'll inline f1 into f2 when we generate the non-inline code
>> for f2 (which we do for all non-statics).
>> 
>> f2 is a big enough function that inlining into f3 is going to increase
>> code size and processing time, but not buy you performance.
> 
> That depends on how often f3 is called.
And how much gets inlined into f2.
> 
>> 
>> People forget we recursively inline.
> 
> Please note that I did not forget that.  There must be some other misunderstanding
> by me or you (heh - is *this* the case were one puts 'me' upfront instead of
> second? ;).
:)

I just don't think the particular case above is as important as you
do.
And I think we get it sooooooo wrong soooooo often right now, that
the current heuristic is worse than my proposed one.

Given that we are spending tons of compilation time, and getting so
little out of it, i think whatever pessimization i perform in the
above particular case, is outweighed by the compile time factor.

These are parameters you can twiddle.
You can say you don't care, and watch your compile time shoot back up
to hell, along with your code size.
But i'm pretty sure it's not going to speed up more than 2% or so.

> 
>> So if they were in relative size compared to main, we'd now (with my
>> heuristic) inline main->f3->f2->f1 (IE main would call nothing
>> anymore).
> 
> This decision should be based on the fact that f3 is inside a loop, and not
> on the fact that main() has a considerable size, however.

We don't have this architecture.

> 
>> The problem is then that we also currently inline
>> 
>> f3->f2->f1
>> f2->f1
> 
> I understand.  In this case one can easily do the following reasoning:
> Knowing that f3 is _called_ (not inlined), it was apparently not important
> enough to inline it.  Hence we are not inside a loop, and thus it is not
> needed to inline f2 into f3.  Again however, I disagree that the size of
> f2 *relative* to f3 matters; the only thing that matters in this case is
> the absolute size of f2.  I'd change your heuristic rule to look at the
> size of function in absolute terms, not relative compared to the function
> that it is called from.
No, because then we'll do f2->f1 because the size doesn't relatively
matter, only the absolute size matters.
In fact, the current heuristic is that only absolute size of the top
level function matters.
We know this doesn't work well in most cases: good at some, and is
bad at a lot more.
I'm trying to having something that is good in most cases: bad at
some, and good at more than we are bad at.

> 
>> When generating code for those.
>> 
>> Think of all the hidden functions.
>> 
>> So if your constructor calls a function, which calls two functions, we
>> end up with
>> inlined:
>> main->your constructor->the function->two large functions->whatever they
>> main->call->till we hit 10000 insns or run out of stuff to inline
>> 
>> Then we inline, when outputting the constructor
>> 
>> constructor->the function->two functions->etc
>> 
>> then 
>> the function->two functions->etc
>> 
>> 
>> Consider if more than just the constructor calls the function.
>> 
>> Everything that calls it, as long as we don't hit 10000 insns for the
>> function at the top of the tree, will inline the the function->two functions->etc,
>> whether it makes sense or not.
>> 
>> 
>> What my heuristics would do is say
>> 
>> main->your constructor->... is fine
>> your constructor->two large functions->... by itself doesn't make much
>> sense.
> 
> I agree.  But based on the fact that it is unlikely that a constructor
> that calls two large functions is within a loop to begin with.  Fortunately
> here you talk about "a large function", instead of "a function large compared
> to the constructor size".  That is a difference that matter and that should
> be reflected in the patch.
I don't understand, please paint me a picture if i don't get it in the
morning. I'm slow at midnight.
:)

>> Whatever calls the constructor outside the compilation unit is
>> incurring enough call overhead anyway that inlining these two large
>> functions into the constructor just increases code size, not performance. 
>> 
>> each of the two large functions->... is fine
> 
> But... now I see something that I didn't see before :).
> Perhaps the call-overhead is much less important then additional
> optimizations that can be done AFTER inlining.
Yes.
>   By not inlining we
> might miss important optimizations.  This wouldn't speed up compile
> time, but what about to TRY inlining and then make the decision
> based on code size reduction that follows from it?

HP does this, and i have a paper on doing this.  You have a budget for
time to spend inlining a given function.  However, we don't have
anywhere near the architecture at the tree level to make use of it.

We can only go with simple heuristics right now, at least until
Diego's basic blocks on trees stuff.

I'm happy to build a great inliner that takes into account all kinds
of stuff, and performs well.

And when we have the architecture, i'll do it.
But i don't have time to build all that architecture all by myself,
when i know others are working on it.

> overhead is rather constant (a few assembly instructions), reductions
> as a result of actual inlining can be rather HUGE (also mentioned by
> Timothy J. Wood (Nice last name) in his example of calling large
> functions with constant parameters that cause large code blocks to
> drop out).

>   Ok, that is too much asked... Lets say, we do what your
> patch does but 1) look at the absolute size of the to-inline function,
> 2) inline like we inline now when any parameter passed is a
>    constant?

Sure, i'm happy to do this, once i understand #1.
#2 is easy.

We also inline when it says inline, so that's, once again, always a
fall back if the heuristic is too stupid.

Right now we inline *way* too much. I'd rather have people have to add
inline.

Theres no way to say "noinline"
:)


> 
> -- 
> Carlo Wood <carlo@alinoe.com>

-- 
"I have the world's largest collection of seashells.  I keep it
on all the beaches of the world...  Perhaps you've seen it.
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 20:31   ` Daniel Berlin
@ 2001-07-09 21:05     ` Carlo Wood
  2001-07-09 21:26       ` Daniel Berlin
  0 siblings, 1 reply; 28+ messages in thread
From: Carlo Wood @ 2001-07-09 21:05 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

On Mon, Jul 09, 2001 at 11:31:14PM -0400, Daniel Berlin wrote:
> > And suppose that f1() is *only* called from within f2().
> > Then does the size of f1 matter?
> 
> You are missing something important.
> 
> Whatever calls f2 will recursively inline f1 back into f2, for that
> particular inlining.
> 
> We recursively inline.

I understood that.  Thats the whole reason why I say that the size
of f1 doesn't matter: all you lose is one call-overhead, without
growing the size of the code at all.  Fergus also said:

"There is an exception to this.  Functions which are only called once,
and which are defined in this translation unit and not exported, should
almost always be inlined (after which the original copy of the function
is dead code and can be eliminated)"

To which you also disagreed.  There must be a misunderstanding here :),
I think Fergus is perfectly right and actually better worded what I wanted
to say.

> > Despite the fact that we have to compare "one million times a call overhead"
> > with the different object code growths, the sizes of f1, f2 and f3 relative
> > to eachother are totally irrelevant.
> 
> No, they aren't.
> 
> Once again, we recursively inline starting at some base function.
> 
> We will inline f3 and into main.
> f2 won't make the cut off for inlining into the f3 inline that goes
> into main.

That makes, thus, no sense.  Because inlining f2 into the f3 inline gains
*again* one million call-overheads.  It is true that it makes the code
effectively grow 1000 lines, but that is an absolute number and not something
that is relevant to compare relatively (with the size of f3 in this case,
or with the size of main+f3 for that matter).  The only thing that matters
is the actual size of f2 (1000 lines) and the number of times it is called
(1,000,000 times).  The sizes of main and f3 don't come into the picture,
and hence the proposed rule can't be correct.

> However, we'll inline f1 into f2 when we generate the non-inline code
> for f2 (which we do for all non-statics).
> 
> f2 is a big enough function that inlining into f3 is going to increase
> code size and processing time, but not buy you performance.

That depends on how often f3 is called.

> 
> People forget we recursively inline.

Please note that I did not forget that.  There must be some other misunderstanding
by me or you (heh - is *this* the case were one puts 'me' upfront instead of
second? ;).

> So if they were in relative size compared to main, we'd now (with my
> heuristic) inline main->f3->f2->f1 (IE main would call nothing
> anymore).

This decision should be based on the fact that f3 is inside a loop, and not
on the fact that main() has a considerable size, however.

> The problem is then that we also currently inline
> 
> f3->f2->f1
> f2->f1

I understand.  In this case one can easily do the following reasoning:
Knowing that f3 is _called_ (not inlined), it was apparently not important
enough to inline it.  Hence we are not inside a loop, and thus it is not
needed to inline f2 into f3.  Again however, I disagree that the size of
f2 *relative* to f3 matters; the only thing that matters in this case is
the absolute size of f2.  I'd change your heuristic rule to look at the
size of function in absolute terms, not relative compared to the function
that it is called from.

> When generating code for those.
> 
> Think of all the hidden functions.
> 
> So if your constructor calls a function, which calls two functions, we
> end up with
> inlined:
> main->your constructor->the function->two large functions->whatever they
> main->call->till we hit 10000 insns or run out of stuff to inline
> 
> Then we inline, when outputting the constructor
> 
> constructor->the function->two functions->etc
> 
> then 
> the function->two functions->etc
> 
> 
> Consider if more than just the constructor calls the function.
> 
> Everything that calls it, as long as we don't hit 10000 insns for the
> function at the top of the tree, will inline the the function->two functions->etc,
> whether it makes sense or not.
> 
> 
> What my heuristics would do is say
> 
> main->your constructor->... is fine
> your constructor->two large functions->... by itself doesn't make much
> sense.

I agree.  But based on the fact that it is unlikely that a constructor
that calls two large functions is within a loop to begin with.  Fortunately
here you talk about "a large function", instead of "a function large compared
to the constructor size".  That is a difference that matter and that should
be reflected in the patch.

> Whatever calls the constructor outside the compilation unit is
> incurring enough call overhead anyway that inlining these two large
> functions into the constructor just increases code size, not performance. 
> 
> each of the two large functions->... is fine

But... now I see something that I didn't see before :).
Perhaps the call-overhead is much less important then additional
optimizations that can be done AFTER inlining.  By not inlining we
might miss important optimizations.  This wouldn't speed up compile
time, but what about to TRY inlining and then make the decision
based on code size reduction that follows from it?  While a call-
overhead is rather constant (a few assembly instructions), reductions
as a result of actual inlining can be rather HUGE (also mentioned by
Timothy J. Wood (Nice last name) in his example of calling large
functions with constant parameters that cause large code blocks to
drop out).  Ok, that is too much asked... Lets say, we do what your
patch does but 1) look at the absolute size of the to-inline function,
2) inline like we inline now when any parameter passed is a constant?

-- 
Carlo Wood <carlo@alinoe.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 20:05 ` Timothy J. Wood
@ 2001-07-09 20:34   ` Daniel Berlin
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Berlin @ 2001-07-09 20:34 UTC (permalink / raw)
  To: Timothy J. Wood; +Cc: Daniel Berlin, gcc

"Timothy J. Wood" <tjw@omnigroup.com> writes:

> On Monday, July 9, 2001, at 06:46  PM, Daniel Berlin wrote:
> [...]
>> We proceed to recursively inline 128 functions into this simple main
>> function
>> operator call. (Bet you didn't think there were 128 functions you could
>> inline here)
> 
>    IMO, the bug here is that the library you are using declares too
> many inlines.  I (personally) believe that 'inline' should be a
> command, not a request, and that a warning should be issued if
> inlining fails for some reason.  If you have the compiler decide to
> not inline stuff that is marked inline, then someone is going to come
> up with a case where they really need the inline where you didn't let
> them have it.  Then there will ensue a bunch of flags for controlling
> the inline limits and you're back to the programmer deciding when
> things should be inlined, but through a clumsier interface.  Or worse
> yet, they'll start using macros to get around the limits.
> 
> 
> 
> [...]
>> This is effectively expressing the rule: "Small functions should be
>> inlined into larger ones.  Larger functions should not be inlined into
>> small ones".
> 
> 
>    This rule is not true in general.  Just one example off the top of
> my head would be a case where you define a general inline function for
> performing some algorithm where some of the arguments are constants
> and you then define other 'real' functions that effectively pass
> configuration parameters (expecting the optimizer to treat them as
> constants and toss out different dead code in each case).
> 
> 
>    A silly little example might be:
> 
> static inline void generalCase(enum type t, enum operation op, ...)
> {
>      if (t == type1 || t == type3) {
> 	if (op == op1)
> 		block1;
> 	else
> 		block2;
>     } else if (t == type2 && op == op1) {
> 	block3;
>     }
> 	....
>}
> 
> 
> void t1op1(...)
> {
> 	generalCase(type1, op1, ...);
>}
> 
> void t2op2(...)
> {
> 	generalCase(type2, op2, ...);
>}
> 
> 
>    With this approach, you can often avoid needless duplication of
> code in the general case and can 'instantiate' the general case in
> each specific context and should be able to expect the compiler to
> blow away cases that can't be executed based on the inputs.

However, whatever calls t1op1 and t2op2 would recursively inline
generalcase, since generalcase is not as large as it.
It's the t1op1 and t2op2 compilations themselves that shouldn't have
general case inlined. You gain nothing.
If the calls to t1op1 and t2op2 were outside teh translation unit,
you'd be screwed anyway.
So i haven't reduced performance, just code size.

It's the *root* of the current inlining tree that we are comparing
relative to.
If it's t1op1 at the root, we shouldn't inline.
If it's something calling t1op1, we should inline both.

> 
>    Not only can make your code simpler, but you might have a couple
> common case that will perform better by having removed many of the
> branches from the general case.
> 
>    But, it is definitely the case, that the inlined function can be
> bigger than the function it is getting inlined into (which could just
> be a call to the inline!)
> 
> 
> -tim

-- 
"My watch is three hours fast, and I can't fix it.  So I'm going
to move to New York.
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 19:56 ` Carlo Wood
@ 2001-07-09 20:31   ` Daniel Berlin
  2001-07-09 21:05     ` Carlo Wood
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Berlin @ 2001-07-09 20:31 UTC (permalink / raw)
  To: Carlo Wood; +Cc: Daniel Berlin, gcc

Carlo Wood <carlo@alinoe.com> writes:

> On Mon, Jul 09, 2001 at 09:46:59PM -0400, Daniel Berlin wrote:
>> This is effectively expressing the rule: "Small functions should be
>> inlined into larger ones.  Larger functions should not be inlined into
>> small ones".
> 
> This doesn't seem logical.  I think I can follow the reasoning behing it
> though:
> 
> - The overhead of a function call can be neglected for large functions,
>   therefore it only makes sense to worry about inlining and small functions.
>   As a result, only small functions are normally intented to be inlined
>   by the programmer.
> 
> This works perfectly in C, but for some reason C++ is different: there is
> being done a LOT more inlining.  Especially, many functions are inlined
> that are not even visible to the programmer.
> 
> Therefore I propose to let go the 'human' factor - and look at inlining
> from a mathematical point of view.  Suppose we have:
> 
> int f1(int i)
> {
>   // something
>   return result;
>}
> 
> int f2(int i)
> {
>   // uses f1() once
>   return result;
>}
> 
> And suppose that f1() is *only* called from within f2().
> Then does the size of f1 matter?

You are missing something important.

Whatever calls f2 will recursively inline f1 back into f2, for that
particular inlining.

We recursively inline.

So if you have

int f3(int i)
{
        ....
        f2(a);
}

f3 will inline f2, then expand f1 inline into f2.
The thing we *don't* want is to inline f1 into f2 for f2, if f1 was
much larger than f2.

> 
> So, back to the example:
> 
> int f1(int i)
> {
>   // 100 lines of code
>   return result;
>}
> 
> int f2(int i)
> {
>   // uses f1() once, 1000 lines of code
>   return result;
>}
> 
> int f3(int i)
> {
>   // uses f2 once, 10 lines of code
>   return result;
>}
> 
> int main(void)
> {
>   int s = 0;
>   for (int i = 0; i < 1000000; ++i)
>     s += f3(i); 
>   cout << s << '\n';
>   return 0;
>}
> 
> and,
> 
> by inlining f3 in main() we lose one million times a call overhead;

> by inlining recursively f2, we lose again one million times a call overhead;
> and by inlining recursively f1, we lose again one million times a call overhead.
> 
> Despite the fact that we have to compare "one million times a call overhead"
> with the different object code growths, the sizes of f1, f2 and f3 relative
> to eachother are totally irrelevant.

No, they aren't.

Once again, we recursively inline starting at some base function.

We will inline f3 and into main.
f2 won't make the cut off for inlining into the f3 inline that goes
into main.

However, we'll inline f1 into f2 when we generate the non-inline code
for f2 (which we do for all non-statics).

f2 is a big enough function that inlining into f3 is going to increase
code size and processing time, but not buy you performance.



People forget we recursively inline.

So if they were in relative size compared to main, we'd now (with my
heuristic) inline main->f3->f2->f1 (IE main would call nothing
anymore).

The problem is then that we also currently inline

f3->f2->f1
f2->f1


When generating code for those.

Think of all the hidden functions.

So if your constructor calls a function, which calls two functions, we
end up with
inlined:
main->your constructor->the function->two large functions->whatever they
main->call->till we hit 10000 insns or run out of stuff to inline

Then we inline, when outputting the constructor

constructor->the function->two functions->etc

then 
the function->two functions->etc


Consider if more than just the constructor calls the function.

Everything that calls it, as long as we don't hit 10000 insns for the
function at the top of the tree, will inline the the function->two functions->etc,
whether it makes sense or not.


What my heuristics would do is say

main->your constructor->... is fine
your constructor->two large functions->... by itself doesn't make much
sense. Whatever calls the constructor outside the compilation unit is
incurring enough call overhead anyway that inlining these two large
functions into the constructor just increases code size, not performance. 

each of the two large functions->... is fine



-- 
"I was going 70 miles an hour and got stopped by a cop who said,
"Do you know the speed limit is 55 miles per hour?"  "Yes,
officer, but I wasn't going to be out that long..."
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 20:13 dewar
@ 2001-07-09 20:23 ` Joe Buck
  0 siblings, 0 replies; 28+ messages in thread
From: Joe Buck @ 2001-07-09 20:23 UTC (permalink / raw)
  To: dewar; +Cc: dan, fjh, gcc

> The style in Ada is for people to use pragma Inline to specify what should
> be inlined, and programmers generally specify only very small functions for
> inlining.

Same in C++ (with the addition that functions whose definitions appear in
the class declaration rather than outside it are implicitly inline).  We
might want to address the matter by looking at libstdc++ to see if
excessively large functions are declared inline.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 19:22 ` Fergus Henderson
@ 2001-07-09 20:18   ` Daniel Berlin
  2001-07-09 23:30   ` Mark Mitchell
  1 sibling, 0 replies; 28+ messages in thread
From: Daniel Berlin @ 2001-07-09 20:18 UTC (permalink / raw)
  To: Fergus Henderson; +Cc: Daniel Berlin, gcc

Fergus Henderson <fjh@cs.mu.oz.au> writes:

> On 09-Jul-2001, Daniel Berlin <dan@cgsoftware.com> wrote:
>> I've got a very simple heuristic that says "don't inline things <some
>> configurable number currently defaulting to 10> times bigger than we
>> were when we started inlining, into us" I.E. don't inline a 100
>> statement function into a 10 statement one.
>> 
>> This is effectively expressing the rule: "Small functions should be
>> inlined into larger ones.  Larger functions should not be inlined into
>> small ones".
> 
> There is an exception to this.  Functions which are only called once,
> and which are defined in this translation unit and not exported, should
> almost always be inlined (after which the original copy of the function
> is dead code and can be eliminated), even if they are much larger than
> the function into which they are being inlined.

They will be, if some other, larger function, calls that smaller
function.

> 
> For this to work, you need to do parse and analyze the whole translation
> unit before emitting code, to get information about call counts, and
> to enable elimination of dead functions.
Correct.
But this isn't as costly as you think in terms of memory, and trees we
can store to disk if we really had to (precompiled headers, at least
the way redhat did them, does this, in fact)
My memory usage using defer_fn instead of expand_body, in parse.y, and
a few other places, only increased an average of 25%.
It's not the AST that is expensive, and if it was, that's not hard to
fix. 
> 
> (Is compiling whole translation units at a time, rather than compiling
> each function in turn, what you meant by "region-based compilation"?
> 
Almost.
That involves a ton of memory and compile time.
To quote the abstract for tha paper "Region based compilation: An
introduction and motivation"
"
   As the amount of instruction-level parallelism required to fully
   utilize VLIW and superscalar processors increases,
   compilers must perform increasingly more aggressive analysis,
   optimization, parallelization and scheduling on the input
   programs. Traditionally compilers have been built assuming
   functions as the unit of compilation, making compiler
   performance heavily dependent upon the contents of the
   function. The function-based partitioning tends to hide valuable
   optimization opportunities form the compiler making it
   undesirable. Function inlining may be applied to assemble
   inter-procedurally coupled functions into the same compilation unit
   at the cost of very large function bodies. This paper
   introduces a new technique called region-based compilation where
   the compiler is allowed to repartition the program into
   more desirable compilation units. Region-based compilation units
   allow the compiler to control problem size while exposing
   inter-procedural optimization and code motion opportunities.
"
Which is pretty much exactly the problem we've run into.
Except, unlike straight, flat, interprocedural analysis applied to an
entire program, regions give you comparable performance, and with
demand driven region formation, reduced memory usage and reduced
compilation times. And it would be easier to modify gcc to do region
based stuff than interprocedural stuff (because we have a large dependency on single function at a time)

The best part is that papers have been showing solid experimental evidence (Hank, Hwu,
Rau) that regions are often smaller than functions, so we'd probably
actually get *faster* at optimizing.

A good paper on the topic that is pretty easy to read, but has a long
title, is: Region Formation Analysis with Demand-driven Inlining for
Region-based Optimization
http://www.eecis.udel.edu/~way/papers/pact2000.ps.gz


-- 
"I invented the cordless extension cord.
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
@ 2001-07-09 20:13 dewar
  2001-07-09 20:23 ` Joe Buck
  0 siblings, 1 reply; 28+ messages in thread
From: dewar @ 2001-07-09 20:13 UTC (permalink / raw)
  To: dan, fjh; +Cc: gcc

Generally there is no point in inlining large functions at all. The 
overhead these days of a call is not that great. 10,000 insns seems
a silly limit, perhaps two orders of magnitude two large.

The style in Ada is for people to use pragma Inline to specify what should
be inlined, and programmers generally specify only very small functions for
inlining.

We consistently find that the use of -O3 is a bad idea and pessimizes most
code, we advise all our users against using any implicit inlining. This is
with GCC 2.8.1, but I would guess that the same advice applies (perhaps even
more so) to GCC 3.x (we have not made -O3 measurements here with Ada).

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 18:47 Daniel Berlin
  2001-07-09 19:22 ` Fergus Henderson
  2001-07-09 19:56 ` Carlo Wood
@ 2001-07-09 20:05 ` Timothy J. Wood
  2001-07-09 20:34   ` Daniel Berlin
  2001-07-09 23:26 ` Mark Mitchell
  3 siblings, 1 reply; 28+ messages in thread
From: Timothy J. Wood @ 2001-07-09 20:05 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

On Monday, July 9, 2001, at 06:46  PM, Daniel Berlin wrote:
[...]
> We proceed to recursively inline 128 functions into this simple main 
> function
> operator call. (Bet you didn't think there were 128 functions you could
> inline here)

   IMO, the bug here is that the library you are using declares too many 
inlines.  I (personally) believe that 'inline' should be a command, not 
a request, and that a warning should be issued if inlining fails for 
some reason.  If you have the compiler decide to not inline stuff that 
is marked inline, then someone is going to come up with a case where 
they really need the inline where you didn't let them have it.  Then 
there will ensue a bunch of flags for controlling the inline limits and 
you're back to the programmer deciding when things should be inlined, 
but through a clumsier interface.  Or worse yet, they'll start using 
macros to get around the limits.



[...]
> This is effectively expressing the rule: "Small functions should be
> inlined into larger ones.  Larger functions should not be inlined into
> small ones".


   This rule is not true in general.  Just one example off the top of my 
head would be a case where you define a general inline function for 
performing some algorithm where some of the arguments are constants and 
you then define other 'real' functions that effectively pass 
configuration parameters (expecting the optimizer to treat them as 
constants and toss out different dead code in each case).


   A silly little example might be:

static inline void generalCase(enum type t, enum operation op, ...)
{
     if (t == type1 || t == type3) {
	if (op == op1)
		block1;
	else
		block2;
     } else if (t == type2 && op == op1) {
	block3;
     }
	....
}


void t1op1(...)
{
	generalCase(type1, op1, ...);
}

void t2op2(...)
{
	generalCase(type2, op2, ...);
}


   With this approach, you can often avoid needless duplication of code 
in the general case and can 'instantiate' the general case in each 
specific context and should be able to expect the compiler to blow away 
cases that can't be executed based on the inputs.

   Not only can make your code simpler, but you might have a couple 
common case that will perform better by having removed many of the 
branches from the general case.

   But, it is definitely the case, that the inlined function can be 
bigger than the function it is getting inlined into (which could just be 
a call to the inline!)


-tim

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 18:47 Daniel Berlin
  2001-07-09 19:22 ` Fergus Henderson
@ 2001-07-09 19:56 ` Carlo Wood
  2001-07-09 20:31   ` Daniel Berlin
  2001-07-09 20:05 ` Timothy J. Wood
  2001-07-09 23:26 ` Mark Mitchell
  3 siblings, 1 reply; 28+ messages in thread
From: Carlo Wood @ 2001-07-09 19:56 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

On Mon, Jul 09, 2001 at 09:46:59PM -0400, Daniel Berlin wrote:
> This is effectively expressing the rule: "Small functions should be
> inlined into larger ones.  Larger functions should not be inlined into
> small ones".

This doesn't seem logical.  I think I can follow the reasoning behing it
though:

- The overhead of a function call can be neglected for large functions,
  therefore it only makes sense to worry about inlining and small functions.
  As a result, only small functions are normally intented to be inlined
  by the programmer.

This works perfectly in C, but for some reason C++ is different: there is
being done a LOT more inlining.  Especially, many functions are inlined
that are not even visible to the programmer.

Therefore I propose to let go the 'human' factor - and look at inlining
from a mathematical point of view.  Suppose we have:

int f1(int i)
{
  // something
  return result;
}

int f2(int i)
{
  // uses f1() once
  return result;
}

And suppose that f1() is *only* called from within f2().
Then does the size of f1 matter?

The ONLY trade off that can be made concerning inlining is
that of call-overhead against program size *growth*.

The variables that are involved are:
1) size of the function
2) whether or not the function needs external linkage
   (size factor).
3) for each inlining decision seperately: how often
   a specific source line will be hit (call-overhead
   factor).

Obviously, the latter is unknown at the time the compiler
needs to make the decision whether to inline or not.
But I think we'll all agree that calling a function 10
times isn't worth an inline... Actually, I am pretty sure
that there is only ONE reason why inlining makes sense:
when it is within a loop somehow (a BIG loop).
When a function is called 100,000 times I start to get
interested in inlining ;).

So, back to the example:

int f1(int i)
{
  // 100 lines of code
  return result;
}

int f2(int i)
{
  // uses f1() once, 1000 lines of code
  return result;
}

int f3(int i)
{
  // uses f2 once, 10 lines of code
  return result;
}

int main(void)
{
  int s = 0;
  for (int i = 0; i < 1000000; ++i)
    s += f3(i); 
  cout << s << '\n';
  return 0;
}

and,

by inlining f3 in main() we lose one million times a call overhead;
by inlining recursively f2, we lose again one million times a call overhead;
and by inlining recursively f1, we lose again one million times a call overhead.

Despite the fact that we have to compare "one million times a call overhead"
with the different object code growths, the sizes of f1, f2 and f3 relative
to eachother are totally irrelevant.

> Where what is considered "larger" is fudged by a factor.
> 
> If you have a one line member, that calls a huge function, whatever
> *calls* that member will inline the member, and then whatever that member calls
> will probably beinlined, but the the externally visible member call itself (ie the non-inlined one) will not
> inline the huge function into itself.
> 
> Which seems to make sense to me.

I have Real Life examples of the opposite, where I needed the large function
to be inlined in the smaller one.  The reason being that I had many small
functions with a little difference, and a bulk functionality that had no
difference and for which I didn't want to add duplicated code but also
didn't want to add call over-head.

> However, this doesn't control code growth, it could inline a lot of
> small functions until it hits the limit.
> 
> So the other heuristic i've added controls code growth by saying we don't
> want to increase the number of statements in the function by more than
> a factor of x over what we started with (when we started inlining).
> 
> An expression of the rule: "Don't make the function more than x times bigger than
> it originally was".

I think is only marginal important - perhaps even neglectible.  If, say, there are
5 (small) functions that inline eachother, then that adds a factor of 5 to
whatever we gave as weight to a call-overhead - which is essentially not much
(a loop of 1000000 or 200000, in *practise* being an unknown loop size:
 unknown == 5 * unknown, for me).

PS I am not against your patch, because I think that in reality programmers
   always will mark functions 'inline' or put them in the class declaration
   when it really matters.  Hopefully you *ALWAYS* inline functions that
   are marked 'inline' and/or defined in the class declaration - then it is
   fine with me whatever g++ does to reduce code size and compile time ;).

-- 
Carlo Wood <carlo@alinoe.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Inlining heuristics for C++
  2001-07-09 18:47 Daniel Berlin
@ 2001-07-09 19:22 ` Fergus Henderson
  2001-07-09 20:18   ` Daniel Berlin
  2001-07-09 23:30   ` Mark Mitchell
  2001-07-09 19:56 ` Carlo Wood
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 28+ messages in thread
From: Fergus Henderson @ 2001-07-09 19:22 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

On 09-Jul-2001, Daniel Berlin <dan@cgsoftware.com> wrote:
> I've got a very simple heuristic that says "don't inline things <some
> configurable number currently defaulting to 10> times bigger than we
> were when we started inlining, into us" I.E. don't inline a 100
> statement function into a 10 statement one.
> 
> This is effectively expressing the rule: "Small functions should be
> inlined into larger ones.  Larger functions should not be inlined into
> small ones".

There is an exception to this.  Functions which are only called once,
and which are defined in this translation unit and not exported, should
almost always be inlined (after which the original copy of the function
is dead code and can be eliminated), even if they are much larger than
the function into which they are being inlined.

For this to work, you need to do parse and analyze the whole translation
unit before emitting code, to get information about call counts, and
to enable elimination of dead functions.

(Is compiling whole translation units at a time, rather than compiling
each function in turn, what you meant by "region-based compilation"? 
I found that term confusing -- at first I thought you were talking
about region-based memory management, as in the SML kit with Regions,
which is a very different thing.)

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: < http://www.cs.mu.oz.au/~fjh >  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Inlining heuristics for C++
@ 2001-07-09 18:47 Daniel Berlin
  2001-07-09 19:22 ` Fergus Henderson
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Daniel Berlin @ 2001-07-09 18:47 UTC (permalink / raw)
  To: gcc

Right now, they are horrific.

As I recently posted, a simple 5 line C++ program using map and string
in normal ways gets inlined from 300 insns to 15k.

This is because we have a few major deficiencies in our inlining
heuristics.

Mainly, we never consider *not* inlining something unless we can't for some
reason (IE it's a varargs function), or we've hit the limit of a
supposed 10000 insns (which really translated into 15k in reality).

Consider the following:

1 main function.

using namespace std;
int main(void)
{
        map<string, string> simple;
        simple["test"]="test";
}

We'll ignore what happens if we init with arguments that could be
inlined. 

We proceed to recursively inline 128 functions into this simple main function
operator call. (Bet you didn't think there were 128 functions you could
inline here)

Why?
Because we never make *any* choices.
Until main is 15k insns long, we won't stop inlining.

We literally inline *everything* we still have the tree's for.

We'll even inline if say, the destruction function for foo or a string
is 50 times bigger than the function it's being inlined into.

I've got a very simple heuristic that says "don't inline things <some
configurable number currently defaulting to 10> times bigger than we
were when we started inlining, into us" I.E. don't inline a 100
statement function into a 10 statement one.

This is effectively expressing the rule: "Small functions should be
inlined into larger ones.  Larger functions should not be inlined into
small ones".

Where what is considered "larger" is fudged by a factor.

If you have a one line member, that calls a huge function, whatever
*calls* that member will inline the member, and then whatever that member calls
will probably beinlined, but the the externally visible member call itself (ie the non-inlined one) will not
inline the huge function into itself.

Which seems to make sense to me.

However, this doesn't control code growth, it could inline a lot of
small functions until it hits the limit.

So the other heuristic i've added controls code growth by saying we don't
want to increase the number of statements in the function by more than
a factor of x over what we started with (when we started inlining).

An expression of the rule: "Don't make the function more than x times bigger than
it originally was".

By comparison, Right now we only have one rule "Don't make the function larger than
MAX_INLINE_INSNS".

So we end up, even with simple looking code, with functions that are
generally MAX_INLINE_INSNS long.

And take *forever* to compile.

Are these generally acceptable heuristics to add?
They seem pretty common sense to me, maybe i'm missing something
however, I wanted to check before I submitted a patch.
They certainly reduce compile time, and only seem to stop inlining
huge functions into smaller ones. Which are the cases where the
performance probably didn't increase, since we are recursively
inlining (and thus, if there were relatively larger functions calling
the smaller one, both the smaller one, and what it called, would be
inlined back into that function, obviating the real need to inline the
larger into the smaller on it's own). 

--Dan
-- 
"I can levitate birds.  No one cares.
"-Steven Wright

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2001-07-11 14:27 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-07-10 14:53 Inlining heuristics for C++ mike stump
2001-07-10 15:04 ` Daniel Berlin
     [not found] <87r8vpo8rw.fsf@cgsoftware.com.suse.lists.egcs>
     [not found] ` <20010710122244.A14584@hg.cs.mu.oz.au.suse.lists.egcs>
2001-07-10  2:42   ` Andi Kleen
2001-07-10 14:16     ` Hartmut Schirmer
2001-07-11 14:27       ` Fergus Henderson
  -- strict thread matches above, loose matches on Subject: below --
2001-07-09 21:49 dewar
2001-07-09 22:17 ` Daniel Berlin
2001-07-09 20:13 dewar
2001-07-09 20:23 ` Joe Buck
2001-07-09 18:47 Daniel Berlin
2001-07-09 19:22 ` Fergus Henderson
2001-07-09 20:18   ` Daniel Berlin
2001-07-09 23:30   ` Mark Mitchell
2001-07-09 19:56 ` Carlo Wood
2001-07-09 20:31   ` Daniel Berlin
2001-07-09 21:05     ` Carlo Wood
2001-07-09 21:26       ` Daniel Berlin
2001-07-09 21:45         ` Carlo Wood
2001-07-09 20:05 ` Timothy J. Wood
2001-07-09 20:34   ` Daniel Berlin
2001-07-09 23:26 ` Mark Mitchell
2001-07-09 23:37   ` Daniel Jacobowitz
2001-07-10  0:00   ` Fergus Henderson
2001-07-10  0:13     ` Daniel Berlin
2001-07-10  0:09   ` Daniel Berlin
2001-07-10  0:20   ` Daniel Berlin
2001-07-10  0:30   ` Daniel Berlin
2001-07-10  2:47   ` Nathan Sidwell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).