* Re: Inlining heuristics for C++ [not found] ` <20010710122244.A14584@hg.cs.mu.oz.au.suse.lists.egcs> @ 2001-07-10 2:42 ` Andi Kleen 2001-07-10 14:16 ` Hartmut Schirmer 0 siblings, 1 reply; 28+ messages in thread From: Andi Kleen @ 2001-07-10 2:42 UTC (permalink / raw) To: Fergus Henderson; +Cc: dan, gcc Fergus Henderson <fjh@cs.mu.oz.au> writes: > > There is an exception to this. Functions which are only called once, > and which are defined in this translation unit and not exported, should > almost always be inlined (after which the original copy of the function > is dead code and can be eliminated), even if they are much larger than > the function into which they are being inlined. This has some bad interactions with gcc's current register allocator though. It cannot split register variable livetimes currently. When bigger functions are inlined then the variables in the outer function (with their scope extending over the function call) have a much smaller chance to get a register because they're usually already used up by the inlined bigger function. With a function call gcc can save/restore them around the function; with inline it tends to use stack variables for the "outside" variables which often hurts. Of course it would be best to fix the register allocator to still do a bettre job. Short term I suspect e.g. on register poor machines like the x86 which also has fast CALL it's probably best to avoid bigger inlines at least when there are many variables alive in the caller or the call happens in a loop. -Andi ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-10 2:42 ` Inlining heuristics for C++ Andi Kleen @ 2001-07-10 14:16 ` Hartmut Schirmer 2001-07-11 14:27 ` Fergus Henderson 0 siblings, 1 reply; 28+ messages in thread From: Hartmut Schirmer @ 2001-07-10 14:16 UTC (permalink / raw) To: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 1778 bytes --] On Tue, 10 Jul 2001, Andi Kleen wrote: >Fergus Henderson <fjh@cs.mu.oz.au> writes: >> >> There is an exception to this. Functions which are only called once, >> and which are defined in this translation unit and not exported, should >> almost always be inlined (after which the original copy of the function >> is dead code and can be eliminated), even if they are much larger than >> the function into which they are being inlined. > >This has some bad interactions with gcc's current register allocator though. >It cannot split register variable livetimes currently. When bigger functions >are inlined then the variables in the outer function (with their scope >extending over the function call) have a much smaller chance to get a >register because they're usually already used up by the inlined bigger >function. With a function call gcc can save/restore them around the >function; with inline it tends to use stack variables for the "outside" >variables which often hurts. > >Of course it would be best to fix the register allocator to still do a >bettre job. Short term I suspect e.g. on register poor machines >like the x86 which also has fast CALL it's probably best to avoid bigger >inlines at least when there are many variables alive in the caller or >the call happens in a loop. Inlining may increase the required stack space. Without a virtual growing stack this effect can kill your application. Better to stop inlining if the amount of local storage required is increased by some (small) factor. We also shouldn´t inline functions in branches that aren´t predicted to be excecuted, at least take the __buildin_expect into account here. Is there a chance to get the RTL inliner handling all those cases the tree inliner didn´t catch ? Hartmut ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-10 14:16 ` Hartmut Schirmer @ 2001-07-11 14:27 ` Fergus Henderson 0 siblings, 0 replies; 28+ messages in thread From: Fergus Henderson @ 2001-07-11 14:27 UTC (permalink / raw) To: Hartmut Schirmer; +Cc: gcc [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 551 bytes --] On 10-Jul-2001, Hartmut Schirmer <hartmut.schirmer@arcormail.de> wrote: > We also shouldn´t inline functions in branches that aren´t > predicted to be excecuted, at least take the __buildin_expect > into account here. That reminds me of another heuristic: don't inline functions marked __attribute__((noreturn)). -- Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit The University of Melbourne | of excellence is a lethal habit" WWW: < http://www.cs.mu.oz.au/~fjh > | -- the last words of T. S. Garp. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ @ 2001-07-10 14:53 mike stump 2001-07-10 15:04 ` Daniel Berlin 0 siblings, 1 reply; 28+ messages in thread From: mike stump @ 2001-07-10 14:53 UTC (permalink / raw) To: dan, gcc > To: gcc@gcc.gnu.org > From: Daniel Berlin <dan@cgsoftware.com> > Date: Mon, 09 Jul 2001 21:46:59 -0400 > Right now, they are horrific. You should have seen it before we limited the growth of the function being inlined into. :-( Anyway, would be nice if someone could take a pass through the literature and find the good stuff that describes when to do it and when not to and give us pointers to it. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-10 14:53 mike stump @ 2001-07-10 15:04 ` Daniel Berlin 0 siblings, 0 replies; 28+ messages in thread From: Daniel Berlin @ 2001-07-10 15:04 UTC (permalink / raw) To: mike stump; +Cc: dan, gcc mike stump <mrs@windriver.com> writes: >> To: gcc@gcc.gnu.org >> From: Daniel Berlin <dan@cgsoftware.com> >> Date: Mon, 09 Jul 2001 21:46:59 -0400 > >> Right now, they are horrific. > > You should have seen it before we limited the growth of the function > being inlined into. :-( > > Anyway, would be nice if someone could take a pass through the > literature and find the good stuff that describes when to do it and > when not to and give us pointers to it. I already did this. Most of the literature is actually pretty concise. You want to look at "Aggressive Inlining" (It's a desccription of what HP's compilers do. I posted the author names in a seperate message, it's frmo PLDI '97), "Training compilers for better inlining Decisions" (Dean and Chambers, 1993), and "Fast and Effective Procedure Inlining" (Waddell, Dybvig, 1997). As soon as the basic block on trees stuff goes in, i'll implement profile directed cloning and inlining, and we'll be just like all the other commercial compilers. -- "My neighbor has a circular driveway... He can't get out. "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++
@ 2001-07-09 21:49 dewar
2001-07-09 22:17 ` Daniel Berlin
0 siblings, 1 reply; 28+ messages in thread
From: dewar @ 2001-07-09 21:49 UTC (permalink / raw)
To: carlo, dan; +Cc: gcc
<<to thank you for writing this (I mean that). I am reassured too, that
functions marked as inline still WILL be inlined. If you add point 2)
>>
Personally, I don't feel that such reassurance is necessary. Obviously
a compiler is not required to inline things that are marked as inlined
(since basically inlining is semantically neutral). It is perfectly
reasonable for a compiler to take an inlining request as basically
a request to inline *if* time performance is improved, I think it is
always fine for a compiler to ignore an inlining request if inlining
would be detrimental to both time and space performance.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 21:49 dewar @ 2001-07-09 22:17 ` Daniel Berlin 0 siblings, 0 replies; 28+ messages in thread From: Daniel Berlin @ 2001-07-09 22:17 UTC (permalink / raw) To: dewar; +Cc: carlo, dan, gcc dewar@gnat.com writes: > <<to thank you for writing this (I mean that). I am reassured too, that > functions marked as inline still WILL be inlined. If you add point 2) >>> > > Personally, I don't feel that such reassurance is necessary. Obviously > a compiler is not required to inline things that are marked as inlined > (since basically inlining is semantically neutral). It is perfectly > reasonable for a compiler to take an inlining request as basically > a request to inline *if* time performance is improved, I think it is > always fine for a compiler to ignore an inlining request if inlining > would be detrimental to both time and space performance. Hopefully, Diego's tree optimizer work will start to clear the way towards doing smart inlining (based on static or real profiling info). And for the loop cases, we can do procedure cloning anyway (clone the procedure to a new name, with the constant arguments set constant, and make the loop/anything else with those arguments call the new procedure). -- "I watched the Indy 500, and I was thinking that if they left earlier they wouldn't have to go so fast. "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ @ 2001-07-09 20:13 dewar 2001-07-09 20:23 ` Joe Buck 0 siblings, 1 reply; 28+ messages in thread From: dewar @ 2001-07-09 20:13 UTC (permalink / raw) To: dan, fjh; +Cc: gcc Generally there is no point in inlining large functions at all. The overhead these days of a call is not that great. 10,000 insns seems a silly limit, perhaps two orders of magnitude two large. The style in Ada is for people to use pragma Inline to specify what should be inlined, and programmers generally specify only very small functions for inlining. We consistently find that the use of -O3 is a bad idea and pessimizes most code, we advise all our users against using any implicit inlining. This is with GCC 2.8.1, but I would guess that the same advice applies (perhaps even more so) to GCC 3.x (we have not made -O3 measurements here with Ada). ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 20:13 dewar @ 2001-07-09 20:23 ` Joe Buck 0 siblings, 0 replies; 28+ messages in thread From: Joe Buck @ 2001-07-09 20:23 UTC (permalink / raw) To: dewar; +Cc: dan, fjh, gcc > The style in Ada is for people to use pragma Inline to specify what should > be inlined, and programmers generally specify only very small functions for > inlining. Same in C++ (with the addition that functions whose definitions appear in the class declaration rather than outside it are implicitly inline). We might want to address the matter by looking at libstdc++ to see if excessively large functions are declared inline. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Inlining heuristics for C++ @ 2001-07-09 18:47 Daniel Berlin 2001-07-09 19:22 ` Fergus Henderson ` (3 more replies) 0 siblings, 4 replies; 28+ messages in thread From: Daniel Berlin @ 2001-07-09 18:47 UTC (permalink / raw) To: gcc Right now, they are horrific. As I recently posted, a simple 5 line C++ program using map and string in normal ways gets inlined from 300 insns to 15k. This is because we have a few major deficiencies in our inlining heuristics. Mainly, we never consider *not* inlining something unless we can't for some reason (IE it's a varargs function), or we've hit the limit of a supposed 10000 insns (which really translated into 15k in reality). Consider the following: 1 main function. using namespace std; int main(void) { map<string, string> simple; simple["test"]="test"; } We'll ignore what happens if we init with arguments that could be inlined. We proceed to recursively inline 128 functions into this simple main function operator call. (Bet you didn't think there were 128 functions you could inline here) Why? Because we never make *any* choices. Until main is 15k insns long, we won't stop inlining. We literally inline *everything* we still have the tree's for. We'll even inline if say, the destruction function for foo or a string is 50 times bigger than the function it's being inlined into. I've got a very simple heuristic that says "don't inline things <some configurable number currently defaulting to 10> times bigger than we were when we started inlining, into us" I.E. don't inline a 100 statement function into a 10 statement one. This is effectively expressing the rule: "Small functions should be inlined into larger ones. Larger functions should not be inlined into small ones". Where what is considered "larger" is fudged by a factor. If you have a one line member, that calls a huge function, whatever *calls* that member will inline the member, and then whatever that member calls will probably beinlined, but the the externally visible member call itself (ie the non-inlined one) will not inline the huge function into itself. Which seems to make sense to me. However, this doesn't control code growth, it could inline a lot of small functions until it hits the limit. So the other heuristic i've added controls code growth by saying we don't want to increase the number of statements in the function by more than a factor of x over what we started with (when we started inlining). An expression of the rule: "Don't make the function more than x times bigger than it originally was". By comparison, Right now we only have one rule "Don't make the function larger than MAX_INLINE_INSNS". So we end up, even with simple looking code, with functions that are generally MAX_INLINE_INSNS long. And take *forever* to compile. Are these generally acceptable heuristics to add? They seem pretty common sense to me, maybe i'm missing something however, I wanted to check before I submitted a patch. They certainly reduce compile time, and only seem to stop inlining huge functions into smaller ones. Which are the cases where the performance probably didn't increase, since we are recursively inlining (and thus, if there were relatively larger functions calling the smaller one, both the smaller one, and what it called, would be inlined back into that function, obviating the real need to inline the larger into the smaller on it's own). --Dan -- "I can levitate birds. No one cares. "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 18:47 Daniel Berlin @ 2001-07-09 19:22 ` Fergus Henderson 2001-07-09 20:18 ` Daniel Berlin 2001-07-09 23:30 ` Mark Mitchell 2001-07-09 19:56 ` Carlo Wood ` (2 subsequent siblings) 3 siblings, 2 replies; 28+ messages in thread From: Fergus Henderson @ 2001-07-09 19:22 UTC (permalink / raw) To: Daniel Berlin; +Cc: gcc On 09-Jul-2001, Daniel Berlin <dan@cgsoftware.com> wrote: > I've got a very simple heuristic that says "don't inline things <some > configurable number currently defaulting to 10> times bigger than we > were when we started inlining, into us" I.E. don't inline a 100 > statement function into a 10 statement one. > > This is effectively expressing the rule: "Small functions should be > inlined into larger ones. Larger functions should not be inlined into > small ones". There is an exception to this. Functions which are only called once, and which are defined in this translation unit and not exported, should almost always be inlined (after which the original copy of the function is dead code and can be eliminated), even if they are much larger than the function into which they are being inlined. For this to work, you need to do parse and analyze the whole translation unit before emitting code, to get information about call counts, and to enable elimination of dead functions. (Is compiling whole translation units at a time, rather than compiling each function in turn, what you meant by "region-based compilation"? I found that term confusing -- at first I thought you were talking about region-based memory management, as in the SML kit with Regions, which is a very different thing.) -- Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit The University of Melbourne | of excellence is a lethal habit" WWW: < http://www.cs.mu.oz.au/~fjh > | -- the last words of T. S. Garp. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 19:22 ` Fergus Henderson @ 2001-07-09 20:18 ` Daniel Berlin 2001-07-09 23:30 ` Mark Mitchell 1 sibling, 0 replies; 28+ messages in thread From: Daniel Berlin @ 2001-07-09 20:18 UTC (permalink / raw) To: Fergus Henderson; +Cc: Daniel Berlin, gcc Fergus Henderson <fjh@cs.mu.oz.au> writes: > On 09-Jul-2001, Daniel Berlin <dan@cgsoftware.com> wrote: >> I've got a very simple heuristic that says "don't inline things <some >> configurable number currently defaulting to 10> times bigger than we >> were when we started inlining, into us" I.E. don't inline a 100 >> statement function into a 10 statement one. >> >> This is effectively expressing the rule: "Small functions should be >> inlined into larger ones. Larger functions should not be inlined into >> small ones". > > There is an exception to this. Functions which are only called once, > and which are defined in this translation unit and not exported, should > almost always be inlined (after which the original copy of the function > is dead code and can be eliminated), even if they are much larger than > the function into which they are being inlined. They will be, if some other, larger function, calls that smaller function. > > For this to work, you need to do parse and analyze the whole translation > unit before emitting code, to get information about call counts, and > to enable elimination of dead functions. Correct. But this isn't as costly as you think in terms of memory, and trees we can store to disk if we really had to (precompiled headers, at least the way redhat did them, does this, in fact) My memory usage using defer_fn instead of expand_body, in parse.y, and a few other places, only increased an average of 25%. It's not the AST that is expensive, and if it was, that's not hard to fix. > > (Is compiling whole translation units at a time, rather than compiling > each function in turn, what you meant by "region-based compilation"? > Almost. That involves a ton of memory and compile time. To quote the abstract for tha paper "Region based compilation: An introduction and motivation" " As the amount of instruction-level parallelism required to fully utilize VLIW and superscalar processors increases, compilers must perform increasingly more aggressive analysis, optimization, parallelization and scheduling on the input programs. Traditionally compilers have been built assuming functions as the unit of compilation, making compiler performance heavily dependent upon the contents of the function. The function-based partitioning tends to hide valuable optimization opportunities form the compiler making it undesirable. Function inlining may be applied to assemble inter-procedurally coupled functions into the same compilation unit at the cost of very large function bodies. This paper introduces a new technique called region-based compilation where the compiler is allowed to repartition the program into more desirable compilation units. Region-based compilation units allow the compiler to control problem size while exposing inter-procedural optimization and code motion opportunities. " Which is pretty much exactly the problem we've run into. Except, unlike straight, flat, interprocedural analysis applied to an entire program, regions give you comparable performance, and with demand driven region formation, reduced memory usage and reduced compilation times. And it would be easier to modify gcc to do region based stuff than interprocedural stuff (because we have a large dependency on single function at a time) The best part is that papers have been showing solid experimental evidence (Hank, Hwu, Rau) that regions are often smaller than functions, so we'd probably actually get *faster* at optimizing. A good paper on the topic that is pretty easy to read, but has a long title, is: Region Formation Analysis with Demand-driven Inlining for Region-based Optimization http://www.eecis.udel.edu/~way/papers/pact2000.ps.gz -- "I invented the cordless extension cord. "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 19:22 ` Fergus Henderson 2001-07-09 20:18 ` Daniel Berlin @ 2001-07-09 23:30 ` Mark Mitchell 1 sibling, 0 replies; 28+ messages in thread From: Mark Mitchell @ 2001-07-09 23:30 UTC (permalink / raw) To: Fergus Henderson, Daniel Berlin; +Cc: gcc > For this to work, you need to do parse and analyze the whole translation > unit before emitting code, to get information about call counts, and > to enable elimination of dead functions. Doing this in the C/C++ front-ends would be approximately trivial, by the way. But, it would exacerbate our memory usage problems even more; until we get them under control, I think we have to stick with just doing a function at a time. Some compilers write out the bodies of function to disk and read them back; that makes it easier to keep the entire translation unit in memory without having a giant memory image. -- Mark Mitchell mark@codesourcery.com CodeSourcery, LLC http://www.codesourcery.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 18:47 Daniel Berlin 2001-07-09 19:22 ` Fergus Henderson @ 2001-07-09 19:56 ` Carlo Wood 2001-07-09 20:31 ` Daniel Berlin 2001-07-09 20:05 ` Timothy J. Wood 2001-07-09 23:26 ` Mark Mitchell 3 siblings, 1 reply; 28+ messages in thread From: Carlo Wood @ 2001-07-09 19:56 UTC (permalink / raw) To: Daniel Berlin; +Cc: gcc On Mon, Jul 09, 2001 at 09:46:59PM -0400, Daniel Berlin wrote: > This is effectively expressing the rule: "Small functions should be > inlined into larger ones. Larger functions should not be inlined into > small ones". This doesn't seem logical. I think I can follow the reasoning behing it though: - The overhead of a function call can be neglected for large functions, therefore it only makes sense to worry about inlining and small functions. As a result, only small functions are normally intented to be inlined by the programmer. This works perfectly in C, but for some reason C++ is different: there is being done a LOT more inlining. Especially, many functions are inlined that are not even visible to the programmer. Therefore I propose to let go the 'human' factor - and look at inlining from a mathematical point of view. Suppose we have: int f1(int i) { // something return result; } int f2(int i) { // uses f1() once return result; } And suppose that f1() is *only* called from within f2(). Then does the size of f1 matter? The ONLY trade off that can be made concerning inlining is that of call-overhead against program size *growth*. The variables that are involved are: 1) size of the function 2) whether or not the function needs external linkage (size factor). 3) for each inlining decision seperately: how often a specific source line will be hit (call-overhead factor). Obviously, the latter is unknown at the time the compiler needs to make the decision whether to inline or not. But I think we'll all agree that calling a function 10 times isn't worth an inline... Actually, I am pretty sure that there is only ONE reason why inlining makes sense: when it is within a loop somehow (a BIG loop). When a function is called 100,000 times I start to get interested in inlining ;). So, back to the example: int f1(int i) { // 100 lines of code return result; } int f2(int i) { // uses f1() once, 1000 lines of code return result; } int f3(int i) { // uses f2 once, 10 lines of code return result; } int main(void) { int s = 0; for (int i = 0; i < 1000000; ++i) s += f3(i); cout << s << '\n'; return 0; } and, by inlining f3 in main() we lose one million times a call overhead; by inlining recursively f2, we lose again one million times a call overhead; and by inlining recursively f1, we lose again one million times a call overhead. Despite the fact that we have to compare "one million times a call overhead" with the different object code growths, the sizes of f1, f2 and f3 relative to eachother are totally irrelevant. > Where what is considered "larger" is fudged by a factor. > > If you have a one line member, that calls a huge function, whatever > *calls* that member will inline the member, and then whatever that member calls > will probably beinlined, but the the externally visible member call itself (ie the non-inlined one) will not > inline the huge function into itself. > > Which seems to make sense to me. I have Real Life examples of the opposite, where I needed the large function to be inlined in the smaller one. The reason being that I had many small functions with a little difference, and a bulk functionality that had no difference and for which I didn't want to add duplicated code but also didn't want to add call over-head. > However, this doesn't control code growth, it could inline a lot of > small functions until it hits the limit. > > So the other heuristic i've added controls code growth by saying we don't > want to increase the number of statements in the function by more than > a factor of x over what we started with (when we started inlining). > > An expression of the rule: "Don't make the function more than x times bigger than > it originally was". I think is only marginal important - perhaps even neglectible. If, say, there are 5 (small) functions that inline eachother, then that adds a factor of 5 to whatever we gave as weight to a call-overhead - which is essentially not much (a loop of 1000000 or 200000, in *practise* being an unknown loop size: unknown == 5 * unknown, for me). PS I am not against your patch, because I think that in reality programmers always will mark functions 'inline' or put them in the class declaration when it really matters. Hopefully you *ALWAYS* inline functions that are marked 'inline' and/or defined in the class declaration - then it is fine with me whatever g++ does to reduce code size and compile time ;). -- Carlo Wood <carlo@alinoe.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 19:56 ` Carlo Wood @ 2001-07-09 20:31 ` Daniel Berlin 2001-07-09 21:05 ` Carlo Wood 0 siblings, 1 reply; 28+ messages in thread From: Daniel Berlin @ 2001-07-09 20:31 UTC (permalink / raw) To: Carlo Wood; +Cc: Daniel Berlin, gcc Carlo Wood <carlo@alinoe.com> writes: > On Mon, Jul 09, 2001 at 09:46:59PM -0400, Daniel Berlin wrote: >> This is effectively expressing the rule: "Small functions should be >> inlined into larger ones. Larger functions should not be inlined into >> small ones". > > This doesn't seem logical. I think I can follow the reasoning behing it > though: > > - The overhead of a function call can be neglected for large functions, > therefore it only makes sense to worry about inlining and small functions. > As a result, only small functions are normally intented to be inlined > by the programmer. > > This works perfectly in C, but for some reason C++ is different: there is > being done a LOT more inlining. Especially, many functions are inlined > that are not even visible to the programmer. > > Therefore I propose to let go the 'human' factor - and look at inlining > from a mathematical point of view. Suppose we have: > > int f1(int i) > { > // something > return result; >} > > int f2(int i) > { > // uses f1() once > return result; >} > > And suppose that f1() is *only* called from within f2(). > Then does the size of f1 matter? You are missing something important. Whatever calls f2 will recursively inline f1 back into f2, for that particular inlining. We recursively inline. So if you have int f3(int i) { .... f2(a); } f3 will inline f2, then expand f1 inline into f2. The thing we *don't* want is to inline f1 into f2 for f2, if f1 was much larger than f2. > > So, back to the example: > > int f1(int i) > { > // 100 lines of code > return result; >} > > int f2(int i) > { > // uses f1() once, 1000 lines of code > return result; >} > > int f3(int i) > { > // uses f2 once, 10 lines of code > return result; >} > > int main(void) > { > int s = 0; > for (int i = 0; i < 1000000; ++i) > s += f3(i); > cout << s << '\n'; > return 0; >} > > and, > > by inlining f3 in main() we lose one million times a call overhead; > by inlining recursively f2, we lose again one million times a call overhead; > and by inlining recursively f1, we lose again one million times a call overhead. > > Despite the fact that we have to compare "one million times a call overhead" > with the different object code growths, the sizes of f1, f2 and f3 relative > to eachother are totally irrelevant. No, they aren't. Once again, we recursively inline starting at some base function. We will inline f3 and into main. f2 won't make the cut off for inlining into the f3 inline that goes into main. However, we'll inline f1 into f2 when we generate the non-inline code for f2 (which we do for all non-statics). f2 is a big enough function that inlining into f3 is going to increase code size and processing time, but not buy you performance. People forget we recursively inline. So if they were in relative size compared to main, we'd now (with my heuristic) inline main->f3->f2->f1 (IE main would call nothing anymore). The problem is then that we also currently inline f3->f2->f1 f2->f1 When generating code for those. Think of all the hidden functions. So if your constructor calls a function, which calls two functions, we end up with inlined: main->your constructor->the function->two large functions->whatever they main->call->till we hit 10000 insns or run out of stuff to inline Then we inline, when outputting the constructor constructor->the function->two functions->etc then the function->two functions->etc Consider if more than just the constructor calls the function. Everything that calls it, as long as we don't hit 10000 insns for the function at the top of the tree, will inline the the function->two functions->etc, whether it makes sense or not. What my heuristics would do is say main->your constructor->... is fine your constructor->two large functions->... by itself doesn't make much sense. Whatever calls the constructor outside the compilation unit is incurring enough call overhead anyway that inlining these two large functions into the constructor just increases code size, not performance. each of the two large functions->... is fine -- "I was going 70 miles an hour and got stopped by a cop who said, "Do you know the speed limit is 55 miles per hour?" "Yes, officer, but I wasn't going to be out that long..." "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 20:31 ` Daniel Berlin @ 2001-07-09 21:05 ` Carlo Wood 2001-07-09 21:26 ` Daniel Berlin 0 siblings, 1 reply; 28+ messages in thread From: Carlo Wood @ 2001-07-09 21:05 UTC (permalink / raw) To: Daniel Berlin; +Cc: gcc On Mon, Jul 09, 2001 at 11:31:14PM -0400, Daniel Berlin wrote: > > And suppose that f1() is *only* called from within f2(). > > Then does the size of f1 matter? > > You are missing something important. > > Whatever calls f2 will recursively inline f1 back into f2, for that > particular inlining. > > We recursively inline. I understood that. Thats the whole reason why I say that the size of f1 doesn't matter: all you lose is one call-overhead, without growing the size of the code at all. Fergus also said: "There is an exception to this. Functions which are only called once, and which are defined in this translation unit and not exported, should almost always be inlined (after which the original copy of the function is dead code and can be eliminated)" To which you also disagreed. There must be a misunderstanding here :), I think Fergus is perfectly right and actually better worded what I wanted to say. > > Despite the fact that we have to compare "one million times a call overhead" > > with the different object code growths, the sizes of f1, f2 and f3 relative > > to eachother are totally irrelevant. > > No, they aren't. > > Once again, we recursively inline starting at some base function. > > We will inline f3 and into main. > f2 won't make the cut off for inlining into the f3 inline that goes > into main. That makes, thus, no sense. Because inlining f2 into the f3 inline gains *again* one million call-overheads. It is true that it makes the code effectively grow 1000 lines, but that is an absolute number and not something that is relevant to compare relatively (with the size of f3 in this case, or with the size of main+f3 for that matter). The only thing that matters is the actual size of f2 (1000 lines) and the number of times it is called (1,000,000 times). The sizes of main and f3 don't come into the picture, and hence the proposed rule can't be correct. > However, we'll inline f1 into f2 when we generate the non-inline code > for f2 (which we do for all non-statics). > > f2 is a big enough function that inlining into f3 is going to increase > code size and processing time, but not buy you performance. That depends on how often f3 is called. > > People forget we recursively inline. Please note that I did not forget that. There must be some other misunderstanding by me or you (heh - is *this* the case were one puts 'me' upfront instead of second? ;). > So if they were in relative size compared to main, we'd now (with my > heuristic) inline main->f3->f2->f1 (IE main would call nothing > anymore). This decision should be based on the fact that f3 is inside a loop, and not on the fact that main() has a considerable size, however. > The problem is then that we also currently inline > > f3->f2->f1 > f2->f1 I understand. In this case one can easily do the following reasoning: Knowing that f3 is _called_ (not inlined), it was apparently not important enough to inline it. Hence we are not inside a loop, and thus it is not needed to inline f2 into f3. Again however, I disagree that the size of f2 *relative* to f3 matters; the only thing that matters in this case is the absolute size of f2. I'd change your heuristic rule to look at the size of function in absolute terms, not relative compared to the function that it is called from. > When generating code for those. > > Think of all the hidden functions. > > So if your constructor calls a function, which calls two functions, we > end up with > inlined: > main->your constructor->the function->two large functions->whatever they > main->call->till we hit 10000 insns or run out of stuff to inline > > Then we inline, when outputting the constructor > > constructor->the function->two functions->etc > > then > the function->two functions->etc > > > Consider if more than just the constructor calls the function. > > Everything that calls it, as long as we don't hit 10000 insns for the > function at the top of the tree, will inline the the function->two functions->etc, > whether it makes sense or not. > > > What my heuristics would do is say > > main->your constructor->... is fine > your constructor->two large functions->... by itself doesn't make much > sense. I agree. But based on the fact that it is unlikely that a constructor that calls two large functions is within a loop to begin with. Fortunately here you talk about "a large function", instead of "a function large compared to the constructor size". That is a difference that matter and that should be reflected in the patch. > Whatever calls the constructor outside the compilation unit is > incurring enough call overhead anyway that inlining these two large > functions into the constructor just increases code size, not performance. > > each of the two large functions->... is fine But... now I see something that I didn't see before :). Perhaps the call-overhead is much less important then additional optimizations that can be done AFTER inlining. By not inlining we might miss important optimizations. This wouldn't speed up compile time, but what about to TRY inlining and then make the decision based on code size reduction that follows from it? While a call- overhead is rather constant (a few assembly instructions), reductions as a result of actual inlining can be rather HUGE (also mentioned by Timothy J. Wood (Nice last name) in his example of calling large functions with constant parameters that cause large code blocks to drop out). Ok, that is too much asked... Lets say, we do what your patch does but 1) look at the absolute size of the to-inline function, 2) inline like we inline now when any parameter passed is a constant? -- Carlo Wood <carlo@alinoe.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 21:05 ` Carlo Wood @ 2001-07-09 21:26 ` Daniel Berlin 2001-07-09 21:45 ` Carlo Wood 0 siblings, 1 reply; 28+ messages in thread From: Daniel Berlin @ 2001-07-09 21:26 UTC (permalink / raw) To: Carlo Wood; +Cc: Daniel Berlin, gcc Carlo Wood <carlo@alinoe.com> writes: > On Mon, Jul 09, 2001 at 11:31:14PM -0400, Daniel Berlin wrote: >> > And suppose that f1() is *only* called from within f2(). >> > Then does the size of f1 matter? >> >> You are missing something important. >> >> Whatever calls f2 will recursively inline f1 back into f2, for that >> particular inlining. >> >> We recursively inline. > > I understood that. Thats the whole reason why I say that the size > of f1 doesn't matter: all you lose is one call-overhead, without > growing the size of the code at all. Fergus also said: > > "There is an exception to this. Functions which are only called once, > and which are defined in this translation unit and not exported, should > almost always be inlined (after which the original copy of the function > is dead code and can be eliminated)" > > To which you also disagreed. There must be a misunderstanding here :), > I think Fergus is perfectly right and actually better worded what I wanted > to say. > >> > Despite the fact that we have to compare "one million times a call overhead" >> > with the different object code growths, the sizes of f1, f2 and f3 relative >> > to eachother are totally irrelevant. >> >> No, they aren't. >> >> Once again, we recursively inline starting at some base function. >> >> We will inline f3 and into main. >> f2 won't make the cut off for inlining into the f3 inline that goes >> into main. > > That makes, thus, no sense. Because inlining f2 into the f3 inline gains > *again* one million call-overheads. However, since f1 was inlined into f2, the call overhead just became less important. > It is true that it makes the code > effectively grow 1000 lines, but that is an absolute number and not something > that is relevant to compare relatively (with the size of f3 in this case, > or with the size of main+f3 for that matter). You can't do it perfectly no matter what you try, because we don't have the architecture. > The only thing that matters > is the actual size of f2 (1000 lines) and the number of times it is called > (1,000,000 times). We don't know how many times it's called at this point. > The sizes of main and f3 don't come into the picture, > and hence the proposed rule can't be correct. Sure, it's not a *rule*, it's a heuristic. When we have the architecture to determine the above, i'll change it to take the call counts, loop depth, etc, into account. > >> However, we'll inline f1 into f2 when we generate the non-inline code >> for f2 (which we do for all non-statics). >> >> f2 is a big enough function that inlining into f3 is going to increase >> code size and processing time, but not buy you performance. > > That depends on how often f3 is called. And how much gets inlined into f2. > >> >> People forget we recursively inline. > > Please note that I did not forget that. There must be some other misunderstanding > by me or you (heh - is *this* the case were one puts 'me' upfront instead of > second? ;). :) I just don't think the particular case above is as important as you do. And I think we get it sooooooo wrong soooooo often right now, that the current heuristic is worse than my proposed one. Given that we are spending tons of compilation time, and getting so little out of it, i think whatever pessimization i perform in the above particular case, is outweighed by the compile time factor. These are parameters you can twiddle. You can say you don't care, and watch your compile time shoot back up to hell, along with your code size. But i'm pretty sure it's not going to speed up more than 2% or so. > >> So if they were in relative size compared to main, we'd now (with my >> heuristic) inline main->f3->f2->f1 (IE main would call nothing >> anymore). > > This decision should be based on the fact that f3 is inside a loop, and not > on the fact that main() has a considerable size, however. We don't have this architecture. > >> The problem is then that we also currently inline >> >> f3->f2->f1 >> f2->f1 > > I understand. In this case one can easily do the following reasoning: > Knowing that f3 is _called_ (not inlined), it was apparently not important > enough to inline it. Hence we are not inside a loop, and thus it is not > needed to inline f2 into f3. Again however, I disagree that the size of > f2 *relative* to f3 matters; the only thing that matters in this case is > the absolute size of f2. I'd change your heuristic rule to look at the > size of function in absolute terms, not relative compared to the function > that it is called from. No, because then we'll do f2->f1 because the size doesn't relatively matter, only the absolute size matters. In fact, the current heuristic is that only absolute size of the top level function matters. We know this doesn't work well in most cases: good at some, and is bad at a lot more. I'm trying to having something that is good in most cases: bad at some, and good at more than we are bad at. > >> When generating code for those. >> >> Think of all the hidden functions. >> >> So if your constructor calls a function, which calls two functions, we >> end up with >> inlined: >> main->your constructor->the function->two large functions->whatever they >> main->call->till we hit 10000 insns or run out of stuff to inline >> >> Then we inline, when outputting the constructor >> >> constructor->the function->two functions->etc >> >> then >> the function->two functions->etc >> >> >> Consider if more than just the constructor calls the function. >> >> Everything that calls it, as long as we don't hit 10000 insns for the >> function at the top of the tree, will inline the the function->two functions->etc, >> whether it makes sense or not. >> >> >> What my heuristics would do is say >> >> main->your constructor->... is fine >> your constructor->two large functions->... by itself doesn't make much >> sense. > > I agree. But based on the fact that it is unlikely that a constructor > that calls two large functions is within a loop to begin with. Fortunately > here you talk about "a large function", instead of "a function large compared > to the constructor size". That is a difference that matter and that should > be reflected in the patch. I don't understand, please paint me a picture if i don't get it in the morning. I'm slow at midnight. :) >> Whatever calls the constructor outside the compilation unit is >> incurring enough call overhead anyway that inlining these two large >> functions into the constructor just increases code size, not performance. >> >> each of the two large functions->... is fine > > But... now I see something that I didn't see before :). > Perhaps the call-overhead is much less important then additional > optimizations that can be done AFTER inlining. Yes. > By not inlining we > might miss important optimizations. This wouldn't speed up compile > time, but what about to TRY inlining and then make the decision > based on code size reduction that follows from it? HP does this, and i have a paper on doing this. You have a budget for time to spend inlining a given function. However, we don't have anywhere near the architecture at the tree level to make use of it. We can only go with simple heuristics right now, at least until Diego's basic blocks on trees stuff. I'm happy to build a great inliner that takes into account all kinds of stuff, and performs well. And when we have the architecture, i'll do it. But i don't have time to build all that architecture all by myself, when i know others are working on it. > overhead is rather constant (a few assembly instructions), reductions > as a result of actual inlining can be rather HUGE (also mentioned by > Timothy J. Wood (Nice last name) in his example of calling large > functions with constant parameters that cause large code blocks to > drop out). > Ok, that is too much asked... Lets say, we do what your > patch does but 1) look at the absolute size of the to-inline function, > 2) inline like we inline now when any parameter passed is a > constant? Sure, i'm happy to do this, once i understand #1. #2 is easy. We also inline when it says inline, so that's, once again, always a fall back if the heuristic is too stupid. Right now we inline *way* too much. I'd rather have people have to add inline. Theres no way to say "noinline" :) > > -- > Carlo Wood <carlo@alinoe.com> -- "I have the world's largest collection of seashells. I keep it on all the beaches of the world... Perhaps you've seen it. "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 21:26 ` Daniel Berlin @ 2001-07-09 21:45 ` Carlo Wood 0 siblings, 0 replies; 28+ messages in thread From: Carlo Wood @ 2001-07-09 21:45 UTC (permalink / raw) To: Daniel Berlin; +Cc: gcc On Tue, Jul 10, 2001 at 12:25:58AM -0400, Daniel Berlin wrote: > > I agree. But based on the fact that it is unlikely that a constructor > > that calls two large functions is within a loop to begin with. Fortunately > > here you talk about "a large function", instead of "a function large compared > > to the constructor size". That is a difference that matter and that should > > be reflected in the patch. > I don't understand, please paint me a picture if i don't get it in the > morning. I'm slow at midnight. > :) It's 6:38 am here :p. Anyway - perhaps I did let myself go too much. Having an "analysis-IQ" of 180 is great for seeing errors in other peoples logic and I've successfully pissed of people numerous of times by constantly (and only) break down what they come up with by pointing out errors. However, this is not important, it is just a minor difference in looking at things :/. I appologize to everyone on the list. > > patch does but 1) look at the absolute size of the to-inline function, > > 2) inline like we inline now when any parameter passed is a > > constant? > > Sure, i'm happy to do this, once i understand #1. > #2 is easy. > > We also inline when it says inline, so that's, once again, always a > fall back if the heuristic is too stupid. > > Right now we inline *way* too much. I'd rather have people have to add > inline. > > Theres no way to say "noinline" > :) Before going to bed, allow me to summarize what IS important, I think: I basically agree with everything you said, and I don't have a big problem with the patch. It certainly is better than what we have now and I'd like to thank you for writing this (I mean that). I am reassured too, that functions marked as inline still WILL be inlined. If you add point 2) above I will be honoured more than satisfied ;). I'll try to explain point 1) tomorrow without wanting to make a big point out of it: I have the feeling that in practise there won't be much difference in the result anyway - its just that in my mind it's a-logical, and that itched. Night, -- Carlo Wood <carlo@alinoe.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 18:47 Daniel Berlin 2001-07-09 19:22 ` Fergus Henderson 2001-07-09 19:56 ` Carlo Wood @ 2001-07-09 20:05 ` Timothy J. Wood 2001-07-09 20:34 ` Daniel Berlin 2001-07-09 23:26 ` Mark Mitchell 3 siblings, 1 reply; 28+ messages in thread From: Timothy J. Wood @ 2001-07-09 20:05 UTC (permalink / raw) To: Daniel Berlin; +Cc: gcc On Monday, July 9, 2001, at 06:46 PM, Daniel Berlin wrote: [...] > We proceed to recursively inline 128 functions into this simple main > function > operator call. (Bet you didn't think there were 128 functions you could > inline here) IMO, the bug here is that the library you are using declares too many inlines. I (personally) believe that 'inline' should be a command, not a request, and that a warning should be issued if inlining fails for some reason. If you have the compiler decide to not inline stuff that is marked inline, then someone is going to come up with a case where they really need the inline where you didn't let them have it. Then there will ensue a bunch of flags for controlling the inline limits and you're back to the programmer deciding when things should be inlined, but through a clumsier interface. Or worse yet, they'll start using macros to get around the limits. [...] > This is effectively expressing the rule: "Small functions should be > inlined into larger ones. Larger functions should not be inlined into > small ones". This rule is not true in general. Just one example off the top of my head would be a case where you define a general inline function for performing some algorithm where some of the arguments are constants and you then define other 'real' functions that effectively pass configuration parameters (expecting the optimizer to treat them as constants and toss out different dead code in each case). A silly little example might be: static inline void generalCase(enum type t, enum operation op, ...) { if (t == type1 || t == type3) { if (op == op1) block1; else block2; } else if (t == type2 && op == op1) { block3; } .... } void t1op1(...) { generalCase(type1, op1, ...); } void t2op2(...) { generalCase(type2, op2, ...); } With this approach, you can often avoid needless duplication of code in the general case and can 'instantiate' the general case in each specific context and should be able to expect the compiler to blow away cases that can't be executed based on the inputs. Not only can make your code simpler, but you might have a couple common case that will perform better by having removed many of the branches from the general case. But, it is definitely the case, that the inlined function can be bigger than the function it is getting inlined into (which could just be a call to the inline!) -tim ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 20:05 ` Timothy J. Wood @ 2001-07-09 20:34 ` Daniel Berlin 0 siblings, 0 replies; 28+ messages in thread From: Daniel Berlin @ 2001-07-09 20:34 UTC (permalink / raw) To: Timothy J. Wood; +Cc: Daniel Berlin, gcc "Timothy J. Wood" <tjw@omnigroup.com> writes: > On Monday, July 9, 2001, at 06:46 PM, Daniel Berlin wrote: > [...] >> We proceed to recursively inline 128 functions into this simple main >> function >> operator call. (Bet you didn't think there were 128 functions you could >> inline here) > > IMO, the bug here is that the library you are using declares too > many inlines. I (personally) believe that 'inline' should be a > command, not a request, and that a warning should be issued if > inlining fails for some reason. If you have the compiler decide to > not inline stuff that is marked inline, then someone is going to come > up with a case where they really need the inline where you didn't let > them have it. Then there will ensue a bunch of flags for controlling > the inline limits and you're back to the programmer deciding when > things should be inlined, but through a clumsier interface. Or worse > yet, they'll start using macros to get around the limits. > > > > [...] >> This is effectively expressing the rule: "Small functions should be >> inlined into larger ones. Larger functions should not be inlined into >> small ones". > > > This rule is not true in general. Just one example off the top of > my head would be a case where you define a general inline function for > performing some algorithm where some of the arguments are constants > and you then define other 'real' functions that effectively pass > configuration parameters (expecting the optimizer to treat them as > constants and toss out different dead code in each case). > > > A silly little example might be: > > static inline void generalCase(enum type t, enum operation op, ...) > { > if (t == type1 || t == type3) { > if (op == op1) > block1; > else > block2; > } else if (t == type2 && op == op1) { > block3; > } > .... >} > > > void t1op1(...) > { > generalCase(type1, op1, ...); >} > > void t2op2(...) > { > generalCase(type2, op2, ...); >} > > > With this approach, you can often avoid needless duplication of > code in the general case and can 'instantiate' the general case in > each specific context and should be able to expect the compiler to > blow away cases that can't be executed based on the inputs. However, whatever calls t1op1 and t2op2 would recursively inline generalcase, since generalcase is not as large as it. It's the t1op1 and t2op2 compilations themselves that shouldn't have general case inlined. You gain nothing. If the calls to t1op1 and t2op2 were outside teh translation unit, you'd be screwed anyway. So i haven't reduced performance, just code size. It's the *root* of the current inlining tree that we are comparing relative to. If it's t1op1 at the root, we shouldn't inline. If it's something calling t1op1, we should inline both. > > Not only can make your code simpler, but you might have a couple > common case that will perform better by having removed many of the > branches from the general case. > > But, it is definitely the case, that the inlined function can be > bigger than the function it is getting inlined into (which could just > be a call to the inline!) > > > -tim -- "My watch is three hours fast, and I can't fix it. So I'm going to move to New York. "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 18:47 Daniel Berlin ` (2 preceding siblings ...) 2001-07-09 20:05 ` Timothy J. Wood @ 2001-07-09 23:26 ` Mark Mitchell 2001-07-09 23:37 ` Daniel Jacobowitz ` (5 more replies) 3 siblings, 6 replies; 28+ messages in thread From: Mark Mitchell @ 2001-07-09 23:26 UTC (permalink / raw) To: Daniel Berlin, gcc --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin <dan@cgsoftware.com> wrote: > Right now, they are horrific. Hey, thanks a lot. :-) They are, actually, the same ones we had in the RTL inliner, just about -- except that we can inline so much more! I think your ideas are reasonable. Nathan Sidwell has been thinking about these issues, too; you should coordinate with him to try to get some decent ideas and some decent measurements. One long-term challenge is that we would like to inline when somehow that allows major simplifications. For example, if there is a giant function involving tons of calcuation, but we know that the argument is `3', and that means we can fold all the calculations, then we should do the inlining, even if the inlined function is nominally giant. I have no idea how to do this, though. It's probably not worth bothering about. -- Mark Mitchell mark@codesourcery.com CodeSourcery, LLC http://www.codesourcery.com ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 23:26 ` Mark Mitchell @ 2001-07-09 23:37 ` Daniel Jacobowitz 2001-07-10 0:00 ` Fergus Henderson ` (4 subsequent siblings) 5 siblings, 0 replies; 28+ messages in thread From: Daniel Jacobowitz @ 2001-07-09 23:37 UTC (permalink / raw) To: gcc On Mon, Jul 09, 2001 at 11:06:31PM -0700, Mark Mitchell wrote: > > > --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin > <dan@cgsoftware.com> wrote: > > > Right now, they are horrific. > > Hey, thanks a lot. :-) They are, actually, the same ones we had in > the RTL inliner, just about -- except that we can inline so much more! > > I think your ideas are reasonable. Nathan Sidwell has been thinking > about these issues, too; you should coordinate with him to try to > get some decent ideas and some decent measurements. > > One long-term challenge is that we would like to inline when somehow > that allows major simplifications. For example, if there is a giant > function involving tons of calcuation, but we know that the argument > is `3', and that means we can fold all the calculations, then we > should do the inlining, even if the inlined function is nominally > giant. I have no idea how to do this, though. It's probably not worth > bothering about. To do that, you need basically a backtracking framework - try inlining compulsively for some amount of time and see if the resulting code size of the inlined function is substantially smaller than the size of the inlined function standalone. Except, of course, you really want likely execution time rather than code size, but code size is a decent first approximation :) Unless of course we enable some additional loop unrolling. This is something that would, as Dan said, be nice to have - some day, when we have the framework to do it. -- Daniel Jacobowitz Carnegie Mellon University MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 23:26 ` Mark Mitchell 2001-07-09 23:37 ` Daniel Jacobowitz @ 2001-07-10 0:00 ` Fergus Henderson 2001-07-10 0:13 ` Daniel Berlin 2001-07-10 0:09 ` Daniel Berlin ` (3 subsequent siblings) 5 siblings, 1 reply; 28+ messages in thread From: Fergus Henderson @ 2001-07-10 0:00 UTC (permalink / raw) To: gcc On 09-Jul-2001, Mark Mitchell <mark@codesourcery.com> wrote: > > One long-term challenge is that we would like to inline when somehow > that allows major simplifications. For example, if there is a giant > function involving tons of calcuation, but we know that the argument > is `3', and that means we can fold all the calculations, then we > should do the inlining, even if the inlined function is nominally > giant. I have no idea how to do this, though. It's probably not worth > bothering about. Another, similar case when you really want to inline are when the body of the function contains a call to a virtual method (or other indirect function call) and the type of the object and hence its vtable (or the value of the function pointer) is known in the caller. That's handy because you can then inline the virtual method (or indirect function call), which may lead to further opportunities for simplification. For some applications this can really pay off significantly, so it is worth doing in the long run, although right now it's probably more important on just fixing the most egregariously bad heuristics rather than trying We do some of this kind of stuff in the Mercury compiler. For example, we specialize function calls where one of the parameters is a known higher-order term (a.k.a. closure; think of it as a known function pointer). We also do "deforestation" optimization, which is related to this idea of specializing calls where the caller has information about a decision (e.g. a switch) in the callee. There's a paper [1] on our web page which describes those and some of the other optimizations that we do. [1] "Optimization of Mercury programs", Simon Taylor, Honours Report, The University of Melbourne, 1998. < http://www.cs.mu.oz.au/research/mercury/information/papers/stayl_hons.ps.gz > -- Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit The University of Melbourne | of excellence is a lethal habit" WWW: < http://www.cs.mu.oz.au/~fjh > | -- the last words of T. S. Garp. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-10 0:00 ` Fergus Henderson @ 2001-07-10 0:13 ` Daniel Berlin 0 siblings, 0 replies; 28+ messages in thread From: Daniel Berlin @ 2001-07-10 0:13 UTC (permalink / raw) To: Fergus Henderson; +Cc: gcc Fergus Henderson <fjh@cs.mu.oz.au> writes: > On 09-Jul-2001, Mark Mitchell <mark@codesourcery.com> wrote: >> >> One long-term challenge is that we would like to inline when somehow >> that allows major simplifications. For example, if there is a giant >> function involving tons of calcuation, but we know that the argument >> is `3', and that means we can fold all the calculations, then we >> should do the inlining, even if the inlined function is nominally >> giant. I have no idea how to do this, though. It's probably not worth >> bothering about. > > Another, similar case when you really want to inline are when the body > of the function contains a call to a virtual method (or other indirect > function call) and the type of the object and hence its vtable > (or the value of the function pointer) is known in the caller. > That's handy because you can then inline the virtual method > (or indirect function call), which may lead to further opportunities > for simplification. Real Devirtualization can be done with region based frameworks, or interprocedural analysis. No need for inlining all virtuals. > > For some applications this can really pay off significantly, > so it is worth doing in the long run, although right now > it's probably more important on just fixing the most egregariously > bad heuristics rather than trying > > We do some of this kind of stuff in the Mercury compiler. For example, > we specialize function calls where one of the parameters is a known > higher-order term (a.k.a. closure; think of it as a known function > pointer). Yup, this is procedure cloning (or so all the papers and books i have call it, besides closure). It's part of most aggressive inliner frameworks, but you really need either to defer everything again. > We also do "deforestation" optimization, which is > related to this idea of specializing calls where the caller has > information about a decision (e.g. a switch) in the callee. > There's a paper [1] on our web page which describes those and > some of the other optimizations that we do. COol. -- "Are there any questions? "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 23:26 ` Mark Mitchell 2001-07-09 23:37 ` Daniel Jacobowitz 2001-07-10 0:00 ` Fergus Henderson @ 2001-07-10 0:09 ` Daniel Berlin 2001-07-10 0:20 ` Daniel Berlin ` (2 subsequent siblings) 5 siblings, 0 replies; 28+ messages in thread From: Daniel Berlin @ 2001-07-10 0:09 UTC (permalink / raw) To: Mark Mitchell; +Cc: Daniel Berlin, gcc Mark Mitchell <mark@codesourcery.com> writes: > --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin > <dan@cgsoftware.com> wrote: > >> Right now, they are horrific. > > Hey, thanks a lot. :-) They are, actually, the same ones we had in > the RTL inliner, just about -- except that we can inline so much more! > > I think your ideas are reasonable. Nathan Sidwell has been thinking > about these issues, too; you should coordinate with him to try to > get some decent ideas and some decent measurements. > > One long-term challenge is that we would like to inline when somehow > that allows major simplifications. For example, if there is a giant > function involving tons of calcuation, but we know that the argument > is `3', and that means we can fold all the calculations, then we > should do the inlining, even if the inlined function is nominally > giant. I have no idea how to do this, though. It's probably not worth > bothering about. This is procedure cloning. The cheap way is to clone the procedure (with whatever arguments are constant now omitted from the call), optimize it, see if it helped relative to the original procedure, if so, change the cloned name, and the call to match the new cloned name/arguments. If we find somewhere else compatible with the new arguments, reuse the cloned procedures. Picking which procedures to clone is tricky. See "Aggressive inlining" (Ayers, Gottlieb, Schooler) for more details (They are HP compiler people, the paper goes into stats about how much procedure cloning alone vs inlining alone vs both combined helps). --Dan > > -- > Mark Mitchell mark@codesourcery.com > CodeSourcery, LLC http://www.codesourcery.com -- "Is it weird in here, or is it just me? "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 23:26 ` Mark Mitchell ` (2 preceding siblings ...) 2001-07-10 0:09 ` Daniel Berlin @ 2001-07-10 0:20 ` Daniel Berlin 2001-07-10 0:30 ` Daniel Berlin 2001-07-10 2:47 ` Nathan Sidwell 5 siblings, 0 replies; 28+ messages in thread From: Daniel Berlin @ 2001-07-10 0:20 UTC (permalink / raw) To: Mark Mitchell; +Cc: Daniel Berlin, gcc Mark Mitchell <mark@codesourcery.com> writes: > --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin > <dan@cgsoftware.com> wrote: > >> Right now, they are horrific. > > Hey, thanks a lot. :-) They are, actually, the same ones we had in > the RTL inliner, just about -- except that we can inline so much more! > It's not your fault, it's because someone set flag_default_inline to 1, so all methods are inline. and since DECL_INLINE, < 10k insns and actually possible to inline were the inliner's criteria, we ended up with a lot of 10k insn functions, even when the original function was say, one stmt that was a call. :) We effectively did a breadth first search of the call tree for a function, inlining everything possible until we hit 10k insns for that function. The only case where we didn't hit 10k insns is if we *couldn't find that many functions to inline* :). > I think your ideas are reasonable. Nathan Sidwell has been thinking > about these issues, too; you should coordinate with him to try to > get some decent ideas and some decent measurements. :) I really also don't think we will exacerbate the memory problems. The memory problems are from every method being inline, and every inline up to 10k insns being done. So we generate a *lot* of copies. And we *were* deferring just about everything anyway, since they were all marked inline. So compared to 3.0 we can reduce memory usage just by calming the inliner. And still be deferring everything. And inlining where we won't increase code growth. And reduce compile time by a factor of of 4. Neat, eh? Our problem was so big, that we look like geniuses for fixing it. I think we should "lose" the mail archives, and pretend we spent months fixing it. :) > > One long-term challenge is that we would like to inline when somehow > that allows major simplifications. For example, if there is a giant > function involving tons of calcuation, but we know that the argument > is `3', and that means we can fold all the calculations, then we > should do the inlining, even if the inlined function is nominally > giant. I have no idea how to do this, though. It's probably not worth > bothering about. > > -- > Mark Mitchell mark@codesourcery.com > CodeSourcery, LLC http://www.codesourcery.com -- "I went to the cinema, and the prices were: Adults $5.00, children $2.50. So I said, "Give me two boys and a girl." "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 23:26 ` Mark Mitchell ` (3 preceding siblings ...) 2001-07-10 0:20 ` Daniel Berlin @ 2001-07-10 0:30 ` Daniel Berlin 2001-07-10 2:47 ` Nathan Sidwell 5 siblings, 0 replies; 28+ messages in thread From: Daniel Berlin @ 2001-07-10 0:30 UTC (permalink / raw) To: Mark Mitchell; +Cc: Daniel Berlin, gcc Mark Mitchell <mark@codesourcery.com> writes: > --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin > <dan@cgsoftware.com> wrote: > >> Right now, they are horrific. > > Hey, thanks a lot. :-) They are, actually, the same ones we had in > the RTL inliner, just about -- except that we can inline so much more! > > I think your ideas are reasonable. Nathan Sidwell has been thinking > about these issues, too; you should coordinate with him to try to > get some decent ideas and some decent measurements. > > One long-term challenge is that we would like to inline when somehow > that allows major simplifications. For example, if there is a giant > function involving tons of calcuation, but we know that the argument > is `3', and that means we can fold all the calculations, then we > should do the inlining, even if the inlined function is nominally > giant. I have no idea how to do this, though. It's probably not worth > bothering about. The whole method defaulting to inline thing also explains why v3 was hit so hard. It's mostly headers, and thus, all the code is readily available to be inlined, into any translation unit. If I -fno-default-inline, i can't get 15k insns to appear on the example. even at --param max-code-growth=500 TOTAL : 8.10 .... zsh: exit 1 ./cc1plus --param max-code-growth=500 -O3 -fno-default-inline test3.ii ./cc1plus --param max-code-growth=500 -O3 -fno-default-inline test3.ii 8.10s user ... So definitely, methods were being all marked inline and inlined. This is fine, as long as we control code growth. As the timing vs what i posted earlier shows, we don't save more than a second default-inline on or off with a normal code growth control factor. (%:/buildspace/egcs/build/gcc)- time ./cc1plus -O3 -fno-default-inline test3.ii 2>/dev/null ./cc1plus -O3 -fno-default-inline test3.ii 2> /dev/null 8.12s user 0.28s system 100% cpu 8.398 total (dberlin@debian)(241/vc)(03:28am:07/10/01)- (%:/buildspace/egcs/build/gcc)- time ./cc1plus -O3 test3.ii 2>/dev/null ./cc1plus -O3 test3.ii 2> /dev/null 8.12s user 0.27s system 99% cpu 8.471 total (dberlin@debian)(242/vc)(03:28am:07/10/01)- (%:/buildspace/egcs/build/gcc)- It's only what we have now that makes it such a problem. In fact, i don't have a vote, but if i did, i'd vote for leaving flag_default_inline as is, and just installing the new heuristic i'll post a patch for. We should get better code overall that way, and never worse. --Dan > > -- > Mark Mitchell mark@codesourcery.com > CodeSourcery, LLC http://www.codesourcery.com -- "What's another word for Thesaurus? "-Steven Wright ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Inlining heuristics for C++ 2001-07-09 23:26 ` Mark Mitchell ` (4 preceding siblings ...) 2001-07-10 0:30 ` Daniel Berlin @ 2001-07-10 2:47 ` Nathan Sidwell 5 siblings, 0 replies; 28+ messages in thread From: Nathan Sidwell @ 2001-07-10 2:47 UTC (permalink / raw) To: Daniel Berlin; +Cc: Mark Mitchell, gcc Mark Mitchell wrote: > > --On Monday, July 09, 2001 09:46:59 PM -0400 Daniel Berlin > <dan@cgsoftware.com> wrote: > I think your ideas are reasonable. Nathan Sidwell has been thinking > about these issues, too; you should coordinate with him to try to > get some decent ideas and some decent measurements. Yes, and I'm making good progress. I have significantly reduced the compile time time and memory requirements. One of Geralds test cases now takes 112MB and 87 seconds instead of 279MB and 1133 seconds (that went into swap, hence the huge time dilation). The produced executable is also much smaller. The downside is that the resultant executable does not always go faster. Reading some inlining papers suggests that the 'inline until too big' heuristic is not sensible. A % increase might be better. I'm trying the following heuristics, inline until too big (current one) inline all smaller than X inline until too expanded inline all smaller than X% of target fn Note that my algorithm does a bottom up inlining, so we don't get problems with unbounded inlining of small functions. I don't delay inlining until the end of the compilation unit. Doing that would not be difficult, and then one could generate the complete call graph and potentially do a better job (it would not invalidate the work I'm doing, which is good). I'd guess I'll have something postable by the end of the week. Dan, if you can wait 'til then, that'd be great! nathan -- Dr Nathan Sidwell :: http://www.codesourcery.com :: CodeSourcery LLC 'But that's a lie.' - 'Yes it is. What's your point?' nathan@codesourcery.com : http://www.cs.bris.ac.uk/~nathan/ : nathan@acm.org ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2001-07-11 14:27 UTC | newest] Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <87r8vpo8rw.fsf@cgsoftware.com.suse.lists.egcs> [not found] ` <20010710122244.A14584@hg.cs.mu.oz.au.suse.lists.egcs> 2001-07-10 2:42 ` Inlining heuristics for C++ Andi Kleen 2001-07-10 14:16 ` Hartmut Schirmer 2001-07-11 14:27 ` Fergus Henderson 2001-07-10 14:53 mike stump 2001-07-10 15:04 ` Daniel Berlin -- strict thread matches above, loose matches on Subject: below -- 2001-07-09 21:49 dewar 2001-07-09 22:17 ` Daniel Berlin 2001-07-09 20:13 dewar 2001-07-09 20:23 ` Joe Buck 2001-07-09 18:47 Daniel Berlin 2001-07-09 19:22 ` Fergus Henderson 2001-07-09 20:18 ` Daniel Berlin 2001-07-09 23:30 ` Mark Mitchell 2001-07-09 19:56 ` Carlo Wood 2001-07-09 20:31 ` Daniel Berlin 2001-07-09 21:05 ` Carlo Wood 2001-07-09 21:26 ` Daniel Berlin 2001-07-09 21:45 ` Carlo Wood 2001-07-09 20:05 ` Timothy J. Wood 2001-07-09 20:34 ` Daniel Berlin 2001-07-09 23:26 ` Mark Mitchell 2001-07-09 23:37 ` Daniel Jacobowitz 2001-07-10 0:00 ` Fergus Henderson 2001-07-10 0:13 ` Daniel Berlin 2001-07-10 0:09 ` Daniel Berlin 2001-07-10 0:20 ` Daniel Berlin 2001-07-10 0:30 ` Daniel Berlin 2001-07-10 2:47 ` Nathan Sidwell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).