public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: bug introduced by inlining patch
       [not found] <20020904071122.GA15631@nbkurt.etpnet.phys.tue.nl>
@ 2002-09-04 10:18 ` Dale Johannesen
  2002-09-04 11:08   ` Kurt Garloff
  0 siblings, 1 reply; 6+ messages in thread
From: Dale Johannesen @ 2002-09-04 10:18 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: Dale Johannesen, gcc-patches, gcc


On Wednesday, September 4, 2002, at 12:11 AM, Kurt Garloff wrote:

> Hi Dale,
>
> On Tue, Sep 03, 2002 at 06:09:02PM -0700, Dale Johannesen wrote:
>> I believe this patch broke some things:
>>
>> 2002-04-27  Kurt Garloff <garloff@suse.de>
> [...]
>>
>> The effect is that the documented flag -finline-limit no longer works
>> to increase the default inlined size limit.  You also need to say
>> --param max-inline-insns-single; I think this is wrong.  Also, the 
>> patch
>> contains no documentation for the new parameters.
>
> What version of gcc are you looking at?
> 3.2 branch? HEAD?

The main trunk at gcc.gnu.org:/cvs/gcc.  If there's another version
somewhere that has these things fixed, it needs to be applied here,
I think.

I haven't tried it, but it looks like this change

>  read_integral_parameter (option_value, arg - 2,
>  			  MAX_INLINE_INSNS);
>
>  set_param_value ("max-inline-insns", val);
>  set_param_value ("max-inline-insns-single", val/2);

alters the meaning of -finline-limit from what it was previously (i.e.
the size limit for an individual function is half what it was).  I think
that needs some discussion, at a minimum.

(If anybody's lost, the patch was discussed in Aug 01 and later in
Apr/May 02; most of the discussion was on gcc, not gcc-patches.
The discussion does mention documentation changes but they don't
seem to have been checked in.  I'm having a little trouble tracking
this; Kurt Garloff isn't in MAINTAINERS, so I wonder if somebody
checked this in for him?)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug introduced by inlining patch
  2002-09-04 10:18 ` bug introduced by inlining patch Dale Johannesen
@ 2002-09-04 11:08   ` Kurt Garloff
  2002-09-04 11:35     ` Daniel Berlin
  2002-09-13  4:47     ` Gerald Pfeifer
  0 siblings, 2 replies; 6+ messages in thread
From: Kurt Garloff @ 2002-09-04 11:08 UTC (permalink / raw)
  To: Dale Johannesen; +Cc: gcc-patches, gcc

[-- Attachment #1: Type: text/plain, Size: 5437 bytes --]

Hi Dale,

On Wed, Sep 04, 2002 at 10:18:05AM -0700, Dale Johannesen wrote:
> On Wednesday, September 4, 2002, at 12:11 AM, Kurt Garloff wrote:
> The main trunk at gcc.gnu.org:/cvs/gcc.  If there's another version
> somewhere that has these things fixed, it needs to be applied here,
> I think.

Well, in my patch collection at
http://www.garloff.de/kurt/freesoft/gcc/
there are patches for documentation included.
But only in the newer patches, and I'm afriad an older one went into 
gcc.

> I haven't tried it, but it looks like this change
> 
> > read_integral_parameter (option_value, arg - 2,
> > 			  MAX_INLINE_INSNS);
> >
> > set_param_value ("max-inline-insns", val);
> > set_param_value ("max-inline-insns-single", val/2);
> 
> alters the meaning of -finline-limit from what it was previously (i.e.
> the size limit for an individual function is half what it was).  I think
> that needs some discussion, at a minimum.

In 2.95, it was defaulting to 10000, which means that no real-world 
function was prevented from being inlined. Which could cause excessive
memory and CPU consumption for compiling fairly easy code in C++.
The inliner in 2.95 did not account for recursive inlining.

3.0 had the tree inliner and was accounting for recursive inlining.
It limited your total inlining budget to 1000 statements. (The parameter
was still at 10000, but one statement was assumed to be aequivalent to 
10 RTL insns.) This resulted in extreme compile resource requirements
even for less pathological C++ code, just because the tree inliner has 
less  restriction than the old RTL one. The performance was still worse 
than 2.95.
In 3.0.1 this was changed to 600 (i.e. 60 statements) after 100 (10st) 
showed desastrous performance. With 600, compile times were OK, but 
runtime performance was poor.
http://gcc.gnu.org/ml/gcc/2001-07/msg01531.html

Only then, the problem was identified 
http://gcc.gnu.org/ml/gcc/2001-08/msg00996.html
to be caused by the wrong functions being inlined, as we allow single
functions to eat up the complete inlining budget, which calls for wrong
inlining decisions, of course. A little patch solved it.
http://gcc.gnu.org/ml/gcc/2001-08/msg01114.html

Since that moment, the notion of a different single function inlining 
limit as compared to the recursive inlining limit exists and I chose
half the complete by default. -finline-limit in 3.0 meant the complete
inlining budget which was identical to the single function inline size
and it could only keep one meaning after the patch was applied obviously.
The limit of 600 in 3.0.1 worked very well when used as complete budget 
with a single fn limit of 300, so I took that decision.

The patch was improved a lot afterwards by applying some smooth function
that restricted inlining as a smooth decreasing function of the already
inlined code, so more parameters were possible. I did lots of bench-
marking, but the improvement on the first patch was not as huge as the
first patch itself.

Anyway, I was expecting a lot of discussion when posting my patches on 
the gcc list, but I got less reactions than expected. Maybe gcc folks
were not so interested in C++ performance or were just happy with the
suggestions.

> (If anybody's lost, the patch was discussed in Aug 01 and later in
> Apr/May 02; most of the discussion was on gcc, not gcc-patches.
> The discussion does mention documentation changes but they don't
> seem to have been checked in.

That's a pity.

> I'm having a little trouble tracking
> this; Kurt Garloff isn't in MAINTAINERS, so I wonder if somebody
> checked this in for him?)

No, I'm not a gcc MAINTAINER. I lack the time to concentrate on gcc work
and I think there are others who are more qualified than me.

I believe the first version of the patch got checked in by Gerald. 
I don't know what other versions went in later and who did the check ins.
Maybe nobody told me, maybe I don't remember.

But, if this discussion is starting up again, I would like to propose
my patch v4 (from the above webpage) for inclusion.
I would appreciate a lot to get (positive and negative) criticism on it. 

If somebody would like to have the patches rediffed against current
CVS, I can provide those.

PS: Dale, if you're seriously starting to look at the inlining code
 and you happen to have a bit of time, please go ahead and improve 
 the heuristics. One of the most obvious optimizations is keeping track
 of the level (and the estimated iteration count) of loops we're called
 from. Take it into account when taking an inlining decision. It's easy
 yet very efficient.
 Next improvement would be to calculate the complete call graph and then
 take decisions on where to cut it based on inlining limits, loop depth
 and function size. There you would probably start with inlining from
 the leaf functions not the trunk. This is slightly more work, obviously.
 I guess the importance of inlinig heuristics has been underestimated
 previously. Maybe due to the fact that people were focused on C which
 does not make such heavy use of inlining as C++ (and other OOP lan-
 guages).

Best regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE Linux AG, Nuernberg, DE                            SCSI, Security

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug introduced by inlining patch
  2002-09-04 11:08   ` Kurt Garloff
@ 2002-09-04 11:35     ` Daniel Berlin
  2002-09-04 11:51       ` Kurt Garloff
  2002-09-13  4:47     ` Gerald Pfeifer
  1 sibling, 1 reply; 6+ messages in thread
From: Daniel Berlin @ 2002-09-04 11:35 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: Dale Johannesen, gcc-patches, gcc

>
>  Next improvement would be to calculate the complete call graph
>  and then
>  take decisions on where to cut it based on inlining limits, loop depth
>  and function size. There you would probably start with inlining from
>  the leaf functions not the trunk.

Once again, there is never a good reason to inline starting at the 
leaves when you have good information like loop depth, estimated call 
counts, etc.
Haven't we gone over this before?
No good compiler does it (look at Intel's compiler, Pro64, just about 
every compiler producing good code).
This is because if you know the functions with loops and loop depth, 
you want to inline functions *into* them, not take random leaf 
functions and try to figure out where they are called from, and whether 
they are called in loops, etc.
--Dan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug introduced by inlining patch
  2002-09-04 11:35     ` Daniel Berlin
@ 2002-09-04 11:51       ` Kurt Garloff
  2002-09-04 15:56         ` Daniel Berlin
  0 siblings, 1 reply; 6+ messages in thread
From: Kurt Garloff @ 2002-09-04 11:51 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: Dale Johannesen, gcc-patches, gcc

[-- Attachment #1: Type: text/plain, Size: 2246 bytes --]

Hi Daniel,

On Wed, Sep 04, 2002 at 02:35:44PM -0400, Daniel Berlin wrote:
> > Next improvement would be to calculate the complete call graph
> > and then
> > take decisions on where to cut it based on inlining limits, loop depth
> > and function size. There you would probably start with inlining from
> > the leaf functions not the trunk.
> 
> Once again, there is never a good reason to inline starting at the 
> leaves when you have good information like loop depth, estimated call 
> counts, etc.

Quite strong statement.

> Haven't we gone over this before?

Yes. Well, apparently neither of us succeeded in convincing the other.
Basically your argument is ...

> No good compiler does it (look at Intel's compiler, Pro64, just about 
> every compiler producing good code).

... which I do not consider very convincing, whereas mine is that you'll
have more calls to leaves than somewhere else in the call tree.
(You can construct cases where this is not the case of course, as there
 are conditionals, but in average it will be true.)

> This is because if you know the functions with loops and loop depth, 
> you want to inline functions *into* them, not take random leaf 
> functions and try to figure out where they are called from, and whether 
> they are called in loops, etc.

But if you gather the complete information of your call tree, your 
inlining decisions will be independent from where you start with it,
so this discussion is a noop. Rather an implementation detail.

Basically, you will know for every function in your call tree what
1size is and what the loop depth level is (and ideally have some estimate 
for the iteration counts, in terms of small vs medium vs large).
You'll use some algorithm to take the ideal decisions and it should
not matter at what functions it looks first.

In presence of very incomplete information, I'd still bet that starting
from the leaves is better.

PS: I would love to have enough time to work on this.

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE Linux AG, Nuernberg, DE                            SCSI, Security

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug introduced by inlining patch
  2002-09-04 11:51       ` Kurt Garloff
@ 2002-09-04 15:56         ` Daniel Berlin
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Berlin @ 2002-09-04 15:56 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: Dale Johannesen, gcc-patches, gcc


On Wednesday, September 4, 2002, at 02:51  PM, Kurt Garloff wrote:

> Hi Daniel,
>
> On Wed, Sep 04, 2002 at 02:35:44PM -0400, Daniel Berlin wrote:
>>> Next improvement would be to calculate the complete call graph
>>> and then
>>> take decisions on where to cut it based on inlining limits, loop 
>>> depth
>>> and function size. There you would probably start with inlining from
>>> the leaf functions not the trunk.
>>
>> Once again, there is never a good reason to inline starting at the
>> leaves when you have good information like loop depth, estimated call
>> counts, etc.
>
> Quite strong statement.
>
>> Haven't we gone over this before?
>
> Yes. Well, apparently neither of us succeeded in convincing the other.
> Basically your argument is ...
>
>> No good compiler does it (look at Intel's compiler, Pro64, just about
>> every compiler producing good code).
>
> ... which I do not consider very convincing, whereas mine is that 
> you'll
> have more calls to leaves than somewhere else in the call tree.
I'm curious as to two things:
1. Why you think this isn't convincing.  Do you honestly think that 
nobody has spent large amounts of time studying this very issue, doing 
benchmarking, etc, before choosing trunk->leaves over leaves->trunk?  
In other words, that everyone got it wrong, and that these compilers 
would do a lot better if they followed a bottom up approach.
2. Why you think on average, most calls are to leaves.  Most functions 
we want to inline tend to call other functions, not nothing, in my 
experience.  Look at profiles of gcc, fer instance.


> (You can construct cases where this is not the case of course, as there
>  are conditionals, but in average it will be true.)
>
You have some statistics to back this up?

>> This is because if you know the functions with loops and loop depth,
>> you want to inline functions *into* them, not take random leaf
>> functions and try to figure out where they are called from, and 
>> whether
>> they are called in loops, etc.
>
> But if you gather the complete information of your call tree, your
> inlining decisions will be independent from where you start with it,
> so this discussion is a noop. Rather an implementation detail.
Yes, i'm quite aware, however, starting at the trunk will let you do it 
in a single walk.
Starting at the leaves wouldn't (unless you've got a callee graph 
computed as well, which would require at least as much work as 
computing the call graph, and isn't as useful, since most things don't 
want to walk bottom-up)
>
> In presence of very incomplete information, I'd still bet that starting
> from the leaves is better.

Very incomplete, sure.
But it's easy to make sure we have enough info.

> PS: I would love to have enough time to work on this.
>
I'm happy to give you code that gives you the call graph construction, 
including information on which functions have loops, if it would speed 
it up.

> Regards,
> -- 
> Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
> GPG key: See mail header, key servers         Linux kernel development
> SuSE Linux AG, Nuernberg, DE                            SCSI, Security
> <mime-attachment>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug introduced by inlining patch
  2002-09-04 11:08   ` Kurt Garloff
  2002-09-04 11:35     ` Daniel Berlin
@ 2002-09-13  4:47     ` Gerald Pfeifer
  1 sibling, 0 replies; 6+ messages in thread
From: Gerald Pfeifer @ 2002-09-13  4:47 UTC (permalink / raw)
  To: Kurt Garloff; +Cc: Dale Johannesen, gcc-patches, gcc

On Wed, 4 Sep 2002, Kurt Garloff wrote:
> Well, in my patch collection at
> http://www.garloff.de/kurt/freesoft/gcc/
> there are patches for documentation included.
> But only in the newer patches, and I'm afriad an older one went into
> gcc.

I did some digging and found that

  2002-04-27  Kurt Garloff <garloff@suse.de>

        * tree-inline.c (inlinable_function_p): Improve heuristics
        by using a smoother function to cut down allowable inlinable size.
        * param.def: Add parameters max-inline-insns-single,
        max-inline-slope, min-inline-insns that determine the exact
        shape of the above function.
        * param.h: Likewise.

was installed 27-Apr-2002 by rth.

> I believe the first version of the patch got checked in by Gerald.

I believe there were further discussion during the month of May, among
rth, you and me (where I did some performance testing) and also Andreas
Jäger (concerning SPEC) but somehow it seems this thread then died down
without any further patch being applied...

> But, if this discussion is starting up again, I would like to propose
> my patch v4 (from the above webpage) for inclusion.
> I would appreciate a lot to get (positive and negative) criticism on it.

...therefore I'd like to suggest that you verify your latest patch
against current mainline and submit it to gcc-patches.

Gerald

> PS:[...] I guess the importance of inlinig heuristics has been underestimated
>  previously. Maybe due to the fact that people were focused on C which
>  does not make such heavy use of inlining as C++ (and other OOP lan-
>  guages).

Fully agreed.
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2002-09-13 11:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20020904071122.GA15631@nbkurt.etpnet.phys.tue.nl>
2002-09-04 10:18 ` bug introduced by inlining patch Dale Johannesen
2002-09-04 11:08   ` Kurt Garloff
2002-09-04 11:35     ` Daniel Berlin
2002-09-04 11:51       ` Kurt Garloff
2002-09-04 15:56         ` Daniel Berlin
2002-09-13  4:47     ` Gerald Pfeifer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).