Hi Gerald, Andreas,

On Thu, Apr 25, 2002 at 06:36:10PM +0200, Gerald Pfeifer wrote:
> On Wed, 24 Apr 2002, Kurt Garloff wrote:
> > It would be nice if this patch
> > http://www.garloff.de/kurt/freesoft/gcc/gcc310-inline-func-acct-v1.diff
> > would be tested by more people and integrated into 3.1.
> 
> This second patch (partially) fixes a very bad regression we've been
> having since GCC 3.0; build time and binary size seem to be fine, though
> we seem to degrade slightly for some of the other benchmarks.
> 
> I'd really like to see what this does to SPEC -- Andreas, could you give
> it a try?

I created a new inline accounting patch, which should prevent -O3
(-finline-functions) from delivering worse performance than -O2 for code
that already has the most important functions marked inline.
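
To illustrate the kind of code meant here, a minimal sketch (not taken from
libbench; the names Vec3, dot and norm2 are hypothetical). The hot function
is already marked inline by hand, so it is an inlining candidate at -O2,
while the unmarked helper only becomes a candidate for automatic inlining
once -finline-functions (enabled by -O3) is in effect:

```cpp
// Hypothetical example, not part of the benchmark.
struct Vec3 {
    double x, y, z;
};

// Hot function, explicitly marked inline: a candidate for inlining at -O2.
inline double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Not marked inline: -finline-functions (implied by -O3) asks the
// compiler to consider inlining this one automatically as well.
double norm2(const Vec3& v) {
    return dot(v, v);
}
```

Whether either call really gets inlined still depends on the inline limits
discussed below.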

As it turned out, limiting the RTL inliner (integrate.c) for functions
selected by -finline-functions is not a good idea. For the tree inliner the
limit is very useful, because the tree inliner cuts off inlining after some
repeated inlining in order to limit compile-time resource requirements.
Maybe some more experiments are needed here.

The patch is at
http://www.garloff.de/kurt/freesoft/gcc/gcc310-inline-func-acct-v1.2.diff
and has been diffed against a 3.1-20020422 with my inline heuristics patch
v3.6 applied.
http://www.garloff.de/kurt/freesoft/gcc/g++310-rec-inline-heuristics-v3.6.diff

Here are my benchmark results.
(Tests performed on 2xpIII-1GHz, Linux-2.4.18, glibc-2.2.5; I left
 max-inline-slope and min-inline-insns alone.)

        max max                libbench_double libbench_cplx_double
 g++    inl+inl        build     run   binary      run   binary
            single    (times u+s in s)
3.1       600 -O2      27.52    16.00   82579     18.97   95909
+3.6      600 -O2      29.02    15.96   82431     18.90+  95780
+3.6+1.2  600 -O2      29.17    16.02   82431     18.87+  95780

3.1      2500 -O2      48.32    15.97   86017     18.96  111912
+3.6     2500+1250-O2  48.12    15.98   86049     19.01  111944
+3.6+1.2 2500+1250-O2  48.50    15.98   86049     18.99  111944    
+3.6     2500+ 300-O2  37.33    15.99   83395     18.88+ 105127
+3.6+1.2 2500+ 300-O2  37.41    15.94   83395     18.88+ 105127

3.1       600 -O3      23.88    16.65-  82667     18.98   94805
+3.6      600 -O3      28.67    16.65-  84809     19.04   96097
+3.6+1.2  600 -O3      30.40    16.62-  99900     19.02  112262

3.1      2500 -O3     136.88    15.78+ 137523     19.08  165986
+3.6     2500+1250-O3 145.06    15.80+ 139550     19.21- 168431   
+3.6+1.2 2500+1250-O3  64.15    15.82+  98138     18.92  128517
+3.6     2500+ 300-O3  38.07    16.64-  85845     19.04  108405
+3.6+1.2 2500+ 300-O3  37.46    16.70-  94113     19.00  117715

This chart does give some unexpected results.

It seems the cplx_double benchmark is almost unaffected by the patch and by
the increased inlining. All times are around 19.0. For -O3 with 2500+1250 and
the v3.6 patch (max-inline-insns + max-inline-insns-single), we are clearly
over the top. The v1.2 patch fixes that: compile time is reduced to a
reasonable number again and performance is good. Some results are
around 18.9 (v3.6-600-O2, v3.6-2500-300-O2, v3.6+v1.2-2500-300-O2,
v3.6+v1.2-2500-1250-O3).

Looking at the double results, we have three groups: 15.8, 16.0, and 16.6.
The worst results are for a low inline limit (600) with -O3, independent of
the patches applied. With v3.6 (with or without v1.2), the combination of
-O3, a small single-fn limit and a large overall one (2500-300) also gives
the bad score.
The best results are for a lot of inlining (2500 or 2500-1250) and -O3.
Among those, build times and binary sizes differ considerably: with both
patches applied, only half the compile time is needed and the binary is
1.4 times smaller.
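
The two factors can be checked against the table. A small sketch, assuming
the comparison is plain 3.1 versus +3.6+1.2 (2500+1250) at -O3 and that the
size meant is the libbench_double binary; the helper name ratio is only for
illustration:

```cpp
// How many times larger `baseline` is than `patched`.
double ratio(double baseline, double patched) {
    return baseline / patched;
}

// Values copied from the -O3 rows of the table.
const double build_31      = 136.88;    // 3.1, 2500: build time (u+s) in s
const double build_patched =  64.15;    // +3.6+1.2, 2500+1250
const double size_31       = 137523.0;  // 3.1, 2500: libbench_double binary
const double size_patched  =  98138.0;  // +3.6+1.2, 2500+1250

// Build time shrinks by a bit more than a factor of 2,
// binary size by about a factor of 1.4.
const double build_factor = ratio(build_31, build_patched);
const double size_factor  = ratio(size_31, size_patched);
```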

The binary sizes are quite surprising. As expected, the v1.2 patch does not
do anything for -O2. For -O3 it does limit the tree inlining. Funnily enough,
for small max-inline-insns-single values this leads to _larger_ binaries!
Apparently the smaller chunks later get inlined by the RTL inliner
(integrate), leading to more inlining overall.
For larger single-fn inlining limits, the effects of the v1.2 patch are
closer to what can be expected.
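
To put numbers on this, here is a small sketch that computes the relative
size change of the libbench_double binary at -O3 when v1.2 is added on top
of v3.6. The values are copied from the table; the helper name
percent_change is only for illustration:

```cpp
// Relative change in percent when going from `before` to `after`.
double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}

// libbench_double binary sizes at -O3: +3.6 alone versus +3.6+1.2.
const double delta_600       = percent_change( 84809.0, 99900.0);  // 600
const double delta_2500_300  = percent_change( 85845.0, 94113.0);  // 2500+300
const double delta_2500_1250 = percent_change(139550.0, 98138.0);  // 2500+1250
```

The 600 and 2500+300 rows grow (by roughly 18% and 10%), while the
2500+1250 row shrinks by roughly 30%.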

I'd be curious what other people get.

Regards,
--
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE Linux AG, Nuernberg, DE                            SCSI, Security