Hi Gerald, Andreas,

On Thu, Apr 25, 2002 at 06:36:10PM +0200, Gerald Pfeifer wrote:
> On Wed, 24 Apr 2002, Kurt Garloff wrote:
> > It would be nice if this patch
> > http://www.garloff.de/kurt/freesoft/gcc/gcc310-inline-func-acct-v1.diff
> > would be tested by more people and integrated into 3.1.
>
> This second patch (partially) fixes a very bad regression we've been
> having since GCC 3.0; build time and binary size seem to be fine, though
> we seem to degrade slightly for some of the other benchmarks.
>
> I'd really like to see what this does to SPEC -- Andreas, could you give
> it a try?

I created a new inline accounting patch, which should prevent -O3
(-finline-functions) from delivering worse performance than -O2 for code
that already has the most important functions marked inline.

As it turned out, limiting the RTL inlining (integrate.c) for functions
selected by -finline-functions does not work well. For the tree inliner
the limit is very useful, because the tree inliner cuts off inlining
after some repeated inlining in order to limit compile-time resource
requirements. Maybe some more experiments are needed here.

The patch is at
http://www.garloff.de/kurt/freesoft/gcc/gcc310-inline-func-acct-v1.2.diff
and has been diffed against a 3.1-20020422 with my inline heuristics
patch v3.6 applied:
http://www.garloff.de/kurt/freesoft/gcc/g++310-rec-inline-heuristics-v3.6.diff

Here are my benchmark results.
(Tests performed on 2xpIII-1GHz, Linux-2.4.18, glibc-2.2.5;
I left max-inline-slope and min-inline-insns alone.)

          max inl                 libbench_double    libbench_cplx_double
g++       (+single) opt   build   run       binary   run       binary
(times u+s in s)
3.1       600       -O2   27.52   16.00      82579   18.97      95909
+3.6      600       -O2   29.02   15.96      82431   18.90+     95780
+3.6+1.2  600       -O2   29.17   16.02      82431   18.87+     95780
3.1       2500      -O2   48.32   15.97      86017   18.96     111912
+3.6      2500+1250 -O2   48.12   15.98      86049   19.01     111944
+3.6+1.2  2500+1250 -O2   48.50   15.98      86049   18.99     111944
+3.6      2500+ 300 -O2   37.33   15.99      83395   18.88+    105127
+3.6+1.2  2500+ 300 -O2   37.41   15.94      83395   18.88+    105127
3.1       600       -O3   23.88   16.65-     82667   18.98      94805
+3.6      600       -O3   28.67   16.65-     84809   19.04      96097
+3.6+1.2  600       -O3   30.40   16.62-     99900   19.02     112262
3.1       2500      -O3  136.88   15.78+    137523   19.08     165986
+3.6      2500+1250 -O3  145.06   15.80+    139550   19.21-    168431
+3.6+1.2  2500+1250 -O3   64.15   15.82+     98138   18.92     128517
+3.6      2500+ 300 -O3   38.07   16.64-     85845   19.04     108405
+3.6+1.2  2500+ 300 -O3   37.46   16.70-     94113   19.00     117715

This chart does give some unexpected results.

It seems the cplx_double benchmark is almost unaffected by the patch and
by the increased inlining. All times are around 19.0. For -O3 with
2500+1250 (max-inline-insns + max-inline-insns-single) and the v3.6
patch, we are clearly over the top. The v1.2 patch fixes that: compile
time is reduced to a reasonable number again and performance is good.
Some results are around 18.9 (v3.6-600-O2, v3.6-2500-300-O2,
v3.6+v1.2-2500-300, v3.6+v1.2-2500-1250-O3).

Looking at the double results, we have three groups: 15.8, 16.0, and
16.6. The worst results are for a low inline limit (600) with -O3,
independent of the patches applied. With v3.6 (with or without v1.2),
-O3, a small single-fn limit and a large overall one (2500-300), we get
the same bad score. The best results are for a lot of inlining (2500
resp. 2500-1250) and -O3. Among those, build times and binary sizes are
quite different: with both patches applied, only half the compile time
is needed and a 1.4 times smaller binary is produced.
The binary sizes are quite surprising.
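
As a quick sanity check on the "half the compile time" and "1.4 times
smaller" remarks, here is a minimal Python sketch that recomputes the
ratios from the two -O3 2500+1250 rows of the table (v3.6 alone vs.
v3.6+v1.2). The names ROWS and ratio are made up for illustration, and
the table does not state the unit of the binary sizes, so the sketch
treats them as plain numbers.

```python
# Figures copied from the -O3, 2500+1250 rows of the table above.
# Key names are illustrative only; sizes are unitless as in the table.
ROWS = {
    "+3.6": {
        "build": 145.06,          # build time, u+s in seconds
        "double_binary": 139550,  # libbench_double binary size
        "cplx_binary": 168431,    # libbench_cplx_double binary size
    },
    "+3.6+1.2": {
        "build": 64.15,
        "double_binary": 98138,
        "cplx_binary": 128517,
    },
}


def ratio(column):
    """Return how many times larger the v3.6-only figure is
    compared with the v3.6+v1.2 figure for the given column."""
    return ROWS["+3.6"][column] / ROWS["+3.6+1.2"][column]


if __name__ == "__main__":
    for column in ("build", "double_binary", "cplx_binary"):
        print(f"{column}: {ratio(column):.2f}x")
```

A value above 1 means the v3.6-only build is slower or larger than the
one with both patches applied.
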
The v1.2 patch does not do anything for -O2, as expected. For -O3 it
does limit the tree inlining. Funnily enough, for small
max-inline-insns-single values this leads to _larger_ binaries!
Apparently the smaller chunks later get inlined by the RTL inliner
(integrate), leading to more inlining. For larger single-fn inlining
limits, the effects of the v1.2 patch are closer to what can be
expected.

I'd be curious what other people get.

Regards,
-- 
Kurt Garloff                                             Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE Linux AG, Nuernberg, DE                            SCSI, Security