From: "rguenth at tat dot physik dot uni-tuebingen dot de"
To: gcc-bugs@gcc.gnu.org
Date: Mon, 06 Dec 2004 13:18:00 -0000
Subject: [Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression

------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  2004-12-06 13:18 -------
Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression

On 6 Dec 2004, hubicka at ucw dot cz wrote:

> The cfg inliner per se is not too interesting.  What matters here is the
> code size estimation and profitability estimation.  I am playing with
> this now and trying to get profile-based inlining working.

Yes, I guess the cfg inliner and some early dead code removal passes
should improve code size metrics for stuff like

  template <class X>
  struct Foo {
    enum { val = X::val };
    void foo() { if (val) ... else ... }
  };

with val being const.

> For -n10 and tramp3d.cc I need 2m14s on mainline, 1m31s on the current
> tree-profiling.  With my new implementation I need 0m27s with profile
> feedback and 2m53s without.  I wonder what makes the new heuristics work
> worse without profiling, but just increasing the inline-unit-growth very
> slightly (to 155) I get 0m42s.  This might be just a little instability in

Note that inline-unit-growth is 50 by default, so 155 is not a slight
increase.

> the order of inlining decisions affecting this.  I would be curious how
> those results compare to leafify and whether the 0m27s is not caused by
> misoptimization.

You can check for misoptimization by looking at the final output, i.e.
the rh, vx, vy and vz sums should be nearly zero; the T sum will increase
with the number of iterations.

With mainline, -O2 -fpeel-loops -march=pentium4 -ffast-math
-D__NO_MATH_INLINES (we still need explicit -fpeel-loops to unroll
for (i=0;i<3;++i) a[i]=0;), I need 0m17s for -n 10 with leafification
turned on; with it turned off, runtime increases to 0m31s with
--param inline-unit-growth=175.

> Unless I observe otherwise (on SPEC with intermodule), I will
> apply my current patch and try to improve the profitability analysis
> without profiling incrementally.  Ideally we ought to build an estimated
> profile and use it, but that needs some work, so for the moment I guess I
> will try to experiment with making loop depth available to the cgraph
> code.

Yes, loops could be "auto-leafified", but it will be difficult to
statically check if that is worthwhile.

Richard.

--
Richard Guenther
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
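
For readers outside the thread, a minimal, self-contained sketch of the
pattern Richard describes above (the TrueTag/FalseTag names are
illustrative only and do not come from tramp3d): once foo() is inlined
and val is a compile-time constant, the untaken branch is dead, so an
early dead code removal pass should make the inlined body much smaller
than a naive statement count would suggest.

  // Hypothetical illustration of the pattern discussed in the comment above.
  struct TrueTag  { enum { val = 1 }; };
  struct FalseTag { enum { val = 0 }; };

  template <class X>
  struct Foo {
    enum { val = X::val };          // constant after instantiation
    int foo(int a, int b) {
      if (val)
        return a + b;               // only branch left for Foo<TrueTag>
      else
        return a - b;               // only branch left for Foo<FalseTag>
    }
  };

  int use_true(int a, int b)  { Foo<TrueTag>  f; return f.foo(a, b); }
  int use_false(int a, int b) { Foo<FalseTag> f; return f.foo(a, b); }

After inlining and constant propagation, each use_* function reduces to a
single arithmetic operation, which is the kind of shrinkage the size
estimation would ideally anticipate.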