From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 12284 invoked by alias); 6 Dec 2004 15:03:46 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 12238 invoked by alias); 6 Dec 2004 15:03:39 -0000
Date: Mon, 06 Dec 2004 15:03:00 -0000
Message-ID: <20041206150339.12237.qmail@sourceware.org>
From: "hubicka at ucw dot cz"
To: gcc-bugs@gcc.gnu.org
In-Reply-To: <20041128181553.18704.rguenth@tat.physik.uni-tuebingen.de>
References: <20041128181553.18704.rguenth@tat.physik.uni-tuebingen.de>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
X-Bugzilla-Reason: CC
X-SW-Source: 2004-12/txt/msg00845.txt.bz2
List-Id:

------- Additional Comments From hubicka at ucw dot cz 2004-12-06 15:03 -------
Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression

>
> ------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de 2004-12-06 14:31 -------
> Subject: Re: [4.0 Regression] Inlining limits
>  cause 340% performance regression
>
> On 6 Dec 2004, hubicka at ucw dot cz wrote:
>
> > > > the order of inlining decisions affecting this. I would be curious how
> > > > those results compare to leafify and whether the 0m27s is not caused by
> > > > missoptimization.
> > >
> > > You can check for misoptimization by looking at the final output.
> > > I.e. the rh,vx,vy and vz sums should be nearly zero, the T sum
> > > will increase with the number of iterations.
> > >
> > > With mainline, -O2 -fpeel-loops -march=pentium4 -ffast-math
> > > -D__NO_MATH_INLINES (we still need explicit -fpeel-loops for
> > > unrolling for (i=0;i<3;++i) a[i]=0;), I need 0m17s for -n 10 with
> > > leafification turned on, with it turned off, runtime increases
> > > to 0m31s with --param inline-unit-growth=175.
> >
> > I compiled with -O3, would be possible for you to measure how much
> > speedup you get on mainline with -O3 and -O3+lefify? That would
> > probably allow me relate those numbers somehow.
>
> 0m23s for -O3+leafify, 1m54s for -O3, 0m35s for -O3 --param
> inline-unit-growth=150.

Looks like I get a 4-fold speedup on tree-profiling with profiling
compared to tree-profiling on mainline, which is equivalent to the
speedup you are seeing for the leafify patch. That sounds pretty
promising (so the new heuristics can get the leafify idea without the
hint from the user and without hitting the code growth problems).

It would be nice to experiment with this a little. In general the
heuristics can be viewed as having three players:

  - the limits (specified via --param) that they must obey,
  - the cost model (estimated growth for inlining into all callers
    without profiling, and the ratio of execute_count to estimated
    growth for inlining into a single call with profiling),
  - the bin packing algorithm optimizing the gains while obeying the
    limits.

With profiling, the cost model is pretty much realistic, and it would
be nice to figure out how the performance behaves when the individual
limits are changed, and why. If you have some time for experimentation,
it would be very useful. I am trying to do the same with SPEC and GCC,
but I have difficulty playing with POOMA or Gerald's application, as I
have little understanding of what is going on there. I will try it
myself next, but any feedback can be very useful here.

My plan is to try to understand the limits first and then try to get
the cost model better without profiling, as it is a bit too clumsy to
do both at once.

Honza
>
> Richard.
>
> --
>
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.


--


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
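
Illustration (not part of the original message): the three players
described in the message (limits, cost model, packing step) can be
sketched roughly as below. This is a toy model, not GCC's inliner. The
names CallSite, priority, pick_inlines and unit_growth_percent are all
hypothetical, the growth numbers are invented, and the packing step is
a plain greedy pass standing in for whatever the real heuristics do.
Only the two ranking keys follow the message: total growth over all
calls of a function when there is no profile, and execute_count per
unit of growth for a single call when there is one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSite:
    caller: str
    callee: str
    growth: int         # assumed size increase if this one call is inlined
    execute_count: int  # assumed profile count for this call


def priority(site, sites, use_profile):
    """Cost model: a larger value means 'inline this call earlier'."""
    if use_profile:
        # With a profile: benefit (execution count) per unit of growth
        # for this single call.
        return site.execute_count / max(site.growth, 1)
    # Without a profile: rank by the total growth of inlining the callee
    # at every one of its call sites (smaller total growth first).
    total = sum(s.growth for s in sites if s.callee == site.callee)
    return -total


def pick_inlines(sites, unit_size, unit_growth_percent, use_profile):
    """Greedy packing: take calls in priority order while the limit holds."""
    budget = unit_size * unit_growth_percent // 100  # the limit to obey
    chosen, used = [], 0
    ranked = sorted(sites, key=lambda s: priority(s, sites, use_profile),
                    reverse=True)
    for site in ranked:
        if used + site.growth <= budget:
            chosen.append(site)
            used += site.growth
    return chosen, used


# Invented example data: one hot call, one warm call, two cold calls.
SITES = [
    CallSite("main", "hot", growth=40, execute_count=1000),
    CallSite("main", "cold", growth=10, execute_count=1),
    CallSite("init", "cold", growth=10, execute_count=1),
    CallSite("main", "warm", growth=30, execute_count=300),
]

with_profile, _ = pick_inlines(SITES, 100, 50, use_profile=True)
without_profile, _ = pick_inlines(SITES, 100, 50, use_profile=False)
print([s.callee for s in with_profile])
print([s.callee for s in without_profile])
```

With the same limit, the two cost models pick different calls in this
toy data: the profile-driven ranking spends most of the budget on the
hot call, while the size-only ranking fills the budget with small
calls and never reaches it. Rerunning with other values of
unit_growth_percent is a cheap way to see how a limit and a cost model
interact, which is the kind of experiment the message asks for.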