From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 16462 invoked by alias); 7 Dec 2004 14:35:33 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 16311 invoked by alias); 7 Dec 2004 14:35:25 -0000
Date: Tue, 07 Dec 2004 14:35:00 -0000
Message-ID: <20041207143525.16309.qmail@sourceware.org>
From: "rguenth at tat dot physik dot uni-tuebingen dot de"
To: gcc-bugs@gcc.gnu.org
In-Reply-To: <20041128181553.18704.rguenth@tat.physik.uni-tuebingen.de>
References: <20041128181553.18704.rguenth@tat.physik.uni-tuebingen.de>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
X-Bugzilla-Reason: CC
X-SW-Source: 2004-12/txt/msg00989.txt.bz2
List-Id:

------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  2004-12-07 14:35 -------
Subject: Re:  [4.0 Regression] Inlining limits cause 340% performance regression

On 6 Dec 2004, hubicka at ucw dot cz wrote:

> Looks like I get 4-fold speedup on tree profiling with profiling compared
> to tree profiling on mainline that is equivalent to speedup you are
> seeing for leafify patch.  That sounds pretty promising (so the new
> heuristics can get the leafify idea without the hint from user and
> hitting the code growth problems).

Yes, it seems so.  Really nice improvement.  Though profiling is
sloooooow.  I guess you avoid doing any CFG-changing transformations
during the profiling stage?  I.e. you do not even inline the simplest
functions?  That would be the same reason the Intel compiler is unusable
with profiling for me.  -fprofile-generate comes with a 50-fold increase
in runtime!

> It would be nice to experiment with this a little - in general the
> heuristics can be viewed as having three players.
> There are the limits (specified via --param) that it must obey, there
> is the cost model (estimated growth for inlining into all callees
> without profiling and the execute_count to estimated growth for
> inlining to one call with profiling) and the bin packing algorithm
> optimizing the gains while obeying the limits.
>
> With profiling, the cost model is pretty much realistic and it would
> be nice to figure out how the performance behaves when the individual
> limits are changed and why.  If you have some time for experimentation,
> it would be very useful.  I am trying to do the same with SPEC and GCC
> but I have difficulty playing with pooma or Gerald's application as I
> have little understanding of what is going on there.  I will try it
> myself next but any feedback can be very useful here.

I can produce some numbers for the tramp testcase.

> My plan is to try to understand the limits first and then try to get
> the cost model better without profiling as it is a bit too clumsy to
> do both at once.

Do you have some written overview of the cost model?

Richard.

--
Richard Guenther
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/


-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
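
To make the "three players" from the quoted text concrete, here is a
minimal sketch of how limits, a cost model and a packing step could fit
together.  It is not GCC's implementation: the names CallEdge,
priority, choose_inlines and max_growth_percent are hypothetical, a
plain greedy pass stands in for the "bin packing algorithm", and
reading "execute_count to estimated growth" as a ratio is an
assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEdge:
    """One call site that could be inlined (hypothetical model, not GCC's)."""
    caller: str
    callee: str
    growth: int             # estimated code growth if this call is inlined
    execute_count: int = 0  # profile count; stays 0 when no profile exists


def priority(edge: CallEdge, use_profile: bool) -> float:
    """Cost model: a higher value means the edge is inlined earlier."""
    if use_profile:
        # Assumed reading of the mail: executions gained per unit of growth.
        return edge.execute_count / max(edge.growth, 1)
    # Without a profile only the size estimate is known: smallest growth first.
    return -edge.growth


def choose_inlines(edges, unit_size, max_growth_percent, use_profile):
    """Greedy stand-in for the packing step: take edges in priority order
    as long as the total growth stays within the limit."""
    budget = unit_size * max_growth_percent // 100  # the --param style limit
    chosen, spent = [], 0
    for edge in sorted(edges, key=lambda e: priority(e, use_profile),
                       reverse=True):
        # Edges that do not grow the code are always accepted.
        if edge.growth <= 0 or spent + edge.growth <= budget:
            chosen.append(edge)
            spent += edge.growth
    return chosen, spent


edges = [
    CallEdge("main", "hot", growth=40, execute_count=1000),
    CallEdge("main", "cold", growth=10, execute_count=1),
    CallEdge("main", "warm", growth=30, execute_count=300),
]
with_profile, _ = choose_inlines(edges, 100, 50, use_profile=True)
without_profile, _ = choose_inlines(edges, 100, 50, use_profile=False)
```

With the made-up numbers above and a growth budget of 50, the profiled
run picks the "hot" edge first, while the unprofiled run spends the
budget on the two smaller edges and never inlines "hot".  That is the
kind of effect that changing individual limits would expose.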