From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (qmail 26786 invoked by alias); 7 Dec 2004 14:50:00 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: 
List-Post: 
List-Help: 
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 26707 invoked by alias); 7 Dec 2004 14:49:48 -0000
Date: Tue, 07 Dec 2004 14:50:00 -0000
Message-ID: <20041207144948.26706.qmail@sourceware.org>
From: "hubicka at ucw dot cz" 
To: gcc-bugs@gcc.gnu.org
In-Reply-To: <20041128181553.18704.rguenth@tat.physik.uni-tuebingen.de>
References: <20041128181553.18704.rguenth@tat.physik.uni-tuebingen.de>
Reply-To: gcc-bugzilla@gcc.gnu.org
Subject: [Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
X-Bugzilla-Reason: CC
X-SW-Source: 2004-12/txt/msg00991.txt.bz2
List-Id: 

------- Additional Comments From hubicka at ucw dot cz  2004-12-07 14:49 -------
Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression

> ------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  2004-12-07 14:35 -------
> Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression
>
> On 6 Dec 2004, hubicka at ucw dot cz wrote:
>
> > Looks like I get a 4-fold speedup on tree profiling with profiling compared
> > to tree profiling on mainline, which is equivalent to the speedup you are
> > seeing for the leafify patch.  That sounds pretty promising (so the new
> > heuristics can get the leafify idea without the hint from the user and
> > without hitting the code growth problems).
>
> Yes, it seems so.  Really nice improvement.  Though profiling is
> sloooooow.  I guess you avoid doing any CFG-changing transformation
> for the profiling stage?  I.e. not even inline the simplest functions?

I can inline, but only after actually instrumenting the functions.  That
should minimize the costs, but I also noticed that tramp3d is surprisingly
a lot slower with profiling.

> That would be the reason the Intel compiler is unusable with profiling
> for me.  -fprofile-generate comes with a 50-fold increase in runtime!

-fprofile-generate is actually a package of -fprofile-arcs/-fprofile-values
plus the -fprofile-values transformations.  It might be interesting to figure
out whether -fprofile-arcs by itself brings a similar slowdown.  The only
reasons I can think of for this are that after instrumenting we again inline
a lot less, or that we produce too many redundant counters.  Perhaps it would
make sense to think about inlining the size-reducing functions before
instrumenting, as we would do that anyway, but it will be tricky to get the
gcov output and -f* flag independence right then.  How does our profiling
performance compare to ICC's?
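As an illustrative aside, the small C sketch below (with made-up names; it is
not GCC's generated code) shows roughly what arc instrumentation amounts to:
a 64-bit counter incremented along control-flow edges.  In reality GCC only
instruments a subset of edges and reconstructs the rest, and the counters live
in gcov data rather than a user-visible array, but the point stands that every
counted edge pays for a load, an add and a store, so redundant counters or
lost inlining after instrumentation translate directly into runtime.

/* Illustrative sketch only -- not GCC's generated code.  Roughly what
   per-edge arc instrumentation amounts to: one 64-bit counter bumped
   on each instrumented control-flow edge.  Names are made up.  */
#include <stdint.h>

static int64_t arc_counters[3];       /* one slot per instrumented edge */

static int
is_even (int x)
{
  arc_counters[0]++;                  /* edge: function entry */
  if ((x & 1) == 0)
    {
      arc_counters[1]++;              /* edge: branch taken */
      return 1;
    }
  arc_counters[2]++;                  /* edge: branch not taken */
  return 0;
}

int
main (void)
{
  return is_even (4) + is_even (7) - 1;   /* exercise both branch edges */
}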
> > It would be nice to experiment with this a little - in general the
> > heuristics can be viewed as having three players.  There are the limits
> > (specified via --param) that it must obey, there is the cost model
> > (the estimated growth for inlining into all callers without profiling,
> > and the ratio of execution count to estimated growth for inlining into
> > one call site with profiling), and the bin-packing algorithm optimizing
> > the gains while obeying the limits.
> >
> > With profiling in, the cost model is pretty much realistic and it would
> > be nice to figure out how the performance behaves when the individual
> > limits are changed and why.  If you have some time for experimentation,
> > it would be very useful.  I am trying to do the same with SPEC and GCC,
> > but I have difficulty playing with pooma or Gerald's application as I
> > have little understanding of what is going on there.  I will try it
> > myself next, but any feedback would be very useful here.
>
> I can produce some numbers for the tramp testcase.

Thanks!  Note that when changing the flags you should not need to re-profile
now, so you can save quite a lot of time.

> > My plan is to try to understand the limits first and then try to get the
> > cost model better without profiling, as it is a bit too clumsy to do both
> > at once.
>
> Do you have some written overview of the cost model?

Not really, but it is simple for the moment.  To estimate the size of a
function I use a simple walk of the function body, counting most nodes as 1,
divisions, calls and similarly expensive nodes as 10, and NOPs and constants
as 0.  When profiling, the priority of an inlining edge is the number of
executions divided by the estimated growth (size of the callee minus 10);
when not profiling, it is the overall growth after inlining into all callers
(i.e. I count the number of call sites the function can be inlined into and
multiply that by the size of the callee minus 10).  You can see the inlining
decisions with -fdump-ipa-inline.  (A small illustrative sketch of these
formulas is appended at the end of this message.)

Honza

> Richard.
>
> --
> Richard Guenther
> WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
>
>
> --
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
>
> ------- You are receiving this mail because: -------
> You are on the CC list for the bug, or are watching someone who is.


-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
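Appended sketch: to make the cost model described above concrete, here is a
minimal C restatement of the size-based growth estimate and the two edge
priorities.  The names (CALL_INSN_COST, edge_priority_profiled and so on) and
the numbers in main are invented for illustration and do not correspond to
GCC's internal ipa-inline implementation; only the arithmetic follows the
message.

/* Minimal sketch of the cost model described above.  Size estimation
   itself (walking the body and counting most nodes as 1, divisions and
   calls as 10, NOPs and constants as 0) is not shown.  */
#include <stdio.h>

#define CALL_INSN_COST 10   /* size credited back for the call that goes away */

/* Estimated growth from inlining a callee into one call site.  */
static int
estimated_growth (int callee_size)
{
  return callee_size - CALL_INSN_COST;
}

/* With profiling: priority of an inlining edge is its execution count
   divided by the estimated growth.  */
static double
edge_priority_profiled (long long execution_count, int callee_size)
{
  return (double) execution_count / estimated_growth (callee_size);
}

/* Without profiling: badness is the overall growth after inlining into
   all call sites, i.e. the number of inlinable call sites times the
   per-site growth.  */
static int
overall_growth_unprofiled (int num_call_sites, int callee_size)
{
  return num_call_sites * estimated_growth (callee_size);
}

int
main (void)
{
  /* Purely hypothetical numbers, just to exercise the formulas.  */
  printf ("profiled priority:  %f\n", edge_priority_profiled (1000000, 40));
  printf ("unprofiled growth:  %d\n", overall_growth_unprofiled (8, 40));
  return 0;
}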