public inbox for gcc-bugs@sourceware.org
From: "rguenth at tat dot physik dot uni-tuebingen dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
Date: Tue, 07 Dec 2004 15:09:00 -0000
Message-ID: <20041207150930.12188.qmail@sourceware.org>
In-Reply-To: <20041128181553.18704.rguenth@tat.physik.uni-tuebingen.de>

------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  2004-12-07 15:09 -------
Subject: Re: [4.0 Regression] Inlining limits cause 340% performance regression

On 7 Dec 2004, hubicka at ucw dot cz wrote:

> > Yes, it seems so.  Really nice improvement.  Though profiling is
> > sloooooow.  I guess you avoid doing any CFG changing transformation
> > for the profiling stage?  I.e. not even inline the simplest functions?
>
> I can inline but only after actually instrumenting the functions.  That
> should minimize the costs, but I also noticed that tramp3d is
> surprisingly a lot slower with profiling.
>
> > That would be the reason the Intel compiler is unusable with profiling
> > for me.  -fprofile-generate comes with a 50-fold increase in runtime!
>
> -fprofile-generate is actually a package of
> -fprofile-arcs/-fprofile-values + -fprofile-values-transformations
> It might be interesting to figure out whether -fprofile-arcs itself
> brings a similar slowdown.  The only reason I can think of why this can
> happen is that after instrumenting we again inline a lot less, or we
> produce too many redundant counters.  Perhaps it would make sense to
> think about inlining functions reducing code size before instrumenting
> as we would do that anyway, but it will be tricky to get gcov output and
> -f* flags independence right then.

Hm.  There are a lot of counters - maybe it is possible to merge the
counters themselves?  The resulting asm of tramp3d-v3 consists of 30%
addl/adcl lines for adding the profiling counts - where the total
number of lines is just wc -l of a -S -fverbose-asm compilation.
That is quite a lot.  And the additions are in a cache-unfriendly
sequence, too - dunno which optimization pass could improve this though.
Consider

    static inline void foo() {}
    void bar() { foo(); }

which for -O2 -fprofile-generate produces

    bar:
            addl    $1, .LPBX1
            pushl   %ebp
            movl    %esp, %ebp
            adcl    $0, .LPBX1+4
            addl    $1, .LPBX1+16
            popl    %ebp
            adcl    $0, .LPBX1+20
            addl    $1, .LPBX1+8
            adcl    $0, .LPBX1+12
            ret

that should be

    bar:
            addl    $1, .LPBX1
            pushl   %ebp
            movl    %esp, %ebp
            adcl    $0, .LPBX1+4
            addl    $1, .LPBX1+8
            adcl    $0, .LPBX1+12
            addl    $1, .LPBX1+16
            adcl    $0, .LPBX1+20
            popl    %ebp
            ret

And of course all three counters could be merged.  But that would
need a changed gcov file format somehow representing a callgraph with
merged edges.

The Intel compiler is so much worse here because all the counter
adding is done thread-safe in a library (i.e. they have an extra call
for every edge and do not do any inlining).

> How does our profiling performance compare to ICC?

ICC is a lot worse.  ICC with -prof_gen causes a 10000-fold slowdown
(if the current snapshot of icc doesn't segfault compiling the tramp3d
testcase) - ICC is completely unusable for me.  So - GCC is great!

> > > It would be nice to experiment with this a little - in general the
> > > heuristics can be viewed as having three players.  There are the limits
> > > (specified via --param) that it must obey, there is the cost model
> > > (estimated growth for inlining into all callees without profiling and
> > > the execute_count to estimated growth for inlining to one call with
> > > profiling) and the bin packing algorithm optimizing the gains while
> > > obeying the limits.
> > >
> > > With profiling in, the cost model is pretty much realistic and it would
> > > be nice to figure out how the performance behaves when the individual
> > > limits are changed and why.  If you have some time for experimentation,
> > > it would be very useful.  I am trying to do the same with SPEC and GCC
> > > but I have difficulty playing with pooma or Gerald's application as I
> > > have little understanding of what is going on there.  I will try it
> > > myself next but any feedback can be very useful here.
> >
> > I can produce some numbers for the tramp testcase.
>
> Thanks!  Note that with changing the flags you should not need to
> re-profile now so you can save quite a lot of time.

Ah, that's indeed nice.

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
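
The addl/adcl pairs in the listing can be read as 64-bit counter increments done in two 32-bit halves. The following is only a rough C model of that, not GCC's actual instrumentation code: the names counter64, counter_increment, counter_value and bar_instrumented are made up for illustration, and it assumes each edge counter is a 64-bit value stored low word first, as the .LPBX1 and .LPBX1+4 offsets suggest.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit profile counter on a 32-bit target,
   kept as two 32-bit halves (compare .LPBX1 and .LPBX1+4). */
struct counter64 {
    uint32_t lo;
    uint32_t hi;
};

/* Mimics the "addl $1" / "adcl $0" pair: add one to the low word, then
   add the carry (set only when the low word wraps) to the high word. */
static void counter_increment(struct counter64 *c)
{
    uint32_t old_lo = c->lo;
    c->lo = old_lo + 1u;            /* addl $1, lo */
    c->hi += (c->lo < old_lo);      /* adcl $0, hi */
}

/* Reassembles the full 64-bit count from the two halves. */
static uint64_t counter_value(const struct counter64 *c)
{
    return ((uint64_t)c->hi << 32) | c->lo;
}

/* Models the instrumented bar(): three counters are bumped on every
   call, here in ascending address order. */
static void bar_instrumented(struct counter64 counters[3])
{
    for (int i = 0; i < 3; ++i)
        counter_increment(&counters[i]);
}
```

In this model the three counters always hold the same value after any number of calls, which is the sense in which they could be merged into one, provided the profile format could express that.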
Thread overview: 21+ messages
2004-11-28 18:16 [Bug tree-optimization/18704] New: " rguenth at tat dot physik dot uni-tuebingen dot de
2004-11-28 18:20 ` [Bug tree-optimization/18704] [4.0 Regression] " pinskia at gcc dot gnu dot org
2004-11-28 18:22 ` pinskia at gcc dot gnu dot org
2004-11-29 11:05 ` rguenth at tat dot physik dot uni-tuebingen dot de
2004-11-29 11:36 ` giovannibajo at libero dot it
2004-11-29 12:10 ` rguenth at tat dot physik dot uni-tuebingen dot de
2004-11-29 14:07 ` hubicka at ucw dot cz
2004-12-06  5:20 ` pinskia at gcc dot gnu dot org
2004-12-06  9:53 ` rguenth at tat dot physik dot uni-tuebingen dot de
2004-12-06 12:33 ` rguenth at tat dot physik dot uni-tuebingen dot de
2004-12-06 12:45 ` hubicka at ucw dot cz
2004-12-06 13:18 ` rguenth at tat dot physik dot uni-tuebingen dot de
2004-12-06 13:40 ` hubicka at ucw dot cz
2004-12-06 14:31 ` rguenth at tat dot physik dot uni-tuebingen dot de
2004-12-06 15:03 ` hubicka at ucw dot cz
2004-12-07 14:35 ` rguenth at tat dot physik dot uni-tuebingen dot de
2004-12-07 14:50 ` hubicka at ucw dot cz
2004-12-07 14:52 ` hubicka at ucw dot cz
2004-12-07 15:09 ` rguenth at tat dot physik dot uni-tuebingen dot de [this message]
2004-12-07 15:36 ` rguenth at tat dot physik dot uni-tuebingen dot de
2004-12-07 17:50 ` hubicka at ucw dot cz