[Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression - rguenth at tat dot physik dot uni-tuebingen dot de


From: "rguenth at tat dot physik dot uni-tuebingen dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/18704] [4.0 Regression] Inlining limits cause 340% performance regression
Date: Tue, 07 Dec 2004 15:09:00 -0000	[thread overview]
Message-ID: <20041207150930.12188.qmail@sourceware.org> (raw)
In-Reply-To: <20041128181553.18704.rguenth@tat.physik.uni-tuebingen.de>


------- Additional Comments From rguenth at tat dot physik dot uni-tuebingen dot de  2004-12-07 15:09 -------
Subject: Re:  [4.0 Regression] Inlining limits
 cause 340% performance regression

On 7 Dec 2004, hubicka at ucw dot cz wrote:

> > Yes, it seems so.  Really nice improvement.  Though profiling is
> > sloooooow.  I guess you avoid doing any CFG changing transformation
> > for the profiling stage?  I.e. not even inline the simplest functions?
>
> I can inline but only after actually instrumenting the functions.  That
> should minimize the costs, but I also noticed that tramp3d is
> surprisingly a lot slower with profiling.
>
> > That would be the reason the Intel compiler is unusable with profiling
> > for me.  -fprofile-generate comes with a 50-fold increase in runtime!
>
> -fprofile-generate is actually a package of
> -fprofile-arcs/-fprofile-values + -fprofile-values-transformations
> It might be interesting to figure out whether -fprofile-arcs itself
> brings a similar slowdown.  The only reason I can think of for this
> is that after instrumenting we again inline a lot less, or we
> produce too many redundant counters.  Perhaps it would make sense to
> think about inlining functions reducing code size before instrumenting
> as we would do that anyway, but it will be tricky to get gcov output and
> -f* flags independence right then.

Hm.  There are a lot of counters - maybe it is possible to merge
the counters themselves?  About 30% of the lines in the resulting asm
of tramp3d-v3 are addl/adcl instructions updating the profiling
counters, where the total line count is just wc -l of a -S
-fverbose-asm compilation.  That is a lot.  And the additions come in
a cache-unfriendly sequence, too - dunno which optimization pass could
improve this, though.  Consider

static inline void foo() {}
void bar() { foo(); }

which for -O2 -fprofile-generate produces

bar:
        addl    $1, .LPBX1
        pushl   %ebp
        movl    %esp, %ebp
        adcl    $0, .LPBX1+4
        addl    $1, .LPBX1+16
        popl    %ebp
        adcl    $0, .LPBX1+20
        addl    $1, .LPBX1+8
        adcl    $0, .LPBX1+12
        ret

that should be

bar:
        addl    $1, .LPBX1
        pushl   %ebp
        movl    %esp, %ebp
        adcl    $0, .LPBX1+4
        addl    $1, .LPBX1+8
        adcl    $0, .LPBX1+12
        addl    $1, .LPBX1+16
        adcl    $0, .LPBX1+20
        popl    %ebp
        ret

And of course all three counters could be merged.  But that
would need a changed gcov file format somehow representing a
callgraph with merged edges.
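
To make the merging idea concrete, here is a minimal C sketch.  It is
not GCC's actual instrumentation or gcov data layout; the names
counters, merged_counter, bar_instrumented, bar_merged and
expand_merged are made up for illustration.  It only shows that when
three edges always execute together, one increment is enough and the
per-edge counts can be reconstructed afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the .LPBX1 array above: three 64-bit
   edge counters that are always bumped together.  */
static uint64_t counters[3];

/* Assumed merged layout: one counter standing in for all three edges.  */
static uint64_t merged_counter;

/* Roughly what the instrumented bar() does today: three separate
   64-bit increments (each an addl/adcl pair on a 32-bit target).  */
static void bar_instrumented(void)
{
    counters[0]++;
    counters[1]++;
    counters[2]++;
}

/* With merging, a single increment per call would be enough ...  */
static void bar_merged(void)
{
    merged_counter++;
}

/* ... and the per-edge counts are recovered when the data is read,
   since every merged edge has the same count by construction.  */
static void expand_merged(uint64_t out[3])
{
    assert(out != NULL);
    for (int i = 0; i < 3; i++)
        out[i] = merged_counter;
}
```

Calling bar_instrumented() and bar_merged() the same number of times
and then expand_merged() gives the same three values as the separate
counters, with a third of the memory traffic in the hot path.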

The Intel compiler is so much worse here because all the
counter adding is done thread-safe in a library (i.e. they
have an extra call for every edge and do not do any inlining).

> How does our profiling performance compare to ICC?

ICC is a lot worse.  ICC with -prof_gen causes a 10000-fold slowdown
(if the current snapshot of icc doesn't segfault compiling the tramp3d
testcase) - ICC is completely unusable for me.  So - GCC is great!

> > > It would be nice to experiment with this a little - in general the
> > > heuristics can be viewed as having three players.  There are the limits
> > > (specified via --param) that it must obey, there is the cost model
> > > > (estimated growth for inlining into all callers without profiling, and
> > > > the ratio of execute_count to estimated growth for inlining into one
> > > > call with profiling) and the bin packing algorithm optimizing the gains while
> > > obeying the limits.
> > >
> > > > With profiling, the cost model is pretty much realistic and it would
> > > > be nice to figure out how the performance behaves when the individual
> > > > limits are changed and why.  If you have some time for experimentation,
> > > > it would be very useful.  I am trying to do the same with SPEC and GCC
> > > > but I have difficulty playing with pooma or Gerald's application as I
> > > > have little understanding of what is going on there.  I will try it myself
> > > > next but any feedback can be very useful here.
> >
> > I can produce some numbers for the tramp testcase.
> Thanks!  Note that with changing the flags you should not need to
> re-profile now so you can save quite a lot of time.

Ah, that's indeed nice.
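
For reference while experimenting, the "three players" view quoted
above (limits, cost model, bin packing) can be sketched in C roughly
as follows.  This is an assumption-laden toy, not the GCC inliner:
struct candidate, by_benefit and pick_inlines are hypothetical names,
the only limit modelled is a single total-growth budget, and the
packing is a plain greedy pass ordered by execute_count divided by
estimated growth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical candidate call site; none of these names come from GCC.  */
struct candidate {
    long execute_count;     /* profiled number of executions of the call */
    int  estimated_growth;  /* estimated code growth if inlined, > 0 */
    int  inlined;           /* set to 1 if the call is chosen */
};

/* Cost model: order by execute_count / estimated_growth, best first.
   Cross-multiplied to avoid division; assumes positive growth.  */
static int by_benefit(const void *pa, const void *pb)
{
    const struct candidate *a = pa, *b = pb;
    long long lhs = (long long)a->execute_count * b->estimated_growth;
    long long rhs = (long long)b->execute_count * a->estimated_growth;
    return (lhs < rhs) - (lhs > rhs);
}

/* Greedy packing: take the most beneficial candidates for as long as
   the total growth stays within growth_limit (the stand-in for the
   --param limits).  Returns the total growth actually used.  */
static int pick_inlines(struct candidate *c, size_t n, int growth_limit)
{
    int used = 0;

    assert(growth_limit >= 0);
    qsort(c, n, sizeof *c, by_benefit);
    for (size_t i = 0; i < n; i++) {
        if (used + c[i].estimated_growth <= growth_limit) {
            c[i].inlined = 1;
            used += c[i].estimated_growth;
        } else {
            c[i].inlined = 0;
        }
    }
    return used;
}
```

Changing growth_limit and watching which candidates flip is the toy
analogue of varying the individual --param limits.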

Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18704
