Re: GCC performance regression - up to 20% ?

From: Andreas Jaeger <aj@suse.de>
To: Michel LESPINASSE <walken@zoy.org>
Cc: Jan Hubicka <jh@suse.cz>, gcc list <gcc@gcc.gnu.org>
Subject: Re: GCC performance regression - up to 20% ?
Date: Mon, 22 Apr 2002 00:17:00 -0000
Message-ID: <ho1yd85hts.fsf@gee.suse.de>
In-Reply-To: <20020422061937.GA27171@zoy.org> (Michel LESPINASSE's message of "Sun, 21 Apr 2002 23:19:37 -0700")

Michel LESPINASSE <walken@zoy.org> writes:

> Hi,
>
> I spent the afternoon looking at this issue and trying different
> flags. I think I got some interesting results, though I still don't get
> the full picture.
>
> The first thing I tweaked was to make the inlining work the way I
> wanted. As I understand it, in gcc 2.95 and 3.0, when compiling with
> usual options (-O3), the behaviour was that functions declared inline
> were always inlined, and functions not declared inline were inlined
> only if gcc thought they were simple enough. While in gcc 3.1
> snapshot, it looks like the inline keyword only makes the function an
> inline candidate. I will argue later on why I think the old behaviour
> (inline being always honoured) is more useful (basically, function
> specialization is harder to do if inline is not honoured). In the mean
> time, I worked around the issue by using an absurdly high value
> -finline-limit=6000 (I tried 2000 first which was not sufficient), and
> then I also added -fno-inline-functions so that I don't get everything
> inlined when I don't ask for it.
>
> With the custom inlining, gcc-3.1 snapshot is about 4% slower than
> gcc-2.95, on my athlon tbird. This is progress, since without the
> custom inlining, the slowdown was 10%.
>
> Then I tried to figure out where the slowdown is, using gprof. And
> this is where things get really interesting: gprof tells me that the
> code compiled with 3.1 is faster, but 'time' tells me that the user
> time spent executing that code is higher with 3.1 than with 2.95. I'm
> not sure what to make of this, but I think this might give you some
> clues, so I'll describe it in more detail. I'm not sure what the
> overhead is, but it seems to be right in gprof's blind spot.

Using -pg changes the binary and the results might not be the same as
without the flag.  You could try to use the performance counters of
your Athlon using e.g. oprofile (http://oprofile.sourceforge.net).

> I have to describe my gprof methodology first. 'normal' gprof
> (i.e. compiling every file with -pg) seems to have a high overhead for
> me, plus it conflicts with -fomit-frame-pointer which I usually use. So
> I tend to use what I'll call 'light' gprof, which is as follows:
> everything is compiled with -g -O3 -fomit-frame-pointer -mcpu=pentiumpro
> except main.c which is compiled with -g -O3 -p and is not cpu intensive.
> 'light' gprof can not help me figure out call graphs, but it should
> normally be good enough to obtain a flat profile.
>
> When using gcc 2.95, 'light' gprof works great. In a flat profile, the
> last number in the 'cumulative seconds' column always matches (within
> 0.1s) the user time as reported by the 'time' command. I think I can
> trust the flat profile information.
>
> When using gcc 3.0 or 3.1 snapshot though, there is a several-seconds
> gap between 'cumulative seconds' and 'user time'. I don't understand
> what happens during this time - could it be that 3.x has a higher
> overhead for function calls, and that this overhead is not accounted
> for in gprof ? I don't understand what happens here, but I get the
> feeling that this might be key to the slowdown I observe.
>
> To summarize the gprof thing, I observe
> 3.1 gprof time < 2.95 gprof time = 2.95 user time < 3.1 user time
>
> If I use 'normal' gprof (every file gets -pg and I remove the
> -fomit-frame-pointer), that inflates the user times a lot, but I still
> get 3.1 gprof time < 2.95 gprof time and 2.95 user time < 3.1 user time.
>
> Also to answer Jan's question, I did try to use -fbranch-probabilities
> and it helped, with this option 3.1 snapshot is about 0.5% slower (in
> user time) than 2.95, instead of 4% slower without. It's still
> frustrating though, because gprof pretends 3.1 is about 10% faster
> than 2.95, so there may be a lot of untapped performance.
>
>
> OK, so this is all I can say for now. I hope someone will know gprof
> internals better than I do and understand why gprof returns times that
> are smaller in 3.1 snapshot vs. 2.95, while time does not agree.
>
> I thought I should add a few comments about the structure of libmpeg2
> code, as I suppose it is somewhat unusual. The time-consuming loop is
> pretty big, and includes several function calls. Each loop execution
> decodes a full mpeg2 macroblock (=256 pixels); during this decoding
> it calls the IDCT (cosine transform) function up to 6 times, and some
> MC (motion compensation) functions up to 12 times. All these calls are
> indirect, using function pointers. These called functions are all
> kinda short, executing in about 0.4 microseconds per call on average.
> Yes, that's a lot of function calls, and I suspect SPECint does not do
> as many, which could maybe explain why it's not seeing the same
> performance regression that I see ? It would seem consistent with the
> gprof blind spot thing, too.

It would really help a lot if you could try to write some small
program that behaves the same way (performance wise) as this routine
in libmpeg2.
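
As a rough illustration only, such a test program might look something
like the sketch below. It is not taken from libmpeg2: fake_idct,
fake_mc, decode_macroblocks and time_macroblocks are hypothetical
names, and the loop bodies are placeholders that merely imitate the
call pattern described above (up to 6 IDCT calls and 12 MC calls per
macroblock, all through function pointers).

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the short IDCT and MC routines; the real
 * libmpeg2 functions do different (and more) work. */
typedef void (*block_fn)(uint8_t *dst, const uint8_t *src);

static void fake_idct(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < 64; i++)
        dst[i] = (uint8_t)(src[i] + 1);
}

static void fake_mc(uint8_t *dst, const uint8_t *src)
{
    /* Rounded average, similar in spirit to what pavgb computes. */
    for (int i = 0; i < 64; i++)
        dst[i] = (uint8_t)((dst[i] + src[i] + 1) >> 1);
}

/* The pointers are volatile so the compiler has to reload them and
 * cannot turn the indirect calls into direct (or inlined) ones. */
static block_fn volatile idct_ptr = fake_idct;
static block_fn volatile mc_ptr = fake_mc;

/* One iteration imitates one macroblock: 6 IDCT calls, 12 MC calls. */
unsigned long decode_macroblocks(unsigned long count)
{
    uint8_t src[64], dst[64];
    unsigned long checksum = 0;

    for (int i = 0; i < 64; i++) {
        src[i] = (uint8_t)i;
        dst[i] = 0;
    }
    for (unsigned long n = 0; n < count; n++) {
        for (int k = 0; k < 6; k++)
            idct_ptr(dst, src);
        for (int k = 0; k < 12; k++)
            mc_ptr(dst, src);
        checksum += dst[n % 64];   /* keep the results live */
    }
    return checksum;
}

/* Returns the CPU time in seconds spent in decode_macroblocks(). */
double time_macroblocks(unsigned long count, unsigned long *checksum)
{
    clock_t start = clock();
    *checksum = decode_macroblocks(count);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_macroblocks() from main() with a large count, building
the same file with each compiler, and comparing the result with what
'time' reports would show whether the indirect-call pattern alone
reproduces the gap.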

>
> Finally, I thought I should include a small explanation about what I
> do with inlines and why I'd like to have a gcc option so that the
> inline keyword always inlines a function, instead of just making that
> function an inline candidate. One example of that would be in the MC
> functions mentioned above. To get acceptable speed, these are written
> using mmx assembly operations. Actually they even have several
> implementations, one in straight c for compatibility, one in straight
> mmx, one using sse integer instructions (as present on PIII and
> athlon), and one using 3dnow instructions. It's easy to select at init
> time which implementation to use, since the MC functions are only
> called thru function pointers. The only difference between the sse
> version and the 3dnow version, is that the sse version uses the pavgb
> instruction, while the 3dnow version uses pavgusb. These instructions
> have different opcodes but the same behaviour, so I wrote this using
> an inline function:

>
> static inline void MC_generic (...., int cpu)
> {
> 	.... do stuff ....
> 	if (cpu == CPU_3DNOW)
> 		pavgusb (....);
> 	else
> 		pavgb (....);
> 	.... do more stuff ....
> }
>
> void MC_3dnow (.....)
> {
> 	MC_generic (....., CPU_3DNOW);
> }
>
> void MC_sse (.....)
> {
> 	MC_generic (....., CPU_SSE);
> }
>
> In gcc-2.95 and gcc-3.0, this construct works out nicely since
> MC_generic gets inlined, and then the compiler figures out that the
> cpu test in it is a constant and generates clean code for both the
> 3dnow and the sse function. This kind of specialisation is sometimes
> very convenient, but it requires that the inline keyword does an
> unconditional inlining, not subject to compiler heuristics. I would
> really love to see a gcc option to make inlining unconditional when
> using the inline keyword, and have gcc use its heuristics when there
> is no such keyword.

You might want to use the always_inline function attribute for your
inline functions in 3.1.  It makes the inlining unconditional.
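
A minimal sketch of how that could look for the construct quoted
above. The argument lists, the avg_sse/avg_3dnow helpers and the
CPU_* values are made up for illustration; the helpers are plain C
placeholders for the pavgb and pavgusb wrappers, which the original
mail does not show.

```c
/* Hypothetical CPU selectors; the real values are not shown. */
enum { CPU_SSE, CPU_3DNOW };

/* Plain C placeholders for the pavgb / pavgusb wrappers. Both compute
 * a rounded average, which is what the two instructions have in
 * common. */
static inline int avg_sse(int a, int b)   { return (a + b + 1) >> 1; }
static inline int avg_3dnow(int a, int b) { return (a + b + 1) >> 1; }

/* always_inline asks gcc to inline this function at every call site
 * regardless of its size heuristics, so each caller gets its own copy
 * in which 'cpu' is a compile-time constant. */
static inline __attribute__((always_inline))
void MC_generic(unsigned char *dst, const unsigned char *src, int n, int cpu)
{
    for (int i = 0; i < n; i++) {
        if (cpu == CPU_3DNOW)
            dst[i] = (unsigned char)avg_3dnow(dst[i], src[i]);
        else
            dst[i] = (unsigned char)avg_sse(dst[i], src[i]);
    }
}

/* The two specialised entry points, selectable via function pointer. */
void MC_3dnow(unsigned char *dst, const unsigned char *src, int n)
{
    MC_generic(dst, src, n, CPU_3DNOW);
}

void MC_sse(unsigned char *dst, const unsigned char *src, int n)
{
    MC_generic(dst, src, n, CPU_SSE);
}
```

With the attribute in place, -finline-limit should no longer need to
be raised just to get MC_generic inlined.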


>
> This email is longer than I thought it would be, thanks a lot for
> those who're still reading me :)

;-)

Andreas
-- 
 Andreas Jaeger
  SuSE Labs aj@suse.de
   private aj@arthur.inka.de
    http://www.suse.de/~aj
