From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-50387-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 29621 invoked by alias); 21 Apr 2002 11:32:40 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 29606 invoked from network); 21 Apr 2002 11:32:38 -0000
Received: from unknown (HELO atrey.karlin.mff.cuni.cz) (195.113.31.123)
  by sources.redhat.com with SMTP; 21 Apr 2002 11:32:38 -0000
Received: by atrey.karlin.mff.cuni.cz (Postfix, from userid 4018)
	id 62CBE4F969; Sun, 21 Apr 2002 13:32:38 +0200 (CEST)
Date: Sun, 21 Apr 2002 05:46:00 -0000
From: Jan Hubicka <jh@suse.cz>
To: Michel LESPINASSE <walken@zoy.org>
Cc: gcc list <gcc@gcc.gnu.org>
Subject: Re: GCC performance regression - up to 20% ?
Message-ID: <20020421113238.GC16602@atrey.karlin.mff.cuni.cz>
References: <20020421005718.GA16378@zoy.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20020421005718.GA16378@zoy.org>
User-Agent: Mutt/1.3.27i
X-SW-Source: 2002-04/txt/msg01039.txt.bz2

> 
> libmpeg2, on athlon tbird 950, with mmx optimizations:
> gcc-3.0 is about 2% slower than 2.95.4
> gcc-3.1 snapshot is about 10% slower than 2.95.4
> 
> libmpeg2, on athlon tbird 950, using pure C code:
> gcc-3.0 is about 4.5% slower than 2.95.4
> gcc-3.1 snapshot is about 5.5% slower than 2.95.4
> 
> libmpeg2, on celeron 366, with mmx optimizations:
> gcc-3.0 is about 4% slower than 2.95.4
> gcc-3.1 snapshot is about 20.5% slower than 2.95.4 (!!!!)
> 
> These results are all very repeatable. the celeron 366 results are the
> most worrying, as this processor already has borderline performance
> for decoding mpeg2 streams.

Are you able to figure out what exactly makes the code slow? Having a
self-contained testcase will definitely help a lot.  What flags do you use?

I would be quite curious whether using profile feedback helps
(see the documentation of -fprofile-arcs and -fbranch-probabilities).
You may just have some badly predicted branch in the innermost
loop.
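
As a rough sketch of that profile feedback cycle (the file name hot.c
and the function count_above are made up for illustration, they are not
taken from libmpeg2), the two builds could look like the comment at the
top of this file. The loop contains a data-dependent branch of the kind
the recorded arc counts are meant to describe:

```c
/*
 * Assumed build sequence, using the two flags named above:
 *
 *   gcc -O2 -fprofile-arcs hot.c -o hot          (instrumented build)
 *   ./hot                                        (run on typical input)
 *   gcc -O2 -fbranch-probabilities hot.c -o hot  (rebuild with the profile)
 */
#include <stddef.h>

/* Hypothetical hot loop: count the samples that exceed a limit. */
size_t count_above(const int *v, size_t n, int limit)
{
    size_t hits = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* How often this branch is taken depends on the input data. */
        if (v[i] > limit)
            hits++;
    }
    return hits;
}
```

The training run should use input that resembles the real workload,
otherwise the recorded branch probabilities may mislead the compiler.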

The problem with such code is usually that it is tuned to avoid problems in
one particular version of gcc, so even when a new version is faster overall,
it is slower in such places.  We've hit a similar case with the Athlon matrix
multiplication code, and such problems are usually easy to fix on the gcc side.
> 
> Is there a known performance regression in current GCCs (say, do they
> get lower SPECint scores ?) or is it only with my code ?

No, the SPECint numbers are quite consistently higher than in any previous
release. See http://www.suse.de/~aj/SPEC
In fact no previous release had such a huge gap in performance.
> 
> Also, is there anything I could do in my code to enhance performance
> with newer gcc versions ? One thing I noticed is that 3.1 snapshot
> produces less inlining than 3.0 or 2.95. This probably accounts for
> some of the slowdown I see when using mmx optimizations, as my mmx
> routines are written using a few routines that I really expect to get
> inlined. Is there any way I can get back control about that, so that
> gcc honours the inline keyword ? I have not managed to do this either.
There is a parameter to increase the inline threshold, as well as the
always_inline function attribute. See the documentation.
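
A minimal sketch of the attribute, assuming a compiler that supports
always_inline (the helper sat_add_u8 and the caller add_block are
hypothetical, not code from libmpeg2):

```c
#include <stdint.h>

/*
 * Hypothetical small helper that must be inlined into its callers.
 * always_inline asks GCC to inline it regardless of the usual
 * size heuristics.
 */
static inline __attribute__((always_inline)) uint8_t
sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;

    /* Clamp to the largest value a byte can hold. */
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Hypothetical caller: saturating add of src into dst, n bytes. */
void add_block(uint8_t *dst, const uint8_t *src, int n)
{
    int i;

    for (i = 0; i < n; i++)
        dst[i] = sat_add_u8(dst[i], src[i]);
}
```

For the threshold, -finline-limit=N is probably the option to try
first; check the documentation of the release you use for the exact
spelling and default.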

Honza
> 
> BTW, these two apps I mentionned can be found at
> http://libmpeg2.sourceforge.net/
> http://liba52.sourceforge.net/
> 
> Puzzled,
> 
> -- 
> Michel "Walken" LESPINASSE
> Is this the best that god can do ? Then I'm not impressed.