From: Michel LESPINASSE
To: gcc list
Date: Sat, 20 Apr 2002 18:13:00 -0000
Subject: GCC performance regression - up to 20% ?
Message-ID: <20020421005718.GA16378@zoy.org>

Hi,

I have downloaded the latest 3.1 snapshot (20020415) and run some performance tests. So far I've been impressed by the FP performance, but kinda disappointed by the integer performance.

The benchmarks I've run are two libraries I maintain, libmpeg2 and liba52. These are used by several open-source DVD players, and are quite CPU intensive (especially libmpeg2). So here are my results, using gcc 2.95 as a reference:

First the good news:

liba52 (mostly FP intensive workload) on athlon tbird 950, using -mcpu=pentiumpro:
    gcc-3.0 is between 4.5% and 6.5% faster than 2.95.4 depending on streams
    gcc-3.1 snapshot is between 8% and 9.5% faster than 2.95.4

From these measurements, 3.1 has very nice performance, very close to Intel's icc. Great work! Also, using -march=athlon-tbird and generating SSE code, I can get a few extra % of performance.

Now the bad news: for libmpeg2, which is an integer-only workload, I get a 10% to 20% performance regression between 2.95.4 and 3.1... 3.0 was already slower than 2.95.4, but 3.1 seems to be worse, for this workload at least.

libmpeg2, on athlon tbird 950, with mmx optimizations:
    gcc-3.0 is about 2% slower than 2.95.4
    gcc-3.1 snapshot is about 10% slower than 2.95.4

libmpeg2, on athlon tbird 950, using pure C code:
    gcc-3.0 is about 4.5% slower than 2.95.4
    gcc-3.1 snapshot is about 5.5% slower than 2.95.4

libmpeg2, on celeron 366, with mmx optimizations:
    gcc-3.0 is about 4% slower than 2.95.4
    gcc-3.1 snapshot is about 20.5% slower than 2.95.4 (!!!!)

These results are all very repeatable. The celeron 366 results are the most worrying, as this processor already has borderline performance for decoding mpeg2 streams.

Is there a known performance regression in current GCCs (say, do they get lower SPECint scores?) or is it only with my code? Also, is there anything I could do in my code to improve performance with newer gcc versions?

One thing I noticed is that the 3.1 snapshot does less inlining than 3.0 or 2.95. This probably accounts for some of the slowdown I see when using mmx optimizations, as my mmx routines are built from a few small helper routines that I really expect to get inlined. Is there any way I can get back control over that, so that gcc honours the inline keyword? I have not managed to do this so far.

BTW, the two libraries I mentioned can be found at
    http://libmpeg2.sourceforge.net/
    http://liba52.sourceforge.net/

Puzzled,

--
Michel "Walken" LESPINASSE
Is this the best that god can do ? Then I'm not impressed.
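
On the inlining question above, here is a minimal sketch of one possible approach, assuming the compiler in use supports GCC's always_inline function attribute (whether the 20020415 snapshot does is not verified here). The names FORCE_INLINE, clip_u8 and add_residual_row are hypothetical and are not taken from libmpeg2 or liba52. They only illustrate a small helper that the caller relies on being inlined.

```c
#include <stdint.h>

/* Hypothetical macro name, assumed for this sketch only.
 * With GCC, always_inline asks the compiler to inline the function
 * regardless of its usual inlining heuristics. Other compilers fall
 * back to a plain "static inline" hint. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Hypothetical small helper: saturate an int to the range 0..255. */
FORCE_INLINE uint8_t clip_u8(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical hot loop: add a signed residual to a row of pixels.
 * The call to clip_u8() is the one we expect to be inlined. */
static void add_residual_row(uint8_t *dst, const int16_t *residual, int n)
{
    int i;

    for (i = 0; i < n; i++)
        dst[i] = clip_u8(dst[i] + residual[i]);
}
```

Two GCC command-line options may also be relevant: -Winline warns when a function declared inline is not inlined, and -finline-limit=n adjusts the size limit used by the inliner. Whether either one restores the 2.95 behaviour for this particular code is untested.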