From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 28444 invoked by alias); 29 Jul 2011 21:08:11 -0000
Received: (qmail 28435 invoked by uid 22791); 29 Jul 2011 21:08:10 -0000
X-SWARE-Spam-Status: No, hits=-2.8 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,RP_MATCHES_RCVD,SPF_HELO_PASS
X-Spam-Check-By: sourceware.org
Received: from smtp-out.google.com (HELO smtp-out.google.com) (74.125.121.67)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 29 Jul 2011 21:07:53 +0000
Received: from hpaq1.eem.corp.google.com (hpaq1.eem.corp.google.com [172.25.149.1])
    by smtp-out.google.com with ESMTP id p6TL7qCS014825
    for ; Fri, 29 Jul 2011 14:07:52 -0700
Received: from ywe9 (ywe9.prod.google.com [10.192.5.9])
    by hpaq1.eem.corp.google.com with ESMTP id p6TL7VBw009578
    (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT)
    for ; Fri, 29 Jul 2011 14:07:51 -0700
Received: by ywe9 with SMTP id 9so73304ywe.20
    for ; Fri, 29 Jul 2011 14:07:50 -0700 (PDT)
Received: by 10.150.103.1 with SMTP id a1mr67241ybc.244.1311973670586;
    Fri, 29 Jul 2011 14:07:50 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.150.103.1 with SMTP id a1mr67235ybc.244.1311973670444;
    Fri, 29 Jul 2011 14:07:50 -0700 (PDT)
Received: by 10.151.101.7 with HTTP; Fri, 29 Jul 2011 14:07:50 -0700 (PDT)
In-Reply-To: <4E330282.5000303@riverbed.com>
References: <4E32F44F.7090201@riverbed.com> <4E330282.5000303@riverbed.com>
Date: Fri, 29 Jul 2011 21:29:00 -0000
Message-ID:
Subject: Re: Performance degradation on g++ 4.6
From: Xinliang David Li
To: Oleg Smolsky
Cc: gcc@gcc.gnu.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-System-Of-Record: true
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2011-07/txt/msg00511.txt.bz2

On Fri, Jul 29, 2011 at 11:57 AM, Oleg Smolsky wrote:
> Hey David, here are a couple of
> answers and notes:
>    - I built the test suite with -O3 and cannot see anything else related
> to inlining that isn't already ON (except for -finline-limit=n which I do
> not know how to use)

Size estimation and inline heuristics differ between the two versions, so
it would not be surprising if they make different inlining decisions.
Profiling tools are your best friend here. If you don't have access to
any, the least you can do is build the program with the -pg option and use
the gprof tool to find where the two builds differ.

David

>    http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
>    - FTO looks like a very different kettle of fish, I'd prefer to leave
> it aside to limit the number of data points (at least for the initial
> investigation)
>    - I've just rerun the suite with -flto and there are no significant
> differences in performance
>
> What else is there?
>
> Oleg.
>
> On 2011/7/29 11:07, Xinliang David Li wrote:
>>
>> My guess is inlining differences. Try more aggressive inline
>> parameters to see if that helps. Also try FDO to see whether there is
>> any performance difference between the two versions. You will probably
>> need to do first-level triage and file bug reports.
>>
>> David
>>
>>
>> On Fri, Jul 29, 2011 at 10:56 AM, Oleg Smolsky
>>  wrote:
>>>
>>> Hi there, I have compiled and run a set of C++ benchmarks on a
>>> CentOS4/64 box using the following compilers:
>>>    a) g++4.1 that is available for this distro (GCC version 4.1.2
>>> 20071124 (Red Hat 4.1.2-42)
>>>    b) g++4.6 that I built (stock version 4.6.1)
>>>
>>> The machine has two Intel quad core processors in x86_64 mode
>>> (/proc/cpuinfo attached)
>>>
>>> Benchmarks were taken from this page:
>>>    http://stlab.adobe.com/performance/
>>>
>>> Results:
>>>    - some of these tests showed 20..30% performance degradation
>>>      (eg the second section in the simple_types_constant_folding test:
>>> 30s -> 44s)
>>>    - a few were quicker
>>>    - full reports are attached
>>>
>>> I would assume that performance of the generated code is closely
>>> monitored by the dev community and obvious blunders should not sneak
>>> in... However, my findings are reproducible with these synthetic
>>> benchmarks as well as production code at work. The latter shows
>>> approximately 25% degradation on CPU bound tests.
>>>
>>> Is there a trick to building the compiler or using a specific
>>> -mtune/-march flag for my CPU? I built the compiler with all the
>>> default options (it just has a distinct installation path):
>>>    ../gcc-%{version}/configure --prefix=/work/tools/gcc46
>>> --enable-languages=c,c++ --with-system-zlib
>>> --with-mpfr=/work/tools/mpfr24
>>> --with-gmp=/work/tools/gmp --with-mpc=/work/tools/mpc
>>>
>>> LD_LIBRARY_PATH=/work/tools/mpfr/lib24:/work/tools/gmp/lib:/work/tools/mpc/lib
>>>
>>> Are there any published benchmarks? I'd appreciate any advice or
>>> pointers.
>>>
>>> Thanks in advance,
>>> Oleg.
>>>
>
>