Date: Wed, 07 Oct 2009 17:21:00 -0000
From: Jan Hubicka
To: Richard Guenther
Cc: Vladimir Makarov, Jan Hubicka, Toon Moene, gcc mailing list
Subject: Re: LTO: Speedup -- some preliminary SPEC2000 results
Message-ID: <20091007165118.GA6516@kam.mff.cuni.cz>
References: <4ACBA42B.8040107@moene.org> <20091006214108.GB9046@atrey.karlin.mff.cuni.cz> <20091006220120.GD9046@atrey.karlin.mff.cuni.cz> <4ACCA4F7.6010307@redhat.com> <84fc9c000910070739i64a7f9ddy77d48906c521075e@mail.gmail.com>
In-Reply-To: <84fc9c000910070739i64a7f9ddy77d48906c521075e@mail.gmail.com>

Hi,
thanks for the report! It is actually more promising than I expected. A
while ago I did similar tests with whole-program and --combine, and we
didn't get very consistent performance results (I also saw code size
reductions). I guess the geometric average will go down for SPECint once
vpr/gcc/perlbmk/gap work, since pretty much everything comes from EON's
intermodule inlining.

I've just committed the patch to fix the ipa-sra problem, which will
hopefully allow clean SPEC runs.
The ipa-sra bug changed the calling convention of externally visible
functions. It should not affect size too much.

> >
> > With latest Jan's fixes, The results (for -O3 vs -O3 -flto
> > -fwhole-program) are
> >
> > x86:
> >  o Int2000:
> >   - LTO crashes the compiler on vortex.  LTO generates
> >     wrong code for vpr, gcc, perlbmk, and gap.
> >   - Compiler is 1.85 times slower with LTO
> >   - Average code size is almost 6% smaller:
> >
> >        4.615%          44287          46331 164.gzip
> >       -3.145%         144101         139569 175.vpr
> >        0.261%        1566926        1571009 176.gcc
> >      -12.118%          12279          10791 181.mcf
> >       11.130%         209956         233324 186.crafty
> >      -29.735%         155358         109162 197.parser
> >      -23.075%         497347         382585 252.eon
> >        8.904%         552163         601327 253.perlbmk
> >        1.516%         503006         510630 254.gap
> >      -20.891%          47465          37549 256.bzip2
> >       -3.047%         198365         192321 300.twolf
> >       Average = -5.96236%
> >
> >    - Performance is improved almost by 4%
> >
> >      164.gzip    1668   1629  -2.33813%
> >      181.mcf     5011   5020   0.17960%
> >      186.crafty  2268   2277   0.39682%
> >      197.parser  1928   1925  -0.15560%

There is a simple opportunity for improvement in parser with whole-program
optimization. The hash table size is held in a static variable, and it is
a constant prime (once it has been initialized at benchmark startup).
Being able to constant-propagate it would noticeably help here.

> >      252.eon     2477   2950  19.0957%
> >      256.bzip2   1894   1956   3.2735%
> >      300.twolf   2806   3026   7.84034%
> >      GeoMean     2416   2509   3.84934%
> >
> >
> > LTO is quite promising.
> > Actually it is in line or even better with
> > improvement got from other compilers (pathscale is the most convenient
> > compiler to check lto separately: lto gave there upto 5% improvement
> > on SPECFP2000 and 3.5% for SPECInt2000 making compiler about 50%
> > slower and generated code size upto 30% bigger).  LTO in GCC actually

I must say that I expect the geometric average to go down after we fix the
broken benchmarks, but I would be happy to be wrong.

I wonder how pathscale manages to make the code size so much bigger with
whole-program assumptions. Isn't this a comparison of single-file
compilation against the pathscale equivalent of -flto alone (i.e. not
-flto -fwhole-program)?

The results also imply that on large units we probably still do quite
badly. Doing more cloning and less inlining should help here, I would
guess.

Do you happen to have a comparison of -flto to -flto -fwhole-program?

Honza
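
Note: the remark above that most of the speedup comes from EON can be
checked against the quoted scores. What follows is only a minimal sketch,
not part of the original measurements. It assumes the two numeric columns
are the -O3 and the -O3 -flto -fwhole-program scores, in that order, and
the names SCORES and geomean_gain_percent are made up for illustration.

```python
import math

# SPEC CINT2000 scores as quoted in this message:
# (benchmark, -O3 score, -O3 -flto -fwhole-program score)
SCORES = [
    ("164.gzip", 1668, 1629),
    ("181.mcf", 5011, 5020),
    ("186.crafty", 2268, 2277),
    ("197.parser", 1928, 1925),
    ("252.eon", 2477, 2950),
    ("256.bzip2", 1894, 1956),
    ("300.twolf", 2806, 3026),
]


def geomean_gain_percent(rows):
    """Geometric mean of the per-benchmark LTO/base ratios, as a percent change."""
    log_sum = sum(math.log(lto / base) for _, base, lto in rows)
    return (math.exp(log_sum / len(rows)) - 1.0) * 100.0


# Gain over all benchmarks that ran, and the same with 252.eon left out.
all_gain = geomean_gain_percent(SCORES)
no_eon_gain = geomean_gain_percent([r for r in SCORES if r[0] != "252.eon"])

print(f"all benchmarks:  {all_gain:+.2f}%")
print(f"without 252.eon: {no_eon_gain:+.2f}%")
```

With these inputs the gain over all seven benchmarks should come out close
to the quoted figure of almost 4%, and drop to roughly 1.5% once 252.eon is
excluded, which is consistent with EON accounting for most of the
improvement.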