From: Richard Guenther
To: Vladimir Makarov
Cc: Jan Hubicka, Toon Moene, gcc mailing list
Date: Wed, 07 Oct 2009 14:39:00 -0000
Subject: Re: LTO: Speedup -- some preliminary SPEC2000 results

On Wed, Oct 7, 2009 at 4:25 PM, Vladimir Makarov wrote:
> Jan Hubicka wrote:
>>
>> So things seem to work now, plus or minus, as expected.  I.e. LTO builds
>> seem similar to combined builds, and whole-program improves code size
>> quite noticeably.
>> Runtime results for gzip are pretty much unchanged, but that is
>> expected.  I am quite curious about a full SPEC run.
>>
> Before the fix (Jan's two latest patches), the LTO results were
> disappointing.
> In brief, the results when I checked SPEC2000 a week ago on the
> lto branch on a Core i7 (-O3 vs -O3 -flto with optional
> -fwhole-program) were:
>  o Using LTO made the compiler 1.9 times slower (in CPU time) for
>    SPECInt2000 and 2.2 times slower for SPECFP2000 on x86 and x86_64.
>  o LTO generated 16-17% bigger code for Int2000 and 13-14% bigger for
>    FP2000.
>  o There was a 0.6% improvement for SPECFP2000 on x86 and 1% for
>    SPECInt2000 on x86_64 (only because of a 20% improvement on vortex;
>    all other tests were actually worse than without LTO).
>  o No improvement for Int2000 on x86 and FP2000 on x86_64.
>  o 252.eon and 176.gcc crashed the compiler when LTO was used.
>
> With Jan's latest fixes, the results (for -O3 vs -O3 -flto
> -fwhole-program) are:
>
> x86:
>  o Int2000:
>    - LTO crashes the compiler on vortex.  LTO generates
>      wrong code for vpr, gcc, perlbmk, and gap.
>    - Compiler is 1.85 times slower with LTO.
>    - Average code size is almost 6% smaller:
>
>        change        -O3        LTO  benchmark
>        4.615%      44287      46331  164.gzip
>       -3.145%     144101     139569  175.vpr
>        0.261%    1566926    1571009  176.gcc
>      -12.118%      12279      10791  181.mcf
>       11.130%     209956     233324  186.crafty
>      -29.735%     155358     109162  197.parser
>      -23.075%     497347     382585  252.eon
>        8.904%     552163     601327  253.perlbmk
>        1.516%     503006     510630  254.gap
>      -20.891%      47465      37549  256.bzip2
>       -3.047%     198365     192321  300.twolf
>      Average = -5.96236%
>
>    - Performance is improved by almost 4%:
>
>      benchmark      -O3    LTO  change
>      164.gzip      1668   1629  -2.33813%
>      181.mcf       5011   5020   0.17960%
>      186.crafty    2268   2277   0.39682%
>      197.parser    1928   1925  -0.15560%
>      252.eon       2477   2950  19.0957%
>      256.bzip2     1894   1956   3.2735%
>      300.twolf     2806   3026   7.84034%
>      GeoMean       2416   2509   3.84934%
>
>  o FP2000:
>    - LTO generates wrong code for mgrid, applu, galgel, facerec,
>      fma3d, sixtrack, and apsi.
>    - Compiler is 2.1 times slower with LTO.
>    - Average code size is almost 1.7% smaller:
>
>        change        -O3        LTO  benchmark
>       -8.771%      27544      25128  168.wupwise
>        2.328%       9108       9320  171.swim
>        2.127%      18193      18580  172.mgrid
>        0.004%      76584      76587  173.applu
>       -5.938%     576270     542049  177.mesa
>       -2.046%     183667     179910  178.galgel
>      -10.635%      15881      14192  179.art
>      -16.292%      28812      24118  183.equake
>       -3.177%      67239      65103  187.facerec
>       10.989%     125273     139039  188.ammp
>       -0.735%      49137      48776  189.lucas
>       -0.856%    1144550    1134756  191.fma3d
>       11.457%     935941    1043168  200.sixtrack
>      Average = -1.65735%
>
>    - Performance is improved by almost 6%:
>
>      benchmark      -O3    LTO  change
>      168.wupwise   2349   3266  39.0379%
>      171.swim      3511   3529   0.51267%
>      177.mesa      1970   2008   1.92893%
>      179.art       7097   7293   2.76173%
>      183.equake    3844   4138   7.64828%
>      188.ammp      2423   2401  -0.90796%
>      189.lucas     2825   2718  -3.78761%
>      GeoMean       3144   3332   5.97964%
>
> x86_64:
>  o Int2000:
>    - LTO crashes the compiler on gcc.  LTO generates
>      wrong code for vpr, perlbmk, gap, and vortex.
>    - Compiler is 1.8 times slower with LTO.
>    - Average code size is more than 8% smaller:
>
>        change        -O3        LTO  benchmark
>        1.376%      49119      49795  164.gzip
>       -4.348%     158389     151503  175.vpr
>      -16.964%      14949      12413  181.mcf
>       12.875%     195234     220370  186.crafty
>      -29.519%     180780     127416  197.parser
>      -22.894%     521614     402197  252.eon
>        9.507%     645749     707141  253.perlbmk
>        6.550%     585164     623492  254.gap
>      -22.493%     660414     511866  255.vortex
>      -18.343%      55825      45585  256.bzip2
>       -5.295%     212727     201463  300.twolf
>      Average = -8.14068%
>
>    - Performance is improved by 2.1%:
>
>      benchmark      -O3    LTO  change
>      164.gzip      1804   1773  -1.7184%
>      181.mcf       3480   3460  -0.5747%
>      186.crafty    3397   3406   0.2649%
>      197.parser    1847   1803  -2.3822%
>      252.eon       4071   4537  11.4468%
>      256.bzip2     2197   2249   2.3668%
>      300.twolf     2878   3048   5.9068%
>      GeoMean       2688   2744   2.0833%
>
>  o FP2000:
>    - LTO crashes the compiler on apsi.  LTO generates wrong code for
>      mgrid, applu, galgel, facerec, fma3d, and sixtrack.
>    - Compiler is 2.1 times slower with LTO.
>    - Average code size is 2.7% smaller:
>
>        change        -O3        LTO  benchmark
>       27.674%      33902      43284  168.wupwise
>       -3.107%      15704      15216  171.swim
>       -0.685%      22929      22772  172.mgrid
>       -1.167%     103280     102075  173.applu
>       -8.346%     678724     622079  177.mesa
>       -4.304%     249773     239024  178.galgel
>      -25.801%      20375      15118  179.art
>      -28.805%      37514      26708  183.equake
>       -1.577%      76837      75625  187.facerec
>        1.570%     168235     170877  188.ammp
>       -1.168%      57271      56602  189.lucas
>       -0.940%    1276316    1264314  191.fma3d
>       10.949%    1106507    1227658  200.sixtrack
>      Average = -2.74672%
>
>    - Performance is improved by almost 6%:
>
>      benchmark      -O3    LTO  change
>      168.wupwise   2532   3708  46.4455%
>      171.swim      3740   3729  -0.2941%
>      177.mesa      2969   2946  -0.7746%
>      179.art       7278   7092  -2.5556%
>      183.equake    3978   4227   6.2594%
>      188.ammp      2490   2515   1.0040%
>      189.lucas     3886   3806  -2.0586%
>      GeoMean       3603   3812   5.8007%
>
> LTO is quite promising.
> Actually, it is in line with, or even better than, the
> improvement obtained from other compilers (pathscale is the most
> convenient compiler for checking LTO separately: LTO there gave up to
> a 5% improvement on SPECFP2000 and 3.5% on SPECInt2000, while making
> the compiler about 50% slower and the generated code up to 30%
> bigger).  LTO in GCC actually results in a significant code size
> reduction, which is quite different from pathscale.  To my mind, that
> is one of the rare cases where a specific optimization actually works
> better in GCC than in other optimizing compilers.  So congratulations
> to all the people who worked on LTO!
>
> I think the biggest winners from LTO will be big C++ programs (eon shows
> that).  Additional optimizations (like devirtualization) could improve
> those results even more.  I think the next big thing would be using
> subtarget-specialized functions.

Note that there are daily runs for SPEC2000 and SPEC2006 on x86_64
with -flto (and now -fwhopr) beyond gcc.opensuse.org.  SPEC2000 all
compiles and runs successfully for me with -flto, with the exception
of gcc, which is non-conforming C code.  SPEC2006 is a different
story: a bunch of tests do not have enough memory to compile, and
another bunch miscompare or crash.

Note that today we had additional breakage due to IPA-SRA; after that
is fixed, results should look a lot better.  My performance
observations before Honza's patch are disappointing as well - just
some minor speedups / slowdowns.

Richard.
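
For readers who want to check the arithmetic in the quoted tables, the sketch below recomputes the x86 Int2000 figures. It assumes (this is inferred from the numbers, not stated in the thread) that each percentage is (LTO - base) / base, that "Average" is the arithmetic mean of the per-benchmark code-size changes, and that "GeoMean" is the geometric mean of the per-benchmark scores. The function and variable names are made up for illustration.

```python
from math import exp, log


def percent_change(base, lto):
    """Change of the LTO figure relative to the -O3 baseline, in percent."""
    return (lto - base) / base * 100.0


def arithmetic_mean(values):
    values = list(values)
    return sum(values) / len(values)


def geometric_mean(values):
    # Mean of the logarithms, exponentiated; avoids overflowing a product.
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


# x86 Int2000 code sizes from the quoted table: benchmark -> (-O3, LTO)
code_size = {
    "164.gzip": (44287, 46331),
    "175.vpr": (144101, 139569),
    "176.gcc": (1566926, 1571009),
    "181.mcf": (12279, 10791),
    "186.crafty": (209956, 233324),
    "197.parser": (155358, 109162),
    "252.eon": (497347, 382585),
    "253.perlbmk": (552163, 601327),
    "254.gap": (503006, 510630),
    "256.bzip2": (47465, 37549),
    "300.twolf": (198365, 192321),
}

# x86 Int2000 scores from the quoted table: benchmark -> (-O3, LTO)
scores = {
    "164.gzip": (1668, 1629),
    "181.mcf": (5011, 5020),
    "186.crafty": (2268, 2277),
    "197.parser": (1928, 1925),
    "252.eon": (2477, 2950),
    "256.bzip2": (1894, 1956),
    "300.twolf": (2806, 3026),
}

avg_size_change = arithmetic_mean(
    percent_change(base, lto) for base, lto in code_size.values()
)
base_geomean = geometric_mean(base for base, _ in scores.values())
lto_geomean = geometric_mean(lto for _, lto in scores.values())
speedup = percent_change(base_geomean, lto_geomean)

print(f"average code-size change: {avg_size_change:.3f}%")
print(f"geomean -O3: {base_geomean:.1f}, LTO: {lto_geomean:.1f}")
print(f"geomean improvement: {speedup:.2f}%")
```

Under these assumptions the results should land close to the quoted averages. The quoted 3.84934% appears to be computed from the rounded geometric means (93 / 2416), so the unrounded value printed here can differ slightly in the second decimal place.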