From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-157114-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 14325 invoked by alias); 7 Oct 2009 14:39:57 -0000
Received: (qmail 14311 invoked by uid 22791); 7 Oct 2009 14:39:55 -0000
X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 	tests=AWL,BAYES_00,SARE_MSGID_LONG40,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mail-vw0-f178.google.com (HELO mail-vw0-f178.google.com) (209.85.212.178)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 07 Oct 2009 14:39:51 +0000
Received: by vws8 with SMTP id 8so3595244vws.14         for <gcc@gcc.gnu.org>; Wed, 07 Oct 2009 07:39:49 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.220.101.2 with SMTP id a2mr4482599vco.40.1254926389300; Wed,  	07 Oct 2009 07:39:49 -0700 (PDT)
In-Reply-To: <4ACCA4F7.6010307@redhat.com>
References: <4ACBA42B.8040107@moene.org> 	 <20091006214108.GB9046@atrey.karlin.mff.cuni.cz> 	 <20091006220120.GD9046@atrey.karlin.mff.cuni.cz> 	 <4ACCA4F7.6010307@redhat.com>
Date: Wed, 07 Oct 2009 14:39:00 -0000
Message-ID: <84fc9c000910070739i64a7f9ddy77d48906c521075e@mail.gmail.com>
Subject: Re: LTO: Speedup -- some preliminary SPEC2000 results
From: Richard Guenther <richard.guenther@gmail.com>
To: Vladimir Makarov <vmakarov@redhat.com>
Cc: Jan Hubicka <hubicka@ucw.cz>, Toon Moene <toon@moene.org>, gcc mailing list <gcc@gcc.gnu.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2009-10/txt/msg00160.txt.bz2

On Wed, Oct 7, 2009 at 4:25 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> Jan Hubicka wrote:
>>
>> So things seem to work now more or less as expected.  I.e. LTO builds
>> seem similar to combined builds, and whole-program improves code size
>> quite noticeably.
>> Runtime results for gzip are pretty much unchanged, but that is
>> expected.  I am quite curious about a full SPEC run.
>>
> Before the fix (Jan's two latest patches), the LTO results were
> disappointing.  In brief, the results when I checked SPEC2000 a week
> ago on the lto branch on a Core i7 (-O3 vs -O3 -flto with optional
> -fwhole-program) were:
>  o Using LTO made the compiler 1.9 times slower (in CPU time) for
>    SPECInt2000 and 2.2 times slower for SPECFP2000 on x86 and x86_64.
>  o LTO generated 16-17% bigger code for Int2000 and 13-14% bigger for
>    FP2000.
>  o There was a 0.6% improvement for SPECFP2000 on x86 and 1% for
>    SPECInt2000 on x86_64 (only because of a 20% improvement on vortex;
>    all other tests were actually worse than without LTO).
>  o No improvement for Int2000 on x86 and FP2000 on x86_64.
>  o 252.eon and 176.gcc crashed the compiler when LTO was used.
>
> With Jan's latest fixes, the results (for -O3 vs -O3 -flto
> -fwhole-program) are:
>
> x86:
>  o Int2000:
>    - LTO crashes the compiler on vortex.  LTO generates
>      wrong code for vpr, gcc, perlbmk, and gap.
>    - Compiler is 1.85 times slower with LTO
>    - Average code size is almost 6% smaller:
>
>         4.615%      44287      46331 164.gzip
>        -3.145%     144101     139569 175.vpr
>         0.261%    1566926    1571009 176.gcc
>       -12.118%      12279      10791 181.mcf
>        11.130%     209956     233324 186.crafty
>       -29.735%     155358     109162 197.parser
>       -23.075%     497347     382585 252.eon
>         8.904%     552163     601327 253.perlbmk
>         1.516%     503006     510630 254.gap
>       -20.891%      47465      37549 256.bzip2
>        -3.047%     198365     192321 300.twolf
>       Average = -5.96236%
>
>    - Performance is improved by almost 4%
>
>      164.gzip    1668   1629  -2.33813%
>      181.mcf     5011   5020   0.17960%
>      186.crafty  2268   2277   0.39682%
>      197.parser  1928   1925  -0.15560%
>      252.eon     2477   2950  19.0957%
>      256.bzip2   1894   1956   3.2735%
>      300.twolf   2806   3026   7.84034%
>      GeoMean     2416   2509   3.84934%
>
>  o FP2000
>    - LTO generates wrong code for mgrid, applu, galgel, facerec,
>      fma3d, sixtrack, and apsi.
>    - Compiler is 2.1 times slower with LTO
>    - Average code size is almost 1.7% smaller:
>
>        -8.771%      27544      25128 168.wupwise
>         2.328%       9108       9320 171.swim
>         2.127%      18193      18580 172.mgrid
>         0.004%      76584      76587 173.applu
>        -5.938%     576270     542049 177.mesa
>        -2.046%     183667     179910 178.galgel
>       -10.635%      15881      14192 179.art
>       -16.292%      28812      24118 183.equake
>        -3.177%      67239      65103 187.facerec
>        10.989%     125273     139039 188.ammp
>        -0.735%      49137      48776 189.lucas
>        -0.856%    1144550    1134756 191.fma3d
>        11.457%     935941    1043168 200.sixtrack
>       Average = -1.65735%
>
>    - Performance is improved by almost 6%
>
>      168.wupwise  2349   3266  39.0379%
>      171.swim     3511   3529   0.51267%
>      177.mesa     1970   2008   1.92893%
>      179.art      7097   7293   2.76173%
>      183.equake   3844   4138   7.64828%
>      188.ammp     2423   2401  -0.90796%
>      189.lucas    2825   2718  -3.78761%
>      GeoMean      3144   3332   5.97964%
>
> x86_64:
>  o Int2000:
>    - LTO crashes the compiler on gcc.  LTO generates
>      wrong code for vpr, perlbmk, gap, and vortex
>    - Compiler is 1.8 times slower with LTO
>    - Average code size is more than 8% smaller:
>
>         1.376%      49119      49795 164.gzip
>        -4.348%     158389     151503 175.vpr
>       -16.964%      14949      12413 181.mcf
>        12.875%     195234     220370 186.crafty
>       -29.519%     180780     127416 197.parser
>       -22.894%     521614     402197 252.eon
>         9.507%     645749     707141 253.perlbmk
>         6.550%     585164     623492 254.gap
>       -22.493%     660414     511866 255.vortex
>       -18.343%      55825      45585 256.bzip2
>        -5.295%     212727     201463 300.twolf
>       Average = -8.14068%
>
>    - Performance is improved by 2.1%
>
>      164.gzip    1804   1773  -1.7184%
>      181.mcf     3480   3460  -0.5747%
>      186.crafty  3397   3406   0.2649%
>      197.parser  1847   1803  -2.3822%
>      252.eon     4071   4537  11.4468%
>      256.bzip2   2197   2249   2.3668%
>      300.twolf   2878   3048   5.9068%
>      GeoMean     2688   2744   2.0833%
>
>  o FP2000
>    - LTO crashes the compiler on apsi.  LTO generates wrong code for
>      mgrid, applu, galgel, facerec, fma3d, and sixtrack.
>    - Compiler is 2.1 times slower with LTO
>    - Average code size is 2.7% smaller:
>
>        27.674%      33902      43284 168.wupwise
>        -3.107%      15704      15216 171.swim
>        -0.685%      22929      22772 172.mgrid
>        -1.167%     103280     102075 173.applu
>        -8.346%     678724     622079 177.mesa
>        -4.304%     249773     239024 178.galgel
>       -25.801%      20375      15118 179.art
>       -28.805%      37514      26708 183.equake
>        -1.577%      76837      75625 187.facerec
>         1.570%     168235     170877 188.ammp
>        -1.168%      57271      56602 189.lucas
>        -0.940%    1276316    1264314 191.fma3d
>        10.949%    1106507    1227658 200.sixtrack
>       Average = -2.74672%
>
>    - Performance is improved by almost 6%
>
>      168.wupwise  2532   3708  46.4455%
>      171.swim     3740   3729  -0.2941%
>      177.mesa     2969   2946  -0.7746%
>      179.art      7278   7092  -2.5556%
>      183.equake   3978   4227   6.2594%
>      188.ammp     2490   2515   1.0040%
>      189.lucas    3886   3806  -2.0586%
>      GeoMean      3603   3812   5.8007%
>
> LTO is quite promising.  Actually, it is in line with or even better
> than the improvement obtained from other compilers (PathScale is the
> most convenient compiler for checking LTO separately: there LTO gave
> up to 5% improvement on SPECFP2000 and 3.5% on SPECInt2000, while
> making the compiler about 50% slower and the generated code up to 30%
> bigger).  LTO in GCC actually results in a significant code size
> reduction, which is quite different from PathScale.  To my mind, this
> is one of the rare cases where a specific optimization actually works
> better in GCC than in other optimizing compilers.  So congratulations
> to everyone who worked on LTO!
>
> I think the biggest winners from LTO will be big C++ programs (eon
> shows that).  Additional optimizations (like devirtualization) could
> improve those results even more.  I think the next big thing would be
> using subtarget-specialized functions.
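
For anyone who wants to re-check the tables quoted above: the percentage
and GeoMean columns look like plain relative changes and geometric means.
Here is a minimal Python sketch, assuming the first numeric column of the
performance tables is the -O3 baseline and the second is the -O3 -flto
-fwhole-program build. The names pct_change, geomean and X86_INT are made
up for illustration and are not part of any SPEC tooling.

```python
import math


def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def geomean(values):
    """Geometric mean, computed via logs to avoid huge products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# x86 Int2000 performance rows copied from the quoted table.
# Assumption: columns are (benchmark, -O3, -O3 -flto -fwhole-program).
X86_INT = [
    ("164.gzip", 1668, 1629),
    ("181.mcf", 5011, 5020),
    ("186.crafty", 2268, 2277),
    ("197.parser", 1928, 1925),
    ("252.eon", 2477, 2950),
    ("256.bzip2", 1894, 1956),
    ("300.twolf", 2806, 3026),
]

if __name__ == "__main__":
    for name, base, lto in X86_INT:
        print(f"{name:12s} {base:6d} {lto:6d} {pct_change(base, lto):9.5f}%")
    base_gm = geomean([base for _, base, _ in X86_INT])
    lto_gm = geomean([lto for _, _, lto in X86_INT])
    print(f"{'GeoMean':12s} {base_gm:6.0f} {lto_gm:6.0f}")
```

The quoted GeoMean percentage seems to be taken from the rounded geometric
means (2416 and 2509), so a figure computed from the unrounded means may
differ in the second decimal place.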

Note that there are daily runs for SPEC2000 and SPEC2006 on
x86_64 with -flto (and now -fwhopr) beyond gcc.opensuse.org.

All of SPEC2000 compiles and runs successfully for me with -flto,
with the exception of gcc, which is non-conforming C code.

SPEC2006 is a different story: a bunch of tests do not have
enough memory to compile, and another bunch miscompare or
crash.

Note that today we had additional breakage due to IPA-SRA;
once that is fixed, results should look a lot better.

My performance observations from before Honza's patch are
disappointing as well - just some minor speedups / slowdowns.

Richard.