From: Jan Hubicka
To: Benjamin Kosnik
Cc: Daniel Berlin, rth@redhat.com, gdr@integrable-solutions.net, gcc@gcc.gnu.org
Subject: Some GCC 4.1 benchmarks (Re: Thoughts on LLVM and LTO)
Date: Tue, 22 Nov 2005 20:04:00 -0000
Message-ID: <20051122200539.GB4648@atrey.karlin.mff.cuni.cz>
In-Reply-To: <20051122130700.090c4564.bkoz@redhat.com>
References: <200511221817.jAMIH2Co014676@porkchop.devel.redhat.com> <1132685232.3076.132.camel@linux.site> <20051122184957.GA18372@redhat.com> <1132685583.3076.135.camel@linux.site> <20051122130700.090c4564.bkoz@redhat.com>

> > Which is why I said "It's fine to say compile time performance of the
> > middle end portions we may replace should be same or better".
> >
> > And if you were to look right now, it's actually significantly better in
> > some cases :(
>
> Can you prove this assertion?
>
> Here is some data:
> http://people.redhat.com/dnovillo/spec2000.i686/gcc/global-build-secs_elapsed.html
>
> And some more
> http://llvm.cs.uiuc.edu/testresults/X86/2005-11-01.html
>
> I'm not sure about accuracy, or versions of LLVM used, etc.
>
> Although promising on some things (as Diego said), LLVM execution and
> compile performance is a mixed bag.
>
> It would probably be interesting to run SPEC or something else with icc
> IPO enabled, LLVM IPO enabled, and whatever gcc IMA support is
> available, to do a true comparison of where things stand. More data
> would be interesting.

I might try to produce somewhat more useful charts, but I have recently done
some testing of GCC 4.1 on SPEC and on some C++ testcases, mostly looking
for regressions in the GCC 4.1 release. I didn't test LLVM, but I did an
ICC comparison and tested both with and without our current IMA, so it
gives a rough idea.

I should note that the comparison to ICC is not quite fair, since ICC lacks
tuning for the Opteron I tested on, but I would say that we are in the same
performance camp on SPECint with IMA (IMA contributes 3.3% to the result),
despite the fact that GCC's IMA and IPA are very primitive. This may just
be proof that SPECint is not the best testcase for testing future IPA
implementations. I also collected some C++ results, which are a lot wilder.
It would be really interesting to see how much benefit one can get when
compiling a full-blown application, and how large a program one can hope to
compile with LTO (i.e. GCC/kernel/mozilla/OOo/... ;).

I am not quite sure how much of the SPECfp loss can be attributed to IMA,
since I would expect more of it to come from Fortran tuning. The only
regressing C benchmark is ART, which indeed needs whole-program
optimization to allow data structure layout changes. Obviously we made
some notable progress on Fortran performance between 4.0 and 4.1, and none
of that is IPA related.

I am also adding some scores for C++ testcases: tramp3d, which is a single
file, and Gerald's application, which I didn't actually manage to merge
into a single file, but I combined the files that appear hot in coverage.

Concerning compile time at -O2: the hammer branch needs 185s, 4.0 needs
192s, and 4.1 needs 205s. With IPA and no FDO, 4.0 needs 193s when patched
with Andrew's faster type-merging patch, and 4.1 needs 218s.

I didn't record ICC compilation times, but this clearly shows that overall
we are again making the compile-time problems worse with 4.1. It also
shows that IPA is cheap right now, but only because it is so primitive. It
is also cheap only as long as you fit in memory (you need over 512MB of
memory to build SPEC with IMA on GCC, which is far from acceptable).

Also note that eon and the Fortran files are not compiled with IMA in the
GCC tests.

-O2, no IMA on both compilers:

          GCC-3.3-hammer  GCC 4.0  GCC 4.1  ICC-9.0
gzip      1162            1181     1199     1151
vpr       859             853      824      854
gcc       1057            1035     1028     963
mcf       540             540      541      543
crafty    2100            2041     2025     2106
parser    776             790      783      778
eon       1793            1874     1952     (failed, substituted as 783 for geomavg)
perlbmk   1407            1453     1438     1503
gap       1095            1152     1156     1071
vortex    1689            1663     1666     1618
bzip2     1009            1011     1000     997
twolf     843             858      852      823
geomavg   1114.8          1124.95  1122.76  1102

          GCC-3.3-hammer  GCC 4.0  GCC 4.1  ICC-9.0
wupwise   1218            1079     1304     1278
swim      1038            1065     1070     1064
mgrid     784             728      906      909
applu     772             822      840      884
mesa      1536            1609     1536     1486
galgel    -               803      830      -
art       730             739      735      747
equake    1102            1085     1069     1055
facerec   -               905      914      1393
ammp      967             993      1008     985
lucas     -               1106     1113     1264
fma3d     -               976      978      1154
sixtrac   582             591      618      647
apsi      810             922      1004     948
geomavg   -               933      971      1016

-O2 -static --combine -fwhole-program -fipa-cp versus ICC -xW -O3 -ipo
-vec_report3; profile feedback is used on both compilers:

          GCC-3.3-hammer  GCC 4.0  GCC 4.1  ICC-9.0
gzip      1269            1299     1264     1337
vpr       890             864      885      869
gcc       1112            1095     1175     1023
mcf       539             536      538      546
crafty    2055            2034     2236     2301
parser    960             975      993      851
eon       2081            1928     2192     2150
perlbmk   1621            1574     1697     1652
gap       1117            1181     1223     1224
vortex    1683            2038     2173     2421
bzip2     1058            1022     1085     1087
twolf     842             877      877      849
geomavg   1183.41         1195.84  1251.55  1232.97

          GCC-3.3-hammer  GCC 4.0  GCC 4.1  ICC-9.0
wupwise   -               1305     1401     1678
swim      -               1065     1293     1360
mgrid     -               758      884      973
applu     -               857      918      1060
mesa      1756            1751     1756     1759
galgel    -               818      848      1790
art       724             734      735      1414
equake    1088            1101     1108     1308
facerec   -               974      1110     1467
ammp      1008            1034     1063     967
lucas     -               1111     1104     1261
fma3d     -               976      1215     1238
sixtrac   -               643      702      653
apsi      -               940      988      958
geomavg   -               973.82   1049.12  1234.02

Tramp3d, iterations per second with and without FDO:

GCC 3.3-hammer    0.36
GCC 4.0           0.45
GCC 4.1           0.56
GCC 4.1 flatten   0.62
GCC 4.1 profile   0.07
GCC 4.1 FDO       0.81
GCC 4.1 profile   0.08
4.1 FDO flatten   0.89
ICC 9.0           0.14

DLV, speedup in percent relative to GCC 3.3 hammer-branch:

                  GCC 4.0  GCC 4.1  GCC-4.1 profile  ICC 9.0
STRATCOMP1-ALL    284      287.1    242.86           18.52
STRATCOMP-770.2-  6.25     0        13.33            -10.53
2QBF1             -5.47    -5.87    6.83             -15.23
PRIMEIMPL2        3.09     5.26     12.36            -23.95
3COL-SIMPLEX1     -1.78    -7.78    2.47             9.21
3COL-RANDOM1      -3.88    -0.84    0.21             -20.84
HP-RANDOM1        -26.72   -13.83   -12.45           -9.94
HAMCYCLE-FREE     -1.89    -3.7     0                -17.46
DECOMP2           -6.84    -12.2    -12.35           -11.27
BW-P5-nopush      -6.29    -4.07    -2.75            -5.98
BW-P5-pushbin     -5.28    -1.95    -0.4             -13.75
BW-P5-nopushbin   -6.49    -2.7     0                -8.86
HANOI-Towers      -6.79    -2.58    0                -21.35
RAMSEY            5.41     -3.7     9.86             -5.65
CRISTAL           -17.21   -20.12   -13.53           -8.91
21-QUEENS         -1.71    -2.55    4.24             -34.48
MSTDir[V=13]      2.06     0.2      6                -31.72
MSTDir[V=15]      1.84     1.01     6.87             -32.15
MSTUndir[V=13]    -4.08    -4.08    2.92             -29.5
TIMETABLING       2.65     0.74     7.97             -31.91
AVG               2.71     2.6      7.74             -16.31
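
For anyone who wants to recheck the geomavg rows, here is a minimal Python
sketch. It assumes geomavg is the plain geometric mean of the per-benchmark
scores. The function name geomean and the list names are only illustrative
and are not part of any tool used for the runs above. The inputs are the
GCC 4.1 and ICC 9.0 columns of the SPECint table with profile feedback.

```python
import math


def geomean(scores):
    """Geometric mean of positive scores, computed through logarithms."""
    if not scores:
        raise ValueError("need at least one score")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive")
    # exp(mean(log x)) equals the n-th root of the product, without overflow.
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# SPECint with profile feedback, in table order (gzip ... twolf).
gcc41_fdo_int = [1264, 885, 1175, 538, 2236, 993,
                 2192, 1697, 1223, 2173, 1085, 877]
icc_fdo_int = [1337, 869, 1023, 546, 2301, 851,
               2150, 1652, 1224, 2421, 1087, 849]

gcc_mean = geomean(gcc41_fdo_int)
icc_mean = geomean(icc_fdo_int)

print(f"GCC 4.1 geomean: {gcc_mean:.2f}")
print(f"ICC 9.0 geomean: {icc_mean:.2f}")
# Ratio above 1.0 means the GCC 4.1 column scores higher overall.
print(f"GCC 4.1 / ICC 9.0: {gcc_mean / icc_mean:.3f}")
```

For the GCC 4.1 column this should land close to the 1251.55 given in the
table. The same function can be applied to any other column, leaving out
the benchmarks that have no score for that compiler.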