From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-122163-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 30965 invoked by alias); 22 Nov 2005 20:04:49 -0000
Received: (qmail 30956 invoked by uid 22791); 22 Nov 2005 20:04:48 -0000
X-Spam-Check-By: sourceware.org
Received: from atrey.karlin.mff.cuni.cz (HELO atrey.karlin.mff.cuni.cz) (195.113.31.123)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 22 Nov 2005 20:04:45 +0000
Received: by atrey.karlin.mff.cuni.cz (Postfix, from userid 4018) 	id B24C34B41DA; Tue, 22 Nov 2005 21:05:39 +0100 (CET)
Date: Tue, 22 Nov 2005 20:04:00 -0000
From: Jan Hubicka <hubicka@ucw.cz>
To: Benjamin Kosnik <bkoz@redhat.com>
Cc: Daniel Berlin <dberlin@dberlin.org>, rth@redhat.com, 	gdr@integrable-solutions.net, gcc@gcc.gnu.org
Subject: Some GCC 4.1 benchmarks (Re: Thoughts on LLVM and LTO)
Message-ID: <20051122200539.GB4648@atrey.karlin.mff.cuni.cz>
References: <200511221817.jAMIH2Co014676@porkchop.devel.redhat.com> <m3psos4k19.fsf@uniton.integrable-solutions.net> <1132685232.3076.132.camel@linux.site> <20051122184957.GA18372@redhat.com> <1132685583.3076.135.camel@linux.site> <20051122130700.090c4564.bkoz@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20051122130700.090c4564.bkoz@redhat.com>
User-Agent: Mutt/1.5.9i
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2005-11/txt/msg01086.txt.bz2

> 
> > Which is why i said "It's fine to say compile time performance of the
> > middle end portions we may replace should be same or better".
> > 
> > And if you were to look right now, it's actually significantly better in
> > some cases :(
> 
> Can you prove this assertion?
> 
> Here is some data:
> http://people.redhat.com/dnovillo/spec2000.i686/gcc/global-build-secs_elapsed.html
> 
> And some more
> http://llvm.cs.uiuc.edu/testresults/X86/2005-11-01.html
> 
> I'm  not sure about accuracy, or versions of LLVM used, etc.
> 
> Although promising on some things (as Diego said), LLVM execute and
> compile performance is a mixed bag.
> 
> It would probably be interesting to run SPEC or something else with icc
> IPO enabled, LLVM IPO enabled, and whatever gcc IMA support is
> available, to do a true comparison of where things stand. More data
> would be interesting.

I might try to produce somewhat more useful charts, but I've recently
done some testing of GCC 4.1 on SPEC and some C++ testcases, mostly
looking for regressions in the GCC 4.1 release.  I didn't test LLVM, but
I did some ICC comparison, testing both with and without our current
IMA, so it gives a rough idea.

I should note that the comparison to ICC is not quite fair, since ICC
lacks tuning for the Opteron I tested on, but I would say that we are in
the same performance camp on SPECint with IMA (IMA contributes 3.3% to
the result) despite the fact that GCC IMA and IPA are very primitive.
This may just be proof that SPECint is not the best testcase for
evaluating future IPA implementations.  I also produced some C++ results
that are a lot wilder.  It would be really interesting to see how much
benefit one can get when compiling a full-blown application and how
large a codebase one can hope to compile with LTO (i.e.
GCC/kernel/mozilla/OOo/... ;).

I am not quite sure how much of the SPECfp loss can be attributed to
IMA, since I would expect more of it to come from Fortran tuning.  The
only regressing C benchmark is ART, which indeed needs whole-program
optimization to allow data structure layout changes.  Obviously we made
some notable progress on Fortran performance between 4.0 and 4.1, and
none of that is IPA related.

I am also adding some scores for C++ testcases: tramp3d, which is a
single file, and Gerald's application, which I didn't actually manage to
merge into a single file, but I combined the files that appear hot in
coverage.

Concerning compile time: at -O2 the hammer branch needs 185s, 4.0 needs
192s and 4.1 needs 205s.  With IPA and no FDO, 4.0 needs 193s when
patched with Andrew's faster type-merging patch, and 4.1 needs 218s.  I
didn't record ICC compilation times, but this clearly shows that overall
we are making compile time problems worse again with 4.1.  It also shows
that IPA is cheap right now, but only because it is so primitive.  It is
also cheap only as long as you fit in memory (you need over 512MB of
memory to build SPEC with IMA on GCC, which is far from acceptable).
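
To put the build times above in relative terms, here is a minimal
Python sketch that turns them into percentage slowdowns.  The dictionary
and function names are made up for illustration; only the second counts
come from the paragraph above.

```python
# Build times in seconds, copied from the paragraph above.
# The names below are illustrative, not part of any GCC tooling.
O2_SECONDS = {"3.3-hammer": 185, "4.0": 192, "4.1": 205}
IPA_SECONDS = {"4.0": 193, "4.1": 218}


def percent_slower(seconds, baseline_seconds):
    """Return how much slower `seconds` is than the baseline, in percent."""
    return (seconds / baseline_seconds - 1.0) * 100.0


if __name__ == "__main__":
    baseline = O2_SECONDS["3.3-hammer"]
    # Slowdown of each release at plain -O2 relative to the hammer branch.
    for version, secs in O2_SECONDS.items():
        print(f"-O2 {version}: {percent_slower(secs, baseline):+.2f}% vs hammer")
    # Extra cost of enabling IPA relative to the same release at plain -O2.
    for version, secs in IPA_SECONDS.items():
        cost = percent_slower(secs, O2_SECONDS[version])
        print(f"IPA overhead on {version}: {cost:+.2f}%")
```

Read this way, 4.1 at -O2 is roughly 11% slower to build than the hammer
branch, and enabling IPA adds roughly 6% on top of that for 4.1.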

Also note that eon and fortran files are not compiled with IMA in GCC
tests.

-O2, no IMA on both compilers:
	GCC-3.3-hammer	GCC 4.0	GCC 4.1	ICC-9.0
gzip	1162		1181	1199	1151
vpr    	859		853	824	854
gcc    	1057		1035	1028	963
mcf    	540		540	541	543
crafty 	2100		2041	2025	2106
parser 	776		790	783	778
eon    	1793		1874	1952	(failed, substituted as 783 for geomavg)
perlbmk	1407		1453	1438	1503
gap    	1095		1152	1156	1071
vortex 	1689		1663	1666	1618
bzip2  	1009		1011	1000	997
twolf  	843		858	852	823
geomavg	1114.8		1124.95	1122.76	1102
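
For readers who want to check the geomavg row, here is a small Python
sketch of a geometric mean, assuming the row is the plain unweighted
geometric mean of the twelve scores in a column.  The list is the
GCC 4.1 column from the table above; the function and variable names are
illustrative only.

```python
import math


def geometric_mean(scores):
    """Unweighted geometric mean, computed via logs to avoid overflow."""
    if not scores:
        raise ValueError("need at least one score")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# GCC 4.1 column of the -O2, no-IMA SPECint table (gzip ... twolf).
GCC_41_O2_SPECINT = [1199, 824, 1028, 541, 2025, 783,
                     1952, 1438, 1156, 1666, 1000, 852]

if __name__ == "__main__":
    print(f"{geometric_mean(GCC_41_O2_SPECINT):.2f}")
```

The result should land very close to the 1122.76 listed for GCC 4.1.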

	GCC-3.3-hammer	GCC 4.0	GCC 4.1	ICC-9.0
wupwise	1218		1079	1304	1278
swim	1038		1065	1070	1064
mgrid	784		728	906	909
applu	772		822	840	884
mesa	1536		1609	1536	1486
galgel	    		803	830	
art	730		739	735	747
equake	1102		1085	1069	1055
facerec	    		905	914	1393
ammp	967		993	1008	985
lucas	    		1106	1113	1264
fma3d	    		976	978	1154
sixtrac	582		591	618	647
apsi	810		922	1004	948
geomavg			933	971	1016

-O2 -static --combine -fwhole-program  -fipa-cp
versus ICC -xW -O3 -ipo -vec_report3
profile feedback is used on both compilers.
	GCC-3.3-hammer	GCC 4.0	GCC-4.1	ICC-9.0
gzip	1269		1299	1264	1337
vpr    	890		864	885	869
gcc    	1112		1095	1175	1023
mcf    	539		536	538	546
crafty 	2055		2034	2236	2301
parser 	960		975	993	851
eon    	2081		1928	2192	2150
perlbmk	1621		1574	1697	1652
gap    	1117		1181	1223	1224
vortex 	1683		2038	2173	2421
bzip2  	1058		1022	1085	1087
twolf  	842		877	877	849
geomavg	1183.41		1195.84	1251.55	1232.97


	GCC-3.3-hammer	GCC 4.0	GCC 4.1	ICC-9.0
wupwise			1305	1401	1678
swim			1065	1293	1360
mgrid			758	884	973
applu			857	918	1060
mesa	1756		1751	1756	1759
galgel			818	848	1790
art	724		734	735	1414
equake	1088		1101	1108	1308
facerec			974	1110	1467
ammp	1008		1034	1063	967
lucas			1111	1104	1261
fma3d			976	1215	1238
sixtrac			643	702	653
apsi			940	988	958
geomavg			973.82	1049.12	1234.02


Tramp3d, iterations per second with and without FDO.
GCC 3.3-hammer	0.36
GCC 4.0		0.45
GCC 4.1		0.56
GCC 4.1 flatten	0.62
GCC 4.1 profile	0.07
GCC 4.1 FDO    	0.81
GCC 4.1 profile	0.08
4.1 FDO flatten	0.89
ICC 9.0		0.14


DLV, speedup in percent relative to GCC 3.3 hammer-branch
		GCC 4.0	GCC 4.1	GCC-4.1 profile	ICC 9.0
STRATCOMP1-ALL	284	287.1	242.86		18.52
STRATCOMP-770.2-6.25	0	13.33		-10.53
2QBF1		-5.47	-5.87	6.83		-15.23
PRIMEIMPL2	3.09	5.26	12.36		-23.95
3COL-SIMPLEX1	-1.78	-7.78	2.47		9.21
3COL-RANDOM1	-3.88	-0.84	0.21		-20.84
HP-RANDOM1	-26.72	-13.83	-12.45		-9.94
HAMCYCLE-FREE	-1.89	-3.7	0		-17.46
DECOMP2		-6.84	-12.2	-12.35		-11.27
BW-P5-nopush	-6.29	-4.07	-2.75		-5.98
BW-P5-pushbin	-5.28	-1.95	-0.4		-13.75
BW-P5-nopushbin	-6.49	-2.7	0		-8.86
HANOI-Towers	-6.79	-2.58	0		-21.35
RAMSEY		5.41	-3.7	9.86		-5.65
CRISTAL		-17.21	-20.12	-13.53		-8.91
21-QUEENS	-1.71	-2.55	4.24		-34.48
MSTDir[V=13]	2.06	0.2	6		-31.72
MSTDir[V=15]	1.84	1.01	6.87		-32.15
MSTUndir[V=13]	-4.08	-4.08	2.92		-29.5
TIMETABLING	2.65	0.74	7.97		-31.91
AVG		2.71	2.6	7.74		-16.31