Date: Tue, 22 Apr 2003 15:38:00 -0000
From: "S. Bosscher"
Subject: RE: benchmarking (or almabench)
To: Jeremy Sanders, gcc@gcc.gnu.org

-march=pentium4 is known to pessimise code compared to -march=i686 for some benchmarks; see PR 8474. Maybe you're seeing the same problem?

Greetz
Steven

-----Original Message-----
From: Jeremy Sanders
To: gcc@gcc.gnu.org
Sent: 22-4-03 16:43
Subject: benchmarking (or almabench)

I've been looking at compiling the almabench benchmark again with gcc. See: http://gcc.gnu.org/ml/gcc/2003-01/msg00037.html

With a pentium4 processor I'm getting drastically different times when running the code produced by icc and by gcc. icc produces code which is up to 2.7 times faster than gcc's code for this program.
(with gcc mainline)

/data/jss/gcc-3.3/bin/g++ -o almabench.o -O2 -mfpmath=sse -msse -msse2 -march=pentium4 -finline-limit=10000 -c almabench.cpp
/data/jss/gcc-3.3/bin/g++ -o almabench -O2 -mfpmath=sse -msse -msse2 -march=pentium4 -finline-limit=10000 almabench.o
xpc5:/<3>almabench-1.0.1/cpp> time ./almabench
31.121u 0.060s 0:33.31 93.6% 0+0k 0+0io 212pf+0w
xpc5:/<3>almabench-1.0.1/cpp> time ./almabench
31.148u 0.052s 0:33.61 92.7% 0+0k 0+0io 212pf+0w

(I've also tried without the SSE and -march options, and there's little difference. I've also tried -fprofile-arcs, which doesn't do anything, and -finline-limit has no real effect.)

With icc 7.1:

xpc5:/<3>almabench-1.0.1/cpp> make
icc -o almabench.o -O2 -c almabench.cpp
icc -o almabench -O2 almabench.o
xpc5:/<3>almabench-1.0.1/cpp> time ./almabench
16.494u 0.013s 0:17.71 93.1% 0+0k 0+0io 116pf+0w
xpc5:/<3>almabench-1.0.1/cpp> time ./almabench
16.445u 0.029s 0:17.53 93.8% 0+0k 0+0io 116pf+0w

That's 88% faster than gcc.

Enabling P4 optimisation (okay, gcc can't do vectorization):

xpc5:/<3>almabench-1.0.1/cpp> make
icc -o almabench.o -O2 -tpp7 -xW -march=pentium4 -c almabench.cpp
almabench.cpp(219) : (col. 5) remark: LOOP WAS VECTORIZED.
almabench.cpp(230) : (col. 5) remark: LOOP WAS VECTORIZED.
icc -o almabench -O2 -tpp7 -xW -march=pentium4 almabench.o
xpc5:/<3>almabench-1.0.1/cpp> time ./almabench
11.318u 0.005s 0:12.09 93.5% 0+0k 0+0io 116pf+0w
xpc5:/<3>almabench-1.0.1/cpp> time ./almabench
11.277u 0.007s 0:12.08 93.2% 0+0k 0+0io 116pf+0w

That's 2.75 times faster than gcc's code.

Obviously this benchmark is synthetic, but it suggests gcc isn't optimising something in this code very well. We've also seen similar effects with other floating-point-intensive code.

Any suggestions? I can supply the assembler output for both if anyone would like a look!

Jeremy

--
Jeremy Sanders http://www-xray.ast.cam.ac.uk/~jss/
X-Ray Group, Institute of Astronomy, University of Cambridge, UK.
Public Key Server PGP Key ID: E1AAE053