From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 747 invoked by alias); 1 Jun 2006 08:43:41 -0000
Received: (qmail 721 invoked by uid 48); 1 Jun 2006 08:43:34 -0000
Date: Thu, 01 Jun 2006 08:43:00 -0000
Message-ID: <20060601084334.720.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "uros at kss-loka dot si"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2006-06/txt/msg00027.txt.bz2
List-Id:

------- Comment #9 from uros at kss-loka dot si  2006-06-01 08:43 -------
The benchmark run on a Pentium4 3.2G/800MHz FSB (32bit):

vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 9
cpu MHz         : 3191.917
cache size      : 512 KB

shows even more interesting results:

gcc version 3.4.6
vs.
gcc version 4.2.0 20060601 (experimental)

-fomit-frame-pointer -O -msse2 -mfpmath=sse

GCC 3.x performance:
./xmm_gcc
   ALGORITHM     NB   REPS        TIME      MFLOPS
   =========  =====  =====  ==========  ==========
     atlasmm     60   1000       0.162     2664.87

GCC 4.x performance:
./xmm_gc4
   ALGORITHM     NB   REPS        TIME      MFLOPS
   =========  =====  =====  ==========  ==========
     atlasmm     60   1000       0.164     2633.13

and

-fomit-frame-pointer -O -mfpmath=387

GCC 3.x performance:
./xmm_gcc
   ALGORITHM     NB   REPS        TIME      MFLOPS
   =========  =====  =====  ==========  ==========
     atlasmm     60   1000       0.160     2697.37

GCC 4.x performance:
./xmm_gc4
   ALGORITHM     NB   REPS        TIME      MFLOPS
   =========  =====  =====  ==========  ==========
     atlasmm     60   1000       0.164     2633.15

There is a small performance drop on gcc-4.x, but nothing critical.

I can confirm that the code indeed runs >50% slower on a 64bit Athlon. Perhaps
the problem is in the order of instructions (Software Optimization Guide for
AMD Athlon 64, Section 10.2).
The gcc-3.4 code looks similar to the example of how things should be, and
the gcc-4.2 code looks similar to the example of how things should _NOT_ be.

BTW: Did you try to run the benchmark on an AMD target with -march=k8? The
effects of this flag are devastating on a Pentium4 CPU:

-O -msse2 -mfpmath=sse -march=k8

./xmm_gcc
   ALGORITHM     NB   REPS        TIME      MFLOPS
   =========  =====  =====  ==========  ==========
     atlasmm     60   1000       0.836      516.79

GCC 4.x performance:
./xmm_gc4
   ALGORITHM     NB   REPS        TIME      MFLOPS
   =========  =====  =====  ==========  ==========
     atlasmm     60   1000       0.287     1504.66


-- 

uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2006-06-01 08:43:34
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
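
For readers who want the relative differences rather than raw MFLOPS, the figures quoted in this comment can be compared with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the tables in the comment, the names `results`, `percent_change` and `summarize` are made up for this example, and it assumes the unlabeled `./xmm_gcc` run under -march=k8 is the gcc 3.4.6 binary, as in the earlier tables.

```python
# MFLOPS figures copied from the Pentium4 tables in this comment
# (atlasmm, NB=60, 1000 reps). Keys are the compiler flags as quoted.
results = {
    "-fomit-frame-pointer -O -msse2 -mfpmath=sse": {
        "gcc-3.4.6": 2664.87, "gcc-4.2.0": 2633.13},
    "-fomit-frame-pointer -O -mfpmath=387": {
        "gcc-3.4.6": 2697.37, "gcc-4.2.0": 2633.15},
    # Assumption: the unlabeled ./xmm_gcc run here is the gcc 3.4.6 build.
    "-O -msse2 -mfpmath=sse -march=k8": {
        "gcc-3.4.6": 516.79, "gcc-4.2.0": 1504.66},
}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative = slower)."""
    return (new - old) / old * 100.0


def summarize(table):
    """Map each flag set to the gcc 4.2.0 vs. gcc 3.4.6 change in percent."""
    return {
        flags: percent_change(row["gcc-3.4.6"], row["gcc-4.2.0"])
        for flags, row in table.items()
    }


if __name__ == "__main__":
    for flags, change in summarize(results).items():
        print(f"{flags:45s} {change:+8.1f}%")
```

On these numbers the gcc 4.2.0 build comes out roughly 1 to 2.5 percent slower for the two default-tuning runs, while under -march=k8 both builds lose heavily on the Pentium4 and the gcc 3.4.6 build loses by far the most.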