From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-187742-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 747 invoked by alias); 1 Jun 2006 08:43:41 -0000
Received: (qmail 721 invoked by uid 48); 1 Jun 2006 08:43:34 -0000
Date: Thu, 01 Jun 2006 08:43:00 -0000
Message-ID: <20060601084334.720.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References: <bug-27827-12761@http.gcc.gnu.org/bugzilla/>
Subject: [Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3
In-Reply-To: <bug-27827-12761@http.gcc.gnu.org/bugzilla/>
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "uros at kss-loka dot si" <gcc-bugzilla@gcc.gnu.org>
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2006-06/txt/msg00027.txt.bz2
List-Id: <gcc-bugs.sourceware.org>


------- Comment #9 from uros at kss-loka dot si  2006-06-01 08:43 -------
Running the benchmark on a Pentium 4 3.2 GHz/800 MHz FSB (32-bit):

vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 9
cpu MHz         : 3191.917
cache size      : 512 KB

shows even more interesting results:

gcc version 3.4.6
vs.
gcc version 4.2.0 20060601 (experimental)

-fomit-frame-pointer -O -msse2 -mfpmath=sse

GCC 3.x     performance:
./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========

atlasmm       60   1000       0.162     2664.87

GCC 4.x     performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========

atlasmm       60   1000       0.164     2633.13

and

-fomit-frame-pointer -O -mfpmath=387

GCC 3.x     performance:
./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========

atlasmm       60   1000       0.160     2697.37

GCC 4.x     performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========

atlasmm       60   1000       0.164     2633.15

There is a small performance drop with gcc-4.x (about 1.2% with -mfpmath=sse
and about 2.4% with -mfpmath=387), but nothing critical.
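
For reference, the size of the drop can be worked out directly from the MFLOPS
columns above. This is only a small sketch: the figures are copied from the
tables in this comment, and the names (results, percent_drop, drops) are made
up for illustration.

```python
# MFLOPS figures copied from the benchmark tables above (Pentium 4, NB=60).
results = {
    "-mfpmath=sse": {"gcc3": 2664.87, "gcc4": 2633.13},
    "-mfpmath=387": {"gcc3": 2697.37, "gcc4": 2633.15},
}


def percent_drop(old, new):
    """Relative throughput loss of `new` versus `old`, in percent."""
    return (old - new) / old * 100.0


# Hypothetical helper table: one percentage per set of flags.
drops = {flags: percent_drop(r["gcc3"], r["gcc4"]) for flags, r in results.items()}

for flags, drop in drops.items():
    print(f"{flags}: gcc 4.x is {drop:.1f}% slower than gcc 3.x")
```

Both differences are small enough that run-to-run noise may account for part
of them; the TIME column is only reported to three decimal places.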

I can confirm that the code indeed runs >50% slower on a 64-bit Athlon. Perhaps
the problem is in the order of instructions (Software Optimization Guide for AMD
Athlon 64, Section 10.2). The gcc-3.4 code resembles that section's example of
how things should be done, while the gcc-4.2 code resembles its example of how
things should _NOT_ be done.

BTW: Did you try running the benchmark on the AMD target with -march=k8? The
effects of this flag are devastating on a Pentium 4 CPU:

-O -msse2 -mfpmath=sse -march=k8

GCC 3.x     performance:
./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========

atlasmm       60   1000       0.836      516.79

GCC 4.x     performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========

atlasmm       60   1000       0.287     1504.66
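
To put a number on "devastating", the -march=k8 results can be compared with
the earlier -mfpmath=sse runs on the same machine. This is a rough sketch only:
the figures are copied from the tables in this comment, the variable names are
illustrative, and the earlier runs also used -fomit-frame-pointer, so the two
sets of flags are not strictly identical.

```python
# MFLOPS on the Pentium 4, copied from the tables in this comment.
baseline = {"gcc3": 2664.87, "gcc4": 2633.13}  # -fomit-frame-pointer -O -msse2 -mfpmath=sse
with_k8 = {"gcc3": 516.79, "gcc4": 1504.66}    # -O -msse2 -mfpmath=sse -march=k8


def slowdown(before, after):
    """How many times lower the throughput `after` is than `before`."""
    return before / after


# Slowdown caused by -march=k8, per compiler.
factors = {c: slowdown(baseline[c], with_k8[c]) for c in baseline}
for compiler, factor in factors.items():
    print(f"{compiler}: {factor:.2f}x slower with -march=k8")

# With -march=k8 the ranking flips: gcc 4.x is the faster of the two.
gcc4_vs_gcc3 = with_k8["gcc4"] / with_k8["gcc3"]
print(f"gcc4 is {gcc4_vs_gcc3:.2f}x faster than gcc3 with -march=k8")
```

On these numbers gcc 3.x loses roughly a factor of five and gcc 4.x a bit under
a factor of two, so on this Pentium 4 the K8 tuning hurts the older compiler's
output far more.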


-- 

uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2006-06-01 08:43:34
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827