[Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3

public inbox for gcc-bugs@sourceware.org

From: "whaley at cs dot utsa dot edu" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/27827] gcc 4 produces worse x87 code on all platforms than gcc 3
Date: Thu, 01 Jun 2006 18:43:00 -0000
Message-ID: <20060601184346.28923.qmail@sourceware.org>
In-Reply-To: <bug-27827-12761@http.gcc.gnu.org/bugzilla/>

------- Comment #12 from whaley at cs dot utsa dot edu  2006-06-01 18:43 -------
Subject: Re:  gcc 4 produces worse x87 code on all platforms than gcc 3

Uros,

>gcc version 3.4.6
>vs.
>gcc version 4.2.0 20060601 (experimental)
>
>-fomit-frame-pointer -O -msse2 -mfpmath=sse
>There is a small performance drop on gcc-4.x, but nothing critical.
>I can confirm, that code indeed runs >50% slower on 64bit athlon. Perhaps the
>problem is in the order of instructions (Software Optimization Guide for AMD
>Athlon 64, Section 10.2). The gcc-3.4 code looks similar to the example, how
>things should be, and gcc-4.2 code looks similar to the example, how things
>should _NOT_ be.

Thanks for looking into this!  However, I am indeed aware that by using SSE2
you can get the double-precision results fairly close to the x87 on most
platforms.  In fact, you can get gcc 4.1-sse within a few % of gcc 3-x87 on
the Athlon 64 as well, by changing the kernel you feed gcc and giving it
these flags:
   -march=athlon64 -O2 -mfpmath=sse -msse -msse2 -m64 \
   -ftree-vectorize -fargument-noalias-global
(-ftree-vectorize doesn't actually make it vectorize, but I throw the flag in
for future hope :)

Now, sometimes you want to use the x87 unit because of its superior precision,
but the real problem with the approach of "ignore the x87 performance and
just use SSE" comes in single precision.  The performance of the best
kernel found by ATLAS in single precision using gcc 4.1-sse is roughly half
of that of using the x87 unit on an Athlon 64, and 80% on a P4e.  One reason
they are closer on the P4e is that the P4e's x87 peak is 1/2 that of the
Athlon (AMD machines can do 2 flops/cycle using the x87, whereas Intel
machines can do only 1), so there's not as large a gap between excellent and
not-so-excellent kernels.  My guess (and it's only a guess) for the reason
scalar double-precision SSE can compete and single cannot comes down to the
cost of doing scalar loads and stores.  In double, you can use movlpd instead
of movsd for a low-overhead vector load, but in single you must use movss, and
since movss is much more expensive than fld, scalar SSE always blows in
comparison to x87 . . .
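
For illustration only, here is a minimal sketch of the kind of scalar
single-precision kernel this comparison is about.  It is not the ATLAS
kernel: the name sdot and the two-accumulator unrolling are assumptions made
up for this example.  Each x[i]/y[i] access is a scalar load, which is where
a compiler would typically emit fld for x87 code or movss for scalar SSE
code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar single-precision dot product (not the ATLAS kernel).
 * To compare code generation, the same source can be built two ways, e.g.:
 *   gcc -O2 -m32 -mfpmath=387 -c sdot.c
 *   gcc -O2 -m32 -msse -mfpmath=sse -c sdot.c
 * and the generated assembly (gcc -S) inspected for the scalar loads.
 */
float sdot(size_t n, const float *x, const float *y)
{
    float acc0 = 0.0f, acc1 = 0.0f;   /* two independent accumulators */
    size_t i;

    assert(n == 0 || (x != NULL && y != NULL));

    /* Main loop: two multiply-adds per iteration, four scalar loads. */
    for (i = 0; i + 1 < n; i += 2) {
        acc0 += x[i]     * y[i];
        acc1 += x[i + 1] * y[i + 1];
    }
    /* Cleanup for odd n. */
    if (i < n)
        acc0 += x[i] * y[i];

    return acc0 + acc1;
}
```

Timing a loop like this under both -mfpmath settings is one way to look for
the single-precision gap described above; the actual numbers will depend on
the machine and on the gcc version.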

So, that's why my error report concentrated on "x87 performance".  I submitted
it in double precision because I had a preexisting Makefile/source
demonstrating the performance problem from a prior bug report (bugzilla 4991).
I think we should not blow off the x87 performance even if SSE *were*
competitive, because there are times when the x87 is better.  However, in
single precision, scalar SSE is not competitive, at least on the platforms I
have tried.  If you guys are planning on deprecating the x87 unit once SSE is
competitive on modern machines, I can certainly rework the tarfile to send you
a single-precision benchmark, so you can see the sse/x87 performance gap
yourself.  Let me know if you want this, as I'll need to do a bit of extra
work.

Thanks,
Clint

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
