public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "whaley at cs dot utsa dot edu" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
Date: Sun, 06 Aug 2006 15:03:00 -0000	[thread overview]
Message-ID: <20060806150322.30274.qmail@sourceware.org> (raw)
In-Reply-To: <bug-27827-12761@http.gcc.gnu.org/bugzilla/>



------- Comment #36 from whaley at cs dot utsa dot edu  2006-08-06 15:03 -------
Paola,

Thanks for working on this.  We are making progres, but I have some mixed
results.  I timed the assemblies you provided directly.  I added a target
"asgexe" that builds the same benchmark, assuming assembly source instead of C
to make this more reproducable.  I ran on the Athlon-64X2, where your new
assembly ran *faster* than gcc 3 for double precision.  However, you still lost
for single precision.  I believe the reason is that you still have more
fmuls/fmull (fmul from memory) than does gcc 3:

>animal>fgrep -i fmuls smm_4.s | wc
>    240     480    4051
>animal>fgrep -i fmuls smm_asg.s | wc
>     60     120    1020
>animal>fgrep -i fmuls smm_3.s  | wc
>      0       0       0
>animal>fgrep -i fmull dmm_4.s | wc
>    100     200    1739
>animal>fgrep -i fmull dmm_asg.s | wc
>     20      40     360
>animal>fgrep -i fmuls dmm_3.s | wc
>      0       0       0


I haven't really scoped out the dmm diff, but in single prec anyway, these
dreaded fmuls are in the inner loop, and this is probably why you are still
losing.  I'm guessing your peephole is missing some cases, and for some reason
is missing more under single.  Any ideas?

As for your assembly actually beating gcc 3 for double, my guess is that it is
some other optimization that gcc 4 has, and you will beat by even more once the
final fmull are removed . . .

On the P4e, your double precision code is faster than stock gcc 4, but still
slower than gcc3.  again, I suspect the remaining fmull.  Then comes the thing
I cannot explain at all.  Your single precision results are horrible.  gcc 3
gets 1991MFLOPS, gcc 4 gets 1664, and the assembly you sent gets 34!  No chance
the mixed fld/fmuls is causing stack overflow, I guess?  I think this might
account for such a catastrophic drop  . . .  That's about the only WAG I've got
for this behavior.

Anyway, I think the first order of business may be to get your peephole to
grabbing all the cases, and see if that makes you win everywhere on Athlon, and
if it makes single precision P4e better, and we can go from there . . .

If you do that, attach the assemblies  again, and I'll redo timings.  Also, if
you could attach (not put in comment) the patch, it'd be nice to get the
compiler, so I could test x86-64 code on Athlon, etc.

Thanks,
Clint


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827


  parent reply	other threads:[~2006-08-06 15:03 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-05-31  0:33 [Bug rtl-optimization/27827] New: " hiclint at gmail dot com
2006-05-31  0:35 ` [Bug rtl-optimization/27827] " pinskia at gcc dot gnu dot org
2006-05-31  0:36 ` hiclint at gmail dot com
2006-05-31  0:42 ` [Bug target/27827] " pinskia at gcc dot gnu dot org
2006-05-31  0:50 ` hiclint at gmail dot com
2006-05-31  0:55 ` pinskia at gcc dot gnu dot org
2006-05-31  1:09 ` whaley at cs dot utsa dot edu
2006-05-31 10:57 ` uros at kss-loka dot si
2006-05-31 14:13 ` whaley at cs dot utsa dot edu
2006-06-01  8:43 ` uros at kss-loka dot si
2006-06-01 16:03 ` whaley at cs dot utsa dot edu
2006-06-01 16:26 ` whaley at cs dot utsa dot edu
2006-06-01 18:43 ` whaley at cs dot utsa dot edu
2006-06-07 22:39 ` whaley at cs dot utsa dot edu
2006-06-14  3:04 ` whaley at cs dot utsa dot edu
2006-06-24 18:11 ` whaley at cs dot utsa dot edu
2006-06-24 19:13 ` rguenth at gcc dot gnu dot org
2006-06-25 13:35 ` whaley at cs dot utsa dot edu
2006-06-25 23:05 ` rguenth at gcc dot gnu dot org
2006-06-26  1:12 ` whaley at cs dot utsa dot edu
2006-06-26  7:53 ` uros at kss-loka dot si
2006-06-26 16:02 ` whaley at cs dot utsa dot edu
2006-06-27  6:05 ` uros at kss-loka dot si
2006-06-27 14:37 ` whaley at cs dot utsa dot edu
2006-06-27 17:47 ` whaley at cs dot utsa dot edu
2006-06-28 17:37 ` [Bug target/27827] [4.0/4.1/4.2 Regression] " steven at gcc dot gnu dot org
2006-06-28 20:18 ` whaley at cs dot utsa dot edu
2006-06-29  4:18 ` hjl at lucon dot org
2006-06-29  6:43 ` whaley at cs dot utsa dot edu
2006-07-04 13:15 ` whaley at cs dot utsa dot edu
2006-07-05 17:55 ` mmitchel at gcc dot gnu dot org
2006-08-04  7:46 ` bonzini at gnu dot org
2006-08-04 16:24 ` whaley at cs dot utsa dot edu
2006-08-05  7:21 ` bonzini at gnu dot org
2006-08-05 14:24 ` whaley at cs dot utsa dot edu
2006-08-05 17:16 ` bonzini at gnu dot org
2006-08-05 18:26 ` whaley at cs dot utsa dot edu
2006-08-06 15:03 ` whaley at cs dot utsa dot edu [this message]
2006-08-07  6:19 ` [Bug target/27827] [4.0/4.1 " bonzini at gnu dot org
2006-08-07 15:32 ` whaley at cs dot utsa dot edu
2006-08-07 16:47 ` whaley at cs dot utsa dot edu
2006-08-07 16:58 ` paolo dot bonzini at lu dot unisi dot ch
2006-08-07 17:19 ` whaley at cs dot utsa dot edu
2006-08-07 18:19 ` paolo dot bonzini at lu dot unisi dot ch
2006-08-07 20:35 ` dorit at il dot ibm dot com
2006-08-07 21:57 ` whaley at cs dot utsa dot edu
2006-08-08  2:59 ` whaley at cs dot utsa dot edu
2006-08-08  6:15 ` hubicka at gcc dot gnu dot org
2006-08-08  6:28   ` Jan Hubicka
2006-08-08  6:29 ` hubicka at ucw dot cz
2006-08-08  7:05 ` paolo dot bonzini at lu dot unisi dot ch
2006-08-08 16:44 ` whaley at cs dot utsa dot edu
2006-08-08 18:36 ` whaley at cs dot utsa dot edu
2006-08-09  4:34 ` paolo dot bonzini at lu dot unisi dot ch
2006-08-09 14:33 ` whaley at cs dot utsa dot edu
2006-08-09 15:52 ` whaley at cs dot utsa dot edu
2006-08-09 16:08 ` whaley at cs dot utsa dot edu
2006-08-09 19:10   ` Dorit Nuzman
2006-08-09 19:10 ` dorit at il dot ibm dot com
2006-08-09 21:33 ` whaley at cs dot utsa dot edu
2006-08-09 21:46   ` Andrew Pinski
2006-08-09 21:46 ` pinskia at physics dot uc dot edu
2006-08-09 23:02 ` whaley at cs dot utsa dot edu
2006-08-10  6:52 ` paolo dot bonzini at lu dot unisi dot ch
2006-08-10 14:08 ` whaley at cs dot utsa dot edu
2006-08-10 14:29 ` paolo dot bonzini at lu dot unisi dot ch
2006-08-10 15:16 ` whaley at cs dot utsa dot edu
2006-08-10 15:22 ` paolo dot bonzini at lu dot unisi dot ch
2006-08-11  9:19 ` uros at kss-loka dot si
2006-08-11 13:26 ` bonzini at gcc dot gnu dot org
2006-08-11 14:10 ` [Bug target/27827] [4.0 " bonzini at gnu dot org
2006-08-11 15:22 ` whaley at cs dot utsa dot edu
2006-08-23 10:36 ` oliver dot jennrich at googlemail dot com
2006-10-07 10:06 ` steven at gcc dot gnu dot org
2007-02-13  2:59 ` pinskia at gcc dot gnu dot org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060806150322.30274.qmail@sourceware.org \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).