[Bug rtl-optimization/29874] gcc-4.1.1 generates consistently worse performming SSE code than gcc-3.4.6

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/29874] gcc-4.1.1 generates consistently worse performming SSE code than gcc-3.4.6
       [not found] <bug-29874-4@http.gcc.gnu.org/bugzilla/>
@ 2011-03-05 18:03 ` rguenth at gcc dot gnu.org
  2011-03-07 23:14 ` stevenj at alum dot mit.edu
  2011-03-08 10:03 ` rguenth at gcc dot gnu.org
  2 siblings, 0 replies; 3+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-03-05 18:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29874

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-05 18:02:49 UTC ---
There is no testcase attached and, well, a lot of time has passed.  Let's assume
this is fixed.  If not, please open a new bug report with a proper testcase.



* [Bug rtl-optimization/29874] gcc-4.1.1 generates consistently worse performming SSE code than gcc-3.4.6
From: stevenj at alum dot mit.edu @ 2011-03-07 23:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29874

--- Comment #2 from stevenj at alum dot mit.edu 2011-03-07 23:13:41 UTC ---
Created attachment 23579
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23579
benchmark extracted from FFTW3 - size 64 FFT with SSE2

I extracted a little benchmark of a size-64 FFT using double-precision SSE2
from FFTW3; this is a hard-coded (program-generated) routine specifically for
transforms of that size, and is usually a good test of the optimizer.

I played around with the compiler flags a bit, but plain "-O3" seems to be
about as good as anything, i.e.:
      gcc -O3 n1fv_64.c -o n1fv_64

I then ran a few timing tests on my Debian/x86_64 box (2.83GHz Intel Xeon
E5440), with the command:
      (for n in `seq 1 40`; do time ./n1fv_64; done) 2>&1 |grep user |sort
to time it 40 times, keeping only the fastest result to reduce random
variation.  The results seemed pretty repeatable.
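
A minimal sketch of how the fastest run could be picked out of that output
automatically.  It is not part of the original measurement: the helper name
min_user_seconds is hypothetical, and it assumes the "user<TAB>0m0.208s" line
format that bash's `time` keyword prints by default.  Converting to seconds
before comparing avoids relying on a textual sort of the "XmY.YYYs" strings.

```shell
# Hypothetical helper: read `time` output on stdin and print the smallest
# "user" value, converted to seconds.
min_user_seconds() {
    awk '$1 == "user" {
        split($2, t, /[ms]/)          # "0m0.208s" -> t[1]="0", t[2]="0.208"
        s = t[1] * 60 + t[2]          # minutes and seconds -> seconds
        if (min == "" || s < min) min = s
    } END { if (min != "") printf "%.3f\n", min }'
}
```

Under bash it could be used as
`(for n in $(seq 1 40); do time ./n1fv_64; done) 2>&1 | min_user_seconds`.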

Results:
   gcc 3.4.6:   0m0.208s
   gcc 4.1.3:   0m0.216s
   gcc 4.3.2:   0m0.232s

So, there does seem to be a small but definite slowdown.  I haven't tried gcc
4.4 or 4.5, since they are not installed on this box, but it seems worthwhile
for someone to try them.
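
To put a number on the slowdown, here is a small sketch that expresses one
timing as a percentage change relative to a baseline.  The helper name
slowdown_pct is hypothetical; the inputs in the examples are the gcc 3.4.6,
4.1.3 and 4.3.2 times listed above.

```shell
# Hypothetical helper: percentage by which OTHER is slower than BASELINE.
# usage: slowdown_pct BASELINE_SECONDS OTHER_SECONDS
slowdown_pct() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (t / base - 1) * 100 }'
}

slowdown_pct 0.208 0.216   # gcc 4.1.3 vs 3.4.6 -> 3.8
slowdown_pct 0.208 0.232   # gcc 4.3.2 vs 3.4.6 -> 11.5
```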



* [Bug rtl-optimization/29874] gcc-4.1.1 generates consistently worse performming SSE code than gcc-3.4.6
From: rguenth at gcc dot gnu.org @ 2011-03-08 10:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29874

--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-08 10:03:22 UTC ---
I raised the number of FFTs to 10000000 and get

       -O2   -O3   -O3 -ffast-math   -O3 -ffast-math -funroll-loops
3.3-H  7.32  7.47  7.48              7.39
4.1    7.21  7.22  7.18              7.21
4.3    7.21  7.20  7.20              7.34
4.5    7.27  7.27  7.21              7.34
4.6    7.09  7.06  7.01              7.16

I don't have a 64-bit 3.4 compiler handy, but 3.3-H is the hammer branch, so it
should be close to 3.4.

Thus I can't reproduce the slowdown (but I don't have a real 3.4) and 4.6
looks promising here.  The generated code looks quite good, though we still
have some stack spills left (not sure if due to required temporaries).
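
As a rough sketch of how the table above can be summarized, the following
picks the best time per compiler across the four flag sets.  The helper name
best_per_row is hypothetical, and the rows are copied from the table with the
header line dropped.

```shell
# Hypothetical helper: for each "compiler t1 t2 ..." row on stdin, print the
# compiler name and its smallest timing.
best_per_row() {
    awk '{
        best = $2
        for (i = 3; i <= NF; i++) if ($i + 0 < best + 0) best = $i
        print $1, best
    }'
}

best_per_row <<'EOF'
3.3-H  7.32  7.47  7.48  7.39
4.1    7.21  7.22  7.18  7.21
4.3    7.21  7.20  7.20  7.34
4.5    7.27  7.27  7.21  7.34
4.6    7.09  7.06  7.01  7.16
EOF
```

On this data the smallest value overall is 7.01, for 4.6 with -O3 -ffast-math.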

ICC 12.0 does not manage to come close to the above performance; the
best I found was -fast -xHOST, which makes the benchmark take 7.30.

