public inbox for gcc-bugs@sourceware.org
* [Bug c/34682]  New: 70% slowdown with SSE enabled
@ 2008-01-05 23:06 rootkit85 at yahoo dot it
  2008-01-05 23:08 ` [Bug c/34682] " rootkit85 at yahoo dot it
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: rootkit85 at yahoo dot it @ 2008-01-05 23:06 UTC (permalink / raw)
  To: gcc-bugs

I have a piece of code that runs 70% slower with SSE enabled than with plain
387 on a dual-CPU Xeon system.
I'm not an optimization fanatic, but since -mfpmath=sse is enabled by default
on amd64, this could cause huge performance losses when building amd64 binaries
for this CPU.

The runlog is:

[aguy@enc1 ~]$ uname -a
FreeBSD enc1 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 12 11:05:30 UTC 2007  
  root@dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP  
[aguy@enc1 ~]$ gcc42 -v
Using built-in specs.
Target: i386-portbld-freebsd6.2
Configured with: ./..//gcc-4.2-20071024/configure --disable-nls
--with-system-zlib --with-libiconv-prefix=/usr/local --with-gmp=/usr/local
--program-suffix=42 --libdir=/usr/local/lib/gcc-4.2.3
--with-gxx-include-dir=/usr/local/lib/gcc-4.2.3/include/c++/ --disable-rpath
--prefix=/usr/local --mandir=/usr/local/man --infodir=/usr/local/info/gcc42
i386-portbld-freebsd6.2
Thread model: posix
gcc version 4.2.3 20071024 (prerelease)
[aguy@enc1 ~]$ gcc42 ssucks.c -O2 -march=prescott -o ssucks-387
[aguy@enc1 ~]$ gcc42 ssucks.c -O2 -march=prescott -o ssucks-sse -mfpmath=sse
[aguy@enc1 ~]$ ssucks-387 ; ssucks-sse

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0147    953.0052
     2     -1.4166e-13      0.0061   1149.6845

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0146    960.7945
     2     -1.4166e-13      0.0281    249.3171
[aguy@enc1 ~]$

1149.6845 vs 249.3171: a ~78% slowdown by just enabling sse

I have source, assembled files and runlog online here:
http://teknoraver.campuslife.it/software/gcc-sse/

Cheers,
Matteo Croce


-- 
           Summary: 70% slowdown with SSE enabled
           Product: gcc
           Version: 4.2.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rootkit85 at yahoo dot it


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug c/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
@ 2008-01-05 23:08 ` rootkit85 at yahoo dot it
  2008-01-05 23:11 ` rootkit85 at yahoo dot it
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rootkit85 at yahoo dot it @ 2008-01-05 23:08 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rootkit85 at yahoo dot it  2008-01-05 21:31 -------
Created an attachment (id=14882)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14882&action=view)
the source



* [Bug c/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
  2008-01-05 23:08 ` [Bug c/34682] " rootkit85 at yahoo dot it
@ 2008-01-05 23:11 ` rootkit85 at yahoo dot it
  2008-01-05 23:14 ` rootkit85 at yahoo dot it
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rootkit85 at yahoo dot it @ 2008-01-05 23:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rootkit85 at yahoo dot it  2008-01-05 21:31 -------
Created an attachment (id=14883)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14883&action=view)
the source compiled with -mfpmath=387



* [Bug c/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
  2008-01-05 23:08 ` [Bug c/34682] " rootkit85 at yahoo dot it
  2008-01-05 23:11 ` rootkit85 at yahoo dot it
@ 2008-01-05 23:14 ` rootkit85 at yahoo dot it
  2008-01-06 12:53 ` [Bug target/34682] " rguenth at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rootkit85 at yahoo dot it @ 2008-01-05 23:14 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rootkit85 at yahoo dot it  2008-01-05 21:32 -------
Created an attachment (id=14884)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14884&action=view)
the source compiled with -mfpmath=sse



* [Bug target/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
                   ` (2 preceding siblings ...)
  2008-01-05 23:14 ` rootkit85 at yahoo dot it
@ 2008-01-06 12:53 ` rguenth at gcc dot gnu dot org
  2008-01-07 13:12 ` ubizjak at gmail dot com
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-01-06 12:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2008-01-06 12:18 -------
Please narrow down the particular loop in your testcase that gets slower.  It
looks like the testsuite measures several things.



* [Bug target/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
                   ` (3 preceding siblings ...)
  2008-01-06 12:53 ` [Bug target/34682] " rguenth at gcc dot gnu dot org
@ 2008-01-07 13:12 ` ubizjak at gmail dot com
  2008-01-07 14:34 ` ubizjak at gmail dot com
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2008-01-07 13:12 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from ubizjak at gmail dot com  2008-01-07 12:19 -------
Confirmed by the following testcase:

--cut here--
#include <stdio.h>

void __attribute__((noinline))
dtime (void) 
{
  __asm__ __volatile__ ("" : : : "memory");
}

double sa, sb, sc, sd;
double one, two, four, five;
double piref, piprg, pierr;

int
main (int argc, char *argv[])
{
  double s, u, v, w, x;

  long i, m;

  piref = 3.14159265358979324;
  one = 1.0;
  two = 2.0;
  four = 4.0;
  five = 5.0;

  m = 512000000;

  dtime();

  s = -five;
  sa = -one;

  dtime();

  for (i = 1; i <= m; i++)
    {
      s = -s;
      sa = sa + s;
    }

  dtime();

  sc = (double) m;

  u = sa;
  v = 0.0;
  w = 0.0;
  x = 0.0;

  dtime();

  for (i = 1; i <= m; i++)
    {
      s = -s;
      sa = sa + s;
      u = u + two;
      x = x + (s - u);
      v = v - s * u;
      w = w + s / u;
    }

  dtime();

  m = (long) (sa * x / sc);
  sa = four * w / five;
  sb = sa + five / v;
  sc = 31.25;
  piprg = sb - sc / (v * v * v);
  pierr = piprg - piref;

  printf ("%13.4le\n", pierr);
  return 0;
}
--cut here--

.L5:
        xorb    $-128, -17(%ebp)        #, s
        addl    $1, %eax        #, i.65
        addsd   %xmm4, %xmm1    # two.16, u
        cmpl    $512000001, %eax        #, i.65
        movsd   -24(%ebp), %xmm0        # s, tmp90
        addsd   -24(%ebp), %xmm2        # s, sa_lsm.48
        mulsd   %xmm1, %xmm0    # u, tmp90
        subsd   %xmm0, %xmm3    # tmp90, v
        movsd   -24(%ebp), %xmm0        # s, tmp91
        divsd   %xmm1, %xmm0    # u, tmp91
        addsd   -16(%ebp), %xmm0        # w, tmp91
        movsd   %xmm0, -16(%ebp)        # tmp91, w
        jne     .L5     #,


It is to some extent tolerable that "s" and "w" are not kept in
registers due to non-existent live range splitting (PR 23322). The main problem
here is that the sign of "s" is changed in memory using an (unaligned) xorb
insn. The same situation occurs in the first (shorter) loop:

.L4:
        xorb    $-128, -17(%ebp)        #, s
        addl    $1, %eax        #, i
        cmpl    $512000001, %eax        #, i
        addsd   -24(%ebp), %xmm0        # s, sa_lsm.97
        jne     .L4     #,


The performance regression is caused by a partial memory stall [1].

[1] Agner Fog: How to optimize for the Pentium family of microprocessors,
section 14.7


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-01-07 12:19:54
               date|                            |



* [Bug target/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
                   ` (4 preceding siblings ...)
  2008-01-07 13:12 ` ubizjak at gmail dot com
@ 2008-01-07 14:34 ` ubizjak at gmail dot com
  2008-01-07 14:58 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2008-01-07 14:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from ubizjak at gmail dot com  2008-01-07 14:02 -------
Patch in testing.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |ubizjak at gmail dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2008-01-07 12:19:54         |2008-01-07 14:02:46
               date|                            |



* [Bug target/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
                   ` (5 preceding siblings ...)
  2008-01-07 14:34 ` ubizjak at gmail dot com
@ 2008-01-07 14:58 ` ubizjak at gmail dot com
  2008-01-07 20:00 ` rootkit85 at yahoo dot it
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2008-01-07 14:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from ubizjak at gmail dot com  2008-01-07 14:09 -------
Patched gcc:

387:


   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -8.1208e-11      0.0128   1094.6170
     2     -1.5485e-13      0.0061   1145.7086

SSE:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0114   1227.3206
     2     -1.4166e-13      0.0050   1399.9125

   [ 2     -1.4166e-13      0.0269    260.2975 ]

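So, with the patch, SSE module 2 is about 5.4x faster than the unpatched
result shown in brackets (1399.9125 vs. 260.2975 MFLOPS).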



* [Bug target/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
                   ` (7 preceding siblings ...)
  2008-01-07 20:00 ` rootkit85 at yahoo dot it
@ 2008-01-07 20:00 ` rootkit85 at yahoo dot it
  2008-01-07 20:04 ` rootkit85 at yahoo dot it
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rootkit85 at yahoo dot it @ 2008-01-07 20:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from rootkit85 at yahoo dot it  2008-01-07 19:47 -------
Created an attachment (id=14895)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14895&action=view)
minimal testcase



* [Bug target/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
                   ` (6 preceding siblings ...)
  2008-01-07 14:58 ` ubizjak at gmail dot com
@ 2008-01-07 20:00 ` rootkit85 at yahoo dot it
  2008-01-07 20:00 ` rootkit85 at yahoo dot it
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rootkit85 at yahoo dot it @ 2008-01-07 20:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from rootkit85 at yahoo dot it  2008-01-07 19:47 -------
Created an attachment (id=14896)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14896&action=view)
minimal testcase, compiled with -mfpmath=387



* [Bug target/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
                   ` (8 preceding siblings ...)
  2008-01-07 20:00 ` rootkit85 at yahoo dot it
@ 2008-01-07 20:04 ` rootkit85 at yahoo dot it
  2008-01-07 20:05 ` rootkit85 at yahoo dot it
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rootkit85 at yahoo dot it @ 2008-01-07 20:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from rootkit85 at yahoo dot it  2008-01-07 19:47 -------
Created an attachment (id=14897)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14897&action=view)
minimal testcase, compiled with -mfpmath=sse



* [Bug target/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
                   ` (9 preceding siblings ...)
  2008-01-07 20:04 ` rootkit85 at yahoo dot it
@ 2008-01-07 20:05 ` rootkit85 at yahoo dot it
  2008-01-07 20:48 ` uros at gcc dot gnu dot org
  2008-01-07 21:03 ` ubizjak at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: rootkit85 at yahoo dot it @ 2008-01-07 20:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from rootkit85 at yahoo dot it  2008-01-07 19:49 -------
very very minimal testcase added



* [Bug target/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
                   ` (10 preceding siblings ...)
  2008-01-07 20:05 ` rootkit85 at yahoo dot it
@ 2008-01-07 20:48 ` uros at gcc dot gnu dot org
  2008-01-07 21:03 ` ubizjak at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: uros at gcc dot gnu dot org @ 2008-01-07 20:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from uros at gcc dot gnu dot org  2008-01-07 20:07 -------
Subject: Bug 34682

Author: uros
Date: Mon Jan  7 20:06:34 2008
New Revision: 131381

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131381
Log:
        PR target/34682
        * config/i386/i386.md (neg<mode>2): Rename from negsf2, negdf2 and
        negxf2.  Macroize expander using X87MODEF mode iterator.  Change
        predicates of op0 and op1 to register_operand.
        (abs<mode>2): Rename from abssf2, absdf2 and absxf2.  Macroize expander
        using X87MODEF mode iterator.  Change predicates of op0 and op1 to
        register_operand.
        ("*absneg<mode>2_mixed", "*absneg<mode>2_sse"): Rename from
        corresponding patterns and macroize using MODEF macro.  Change
        predicates of op0 and op1 to register_operand and remove
        "m" constraint. Disparage "r" alternative with "!".
        ("*absneg<mode>2_i387"): Rename from corresponding patterns and
        macroize using X87MODEF macro.  Change predicates of op0 and op1
        to register_operand and remove "m" constraint.  Disparage "r"
        alternative with "!".
        (absneg splitter with memory operands): Remove.
        ("*neg<mode>2_1", "*abs<mode>2_1"): Rename from corresponding
        patterns and macroize using X87MODEF mode iterator.
        * config/i386/sse.md (negv4sf2, absv4sf2, negv2df2, absv2df2):
        Change predicate of op1 to register_operand.
        * config/i386/i386.c (ix86_expand_fp_absneg_operator): Remove support
        for memory operands.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/sse.md



* [Bug target/34682] 70% slowdown with SSE enabled
  2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
                   ` (11 preceding siblings ...)
  2008-01-07 20:48 ` uros at gcc dot gnu dot org
@ 2008-01-07 21:03 ` ubizjak at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: ubizjak at gmail dot com @ 2008-01-07 21:03 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from ubizjak at gmail dot com  2008-01-07 20:10 -------
Fixed in SVN.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|http://teknoraver.campuslife|http://gcc.gnu.org/ml/gcc-
                   |.it/software/gcc-sse/       |patches/2008-
                   |                            |01/msg00254.html
             Status|ASSIGNED                    |RESOLVED
           Keywords|                            |ssemmx
         Resolution|                            |FIXED
   Target Milestone|---                         |4.3.0

