public inbox for gcc-bugs@sourceware.org
* [Bug c/34682] New: 70% slowdown with SSE enabled
@ 2008-01-05 23:06 rootkit85 at yahoo dot it
2008-01-05 23:08 ` [Bug c/34682] " rootkit85 at yahoo dot it
` (12 more replies)
0 siblings, 13 replies; 14+ messages in thread
From: rootkit85 at yahoo dot it @ 2008-01-05 23:06 UTC (permalink / raw)
To: gcc-bugs
I have a piece of code that runs 70% slower with SSE enabled than with plain
387 on a Dual CPU Xeon system.
I'm not an optimization fanatic, but since -mfpmath=sse is enabled by default
on amd64, this could cause huge performance losses when building amd64 binaries
for this CPU.
The runlog is:
[aguy@enc1 ~]$ uname -a
FreeBSD enc1 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 12 11:05:30 UTC 2007
root@dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP
[aguy@enc1 ~]$ gcc42 -v
Using built-in specs.
Target: i386-portbld-freebsd6.2
Configured with: ./..//gcc-4.2-20071024/configure --disable-nls
--with-system-zlib --with-libiconv-prefix=/usr/local --with-gmp=/usr/local
--program-suffix=42 --libdir=/usr/local/lib/gcc-4.2.3
--with-gxx-include-dir=/usr/local/lib/gcc-4.2.3/include/c++/ --disable-rpath
--prefix=/usr/local --mandir=/usr/local/man --infodir=/usr/local/info/gcc42
i386-portbld-freebsd6.2
Thread model: posix
gcc version 4.2.3 20071024 (prerelease)
[aguy@enc1 ~]$ gcc42 ssucks.c -O2 -march=prescott -o ssucks-387
[aguy@enc1 ~]$ gcc42 ssucks.c -O2 -march=prescott -o ssucks-sse -mfpmath=sse
[aguy@enc1 ~]$ ssucks-387 ; ssucks-sse
   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0147    953.0052
     2     -1.4166e-13      0.0061   1149.6845

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0146    960.7945
     2     -1.4166e-13      0.0281    249.3171
[aguy@enc1 ~]$
1149.6845 vs 249.3171: a ~78% slowdown by just enabling sse
I have source, assembled files and runlog online here:
http://teknoraver.campuslife.it/software/gcc-sse/
Cheers,
Matteo Croce
--
Summary: 70% slowdown with SSE enabled
Product: gcc
Version: 4.2.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rootkit85 at yahoo dot it
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug c/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-05 23:08 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rootkit85 at yahoo dot it 2008-01-05 21:31 -------
Created an attachment (id=14882)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14882&action=view)
the source
* [Bug c/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-05 23:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from rootkit85 at yahoo dot it 2008-01-05 21:31 -------
Created an attachment (id=14883)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14883&action=view)
the source compiled with -mfpmath=387
* [Bug c/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-05 23:14 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from rootkit85 at yahoo dot it 2008-01-05 21:32 -------
Created an attachment (id=14884)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14884&action=view)
the source compiled with -mfpmath=sse
* [Bug target/34682] 70% slowdown with SSE enabled
From: rguenth at gcc dot gnu dot org @ 2008-01-06 12:53 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from rguenth at gcc dot gnu dot org 2008-01-06 12:18 -------
Please narrow down the particular loop in your testcase that gets slower. It
looks like the testsuite measures several things.
* [Bug target/34682] 70% slowdown with SSE enabled
From: ubizjak at gmail dot com @ 2008-01-07 13:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from ubizjak at gmail dot com 2008-01-07 12:19 -------
Confirmed by the following testcase:
--cut here--
#include <stdio.h>

void __attribute__((noinline))
dtime (void)
{
  __asm__ __volatile__ ("" : : : "memory");
}

double sa, sb, sc, sd;
double one, two, four, five;
double piref, piprg, pierr;

int
main (int argc, char *argv[])
{
  double s, u, v, w, x;
  long i, m;

  piref = 3.14159265358979324;
  one = 1.0;
  two = 2.0;
  four = 4.0;
  five = 5.0;
  m = 512000000;

  dtime();
  s = -five;
  sa = -one;

  dtime();
  for (i = 1; i <= m; i++)
    {
      s = -s;
      sa = sa + s;
    }

  dtime();
  sc = (double) m;
  u = sa;
  v = 0.0;
  w = 0.0;
  x = 0.0;

  dtime();
  for (i = 1; i <= m; i++)
    {
      s = -s;
      sa = sa + s;
      u = u + two;
      x = x + (s - u);
      v = v - s * u;
      w = w + s / u;
    }

  dtime();
  m = (long) (sa * x / sc);
  sa = four * w / five;
  sb = sa + five / v;
  sc = 31.25;
  piprg = sb - sc / (v * v * v);
  pierr = piprg - piref;

  printf ("%13.4le\n", pierr);
  return 0;
}
--cut here--
.L5:
xorb $-128, -17(%ebp) #, s
addl $1, %eax #, i.65
addsd %xmm4, %xmm1 # two.16, u
cmpl $512000001, %eax #, i.65
movsd -24(%ebp), %xmm0 # s, tmp90
addsd -24(%ebp), %xmm2 # s, sa_lsm.48
mulsd %xmm1, %xmm0 # u, tmp90
subsd %xmm0, %xmm3 # tmp90, v
movsd -24(%ebp), %xmm0 # s, tmp91
divsd %xmm1, %xmm0 # u, tmp91
addsd -16(%ebp), %xmm0 # w, tmp91
movsd %xmm0, -16(%ebp) # tmp91, w
jne .L5 #,
It is somewhat tolerable that "s" and "w" are not kept in registers, due to
non-existent live range splitting (PR 23322); the main problem here is that the
sign of "s" is changed in memory by an (unaligned) byte-wide xorb insn, and the
value is then read back by the full-width movsd/addsd loads from -24(%ebp).
The same situation occurs in the first (shorter) loop:
.L4:
xorb $-128, -17(%ebp) #, s
addl $1, %eax #, i
cmpl $512000001, %eax #, i
addsd -24(%ebp), %xmm0 # s, sa_lsm.97
jne .L4 #,
The performance regression is caused by a partial memory stall [1].
[1] Agner Fog: How to optimize for the Pentium family of microprocessors,
section 14.7
--
ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-01-07 12:19:54
               date|                            |
* [Bug target/34682] 70% slowdown with SSE enabled
From: ubizjak at gmail dot com @ 2008-01-07 14:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from ubizjak at gmail dot com 2008-01-07 14:02 -------
Patch in testing.
--
ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |ubizjak at gmail dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2008-01-07 12:19:54         |2008-01-07 14:02:46
               date|                            |
* [Bug target/34682] 70% slowdown with SSE enabled
From: ubizjak at gmail dot com @ 2008-01-07 14:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from ubizjak at gmail dot com 2008-01-07 14:09 -------
Patched gcc:
387:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -8.1208e-11      0.0128   1094.6170
     2     -1.5485e-13      0.0061   1145.7086

SSE:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0114   1227.3206
     2     -1.4166e-13      0.0050   1399.9125
   [ 2     -1.4166e-13      0.0269    260.2975 ]
So, about 5.4x faster on module 2 (1399.9125 vs. 260.2975 MFLOPS).
* [Bug target/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-07 20:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from rootkit85 at yahoo dot it 2008-01-07 19:47 -------
Created an attachment (id=14895)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14895&action=view)
minimal testcase
* [Bug target/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-07 20:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from rootkit85 at yahoo dot it 2008-01-07 19:47 -------
Created an attachment (id=14896)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14896&action=view)
minimal testcase, compiled with -mfpmath=387
* [Bug target/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-07 20:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from rootkit85 at yahoo dot it 2008-01-07 19:47 -------
Created an attachment (id=14897)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14897&action=view)
minimal testcase, compiled with -mfpmath=sse
* [Bug target/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-07 20:05 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from rootkit85 at yahoo dot it 2008-01-07 19:49 -------
very very minimal testcase added
* [Bug target/34682] 70% slowdown with SSE enabled
From: uros at gcc dot gnu dot org @ 2008-01-07 20:48 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from uros at gcc dot gnu dot org 2008-01-07 20:07 -------
Subject: Bug 34682
Author: uros
Date: Mon Jan 7 20:06:34 2008
New Revision: 131381
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131381
Log:
PR target/34682
* config/i386/i386.md (neg<mode>2): Rename from negsf2, negdf2 and
negxf2. Macroize expander using X87MODEF mode iterator. Change
predicates of op0 and op1 to register_operand.
(abs<mode>2): Rename from abssf2, absdf2 and absxf2. Macroize expander
using X87MODEF mode iterator. Change predicates of op0 and op1 to
register_operand.
("*absneg<mode>2_mixed", "*absneg<mode>2_sse"): Rename from
corresponding patterns and macroize using MODEF macro. Change
predicates of op0 and op1 to register_operand and remove
"m" constraint. Disparage "r" alternative with "!".
("*absneg<mode>2_i387"): Rename from corresponding patterns and
macroize using X87MODEF macro. Change predicates of op0 and op1
to register_operand and remove "m" constraint. Disparage "r"
alternative with "!".
(absneg splitter with memory operands): Remove.
("*neg<mode>2_1", "*abs<mode>2_1"): Rename from corresponding
patterns and macroize using X87MODEF mode iterator.
* config/i386/sse.md (negv4sf2, absv4sf2, negv2df2, absv2df2):
Change predicate of op1 to register_operand.
* config/i386/i386.c (ix86_expand_fp_absneg_operator): Remove support
for memory operands.
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/i386.md
trunk/gcc/config/i386/sse.md
* [Bug target/34682] 70% slowdown with SSE enabled
From: ubizjak at gmail dot com @ 2008-01-07 21:03 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from ubizjak at gmail dot com 2008-01-07 20:10 -------
Fixed in SVN.
--
ubizjak at gmail dot com changed:

           What    |Removed                                          |Added
-----------------------------------------------------------------------------------------------------------------------------
                URL|http://teknoraver.campuslife.it/software/gcc-sse/|http://gcc.gnu.org/ml/gcc-patches/2008-01/msg00254.html
             Status|ASSIGNED                                         |RESOLVED
           Keywords|                                                 |ssemmx
         Resolution|                                                 |FIXED
   Target Milestone|---                                              |4.3.0
Thread overview: 14+ messages
2008-01-05 23:06 [Bug c/34682] New: 70% slowdown with SSE enabled rootkit85 at yahoo dot it
2008-01-05 23:08 ` [Bug c/34682] " rootkit85 at yahoo dot it
2008-01-05 23:11 ` rootkit85 at yahoo dot it
2008-01-05 23:14 ` rootkit85 at yahoo dot it
2008-01-06 12:53 ` [Bug target/34682] " rguenth at gcc dot gnu dot org
2008-01-07 13:12 ` ubizjak at gmail dot com
2008-01-07 14:34 ` ubizjak at gmail dot com
2008-01-07 14:58 ` ubizjak at gmail dot com
2008-01-07 20:00 ` rootkit85 at yahoo dot it
2008-01-07 20:00 ` rootkit85 at yahoo dot it
2008-01-07 20:04 ` rootkit85 at yahoo dot it
2008-01-07 20:05 ` rootkit85 at yahoo dot it
2008-01-07 20:48 ` uros at gcc dot gnu dot org
2008-01-07 21:03 ` ubizjak at gmail dot com