public inbox for gcc-bugs@sourceware.org
* [Bug c/34682] New: 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-05 23:06 UTC
To: gcc-bugs

I have a piece of code that runs 70% slower with SSE enabled than with plain
387 on a dual-CPU Xeon system.
I'm not an optimization fanatic, but since -mfpmath=sse is enabled by default
on amd64, this could cause huge performance losses when building amd64
binaries for this CPU.

The runlog is:

[aguy@enc1 ~]$ uname -a
FreeBSD enc1 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 12 11:05:30 UTC 2007 root@dessler.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP
[aguy@enc1 ~]$ gcc42 -v
Using built-in specs.
Target: i386-portbld-freebsd6.2
Configured with: ./..//gcc-4.2-20071024/configure --disable-nls --with-system-zlib --with-libiconv-prefix=/usr/local --with-gmp=/usr/local --program-suffix=42 --libdir=/usr/local/lib/gcc-4.2.3 --with-gxx-include-dir=/usr/local/lib/gcc-4.2.3/include/c++/ --disable-rpath --prefix=/usr/local --mandir=/usr/local/man --infodir=/usr/local/info/gcc42 i386-portbld-freebsd6.2
Thread model: posix
gcc version 4.2.3 20071024 (prerelease)
[aguy@enc1 ~]$ gcc42 ssucks.c -O2 -march=prescott -o ssucks-387
[aguy@enc1 ~]$ gcc42 ssucks.c -O2 -march=prescott -o ssucks-sse -mfpmath=sse
[aguy@enc1 ~]$ ssucks-387 ; ssucks-sse

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0147     953.0052
     2     -1.4166e-13      0.0061    1149.6845

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0146     960.7945
     2     -1.4166e-13      0.0281     249.3171
[aguy@enc1 ~]$

1149.6845 vs 249.3171 MFLOPS: a ~78% slowdown just from enabling SSE.
I have the source, assembled files and runlog online here:
http://teknoraver.campuslife.it/software/gcc-sse/

Cheers,
Matteo Croce

--
           Summary: 70% slowdown with SSE enabled
           Product: gcc
           Version: 4.2.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rootkit85 at yahoo dot it

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug c/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-05 23:08 UTC
To: gcc-bugs

------- Comment #1 from rootkit85 at yahoo dot it 2008-01-05 21:31 -------
Created an attachment (id=14882)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14882&action=view)
the source

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug c/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-05 23:11 UTC
To: gcc-bugs

------- Comment #2 from rootkit85 at yahoo dot it 2008-01-05 21:31 -------
Created an attachment (id=14883)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14883&action=view)
the source compiled with -mfpmath=387

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug c/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-05 23:14 UTC
To: gcc-bugs

------- Comment #3 from rootkit85 at yahoo dot it 2008-01-05 21:32 -------
Created an attachment (id=14884)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14884&action=view)
the source compiled with -mfpmath=sse

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug target/34682] 70% slowdown with SSE enabled
From: rguenth at gcc dot gnu dot org @ 2008-01-06 12:53 UTC
To: gcc-bugs

------- Comment #4 from rguenth at gcc dot gnu dot org 2008-01-06 12:18 -------
Please narrow down the particular loop in your testcase that gets slower.
It looks like the testsuite measures several things.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug target/34682] 70% slowdown with SSE enabled
From: ubizjak at gmail dot com @ 2008-01-07 13:12 UTC
To: gcc-bugs

------- Comment #5 from ubizjak at gmail dot com 2008-01-07 12:19 -------
Confirmed by the following testcase:

--cut here--
#include <stdio.h>

void __attribute__((noinline))
dtime (void)
{
  __asm__ __volatile__ ("" : : : "memory");
}

double sa, sb, sc, sd;
double one, two, four, five;
double piref, piprg, pierr;

int
main (int argc, char *argv[])
{
  double s, u, v, w, x;
  long i, m;

  piref = 3.14159265358979324;
  one = 1.0;
  two = 2.0;
  four = 4.0;
  five = 5.0;

  m = 512000000;

  dtime();

  s = -five;
  sa = -one;

  dtime();

  for (i = 1; i <= m; i++)
    {
      s = -s;
      sa = sa + s;
    }

  dtime();

  sc = (double) m;

  u = sa;
  v = 0.0;
  w = 0.0;
  x = 0.0;

  dtime();

  for (i = 1; i <= m; i++)
    {
      s = -s;
      sa = sa + s;
      u = u + two;
      x = x + (s - u);
      v = v - s * u;
      w = w + s / u;
    }

  dtime();

  m = (long) (sa * x / sc);
  sa = four * w / five;
  sb = sa + five / v;
  sc = 31.25;
  piprg = sb - sc / (v * v * v);
  pierr = piprg - piref;

  printf ("%13.4le\n", pierr);

  return 0;
}
--cut here--

.L5:
        xorb    $-128, -17(%ebp)        #, s
        addl    $1, %eax                #, i.65
        addsd   %xmm4, %xmm1            # two.16, u
        cmpl    $512000001, %eax        #, i.65
        movsd   -24(%ebp), %xmm0        # s, tmp90
        addsd   -24(%ebp), %xmm2        # s, sa_lsm.48
        mulsd   %xmm1, %xmm0            # u, tmp90
        subsd   %xmm0, %xmm3            # tmp90, v
        movsd   -24(%ebp), %xmm0        # s, tmp91
        divsd   %xmm1, %xmm0            # u, tmp91
        addsd   -16(%ebp), %xmm0        # w, tmp91
        movsd   %xmm0, -16(%ebp)        # tmp91, w
        jne     .L5                     #,

It is somewhat tolerable that "s" and "w" are not kept in registers, due to
the missing live range splitting (PR 23322). The main problem here is that the
sign of "s" is changed in memory using an (unaligned) xorb insn.

The same situation occurs in the first (shorter) loop:

.L4:
        xorb    $-128, -17(%ebp)        #, s
        addl    $1, %eax                #, i
        cmpl    $512000001, %eax        #, i
        addsd   -24(%ebp), %xmm0        # s, sa_lsm.97
        jne     .L4                     #,

The performance regression is caused by a partial memory stall [1].

[1] Agner Fog: How to optimize for the Pentium family of microprocessors,
section 14.7


--

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-01-07 12:19:54
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug target/34682] 70% slowdown with SSE enabled
From: ubizjak at gmail dot com @ 2008-01-07 14:34 UTC
To: gcc-bugs

------- Comment #6 from ubizjak at gmail dot com 2008-01-07 14:02 -------
Patch in testing.


--

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |ubizjak at gmail dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2008-01-07 12:19:54         |2008-01-07 14:02:46
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug target/34682] 70% slowdown with SSE enabled
From: ubizjak at gmail dot com @ 2008-01-07 14:58 UTC
To: gcc-bugs

------- Comment #7 from ubizjak at gmail dot com 2008-01-07 14:09 -------
Patched gcc:

387:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1     -8.1208e-11      0.0128    1094.6170
     2     -1.5485e-13      0.0061    1145.7086

SSE:

   FLOPS C Program (Double Precision), V2.0 18 Dec 1992

   Module     Error        RunTime      MFLOPS
                            (usec)
     1      4.0146e-13      0.0114    1227.3206
     2     -1.4166e-13      0.0050    1399.9125
[    2     -1.4166e-13      0.0269     260.2975 ]

So, 5.36x faster.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug target/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-07 20:00 UTC
To: gcc-bugs

------- Comment #8 from rootkit85 at yahoo dot it 2008-01-07 19:47 -------
Created an attachment (id=14895)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14895&action=view)
minimal testcase

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug target/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-07 20:00 UTC
To: gcc-bugs

------- Comment #9 from rootkit85 at yahoo dot it 2008-01-07 19:47 -------
Created an attachment (id=14896)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14896&action=view)
minimal testcase, compiled with -mfpmath=387

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug target/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-07 20:04 UTC
To: gcc-bugs

------- Comment #10 from rootkit85 at yahoo dot it 2008-01-07 19:47 -------
Created an attachment (id=14897)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14897&action=view)
minimal testcase, compiled with -mfpmath=sse

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug target/34682] 70% slowdown with SSE enabled
From: rootkit85 at yahoo dot it @ 2008-01-07 20:05 UTC
To: gcc-bugs

------- Comment #11 from rootkit85 at yahoo dot it 2008-01-07 19:49 -------
very very minimal testcase added

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug target/34682] 70% slowdown with SSE enabled
From: uros at gcc dot gnu dot org @ 2008-01-07 20:48 UTC
To: gcc-bugs

------- Comment #12 from uros at gcc dot gnu dot org 2008-01-07 20:07 -------
Subject: Bug 34682

Author: uros
Date: Mon Jan  7 20:06:34 2008
New Revision: 131381

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131381
Log:
        PR target/34682
        * config/i386/i386.md (neg<mode>2): Rename from negsf2, negdf2 and
        negxf2.  Macroize expander using X87MODEF mode iterator.  Change
        predicates of op0 and op1 to register_operand.
        (abs<mode>2): Rename from abssf2, absdf2 and absxf2.  Macroize
        expander using X87MODEF mode iterator.  Change predicates of op0
        and op1 to register_operand.
        ("*absneg<mode>2_mixed", "*absneg<mode>2_sse"): Rename from
        corresponding patterns and macroize using MODEF macro.  Change
        predicates of op0 and op1 to register_operand and remove "m"
        constraint.  Disparage "r" alternative with "!".
        ("*absneg<mode>2_i387"): Rename from corresponding patterns and
        macroize using X87MODEF macro.  Change predicates of op0 and op1
        to register_operand and remove "m" constraint.  Disparage "r"
        alternative with "!".
        (absneg splitter with memory operands): Remove.
        ("*neg<mode>2_1", "*abs<mode>2_1"): Rename from corresponding
        patterns and macroize using X87MODEF mode iterator.
        * config/i386/sse.md (negv4sf2, absv4sf2, negv2df2, absv2df2):
        Change predicate of op1 to register_operand.
        * config/i386/i386.c (ix86_expand_fp_absneg_operator): Remove support
        for memory operands.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/sse.md

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682
* [Bug target/34682] 70% slowdown with SSE enabled
From: ubizjak at gmail dot com @ 2008-01-07 21:03 UTC
To: gcc-bugs

------- Comment #13 from ubizjak at gmail dot com 2008-01-07 20:10 -------
Fixed in SVN.


--

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|http://teknoraver.campuslife|http://gcc.gnu.org/ml/gcc-
                   |.it/software/gcc-sse/       |patches/2008-
                   |                            |01/msg00254.html
             Status|ASSIGNED                    |RESOLVED
           Keywords|                            |ssemmx
         Resolution|                            |FIXED
   Target Milestone|---                         |4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34682