public inbox for gcc-bugs@sourceware.org
From: "tim at klingt dot org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/38824] New: [4.4 regression] performance regression of sse code from 4.2/4.3
Date: Tue, 13 Jan 2009 11:25:00 -0000
Message-ID: <bug-38824-12873@http.gcc.gnu.org/bugzilla/>

the following code shows a performance regression from gcc-4.2 to gcc-4.3 and
4.4 (20090111) on an intel core2 using the x86_64 architecture:

    #include <xmmintrin.h>   /* SSE intrinsics used below */

    /* adds f to n floats; in and out must be 16-byte aligned,
       n a nonzero multiple of 4 */
    void bench_1(float * out, float * in, float f, unsigned int n)
    {
        n /= 4;                              /* 4 floats per iteration */
        __m128 scalar = _mm_set_ps1(f);      /* broadcast f to all lanes */
        do
        {
            __m128 arg = _mm_load_ps(in);
            __m128 result = _mm_add_ps(arg, scalar);
            _mm_store_ps(out, result);
            in += 4;
            out += 4;
        }
        while (--n);
    }

results, running the function 100000000 times, measured with performance
counters (requires a patched kernel), compiled with -O3 -mfpmath=sse -msse:

             cycles        instructions   branch misses
    gcc-4.2: 1946256122    8394301290     5005
    gcc-4.3: 2191990305    7658465214     3442
    gcc-4.4: 2532778908    7462359830     8593402

although the instruction count decreases from release to release, the cycles
spent in the function increase. also, gcc-4.4 shows a huge number of branch
misses.
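The report does not include the driver used for the measurements. For readers who want to try the test case, a minimal sketch of one follows. The helpers `check_bench_1` and `time_bench_1`, the buffer size `kFloats`, and the use of wall-clock time from `std::chrono` instead of hardware performance counters are all assumptions made for illustration and are not taken from the report.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86_64 only)

// Kernel from the report, unchanged apart from layout.
void bench_1(float *out, float *in, float f, unsigned int n)
{
    n /= 4;
    __m128 scalar = _mm_set_ps1(f);
    do {
        __m128 arg = _mm_load_ps(in);
        __m128 result = _mm_add_ps(arg, scalar);
        _mm_store_ps(out, result);
        in += 4;
        out += 4;
    } while (--n);
}

// Assumed buffer size; the report does not state the value of n it used.
constexpr unsigned int kFloats = 64;
static_assert(kFloats >= 4 && kFloats % 4 == 0, "n must be a nonzero multiple of 4");

// Hypothetical helper: returns true if out[i] == in[i] + f for every element.
bool check_bench_1(float f)
{
    alignas(16) float in[kFloats];
    alignas(16) float out[kFloats];
    // _mm_load_ps / _mm_store_ps require 16-byte alignment.
    assert(reinterpret_cast<std::uintptr_t>(in) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    for (unsigned int i = 0; i < kFloats; ++i) {
        in[i] = static_cast<float>(i);
        out[i] = 0.0f;
    }
    bench_1(out, in, f, kFloats);
    for (unsigned int i = 0; i < kFloats; ++i) {
        if (out[i] != in[i] + f)
            return false;
    }
    return true;
}

// Hypothetical helper: calls bench_1 `repeats` times and returns the elapsed
// wall-clock time in seconds. This is not a cycle count.
double time_bench_1(unsigned long repeats)
{
    alignas(16) float in[kFloats] = {};
    alignas(16) float out[kFloats] = {};
    // Calling through a volatile function pointer keeps the compiler from
    // inlining the kernel and dropping repeated calls.
    void (*volatile fn)(float *, float *, float, unsigned int) = bench_1;
    auto start = std::chrono::steady_clock::now();
    for (unsigned long r = 0; r < repeats; ++r)
        fn(out, in, 1.0f, kFloats);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Building with the flags from the report (`-O3 -mfpmath=sse -msse`) and calling `time_bench_1(100000000UL)` would mirror the repeat count above, though the resulting seconds are only loosely comparable to the cycle counts quoted there.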
the generated code is

gcc-4.2:

.globl _Z7bench_1PfS_fj
        .type   _Z7bench_1PfS_fj, @function
_Z7bench_1PfS_fj:
.LFB2695:
        movaps  %xmm0, %xmm2
        shrl    $2, %edx
        shufps  $0, %xmm2, %xmm2
        movaps  %xmm2, %xmm1
        .p2align 4,,7
.L15:
        movaps  (%rsi), %xmm0
        addq    $16, %rsi
        addps   %xmm1, %xmm0
        movaps  %xmm0, (%rdi)
        addq    $16, %rdi
        subl    $1, %edx
        jne     .L15
        rep ; ret
.LFE2695:
        .size   _Z7bench_1PfS_fj, .-_Z7bench_1PfS_fj
        .align 2
        .p2align 4,,15

gcc-4.3:

.globl _Z7bench_1PfS_fj
        .type   _Z7bench_1PfS_fj, @function
_Z7bench_1PfS_fj:
.LFB2563:
        movaps  %xmm0, %xmm2
        shrl    $2, %edx
        subl    $1, %edx
        xorl    %eax, %eax
        shufps  $0, %xmm2, %xmm2
        mov     %edx, %edx
        addq    $1, %rdx
        salq    $4, %rdx
        movaps  %xmm2, %xmm1
        .p2align 4,,10
        .p2align 3
.L17:
        movaps  (%rsi,%rax), %xmm0
        addps   %xmm1, %xmm0
        movaps  %xmm0, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L17
        rep
        ret
.LFE2563:
        .size   _Z7bench_1PfS_fj, .-_Z7bench_1PfS_fj
        .p2align 4,,15

gcc-4.4:

.globl _Z7bench_1PfS_fj
        .type   _Z7bench_1PfS_fj, @function
_Z7bench_1PfS_fj:
.LFB2489:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        shrl    $2, %edx
        shufps  $0, %xmm0, %xmm0
        subl    $1, %edx
        xorl    %eax, %eax
        addq    $1, %rdx
        salq    $4, %rdx
        .p2align 4,,10
        .p2align 3
.L17:
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm1
        movaps  %xmm1, (%rdi,%rax)
        addq    $16, %rax
        cmpq    %rdx, %rax
        jne     .L17
        rep
        ret
        .cfi_endproc
.LFE2489:
        .size   _Z7bench_1PfS_fj, .-_Z7bench_1PfS_fj
        .p2align 4,,15

-- 
           Summary: [4.4 regression] performance regression of sse code from
                    4.2/4.3
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tim at klingt dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38824
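As a reading aid for the listings above, the two loop shapes can be written back out in C++. This is only a sketch of what the assembly appears to do, not compiler output or anything posted in the report: `add_countdown` mirrors the gcc-4.2 loop (two pointer increments and a decrementing counter), `add_indexed` mirrors the gcc-4.3/4.4 loop (one shared offset compared against a precomputed bound). The function names and the `same_result` helper are hypothetical, and the index here counts floats where the assembly counts bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86_64 only)

// Shape of the gcc-4.2 loop: both pointers advance, a counter runs down to 0.
void add_countdown(float *out, const float *in, float f, unsigned int n)
{
    unsigned int count = n / 4;              // shrl $2, %edx
    __m128 scalar = _mm_set_ps1(f);
    do {
        __m128 v = _mm_load_ps(in);          // movaps (%rsi), %xmm0
        in += 4;                             // addq $16, %rsi
        _mm_store_ps(out, _mm_add_ps(v, scalar));
        out += 4;                            // addq $16, %rdi
    } while (--count);                       // subl $1, %edx ; jne
}

// Shape of the gcc-4.3/4.4 loop: one offset shared by both arrays, compared
// against a bound computed before the loop.
void add_indexed(float *out, const float *in, float f, unsigned int n)
{
    unsigned int last = n / 4 - 1;           // shrl $2, %edx ; subl $1, %edx
    // Widen to 64 bits, add 1, scale. The assembly scales by 16 (bytes);
    // this sketch scales by 4 because it indexes floats.
    std::size_t bound = (static_cast<std::size_t>(last) + 1) * 4;
    __m128 scalar = _mm_set_ps1(f);
    std::size_t i = 0;                       // xorl %eax, %eax
    do {
        __m128 v = _mm_load_ps(in + i);      // (%rsi,%rax)
        _mm_store_ps(out + i, _mm_add_ps(v, scalar));
        i += 4;                              // addq $16, %rax
    } while (i != bound);                    // cmpq %rdx, %rax ; jne
}

// Hypothetical check that both shapes write identical results for n floats
// (n a nonzero multiple of 4, at most 64).
bool same_result(unsigned int n)
{
    alignas(16) float in[64];
    alignas(16) float a[64] = {};
    alignas(16) float b[64] = {};
    assert(n >= 4 && n % 4 == 0 && n <= 64);
    for (unsigned int i = 0; i < 64; ++i)
        in[i] = static_cast<float>(i) * 0.5f;
    add_countdown(a, in, 2.0f, n);
    add_indexed(b, in, 2.0f, n);
    for (unsigned int i = 0; i < 64; ++i) {
        if (a[i] != b[i])
            return false;
    }
    return true;
}
```

Both functions compute the same values, so the sketch says nothing about which shape is faster. It only makes the difference in loop control between the listings easier to follow.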