public inbox for gcc-bugs@sourceware.org
From: "tim at klingt dot org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug inline-asm/38671] New: [4.4 Regression] speed regression with sse intrinsics
Date: Tue, 30 Dec 2008 12:58:00 -0000
Message-ID: <bug-38671-12873@http.gcc.gnu.org/bugzilla/>

i experience some speed regressions with gcc-4.4, with sse intrinsics on a
core2 (x86_64).

the code is:

namespace detail
{
/** compute x1 * (1 + x2 * amount) */
__m128 inline amp_mod4_loop(__m128 x1, __m128 x2, __m128 amount, __m128 one)
{
    return _mm_mul_ps(x1,
                      _mm_add_ps(one,
                                 _mm_mul_ps(x2, amount)));
}
} /* namespace detail */

template <>
inline void amp_mod4(float * out, const float * in1, const float * in2,
                     const float amount, unsigned int n)
{
    n = n >> 2;
    const __m128 one = detail::gen_one();
    const __m128 amnt = _mm_set_ps1(amount);
    do
    {
        const __m128 x1 = _mm_load_ps(in1);
        in1 += 4;
        const __m128 x2 = _mm_load_ps(in2);
        in2 += 4;
        const __m128 result = detail::amp_mod4_loop(x1, x2, amnt, one);
        _mm_store_ps(out, result);
        out += 4;
    }
    while (--n);
}

the results for different compilers (using hardware performance counters) are:

gcc-4.4:
  cycles:        1416276094
  branch misses: 425897

gcc-4.4 -march=core2:
  cycles:        1520034636
  branch misses: 3263912

gcc-4.3:
  cycles:        1548838336
  branch misses: 5990424

gcc-4.3 -march=core2:
  cycles:        1386605444
  branch misses: 5609

gcc-4.2:
  cycles:        1321697674
  branch misses: 3682

it seems that gcc-4.3 with -march core2 and gcc-4.2 generate code, which is
more friendly to the branch predictor. tuning for core2 on gcc-4.4 actually
seems to generate worse code.
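For anyone who wants to compile the snippet above: it does not show detail::gen_one() or the primary template of amp_mod4, so the following is only a minimal sketch under stated assumptions. It assumes gen_one() returns a vector of four 1.0f values built from all-ones bits, which is consistent with the pcmpeqd / psrld $0x19 / pslld $0x17 sequence in the disassembly of this report but is not confirmed by it. amp_mod4_simd, amp_mod4_scalar and simd_matches_scalar are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 integer intrinsics; also provides the SSE float intrinsics

namespace detail {

// Assumption: gen_one() returns {1.0f, 1.0f, 1.0f, 1.0f}.
// All-ones >> 25 leaves 0x7f in each lane; << 23 gives 0x3f800000, the bits of 1.0f.
inline __m128 gen_one()
{
    __m128i x = _mm_setzero_si128();
    x = _mm_cmpeq_epi32(x, x);  // every bit set
    x = _mm_srli_epi32(x, 25);
    x = _mm_slli_epi32(x, 23);
    return _mm_castsi128_ps(x);
}

/** compute x1 * (1 + x2 * amount) */
inline __m128 amp_mod4_loop(__m128 x1, __m128 x2, __m128 amount, __m128 one)
{
    return _mm_mul_ps(x1, _mm_add_ps(one, _mm_mul_ps(x2, amount)));
}

} /* namespace detail */

// Hypothetical non-template stand-in for the amp_mod4 specialization.
// Preconditions: n is a nonzero multiple of 4, all pointers are 16-byte aligned.
inline void amp_mod4_simd(float* out, const float* in1, const float* in2,
                          float amount, unsigned int n)
{
    assert(n >= 4 && n % 4 == 0);
    n >>= 2;
    const __m128 one = detail::gen_one();
    const __m128 amnt = _mm_set_ps1(amount);
    do {
        const __m128 x1 = _mm_load_ps(in1);
        in1 += 4;
        const __m128 x2 = _mm_load_ps(in2);
        in2 += 4;
        _mm_store_ps(out, detail::amp_mod4_loop(x1, x2, amnt, one));
        out += 4;
    } while (--n);
}

// Scalar reference for the same formula.
inline void amp_mod4_scalar(float* out, const float* in1, const float* in2,
                            float amount, unsigned int n)
{
    for (unsigned int i = 0; i != n; ++i)
        out[i] = in1[i] * (1.0f + in2[i] * amount);
}

// Runs both versions on small, exactly representable inputs and compares them.
inline bool simd_matches_scalar()
{
    const unsigned int n = 64;
    alignas(16) float in1[n];
    alignas(16) float in2[n];
    alignas(16) float out_simd[n];
    alignas(16) float out_ref[n];
    for (unsigned int i = 0; i != n; ++i) {
        in1[i] = static_cast<float>(i);
        in2[i] = 0.5f * static_cast<float>(i);
    }
    amp_mod4_simd(out_simd, in1, in2, 0.25f, n);
    amp_mod4_scalar(out_ref, in1, in2, 0.25f, n);
    for (unsigned int i = 0; i != n; ++i)
        if (out_simd[i] != out_ref[i])
            return false;
    return true;
}
```

The do/while form is kept from the report because the loop-exit test is the part that differs between the compiler versions; it means the function must not be called with n < 4.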
the best code (gcc-4.2) is:

0000000000400de0 <bench_1_simd(unsigned int)>:
  400de0:  66 0f ef c0             pxor   %xmm0,%xmm0
  400de4:  c1 ef 02                shr    $0x2,%edi
  400de7:  0f 28 15 32 0f 00 00    movaps 0xf32(%rip),%xmm2        # 401d20 <_IO_stdin_used+0xb0>
  400dee:  31 c0                   xor    %eax,%eax
  400df0:  66 0f 76 c0             pcmpeqd %xmm0,%xmm0
  400df4:  66 0f 72 d0 19          psrld  $0x19,%xmm0
  400df9:  66 0f 72 f0 17          pslld  $0x17,%xmm0
  400dfe:  0f 28 c8                movaps %xmm0,%xmm1
  400e01:  0f 28 80 e0 26 60 00    movaps 0x6026e0(%rax),%xmm0
  400e08:  0f 59 c2                mulps  %xmm2,%xmm0
  400e0b:  0f 58 c1                addps  %xmm1,%xmm0
  400e0e:  0f 59 80 e0 25 60 00    mulps  0x6025e0(%rax),%xmm0
  400e15:  0f 29 80 e0 24 60 00    movaps %xmm0,0x6024e0(%rax)
  400e1c:  48 83 c0 10             add    $0x10,%rax
  400e20:  83 ef 01                sub    $0x1,%edi
  400e23:  75 dc                   jne    400e01 <bench_1_simd(unsigned int)+0x21>
  400e25:  f3 c3                   repz retq
  400e27:  90                      nop
  400e28:  0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)

the worst code (gcc-4.4, -march=core2) is 15% slower:

0000000000400e70 <bench_1_simd(unsigned int)>:
  400e70:  66 0f ef d2             pxor   %xmm2,%xmm2
  400e74:  89 fa                   mov    %edi,%edx
  400e76:  66 0f 76 d2             pcmpeqd %xmm2,%xmm2
  400e7a:  c1 ea 02                shr    $0x2,%edx
  400e7d:  66 0f 72 d2 19          psrld  $0x19,%xmm2
  400e82:  ff ca                   dec    %edx
  400e84:  66 0f 72 f2 17          pslld  $0x17,%xmm2
  400e89:  48 ff c2                inc    %rdx
  400e8c:  0f 28 0d 7d 17 00 00    movaps 0x177d(%rip),%xmm1       # 402610 <_IO_stdin_used+0xb0>
  400e93:  48 c1 e2 04             shl    $0x4,%rdx
  400e97:  31 c0                   xor    %eax,%eax
  400e99:  0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  400ea0:  0f 28 c1                movaps %xmm1,%xmm0
  400ea3:  0f 59 80 e0 36 60 00    mulps  0x6036e0(%rax),%xmm0
  400eaa:  0f 58 c2                addps  %xmm2,%xmm0
  400ead:  0f 59 80 e0 35 60 00    mulps  0x6035e0(%rax),%xmm0
  400eb4:  0f 29 80 e0 34 60 00    movaps %xmm0,0x6034e0(%rax)
  400ebb:  48 83 c0 10             add    $0x10,%rax
  400ebf:  48 39 d0                cmp    %rdx,%rax
  400ec2:  75 dc                   jne    400ea0 <bench_1_simd(unsigned int)+0x30>
  400ec4:  f3 c3                   repz retq
  400ec6:  66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  400ecd:  00 00 00

-- 
           Summary: [4.4 Regression] speed regression with sse intrinsics
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: inline-asm
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tim at klingt dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38671
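The two listings in this report differ mainly in loop control. The gcc-4.2 code keeps a separate 32-bit counter (sub $0x1,%edi / jne), while the gcc-4.4 code precomputes an end offset (dec %edx, inc %rdx, shl $0x4,%rdx) and compares the running offset against it (cmp %rdx,%rax). The sketch below restates both shapes at source level on a simpler scalar kernel. It is one reading of the disassembly, not a description of what the compiler does internally, and scale_countdown, scale_bound and end_offset_bytes are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// End offset in bytes as the gcc-4.4 prologue appears to compute it:
// a 32-bit decrement, a 64-bit increment, then a scale by 16 bytes per iteration.
inline std::uint64_t end_offset_bytes(unsigned int n)
{
    const unsigned int last = (n >> 2) - 1u;              // wraps around if n < 4
    return (static_cast<std::uint64_t>(last) + 1u) << 4;  // widen, add one, times 16
}

// Shape A (gcc-4.2 listing): running offset plus a separate count-down counter.
// Precondition: n is a nonzero multiple of 4.
inline void scale_countdown(float* out, const float* in, float k, unsigned int n)
{
    assert(n >= 4 && n % 4 == 0);
    n >>= 2;
    std::size_t off = 0;  // offset in floats
    do {
        for (int j = 0; j != 4; ++j)
            out[off + j] = in[off + j] * k;
        off += 4;
    } while (--n);
}

// Shape B (gcc-4.4 listing): a single offset compared against a precomputed bound.
// Precondition: n is a nonzero multiple of 4.
inline void scale_bound(float* out, const float* in, float k, unsigned int n)
{
    assert(n >= 4 && n % 4 == 0);
    const std::uint64_t end = end_offset_bytes(n) / sizeof(float);  // bound in floats
    std::uint64_t off = 0;
    do {
        for (int j = 0; j != 4; ++j)
            out[off + j] = in[off + j] * k;
        off += 4;
    } while (off != end);
}
```

One plausible reason for the dec/inc pair is that the decrement happens in 32 bits and the increment in 64 bits, so the bound still matches what `do { } while (--n)` does when the shifted count is zero and the counter wraps. That fits the later thread subject "extra code for setting up loops (IV-opts and 32bits vs 64bits)", but it remains an interpretation.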
Thread overview: 16+ messages

2008-12-30 12:58 tim at klingt dot org [this message]
2008-12-30 12:59 ` [Bug inline-asm/38671] " tim at klingt dot org
2008-12-30 16:23 ` [Bug target/38671] " pinskia at gcc dot gnu dot org
2008-12-31  7:50 ` pinskia at gcc dot gnu dot org
2008-12-31  7:57 ` pinskia at gcc dot gnu dot org
2008-12-31  8:11 ` [Bug middle-end/38671] [4.4 Regression] extra code for setting up loops pinskia at gcc dot gnu dot org
2008-12-31  8:14 ` [Bug middle-end/38671] [4.4 Regression] extra code for setting up loops (IV-opts and 32bits vs 64bits) pinskia at gcc dot gnu dot org
2008-12-31  9:21 ` tim at klingt dot org
2009-01-05 11:28 ` rguenth at gcc dot gnu dot org
2009-04-21 16:00 ` [Bug middle-end/38671] [4.4/4.5 " jakub at gcc dot gnu dot org
2009-07-22 10:35 ` jakub at gcc dot gnu dot org
2009-10-15 12:54 ` jakub at gcc dot gnu dot org
2010-01-21 13:16 ` jakub at gcc dot gnu dot org
2010-03-01 23:32 ` [Bug middle-end/38671] [4.3/4.4/4.5 " pinskia at gcc dot gnu dot org
2010-03-01 23:35 ` [Bug middle-end/38671] [4.3/4.4/4.5 Regression] selecting one IV instead of three pinskia at gcc dot gnu dot org
2010-04-30  9:25 ` [Bug middle-end/38671] [4.3/4.4/4.5/4.6 " jakub at gcc dot gnu dot org