public inbox for gcc-bugs@sourceware.org
From: "jtaylor.debian at googlemail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/64731] New: poor code when using vector_size((32)) for sse2
Date: Thu, 22 Jan 2015 15:15:00 -0000
Message-ID: <bug-64731-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64731

            Bug ID: 64731
           Summary: poor code when using vector_size((32)) for sse2
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jtaylor.debian at googlemail dot com

It would be nice if, for some simple cases, a vector_size too large for the
selected instruction set still produced efficient code. E.g. using
vector_size(32) in SSE2 code could result in essentially once-unrolled
vector_size(16) code, while still simply using AVX if one compiles with the
appropriate option.

But with current gcc 5.0, with this code:

typedef double double4 __attribute__((vector_size(32)));

void fun(double * a, double * b)
{
    for (int i = 0; i < 1024; i += 4) {
        *(double4*)&a[i] += *(double4*)&b[i];
    }
}

with AVX this turns into the expected code, but with only SSE2 enabled one
gets this:

gcc -O3 test2.c -c -std=c99

0000000000000000 <fun>:
   0:   4c 8d 54 24 08          lea    0x8(%rsp),%r10
   5:   48 83 e4 e0             and    $0xffffffffffffffe0,%rsp
   9:   31 c0                   xor    %eax,%eax
   b:   41 ff 72 f8             pushq  -0x8(%r10)
   f:   55                      push   %rbp
  10:   48 89 e5                mov    %rsp,%rbp
  13:   41 52                   push   %r10
  15:   48 83 ec 10             sub    $0x10,%rsp
  19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  20:   48 8b 14 07             mov    (%rdi,%rax,1),%rdx
  24:   48 89 55 90             mov    %rdx,-0x70(%rbp)
  28:   48 8b 54 07 08          mov    0x8(%rdi,%rax,1),%rdx
  2d:   48 89 55 98             mov    %rdx,-0x68(%rbp)
  31:   48 8b 54 07 10          mov    0x10(%rdi,%rax,1),%rdx
  36:   48 89 55 a0             mov    %rdx,-0x60(%rbp)
  3a:   48 8b 54 07 18          mov    0x18(%rdi,%rax,1),%rdx
  3f:   48 89 55 a8             mov    %rdx,-0x58(%rbp)
  43:   48 8b 14 06             mov    (%rsi,%rax,1),%rdx
  47:   48 89 55 b0             mov    %rdx,-0x50(%rbp)
  4b:   48 8b 54 06 08          mov    0x8(%rsi,%rax,1),%rdx
  50:   48 89 55 b8             mov    %rdx,-0x48(%rbp)
  54:   48 8b 54 06 10          mov    0x10(%rsi,%rax,1),%rdx
  59:   66 0f 28 45 b0          movapd -0x50(%rbp),%xmm0
  5e:   66 0f 58 45 90          addpd  -0x70(%rbp),%xmm0
  63:   48 89 55 c0             mov    %rdx,-0x40(%rbp)
  67:   48 8b 54 06 18          mov    0x18(%rsi,%rax,1),%rdx
  6c:   48 89 55 c8             mov    %rdx,-0x38(%rbp)
  70:   0f 29 85 70 ff ff ff    movaps %xmm0,-0x90(%rbp)
  77:   66 48 0f 7e c2          movq   %xmm0,%rdx
  7c:   66 0f 28 45 c0          movapd -0x40(%rbp),%xmm0
  81:   48 89 14 07             mov    %rdx,(%rdi,%rax,1)
  85:   48 8b 95 78 ff ff ff    mov    -0x88(%rbp),%rdx
  8c:   66 0f 58 45 a0          addpd  -0x60(%rbp),%xmm0
  91:   0f 29 45 80             movaps %xmm0,-0x80(%rbp)
  95:   48 89 54 07 08          mov    %rdx,0x8(%rdi,%rax,1)
  9a:   48 8b 55 80             mov    -0x80(%rbp),%rdx
  9e:   48 89 54 07 10          mov    %rdx,0x10(%rdi,%rax,1)
  a3:   48 8b 55 88             mov    -0x78(%rbp),%rdx
  a7:   48 89 54 07 18          mov    %rdx,0x18(%rdi,%rax,1)
  ac:   48 83 c0 20             add    $0x20,%rax
  b0:   48 3d 00 20 00 00       cmp    $0x2000,%rax
  b6:   0f 85 64 ff ff ff       jne    20 <fun+0x20>
  bc:   48 83 c4 10             add    $0x10,%rsp
  c0:   41 5a                   pop    %r10
  c2:   5d                      pop    %rbp
  c3:   49 8d 62 f8             lea    -0x8(%r10),%rsp
  c7:   c3                      retq
  c8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  cf:   00

while I would have hoped for something along the lines of this:

  10:   66 0f 28 44 c6 10       movapd 0x10(%rsi,%rax,8),%xmm0
  16:   66 0f 28 0c c6          movapd (%rsi,%rax,8),%xmm1
  1b:   66 0f 58 0c c7          addpd  (%rdi,%rax,8),%xmm1
  20:   66 0f 58 44 c7 10       addpd  0x10(%rdi,%rax,8),%xmm0
  26:   66 0f 29 44 c7 10       movapd %xmm0,0x10(%rdi,%rax,8)
  2c:   66 0f 29 0c c7          movapd %xmm1,(%rdi,%rax,8)
  31:   48 83 c0 04             add    $0x4,%rax
  35:   3d 00 04 00 00          cmp    $0x400,%eax
  3a:   7c d4                   jl     10 <fun+0x10>
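[Editor's note: a minimal C sketch, not part of the original report. It shows the lowering the reporter hopes GCC would perform automatically: each vector_size(32) operation split by hand into two vector_size(16) halves, which map directly to the paired movapd/addpd instructions in the hoped-for disassembly above. The name fun_split is hypothetical.]

```c
/* Hand-split equivalent of the reported fun(): two SSE2-sized
 * (16-byte, two-double) vector operations per iteration instead of
 * one 32-byte operation.  Assumes a and b are 16-byte aligned, as
 * the aligned movapd loads in the desired code also require. */
typedef double double2 __attribute__((vector_size(16)));

void fun_split(double *a, double *b)
{
    for (int i = 0; i < 1024; i += 4) {
        /* Low half: elements i, i+1. */
        *(double2 *)&a[i]     += *(double2 *)&b[i];
        /* High half: elements i+2, i+3. */
        *(double2 *)&a[i + 2] += *(double2 *)&b[i + 2];
    }
}
```

With only -msse2 each half fits the machine's vector width, so no stack spills are needed; this is essentially what the later retitle ("vector lowering should split loads and stores") asks the compiler to do during generic vector lowering.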
Thread overview: 9+ messages

2015-01-22 15:15 jtaylor.debian at googlemail dot com [this message]
2015-01-22 16:17 ` [Bug tree-optimization/64731] vector lowering should split loads and stores rguenth at gcc dot gnu.org
2015-01-22 16:23 ` jakub at gcc dot gnu.org
2015-01-22 16:29 ` rguenther at suse dot de
2015-01-22 18:21 ` glisse at gcc dot gnu.org
2023-05-12  4:38 ` pinskia at gcc dot gnu.org
2023-05-12  6:42 ` rguenth at gcc dot gnu.org
2023-05-12 13:04 ` cvs-commit at gcc dot gnu.org
2023-05-12 13:05 ` rguenth at gcc dot gnu.org