From: "marcin.krotkiewski at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/60086] New: suboptimal asm generated for a loop (store/load false aliasing)
Date: Wed, 05 Feb 2014 22:41:00 -0000
Message-ID: <bug-60086-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60086

            Bug ID: 60086
           Summary: suboptimal asm generated for a loop (store/load false aliasing)
           Product: gcc
           Version: 4.7.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marcin.krotkiewski at gmail dot com

Created attachment 32060
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32060&action=edit
source code that compiles

Hello,

I am seeing suboptimal performance of the following loop compiled with
gcc 4.7.3 (but also 4.4.7, Ubuntu, full test code attached):

    for(i=0; i<NSIZE; i++){
        a[i] += b[i];
        c[i] += d[i];
    }

Arrays are dynamically allocated and aligned to page boundary, declared
with __restrict__ and __attribute__((aligned(32))). I am running on
Intel i7-2620M (Sandy Bridge).

The problem is IMHO related to '4k aliasing'. It happens for the most
common case of a/b/c/d starting at page boundary (e.g., natural result
of malloc). To demonstrate, here is the assembly generated with
'gcc -mtune=native -mavx -O3':

.L8:
    vmovapd (%rdx,%rdi), %ymm0           #1 load b
    addq    $1, %r8                      #2
    vaddpd  (%rcx,%rdi), %ymm0, %ymm0    #3 load a and add
    vmovapd %ymm0, (%rdx,%rdi)           #4 store a
    vmovapd (%rax,%rdi), %ymm0           #5 load d
    vaddpd  (%rsi,%rdi), %ymm0, %ymm0    #6 load c and add
    vmovapd %ymm0, (%rax,%rdi)           #7 store c
    addq    $32, %rdi                    #8
    cmpq    %r8, %r12                    #9
    ja      .L8                          #10

The 4k aliasing problem is caused by lines 4 and 5 (writing result to
array a and reading data from either c or d). From my tests this seems
to be the default behavior for both AVX and SSE2 instruction sets, and
for both vectorized and non-vectorized cases.

It is easy to fix the problem by placing the two writes together, at
the end of the iteration, e.g.:

.L8:
    vmovapd (%rdx,%rdi), %ymm1           #1
    addq    $1, %r8                      #2
    vaddpd  (%rcx,%rdi), %ymm1, %ymm1    #3
    vmovapd (%rax,%rdi), %ymm0           #4
    vaddpd  (%rsi,%rdi), %ymm0, %ymm0    #5
    vmovapd %ymm1, (%rdx,%rdi)           #6
    vmovapd %ymm0, (%rax,%rdi)           #7
    addq    $32, %rdi                    #8
    cmpq    %r8, %r12                    #9
    ja      .L8                          #10

In this case the writes happen after all the loads. The above code is
(almost) what ICC generates for this case. For problem sizes small
enough to fit in L1 the speedup is roughly 50%.