[Bug rtl-optimization/60086] New: suboptimal asm generated for a loop (store/load false aliasing)

From: "marcin.krotkiewski at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/60086] New: suboptimal asm generated for a loop (store/load false aliasing)
Date: Wed, 05 Feb 2014 22:41:00 -0000
Message-ID: <bug-60086-4@http.gcc.gnu.org/bugzilla/>

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60086

            Bug ID: 60086
           Summary: suboptimal asm generated for a loop (store/load false
                    aliasing)
           Product: gcc
           Version: 4.7.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: marcin.krotkiewski at gmail dot com

Created attachment 32060
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32060&action=edit
source code that compiles

Hello,

I am seeing suboptimal performance for the following loop when compiled
with gcc 4.7.3 (also with 4.4.7; Ubuntu; full test code attached):

    for(i=0; i<NSIZE; i++){
      a[i] += b[i];
      c[i] += d[i];
    }

The arrays are dynamically allocated, aligned to a page boundary, and
declared with __restrict__ and __attribute__((aligned(32))). I am running
on an Intel i7-2620M (Sandy Bridge).
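For readers without the attachment, the setup described above can be sketched roughly as follows. This is not the attached test code: the helper name alloc_page_aligned, the function name add_pairs, the 4096-byte page size and the use of C11 aligned_alloc are all assumptions made for illustration, and the __attribute__((aligned(32))) annotation is left out because the report does not show how it was applied.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed page size; real code would query sysconf(_SC_PAGESIZE). */
enum { PAGE_BYTES = 4096 };

/* Hypothetical helper: allocate room for n doubles starting at a page
   boundary. The size is rounded up to a multiple of the alignment, as
   C11 aligned_alloc expects. Returns NULL on failure. */
static double *alloc_page_aligned(size_t n)
{
    size_t page = (size_t)PAGE_BYTES;
    size_t bytes = (n * sizeof(double) + page - 1) / page * page;
    return aligned_alloc(page, bytes);
}

/* The loop from the report, with __restrict__ on every pointer. */
static void add_pairs(double *__restrict__ a, const double *__restrict__ b,
                      double *__restrict__ c, const double *__restrict__ d,
                      size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] += b[i];
        c[i] += d[i];
    }
}
```

With four arrays obtained this way, a[i], b[i], c[i] and d[i] all sit at the same offset within their respective pages for every i.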

The problem is, IMHO, related to '4k aliasing'. It occurs in the most
common case, where a/b/c/d each start at a page boundary (e.g., the
natural result of malloc). To demonstrate, here is the assembly generated
with 'gcc -mtune=native -mavx -O3':

.L8:
        vmovapd (%rdx,%rdi), %ymm0           #1 load a
        addq    $1, %r8                      #2
        vaddpd  (%rcx,%rdi), %ymm0, %ymm0    #3 load b and add
        vmovapd %ymm0, (%rdx,%rdi)           #4 store a
        vmovapd (%rax,%rdi), %ymm0           #5 load c
        vaddpd  (%rsi,%rdi), %ymm0, %ymm0    #6 load d and add
        vmovapd %ymm0, (%rax,%rdi)           #7 store c
        addq    $32, %rdi                    #8
        cmpq    %r8, %r12                    #9
        ja      .L8                          #10

The 4k aliasing is triggered by lines 4 and 5: the store of the result to
array a is immediately followed by a load from another array (c or d) at
the same page offset. In my tests this seems to be the default code
generation for both the AVX and SSE2 instruction sets, in both vectorized
and non-vectorized code.
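As I understand it, 4k aliasing arises when a load follows a store whose address agrees with the load's in the low 12 bits, because the hardware at first compares only those bits and treats the load as possibly dependent on the store. A minimal sketch of that condition, with a hypothetical helper name (same_page_offset) that is not part of the attached code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the two addresses agree in bits 0..11, i.e. they
   have the same offset within a 4 KiB page. A store to p followed
   closely by a load from q is then a candidate for 4k aliasing. */
static bool same_page_offset(const void *p, const void *q)
{
    return (((uintptr_t)p ^ (uintptr_t)q) & (uintptr_t)0xFFF) == 0;
}
```

For page-aligned arrays indexed by the same i, same_page_offset(&a[i], &c[i]) holds on every iteration, which matches the store/load pair on lines 4 and 5.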

It is easy to fix the problem by placing the two writes together, at the
end of the iteration, e.g.:

.L8:
        vmovapd (%rdx,%rdi), %ymm1           #1
        addq    $1, %r8                      #2
        vaddpd  (%rcx,%rdi), %ymm1, %ymm1    #3
        vmovapd (%rax,%rdi), %ymm0           #4
        vaddpd  (%rsi,%rdi), %ymm0, %ymm0    #5
        vmovapd %ymm1, (%rdx,%rdi)           #6
        vmovapd %ymm0, (%rax,%rdi)           #7
        addq    $32, %rdi                    #8
        cmpq    %r8, %r12                    #9
        ja      .L8                          #10

Here both stores happen after all the loads. This is (almost) the code
ICC generates for this loop. For problem sizes small enough to fit in the
L1 cache, the speedup is roughly 50%.
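The same ordering can be written at the source level, as in the sketch below. The function name add_pairs_stores_last is hypothetical, and this only expresses the intent: the compiler is free to reschedule loads and stores, so there is no guarantee that gcc emits the assembly above from it.

```c
#include <stddef.h>

/* Same computation as the original loop, written so that both loads
   and both additions come before either store. */
static void add_pairs_stores_last(double *__restrict__ a,
                                  const double *__restrict__ b,
                                  double *__restrict__ c,
                                  const double *__restrict__ d,
                                  size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* All loads and both additions first ... */
        const double ta = a[i] + b[i];
        const double tc = c[i] + d[i];
        /* ... then the two stores back to back. */
        a[i] = ta;
        c[i] = tc;
    }
}
```

The results are identical to the original loop; only the intended order of memory operations differs, so the generated code still has to be checked to see whether the stores really end up adjacent.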
