From mboxrd@z Thu Jan  1 00:00:00 1970
From: "drepper.fsp at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/51492] New: vectorizer generates unnecessary code
Date: Sat, 10 Dec 2011 01:38:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: drepper.fsp at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID: <bug-51492-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
X-SW-Source: 2011-12/txt/msg01098.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

             Bug #: 51492
           Summary: vectorizer generates unnecessary code
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: drepper.fsp@gmail.com
             Build: x86_64-linux


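Compile this code with GCC 4.6.2 on an x86-64 machine with -O3 (and
-std=gnu99, because GCC 4.6 defaults to gnu89, which rejects the declaration
inside the for statement):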

#define SIZE 65536
#define WSIZE 64
unsigned short head[SIZE] __attribute__((aligned(64)));

void
f(void)
{
  for (unsigned n = 0; n < SIZE; ++n) {
    unsigned short m = head[n];
    head[n] = (unsigned short)(m >= WSIZE ? m-WSIZE : 0);
  }
}
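A minimal driver can confirm what f() is supposed to compute. It fills head[] with every 16-bit value once, runs the loop, and compares each element against max(m - WSIZE, 0). This is a sketch only: the helper name check_f is hypothetical and not part of the original report, and f() is copied verbatim so the file stands alone.

```c
#include <assert.h>

#define SIZE 65536
#define WSIZE 64
unsigned short head[SIZE] __attribute__((aligned(64)));

/* The loop from the report, unchanged. */
void
f(void)
{
  for (unsigned n = 0; n < SIZE; ++n) {
    unsigned short m = head[n];
    head[n] = (unsigned short)(m >= WSIZE ? m-WSIZE : 0);
  }
}

/* Hypothetical helper: put every 16-bit value into head[] once, run f(),
   and return 1 if each element equals the saturating difference
   max(m - WSIZE, 0), otherwise 0. */
int
check_f(void)
{
  for (unsigned n = 0; n < SIZE; ++n)
    head[n] = (unsigned short)n;
  f();
  for (unsigned n = 0; n < SIZE; ++n) {
    unsigned expected = n >= WSIZE ? n - WSIZE : 0;
    if (head[n] != expected)
      return 0;
  }
  return 1;
}
```

Because every possible input appears once, a return value of 1 shows that f() is exactly an unsigned saturating subtraction of WSIZE.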

The result I see (object file disassembled with objdump -dr) is this:

0000000000000000 <f>:
   0:    66 0f ef d2              pxor   %xmm2,%xmm2
   4:    b8 00 00 00 00           mov    $0x0,%eax
            5: R_X86_64_32    head
   9:    66 0f 6f 25 00 00 00     movdqa 0x0(%rip),%xmm4        # 11 <f+0x11>
  10:    00 
            d: R_X86_64_PC32    .LC0-0x4
  11:    66 0f 6f 1d 00 00 00     movdqa 0x0(%rip),%xmm3        # 19 <f+0x19>
  18:    00 
            15: R_X86_64_PC32    .LC1-0x4
  19:    0f 1f 80 00 00 00 00     nopl   0x0(%rax)
  20:    66 0f 6f 00              movdqa (%rax),%xmm0
  24:    66 0f 6f c8              movdqa %xmm0,%xmm1
  28:    66 0f d9 c4              psubusw %xmm4,%xmm0
  2c:    66 0f 75 c2              pcmpeqw %xmm2,%xmm0
  30:    66 0f fd cb              paddw  %xmm3,%xmm1
  34:    66 0f df c1              pandn  %xmm1,%xmm0
  38:    66 0f 7f 00              movdqa %xmm0,(%rax)
  3c:    48 83 c0 10              add    $0x10,%rax
  40:    48 3d 00 00 00 00        cmp    $0x0,%rax
            42: R_X86_64_32S    head+0x20000
  46:    75 d8                    jne    20 <f+0x20>
  48:    f3 c3                    repz retq 
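Modelling the loop body one 16-bit lane at a time shows that it is correct but redundant. This is a sketch under an assumption: the dump shows only the relocations .LC0 and .LC1, not their contents, so the values 63 and 0xffc0 (that is, -64) are inferred from the data flow as the only constants that make the sequence compute the source expression. All function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED contents of .LC0 (%xmm4) and .LC1 (%xmm3), broadcast to every
   lane; the disassembly does not show them. */
#define ASSUMED_LC0 63u
#define ASSUMED_LC1 0xffc0u   /* -64 as a 16-bit value */

/* Scalar model of psubusw on one lane: unsigned saturating a - b. */
static uint16_t psubusw_lane(uint16_t a, uint16_t b)
{
  return a > b ? (uint16_t)(a - b) : 0;
}

/* Lane-wise model of the loop body gcc emitted. */
static uint16_t emitted_lane(uint16_t m)
{
  uint16_t x0 = psubusw_lane(m, ASSUMED_LC0);     /* psubusw %xmm4,%xmm0 */
  uint16_t mask = (x0 == 0) ? 0xffff : 0;         /* pcmpeqw %xmm2,%xmm0, %xmm2 = 0 */
  uint16_t x1 = (uint16_t)(m + ASSUMED_LC1);      /* paddw %xmm3,%xmm1: m - 64 mod 2^16 */
  return (uint16_t)(~mask & x1);                  /* pandn %xmm1,%xmm0 */
}

/* Returns 1 if, for every 16-bit m, the emitted sequence agrees with a
   single psubusw by 64, otherwise 0. */
int sequences_agree(void)
{
  for (uint32_t m = 0; m <= 0xffff; ++m)
    if (emitted_lane((uint16_t)m) != psubusw_lane((uint16_t)m, 64))
      return 0;
  return 1;
}
```

The pcmpeqw/pandn pair only zeroes the lanes where m <= 63, which a psubusw by 64 already does by itself.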


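There is a lot of unnecessary code.  The psubusw instruction alone is
sufficient: it performs unsigned saturating subtraction on each of the eight
packed 16-bit words, so every lane with m < WSIZE becomes 0 and every other
lane becomes m - WSIZE, which is exactly (m >= WSIZE ? m-WSIZE : 0).  Why
does gcc emit the extra movdqa copy, pcmpeqw, paddw and pandn, plus a second
constant vector?  The loop body should just be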

   movdqa (%rax), %xmm0
   psubusw %xmm1, %xmm0
   movdqa %xmm0, (%rax)

where %xmm1 holds WSIZE (64) in each of its eight 16-bit lanes.
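The expected loop body can be requested explicitly with SSE2 intrinsics. This is a hypothetical hand-vectorized version for comparison, not a fix for the vectorizer: f_sse2 and check_f_sse2 are illustrative names, _mm_subs_epu16 is the intrinsic for psubusw, and _mm_set1_epi16 broadcasts WSIZE to all eight lanes as %xmm1 would. The exact instructions and registers are still chosen by the compiler.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics; always available on x86-64 */

#define SIZE 65536
#define WSIZE 64
unsigned short head[SIZE] __attribute__((aligned(64)));

/* Hypothetical hand-vectorized f(): one saturating subtract per 8 elements. */
void
f_sse2(void)
{
  const __m128i w = _mm_set1_epi16(WSIZE);   /* WSIZE in all eight lanes */
  for (unsigned n = 0; n < SIZE; n += 8) {
    __m128i *p = (__m128i *)&head[n];        /* 16-byte aligned: head is 64-aligned, n % 8 == 0 */
    __m128i v = _mm_load_si128(p);           /* movdqa (%rax),%xmm0 */
    v = _mm_subs_epu16(v, w);                /* psubusw %xmm1,%xmm0 */
    _mm_store_si128(p, v);                   /* movdqa %xmm0,(%rax) */
  }
}

/* Hypothetical check: returns 1 if f_sse2() matches the scalar expression
   from the report for every 16-bit input, otherwise 0. */
int
check_f_sse2(void)
{
  for (unsigned n = 0; n < SIZE; ++n)
    head[n] = (unsigned short)n;
  f_sse2();
  for (unsigned n = 0; n < SIZE; ++n) {
    unsigned expected = n >= WSIZE ? n - WSIZE : 0;
    if (head[n] != expected)
      return 0;
  }
  return 1;
}
```

Compiling this with -O2 or -O3 should produce a loop whose body is essentially the load, psubusw and store shown above, which is the code the report expects the vectorizer to find on its own.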