From: "drepper.fsp at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/51492] New: vectorizer generates unnecessary code
Date: Sat, 10 Dec 2011 01:38:00 -0000

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492

             Bug #: 51492
           Summary: vectorizer generates unnecessary code
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: drepper.fsp@gmail.com
             Build: x86_64-linux


Compile this code with 4.6.2 on an x86-64 machine with -O3:

#define SIZE 65536
#define WSIZE 64
unsigned short head[SIZE] __attribute__((aligned(64)));

void
f(void)
{
  for (unsigned n = 0; n < SIZE; ++n) {
    unsigned short m = head[n];
    head[n] = (unsigned short)(m >= WSIZE ? m-WSIZE : 0);
  }
}

The result I see is this:

0000000000000000 <f>:
   0:   66 0f ef d2             pxor   %xmm2,%xmm2
   4:   b8 00 00 00 00          mov    $0x0,%eax
                        5: R_X86_64_32  head
   9:   66 0f 6f 25 00 00 00    movdqa 0x0(%rip),%xmm4        # 11
  10:   00
                        d: R_X86_64_PC32        .LC0-0x4
  11:   66 0f 6f 1d 00 00 00    movdqa 0x0(%rip),%xmm3        # 19
  18:   00
                        15: R_X86_64_PC32       .LC1-0x4
  19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  20:   66 0f 6f 00             movdqa (%rax),%xmm0
  24:   66 0f 6f c8             movdqa %xmm0,%xmm1
  28:   66 0f d9 c4             psubusw %xmm4,%xmm0
  2c:   66 0f 75 c2             pcmpeqw %xmm2,%xmm0
  30:   66 0f fd cb             paddw  %xmm3,%xmm1
  34:   66 0f df c1             pandn  %xmm1,%xmm0
  38:   66 0f 7f 00             movdqa %xmm0,(%rax)
  3c:   48 83 c0 10             add    $0x10,%rax
  40:   48 3d 00 00 00 00       cmp    $0x0,%rax
                        42: R_X86_64_32S        head+0x20000
  46:   75 d8                   jne    20
  48:   f3 c3                   repz retq

There is a lot of unnecessary code.  The psubusw instruction alone is
sufficient: its purpose is to implement unsigned saturating subtraction,
which is exactly what the loop body computes.  Why does gcc create all
this extra code?

The loop body should just be

  movdqa (%rax), %xmm0
  psubusw %xmm1, %xmm0
  movdqa %xmm0, (%rax)

where each 16-bit element of %xmm1 holds WSIZE.
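
For reference, the expected loop can be sketched with SSE2 intrinsics, where _mm_subs_epu16 is the intrinsic that maps to psubusw. This is only an illustration under some assumptions: the helper names sub_or_zero, f_saturating and check_all_inputs are made up here and are not part of the report, and the code assumes GCC or Clang (for the aligned attribute) and falls back to the scalar loop when SSE2 is not available.

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics, including _mm_subs_epu16 */
#endif

#define SIZE 65536
#define WSIZE 64

unsigned short head[SIZE] __attribute__((aligned(64)));

/* Scalar reference: the loop body exactly as written in the report. */
static unsigned short sub_or_zero(unsigned short m)
{
    return (unsigned short)(m >= WSIZE ? m - WSIZE : 0);
}

/* Hand-vectorized form (hypothetical helper): one unsigned saturating
   subtract per group of eight 16-bit elements, with no compare or mask. */
static void f_saturating(void)
{
#ifdef __SSE2__
    const __m128i w = _mm_set1_epi16(WSIZE);   /* WSIZE in every element */
    for (unsigned n = 0; n < SIZE; n += 8) {
        __m128i v = _mm_load_si128((const __m128i *)&head[n]);
        v = _mm_subs_epu16(v, w);              /* clamps at 0 instead of wrapping */
        _mm_store_si128((__m128i *)&head[n], v);
    }
#else
    /* Fallback so the sketch still builds without SSE2. */
    for (unsigned n = 0; n < SIZE; ++n)
        head[n] = sub_or_zero(head[n]);
#endif
}

/* Fills head[] with every possible 16-bit value, runs the saturating
   version, and returns how many elements differ from the reference. */
static unsigned check_all_inputs(void)
{
    unsigned mismatches = 0;
    for (unsigned n = 0; n < SIZE; ++n)
        head[n] = (unsigned short)n;
    f_saturating();
    for (unsigned n = 0; n < SIZE; ++n)
        if (head[n] != sub_or_zero((unsigned short)n))
            ++mismatches;
    return mismatches;
}
```

Since SIZE equals the number of distinct unsigned short values, calling check_all_inputs() compares the two forms on every possible input; a return value of 0 means the saturating subtract alone reproduces the conditional expression.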