From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 11114 invoked by alias); 12 Jun 2012 12:45:14 -0000
Received: (qmail 11102 invoked by uid 22791); 12 Jun 2012 12:45:13 -0000
X-SWARE-Spam-Status: No, hits=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,LOTS_OF_MONEY,TW_DQ,TW_KL,TW_LQ,TW_QD,TW_VD
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 12 Jun 2012 12:45:01 +0000
From: "andrii.riabushenko at barclays dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/53645] New: Missed optimization for division of vector types
Date: Tue, 12 Jun 2012 12:45:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: andrii.riabushenko at barclays dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-06/txt/msg00684.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53645

             Bug #: 53645
           Summary: Missed optimization for division of vector types
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: andrii.riabushenko@barclays.com

For the following code

v4si ttt(v4si x)
{
    return x / (v4si) {3,3,3,3};
}

GCC generates the following assembler:

ttt:
    movdqa      (%rcx), %xmm0
    movl        $1431655766, %ecx
    movd        %xmm0, %r8d
    pextrd      $1, %xmm0, %r10d
    pextrd      $2, %xmm0, %r11d
    movl        %r8d, %eax
    sarl        $31, %r8d
    imull       %ecx
    movl        %r10d, %eax
    sarl        $31, %r10d
    movl        %edx, %r9d
    imull       %ecx
    movl        %r11d, %eax
    subl        %r8d, %r9d
    sarl        $31, %r11d
    movl        %edx, %r8d
    imull       %ecx
    subl        %r10d, %r8d
    movl        %edx, %r10d
    subl        %r11d, %r10d
    pextrd      $3, %xmm0, %r11d
    movl        %r11d, %eax
    imull       %ecx
    sarl        $31, %r11d
    movd        %r10d, %xmm1
    movd        %r9d, %xmm0
    pinsrd      $0x1, %r8d, %xmm0
    subl        %r11d, %edx
    pinsrd      $0x1, %edx, %xmm1
    punpcklqdq  %xmm1, %xmm0
    ret

Thus GCC DOES optimize the division into a high multiplication, but it
applies it to each element separately instead of to the vector as a whole.
The assembler should look like

    movdqa      .LC190(%rip), %xmm0
    pmulld      (%rcx), %xmm0
    pslld       $31, %xmm0
    ret
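
For reference, the per-element sequence in the generated code (imull by $1431655766, take %edx, subtract the sarl $31 result) can be written out in scalar C roughly as below. This is only a sketch of the high-multiplication trick, not GCC's actual implementation; the names DIV3_MAGIC and div3_highmul are made up for illustration, and it assumes that right-shifting a negative signed value is an arithmetic shift, as it is with GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplier seen in the generated code:
   1431655766 == 0x55555556 == (2^32 + 2) / 3.
   (The macro name is hypothetical.) */
#define DIV3_MAGIC 1431655766

/* Hypothetical helper mirroring one element of the generated code.
   Assumes >> on a negative signed value is an arithmetic shift. */
int32_t div3_highmul(int32_t x)
{
    /* High 32 bits of the 64-bit signed product; this is what the
       one-operand imull leaves in %edx. */
    int32_t hi = (int32_t)(((int64_t)x * DIV3_MAGIC) >> 32);

    /* sarl $31: -1 for negative x, 0 otherwise. */
    int32_t sign = x >> 31;

    /* subl: subtracting the sign adds 1 for negative x, so the
       quotient is rounded toward zero like the C '/' operator. */
    return hi - sign;
}
```

For example, div3_highmul(7) gives 2 and div3_highmul(-7) gives -2, the same as 7 / 3 and -7 / 3. A vectorized version would have to do these same three steps on all four elements at once.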