From: "ubizjak at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/52908] xop-mul-1:f9 miscompiled on bulldozer (-mxop)
Date: Mon, 09 Apr 2012 11:48:00 -0000

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52908

--- Comment #2 from Uros Bizjak 2012-04-09 11:48:05 UTC ---
Created attachment 27117
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27117
Proposed patch

There are indeed two problems with the XOP patterns:

a) duplication of the *sse4_1_mulv2siv2di3 pattern
b) wrong order of operands in all (!!!) XOP patterns. The XOP patterns treat element 0 as the MSB.

The attached patch solves this by simply removing the fake xop_mulv2div2di3_{low,high} patterns and expanding directly to the (fixed) xop_pmacsdq{h,l} patterns.
There is simply no need to use vpmacsdql instead of vpmuldq. For consistency, the patch expands to the xop_pmacsdql pattern, but GCC figures out that adding 0 is unnecessary and substitutes a plain MUL insn for the MAC insn.

The attached patch does not even try to fix the other intrinsics. Someone familiar with the AMD documentation should review all of them, since the documentation (43479.pdf) is somewhat inconsistent (e.g. the figure that explains VPMADCSSWD contradicts the textual description).

Since I don't have an XOP processor, I can only eyeball the asm. In this case:

        vpxor   %xmm3, %xmm3, %xmm3
        xorl    %eax, %eax
.L3:
        vpshufd $216, c2(%rax), %xmm1
        vpshufd $216, c3(%rax), %xmm0
        vpmuldq %xmm0, %xmm1, %xmm2
        vpmacsdqh       %xmm3, %xmm0, %xmm1, %xmm0
        vmovdqa %xmm2, e1(%rax,%rax)
        vmovdqa %xmm0, e1+16(%rax,%rax)
        addq    $16, %rax
        cmpq    $2048, %rax
        jne     .L3

Please also note the hoisting of the constant load (the zeroing of %xmm3) out of the loop.
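For anyone else eyeballing this loop without XOP hardware, the following is a minimal scalar C sketch of one loop iteration. It assumes the testcase loop computes e1[i] = (long long) c2[i] * c3[i] (inferred from the asm, not taken from the testcase), and it models the instructions under these assumed semantics: lane 0 is the least significant dword; vpmuldq multiplies the signed even dwords (lanes 0 and 2); vpmacsdql does the same and adds a 64-bit accumulator; vpmacsdqh multiplies the signed odd dwords (lanes 1 and 3) and adds the accumulator. The types and helper names (v4si, v2di, pshufd, pmuldq, pmacsdql, pmacsdqh, widen_mul4) are hypothetical and are not GCC intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-ins for 128-bit vectors; lane 0 is the LSB end. */
typedef struct { int32_t d[4]; } v4si;
typedef struct { int64_t q[2]; } v2di;

/* vpshufd: result lane i takes source lane ((imm >> 2*i) & 3). */
static v4si pshufd(v4si a, unsigned imm) {
    assert(imm <= 0xff);
    v4si r;
    for (int i = 0; i < 4; i++)
        r.d[i] = a.d[(imm >> (2 * i)) & 3];
    return r;
}

/* vpmuldq: signed 32x32->64 products of the even lanes 0 and 2. */
static v2di pmuldq(v4si a, v4si b) {
    v2di r;
    r.q[0] = (int64_t)a.d[0] * b.d[0];
    r.q[1] = (int64_t)a.d[2] * b.d[2];
    return r;
}

/* vpmacsdql (assumed semantics): even-lane products plus accumulator. */
static v2di pmacsdql(v4si a, v4si b, v2di acc) {
    v2di r;
    r.q[0] = (int64_t)a.d[0] * b.d[0] + acc.q[0];
    r.q[1] = (int64_t)a.d[2] * b.d[2] + acc.q[1];
    return r;
}

/* vpmacsdqh (assumed semantics): odd-lane products plus accumulator. */
static v2di pmacsdqh(v4si a, v4si b, v2di acc) {
    v2di r;
    r.q[0] = (int64_t)a.d[1] * b.d[1] + acc.q[0];
    r.q[1] = (int64_t)a.d[3] * b.d[3] + acc.q[1];
    return r;
}

/* One iteration of the .L3 loop: four int32 inputs -> four int64 outputs. */
static void widen_mul4(const int32_t *c2, const int32_t *c3, int64_t *e1) {
    v4si a, b;
    for (int i = 0; i < 4; i++) {
        a.d[i] = c2[i];
        b.d[i] = c3[i];
    }
    /* 216 = 0b11011000: lanes become 0,2,1,3, so even lanes hold
       elements 0,1 and odd lanes hold elements 2,3. */
    a = pshufd(a, 216);
    b = pshufd(b, 216);
    v2di zero = {{0, 0}};                /* %xmm3, hoisted out of the loop */
    v2di lo = pmuldq(a, b);              /* products for elements 0 and 1 */
    v2di hi = pmacsdqh(a, b, zero);      /* products for elements 2 and 3 */
    e1[0] = lo.q[0];                     /* store to e1 */
    e1[1] = lo.q[1];
    e1[2] = hi.q[0];                     /* store to e1+16 */
    e1[3] = hi.q[1];
}
```

Under these assumptions the two stores come out in source order, and pmacsdql with a zero accumulator returns exactly what pmuldq returns, which matches the MAC-to-MUL substitution described above.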