From: "matz at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/52908] New: xop-mul-1:f9 miscompiled on bulldozer (-mxop)
Date: Sun, 08 Apr 2012 22:48:00 -0000

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52908

             Bug #: 52908
           Summary: xop-mul-1:f9 miscompiled on bulldozer (-mxop)
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: matz@gcc.gnu.org


FAIL: gcc.target/i386/xop-mul-1.c execution test

This is because f9 is miscompiled with -O3 -mxop (f13 is as well, but this
report is only about f9).

f9 looks like so:

__attribute__((noinline, noclone)) void
f9 (void)
{
  int i;
  for (i = 0; i < 512; ++i)
    e1[i] = (long long) c2[i] * (long long) c3[i];
}

This is compiled into this asm:

f9:
        movl    $0, %eax
.L29:
        vpshufd $216, c2(%rax), %xmm1
        vpshufd $216, c3(%rax), %xmm0
        vpxor   %xmm2, %xmm2, %xmm2
        vpmacsdql       %xmm2, %xmm0, %xmm1, %xmm2
        vmovdqa %xmm2, e1(%rax,%rax)
        vpxor   %xmm0, %xmm0, %xmm0        <<< overwrite xmm0
        vpmacsdqh       %xmm0, %xmm0, %xmm1, %xmm0
        vmovdqa %xmm0, e1+16(%rax,%rax)
        addq    $16, %rax
        cmpq    $2048, %rax
        jne     .L29
        rep
        ret

The marked instruction, which zeroes xmm0 before the second multiply
(vpmacsdqh), overwrites xmm0 although it is still used as an input.  That
insn is generated by post-reload splitting and is caused by a conflict
between two patterns in sse.md:

(define_insn "*sse4_1_mulv2siv2di3"
  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
        (mult:V2DI
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 1 "nonimmediate_operand" "%0,x")
              (parallel [(const_int 0) (const_int 2)])))
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 2 "nonimmediate_operand" "xm,xm")
              (parallel [(const_int 0) (const_int 2)])))))]
  "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, V4SImode, operands)"
...

(note how operand 0 is not marked early-clobber) and the next xop pattern:

;; We don't have a straight 32-bit parallel multiply and extend on XOP, so
;; fake it with a multiply/add.  In general, we expect the define_split to
;; occur before register allocation, so we have to handle the corner case where
;; the target is the same as either operands[1] or operands[2]
(define_insn_and_split "xop_mulv2div2di3_high"
  [(set (match_operand:V2DI 0 "register_operand" "=&x")
        (mult:V2DI
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 1 "register_operand" "%x")
              (parallel [(const_int 0) (const_int 2)])))
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 2 "nonimmediate_operand" "xm")
              (parallel [(const_int 0) (const_int 2)])))))]
  "TARGET_XOP"
  "#"
  "&& reload_completed"
  [(set (match_dup 0)
        (match_dup 3))
...

Note in particular how the corner case from the comment (dest equal to
operands[1] or operands[2]) is taken care of by the early-clobber on
operand 0.

Now the problem is that the first (SSE4.1) pattern matches the insn in
question:

(insn 31 27 32 3 (set (reg:V2DI 84)
        (mult:V2DI (sign_extend:V2DI (vec_select:V2SI (reg:V4SI 81)
                    (parallel [
                            (const_int 0 [0])
                            (const_int 2 [0x2])
                        ])))
            (sign_extend:V2DI (vec_select:V2SI (reg:V4SI 82)
                    (parallel [
                            (const_int 0 [0])
                            (const_int 2 [0x2])
                        ]))))) /matz/trunk/gcc/gcc/testsuite/gcc.target/i386/sse2-mul-1.c:99 1484 {*sse4_1_mulv2siv2di3}
     (expr_list:REG_DEAD (reg:V4SI 82)
        (expr_list:REG_DEAD (reg:V4SI 81)
            (nil))))

(sse4_1_mulv2siv2di3 matched).  The register allocator hence doesn't see
any early-clobber and allocates pseudos 84 and 81 to xmm0.  When the insn is
then split (by the xop_mulv2div2di3_high splitter, which applies because the
sse4_1_mulv2siv2di3 pattern is not a splitter), op1 is overwritten.

I've no idea how best to solve this: either deactivate the SSE4.1 pattern
when TARGET_XOP, or rewrite the XOP pattern so that it does not need to
early-clobber its result, or something else entirely.  I've tested that the
first way fixes this problem.