From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-388244-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 400 invoked by alias); 8 Apr 2012 22:48:29 -0000
Received: (qmail 391 invoked by uid 22791); 8 Apr 2012 22:48:26 -0000
X-SWARE-Spam-Status: No, hits=-3.2 required=5.0	tests=ALL_TRUSTED,AWL,BAYES_00,TW_DD,TW_DQ,TW_MX,TW_PX,TW_SD,TW_VD,TW_VP
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sun, 08 Apr 2012 22:48:11 +0000
From: "matz at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/52908] New: xop-mul-1:f9 miscompiled on bulldozer (-mxop)
Date: Sun, 08 Apr 2012 22:48:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: matz at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID: <bug-52908-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-04/txt/msg00526.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52908

             Bug #: 52908
           Summary: xop-mul-1:f9 miscompiled on bulldozer (-mxop)
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: matz@gcc.gnu.org


FAIL: gcc.target/i386/xop-mul-1.c execution test

The failure occurs because f9 is miscompiled at -O3 -mxop (f13 is as well,
but this report covers only f9).  f9 looks like this:

__attribute__((noinline, noclone)) void
f9 (void)
{
  int i;
  for (i = 0; i < 512; ++i)
    e1[i] = (long long) c2[i] * (long long) c3[i];
}
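
For checking this outside the testsuite, a minimal standalone sketch could
look like the following.  The array declarations (element types and 16-byte
alignment) are assumptions inferred from the addressing in the asm of this
report, and count_f9_mismatches with its fill values is a hypothetical
helper, not code taken from xop-mul-1.c.

```c
#include <assert.h>

#define N 512

/* Assumed declarations: 4-byte inputs and 8-byte results, aligned for
   vmovdqa.  Inferred from the asm in this report, not from the testsuite.  */
int c2[N] __attribute__((aligned (16)));
int c3[N] __attribute__((aligned (16)));
long long e1[N] __attribute__((aligned (16)));

__attribute__((noinline, noclone)) void
f9 (void)
{
  int i;
  for (i = 0; i < N; ++i)
    e1[i] = (long long) c2[i] * (long long) c3[i];
}

/* Hypothetical helper: fill the inputs, run f9, and count the elements
   that differ from a scalar recomputation.  */
int
count_f9_mismatches (void)
{
  int i, bad = 0;

  /* Sanity check of the layout assumed above.  */
  assert (sizeof (int) == 4 && sizeof (long long) == 8);

  for (i = 0; i < N; ++i)
    {
      /* Values chosen so the products need more than 32 bits but the
	 inputs themselves never overflow int.  */
      c2[i] = (i - 256) * 70001;
      c3[i] = (3 * i - 700) * 50021;
    }

  f9 ();

  for (i = 0; i < N; ++i)
    {
      /* The volatile temporary is intended to keep this reference
	 computation from being vectorized the same way as f9.  */
      volatile long long want = (long long) c2[i] * (long long) c3[i];
      if (e1[i] != want)
	++bad;
    }
  return bad;
}
```

On a correctly compiled build count_f9_mismatches () should return 0.  Built
with -O3 -mxop by an affected compiler and run on XOP hardware, a nonzero
count would be the expected symptom.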

With those options, f9 is compiled into this asm:

f9:
        movl    $0, %eax
.L29:
        vpshufd $216, c2(%rax), %xmm1
        vpshufd $216, c3(%rax), %xmm0
        vpxor   %xmm2, %xmm2, %xmm2
        vpmacsdql       %xmm2, %xmm0, %xmm1, %xmm2
        vmovdqa %xmm2, e1(%rax,%rax)
        vpxor   %xmm0, %xmm0, %xmm0         <<< overwrite xmm0
        vpmacsdqh       %xmm0, %xmm0, %xmm1, %xmm0
        vmovdqa %xmm0, e1+16(%rax,%rax)
        addq    $16, %rax
        cmpq    $2048, %rax
        jne     .L29
        rep
        ret

The marked instruction, which zeroes xmm0 just before the vpmacsdqh,
overwrites xmm0 although it is still needed as an input.  That insn is
generated by post-reload splitting, and the problem is caused by a conflict
between two patterns in sse.md:

(define_insn "*sse4_1_mulv2siv2di3"
  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
        (mult:V2DI
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 1 "nonimmediate_operand" "%0,x")
              (parallel [(const_int 0) (const_int 2)])))
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 2 "nonimmediate_operand" "xm,xm")
              (parallel [(const_int 0) (const_int 2)])))))]
  "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, V4SImode, operands)"
...

(note how operand 0 is not marked early-clobber) and the next xop pattern:

;; We don't have a straight 32-bit parallel multiply and extend on XOP, so
;; fake it with a multiply/add.  In general, we expect the define_split to
;; occur before register allocation, so we have to handle the corner case where
;; the target is the same as either operands[1] or operands[2]
(define_insn_and_split "xop_mulv2div2di3_high"
  [(set (match_operand:V2DI 0 "register_operand" "=&x")
        (mult:V2DI
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 1 "register_operand" "%x")
              (parallel [(const_int 0)
                         (const_int 2)])))
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 2 "nonimmediate_operand" "xm")
              (parallel [(const_int 0)
                         (const_int 2)])))))]
  "TARGET_XOP"
  "#"
  "&& reload_completed"
  [(set (match_dup 0)
        (match_dup 3))
  ...

Note in particular how the corner case from the comment (the target being
the same register as operands[1] or operands[2]) is taken care of by the
early-clobber on operand 0.

Now the problem is, the first (sse 4.1) pattern matches the insn in question:

(insn 31 27 32 3 (set (reg:V2DI 84)
        (mult:V2DI (sign_extend:V2DI (vec_select:V2SI (reg:V4SI 81)
                    (parallel [
                            (const_int 0 [0])
                            (const_int 2 [0x2])
                        ])))
            (sign_extend:V2DI (vec_select:V2SI (reg:V4SI 82)
                    (parallel [
                            (const_int 0 [0])
                            (const_int 2 [0x2])
                        ])))))
/matz/trunk/gcc/gcc/testsuite/gcc.target/i386/sse2-mul-1.c:99 1484 {*sse4_1_mulv2siv2di3}
     (expr_list:REG_DEAD (reg:V4SI 82)
        (expr_list:REG_DEAD (reg:V4SI 81)
            (nil))))

(sse4_1_mulv2siv2di3 matched).  Hence the register allocator doesn't see any
early-clobber and allocates pseudos 84 and 81 both to xmm0.  When the insn
is then split (by the xop_mulv2div2di3_high splitter, which is used because
the sse4_1_mulv2siv2di3 pattern is not a splitter itself), op1 is overwritten.
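
The effect of sharing the destination with an input can be modelled in plain
C.  This is only a sketch: xmm_t, split_mul and the helpers are hypothetical
names, registers are modelled as 16 raw bytes, and the lanes used (0 and 2)
follow the vec_select in the RTL above rather than the exact semantics of
the XOP instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit register as raw bytes.  */
typedef struct { unsigned char b[16]; } xmm_t;

static int32_t
get_s32 (const xmm_t *r, int lane)
{
  int32_t v;
  assert (lane >= 0 && lane < 4);
  memcpy (&v, r->b + 4 * lane, sizeof v);
  return v;
}

static int64_t
get_s64 (const xmm_t *r, int lane)
{
  int64_t v;
  assert (lane >= 0 && lane < 2);
  memcpy (&v, r->b + 8 * lane, sizeof v);
  return v;
}

static void
set_s64 (xmm_t *r, int lane, int64_t v)
{
  assert (lane >= 0 && lane < 2);
  memcpy (r->b + 8 * lane, &v, sizeof v);
}

/* Set 32-bit lanes 0 and 2, zero everything else.  */
static void
fill (xmm_t *r, int32_t lane0, int32_t lane2)
{
  memset (r->b, 0, sizeof r->b);
  memcpy (r->b, &lane0, sizeof lane0);
  memcpy (r->b + 8, &lane2, sizeof lane2);
}

/* Models the split sequence: zero the destination first (the vpxor), then
   multiply-accumulate lanes 0 and 2 of the inputs into it.  The inputs are
   read only after the zeroing, as in the generated code.  */
static void
split_mul (xmm_t *dst, const xmm_t *op1, const xmm_t *op2)
{
  int i;
  memset (dst->b, 0, sizeof dst->b);
  for (i = 0; i < 2; ++i)
    set_s64 (dst, i, get_s64 (dst, i)
		     + (int64_t) get_s32 (op1, 2 * i) * get_s32 (op2, 2 * i));
}

/* Destination in its own register, as an early-clobber would enforce:
   returns 1 if both products are correct.  */
int
distinct_dest_is_correct (void)
{
  xmm_t a, b, d;
  fill (&a, 3, -5);
  fill (&b, 7, 11);
  split_mul (&d, &a, &b);
  return get_s64 (&d, 0) == 21 && get_s64 (&d, 1) == -55;
}

/* Destination shares a register with op1, like pseudos 84 and 81 both
   landing in xmm0: returns 1 if the input was lost and the result is 0.  */
int
shared_dest_loses_input (void)
{
  xmm_t a, b;
  fill (&a, 3, -5);
  fill (&b, 7, 11);
  split_mul (&a, &a, &b);
  return get_s64 (&a, 0) == 0 && get_s64 (&a, 1) == 0;
}
```

Both functions return 1 in this model: with a separate destination the
products come out right, while with a shared one the zeroing wipes the
input before it is read.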

I have no idea how best to solve this: either deactivate the sse4.1 pattern
when TARGET_XOP is set, or rewrite the XOP pattern so that it does not
early-clobber the result, or something else entirely.  I've tested that the
first approach fixes this problem.