public inbox for gcc-bugs@sourceware.org
* [Bug target/52908] New: xop-mul-1:f9 miscompiled on bulldozer (-mxop)
@ 2012-04-08 22:48 matz at gcc dot gnu.org
  2012-04-09  8:17 ` [Bug target/52908] " ubizjak at gmail dot com
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: matz at gcc dot gnu.org @ 2012-04-08 22:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52908

             Bug #: 52908
           Summary: xop-mul-1:f9 miscompiled on bulldozer (-mxop)
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: matz@gcc.gnu.org


FAIL: gcc.target/i386/xop-mul-1.c execution test

This is because f9 is miscompiled with -O3 -mxop (f13 is as well,
but this report is only about f9).  f9 looks like so:

__attribute__((noinline, noclone)) void
f9 (void)
{
  int i;
  for (i = 0; i < 512; ++i)
    e1[i] = (long long) c2[i] * (long long) c3[i];
}
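
A standalone check along the following lines can confirm whether a given
compiler produces correct results for f9.  It is only a sketch: the array
declarations, the input values, and the check_f9 helper are assumptions,
since the report does not show how the testcase declares or fills e1, c2
and c3.

```c
#include <assert.h>

#define N 512

/* Assumed declarations; the report does not show the real ones.  */
int c2[N], c3[N];
long long e1[N];

__attribute__((noinline, noclone)) void
f9 (void)
{
  int i;
  for (i = 0; i < N; ++i)
    e1[i] = (long long) c2[i] * (long long) c3[i];
}

/* Hypothetical helper: fill the inputs, run f9, and compare every element
   with a product computed one element at a time.  Aborts on the first
   mismatch and otherwise returns the number of elements checked.  */
int
check_f9 (void)
{
  int i;
  for (i = 0; i < N; ++i)
    {
      c2[i] = i * 3 - 700;          /* arbitrary mix of negative and positive values */
      c3[i] = 1000003 - i * 7919;
    }
  f9 ();
  for (i = 0; i < N; ++i)
    {
      /* volatile keeps the reference multiply out of any vectorized loop.  */
      volatile long long a = c2[i], b = c3[i];
      assert (e1[i] == a * b);
    }
  return N;
}
```

Calling check_f9 from main and building with -O3 -mxop on an affected
compiler would be expected to trip the assertion; on a correct build it
returns 512.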

This is compiled into this asm:

f9:
        movl    $0, %eax
.L29:
        vpshufd $216, c2(%rax), %xmm1
        vpshufd $216, c3(%rax), %xmm0
        vpxor   %xmm2, %xmm2, %xmm2
        vpmacsdql       %xmm2, %xmm0, %xmm1, %xmm2
        vmovdqa %xmm2, e1(%rax,%rax)
        vpxor   %xmm0, %xmm0, %xmm0         <<< overwrite xmm0
        vpmacsdqh       %xmm0, %xmm0, %xmm1, %xmm0
        vmovdqa %xmm0, e1+16(%rax,%rax)
        addq    $16, %rax
        cmpq    $2048, %rax
        jne     .L29
        rep
        ret

The marked instruction zeroes xmm0 just before the vpmacsdqh, overwriting
xmm0 although it is still needed as an input.  That insn is generated by
post-reload splitting and is caused by a conflict between two patterns in sse.md:

(define_insn "*sse4_1_mulv2siv2di3"
  [(set (match_operand:V2DI 0 "register_operand" "=x,x")
        (mult:V2DI
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 1 "nonimmediate_operand" "%0,x")
              (parallel [(const_int 0) (const_int 2)])))
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 2 "nonimmediate_operand" "xm,xm")
              (parallel [(const_int 0) (const_int 2)])))))]
  "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, V4SImode, operands)"
...

(note how operand 0 is not marked early-clobber) and the next xop pattern:

;; We don't have a straight 32-bit parallel multiply and extend on XOP, so
;; fake it with a multiply/add.  In general, we expect the define_split to
;; occur before register allocation, so we have to handle the corner case where
;; the target is the same as either operands[1] or operands[2]
(define_insn_and_split "xop_mulv2div2di3_high"
  [(set (match_operand:V2DI 0 "register_operand" "=&x")
        (mult:V2DI
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 1 "register_operand" "%x")
              (parallel [(const_int 0)
                         (const_int 2)])))
          (sign_extend:V2DI
            (vec_select:V2SI
              (match_operand:V4SI 2 "nonimmediate_operand" "xm")
              (parallel [(const_int 0)
                         (const_int 2)])))))]
  "TARGET_XOP"
  "#"
  "&& reload_completed"
  [(set (match_dup 0)
        (match_dup 3))
  ...

Note in particular how the comment's concern about the destination being the
same as operands[1] or operands[2] is taken care of by the early-clobber on
operand 0.

Now the problem is, the first (sse 4.1) pattern matches the insn in question:

(insn 31 27 32 3 (set (reg:V2DI 84)
        (mult:V2DI (sign_extend:V2DI (vec_select:V2SI (reg:V4SI 81)
                    (parallel [
                            (const_int 0 [0])
                            (const_int 2 [0x2])
                        ])))
            (sign_extend:V2DI (vec_select:V2SI (reg:V4SI 82)
                    (parallel [
                            (const_int 0 [0])
                            (const_int 2 [0x2])
                        ])))))
/matz/trunk/gcc/gcc/testsuite/gcc.target/i386/sse2-mul-1.c:99 1484 {*sse4_1_mulv2siv2di3}
     (expr_list:REG_DEAD (reg:V4SI 82)
        (expr_list:REG_DEAD (reg:V4SI 81)
            (nil))))

(sse4_1_mulv2siv2di3 matched).  The register allocator hence doesn't see any
early-clobber and allocates both pseudo 84 and pseudo 81 to xmm0.  When the
insn is then split (by the xop_mulv2div2di3_high splitter, which applies
because the sse4_1_mulv2siv2di3 pattern is not a splitter itself), operand 1
is overwritten.
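
The effect of sharing xmm0 between the destination and an input can be
reproduced with a small C model of one loop iteration.  This is a sketch,
not GCC code: xmm_t, vpshufd_216, vpmacsdq, emitted_sequence and
disjoint_sequence are hypothetical names, and the model only assumes that
vpshufd $216 reorders dwords as 0,2,1,3 and that vpmacsdql/vpmacsdqh
multiply the even/odd dwords and add a 64-bit accumulator.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 128-bit register, viewed as four dwords or two qwords.  */
typedef union { int32_t d[4]; int64_t q[2]; } xmm_t;

static xmm_t
zero_reg (void)                     /* models vpxor %r, %r, %r */
{
  xmm_t r;
  memset (&r, 0, sizeof r);
  return r;
}

static xmm_t
vpshufd_216 (const int32_t m[4])    /* 216 = 0b11011000: dwords 0,2,1,3 */
{
  xmm_t r;
  r.d[0] = m[0]; r.d[1] = m[2]; r.d[2] = m[1]; r.d[3] = m[3];
  return r;
}

/* high == 0 models vpmacsdql (dwords 0 and 2), high == 1 models vpmacsdqh
   (dwords 1 and 3).  Arguments are passed by value, so the result may be
   assigned to a register that was also an input.  */
static xmm_t
vpmacsdq (xmm_t a, xmm_t b, xmm_t acc, int high)
{
  xmm_t r;
  assert (high == 0 || high == 1);
  r.q[0] = (int64_t) a.d[0 + high] * b.d[0 + high] + acc.q[0];
  r.q[1] = (int64_t) a.d[2 + high] * b.d[2 + high] + acc.q[1];
  return r;
}

/* Register assignment as in the asm above: the second result lands in
   xmm0, which still holds the shuffled c3 values.  */
void
emitted_sequence (const int32_t c2[4], const int32_t c3[4], int64_t out[4])
{
  xmm_t xmm1 = vpshufd_216 (c2);
  xmm_t xmm0 = vpshufd_216 (c3);
  xmm_t xmm2 = zero_reg ();
  xmm2 = vpmacsdq (xmm1, xmm0, xmm2, 0);
  out[0] = xmm2.q[0]; out[1] = xmm2.q[1];
  xmm0 = zero_reg ();               /* <<< overwrites the c3 input */
  xmm0 = vpmacsdq (xmm1, xmm0, xmm0, 1);
  out[2] = xmm0.q[0]; out[3] = xmm0.q[1];
}

/* Same steps with the second result in a separate (assumed) register
   xmm3, which is what honouring the early-clobber would force.  */
void
disjoint_sequence (const int32_t c2[4], const int32_t c3[4], int64_t out[4])
{
  xmm_t xmm1 = vpshufd_216 (c2);
  xmm_t xmm0 = vpshufd_216 (c3);
  xmm_t xmm2 = zero_reg ();
  xmm2 = vpmacsdq (xmm1, xmm0, xmm2, 0);
  out[0] = xmm2.q[0]; out[1] = xmm2.q[1];
  xmm_t xmm3 = zero_reg ();
  xmm3 = vpmacsdq (xmm1, xmm0, xmm3, 1);
  out[2] = xmm3.q[0]; out[3] = xmm3.q[1];
}
```

With c2 = {1, 2, 3, 4} and c3 = {5, 6, 7, 8}, emitted_sequence yields
{5, 12, 0, 0} while disjoint_sequence yields {5, 12, 21, 32}: the two
products that come from vpmacsdqh are lost once xmm0 is zeroed.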

I've no idea how best to solve this: either deactivate the sse4.1 pattern
when TARGET_XOP, or rewrite the latter to not early-clobber the result, or
something else entirely.  I've tested that the first way fixes this problem.



Thread overview: 13+ messages
2012-04-08 22:48 [Bug target/52908] New: xop-mul-1:f9 miscompiled on bulldozer (-mxop) matz at gcc dot gnu.org
2012-04-09  8:17 ` [Bug target/52908] " ubizjak at gmail dot com
2012-04-09 11:48 ` ubizjak at gmail dot com
2012-04-10 13:49 ` ubizjak at gmail dot com
2012-05-04 10:43 ` venkataramanan.kumar at amd dot com
2012-05-04 12:52 ` ubizjak at gmail dot com
2012-05-09  3:44 ` venkataramanan.kumar at amd dot com
2012-05-09 20:46 ` uros at gcc dot gnu.org
2012-05-09 20:57 ` ubizjak at gmail dot com
2012-05-10  6:22 ` ubizjak at gmail dot com
2012-06-18 15:11 ` vekumar at gcc dot gnu.org
2012-07-02 12:42 ` rguenth at gcc dot gnu.org
2024-03-17  2:33 ` sjames at gcc dot gnu.org
