public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/37101]  New: [4.2/4.3 Regression] wrong code: tree vectorizer omits bogus movq/movlps construct
@ 2008-08-12 23:15 christophe at saout dot de
  2008-08-12 23:19 ` [Bug tree-optimization/37101] " christophe at saout dot de
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: christophe at saout dot de @ 2008-08-12 23:15 UTC (permalink / raw)
  To: gcc-bugs

(see http://bugs.gentoo.org/show_bug.cgi?id=227311 )

Seeing this on my (amd64) gcc 4.3.1 in xorg-server code dix/resource.c with
"-O2 -march=nocona -ftree-vectorize".

( gcc -O2 -march=nocona -ftree-vectorize -o resource.o -c resource.i )

Unfortunately I was not able to reproduce this using a simpler testcase,
because gcc would omit a "punpcklqdq" instead of the offending "movq/movlps",
so I attached the whole thing with a description of the problem and an analysis
of the wrong assembler code emitted.

When going through the attached file, look for the "RebuildTable" function,
which is inlined into "AddResource".

It contains the following construct:

    for (rptr = resources, tptr = tails; --j >= 0; rptr++, tptr++)
    {
        *rptr = NullResource;
        *tptr = rptr;
    }

which initialises two arrays of pointers, one filled with null pointers and the
other with a pointer to the just filled NULL in the other array.

The symptom (leeding to a segfault just after) shows up that the second "tails"
array is incorrectly initialized. Every second entry does not contain the
correct pointer (i.e. filled with the correct value of rptr), but contains a
"ladder" starting at 0 (0x8, 018, 0x28, ...), i.e. does not add the value of
"resources" as start offset.  (the whole array goes like 0xc000, 0x0008,
0xc010, 0x0018, 0xc020, 0x0028, ... instead of 0xc000, 0xc008, 0xc010, 0xc018,
0xc020, 0xc028, ...)

Digging through the code it seems that the tree vectorizer is emitting an
optimized implementation of the loop for large amount of entries (j starts at a
value of 128 when the failure was observed) after the body of the function.

It fills a bunch of xmm registers with values, so that both arrays can be
filled 16 bytes at a time (instead of 8 bytes at a time).

Now, due to the bug, the register that pair-wise fills the "tails" array should
be doing sth like this (pseudo-code)

xmm1 = ((resources + 0x00000008) << 64) + resources
xmm2 = (0x00000010 << 64) + 0x00000010

and then fill in xmm1 and add xmm2 to it after each iteration, or something
like that.

What I now seem to see be seeing, is that in my terminology here "xmm1" is not
filled correctly, the upper 64 bit seem to be zero, which would explain how the
array is incorrectly filled.

The assembler code of this part looks like:

(the start of the for loop in the default code path)
     f0d:       44 89 ee                mov    %r13d,%esi
     f10:       83 ee 01                sub    $0x1,%esi
     f13:       78 32                   js     f47 <AddResource+0x1a2>
     f15:       41 83 fd 09             cmp    $0x9,%r13d
     f19:       0f 87 22 01 00 00       ja     1041 <AddResource+0x29c>

So here seems to be some logic for the first --j and as to whether the
optimized out-of-line code should be jumped (and a non-sse implementation
of the loop body below, which looks ok).

Starting at 1041 there is some more funny logic which I don't fully get, but
some instructions below is where I think the cause of the problem is:

Here, the code that initializes xmm1 (also the xmm1 in my pseudo-code above):

    107e:       48 8b 44 24 38          mov    0x38(%rsp),%rax
    1083:       48 83 c0 08             add    $0x8,%rax
    1087:       66 48 0f 6e c8          movq   %rax,%xmm1
    108c:       0f 12 4c 24 38          movlps 0x38(%rsp),%xmm1

I'm no sse2 expert, but it seems as if movlps was supposed to move the lower
part of xmm1 (which contains "resources" + 0x8) to the upper part before
overwriting the lower part with "resources"? Or something, like that, which it
doesn't (using google and http://www.ews.uiuc.edu/~cjiang/reference/vc191.htm
as a reference... :-))

Anyway, here is the rest of the method:

    1091:       66 0f 6f c1             movdqa %xmm1,%xmm0
    1095:       31 d2                   xor    %edx,%edx
    1097:       31 c0                   xor    %eax,%eax
    1099:       66 0f 6f 15 00 00 00    movdqa 0x0(%rip),%xmm2        # 10a1
<AddResource+0x2fc>
    10a0:       00 
                        109d: R_X86_64_PC32     .LC3+0xfffffffffffffffc

(presumably loads 0x0000001000000010 into xmm2)

    10a1:       66 0f ef c9             pxor   %xmm1,%xmm1

(xmm1 now contains zeroes to fill the "resources" array)

    10a5:       48 8b 5c 24 38          mov    0x38(%rsp),%rbx
    10aa:       66 0f 7f 0c 03          movdqa %xmm1,(%rbx,%rax,1) <- fill 2
resources entries
    10af:       66 41 0f 7f 04 04       movdqa %xmm0,(%r12,%rax,1) <- fill 2
tails entries
    10b5:       83 c2 01                add    $0x1,%edx
    10b8:       48 83 c0 10             add    $0x10,%rax
    10bc:       66 0f d4 c2             paddq  %xmm2,%xmm0
    10c0:       39 ca                   cmp    %ecx,%edx
    10c2:       72 e1                   jb     10a5 <AddResource+0x300>

Trying to reproduce the test case, gcc omitted a "punpcklqdq" to initialized
the base instead, which works, so this must me some different code triggered by
some other register constraints or something (?).

As from the report, I guess this bug was already present in 4.2, haven't tested
any other versions myself.


-- 
           Summary: [4.2/4.3 Regression] wrong code: tree vectorizer omits
                    bogus movq/movlps construct
           Product: gcc
           Version: 4.3.1
            Status: UNCONFIRMED
          Severity: critical
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: christophe at saout dot de
 GCC build triplet: x86_64-pc-linux-gnu
  GCC host triplet: x86_64-pc-linux-gnu
GCC target triplet: x86_64-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37101


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2009-04-29 15:16 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-08-12 23:15 [Bug tree-optimization/37101] New: [4.2/4.3 Regression] wrong code: tree vectorizer omits bogus movq/movlps construct christophe at saout dot de
2008-08-12 23:19 ` [Bug tree-optimization/37101] " christophe at saout dot de
2008-08-13  8:54 ` rguenth at gcc dot gnu dot org
2008-08-13 13:48 ` ubizjak at gmail dot com
2008-08-13 16:15 ` christophe at saout dot de
2008-08-13 22:17 ` christophe at saout dot de
2008-08-14  0:10 ` christophe at saout dot de
2008-08-14 11:35 ` ubizjak at gmail dot com
2008-08-14 11:59 ` uros at gcc dot gnu dot org
2008-08-14 18:26 ` hjl at gcc dot gnu dot org
2008-08-17  6:24 ` cnstar9988 at gmail dot com
2008-08-17 14:59 ` uros at gcc dot gnu dot org
2008-08-17 17:26 ` [Bug target/37101] [4.2 " ubizjak at gmail dot com
2008-08-27 22:13 ` jsm28 at gcc dot gnu dot org
2008-08-28 14:22 ` rguenth at gcc dot gnu dot org
2008-08-29  9:44 ` uros at gcc dot gnu dot org
2008-08-29  9:45 ` ubizjak at gmail dot com
2008-09-02 20:54 ` Emmanuel dot Thome at inria dot fr
2009-04-29 15:16 ` pinskia at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).