From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 25447 invoked by alias); 12 Aug 2008 23:15:41 -0000 Received: (qmail 24656 invoked by uid 48); 12 Aug 2008 23:14:23 -0000 Date: Tue, 12 Aug 2008 23:15:00 -0000 Subject: [Bug tree-optimization/37101] New: [4.2/4.3 Regression] wrong code: tree vectorizer omits bogus movq/movlps construct X-Bugzilla-Reason: CC Message-ID: Reply-To: gcc-bugzilla@gcc.gnu.org To: gcc-bugs@gcc.gnu.org From: "christophe at saout dot de" Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org X-SW-Source: 2008-08/txt/msg00958.txt.bz2 (see http://bugs.gentoo.org/show_bug.cgi?id=227311 ) Seeing this on my (amd64) gcc 4.3.1 in xorg-server code dix/resource.c with "-O2 -march=nocona -ftree-vectorize". ( gcc -O2 -march=nocona -ftree-vectorize -o resource.o -c resource.i ) Unfortunately I was not able to reproduce this using a simpler testcase, because gcc would omit a "punpcklqdq" instead of the offending "movq/movlps", so I attached the whole thing with a description of the problem and an analysis of the wrong assembler code emitted. When going through the attached file, look for the "RebuildTable" function, which is inlined into "AddResource". It contains the following construct: for (rptr = resources, tptr = tails; --j >= 0; rptr++, tptr++) { *rptr = NullResource; *tptr = rptr; } which initialises two arrays of pointers, one filled with null pointers and the other with a pointer to the just filled NULL in the other array. The symptom (leeding to a segfault just after) shows up that the second "tails" array is incorrectly initialized. Every second entry does not contain the correct pointer (i.e. filled with the correct value of rptr), but contains a "ladder" starting at 0 (0x8, 018, 0x28, ...), i.e. does not add the value of "resources" as start offset. (the whole array goes like 0xc000, 0x0008, 0xc010, 0x0018, 0xc020, 0x0028, ... instead of 0xc000, 0xc008, 0xc010, 0xc018, 0xc020, 0xc028, ...) Digging through the code it seems that the tree vectorizer is emitting an optimized implementation of the loop for large amount of entries (j starts at a value of 128 when the failure was observed) after the body of the function. It fills a bunch of xmm registers with values, so that both arrays can be filled 16 bytes at a time (instead of 8 bytes at a time). Now, due to the bug, the register that pair-wise fills the "tails" array should be doing sth like this (pseudo-code) xmm1 = ((resources + 0x00000008) << 64) + resources xmm2 = (0x00000010 << 64) + 0x00000010 and then fill in xmm1 and add xmm2 to it after each iteration, or something like that. What I now seem to see be seeing, is that in my terminology here "xmm1" is not filled correctly, the upper 64 bit seem to be zero, which would explain how the array is incorrectly filled. The assembler code of this part looks like: (the start of the for loop in the default code path) f0d: 44 89 ee mov %r13d,%esi f10: 83 ee 01 sub $0x1,%esi f13: 78 32 js f47 f15: 41 83 fd 09 cmp $0x9,%r13d f19: 0f 87 22 01 00 00 ja 1041 So here seems to be some logic for the first --j and as to whether the optimized out-of-line code should be jumped (and a non-sse implementation of the loop body below, which looks ok). Starting at 1041 there is some more funny logic which I don't fully get, but some instructions below is where I think the cause of the problem is: Here, the code that initializes xmm1 (also the xmm1 in my pseudo-code above): 107e: 48 8b 44 24 38 mov 0x38(%rsp),%rax 1083: 48 83 c0 08 add $0x8,%rax 1087: 66 48 0f 6e c8 movq %rax,%xmm1 108c: 0f 12 4c 24 38 movlps 0x38(%rsp),%xmm1 I'm no sse2 expert, but it seems as if movlps was supposed to move the lower part of xmm1 (which contains "resources" + 0x8) to the upper part before overwriting the lower part with "resources"? Or something, like that, which it doesn't (using google and http://www.ews.uiuc.edu/~cjiang/reference/vc191.htm as a reference... :-)) Anyway, here is the rest of the method: 1091: 66 0f 6f c1 movdqa %xmm1,%xmm0 1095: 31 d2 xor %edx,%edx 1097: 31 c0 xor %eax,%eax 1099: 66 0f 6f 15 00 00 00 movdqa 0x0(%rip),%xmm2 # 10a1 10a0: 00 109d: R_X86_64_PC32 .LC3+0xfffffffffffffffc (presumably loads 0x0000001000000010 into xmm2) 10a1: 66 0f ef c9 pxor %xmm1,%xmm1 (xmm1 now contains zeroes to fill the "resources" array) 10a5: 48 8b 5c 24 38 mov 0x38(%rsp),%rbx 10aa: 66 0f 7f 0c 03 movdqa %xmm1,(%rbx,%rax,1) <- fill 2 resources entries 10af: 66 41 0f 7f 04 04 movdqa %xmm0,(%r12,%rax,1) <- fill 2 tails entries 10b5: 83 c2 01 add $0x1,%edx 10b8: 48 83 c0 10 add $0x10,%rax 10bc: 66 0f d4 c2 paddq %xmm2,%xmm0 10c0: 39 ca cmp %ecx,%edx 10c2: 72 e1 jb 10a5 Trying to reproduce the test case, gcc omitted a "punpcklqdq" to initialized the base instead, which works, so this must me some different code triggered by some other register constraints or something (?). As from the report, I guess this bug was already present in 4.2, haven't tested any other versions myself. -- Summary: [4.2/4.3 Regression] wrong code: tree vectorizer omits bogus movq/movlps construct Product: gcc Version: 4.3.1 Status: UNCONFIRMED Severity: critical Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: christophe at saout dot de GCC build triplet: x86_64-pc-linux-gnu GCC host triplet: x86_64-pc-linux-gnu GCC target triplet: x86_64-pc-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37101