public inbox for gcc-bugs@sourceware.org
* [Bug target/98119] New: SVE: Wrong code with -O1 -ftree-vectorize -msve-vector-bits=512 -mtune=thunderx
@ 2020-12-03 12:20 acoplan at gcc dot gnu.org
From: acoplan at gcc dot gnu.org @ 2020-12-03 12:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98119

            Bug ID: 98119
           Summary: SVE: Wrong code with -O1 -ftree-vectorize
                    -msve-vector-bits=512 -mtune=thunderx
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

AArch64 GCC miscompiles the following testcase:

_Bool a[34];
int main() {
  for (long b = 0; b < 2; ++b)
    for (long c = 0; c < 17; ++c)
      a[b * 2 + c] = 1;
  for (long c = 0; c < 7; ++c)
    if (!a[2 + c])
      __builtin_abort();
}

at -O1 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=512
-mtune=thunderx.

If any one of these flags is removed, the issue goes away. Obviously, this is not a
sensible choice of -mtune given that we're asking for SVE, but scheduling
should never result in a miscompile.

Looking at a snippet of the broken code:

main:
.LFB0:
        .cfi_startproc
        adrp    x2, .LANCHOR0
        add     x2, x2, :lo12:.LANCHOR0
        and     w3, w2, 63
        and     x0, x2, -64    // align x2 down
        add     w1, w3, 17
        whilelo p0.d, wzr, w1
        whilelo p1.d, wzr, w3
        not     p0.b, p0/z, p1.b
        mov     z0.b, #1
        st1b    z0.d, p0, [x0] // no-op (p0 all 0s)
        mov     w3, 8
        whilelo p0.d, w3, w1
        b.none  .L2
        add     x4, x0, 8
        st1b    z0.d, p0, [x4] // stores out-of-bounds
        add     x0, x0, 16
        mov     w3, 16
        whilelo p0.d, w3, w1
        b.none  .L2
        st1b    z0.d, p0, [x0]

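We initially compute the address of our array (a) in x2, and then align it
down to the nearest 64-byte boundary, storing the result in x0. The first store
correctly masks off the lanes below the start of a (via p1). We then add 8 to
x0 and store a vector to that address, but its predicate (whilelo p0.d, w3, w1
after mov w3, 8) only bounds the upper end. This store can therefore write
bytes that lie before a, i.e. out of bounds (suppose a is only 16-byte aligned,
so that it can start 16 or more bytes into its 64-byte block). So things have
already started to go downhill by this point.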


Thread overview: 11+ messages
2020-12-03 12:20 [Bug target/98119] New: SVE: Wrong code with -O1 -ftree-vectorize -msve-vector-bits=512 -mtune=thunderx acoplan at gcc dot gnu.org
2020-12-03 13:06 ` [Bug target/98119] " rsandifo at gcc dot gnu.org
2020-12-07  9:32 ` [Bug target/98119] [10/11 Regression] " acoplan at gcc dot gnu.org
2020-12-07 12:45 ` rguenth at gcc dot gnu.org
2021-01-14 11:00 ` rguenth at gcc dot gnu.org
2021-03-30 15:06 ` rsandifo at gcc dot gnu.org
2021-03-31 10:26 ` cvs-commit at gcc dot gnu.org
2021-03-31 20:39 ` [Bug target/98119] [10 " rsandifo at gcc dot gnu.org
2021-04-08 12:02 ` rguenth at gcc dot gnu.org
2021-04-23 16:17 ` cvs-commit at gcc dot gnu.org
2021-04-23 16:19 ` rsandifo at gcc dot gnu.org
