From: "acoplan at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/98119] New: SVE: Wrong code with -O1 -ftree-vectorize -msve-vector-bits=512 -mtune=thunderx
Date: Thu, 03 Dec 2020 12:20:21 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98119

            Bug ID: 98119
           Summary: SVE: Wrong code with -O1 -ftree-vectorize
                    -msve-vector-bits=512 -mtune=thunderx
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

AArch64 GCC miscompiles the following testcase:

_Bool a[34];
int main() {
  for (long b = 0; b < 2; ++b)
    for (long c = 0; c < 17; ++c)
      a[b * 2 + c] = 1;
  for (long c = 0; c < 7; ++c)
    if (!a[2 + c])
      __builtin_abort();
}

at -O1 -ftree-vectorize -march=armv8.2-a+sve -msve-vector-bits=512
-mtune=thunderx. The issue goes away if any one of these flags is removed.

Obviously, this is not a sensible choice of -mtune given that we're asking
for SVE, but it seems that the scheduling should not result in a miscompile.

Looking at a snippet of the broken code:

main:
.LFB0:
        .cfi_startproc
        adrp    x2, .LANCHOR0
        add     x2, x2, :lo12:.LANCHOR0
        and     w3, w2, 63
        and     x0, x2, -64     // align x2 down
        add     w1, w3, 17
        whilelo p0.d, wzr, w1
        whilelo p1.d, wzr, w3
        not     p0.b, p0/z, p1.b
        mov     z0.b, #1
        st1b    z0.d, p0, [x0]  // no-op (p0 all 0s)
        mov     w3, 8
        whilelo p0.d, w3, w1
        b.none  .L2
        add     x4, x0, 8
        st1b    z0.d, p0, [x4]  // stores out-of-bounds
        add     x0, x0, 16
        mov     w3, 16
        whilelo p0.d, w3, w1
        b.none  .L2
        st1b    z0.d, p0, [x0]

We initially compute the address of our array (a) in x2, and then align this
down to the nearest 64-byte-aligned address, storing the result in x0. We
then add 8 to this, and store a vector to this address. But this address can
be out-of-bounds (suppose a is only 16-byte aligned). So things have already
started to go downhill by this point.
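
To make the out-of-bounds store concrete, here is a rough scalar model of the three predicated stores in the snippet above. It is only a sketch, not GCC's logic. It assumes that a 512-bit vector holds 8 .d lanes, that st1b with .d elements writes one byte per active lane at consecutive byte addresses, and that whilelo activates lane k while start + k < limit. The names LANES, whilelo() and lowest_store_offset() are made up for illustration. The function returns the lowest byte offset written, relative to the start of a, for a given misalignment (the value of "and w3, w2, 63").

```c
#include <limits.h>

#define LANES 8u  /* assumed: 512-bit vector / 64-bit .d elements */

/* Model of whilelo: lane k is active while start + k < limit. */
static unsigned whilelo(unsigned start, unsigned limit)
{
    unsigned mask = 0;
    for (unsigned k = 0; k < LANES && start + k < limit; ++k)
        mask |= 1u << k;
    return mask;
}

/* Lowest byte offset (relative to &a[0]) written by the three stores in
   the snippet, where misalign models "and w3, w2, 63".
   Returns INT_MAX if no lane is ever active. */
static int lowest_store_offset(unsigned misalign)
{
    unsigned limit = misalign + 17;          /* add w1, w3, 17 */
    int lowest = INT_MAX;

    for (unsigned start = 0; start <= 16; start += LANES) {
        unsigned pred = whilelo(start, limit);
        if (start == 0)
            pred &= ~whilelo(0, misalign);   /* not p0.b, p0/z, p1.b */
        else if (pred == 0)
            break;                           /* b.none .L2 */

        for (unsigned k = 0; k < LANES; ++k) {
            if (pred & (1u << k)) {
                /* The store base is the aligned-down address, so lane k
                   lands at aligned_base + start + k. */
                int off = (int)(start + k) - (int)misalign;
                if (off < lowest)
                    lowest = off;
            }
        }
    }
    return lowest;
}
```

Under these assumptions, lowest_store_offset(0) is 0 (every store lands inside a), while lowest_store_offset(16) is -8: the first store is a no-op, and the second writes the 8 bytes immediately below a. In this model the leading-lane mask (p1) is applied only to the first vector, which is why a misalignment above 8 bytes goes wrong.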