From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload
Date: Mon, 26 Jun 2023 11:14:27 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

--- Comment #17 from rguenther at suse dot de ---
On Mon, 26 Jun 2023, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
>
> --- Comment #16 from Alexander Monakov ---
> (In reply to rguenther@suse.de from comment #14)
> > vectors of T and scalar T interoperate TBAA wise.
> > What we disambiguate is
> >
> > int a[2];
> >
> > int foo(int *p)
> > {
> >   a[0] = 1;
> >   *(v4si *)p = {0,0,0,0};
> >   return a[0];
> > }
> >
> > because the V4SI vector store is too large for the a[] object.  That
> > doesn't even use TBAA (it works with -fno-strict-aliasing just fine).
>
> Thank you for the example. If we do the same for vector loads, that's a footgun
> for users who use vector loads to access small objects:
>
> // alignment to 16 is ensured externally
> extern int a[2];
>
> int foo()
> {
>   a[0] = 1;
>
>   __v4si v = (__attribute__((may_alias)) __v4si *) &a;
>   // mask out extra elements in v and continue
>   ...
> }
>
> This is a benign data race on data that follows 'a' in the address space, but
> otherwise should be a valid and useful technique.

Yes, we do the same to loads.  I hope that's not a common technique,
though I have to admit the vectorizer itself assesses whether it's safe
to access "gaps" by looking at alignment, so its code generation
is prone to this same "mistake".

Now, is "alignment to 16 is ensured externally" good enough here?
If we consider

static int a[2];

and code doing

 if (is_aligned (a))
   {
     __v4si v = (__attribute__((may_alias)) __v4si *) &a;
   }

then we cannot even use a DECL_ALIGN that's insufficient for
decls that bind locally.

Note we have similar arguments with aggregate type sizes (and TBAA)
where, when we infer a dynamic type from one access, we check if
the other access would fit.  Wouldn't the above then extend to that
as well, given we could also do aggregate copies of "padding" and
ignore the bits if we'd have ensured the larger access wouldn't trap?

So supporting the above might be a bit of a stretch (though I think
we have to fix the vectorizer here).

> > If the v4si store is masked we cannot do this anymore, but the IL
> > we seed the alias oracle with doesn't know the store is partial.
> > The only way to "fix" it is to take away all of the information from it.
>
> But that won't fix the trapping issue? I think we need a distinct RTX for
> memory accesses where hardware does fault suppression for masked-out elements.

Yes, it doesn't fix that part.  The idea of using BLKmode instead of
a vector mode for the MEMs would, I guess, together with specifying
MEM_SIZE as not known.
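
For readers following the "alignment ensures the wider access won't trap"
argument above, here is a minimal sketch of the arithmetic behind it. It
only illustrates why a 16-byte-aligned 16-byte access stays inside one
page; it says nothing about what the alias oracle is allowed to assume,
which is the actual subject of this bug. The names is_aligned (borrowed
from the snippet above, with an assumed signature), stays_in_page,
load_two_ints and ASSUMED_PAGE_SIZE (4096) are hypothetical and are not
GCC internals. load_two_ints shows a conservative alternative that never
reads past the two-element object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* GNU C vector extension: four 32-bit ints, 16 bytes in total. */
typedef int v4si __attribute__((vector_size(16)));

/* Assumption for illustration only; real page sizes vary by target. */
#define ASSUMED_PAGE_SIZE ((uintptr_t)4096)

/* Hypothetical helper: is ADDR a multiple of ALIGN (a power of two)? */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Does the access [addr, addr + size) stay within a single page?
   An access that is naturally aligned to its own power-of-two size
   (no larger than the page size) always does. */
static bool stays_in_page(uintptr_t addr, uintptr_t size)
{
    return addr / ASSUMED_PAGE_SIZE == (addr + size - 1) / ASSUMED_PAGE_SIZE;
}

/* Conservative alternative to the over-wide load: copy only the two
   ints that belong to the object and leave the upper lanes zero. */
static v4si load_two_ints(const int *p)
{
    v4si v = {0, 0, 0, 0};
    memcpy(&v, p, 2 * sizeof(int));
    return v;
}
```

With these helpers, stays_in_page (addr, 16) holds for every addr where
is_aligned (addr, 16) holds, which is the "won't trap" half of the
argument. Whether the compiler may still disambiguate the wide access
against the small decl is a separate question, as discussed above.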