From: "rguenther at suse dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload
Date: Tue, 27 Jun 2023 06:53:46 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

--- Comment #19 from rguenther at suse dot de ---
On Mon, 26 Jun 2023, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
>
> --- Comment #18 from Alexander Monakov ---
> (In reply to rguenther@suse.de from comment #17)
> > Yes, we do the same to loads.
> > I hope that's not a common technique
> > though but I have to admit the vectorizer itself assesses whether it's
> > safe to access "gaps" by looking at alignment so its code generation
> > is prone to this same "mistake".
> >
> > Now, is "alignment to 16 is ensured externally" good enough here?
> > If we consider
> >
> > static int a[2];
> >
> > and code doing
> >
> >  if (is_aligned (a))
> >    {
> >      __v4si v = *(__attribute__((may_alias)) __v4si *) &a;
> >    }
> >
> > then we cannot even use a DECL_ALIGN that's insufficient for decls
> > that bind locally.
>
> I agree. I went with the 'extern' example because there it should be more
> obvious the construction ought to work.
>
>
> > Note we have similar arguments with aggregate type sizes (and TBAA)
> > where when we infer a dynamic type from one access we check if
> > the other access would fit. Wouldn't the above then extend to that
> > as well given we could also do aggregate copies of "padding" and
> > ignore the bits if we'd have ensured the larger access wouldn't trap?
>
> I think a read via a may_alias type just tells you that N bytes are accessible
> for reading, not necessarily for writing. So I don't see a problem, but maybe I
> didn't quite catch what you are saying.

I wasn't sure how to phrase it; what I was saying is that we have this
"the access is too large for the object in consideration, so it cannot
alias it" reasoning in places where we just work with types within the
TBAA framework. So I wondered if one can construct a similar case to
support that we should not do this. (tree-ssa-alias.cc:
aliasing_component_refs_p)

> > So supporting the above might be a bit of a stretch (though I think
> > we have to fix the vectorizer here).
>
> What would the solution be? Using a may_alias type for such accesses?

But the size argument doesn't have anything to do with TBAA
(and may_alias is about TBAA). I don't think we have any way
to circumvent C object access rules.
That is, for example, with -fno-strict-aliasing the following isn't
going to work.

#include <stdlib.h>

int a;
int b;

int main()
{
  a = 1;
  b = 2;
  if (&a + 1 == &b) // equality compare of unrelated pointers OK
    {
      long x = *(long *)&a; // access outside of 'a' not OK
      if (x != 0x0000000100000002)
        abort ();
    }
}

There's no command-line flag or attribute to form a pointer
to an object composing 'a' and 'b' besides changing how the
storage is declared.

I don't think we should make an exception for "padding" after an
object, and I don't see any sensible way to constrain the
size of the supported "padding" either. Pad to the largest possible
alignment of the object? That would be MAX_OFILE_ALIGNMENT ...

> > > > If the v4si store is masked we cannot do this anymore, but the IL
> > > > we seed the alias oracle with doesn't know the store is partial.
> > > > The only way to "fix" it is to take away all of the information from it.
> > >
> > > But that won't fix the trapping issue? I think we need a distinct RTX for
> > > memory accesses where hardware does fault suppression for masked-out elements.
> >
> > Yes, it doesn't fix that part. The idea of using BLKmode instead of
> > a vector mode for the MEMs would, I guess, together with specifying
> > MEM_SIZE as not known.
>
> Unfortunate if that works for the trapping side, but not for the
> aliasing side.

It should work for both I think, but MEM_EXPR would need changing
as well - we do have a perfectly working representation there, it
would just be the first CALL_EXPR in such context ...
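
For the 'a'/'b' example earlier in this message, "changing how the
storage is declared" could look roughly like the sketch below. It is
only an illustration under stated assumptions: the struct pair type and
the helpers pair_bits, read_pair, expected_pair and self_check are
hypothetical names that do not come from the bug report, and the sketch
assumes a 32-bit int with no padding inside the struct. Both values
live in one object, so the 8-byte read never leaves it, and memcpy is
used for the wide read so no aliasing question arises.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical replacement for the separate 'int a; int b;' declarations:
   both values live in one object, so an 8-byte read stays inside it.  */
struct pair
{
  int a;
  int b;
};

/* The sketch assumes 32-bit int and no padding inside the struct.  */
_Static_assert (sizeof (struct pair) == sizeof (uint64_t),
                "sketch assumes two 32-bit ints without padding");

struct pair ab;

/* Copy the object representation of *p into a 64-bit integer.  */
uint64_t
pair_bits (const struct pair *p)
{
  uint64_t x;
  memcpy (&x, p, sizeof x);
  return x;
}

/* Read all of 'ab' at once; the access never leaves the object.  */
uint64_t
read_pair (void)
{
  return pair_bits (&ab);
}

/* Build the expected bits the same way, so the comparison does not
   depend on the target's byte order.  */
uint64_t
expected_pair (int a, int b)
{
  struct pair p = { a, b };
  return pair_bits (&p);
}

/* Same check as in the example above, minus the out-of-object access.  */
void
self_check (void)
{
  ab.a = 1;
  ab.b = 2;
  assert (read_pair () == expected_pair (1, 2));
}
```

Calling self_check () from main is enough to exercise it. Building the
expected value through the same struct layout, rather than hard-coding a
64-bit constant, keeps the comparison independent of endianness.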