From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id DE7113858D33; Mon, 26 Jun 2023 11:14:28 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DE7113858D33
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1687778068;
	bh=hVNpiH4yh62wDunXLivzKz7JOsp4Tg1EjEUftSlSOkY=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=OInk2dmfhTUI2yOm4z2yGLbQOKswP0sU8PGOgNZL/ytbbrnxOJPoCP98dM1sffkqJ
	 N+n1ml6RiIgw1UWc0O8uHGVj8QCWGN7bdTz2KicvMMOlskGDtxNIMi1Wca4gI4QhBk
	 2R5ZmGo+2dvUIgBBiowVWTOvz8Fjsc1W5ZIEkJco=
From: "rguenther at suse dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is
 miscompiled by RTL scheduling after reload
Date: Mon, 26 Jun 2023 11:14:27 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenther at suse dot de
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-110237-4-KJ5c6h4DbE@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-110237-4@http.gcc.gnu.org/bugzilla/>
References: <bug-110237-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
--- Comment #17 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 26 Jun 2023, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
>
> --- Comment #16 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> (In reply to rguenther@suse.de from comment #14)
> > vectors of T and scalar T interoperate TBAA wise.  What we disambiguate is
> >
> > int a[2];
> >
> > int foo(int *p)
> > {
> >   a[0] = 1;
> >   *(v4si *)p = {0,0,0,0};
> >   return a[0];
> > }
> >
> > because the V4SI vector store is too large for the a[] object.  That
> > doesn't even use TBAA (it works with -fno-strict-aliasing just fine).
>
> Thank you for the example. If we do the same for vector loads, that's a footgun
> for users who use vector loads to access small objects:
>
> // alignment to 16 is ensured externally
> extern int a[2];
>
> int foo()
> {
>   a[0] = 1;
>
>   __v4si v = *(__attribute__((may_alias)) __v4si *) &a;
>   // mask out extra elements in v and continue
>  ...
> }
>
> This is a benign data race on data that follows 'a' in the address space, but
> otherwise should be a valid and useful technique.

Yes, we do the same for loads.  I hope that's not a common technique,
but I have to admit the vectorizer itself assesses whether it's
safe to access "gaps" by looking at alignment, so its code generation
is prone to this same "mistake".
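
To illustrate the alignment argument, here is a minimal sketch, assuming
4096-byte pages, of why a naturally aligned 16-byte access cannot straddle
a page boundary and therefore will not fault if any byte of it is mapped.
The names is_aligned_to and within_one_page and the page-size constant are
made up for illustration.  This only models the trapping side; it says
nothing about what the alias oracle may assume about such an access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size, for illustration only; real targets differ.  */
#define PAGE_SIZE_ASSUMED 4096u

/* True if ADDR is a multiple of ALIGN, which must be a power of two.  */
bool
is_aligned_to (uintptr_t addr, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr & (align - 1)) == 0;
}

/* True if the byte range [ADDR, ADDR + SIZE) lies within a single page.  */
bool
within_one_page (uintptr_t addr, size_t size)
{
  assert (size > 0 && size <= PAGE_SIZE_ASSUMED);
  return addr / PAGE_SIZE_ASSUMED == (addr + size - 1) / PAGE_SIZE_ASSUMED;
}
```

For example, a 16-byte access at address 4080 is 16-byte aligned and stays
in the first page, while one at 4088 is not aligned and would cross into
the next page.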

Now, is "alignment to 16 is ensured externally" good enough here?
If we consider

static int a[2];

and code doing

 if (is_aligned (a))
   {
     __v4si v = *(__attribute__((may_alias)) __v4si *) &a;
   }

then we cannot even use a DECL_ALIGN that's insufficient for decls
that bind locally.

Note we have similar arguments with aggregate type sizes (and TBAA):
when we infer a dynamic type from one access, we check whether
the other access would fit.  Wouldn't the above then extend to that
as well, given we could also do aggregate copies of "padding" and
ignore the bits if we had ensured the larger access wouldn't trap?
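
As a rough model of the "would the other access fit" check, here is a
minimal sketch.  It is not GCC's implementation; the function name and the
byte-based interface are assumptions made for illustration.  It only
expresses the rule that an access which would run past the end of an
object is treated as not touching that object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model, not GCC code: true if an access of ACCESS_SIZE bytes
   starting OFFSET bytes into an object of OBJECT_SIZE bytes stays within
   the object.  Written to avoid overflow in OFFSET + ACCESS_SIZE.  */
bool
access_fits_in_object (size_t offset, size_t access_size, size_t object_size)
{
  return offset <= object_size && access_size <= object_size - offset;
}
```

With int a[2] (8 bytes, assuming a 4-byte int) and a 16-byte V4SI access,
the check fails at offset 0, which is the basis for the disambiguation
discussed above.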

So supporting the above might be a bit of a stretch (though I think
we have to fix the vectorizer here).

> > If the v4si store is masked we cannot do this anymore, but the IL
> > we seed the alias oracle with doesn't know the store is partial.
> > The only way to "fix" it is to take away all of the information from it.
>
> But that won't fix the trapping issue? I think we need a distinct RTX for
> memory accesses where hardware does fault suppression for masked-out elements.

Yes, it doesn't fix that part.  The idea of using BLKmode instead of
a vector mode for the MEMs would, I guess, together with specifying
MEM_SIZE as not known.