From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=jURu=JH=tugraz.at=uecker@sourceware.org>
Received: from mailrelay.tugraz.at (mailrelay.tugraz.at [129.27.2.202])
	by sourceware.org (Postfix) with ESMTPS id 0F69F3858D20
	for <gcc-patches@gcc.gnu.org>; Mon, 29 Jan 2024 15:50:42 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0F69F3858D20
Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=tugraz.at
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tugraz.at
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 0F69F3858D20
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=129.27.2.202
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706543445; cv=none;
	b=LFjdohGuhGn7Bv4cSqiq6tUyAXfBSy0QSM4g5wpZxfbx329CA9aKiFigJ1Ksg1IRitgcmIuTNKhQAnJWALpjFM5ut5VSrTjNubGKOY80IjtNjT8r6jgcIDcZqXASs0686eGNW5XJfyur597h46KUMNbpfeQox51b+j0T38tnXnA=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1706543445; c=relaxed/simple;
	bh=98GuyXv7sJmloGsDF7So70lHUicgbgqTUcvEOECbUOg=;
	h=DKIM-Signature:Message-ID:Subject:From:To:Date:MIME-Version; b=Ei4MECg+bNQUOvHGvsRRKb+RrhRymndZW/0G2TABTxBxbXp6HQKcsortYl0p4+exFtY74Gfc9Mj3yiXnfxWzWEX+/zhl7IfSjlG7TEJuuca9+KMwhk4qT7MidhOKzYgMlwBH1wuWN+bi31vXOK//eHEy7XzwEYiDyXbvOa0jnkc=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from fbmtpc21.tugraz.at (fbmtpc21.tugraz.at [129.27.144.40])
	by mailrelay.tugraz.at (Postfix) with ESMTPSA id 4TNt701n7Kz1LM0J;
	Mon, 29 Jan 2024 16:50:36 +0100 (CET)
DKIM-Filter: OpenDKIM Filter v2.11.0 mailrelay.tugraz.at 4TNt701n7Kz1LM0J
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tugraz.at;
	s=mailrelay; t=1706543436;
	bh=KNWZrahTDzV8mRBZiyD1Ce07FRpx5StNklbvsZA51gc=;
	h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
	b=kMqxTsy3gHRRmcxPsfKKe8AGHG2vcK36Xh4X3Dxf69hEVHdv5fH0pgvhuPM8aPBLR
	 I5/xvo4/qGHfAFonfMRrhDsgShTJHuq8sebdBwe4oG19el+iNUzPJ71PcclQDUYUcV
	 NuKCh/wkhkujKPAwz8SgvO9HayF6KcXWCdGRUW+4=
Message-ID: <30a9f93477b9f9745f4b12c5c25fa1e5aa9a8daa.camel@tugraz.at>
Subject: Re: [PATCH v4 0/4]New attribute "counted_by" to annotate bounds for
 C99 FAM(PR108896)
From: Martin Uecker <uecker@tugraz.at>
To: Qing Zhao <qing.zhao@oracle.com>, "josmyers@redhat.com"
	 <josmyers@redhat.com>, Richard Biener <richard.guenther@gmail.com>
Cc: Kees Cook <keescook@chromium.org>, "jakub@redhat.com"
 <jakub@redhat.com>,  "siddhesh@gotplt.org" <siddhesh@gotplt.org>,
 "isanbard@gmail.com" <isanbard@gmail.com>,  "gcc-patches@gcc.gnu.org"
 <gcc-patches@gcc.gnu.org>
Date: Mon, 29 Jan 2024 16:50:36 +0100
In-Reply-To: <E7395E71-3C6D-4118-8B1D-1A423A19CAC5@oracle.com>
References: <20240124002955.3387096-1-qing.zhao@oracle.com>
	 <202401241624.4DC3D829@keescook>
	 <DD3D126A-D621-4BF6-A20B-31B3234F9C82@oracle.com>
	 <08c3913ee0e75beb1fdfbce5487418fbd50de9e3.camel@tugraz.at>
	 <848FCA7D-1CBC-435E-8B4A-8136B13CF318@oracle.com>
	 <98f8a862e65dfc1ddd8227266ed41724f9422adb.camel@tugraz.at>
	 <E7395E71-3C6D-4118-8B1D-1A423A19CAC5@oracle.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
User-Agent: Evolution 3.46.4-2 
MIME-Version: 1.0
X-TUG-Backscatter-control: G/VXY7/6zeyuAY/PU2/0qw
X-Spam-Scanner: SpamAssassin 3.003001 
X-Spam-Score-relay: -0.4
X-Scanned-By: MIMEDefang 2.74 on 129.27.10.116
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_SHORT,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Am Montag, dem 29.01.2024 um 15:09 +0000 schrieb Qing Zhao:
> Thank you!
>=20
> Joseph and Richard,  could you also comment on this?
>=20
> > On Jan 28, 2024, at 5:09 AM, Martin Uecker <uecker@tugraz.at> wrote:
> >=20
> > Am Freitag, dem 26.01.2024 um 14:33 +0000 schrieb Qing Zhao:
> > >=20
> > > > On Jan 26, 2024, at 3:04 AM, Martin Uecker <uecker@tugraz.at> wrote=
:
> > > >=20
> > > >=20
> > > > I haven't looked at the patch, but it sounds you give the result
> > > > the wrong type. Then patching up all use cases instead of the
> > > > type seems wrong.
> > >=20
> > > Yes, this is for resolving a very early gimplification issue as I rep=
orted last Nov:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638793.html
> > >=20
> > > Since no-one responded at that time, I fixed the issue by replacing t=
he ARRAY_REF
> > > With a pointer indirection:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639605.html
> > >=20
> > > The reason for such change is:  return a flexible array member TYPE i=
s not allowed
> > > by C language (our gimplification follows this rule), so, we have to =
return a pointer TYPE instead.=20
> > >=20
> > > ******The new internal function
> > >=20
> > > .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE, SIZE_OF_SI=
ZE, ACCESS_MODE, INDEX)
> > >=20
> > > INTERNAL_FN (ACCESS_WITH_SIZE, ECF_LEAF | ECF_NOTHROW, NULL)
> > >=20
> > > which returns the "REF_TO_OBJ" same as the 1st argument;
> > >=20
> > > Both the return type and the type of the first argument of this funct=
ion have been converted from=20
> > > the incomplete array type to the corresponding pointer type.
> > >=20
> > > As a result, the original ARRAY_REF was converted to an INDIRECT_REF,=
 the original INDEX of the ARRAY_REF was lost
> > > when converting from ARRAY_REF to INDIRECT_REF, in order to keep the =
INDEX for bound sanitizer instrumentation, I added
> > > The 6th argument =E2=80=9CINDEX=E2=80=9D.
> > >=20
> > > What=E2=80=99s your comment and suggestion on this solution?
> >=20
> > I am not entirely sure but changing types in the FE seems
> > problematic because this breaks language semantics. And
> > then adding special code everywhere to treat it specially
> > in the FE does not seem a good way forward.
> >=20
> > If I understand correctly, returning an incomplete array=20
> > type is not allowed and then fails during gimplification.
>=20
> Yes, this is the problem in gimplification.=20
>=20
> > So I would suggest to make it return a pointer to the=20
> > incomplete array (and not the element type)
>=20
>=20
> for the following:
>=20
> struct annotated {
>   unsigned int size;
>   int array[] __attribute__((counted_by (size)));
> };
>=20
>   struct annotated * p =3D =E2=80=A6.
>   p->array[9] =3D 0;
>=20
> The IL for the above array reference p->array[9] is:
>=20
> 1. If the return type is the original incomplete array type,=20
>=20
> .ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1)[9] =3D 0;
>=20
> (this triggered the gimplification failure since the return type cannot b=
e a complete type).
>=20
> 2. When the return type is changed to a pointer to the element type of th=
e incomplete array, (the current patch)
> Then the original array reference naturally becomes an indirect reference=
 through the pointer
>=20
> *(.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1, 9) + 36) =3D=
 0;
>=20
> Since the original array reference becomes an indirect reference through =
the pointer to the element array, the INDEX info=20
> is mixed into the OFFSET of the indirect reference and lost, so, I added =
the 6th argument to the routine .ACCESS_WITH_SIZE
> to record the INDEX.=20
>=20
> 3. With your suggestion, the return type is changed to a pointer to the i=
ncomplete array,=20
> I just tried this to change the result type :
>=20
>=20
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -2619,7 +2619,7 @@ build_access_with_size_for_counted_by (location_t l=
oc, tree ref,
>                                        tree counted_by_type)
>  {
>    gcc_assert (c_flexible_array_member_type_p (TREE_TYPE (ref)));
> -  tree result_type =3D build_pointer_type (TREE_TYPE (TREE_TYPE (ref)));
> +  tree result_type =3D build_pointer_type (TREE_TYPE (ref));
>=20
> Then, I got the following FE errors:
>=20
> test.c:10:11: error: invalid use of flexible array member
>    10 |   p->array[9] =3D 0;
>=20
> The reason for the error is: when the original array_ref becomes an indir=
ect_ref through the pointer to the incomplete array,
> During the computation of the OFFSET to the pointer, the TYPE_SIZE_UNIT (=
type) is invalid since the type is an incomplete array.=20
> As a result, the OFFSET cannot computed for the indirect_ref.
>=20
> Looks like even more issues with this approach.

Yes, but only because the following is missing:

>=20
>=20
> > but then wrap
> > it with an indirection when inserting this code in the FE
> > so that the full replacement has the correct type again
> > (of the incomplete array).
>=20
> I don=E2=80=99t quite understand the above, could you please explain this=
 in more details? (If possible, could you please use the above small exampl=
e?)
> thanks.

You would need to add an indirection:

(*(.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1)))[9] =3D 0;

if .ACCESS_WITH_SIZE has type    T (*)[], i.e. pointer to incomplete
array of type T, then (*(.ACCESS_WITH_SIZE (...))) has type T[], i.e.
incomplete array of type.  =20

And you shouldn't even consider array derefencing part at all,
but replace the p->array when the component ref is constructed.=20


Martin


>=20
> >=20
> >=20
> > Alternatively, one could allow this during gimplification
> > or add some conversion.
>=20
> Allowing this in gimplification might trigger some other issues.  I guess=
 that adding conversion=20
> in the end of the FE or in the beginning of gimplification might be bette=
r.
>=20
> i.e,  in FE, still keep the original incomplete array type as the return =
type for the routine .ACCESS_WITH_SIZE, i.e
>=20
> .ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1)[9] =3D 0;
>=20
> But add a conversion from the above array_ref to an indirect_ref in the e=
nd of FE or in the beginning of gimplification:
>=20
> *(.ACCESS_WITH_SIZE ((int *) &p->array, &p->size, 1, 32, -1) + 36) =3D 0;
>=20
> With this approach,  during FE, the original ARRAY TYPE and the INDEX can=
 be kept, the array bound sanitizer instrumentation
> Will be much easier than my current approach.=20
>=20
> Is this approach reasonable?
>=20
> If so, where is better to add this conversion, in the end of FE or in the=
 beginning of gimplification?
>=20
> Thanks.
>=20
> Qing
>=20
>=20
> >=20
> > Martin
> >=20
> >=20
> > >=20
> > > Thanks.
> > >=20
> > > Qing
> > >=20
> > >=20
> > > >=20
> > > > Martin
> > > >=20
> > > >=20
> > > > Am Donnerstag, dem 25.01.2024 um 20:11 +0000 schrieb Qing Zhao:
> > > > > Thanks a lot for the testing.
> > > > >=20
> > > > > Yes, I can repeat the issue with the following small example:
> > > > >=20
> > > > > #include <stdlib.h>
> > > > > #include <stddef.h>
> > > > > #include <stdio.h>
> > > > >=20
> > > > > #define MAX(a, b)  ((a) > (b) ? (a) :  (b))
> > > > >=20
> > > > > struct untracked {
> > > > >      int size;
> > > > >      int array[] __attribute__((counted_by (size)));
> > > > > } *a;
> > > > > struct untracked * alloc_buf (int index)
> > > > > {
> > > > > struct untracked *p;
> > > > > p =3D (struct untracked *) malloc (MAX (sizeof (struct untracked)=
,
> > > > >                                       (offsetof (struct untracked=
, array[0])
> > > > >                                        + (index) * sizeof (int)))=
);
> > > > > p->size =3D index;
> > > > > return p;
> > > > > }
> > > > >=20
> > > > > int main()
> > > > > {
> > > > > a =3D alloc_buf(10);
> > > > >    printf ("same_type is %d\n",
> > > > > (__builtin_types_compatible_p(typeof (a->array), typeof (&(a->arr=
ay)[0]))));
> > > > > return 0;
> > > > > }
> > > > >=20
> > > > >=20
> > > > > /home/opc/Install/latest-d/bin/gcc -O2 btcp.c
> > > > > same_type is 1
> > > > >=20
> > > > > Looks like that the =E2=80=9Ctypeof=E2=80=9D operator need to be =
handled specially in C FE
> > > > > for the new internal function .ACCESS_WITH_SIZE.=20
> > > > >=20
> > > > > (I have specially handle the operator =E2=80=9Coffsetof=E2=80=9D =
in C FE already).
> > > > >=20
> > > > > Will fix this issue.
> > > > >=20
> > > > > Thanks.
> > > > >=20
> > > > > Qing
> > > > >=20
> > > > > > On Jan 24, 2024, at 7:51 PM, Kees Cook <keescook@chromium.org> =
wrote:
> > > > > >=20
> > > > > > On Wed, Jan 24, 2024 at 12:29:51AM +0000, Qing Zhao wrote:
> > > > > > > This is the 4th version of the patch.
> > > > > >=20
> > > > > > Thanks very much for this!
> > > > > >=20
> > > > > > I tripped over an unexpected behavioral change that the Linux k=
ernel
> > > > > > depends on:
> > > > > >=20
> > > > > > __builtin_types_compatible_p() no longer treats an array marked=
 with
> > > > > > counted_by as different from that array's decayed pointer. Spec=
ifically,
> > > > > > the kernel uses these macros:
> > > > > >=20
> > > > > >=20
> > > > > > /*
> > > > > > * Force a compilation error if condition is true, but also prod=
uce a
> > > > > > * result (of value 0 and type int), so the expression can be us=
ed
> > > > > > * e.g. in a structure initializer (or where-ever else comma exp=
ressions
> > > > > > * aren't permitted).
> > > > > > */
> > > > > > #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)=
); })))
> > > > > >=20
> > > > > > #define __same_type(a, b) __builtin_types_compatible_p(typeof(a=
), typeof(b))
> > > > > >=20
> > > > > > /* &a[0] degrades to a pointer: a different type from an array =
*/
> > > > > > #define __must_be_array(a)   BUILD_BUG_ON_ZERO(__same_type((a),=
 &(a)[0]))
> > > > > >=20
> > > > > >=20
> > > > > > This gets used in various places to make sure we're dealing wit=
h an
> > > > > > array for a macro:
> > > > > >=20
> > > > > > #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __mus=
t_be_array(arr))
> > > > > >=20
> > > > > >=20
> > > > > > So this builds:
> > > > > >=20
> > > > > > struct untracked {
> > > > > >      int size;
> > > > > >      int array[];
> > > > > > } *a;
> > > > > >=20
> > > > > > __must_be_array(a->array)
> > > > > > =3D> 0 (as expected)
> > > > > > __builtin_types_compatible_p(typeof(a->array), typeof(&(a->arra=
y)[0]))
> > > > > > =3D> 0 (as expected, array vs decayed array pointer)
> > > > > >=20
> > > > > >=20
> > > > > > But if counted_by is added, we get a build failure:
> > > > > >=20
> > > > > > struct tracked {
> > > > > >      int size;
> > > > > >      int array[] __counted_by(size);
> > > > > > } *b;
> > > > > >=20
> > > > > > __must_be_array(b->array)
> > > > > > =3D> build failure (not expected)
> > > > > > __builtin_types_compatible_p(typeof(b->array), typeof(&(b->arra=
y)[0]))
> > > > > > =3D> 1 (not expected, both pointers?)
> > > > > >=20
> > > > > >=20
> > > > > >=20
> > > > > >=20
> > > > > > --=20
> > > > > > Kees Cook
>=20
>=20

--=20
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging