From: "muecker at gwdg dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/108896] provide "element_count" attribute to give more context to __builtin_dynamic_object_size() and -fsanitize=bounds
Date: Sat, 04 Mar 2023 07:52:39 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896

--- Comment #17 from Martin Uecker ---
On Friday, 03.03.2023 at 23:18 +0000, isanbard at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
>
> --- Comment #16 from Bill Wendling ---
> (In reply to Martin Uecker from comment #15)
> > On Friday, 03.03.2023 at 20:27 +0000, isanbard at gmail dot com wrote:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> > >
> > > --- Comment #14 from Bill Wendling ---
> > > (In reply to Martin Uecker from comment #9)
> > > > > > Considering that the GNU extension is rarely used, one could consider
> > > > > > redefining the meaning of
> > > > > >
> > > > > > int n = 1;
> > > > > > struct {
> > > > > >     int n;
> > > > > >     char buf[n];
> > > > > > };
> > > > > >
> > > > > > so that the 'n' refers to the member. Or we could add a new syntax similar to
> > > > > > designators (which intuitively makes sense to me).
> > > > > A designator might be better, IMO.
> > > > >
> > > > > A question here is:
> > > > >
> > > > > for the following nested structure:
> > > > >
> > > > > struct object {
> > > > >     ...
> > > > >     char items;
> > > > >     ...
> > > > >     struct inner {
> > > > >         ...
> > > > >         int flex[];
> > > > >     };
> > > > > } *ptr;
> > > > >
> > > > > what kind of syntax would be good for expressing the upper bound of "flex" in
> > > > > the inner struct in terms of "items" in the outer structure? Any suggestions?
> > > >
> > > > I would disallow it, at least at first. It also raises some
> > > > questions: for example, one could form a pointer to the inner
> > > > struct, and then it is not clear how 'items' could be accessed
> > > > anymore.
> > > >
> > >
> > > That would limit its use in the Linux kernel. It seems that there are
> > > already ways to refer to struct members, using something like "offsetof":
> > >
> > > struct object {
> > >     ...
> > >     char items;
> > >     ...
> > >     struct inner {
> > >         ...
> > >         int flex[] __attribute__((__element_count__(offsetof(struct object, items))));
> > >     };
> > > } *ptr;
> >
> > This seems to be something completely different. offsetof
> > computes the offset from the type given in its argument;
> > it does not access the value of the member of the
> > enclosing struct. It would also not work in your example,
> > because struct object is incomplete at this point.
> >
> > So no, you cannot use offsetof to reference a member
> > of an enclosing struct.
> >
> "offsetof(struct foo, count)" is a fancy wrapper for "((struct foo
> *)0)->count", which is resolved during sema, where it does have the full
> structure definition. For instance, this compiles in C++:
>
> struct a {
>     int count;
>     int y = ((struct a *)0)->count;
> } x;
>
> void foo(struct a *);

This is not the case in C:

https://godbolt.org/z/P7M6cdnoa

But even if we make it resolve the name, which we have to do for all proposals, it is something different: offsetof would resolve the count of a different struct of the same *type* (here a non-existent one at address zero), whereas here we need a self-reference to the same *object*.

...

> > But I am not saying we shouldn't have the attribute first.
> >
> I personally prefer using an attribute over the suggestion to use some other
> syntax, like:
>
> struct foo {
>     int fam[.count];
> };
>
> It becomes less intuitive what's going on here, and it might conflict with
> VLAs in structures.

The syntax with the dot would avoid that conflict.
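For readers unfamiliar with the GNU extension mentioned above: GCC (but not Clang) accepts a variable-length array as a structure member when the structure is declared inside a function, and the array bound is looked up as an ordinary identifier, never as a member. The minimal sketch below (the function name `gnu_vla_member_size` is hypothetical) shows that in `char buf[n]` the `n` is the enclosing variable even though the struct also has a member named `n`. That is why giving `buf[n]` member semantics would change the meaning of existing code, while a dotted spelling such as `[.count]` cannot be mistaken for it.

```c
#include <stddef.h>

/* GNU C only: GCC accepts a variable-length array as a structure
   member inside a function; Clang rejects this extension.
   Hypothetical helper; n must be positive. */
size_t gnu_vla_member_size(int n, int member_value)
{
    struct {
        int n;          /* member 'n': lives in the struct's own name space */
        char buf[n];    /* bound is the parameter 'n', not the member above */
    } s;

    s.n = member_value; /* changing the member does not affect buf's size */
    return sizeof s.buf; /* computed at run time from the parameter 'n' */
}
```

With GCC, `gnu_vla_member_size(1, 100)` yields 1 and `gnu_vla_member_size(7, 0)` yields 7: the member's value never feeds into the bound.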
But I need this for the following use case:

struct foo {
    int count;
    int (*buf)[.count];
};

so that ARRAY_SIZE(*foo->buf) works correctly and accesses to foo->buf are also bounds checked. So it would make sense to treat flexible array members in the same way.

But I agree that we should simply add the attribute now, also because it makes it possible to use it in existing code bases.

> > > It also has the benefit of
> > > allowing one to reference a variable not in the structure:
> > >
> > > const int items;
> > > struct object {
> > >     ...
> > >     char items;
> > >     ...
> > >     struct inner {
> > >         ...
> > >         int flex[] __attribute__((__element_count__(items))); /* References global "items" */
> > >     };
> > > } *ptr;
> >
> > Whether you allow this or not has nothing to do with the syntax.
> >
> > The question is what semantics you attach to this, and this is
> > also a question in your example.
> >
> > If you define
> >
> > struct inner *a = ...
> >
> > what does it say about a->flex?
> >
> I need to point out that I used "offsetof" only as an example. It's possible to
> create something more robust which will carry along type information, etc.,
> which the current offsetof macro throws away. I should have made that clear.
>
> The sanitizers that see "a->flex" will try to find the correct variable. If
> they can't, then they won't generate a check. In the case of it referencing a
> non-field member, it'll use that if it's within scope. If it refers to a field
> member of a parent container that's not within scope, it'll also not generate a
> check.
> It's unfortunate that these checks are done as a "best effort," but it
> could lead to software changes to improve security checks (like passing a
> parent structure into a function rather than an inner structure).

Yes. We could also have an optional warning that warns about accessing 'flex' in a context where 'items' is not accessible.

My point is that this feature of potentially referring to things which may not be accessible in all cases makes the implementation more complicated.

Martin
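Bill's "best effort" point, and his suggestion to pass the parent structure rather than the inner one, can be illustrated in plain C with a hand-written check standing in for what a bounds sanitizer might emit if it could see the count. This is a sketch under stated assumptions: all names are hypothetical, no `element_count` attribute is used, and because ISO C does not allow a structure with a flexible array member to be a member of another structure, the inner object is reached through a pointer instead of being nested.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout: the element count lives in the outer object,
   the flexible array member in a separately allocated inner object. */
struct inner {
    int tag;
    int flex[];            /* element count is NOT stored here */
};

struct object {
    size_t items;          /* number of elements in in->flex */
    struct inner *in;
};

/* With the parent in hand, the count is reachable, so the access can be
   checked by hand; this mirrors the kind of check a sanitizer could add. */
bool object_get(const struct object *obj, size_t i, int *out)
{
    if (i >= obj->items)
        return false;      /* out of bounds: reject instead of reading */
    *out = obj->in->flex[i];
    return true;
}

/* Given only the inner struct, nothing says how long flex is, so no
   check is possible here: the "best effort" case with no check at all. */
int inner_get_unchecked(const struct inner *in, size_t i)
{
    return in->flex[i];
}

/* Builds an object whose n elements hold 0..n-1; returns NULL on failure.
   (Overflow of n * sizeof(int) is not checked in this sketch.) */
struct object *object_new(size_t n)
{
    struct object *obj = malloc(sizeof *obj);
    if (!obj)
        return NULL;
    obj->in = malloc(sizeof *obj->in + n * sizeof obj->in->flex[0]);
    if (!obj->in) {
        free(obj);
        return NULL;
    }
    obj->items = n;
    obj->in->tag = 0;
    for (size_t k = 0; k < n; k++)
        obj->in->flex[k] = (int)k;
    return obj;
}

void object_free(struct object *obj)
{
    if (obj) {
        free(obj->in);
        free(obj);
    }
}
```

Only `object_get` can reject index 3 for a three-element object; `inner_get_unchecked` has no count within reach, which is exactly the situation in which a "best effort" check would not be generated.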