From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 2D5C33858C66; Mon, 6 Mar 2023 19:15:02 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2D5C33858C66 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1678130102; bh=qmKp32gNB2/Uc/4iZiPXX9WZja8ImPaXy8gq1LG1Dco=; h=From:To:Subject:Date:In-Reply-To:References:From; b=MrVSH3+ll78Z2HwJF3tucVsznfKiDRnlBybE9sAKn4BSEw2EjzUWX+n6KVMhrhlJ1 6/SM2tYDtDvCXxvK6OqVHNc/AQM2KXrikOHEcK6+ZRrqC2j0ayArorE57CMc8ZeA6z uWcMO+TyPgSONtpTCq64ddfh9K0pLEf2NfEaJ8lo= From: "isanbard at gmail dot com" To: gcc-bugs@gcc.gnu.org Subject: [Bug c/108896] provide "element_count" attribute to give more context to __builtin_dynamic_object_size() and -fsanitize=bounds Date: Mon, 06 Mar 2023 19:15:00 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: c X-Bugzilla-Version: unknown X-Bugzilla-Keywords: X-Bugzilla-Severity: enhancement X-Bugzilla-Who: isanbard at gmail dot com X-Bugzilla-Status: UNCONFIRMED X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: qinzhao at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D108896 --- Comment #18 from Bill Wendling --- (In reply to Martin Uecker from comment #17) > Am Freitag, dem 03.03.2023 um 23:18 +0000 schrieb isanbard at gmail dot c= om: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D108896 > >=20 > > --- Comment #16 from Bill Wendling --- > > (In reply to Martin Uecker from comment #15) > > > Am Freitag, dem 03.03.2023 um 20:27 +0000 schrieb isanbard at gmail d= ot com: > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D108896 > > > >=20 > > > > --- Comment #14 from Bill Wendling --- > > > > (In reply to Martin Uecker from comment #9) > > > > > > > Considering that the GNU extensions is rarely used, one could= consider > > > > > > > redefining the meaning of > > > > > > >=20 > > > > > > > int n =3D 1; > > > > > > > struct { > > > > > > > =C2=A0=C2=A0int n; > > > > > > > =C2=A0=C2=A0char buf[n]; > > > > > > > }; > > > > > > >=20 > > > > > > > so that the 'n' refers to the member. Or we add a new syntax = similar to > > > > > > > designators (which intuitively makes sense to me). > > > > > > designator might be better IMO. > > > > > >=20 > > > > > > a question here is: > > > > > >=20 > > > > > > for the following nested structure:=20 > > > > > >=20 > > > > > > struct object { > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0... > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0char items; > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0... > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct inner { > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0... > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0int flex[]; > > > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0}; > > > > > > } *ptr; > > > > > >=20 > > > > > > what kind of syntax is good to represent the upper bound of "fl= ex" in the inner > > > > > > struct with "items" in the outer structure? any suggestion? > > > > >=20 > > > > > I would disallow it. At least at first. It also raises some > > > > > questions: For example, one could form a pointer to the inner > > > > > struct, and then it is not clear how 'items' could be accessed > > > > > anymore. > > > > >=20 > > > >=20 > > > > That would be limiting its use in the Linux kernel. It seems that t= here are > > > > ways to refer to struct members already using something like "offse= tof": > > > >=20 > > > > struct object { > > > > =C2=A0=C2=A0=C2=A0=C2=A0... > > > > =C2=A0=C2=A0=C2=A0=C2=A0char items; > > > > =C2=A0=C2=A0=C2=A0=C2=A0... > > > > =C2=A0=C2=A0=C2=A0=C2=A0struct inner { > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0... > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0int flex[] __attrib= ute__((__element_count__(offsetof(struct object, > > > > items)))); > > > > =C2=A0=C2=A0=C2=A0=C2=A0}; > > > > } *ptr; > > >=20 > > > This seems to be something completely different. offsetof > > > computes the offset from the type given in its argument. > > > But it would not access the value of the member of the > > > enclosing struct. But it would not work in your example, > > > because the struct object is incomplete at this point. > > >=20 > > > So no, you can not use offsetof to reference a member > > > of an enclosing struct. > > >=20 > > "offsetof(struct foo, count)" is a fancy wrapper for "((struct foo > > *)0)->count", which is resolved during sema, where it does have the full > > structure definition. For instance, this compiles in C++: > >=20 > > struct a { > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0int count; > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0int y =3D ((struct a *)= 0)->count; > > } x; > >=20 > > void foo(struct a *); >=20 > This is not the case in C: https://godbolt.org/z/P7M6cdnoa >=20 Right, it's not legal C syntax. I wanted to point out that it's not an impossible syntax. The lexer and parser have no idea of, nor should they ca= re about, legal types, complete struct definitions, etc. All it sees is a memb= er access. It's semantic analysis that actually determines whether or not it's correct, and if so what it means. > But even we make it resolve the name, which we > have to do for all proposals, it is something different. >=20 > If offsetof it would resolve the count of a different > struct of the same *type* (here a non-existent one at=C2=A0 > address zero). Here we need a self reference to the > same *object*. >=20 If we make it an attribute, we have more control over what the arguments me= an, what they refer to, etc. For the record, I'm not opposed to the idea of using the designated initial= izer syntax. I just think it's good to explore other ideas before settling on extending C (which is a bit heavy-handed). > > > But I am not saying we shouldn't have the attribute first. > > >=20 > > I personally prefer using an attribute than the suggestion to use some = other > > syntax, like: > >=20 > > struct foo { > > =C2=A0=C2=A0=C2=A0=C2=A0int fam[.count]; > > }; > >=20 > > It becomes less intuitive what's going on here. And might conflict with= VLA's > > in structures. >=20 > The syntax with the dot would make it not conflict. But I need > this for this use case >=20 > struct foo { > int count; > int (*buf)[.count]; > }; >=20 > so that ARRAY_SIZE(*foo->buf) works correctly and also accesses > to foo->buf are bounds checkked. So it would make sense to=C2=A0 > solve to treat flexible array members in the same way. >=20 > But I agree that we should simply add the attribute now also > because it makes it possible to use it for existing code bases. >=20 Agreed. I believe we're on the same page (despite what I may write here :-). > > > > It also has the benefit of > > > > allowing one to reference a variable not in the structure: > > > >=20 > > > > const int items; > > > > struct object { > > > > =C2=A0=C2=A0=C2=A0=C2=A0... > > > > =C2=A0=C2=A0=C2=A0=C2=A0char items; > > > > =C2=A0=C2=A0=C2=A0=C2=A0... > > > > =C2=A0=C2=A0=C2=A0=C2=A0struct inner { > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0... > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0int flex[] __attrib= ute__((__element_count__(items))); /* References > > > > global "items" */ > > > > =C2=A0=C2=A0=C2=A0=C2=A0}; > > > > } *ptr; > > >=20 > > > Whether you allow this or not has nothing to do with the syntax. > > >=20 > > > The question is what semantics you attach to this and this is > > > also a question in your example.=20 > > >=20 > > > If you define > > >=20 > > > struct inner* a =3D ... > > >=20 > > > What does it say for a->flex ? > > >=20 > > I need to point out that I used "offsetof" only as an example. It's pos= sible to > > create something more robust which will carry along type information, e= tc., > > which the current offsetof macro throws away. I should have made that c= lear. > >=20 > > The sanitizers that see "a->flex" will try to find the correct variable= . If > > they can't, then they won't generate a check. In the case of it referen= cing a > > non-field member, it'll use that if it's within scope. If it refers to = a field > > member of a parent container that's not within scope, it'll also not ge= nerate a > > check. It's unfortunate that these checks are done as a "best effort," = but it > > could lead to software changes to improve security checks (like passing= a > > parent structure into a function rather than an inner structure. >=20 > Yes. We could also have an optional warning which warns about accessing > 'flex' in a context where 'items' is not accessible. My point is that > this feature of potentially referring to stuff which may not be accessible > in all cases makes implementation more complicated. >=20 It does make it more complicated, that's true. I haven't seen comments on Kees's first example, where "malloc" returns an "__alloc_size" hint that's lost when "p" is returned from the function (at least in Clang). If that information is kept around, it would expand the nu= mber of bounds checks we could perform. (Kudos if this is currently the case in GCC.) -bw=