public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "isanbard at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/108896] provide "element_count" attribute to give more context to __builtin_dynamic_object_size() and -fsanitize=bounds
Date: Mon, 06 Mar 2023 19:15:00 +0000	[thread overview]
Message-ID: <bug-108896-4-wH59mj1AWR@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-108896-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896

--- Comment #18 from Bill Wendling <isanbard at gmail dot com> ---
(In reply to Martin Uecker from comment #17)
> Am Freitag, dem 03.03.2023 um 23:18 +0000 schrieb isanbard at gmail dot com:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> > 
> > --- Comment #16 from Bill Wendling <isanbard at gmail dot com> ---
> > (In reply to Martin Uecker from comment #15)
> > > Am Freitag, dem 03.03.2023 um 20:27 +0000 schrieb isanbard at gmail dot com:
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
> > > > 
> > > > --- Comment #14 from Bill Wendling <isanbard at gmail dot com> ---
> > > > (In reply to Martin Uecker from comment #9)
> > > > > > > Considering that the GNU extensions is rarely used, one could consider
> > > > > > > redefining the meaning of
> > > > > > > 
> > > > > > > int n = 1;
> > > > > > > struct {
> > > > > > >   int n;
> > > > > > >   char buf[n];
> > > > > > > };
> > > > > > > 
> > > > > > > so that the 'n' refers to the member. Or we add a new syntax similar to
> > > > > > > designators (which intuitively makes sense to me).
> > > > > > designator might be better IMO.
> > > > > > 
> > > > > > a question here is:
> > > > > > 
> > > > > > for the following nested structure: 
> > > > > > 
> > > > > > struct object {
> > > > > >         ...
> > > > > >         char items;
> > > > > >         ...
> > > > > >         struct inner {
> > > > > >                 ...
> > > > > >                 int flex[];
> > > > > >         };
> > > > > > } *ptr;
> > > > > > 
> > > > > > what kind of syntax is good to represent the upper bound of "flex" in the inner
> > > > > > struct with "items" in the outer structure? any suggestion?
> > > > > 
> > > > > I would disallow it. At least at first. It also raises some
> > > > > questions: For example, one could form a pointer to the inner
> > > > > struct, and then it is not clear how 'items' could be accessed
> > > > > anymore.
> > > > > 
> > > > 
> > > > That would be limiting its use in the Linux kernel. It seems that there are
> > > > ways to refer to struct members already using something like "offsetof":
> > > > 
> > > > struct object {
> > > >     ...
> > > >     char items;
> > > >     ...
> > > >     struct inner {
> > > >         ...
> > > >         int flex[] __attribute__((__element_count__(offsetof(struct object,
> > > > items))));
> > > >     };
> > > > } *ptr;
> > > 
> > > This seems to be something completely different. offsetof
> > > computes the offset from the type given in its argument.
> > > But it would not access the value of the member of the
> > > enclosing struct. But it would not work in your example,
> > > because the struct object is incomplete at this point.
> > > 
> > > So no, you can not use offsetof to reference a member
> > > of an enclosing struct.
> > > 
> > "offsetof(struct foo, count)" is a fancy wrapper for "((struct foo
> > *)0)->count", which is resolved during sema, where it does have the full
> > structure definition. For instance, this compiles in C++:
> > 
> > struct a {
> >         int count;
> >         int y = ((struct a *)0)->count;
> > } x;
> > 
> > void foo(struct a *);
> 
> This is not the case in C: https://godbolt.org/z/P7M6cdnoa
> 
Right, it's not legal C syntax. I wanted to point out that it's not an
impossible syntax. The lexer and parser have no idea of, nor should they care
about, legal types, complete struct definitions, etc. All it sees is a member
access. It's semantic analysis that actually determines whether or not it's
correct, and if so what it means.

> But even we make it resolve the name, which we
> have to do for all proposals, it is something  different.
> 
> If offsetof it would resolve the count of a different
> struct of the same *type* (here a non-existent one at 
> address zero). Here we need a self reference to the
> same *object*.
> 
If we make it an attribute, we have more control over what the arguments mean,
what they refer to, etc.

For the record, I'm not opposed to the idea of using the designated initializer
syntax. I just think it's good to explore other ideas before settling on
extending C (which is a bit heavy-handed).

> > > But I am not saying we shouldn't have the attribute first.
> > > 
> > I personally prefer using an attribute than the suggestion to use some other
> > syntax, like:
> > 
> > struct foo {
> >     int fam[.count];
> > };
> > 
> > It becomes less intuitive what's going on here. And might conflict with VLA's
> > in structures.
> 
> The syntax with the dot would make it not conflict.  But I need
> this for this use case
> 
> struct foo {
>   int count;
>   int (*buf)[.count];
> };
> 
> so that ARRAY_SIZE(*foo->buf) works correctly and also accesses
> to foo->buf are bounds checkked.  So it would make sense to 
> solve to treat flexible array members in the same way.
> 
> But I agree that we should simply add the attribute now also
> because it makes it possible to use it for existing code bases.
> 
Agreed. I believe we're on the same page (despite what I may write here :-).

> > > > It also has the benefit of
> > > > allowing one to reference a variable not in the structure:
> > > > 
> > > > const int items;
> > > > struct object {
> > > >     ...
> > > >     char items;
> > > >     ...
> > > >     struct inner {
> > > >         ...
> > > >         int flex[] __attribute__((__element_count__(items))); /* References
> > > > global "items" */
> > > >     };
> > > > } *ptr;
> > > 
> > > Whether you allow this or not has nothing to do with the syntax.
> > > 
> > > The question is what semantics you attach to this and this is
> > > also a question in your example. 
> > > 
> > > If you define
> > > 
> > > struct inner* a = ...
> > > 
> > > What does it say for  a->flex  ?
> > > 
> > I need to point out that I used "offsetof" only as an example. It's possible to
> > create something more robust which will carry along type information, etc.,
> > which the current offsetof macro throws away. I should have made that clear.
> > 
> > The sanitizers that see "a->flex" will try to find the correct variable. If
> > they can't, then they won't generate a check. In the case of it referencing a
> > non-field member, it'll use that if it's within scope. If it refers to a field
> > member of a parent container that's not within scope, it'll also not generate a
> > check. It's unfortunate that these checks are done as a "best effort," but it
> > could lead to software changes to improve security checks (like passing a
> > parent structure into a function rather than an inner structure.
> 
> Yes. We could also have an optional warning which warns about accessing
> 'flex' in a context where 'items' is not accessible.  My point is that
> this feature of potentially referring to stuff which may not be accessible
> in all cases makes implementation more complicated.
> 
It does make it more complicated, that's true.

I haven't seen comments on Kees's first example, where "malloc" returns an
"__alloc_size" hint that's lost when "p" is returned from the function (at
least in Clang). If that information is kept around, it would expand the number
of bounds checks we could perform. (Kudos if this is currently the case in
GCC.)

-bw

  parent reply	other threads:[~2023-03-06 19:15 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-22 21:26 [Bug c/108896] New: " kees at outflux dot net
2023-02-22 21:31 ` [Bug c/108896] " kees at outflux dot net
2023-02-22 21:32 ` pinskia at gcc dot gnu.org
2023-02-23  8:44 ` rguenth at gcc dot gnu.org
2023-02-23  9:10 ` jakub at gcc dot gnu.org
2023-02-24 15:44 ` muecker at gwdg dot de
2023-03-01 22:54 ` qinzhao at gcc dot gnu.org
2023-03-01 23:27 ` kees at outflux dot net
2023-03-02 15:50 ` muecker at gwdg dot de
2023-03-02 17:34 ` qinzhao at gcc dot gnu.org
2023-03-02 18:17 ` muecker at gwdg dot de
2023-03-02 18:34 ` muecker at gwdg dot de
2023-03-02 19:47 ` qinzhao at gcc dot gnu.org
2023-03-02 19:56 ` qinzhao at gcc dot gnu.org
2023-03-02 20:07 ` muecker at gwdg dot de
2023-03-03 20:27 ` isanbard at gmail dot com
2023-03-03 21:32 ` muecker at gwdg dot de
2023-03-03 23:18 ` isanbard at gmail dot com
2023-03-04  7:52 ` muecker at gwdg dot de
2023-03-06 19:15 ` isanbard at gmail dot com [this message]
2023-03-06 19:18 ` jakub at gcc dot gnu.org
2023-03-06 19:38 ` muecker at gwdg dot de
2023-03-06 19:57 ` muecker at gwdg dot de
2023-03-06 20:05 ` siddhesh at gcc dot gnu.org
2023-03-08 16:56 ` qinzhao at gcc dot gnu.org
2023-03-08 17:13 ` qinzhao at gcc dot gnu.org
2023-03-08 17:36 ` qinzhao at gcc dot gnu.org
2023-03-08 17:38 ` qinzhao at gcc dot gnu.org
2023-03-08 17:43 ` qinzhao at gcc dot gnu.org
2023-03-08 17:48 ` muecker at gwdg dot de
2023-03-08 18:37 ` muecker at gwdg dot de
2023-03-08 19:20 ` qinzhao at gcc dot gnu.org
2023-03-08 19:47 ` qinzhao at gcc dot gnu.org
2023-03-08 20:20 ` muecker at gwdg dot de
2023-03-08 20:47 ` qinzhao at gcc dot gnu.org
2023-03-29 16:12 ` muecker at gwdg dot de
2023-04-03 20:29 ` qinzhao at gcc dot gnu.org
2023-04-03 21:53 ` muecker at gwdg dot de
2023-04-04 15:07 ` qinzhao at gcc dot gnu.org
2023-04-04 16:33 ` muecker at gwdg dot de
2023-04-04 20:08 ` qinzhao at gcc dot gnu.org
2023-04-19 16:32 ` qinzhao at gcc dot gnu.org
2023-05-03 13:57 ` qinzhao at gcc dot gnu.org
2023-05-03 15:32 ` kees at outflux dot net
2023-05-04 15:16 ` muecker at gwdg dot de
2023-05-04 15:30 ` qinzhao at gcc dot gnu.org
2023-05-25 18:14 ` qinzhao at gcc dot gnu.org
2023-05-25 18:47 ` ndesaulniers at google dot com
2023-10-05 19:54 ` tg at mirbsd dot org
2023-10-05 20:21 ` muecker at gwdg dot de
2023-12-27  6:31 ` sean@rogue-research.com
2024-03-06 14:40 ` qinzhao at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-108896-4-wH59mj1AWR@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).