public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Tim Lange <mail@tim-lange.me>
Cc: GCC Mailing List <gcc@gcc.gnu.org>
Subject: Re: [RFC] analyzer: allocation size warning
Date: Tue, 21 Jun 2022 19:16:53 -0400	[thread overview]
Message-ID: <5e65c86d518c6e26a970b5fb23c2ed507abd2085.camel@redhat.com> (raw)
In-Reply-To: <CKW2VDHMKLS4.2TVKXUQ3IT54U@fedora>

On Tue, 2022-06-21 at 22:00 +0200, Tim Lange wrote:
> On Sat Jun 18, 2022 at 12:13 AM CEST, David Malcolm wrote:
> > On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote:
> > > On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> > > > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:
> > 
> > [...snip...]
> > 
> > > > 
> > > 
> > > I have resent the patch using git send-email as a reply to my
> > > original
> > > message. 
> > > The new message looks properly formatted in the archive:
> > >     https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html
> > 
> > Thanks; that's *much* more readable.
> > 
> > 
> > [...snip...]
> > 
> > > > 
> > > > 
> > > > 
> > > > On symbolic buffer sizes:
> > > > warning: Allocated buffer size is not a multiple of the
> > > > pointee's
> > > > size 
> > > > [CWE-131] [-Wanalyzer-allocation-size]
> > > >    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line
> > > > malloc3 }
> > > > */
> > > >       | ^~~~~~~~~~~~~~~~~~~~~~~~
> > > >   ‘test_3’: event 1
> > > >     |
> > > >     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line
> > > > malloc3
> > > > } 
> > > > */
> > > >     | | ^~~~~~~~~~~~~~~~~~~~~~~~
> > > >     | | |
> > > >     | | (1) Allocation is incompatible with ‘int *’; either the
> > > > allocated size is bogus or the type on the left-hand side is
> > > > wrong
> > > >     |
> > > > 
> > > > 
> > > > Is there location information for both the malloc and for the
> > > > assignment, here?
> > > 
> > > I'm not sure whether I understand your question but the warning
> > > is 
> > > emitted at the gcall* with a ssa var lhs and the call_fndecl on
> > > the
> > > rhs.
> > > I think that is enough to split that up into "(1) n + sizeof(int)
> > > allocated here" and "(2) Allocation at (1) is incompatible
> > > with..."? 
> > 
> > Probably, yes.
> > 
> > FWIW I wrote some more notes about the events in my reply to to
> > your
> > reply to Prathamesh, here:
> >   https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html
> > 
> > [...snip...]
> > 
> > > > 
> > > > There are some things to discuss from my side:
> > > > * The tests with the "toy re-implementation of CPython's object
> > > > model"[2] fail due to a extra warning emitted. Because the
> > > > analyzer
> > > > can't know the calculation actually results in a correct buffer
> > > > size 
> > > > when viewed as a string_obj later on, it emits a warning, e.g.
> > > > at
> > > > line 
> > > > 61 in data-model-5.c. The only mitigation would be to disable
> > > > the 
> > > > warning for structs entirely. Now, the question is to rather
> > > > have
> > > > noise
> > > > on these cases or disable the warning for structs entirely?
> > > > 
> > > > Can you post the full warning please?
> > > 
> > > /path/to/data-model-5.c: In function ‘alloc_obj’:
> > > /path/to/data-model-5.c:61:31: warning: Allocated buffer size is
> > > not a
> > > multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-
> > > size]
> > >    61 |   base_obj *obj = (base_obj *)malloc (sz);
> > >       |                               ^~~~~~~~~~~
> > >   ‘new_string_obj’: events 1-2
> > >     |
> > >     |   69 | base_obj *new_string_obj (const char *str)
> > >     |      |           ^~~~~~~~~~~~~~
> > >     |      |           |
> > >     |      |           (1) entry to ‘new_string_obj’
> > >     |......
> > >     |   75 |     = (string_obj *)alloc_obj (&str_type, sizeof
> > > (string_obj) + len + 1);
> > >     |      |                    
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >     |      |                     |
> > >     |      |                     (2) calling ‘alloc_obj’ from
> > > ‘new_string_obj’
> > >     |
> > >     +--> ‘alloc_obj’: events 3-4
> > >            |
> > >            |   59 | base_obj *alloc_obj (type_obj *ob_type,
> > > size_t sz)
> > >            |      |           ^~~~~~~~~
> > >            |      |           |
> > >            |      |           (3) entry to ‘alloc_obj’
> > >            |   60 | {
> > >            |   61 |   base_obj *obj = (base_obj *)malloc (sz);
> > >            |      |                               ~~~~~~~~~~~
> > >            |      |                               |
> > >            |      |                               (4) Allocation
> > > is
> > > incompatible with ‘base_obj *’; either the allocated size is
> > > bogus or
> > > the type on the left-hand side is wrong
> > >            |
> > > 
> > > > 
> > > > These testcases exhibit a common way of faking inheritance in
> > > > C, and
> > > > I
> > > > think it ought to be possible to support this in the warning.
> > > > 
> > > > I thing what's happening is we have
> > > > 
> > > > struct base
> > > > { 
> > > >   /* fields */
> > > > };
> > > > 
> > > > struct sub
> > > > {
> > > >   struct base m_base;
> > > >   /* extra fields.  */
> > > > };
> > > > 
> > > > struct base *construct_base (size_t sz)
> > > > {
> > > >   struct base *p = (struct base *) malloc (sz);
> > > > 
> > > >   /* set up fields of base in p  */
> > > > 
> > > >   return p;
> > > > }
> > > > 
> > > > Or is this on the interprocedural path as called with a
> > > > specific
> > > > sizeof
> > > > for struct sub?
> > > 
> > > At (4), it does not know that base_obj is later used as a "base
> > > struct". 
> > > As it is called with sizeof(struct sub), my checker thinks the
> > > buffer
> > > is
> > > too large for one but too small for another base_obj.
> > > 
> > > > 
> > > > Maybe we can special-case these by detecting where struct sub's
> > > > first
> > > > field is struct base, and hence where we expect this pattern? 
> > > > (and
> > > > use
> > > > this to suppress the warning for such cases?)
> > > 
> > > I already excluded all structs with structs inside with 
> > > struct_or_union_with_inheritance_p inside sm-malloc.cc. This does
> > > not
> > > help 
> > > in the case size for struct sub is allocated but casted as base.
> > > Maybe,
> > > we
> > > should do a special case for structs where we only warn when the
> > > sizeof
> > > is
> > > too small to hold the base struct together with supressing
> > > warnings
> > > when
> > > the first field is a struct? 
> > 
> > That sounds like it could work.
> > 
> > There are several things going on in the above example:
> > - fake inheritance
> > - the "trailing array idiom": struct string_obj's final field is:
> >      char str_buf[];
> >   meaning that the string_obj will have the char buffer trailing
> > off
> > the end, and the allocation is expected to support this.
> > 
> > This is not uncommon in C; it occurs in CPython, see e.g.:
> > https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h
> > 
> > where CPython's PyBytesObject has the bytes in an ob_sval field
> > trailing off the end:
> > 
> > typedef struct {
> >     PyObject_VAR_HEAD
> >     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
> >     char ob_sval[1];
> > 
> >     /* Invariants:
> >      *     ob_sval contains space for 'ob_size+1' elements.
> >      *     ob_sval[ob_size] == 0.
> >      *     ob_shash is the hash of the byte string or -1 if not
> > computed yet.
> >      */
> > } PyBytesObject;
> > 
> > so it would be good for the warning to handle it gracefully, which
> > I
> > think your proposal above would.
> > 
> > I try to have plenty of idiomatic C code in the analyzer test suite
> > to
> > try to catch this kind of thing (as well as more "unit test" kinds
> > of
> > test coverage); we want the warnings to have a good signal:noise
> > ratio.
> > 
> > 
> > 
> > > 
> > > 
> > > * I'm unable to emit a warning whenever the cast happens at an 
> > > assignment with a call as the rhs, e.g. test_1 in allocation-
> > > size-4.c. 
> > > This is because I'm unable to access a region_svalue for the
> > > returned
> > > value. Even in the new_program_state, the svalue of the lhs is
> > > still a 
> > > conjured_svalue. Maybe David can lead me to a place where I can
> > > access 
> > > the return value's region_svalue or do I have to adapt the
> > > engine?
> > > 
> > > Please can you try reposting the patch?  I tried to read it, but
> > > am
> > > having trouble with the mangled indentation.
> > 
> > See my inline answer above. Both, the test case and from where I
> > want
> > to access the region_svalue are commented with // FIXME.
> > 
> > What does the dump of the state look like? e.g. via calling 
> > 
> >   (gdb) call m_region_model->debug()
> > 
> > from within gdb
> > 
> > A conjured_svalue represents the result of a call to an external
> > function (or a side-effect written out to a *out-style param of
> > such a
> > function), but we have the body of create_buffer, so the call to
> > create_buffer should be analyzed interprocedurally, and we should
> > have
> > a region_svalue pointing at a heap_allocated_region.
> > 
> > You might want to simplify things to just the functions of
> > interest,
> > and then have a look at the output of -fdump-analyzer-exploded-
> > graph in
> > your favorite .dot viewer (I like xdot; it's in python-xdot in
> > Fedora).
> > 
> > I wonder if my idea from the other email of moving the test from
> > sm-
> > malloc.cc to region-model.cc might affect this; the state machines
> > run
> > at a slightly different time to the region model updates.
> 
> I've now moved the checker inside the region_model. While I got
> everything working again and fixed the bogus struct error, 

...thanks!

> I'm stuck at
> the warning if the cast is on a return.
> 
> Lets take the example:
>   void *create_buffer(int n)
>   {
>     return malloc(n);
>   }
> 
>   int main (int argc, char **argv)
>   {
>     int *buf = create_buffer(42);
>     free (buf);
>     return 0;
>   }
> 
> After moving it to the region_model, it reaches
>   ctxt->warn (new dubious_allocation_size(...))
> but does not emit a warning because inside
> impl_region_model_context::warn
> m_stmt and m_stmt_finder are both NULL.
> m_stmt is null, because the exploded node at which set_value is
> called
> on, is the after node of create_buffer (the last bb of the function),
> which doesn't have any statements.

What optimization level are you trying this at?

What's the output of -fdump-ipa-analyzer=stderr

> 
> I·tried·to·add·the·call·site·to·the·context·inside
> program_state::on_edge
> to·get·the·warning.·While·the·warning·appears,
> the·notes·in·the·diagnostic
> are·not·correct·anymore,·i.e.·the·call·site
> has·the·same·indentation·as
> the·callee·and·the·return·note·is missing.
> On the other hand, I can not split the return value set_value call
> out
> of pop_frame, because then I'm unable to retrieve the <return_value>
> as
> it is already gone.
> If I simply delay the pop_frame call until the 'before' supernode, I
> get
> the warning I wanted but break around 600 test, which can't be right
> either - at least not for a simple fix.
> 
> Because I'm stuck at this for some hours, I'd like to ask you what do
> you think is the best way to get a warning at the call site?

Can you post your patch please so that I can see what's going on more
clearly.

Thanks
Dave



  reply	other threads:[~2022-06-21 23:16 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-17 15:54 Tim Lange
2022-06-17 17:15 ` Prathamesh Kulkarni
2022-06-17 19:23   ` Tim Lange
2022-06-17 21:39     ` David Malcolm
2022-06-17 17:48 ` David Malcolm
2022-06-17 20:23   ` Tim Lange
2022-06-17 22:13     ` David Malcolm
2022-06-21 20:00       ` Tim Lange
2022-06-21 23:16         ` David Malcolm [this message]
2022-06-22 14:57           ` Tim Lange
2022-06-22 18:23             ` David Malcolm
2022-06-17 18:34 ` [RFC] analyzer: add " Tim Lange
2022-06-29 15:39 ` [PATCH v2] analyzer: add allocation size checker Tim Lange
2022-06-29 17:39   ` David Malcolm
2022-06-30 20:40     ` Tim Lange
2022-06-30 22:11 ` [PATCH v3] analyzer: add allocation size checker [PR105900] Tim Lange
2022-06-30 22:47   ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5e65c86d518c6e26a970b5fb23c2ed507abd2085.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    --cc=mail@tim-lange.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).