public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Tim Lange" <mail@tim-lange.me>
To: "David Malcolm" <dmalcolm@redhat.com>
Cc: "GCC Mailing List" <gcc@gcc.gnu.org>
Subject: Re: [RFC] analyzer: allocation size warning
Date: Tue, 21 Jun 2022 22:00:34 +0200	[thread overview]
Message-ID: <CKW2VDHMKLS4.2TVKXUQ3IT54U@fedora> (raw)
In-Reply-To: <77a08baeea445a0d91a8df2dec32e5d8b766bc0f.camel@redhat.com>

On Sat Jun 18, 2022 at 12:13 AM CEST, David Malcolm wrote:
> On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote:
> > On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> > > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:
>
> [...snip...]
>
> > > 
> > 
> > I have resent the patch using git send-email as a reply to my original
> > message. 
> > The new message looks properly formatted in the archive:
> >     https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html
>
> Thanks; that's *much* more readable.
>
>
> [...snip...]
>
> > > 
> > > 
> > > 
> > > On symbolic buffer sizes:
> > > warning: Allocated buffer size is not a multiple of the pointee's
> > > size 
> > > [CWE-131] [-Wanalyzer-allocation-size]
> > >    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 }
> > > */
> > >       | ^~~~~~~~~~~~~~~~~~~~~~~~
> > >   ‘test_3’: event 1
> > >     |
> > >     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3
> > > } 
> > > */
> > >     | | ^~~~~~~~~~~~~~~~~~~~~~~~
> > >     | | |
> > >     | | (1) Allocation is incompatible with ‘int *’; either the 
> > > allocated size is bogus or the type on the left-hand side is wrong
> > >     |
> > > 
> > > 
> > > Is there location information for both the malloc and for the
> > > assignment, here?
> > 
> > I'm not sure whether I understand your question but the warning is 
> > emitted at the gcall* with a ssa var lhs and the call_fndecl on the
> > rhs.
> > I think that is enough to split that up into "(1) n + sizeof(int) 
> > allocated here" and "(2) Allocation at (1) is incompatible with..."? 
>
> Probably, yes.
>
> FWIW I wrote some more notes about the events in my reply to to your
> reply to Prathamesh, here:
>   https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html
>
> [...snip...]
>
> > > 
> > > There are some things to discuss from my side:
> > > * The tests with the "toy re-implementation of CPython's object 
> > > model"[2] fail due to a extra warning emitted. Because the analyzer
> > > can't know the calculation actually results in a correct buffer size 
> > > when viewed as a string_obj later on, it emits a warning, e.g. at
> > > line 
> > > 61 in data-model-5.c. The only mitigation would be to disable the 
> > > warning for structs entirely. Now, the question is to rather have
> > > noise
> > > on these cases or disable the warning for structs entirely?
> > > 
> > > Can you post the full warning please?
> > 
> > /path/to/data-model-5.c: In function ‘alloc_obj’:
> > /path/to/data-model-5.c:61:31: warning: Allocated buffer size is not a
> > multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> >    61 |   base_obj *obj = (base_obj *)malloc (sz);
> >       |                               ^~~~~~~~~~~
> >   ‘new_string_obj’: events 1-2
> >     |
> >     |   69 | base_obj *new_string_obj (const char *str)
> >     |      |           ^~~~~~~~~~~~~~
> >     |      |           |
> >     |      |           (1) entry to ‘new_string_obj’
> >     |......
> >     |   75 |     = (string_obj *)alloc_obj (&str_type, sizeof
> > (string_obj) + len + 1);
> >     |      |                    
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >     |      |                     |
> >     |      |                     (2) calling ‘alloc_obj’ from
> > ‘new_string_obj’
> >     |
> >     +--> ‘alloc_obj’: events 3-4
> >            |
> >            |   59 | base_obj *alloc_obj (type_obj *ob_type, size_t sz)
> >            |      |           ^~~~~~~~~
> >            |      |           |
> >            |      |           (3) entry to ‘alloc_obj’
> >            |   60 | {
> >            |   61 |   base_obj *obj = (base_obj *)malloc (sz);
> >            |      |                               ~~~~~~~~~~~
> >            |      |                               |
> >            |      |                               (4) Allocation is
> > incompatible with ‘base_obj *’; either the allocated size is bogus or
> > the type on the left-hand side is wrong
> >            |
> > 
> > > 
> > > These testcases exhibit a common way of faking inheritance in C, and
> > > I
> > > think it ought to be possible to support this in the warning.
> > > 
> > > I thing what's happening is we have
> > > 
> > > struct base
> > > { 
> > >   /* fields */
> > > };
> > > 
> > > struct sub
> > > {
> > >   struct base m_base;
> > >   /* extra fields.  */
> > > };
> > > 
> > > struct base *construct_base (size_t sz)
> > > {
> > >   struct base *p = (struct base *) malloc (sz);
> > > 
> > >   /* set up fields of base in p  */
> > > 
> > >   return p;
> > > }
> > > 
> > > Or is this on the interprocedural path as called with a specific
> > > sizeof
> > > for struct sub?
> > 
> > At (4), it does not know that base_obj is later used as a "base
> > struct". 
> > As it is called with sizeof(struct sub), my checker thinks the buffer
> > is
> > too large for one but too small for another base_obj.
> > 
> > > 
> > > Maybe we can special-case these by detecting where struct sub's first
> > > field is struct base, and hence where we expect this pattern?  (and
> > > use
> > > this to suppress the warning for such cases?)
> > 
> > I already excluded all structs with structs inside with 
> > struct_or_union_with_inheritance_p inside sm-malloc.cc. This does not
> > help 
> > in the case size for struct sub is allocated but casted as base. Maybe,
> > we
> > should do a special case for structs where we only warn when the sizeof
> > is
> > too small to hold the base struct together with supressing warnings
> > when
> > the first field is a struct? 
>
> That sounds like it could work.
>
> There are several things going on in the above example:
> - fake inheritance
> - the "trailing array idiom": struct string_obj's final field is:
>      char str_buf[];
>   meaning that the string_obj will have the char buffer trailing off
> the end, and the allocation is expected to support this.
>
> This is not uncommon in C; it occurs in CPython, see e.g.:
> https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h
>
> where CPython's PyBytesObject has the bytes in an ob_sval field
> trailing off the end:
>
> typedef struct {
>     PyObject_VAR_HEAD
>     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
>     char ob_sval[1];
>
>     /* Invariants:
>      *     ob_sval contains space for 'ob_size+1' elements.
>      *     ob_sval[ob_size] == 0.
>      *     ob_shash is the hash of the byte string or -1 if not computed yet.
>      */
> } PyBytesObject;
>
> so it would be good for the warning to handle it gracefully, which I
> think your proposal above would.
>
> I try to have plenty of idiomatic C code in the analyzer test suite to
> try to catch this kind of thing (as well as more "unit test" kinds of
> test coverage); we want the warnings to have a good signal:noise ratio.
>
>
>
> > 
> > 
> > * I'm unable to emit a warning whenever the cast happens at an 
> > assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c. 
> > This is because I'm unable to access a region_svalue for the returned
> > value. Even in the new_program_state, the svalue of the lhs is still a 
> > conjured_svalue. Maybe David can lead me to a place where I can access 
> > the return value's region_svalue or do I have to adapt the engine?
> > 
> > Please can you try reposting the patch?  I tried to read it, but am
> > having trouble with the mangled indentation.
>
> See my inline answer above. Both, the test case and from where I want
> to access the region_svalue are commented with // FIXME.
>
> What does the dump of the state look like? e.g. via calling 
>
>   (gdb) call m_region_model->debug()
>
> from within gdb
>
> A conjured_svalue represents the result of a call to an external
> function (or a side-effect written out to a *out-style param of such a
> function), but we have the body of create_buffer, so the call to
> create_buffer should be analyzed interprocedurally, and we should have
> a region_svalue pointing at a heap_allocated_region.
>
> You might want to simplify things to just the functions of interest,
> and then have a look at the output of -fdump-analyzer-exploded-graph in
> your favorite .dot viewer (I like xdot; it's in python-xdot in Fedora).
>
> I wonder if my idea from the other email of moving the test from sm-
> malloc.cc to region-model.cc might affect this; the state machines run
> at a slightly different time to the region model updates.

I've now moved the checker inside the region_model. While I got
everything working again and fixed the bogus struct error, I'm stuck at
the warning if the cast is on a return.

Lets take the example:
  void *create_buffer(int n)
  {
    return malloc(n);
  }

  int main (int argc, char **argv)
  {
    int *buf = create_buffer(42);
    free (buf);
    return 0;
  }

After moving it to the region_model, it reaches
  ctxt->warn (new dubious_allocation_size(...))
but does not emit a warning because inside impl_region_model_context::warn
m_stmt and m_stmt_finder are both NULL.
m_stmt is null, because the exploded node at which set_value is called
on, is the after node of create_buffer (the last bb of the function),
which doesn't have any statements.

I·tried·to·add·the·call·site·to·the·context·inside program_state::on_edge
to·get·the·warning.·While·the·warning·appears, the·notes·in·the·diagnostic
are·not·correct·anymore,·i.e.·the·call·site has·the·same·indentation·as
the·callee·and·the·return·note·is missing.
On the other hand, I can not split the return value set_value call out
of pop_frame, because then I'm unable to retrieve the <return_value> as
it is already gone.
If I simply delay the pop_frame call until the 'before' supernode, I get
the warning I wanted but break around 600 test, which can't be right
either - at least not for a simple fix.

Because I'm stuck at this for some hours, I'd like to ask you what do
you think is the best way to get a warning at the call site?

- Tim
>
>
>
> > 
> > 
> >e* attr-malloc-6.c and pr96639.c did both contain stn;cts withn;t an
> > implementation. Something in the analyzer must have triggered another
> > warning about the usage of those without them having an implementation.
> > I changed those structs to have an empty implementation, such that the 
> > additional warning are gone. I think this shouldn't change the test
> > case, so is this change okay?
> > 
> > What were the new warnings?
>
> /path/to/attr-malloc-6.c:175:15: error: invalid use of undefined type
> ‘struct FILE’
>   175 |     FILE *p = malloc (100);   // { dg-message "allocated here"
> }
>       |               ^~~~~~~~~~~~

For anyone in the future with the same problem. Make sure to check that
TYPE_SIZE_UNIT != NULL_TREE before calling size_in_bytes.

>
> All were like the one above. error: invalid use of undefined type
> 'struct XXX'
>
> That error looks bogus; I'm guessing that something the new diagnostics
> is calling is generating it.  You can probably track it down by using
>
>   (gdb) break-on-diagnostic
>
> in the debugger, and then seeing what the backtrace shows when the
> breakpoint fires.
>
> See:
>   https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html
>
> break-on-diagnostic is one of the things in the support scripts
> mentioned on that page.
>
> Hope this is helpful
>
> (BTW, I'm about to disappear for a long weekend; I'm back on Tuesday)
>
> Dave


  reply	other threads:[~2022-06-21 23:05 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-17 15:54 Tim Lange
2022-06-17 17:15 ` Prathamesh Kulkarni
2022-06-17 19:23   ` Tim Lange
2022-06-17 21:39     ` David Malcolm
2022-06-17 17:48 ` David Malcolm
2022-06-17 20:23   ` Tim Lange
2022-06-17 22:13     ` David Malcolm
2022-06-21 20:00       ` Tim Lange [this message]
2022-06-21 23:16         ` David Malcolm
2022-06-22 14:57           ` Tim Lange
2022-06-22 18:23             ` David Malcolm
2022-06-17 18:34 ` [RFC] analyzer: add " Tim Lange
2022-06-29 15:39 ` [PATCH v2] analyzer: add allocation size checker Tim Lange
2022-06-29 17:39   ` David Malcolm
2022-06-30 20:40     ` Tim Lange
2022-06-30 22:11 ` [PATCH v3] analyzer: add allocation size checker [PR105900] Tim Lange
2022-06-30 22:47   ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CKW2VDHMKLS4.2TVKXUQ3IT54U@fedora \
    --to=mail@tim-lange.me \
    --cc=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).