From: "Tim Lange" <mail@tim-lange.me>
To: "David Malcolm" <dmalcolm@redhat.com>
Cc: "GCC Mailing List" <gcc@gcc.gnu.org>
Subject: Re: [RFC] analyzer: allocation size warning
Date: Tue, 21 Jun 2022 22:00:34 +0200 [thread overview]
Message-ID: <CKW2VDHMKLS4.2TVKXUQ3IT54U@fedora> (raw)
In-Reply-To: <77a08baeea445a0d91a8df2dec32e5d8b766bc0f.camel@redhat.com>
On Sat Jun 18, 2022 at 12:13 AM CEST, David Malcolm wrote:
> On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote:
> > On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> > > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:
>
> [...snip...]
>
> > >
> >
> > I have resent the patch using git send-email as a reply to my original
> > message.
> > The new message looks properly formatted in the archive:
> > https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html
>
> Thanks; that's *much* more readable.
>
>
> [...snip...]
>
> > >
> > >
> > >
> > > On symbolic buffer sizes:
> > > warning: Allocated buffer size is not a multiple of the pointee's
> > > size
> > > [CWE-131] [-Wanalyzer-allocation-size]
> > > 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 }
> > > */
> > > | ^~~~~~~~~~~~~~~~~~~~~~~~
> > > ‘test_3’: event 1
> > > |
> > > | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3
> > > }
> > > */
> > > | | ^~~~~~~~~~~~~~~~~~~~~~~~
> > > | | |
> > > | | (1) Allocation is incompatible with ‘int *’; either the
> > > allocated size is bogus or the type on the left-hand side is wrong
> > > |
> > >
> > >
> > > Is there location information for both the malloc and for the
> > > assignment, here?
> >
> > I'm not sure whether I understand your question but the warning is
> > emitted at the gcall* with a ssa var lhs and the call_fndecl on the
> > rhs.
> > I think that is enough to split that up into "(1) n + sizeof(int)
> > allocated here" and "(2) Allocation at (1) is incompatible with..."?
>
> Probably, yes.
>
> FWIW I wrote some more notes about the events in my reply to to your
> reply to Prathamesh, here:
> https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html
>
> [...snip...]
>
> > >
> > > There are some things to discuss from my side:
> > > * The tests with the "toy re-implementation of CPython's object
> > > model"[2] fail due to a extra warning emitted. Because the analyzer
> > > can't know the calculation actually results in a correct buffer size
> > > when viewed as a string_obj later on, it emits a warning, e.g. at
> > > line
> > > 61 in data-model-5.c. The only mitigation would be to disable the
> > > warning for structs entirely. Now, the question is to rather have
> > > noise
> > > on these cases or disable the warning for structs entirely?
> > >
> > > Can you post the full warning please?
> >
> > /path/to/data-model-5.c: In function ‘alloc_obj’:
> > /path/to/data-model-5.c:61:31: warning: Allocated buffer size is not a
> > multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> > 61 | base_obj *obj = (base_obj *)malloc (sz);
> > | ^~~~~~~~~~~
> > ‘new_string_obj’: events 1-2
> > |
> > | 69 | base_obj *new_string_obj (const char *str)
> > | | ^~~~~~~~~~~~~~
> > | | |
> > | | (1) entry to ‘new_string_obj’
> > |......
> > | 75 | = (string_obj *)alloc_obj (&str_type, sizeof
> > (string_obj) + len + 1);
> > | |
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > | | |
> > | | (2) calling ‘alloc_obj’ from
> > ‘new_string_obj’
> > |
> > +--> ‘alloc_obj’: events 3-4
> > |
> > | 59 | base_obj *alloc_obj (type_obj *ob_type, size_t sz)
> > | | ^~~~~~~~~
> > | | |
> > | | (3) entry to ‘alloc_obj’
> > | 60 | {
> > | 61 | base_obj *obj = (base_obj *)malloc (sz);
> > | | ~~~~~~~~~~~
> > | | |
> > | | (4) Allocation is
> > incompatible with ‘base_obj *’; either the allocated size is bogus or
> > the type on the left-hand side is wrong
> > |
> >
> > >
> > > These testcases exhibit a common way of faking inheritance in C, and
> > > I
> > > think it ought to be possible to support this in the warning.
> > >
> > > I thing what's happening is we have
> > >
> > > struct base
> > > {
> > > /* fields */
> > > };
> > >
> > > struct sub
> > > {
> > > struct base m_base;
> > > /* extra fields. */
> > > };
> > >
> > > struct base *construct_base (size_t sz)
> > > {
> > > struct base *p = (struct base *) malloc (sz);
> > >
> > > /* set up fields of base in p */
> > >
> > > return p;
> > > }
> > >
> > > Or is this on the interprocedural path as called with a specific
> > > sizeof
> > > for struct sub?
> >
> > At (4), it does not know that base_obj is later used as a "base
> > struct".
> > As it is called with sizeof(struct sub), my checker thinks the buffer
> > is
> > too large for one but too small for another base_obj.
> >
> > >
> > > Maybe we can special-case these by detecting where struct sub's first
> > > field is struct base, and hence where we expect this pattern? (and
> > > use
> > > this to suppress the warning for such cases?)
> >
> > I already excluded all structs with structs inside with
> > struct_or_union_with_inheritance_p inside sm-malloc.cc. This does not
> > help
> > in the case size for struct sub is allocated but casted as base. Maybe,
> > we
> > should do a special case for structs where we only warn when the sizeof
> > is
> > too small to hold the base struct together with supressing warnings
> > when
> > the first field is a struct?
>
> That sounds like it could work.
>
> There are several things going on in the above example:
> - fake inheritance
> - the "trailing array idiom": struct string_obj's final field is:
> char str_buf[];
> meaning that the string_obj will have the char buffer trailing off
> the end, and the allocation is expected to support this.
>
> This is not uncommon in C; it occurs in CPython, see e.g.:
> https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h
>
> where CPython's PyBytesObject has the bytes in an ob_sval field
> trailing off the end:
>
> typedef struct {
> PyObject_VAR_HEAD
> Py_DEPRECATED(3.11) Py_hash_t ob_shash;
> char ob_sval[1];
>
> /* Invariants:
> * ob_sval contains space for 'ob_size+1' elements.
> * ob_sval[ob_size] == 0.
> * ob_shash is the hash of the byte string or -1 if not computed yet.
> */
> } PyBytesObject;
>
> so it would be good for the warning to handle it gracefully, which I
> think your proposal above would.
>
> I try to have plenty of idiomatic C code in the analyzer test suite to
> try to catch this kind of thing (as well as more "unit test" kinds of
> test coverage); we want the warnings to have a good signal:noise ratio.
>
>
>
> >
> >
> > * I'm unable to emit a warning whenever the cast happens at an
> > assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c.
> > This is because I'm unable to access a region_svalue for the returned
> > value. Even in the new_program_state, the svalue of the lhs is still a
> > conjured_svalue. Maybe David can lead me to a place where I can access
> > the return value's region_svalue or do I have to adapt the engine?
> >
> > Please can you try reposting the patch? I tried to read it, but am
> > having trouble with the mangled indentation.
>
> See my inline answer above. Both, the test case and from where I want
> to access the region_svalue are commented with // FIXME.
>
> What does the dump of the state look like? e.g. via calling
>
> (gdb) call m_region_model->debug()
>
> from within gdb
>
> A conjured_svalue represents the result of a call to an external
> function (or a side-effect written out to a *out-style param of such a
> function), but we have the body of create_buffer, so the call to
> create_buffer should be analyzed interprocedurally, and we should have
> a region_svalue pointing at a heap_allocated_region.
>
> You might want to simplify things to just the functions of interest,
> and then have a look at the output of -fdump-analyzer-exploded-graph in
> your favorite .dot viewer (I like xdot; it's in python-xdot in Fedora).
>
> I wonder if my idea from the other email of moving the test from sm-
> malloc.cc to region-model.cc might affect this; the state machines run
> at a slightly different time to the region model updates.
I've now moved the checker inside the region_model. While I got
everything working again and fixed the bogus struct error, I'm stuck at
the warning if the cast is on a return.
Lets take the example:
void *create_buffer(int n)
{
return malloc(n);
}
int main (int argc, char **argv)
{
int *buf = create_buffer(42);
free (buf);
return 0;
}
After moving it to the region_model, it reaches
ctxt->warn (new dubious_allocation_size(...))
but does not emit a warning because inside impl_region_model_context::warn
m_stmt and m_stmt_finder are both NULL.
m_stmt is null, because the exploded node at which set_value is called
on, is the after node of create_buffer (the last bb of the function),
which doesn't have any statements.
I·tried·to·add·the·call·site·to·the·context·inside program_state::on_edge
to·get·the·warning.·While·the·warning·appears, the·notes·in·the·diagnostic
are·not·correct·anymore,·i.e.·the·call·site has·the·same·indentation·as
the·callee·and·the·return·note·is missing.
On the other hand, I can not split the return value set_value call out
of pop_frame, because then I'm unable to retrieve the <return_value> as
it is already gone.
If I simply delay the pop_frame call until the 'before' supernode, I get
the warning I wanted but break around 600 test, which can't be right
either - at least not for a simple fix.
Because I'm stuck at this for some hours, I'd like to ask you what do
you think is the best way to get a warning at the call site?
- Tim
>
>
>
> >
> >
> >e* attr-malloc-6.c and pr96639.c did both contain stn;cts withn;t an
> > implementation. Something in the analyzer must have triggered another
> > warning about the usage of those without them having an implementation.
> > I changed those structs to have an empty implementation, such that the
> > additional warning are gone. I think this shouldn't change the test
> > case, so is this change okay?
> >
> > What were the new warnings?
>
> /path/to/attr-malloc-6.c:175:15: error: invalid use of undefined type
> ‘struct FILE’
> 175 | FILE *p = malloc (100); // { dg-message "allocated here"
> }
> | ^~~~~~~~~~~~
For anyone in the future with the same problem. Make sure to check that
TYPE_SIZE_UNIT != NULL_TREE before calling size_in_bytes.
>
> All were like the one above. error: invalid use of undefined type
> 'struct XXX'
>
> That error looks bogus; I'm guessing that something the new diagnostics
> is calling is generating it. You can probably track it down by using
>
> (gdb) break-on-diagnostic
>
> in the debugger, and then seeing what the backtrace shows when the
> breakpoint fires.
>
> See:
> https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html
>
> break-on-diagnostic is one of the things in the support scripts
> mentioned on that page.
>
> Hope this is helpful
>
> (BTW, I'm about to disappear for a long weekend; I'm back on Tuesday)
>
> Dave
next prev parent reply other threads:[~2022-06-21 23:05 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-17 15:54 Tim Lange
2022-06-17 17:15 ` Prathamesh Kulkarni
2022-06-17 19:23 ` Tim Lange
2022-06-17 21:39 ` David Malcolm
2022-06-17 17:48 ` David Malcolm
2022-06-17 20:23 ` Tim Lange
2022-06-17 22:13 ` David Malcolm
2022-06-21 20:00 ` Tim Lange [this message]
2022-06-21 23:16 ` David Malcolm
2022-06-22 14:57 ` Tim Lange
2022-06-22 18:23 ` David Malcolm
2022-06-17 18:34 ` [RFC] analyzer: add " Tim Lange
2022-06-29 15:39 ` [PATCH v2] analyzer: add allocation size checker Tim Lange
2022-06-29 17:39 ` David Malcolm
2022-06-30 20:40 ` Tim Lange
2022-06-30 22:11 ` [PATCH v3] analyzer: add allocation size checker [PR105900] Tim Lange
2022-06-30 22:47 ` David Malcolm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CKW2VDHMKLS4.2TVKXUQ3IT54U@fedora \
--to=mail@tim-lange.me \
--cc=dmalcolm@redhat.com \
--cc=gcc@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).