From: David Malcolm <dmalcolm@redhat.com>
To: Tim Lange <mail@tim-lange.me>
Cc: GCC Mailing List <gcc@gcc.gnu.org>
Subject: Re: [RFC] analyzer: allocation size warning
Date: Fri, 17 Jun 2022 18:13:17 -0400 [thread overview]
Message-ID: <77a08baeea445a0d91a8df2dec32e5d8b766bc0f.camel@redhat.com> (raw)
In-Reply-To: <20220617202305.kg2otzye6k57fgyo@fedora>
On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote:
> On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:
[...snip...]
> >
>
> I have resent the patch using git send-email as a reply to my original
> message.
> The new message looks properly formatted in the archive:
> https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html
Thanks; that's *much* more readable.
[...snip...]
> >
> >
> >
> > On symbolic buffer sizes:
> > warning: Allocated buffer size is not a multiple of the pointee's
> > size
> > [CWE-131] [-Wanalyzer-allocation-size]
> > 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 }
> > */
> > | ^~~~~~~~~~~~~~~~~~~~~~~~
> > ‘test_3’: event 1
> > |
> > | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3
> > }
> > */
> > | | ^~~~~~~~~~~~~~~~~~~~~~~~
> > | | |
> > | | (1) Allocation is incompatible with ‘int *’; either the
> > allocated size is bogus or the type on the left-hand side is wrong
> > |
> >
> >
> > Is there location information for both the malloc and for the
> > assignment, here?
>
> I'm not sure whether I understand your question but the warning is
> emitted at the gcall* with a ssa var lhs and the call_fndecl on the
> rhs.
> I think that is enough to split that up into "(1) n + sizeof(int)
> allocated here" and "(2) Allocation at (1) is incompatible with..."?
Probably, yes.
FWIW I wrote some more notes about the events in my reply to to your
reply to Prathamesh, here:
https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html
[...snip...]
> >
> > There are some things to discuss from my side:
> > * The tests with the "toy re-implementation of CPython's object
> > model"[2] fail due to a extra warning emitted. Because the analyzer
> > can't know the calculation actually results in a correct buffer size
> > when viewed as a string_obj later on, it emits a warning, e.g. at
> > line
> > 61 in data-model-5.c. The only mitigation would be to disable the
> > warning for structs entirely. Now, the question is to rather have
> > noise
> > on these cases or disable the warning for structs entirely?
> >
> > Can you post the full warning please?
>
> /path/to/data-model-5.c: In function ‘alloc_obj’:
> /path/to/data-model-5.c:61:31: warning: Allocated buffer size is not a
> multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> 61 | base_obj *obj = (base_obj *)malloc (sz);
> | ^~~~~~~~~~~
> ‘new_string_obj’: events 1-2
> |
> | 69 | base_obj *new_string_obj (const char *str)
> | | ^~~~~~~~~~~~~~
> | | |
> | | (1) entry to ‘new_string_obj’
> |......
> | 75 | = (string_obj *)alloc_obj (&str_type, sizeof
> (string_obj) + len + 1);
> | |
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> | | |
> | | (2) calling ‘alloc_obj’ from
> ‘new_string_obj’
> |
> +--> ‘alloc_obj’: events 3-4
> |
> | 59 | base_obj *alloc_obj (type_obj *ob_type, size_t sz)
> | | ^~~~~~~~~
> | | |
> | | (3) entry to ‘alloc_obj’
> | 60 | {
> | 61 | base_obj *obj = (base_obj *)malloc (sz);
> | | ~~~~~~~~~~~
> | | |
> | | (4) Allocation is
> incompatible with ‘base_obj *’; either the allocated size is bogus or
> the type on the left-hand side is wrong
> |
>
> >
> > These testcases exhibit a common way of faking inheritance in C, and
> > I
> > think it ought to be possible to support this in the warning.
> >
> > I thing what's happening is we have
> >
> > struct base
> > {
> > /* fields */
> > };
> >
> > struct sub
> > {
> > struct base m_base;
> > /* extra fields. */
> > };
> >
> > struct base *construct_base (size_t sz)
> > {
> > struct base *p = (struct base *) malloc (sz);
> >
> > /* set up fields of base in p */
> >
> > return p;
> > }
> >
> > Or is this on the interprocedural path as called with a specific
> > sizeof
> > for struct sub?
>
> At (4), it does not know that base_obj is later used as a "base
> struct".
> As it is called with sizeof(struct sub), my checker thinks the buffer
> is
> too large for one but too small for another base_obj.
>
> >
> > Maybe we can special-case these by detecting where struct sub's first
> > field is struct base, and hence where we expect this pattern? (and
> > use
> > this to suppress the warning for such cases?)
>
> I already excluded all structs with structs inside with
> struct_or_union_with_inheritance_p inside sm-malloc.cc. This does not
> help
> in the case size for struct sub is allocated but casted as base. Maybe,
> we
> should do a special case for structs where we only warn when the sizeof
> is
> too small to hold the base struct together with supressing warnings
> when
> the first field is a struct?
That sounds like it could work.
There are several things going on in the above example:
- fake inheritance
- the "trailing array idiom": struct string_obj's final field is:
char str_buf[];
meaning that the string_obj will have the char buffer trailing off
the end, and the allocation is expected to support this.
This is not uncommon in C; it occurs in CPython, see e.g.:
https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h
where CPython's PyBytesObject has the bytes in an ob_sval field
trailing off the end:
typedef struct {
PyObject_VAR_HEAD
Py_DEPRECATED(3.11) Py_hash_t ob_shash;
char ob_sval[1];
/* Invariants:
* ob_sval contains space for 'ob_size+1' elements.
* ob_sval[ob_size] == 0.
* ob_shash is the hash of the byte string or -1 if not computed yet.
*/
} PyBytesObject;
so it would be good for the warning to handle it gracefully, which I
think your proposal above would.
I try to have plenty of idiomatic C code in the analyzer test suite to
try to catch this kind of thing (as well as more "unit test" kinds of
test coverage); we want the warnings to have a good signal:noise ratio.
>
>
> * I'm unable to emit a warning whenever the cast happens at an
> assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c.
> This is because I'm unable to access a region_svalue for the returned
> value. Even in the new_program_state, the svalue of the lhs is still a
> conjured_svalue. Maybe David can lead me to a place where I can access
> the return value's region_svalue or do I have to adapt the engine?
>
> Please can you try reposting the patch? I tried to read it, but am
> having trouble with the mangled indentation.
See my inline answer above. Both, the test case and from where I want
to access the region_svalue are commented with // FIXME.
What does the dump of the state look like? e.g. via calling
(gdb) call m_region_model->debug()
from within gdb
A conjured_svalue represents the result of a call to an external
function (or a side-effect written out to a *out-style param of such a
function), but we have the body of create_buffer, so the call to
create_buffer should be analyzed interprocedurally, and we should have
a region_svalue pointing at a heap_allocated_region.
You might want to simplify things to just the functions of interest,
and then have a look at the output of -fdump-analyzer-exploded-graph in
your favorite .dot viewer (I like xdot; it's in python-xdot in Fedora).
I wonder if my idea from the other email of moving the test from sm-
malloc.cc to region-model.cc might affect this; the state machines run
at a slightly different time to the region model updates.
>
>
> * attr-malloc-6.c and pr96639.c did both contain structs without an
> implementation. Something in the analyzer must have triggered another
> warning about the usage of those without them having an implementation.
> I changed those structs to have an empty implementation, such that the
> additional warning are gone. I think this shouldn't change the test
> case, so is this change okay?
>
> What were the new warnings?
/path/to/attr-malloc-6.c:175:15: error: invalid use of undefined type
‘struct FILE’
175 | FILE *p = malloc (100); // { dg-message "allocated here"
}
| ^~~~~~~~~~~~
All were like the one above. error: invalid use of undefined type
'struct XXX'
That error looks bogus; I'm guessing that something the new diagnostics
is calling is generating it. You can probably track it down by using
(gdb) break-on-diagnostic
in the debugger, and then seeing what the backtrace shows when the
breakpoint fires.
See:
https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html
break-on-diagnostic is one of the things in the support scripts
mentioned on that page.
Hope this is helpful
(BTW, I'm about to disappear for a long weekend; I'm back on Tuesday)
Dave
next prev parent reply other threads:[~2022-06-17 22:13 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-17 15:54 Tim Lange
2022-06-17 17:15 ` Prathamesh Kulkarni
2022-06-17 19:23 ` Tim Lange
2022-06-17 21:39 ` David Malcolm
2022-06-17 17:48 ` David Malcolm
2022-06-17 20:23 ` Tim Lange
2022-06-17 22:13 ` David Malcolm [this message]
2022-06-21 20:00 ` Tim Lange
2022-06-21 23:16 ` David Malcolm
2022-06-22 14:57 ` Tim Lange
2022-06-22 18:23 ` David Malcolm
2022-06-17 18:34 ` [RFC] analyzer: add " Tim Lange
2022-06-29 15:39 ` [PATCH v2] analyzer: add allocation size checker Tim Lange
2022-06-29 17:39 ` David Malcolm
2022-06-30 20:40 ` Tim Lange
2022-06-30 22:11 ` [PATCH v3] analyzer: add allocation size checker [PR105900] Tim Lange
2022-06-30 22:47 ` David Malcolm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=77a08baeea445a0d91a8df2dec32e5d8b766bc0f.camel@redhat.com \
--to=dmalcolm@redhat.com \
--cc=gcc@gcc.gnu.org \
--cc=mail@tim-lange.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).