public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Tim Lange <mail@tim-lange.me>,
	Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Cc: GCC Mailing List <gcc@gcc.gnu.org>
Subject: Re: [RFC] analyzer: allocation size warning
Date: Fri, 17 Jun 2022 17:39:44 -0400	[thread overview]
Message-ID: <1b6a6d3e570b4e945db8b7e452cb47f384140272.camel@redhat.com> (raw)
In-Reply-To: <A7ZMDR.7J1GGPCTKYH3@tim-lange.me>

On Fri, 2022-06-17 at 21:23 +0200, Tim Lange wrote:
> 
> 
> On Fr, Jun 17 2022 at 22:45:42 +0530, Prathamesh Kulkarni 
> <prathamesh.kulkarni@linaro.org> wrote:
> > On Fri, 17 Jun 2022 at 21:25, Tim Lange <mail@tim-lange.me> wrote:
> > > 
> > >  Hi everyone,
> > Hi Tim,
> > Thanks for posting the POC patch!
> > Just a couple of comments (inline)
> Hi Prathamesh,
> thanks for looking at it.
> > > 
> > >  tracked in PR105900 [0], I'd like to add support for a new warning
> > > on
> > >  dubious allocation sizes. The new checker emits a warning when the
> > >  allocation size is not a multiple of the type's size. With the 
> > > checker,
> > >  following mistakes are detected:
> > >    int *arr = malloc(3); // forgot to multiply by sizeof
> > >    arr[0] = ...;
> > >    arr[1] = ...;
> > >  or
> > >    int *buf = malloc (n + sizeof(int)); // probably should be * 
> > > instead
> > >  of +
> > >  Because it is implemented inside the analyzer, it also emits 
> > > warnings
> > >  when the buffer is first of type void* and later on casted to 
> > > something
> > >  else. Though, this also inherits a limitation. The checker can not
> > >  distinguish 2 * sizeof(short) from sizeof(int) because sizeof is
> > >  resolved and constants are folded at the point when the analyzer
> > > runs.
> > >  As a mitigation, I plan to implement a check in the frontend that 
> > > emits
> > >  a warning if sizeof(lhs pointee type) is not part of the malloc
> > >  argument.
> > IMHO, warning if sizeof(lhs pointee_type) is not present inside
> > malloc, might not be a good idea because it
> > would reject valid calls to malloc.
> > For eg:
> > (1)
> > size_t size = sizeof(int);
> > int *p = malloc (size);
> > 
> > (2)
> > void *p = malloc (sizeof(int));
> > int *q = p;
> Hm, that's right. Maybe only warn when there is a sizeof(type) in the
> argument and the lhs pointee_type != type (except for void*, maybe 
> char* and "inherited" structs)?

That sounds plausible.


[...snip...]

> > > 
> > Won't the warning be incorrect if 'n' is a multiple of sizeof(int) ?
> > I assume by symbolic buffer size, 'n' is not known at compile time.
> * VLAs are resolved to n * `sizeof(type) when the analyzer runs and
> work 
> fine.

Great - and please make sure the test suite has test coverage for
anything we're talking about!


IIRC, VLAs work using __builtin_alloca, rather than malloc, and the new
diagnostic is currently implemented as an extension of the sm-malloc.cc
code, so I don't think it could fire anyway for a __builtin_alloca. 
Does it fire for:

  int *ptr = alloca (sizeof (short)); // BUG: sizeof(short) != sizeof(int), probably


There are two places in the analyzer that are tracking memory
allocations:
(a) the sm-malloc.cc code, and
(b) in the region_model's m_dynamic_extents hash_map, which tracks a
symbolic value for the size of each reachable dynamically-allocated
region (and inhibits merging of exploded_nodes that have different
dynamic extents).

See
  region_model::set_dynamic_extents
and
  region_model::get_dynamic_extents

These are called by:
  region_model::create_region_for_alloca
and
  region_model::create_region_for_heap_alloc
and:
  region_model::impl_call_realloc

My gut feeling is that the diagnostic would work better implemented in
terms of (b), rather than (a).  If you test in region_model::set_value,
rather than in a state machine, the test could use the dynamic-extent
tracking of region_model (and thus work e.g. with alloca and realloc,
see the example above), and could also be extended to catch assignments
to pointers from statically-allocated regions of the wrong size e.g.:

  char buf[2];
  int *ptr = (int *)buf; // BUG: sizeof(buf) is only 2 bytes

  short s;
  int *ptr = (int *)&s; // BUG: sizeof(short) != sizeof(int), probably


diagnostic_manager::add_events_for_eedge already has some code that
will add a region_creation_event to the checker_path for the case where
a region is determined to be the one of interest to the diagnostic,

See e.g.
poisoned_value_diagnostic::mark_interesting_stuff, which indicates the
region of interest, which is how we get event (1) in the following for
an alloca:

../../src/gcc/testsuite/gcc.dg/analyzer/uninit-alloca.c: In function ‘test_1’:
../../src/gcc/testsuite/gcc.dg/analyzer/uninit-alloca.c:6:10: warning: use of uninitialized 
  value ‘*p’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
    6 |   return *p; /* { dg-warning "use of uninitialized value '\\*p'" } */
      |          ^~
  ‘test_1’: events 1-2
    |
    |    5 |   int *p = __builtin_alloca (sizeof (int));
    |      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |            |
    |      |            (1) region created on stack here
    |    6 |   return *p;
    |      |          ~~ 
    |      |          |
    |      |          (2) use of uninitialized value ‘*p’ here
    |

and event (1) for a statically-sized variable here:

../../src/gcc/testsuite/gcc.dg/analyzer/uninit-1.c: In function ‘test_1’:
../../src/gcc/testsuite/gcc.dg/analyzer/uninit-1.c:7:10: warning: use of uninitialized 
  value ‘i’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
    7 |   return i;
      |          ^
  ‘test_1’: events 1-2
    |
    |    6 |   int i;
    |      |       ^
    |      |       |
    |      |       (1) region created on stack here
    |    7 |   return i;
    |      |          ~
    |      |          |
    |      |          (2) use of uninitialized value ‘i’ here
    |


So I think you could implement this warning inside region-model.cc
instead of sm-malloc.cc, and by implementing the mark_interesting_stuff
vfunc to set the region of interest to the buffer with the wrong size,
you'd get the allocation event "for free", and could handle assignment
to pointers from the address of statically-sized objects with
incompatible size.

That said, I haven't tried implementing this, and there may be some
snag I've forgotten :/


* Flows with if (cond) n = ...; else n = ...; are tracked by the 
analyzer with a widening_svalue and can be handled (While thinking 
about this answer, I noticed my patch is missing this case. Thanks!)

Great!  Please add test coverage :)


* In case of more complicated flows, the analyzer's buffer size 
tracking resorts to unknown_svalue. If any variable in an expression is
unknown, no warning will be emitted.

That's OK, I think.

* Generally, when requesting memory for a variable type, accepting an
arbitrary number doesn't sound right. I do warn, e.g. if 'n' is a 
conjured_svalue (e.g. a from scanf call).

scanf should probably mark its arguments as tainted, and thus 
-Wanalyzer-tainted-allocation-size should complain if you do a malloc
based on scanf-supplied data without sanitization (otherwise it's a
denial-of-service attack waiting to happen, I think).

I've filed this as:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106021



I think only the last case could in theory be a false-positive. I've 
noticed that this is the case when 'n' is guarded by an if making sure 
n is only a multiple of sizeof(type). In theory, I can fix this case 
too as the analysis is path-sensitive.

Sounds like this should be in the test suite also :)

Do you know of some other case where 'n' might be an unknown value 
neither guarded an if condition nor resorted to 'unknown' by a 
complicated flow but still correct?

- Tim
> 
> Thanks,
> Prathamesh
> 

Hope this makes sense and is constructive

Thanks
Dave


  reply	other threads:[~2022-06-17 21:39 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-17 15:54 Tim Lange
2022-06-17 17:15 ` Prathamesh Kulkarni
2022-06-17 19:23   ` Tim Lange
2022-06-17 21:39     ` David Malcolm [this message]
2022-06-17 17:48 ` David Malcolm
2022-06-17 20:23   ` Tim Lange
2022-06-17 22:13     ` David Malcolm
2022-06-21 20:00       ` Tim Lange
2022-06-21 23:16         ` David Malcolm
2022-06-22 14:57           ` Tim Lange
2022-06-22 18:23             ` David Malcolm
2022-06-17 18:34 ` [RFC] analyzer: add " Tim Lange
2022-06-29 15:39 ` [PATCH v2] analyzer: add allocation size checker Tim Lange
2022-06-29 17:39   ` David Malcolm
2022-06-30 20:40     ` Tim Lange
2022-06-30 22:11 ` [PATCH v3] analyzer: add allocation size checker [PR105900] Tim Lange
2022-06-30 22:47   ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1b6a6d3e570b4e945db8b7e452cb47f384140272.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    --cc=mail@tim-lange.me \
    --cc=prathamesh.kulkarni@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).