public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Tim Lange <mail@tim-lange.me>
Cc: gcc@gcc.gnu.org
Subject: Re: [PATCH v2] analyzer: add allocation size checker
Date: Wed, 29 Jun 2022 13:39:52 -0400	[thread overview]
Message-ID: <db6e135ff103eb3305c68f8ca8b34cbf98d0c00a.camel@redhat.com> (raw)
In-Reply-To: <20220629153949.74450-1-mail@tim-lange.me>

On Wed, 2022-06-29 at 17:39 +0200, Tim Lange wrote:

> Hi,

Thanks for the updated patch.

Overall, looks nearly ready; various nits inline below, throughout...

> 
> I've addressed most of the points from the review.
> * The allocation size warning warns whenever region_model::get_capacity returns
> something useful, i.e. also on statically-allocated regions.

Thanks.  Looks like you added test coverage for this in allocation-
size-5.c

> * I added a new virtual function to the pending-diagnostic class, so
that it
> is possible to emit a custom region creation description.
> * The test cases should have a better coverage now.
> * Conservative struct handling
> 
> The warning now looks like this:
> /path/to/main.c:9:8: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>     9 |   int *iptr = ptr;
>       |        ^~~~
>   ‘main’: events 1-2
>     |
>     |    8 |   void *ptr = malloc((long unsigned int)n * sizeof(short));
>     |      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |               |
>     |      |               (1) allocated ‘(long unsigned int)n * 2’ bytes here
>     |    9 |   int *iptr = ptr;
>     |      |        ~~~~    
>     |      |        |
>     |      |        (2) assigned to ‘int *’ here; ‘sizeof(int)’ is ‘4’
>     |

Looks great.

> 
> /path/to/main.c:15:15: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>    15 |   int *ptrw = malloc (sizeof (short));
>       |               ^~~~~~~~~~~~~~~~~~~~~~~
>   ‘main’: events 1-2
>     |
>     |   15 |   int *ptrw = malloc (sizeof (short));
>     |      |               ^~~~~~~~~~~~~~~~~~~~~~~
>     |      |               |
>     |      |               (1) allocated ‘2’ bytes here

Looks a bit weird to be quoting a number here; maybe whenever the
expression is just a constant, print it unquoted?  (though that could
be fiddly to implement, so can be ignored if it turns out to be) .


>     |      |               (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
>     |
> The only thing I couldn't address was moving the second event toward the lhs or
> assign token here. I tracked it down till get_stmt_location where it seems that
> the rhs is actually the location of the statement. Is there any way to get (2)
> to be focused on the lhs?

Annoyingly, we've lost a lot of location information by the time the
analyzer runs.

In theory we could special-case for when we have the def-stmt of the
SSA_NAME that's that default (i.e. initial) value of a VAR_DECL, and if
we see the write is there, we could use the DECL_SOUCE_LOCATION of the
VAR_DECL for the write, so that we'd get:

    |   15 |   int *ptrw = malloc (sizeof (short));
    |      |        ^~~~   ^~~~~~~~~~~~~~~~~~~~~~~
    |      |        |      |
    |      |        |      (1) allocated ‘2’ bytes here
    |      |        (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
    |

which is perhaps slightly more readable.  I'm not sure it's worth it
though.

> 
> Otherwise, the patch compiled coreutils, openssh, curl and httpd without any
> false-positive (but none of them contained a bug found by the checker either).

Great.

> `make check-gcc RUNTESTFLAGS="analyzer.exp"` tests pass and as I just worked on
> the event splitting, the regression tests are yet to run.
> 
> - Tim
> 
> 
> This patch adds an checker that warns about code paths in which a buffer is
> assigned to a incompatible type, i.e. when the allocated buffer size is not a
> multiple of the pointee's size.
> 
> gcc/analyzer/ChangeLog:

You should add a reference to the RFE bug to the top of the ChangeLog entries:
          PR analyzer/105900

Please also add it to the commit message, in the form " [PR105900]";
see the examples section twoards the end of
https://gcc.gnu.org/contribute.html#patches


> 
>         * analyzer.opt: Added Wanalyzer-allocation-size.

[...snip...]

> 
> gcc/ChangeLog:

...and here

> 
>         * doc/invoke.texi: Added Wanalyzer-allocation-size.
> 
> gcc/testsuite/ChangeLog:

...and here

> 
>         * gcc.dg/analyzer/pr96639.c: Changed buffer size to omit warning.
>         * gcc.dg/analyzer/allocation-size-1.c: New test.
>         * gcc.dg/analyzer/allocation-size-2.c: New test.
>         * gcc.dg/analyzer/allocation-size-3.c: New test.
>         * gcc.dg/analyzer/allocation-size-4.c: New test.
>         * gcc.dg/analyzer/allocation-size-5.c: New test.
> 
> Signed-off-by: Tim Lange <mail@tim-lange.me>

[...snip...]

> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 4aea52d3a87..912def2faf2 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -54,6 +54,10 @@ The minimum number of supernodes within a function for the analyzer to consider
>  Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump) Init(200) Param
>  The maximum depth of exploded nodes that should appear in a dot dump before switching to a less verbose format.
>  
> +Wanalyzer-allocation-size
> +Common Var(warn_analyzer_allocation_size) Init(1) Warning
> +Warn about code paths in which a buffer is assigned to a incompatible type.

Reword "buffer" to "pointer to a buffer", I think.

"a incompatible" -> "an incompatible"

[...snip...]

> +/* Concrete subclass for casts of pointers that lead to trailing bytes.  */
> +
> +class dubious_allocation_size
> +: public pending_diagnostic_subclass<dubious_allocation_size>
> +{
> +public:
> +  dubious_allocation_size (const region *lhs, const region *rhs)
> +  : m_lhs (lhs), m_rhs (rhs), m_expr (NULL_TREE)
> +  {}

[...snip...]

> +  bool operator== (const dubious_allocation_size &other) const
> +  {
> +    return m_lhs == other.m_lhs && m_rhs == other.m_rhs;

Probably should also check that:
  same_tree_p (m_expr, other.m_expr);

[...snip...]

> +/* Return true on dubious allocation sizes for constant sizes.  */
> +
> +static bool
> +capacity_compatible_with_type (tree cst, tree pointee_size_tree,
> +			       bool is_struct)
> +{
> +  unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree);
> +  if (pointee_size == 0)
> +    return 0;

"false" rather than 0, given that this is bool.

[...snip...]

> +/* Return true if the lhs and rhs of an assignment have different types.  */
> +
> +static bool
> +is_any_cast_p (const gimple *stmt)
> +{
> +  if (const gassign *assign = dyn_cast<const gassign *>(stmt))
> +    return gimple_assign_cast_p (assign)
> +	  || (gimple_num_ops (assign) == 2
> +	      && !pending_diagnostic::same_tree_p (
> +				    TREE_TYPE (gimple_assign_lhs (assign)),
> +				    TREE_TYPE (gimple_assign_rhs1 (assign))));

The "== 2" subclause in the above condition doesn't look quite right to
me; what statements did you encounter that needed this?

[...snip...]

> @@ -9758,6 +9759,18 @@ By default, the analysis silently stops if the code is too
>  complicated for the analyzer to fully explore and it reaches an internal
>  limit.  The @option{-Wanalyzer-too-complex} option warns if this occurs.
>  
> +@item -Wno-analyzer-allocation-size
> +@opindex Wanalyzer-allocation-size
> +@opindex Wno-analyzer-allocation-size
> +This warning requires @option{-fanalyzer}, which enables it; use
> +@option{-Wno-analyzer-allocation-size}
> +to disable it.
> +
> +This diagnostic warns for paths through the code in which a buffer is casted
> +to a type where the buffer size is not a multiple of the pointee size.

At the risk of bikeshedding, how about:

This diagnostic warns for paths through the code in which a pointer
is assigned to point at a buffer with a size that is not a multiple
of sizeof(*pointer).

See @url{https://cwe.mitre.org/data/definitions/131.html, CWE-131: Incorrect Calculation of Buffer Size}.


[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> new file mode 100644
> index 00000000000..02634ae883b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> @@ -0,0 +1,102 @@
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +/* Tests with constant buffer sizes.  */
> +
> +void test_1 (void)
> +{
> +  short *ptr = malloc (21 * sizeof (short));
> +  free (ptr);
> +}
> +
> +void test_2 (void)
> +{
> +  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
> +  free (ptr);
> +
> +  /* { dg-warning "" "" { target *-*-* } malloc2 } */
> +  /* { dg-message "" "" { target *-*-* } malloc2 } */

The various dg-warning and dg-message directives here (and throughout
the rest of the patch) shouldn't have just "" "" for their first two
args.

The first arg should be a regexp that matches some (nonempty) subset of
the expected text.  There's a balance to be struck between:
(a) terseness to avoid "gold plating" the test output (where making any
change to the wording would involve lots of tedious updates to test
directives)
versus
(b) giving us test coverage that the message is sane, so that if we
accidentally break the wording due to future changes to the analyzer,
then at least one test starts failing

Probably best for most of these regexps to be terse, but an empty
regexp is too terse.

The 2nd arg helps us disambiguate with directive we're talking about,
so can be "warning" and "note" for the two above.

[...snip...]

> +void test_5 (void)
> +{
> +  int user_input;
> +  scanf("%i", &user_input);
> +  int n;
> +  if (user_input == 0)
> +    n = 21 * sizeof (short);
> +  else
> +    n = 42 * sizeof (short);

I see you've used scanf, presumably to get a symbolic value for the
variable.  If so, a simpler way to do this is to simply use a parameter
to the test function.  But there's no need to change these test cases.

Perhaps scanf should taint its arguments, which is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106021 but obviously that
would be a different patch.

[...snip...]

> +void test_9(void) 
> +{
> +  // FIXME

Please make this comment more descriptive about what the issue here is.

> +  int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-*
} } */
> +  free (buf);
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
> new file mode 100644
> index 00000000000..cb35a9d717b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c

[...snip...]

> +void test_8(void) 
> +{
> +  int n;
> +  scanf("%i", &n);
> +  // FIXME

Make the comment more descriptive please.

> +  int *buf = create_buffer(n * sizeof(short)); /* { dg-warning "" ""
{ xfail *-*-* } } */
> +  free (buf);
> +}
> +
> +void test_9 (void)
> +{
> +  int n;
> +  scanf("%i", &n);
> +  /* n is a conjured_svalue.  */
> +  void *ptr = malloc (n); /* { dg-message } */
> +  int *iptr = (int *)ptr; /* { dg-line assign9 } */
> +  free (iptr);
> +
> +  /* { dg-warning "" "" { target *-*-* } assign9 } */
> +  /* { dg-message "" "" { target *-*-* } assign9 } */
> +}
> +
> +void test_11 (void)
> +{
> +  int n;
> +  scanf("%i", &n);
> +  void *ptr = malloc (n);
> +  if (n == 4)

Presumably this should be a test against sizeof (int), rather than 4?

Please add a testcase where the comparison is against the wrong
constant.

> +    {
> +      /* n is a conjured_svalue but guarded.  */
> +      int *iptr = (int *)ptr;
> +      free (iptr);
> +    }
> +  else
> +    free (ptr);
> +}

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
> new file mode 100644
> index 00000000000..afb1782e0cd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
> @@ -0,0 +1,40 @@
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +/* Tests related to structs.  */

Looks like this was copied-and-pasted, and should be updated to "Tests
of statically-sized buffers" or somesuch.

> +
> +typedef struct a {
> +  short s;
> +} a;
> +
> +int *test_1 (void)
> +{
> +  a A;
> +  A.s = 1;
> +  int *ptr = (int *) &A; /* { dg-line assign1 } */
> +  return ptr;
> +
> +  /* { dg-warning "" "" { target *-*-* } assign1 } */
> +  /* { dg-message "" "" { target *-*-* } assign1 } */
> +}
> +
> +int *test2 (void)
> +{
> +  char arr[4];

I think this needs to be sizeof(int), rather than 4.

> +  int *ptr = (int *)arr;
> +  return ptr;
> +}
> +
> +int *test3 (void)
> +{
> +  char arr[2];
> +  int *ptr = (int *)arr; /* { dg-line assign3 } */
> +  return ptr;
> +
> +  /* { dg-warning "" "" { target *-*-* } assign3 } */
> +  /* { dg-message "" "" { target *-*-* } assign3 } */
> +}
> +
> +int main() {
> +  return 0;
> +}

[...snip...]

Thanks again for the v2 patch; hope the above makes sense
Dave


  reply	other threads:[~2022-06-29 17:39 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-17 15:54 [RFC] analyzer: allocation size warning Tim Lange
2022-06-17 17:15 ` Prathamesh Kulkarni
2022-06-17 19:23   ` Tim Lange
2022-06-17 21:39     ` David Malcolm
2022-06-17 17:48 ` David Malcolm
2022-06-17 20:23   ` Tim Lange
2022-06-17 22:13     ` David Malcolm
2022-06-21 20:00       ` Tim Lange
2022-06-21 23:16         ` David Malcolm
2022-06-22 14:57           ` Tim Lange
2022-06-22 18:23             ` David Malcolm
2022-06-17 18:34 ` [RFC] analyzer: add " Tim Lange
2022-06-29 15:39 ` [PATCH v2] analyzer: add allocation size checker Tim Lange
2022-06-29 17:39   ` David Malcolm [this message]
2022-06-30 20:40     ` Tim Lange
2022-06-30 22:11 ` [PATCH v3] analyzer: add allocation size checker [PR105900] Tim Lange
2022-06-30 22:47   ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=db6e135ff103eb3305c68f8ca8b34cbf98d0c00a.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    --cc=mail@tim-lange.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).