public inbox for gcc-patches@gcc.gnu.org
From: David Malcolm <dmalcolm@redhat.com>
To: Tim Lange <mail@tim-lange.me>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 2/2] analyzer: out-of-bounds checker [PR106000]
Date: Tue, 09 Aug 2022 18:44:23 -0400
Message-ID: <7a010cff4272202a15bd90b92863f54946f41b9e.camel@redhat.com>
In-Reply-To: <20220809211943.82098-2-mail@tim-lange.me>

On Tue, 2022-08-09 at 23:19 +0200, Tim Lange wrote:
> This patch adds an experimental out-of-bounds checker to the
> analyzer.
> 
> The checker was tested on coreutils, curl, httpd and openssh. It is mostly
> accurate but does produce false-positives on yacc-generated files and
> sometimes when the analyzer misses an invariant. These cases will be
> documented in bugzilla.
> (Regrtests still running with the latest changes, will report back later.)

Hi Tim, thanks for the patch, and for all the testing you've done on
it.

We've already had several rounds of review of this off-list, and this
patch looks very close to ready.

Some nits below...

> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 5021376b6fb..8e73af60ceb 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -158,6 +158,10 @@ Wanalyzer-tainted-size
>  Common Var(warn_analyzer_tainted_size) Init(1) Warning
>  Warn about code paths in which an unsanitized value is used as a size.
>  
> +Wanalyzer-out-of-bounds
> +Common Var(warn_analyzer_out_of_bounds) Init(1) Warning
> +Warn about code paths in which a write or read to a buffer is out-of-bounds.
> +

Please keep the list alphabetized; I think this needs to be between
  Wanalyzer-mismatching-deallocation 
and 
  Wanalyzer-possible-null-argument

>  Wanalyzer-use-after-free
>  Common Var(warn_analyzer_use_after_free) Init(1) Warning
>  Warn about code paths in which a freed value is used.
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index f7df2fca245..2f9382ed96c 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -1268,6 +1268,402 @@ region_model::on_stmt_pre (const gimple *stmt,
>      }
>  }
>  
> +/* Abstract base class for all out-of-bounds warnings.  */
> +
> +class out_of_bounds : public pending_diagnostic_subclass<out_of_bounds>
> +{
> +public:
> +  out_of_bounds (const region *reg, tree diag_arg, byte_range range)
> +  : m_reg (reg), m_diag_arg (diag_arg), m_range (range)
> +  {}
> +
> +  const char *get_kind () const final override
> +  {
> +    return "out_of_bounds_diagnostic";
> +  }
> +
> +  bool operator== (const out_of_bounds &other) const
> +  {
> +    return m_reg == other.m_reg
> +          && m_range == other.m_range
> +          && pending_diagnostic::same_tree_p (m_diag_arg, other.m_diag_arg);
> +  }
> +
> +  int get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_out_of_bounds;
> +  }
> +
> +  void mark_interesting_stuff (interesting_t *interest) final override
> +  {
> +    interest->add_region_creation (m_reg);
> +  }
> +
> +protected:
> +  const region *m_reg;
> +  tree m_diag_arg;
> +  byte_range m_range;

Please add a comment clarifying what m_range means here.  Is it
(a) the range of all bytes that are accessed,
(b) the range of bytes that are accessed out-of-bounds, or
(c) something else?

From my reading of the patch I think it's (b).
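
To illustrate interpretation (b), here is a minimal standalone sketch, not
the patch's code.  It assumes a hypothetical byte_span type standing in for
the analyzer's byte_range, with offsets measured in bytes from the start of
the buffer, and a hypothetical helper bytes_past_end.  It only models
overflow past the end; underflow before the start is not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the analyzer's byte_range: a start offset
// (relative to the start of the buffer) plus a size, both in bytes.
struct byte_span
{
  std::int64_t start;
  std::int64_t size;

  // Offset one past the last byte of the span.
  std::int64_t end () const { return start + size; }

  bool operator== (const byte_span &other) const
  {
    return start == other.start && size == other.size;
  }
};

// Return the part of ACCESS that lies past the end of a buffer of
// CAPACITY bytes (interpretation (b)), or nullopt if no byte does.
std::optional<byte_span>
bytes_past_end (const byte_span &access, std::int64_t capacity)
{
  assert (access.size > 0 && capacity >= 0);
  if (access.end () <= capacity)
    return std::nullopt;
  // The first out-of-bounds byte is either the start of the access or
  // the first byte after the buffer, whichever comes later.
  std::int64_t first_bad = access.start > capacity ? access.start : capacity;
  return byte_span {first_bad, access.end () - first_bad};
}
```

Assuming a 4-byte int, the write to id_sequence[3] in test1 is an access of
bytes 12-15 into a 12-byte buffer, so bytes_past_end (byte_span {12, 4}, 12)
yields the span {12, 4}.  An 8-byte access starting at offset 8 would yield
the same span, since only its last four bytes are out of bounds.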


> +};
> +
> +/* Abstract subclass to complaing about out-of-bounds
> +   past the end of the buffer.  */
> +
> +class past_the_end : public out_of_bounds
> +{
> +public:
> +  past_the_end (const region *reg, tree diag_arg, byte_range range,
> +               tree byte_bound)
> +  : out_of_bounds (reg, diag_arg, range), m_byte_bound (byte_bound)
> +  {}
> +
> +  bool operator== (const past_the_end &other) const
> +  {
> +    return m_reg == other.m_reg
> +          && m_range == other.m_range
> +          && pending_diagnostic::same_tree_p (m_diag_arg, other.m_diag_arg)

Is it possible to call
  out_of_bounds::operator== 
for the first three fields, rather than a copy-and-paste of the logic?

> +          && pending_diagnostic::same_tree_p (m_byte_bound,
> +                                              other.m_byte_bound);
> +  }
> +
> +  label_text
> +  describe_region_creation_event (const evdesc::region_creation &ev) final
> +  override
> +  {
> +    if (m_byte_bound && TREE_CODE (m_byte_bound) == INTEGER_CST)
> +      return ev.formatted_print ("capacity is %E bytes", m_byte_bound);
> +
> +    return label_text ();
> +  }
> +
> +protected:
> +  tree m_byte_bound;
> +};
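
The operator== suggestion above could look roughly like the following
standalone sketch.  The class names with a _sketch suffix and their plain
int/string/long fields are hypothetical simplifications; the real classes
hold a region pointer, trees and a byte_range, and compare trees with
pending_diagnostic::same_tree_p.  Whether this delegation fits with
pending_diagnostic_subclass in the actual patch is an assumption that would
need checking.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the out_of_bounds base class.
class out_of_bounds_sketch
{
public:
  out_of_bounds_sketch (int reg_id, std::string diag_arg, long range_size)
  : m_reg_id (reg_id), m_diag_arg (diag_arg), m_range_size (range_size)
  {}

  bool operator== (const out_of_bounds_sketch &other) const
  {
    return m_reg_id == other.m_reg_id
           && m_range_size == other.m_range_size
           && m_diag_arg == other.m_diag_arg;
  }

protected:
  int m_reg_id;
  std::string m_diag_arg;
  long m_range_size;
};

// Hypothetical, simplified stand-in for the past_the_end subclass.
class past_the_end_sketch : public out_of_bounds_sketch
{
public:
  past_the_end_sketch (int reg_id, std::string diag_arg, long range_size,
                       long byte_bound)
  : out_of_bounds_sketch (reg_id, diag_arg, range_size),
    m_byte_bound (byte_bound)
  {}

  bool operator== (const past_the_end_sketch &other) const
  {
    // Reuse the base-class comparison for the shared fields, then
    // compare only the field this subclass adds.
    return out_of_bounds_sketch::operator== (other)
           && m_byte_bound == other.m_byte_bound;
  }

protected:
  long m_byte_bound;
};

// Usage: objects that differ only in the derived field compare unequal.
inline void demo_past_the_end_equality ()
{
  past_the_end_sketch a (1, "buf", 4, 12);
  past_the_end_sketch b (1, "buf", 4, 12);
  past_the_end_sketch c (1, "buf", 4, 16);
  assert (a == b);
  assert (!(a == c));
}
```

The point of the delegation is that a field added to the base class later
only needs to be compared in one place.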

[...snip the concrete subclasses...]

We went through several rounds of review off-list, and I have lots of
ideas for wording tweaks to the patch, but rather than be a "backseat
driver" (or bikeshed), I think that aspect of the patch is good enough
as-is, and I'll make the wording changes myself once the patch is in
trunk.


[...snip...]

> +
> +    if (warned)
> +      {
> +       char num_bytes_past_buf[WIDE_INT_PRINT_BUFFER_SIZE];
> +       print_dec (m_range.m_size_in_bytes, num_bytes_past_buf, UNSIGNED);

I think we can use %wu for this, but I can fix this up in a followup.


[...snip...]

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index fa23fbeaaaa..5ab834af780 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -459,6 +459,7 @@ Objective-C and Objective-C++ Dialects}.
>  -Wno-analyzer-null-dereference @gol
>  -Wno-analyzer-possible-null-argument @gol
>  -Wno-analyzer-possible-null-dereference @gol
> +-Wno-analyzer-out-of-bounds @gol

Please move between
  -Wno-analyzer-null-dereference @gol
and
  -Wno-analyzer-possible-null-argument @gol 
for alphabetization.

>  -Wno-analyzer-shift-count-negative @gol
>  -Wno-analyzer-shift-count-overflow @gol
>  -Wno-analyzer-stale-setjmp-buffer @gol
> @@ -9991,6 +9992,17 @@ This warning requires @option{-fanalyzer}, which enables it; use
>  This diagnostic warns for paths through the code in which a
>  value known to be NULL is dereferenced.
>  
> +@item -Wno-analyzer-out-of-bounds
> +@opindex Wanalyzer-out-of-bounds
> +@opindex Wno-analyzer-out-of-bounds
> +This warning requires @option{-fanalyzer} to enable it; use
> +@option{-Wno-analyzer-out-of-bounds} to disable it.
> +
> +This diagnostic warns for path through the code in which a buffer is
> +accessed or written out-of-bounds.

It would be good to clarify the limitations.  As I understand it:

"The diagnostic only applies for cases where the analyzer is able to
determine a constant size for the buffer.  It warns when any part of a
read or write is definitely before the start of the buffer, or
definitely after the end."

...or somesuch wording.
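
As a rough illustration of that wording (my reading of it, not the logic in
the patch), here is a standalone sketch.  The oob_verdict enum and the
classify_access function are hypothetical names, and offsets and sizes are
assumed to be concrete byte counts; symbolic values are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class oob_verdict { none, before_start, past_end };

// Hypothetical sketch: classify an access of SIZE bytes at byte OFFSET
// into a buffer whose CAPACITY is only set when a constant size could
// be determined for it.
oob_verdict
classify_access (std::int64_t offset, std::int64_t size,
                 std::optional<std::int64_t> capacity)
{
  assert (size > 0);
  // No constant size for the buffer: the diagnostic does not apply.
  if (!capacity)
    return oob_verdict::none;
  // Some byte of the access lies before the start of the buffer.
  if (offset < 0)
    return oob_verdict::before_start;
  // Some byte of the access lies after the end of the buffer.
  if (offset + size > *capacity)
    return oob_verdict::past_end;
  return oob_verdict::none;
}
```

With a 12-byte buffer, a 4-byte access at offset 12 is classified as
past_end and one at offset -4 as before_start; the same accesses with an
unknown capacity (std::nullopt) produce no warning in this sketch.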
> +
> +See @url{https://cwe.mitre.org/data/definitions/119.html, CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer}.

Also, please move the new entry so that the list stays alphabetized.

> +
>  @item -Wno-analyzer-shift-count-negative
>  @opindex Wanalyzer-shift-count-negative
>  @opindex Wno-analyzer-shift-count-negative

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c
> new file mode 100644
> index 00000000000..715c8b7460f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-1.c
> @@ -0,0 +1,119 @@
> +#include <stdlib.h>
> +#include <string.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +
> +/* Wanalyzer-out-of-bounds tests for buffer overflows.  */
> +
> +/* Avoid folding of memcpy.  */
> +typedef void * (*memcpy_t) (void *dst, const void *src, size_t n);
> +
> +static memcpy_t __attribute__((noinline))
> +get_memcpy (void)
> +{
> +  return memcpy;
> +}
> +
> +
> +/* Taken from CWE-787.  */
> +void test1 (void)
> +{
> +  int id_sequence[3];
> +
> +  id_sequence[0] = 123;
> +  id_sequence[1] = 234;
> +  id_sequence[2] = 345;
> +  id_sequence[3] = 456; /* { dg-line test1 } */
> +
> +  /* { dg-warning "overflow" "warning" { target *-*-* } test1 } */
> +  /* { dg-message "" "note" { target *-*-* } test1 } */

I see that you've left the regexes mostly blank in the various DejaGnu
directives in these new tests.  Normally I'd want these to be less
vague, but given that I plan to change the wordings in a followup
anyway, this is OK.

[...snip lots of great testcases...]

With the above nits fixed, the patch is OK for trunk (assuming that
your testing doesn't show any problems).

Thanks again for the patch; this feels like a major new feature.
Dave




