public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Tim Lange <mail@tim-lange.me>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 2/2] analyzer: strcpy and strncpy semantics
Date: Fri, 02 Sep 2022 11:22:42 -0400	[thread overview]
Message-ID: <e434af61fcc629aa28a10e0878040c83f4f887be.camel@redhat.com> (raw)
In-Reply-To: <20220902140834.11636-2-mail@tim-lange.me>

On Fri, 2022-09-02 at 16:08 +0200, Tim Lange wrote:
> Hi,
> 
> below is my patch for the strcpy and strncpy semantics inside the
> analyzer, enabling the out-of-bounds checker to also complain about
> overflows caused by those two functions.
> 
> As the plan is to reason about the inequality of symbolic values in
> the
> future, I decided to use eval_condition to compare the number of
> bytes and
> the string size for strncpy [0].
> 
> - Tim
> 
> [0] instead of only trying to handle cases where svalues are
> constant;
>     which was how I did it in an earlier draft discussed off-list.
> 
> 
> This patch adds modelling for the semantics of strcpy and strncpy in
> the
> simple case where the analyzer is able to reason about the inequality
> of
> the size argument and the string size.
> 
> Regrtested on Linux x86_64.

Thanks for the patch.

The strcpy part looks great, but strncpy has some tricky behavior that
isn't fully modeled by the patch; see:

https://en.cppreference.com/w/c/string/byte/strncpy

For example: if the src string with null terminator is shorter than
"count", then dest is padded with additional null characters up to
"count".

You could split out the strcpy part of the patch, as that seems to be
ready to go as-is, and do the strncpy part as a followup if you like;
some further notes below...

[...snip...]
> 
> +/* Handle the on_call_pre part of "strncpy" and
> "__builtin_strncpy_chk".  */
>  
> -  /* For now, just mark region's contents as unknown.  */
> -  mark_region_as_unknown (dest_reg, cd.get_uncertainty ());
> +void
> +region_model::impl_call_strncpy (const call_details &cd)
> +{
> +  const svalue *dest_sval = cd.get_arg_svalue (0);
> +  const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree
> (0),
> +                                        cd.get_ctxt ());
> +  const svalue *src_sval = cd.get_arg_svalue (1);
> +  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree
> (1),
> +                                       cd.get_ctxt ());
> +  const svalue *src_contents_sval = get_store_value (src_reg,
> +                                                    cd.get_ctxt ());
> +  const svalue *num_bytes_sval = cd.get_arg_svalue (2);
> +
> +  cd.maybe_set_lhs (dest_sval);
> +
> +  const svalue *string_size_sval = get_string_size (src_reg);
> +  if (string_size_sval->get_kind () == SK_UNKNOWN)
> +    string_size_sval = get_string_size (src_contents_sval);
> +
> +  /* strncpy copies until a zero terminator is reached or n bytes
> were copied.
> +     Determine the lesser of both here.  */
> +  tristate ts = eval_condition (string_size_sval, LT_EXPR,
> num_bytes_sval);
> +  const svalue *copied_bytes_sval;
> +  switch (ts.get_value ())
> +    {
> +      case tristate::TS_TRUE:
> +       copied_bytes_sval = string_size_sval;

This is the
  strlen(src) + 1 < count
case, and thus we want to do a zero-fill of size:
   bin_op(num_bytes_sval, MINUS_EXPR, copied_bytes_sval)
in the relevant subregion of dest_reg.
Or perhaps this could be expressed by first doing a full zero-fill of
the num_bytes_sval-sized region of dest_reg, and then copying the src
contents.

> +       break;
> +      case tristate::TS_FALSE:
> +       copied_bytes_sval = num_bytes_sval;
> +       break;
> +      case tristate::TS_UNKNOWN:
> +       copied_bytes_sval
> +         = m_mgr->get_or_create_unknown_svalue (size_type_node);
> +       break;
> +      default:
> +       gcc_unreachable ();
> +    }
> +
> +  const region *sized_dest_reg = m_mgr->get_sized_region (dest_reg,
> NULL_TREE,
> +                                                        
> copied_bytes_sval);
> +  set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
>  }
> 

[...snip...]
> 

> b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
> new file mode 100644
> index 00000000000..382f0fb5ef4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
> @@ -0,0 +1,122 @@
> +/* { dg-additional-options "-Wno-stringop-overflow -Wno-stringop-
> truncation" } */
> +#include <stdlib.h>
> +#include <string.h>
> +#include <stdint.h>
> +
> +/* Wanalyzer-out-of-bounds tests for str(n)py-related overflows.
> +  
> +   The intra-procedural tests are all catched by Wstringop-overflow.

Nit: "catched" -> "caught".

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
> b/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
> new file mode 100644
> index 00000000000..ea051eb761a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
> @@ -0,0 +1,23 @@
> +#include <string.h>
> +#include "analyzer-decls.h"
> +
> +void test_1 (void)
> +{
> +  char str[] = "Hello";
> +  char buf[6];
> +  char *result = strncpy (buf, str, 6);
> +  __analyzer_describe (1, result); /* { dg-warning
> "region_svalue.*?'buf'" } */
> +  __analyzer_eval (result == buf); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[0] == 'H'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[1] == 'e'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[2] == 'l'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[3] == 'l'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[4] == 'o'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[5] == 0); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[0] == 'H'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[1] == 'e'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[2] == 'l'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[3] == 'l'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[4] == 'o'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[5] == 0); /* { dg-warning "TRUE" } */
> +}

This covers the case where "count" (the 3rd arg of strncpy) is strlen
(src) + 1, but it would be good to have test coverage for the following
additional cases:

* count == strlen(src) (and thus the dst buf doesn't have a nul 
terminator)

* count < strlen(src)  (and thus we have only a prefix of the src
string writtn)

* count == 0  (with my QA hat on; a no-op, I think, but let's make sure
we don't crash)

* count == strlen(src) + 2  (so that there's an extra 0 byte written to
the end of the dst buffer)

* count is much bigger than strlen(src)

for all of the valid cases (where the dst buffer is big enough).  We'll
eventually also want coverage for out-of-bounds versions of these,
where the dst buffer isn't big enough.


Do we properly handle the case where the destination pointer is at some
nonzero constant offset into a buffer? (for both strcpy and strncpy)

Hope this is constructive
Dave


  reply	other threads:[~2022-09-02 15:22 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-02 14:08 [PATCH 1/2] analyzer: return a concrete offset for cast_regions Tim Lange
2022-09-02 14:08 ` [PATCH 2/2] analyzer: strcpy and strncpy semantics Tim Lange
2022-09-02 15:22   ` David Malcolm [this message]
2022-09-04 19:17     ` [PATCH 2/2 v2] analyzer: strcpy semantics Tim Lange
2022-09-04 22:26       ` David Malcolm
2022-09-02 14:36 ` [PATCH 1/2] analyzer: return a concrete offset for cast_regions David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e434af61fcc629aa28a10e0878040c83f4f887be.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=mail@tim-lange.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).