public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 1/2] analyzer: return a concrete offset for cast_regions
@ 2022-09-02 14:08 Tim Lange
  2022-09-02 14:08 ` [PATCH 2/2] analyzer: strcpy and strncpy semantics Tim Lange
  2022-09-02 14:36 ` [PATCH 1/2] analyzer: return a concrete offset for cast_regions David Malcolm
  0 siblings, 2 replies; 6+ messages in thread
From: Tim Lange @ 2022-09-02 14:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: dmalcolm, Tim Lange

This patch fixes a bug where maybe_fold_sub_svalue did not fold the
access of a single char from a string to a char when the offset was zero
because get_relative_concrete_offset did return false for cast_regions.

Regrtested on Linux x86_64.

2022-09-02  Tim Lange  <mail@tim-lange.me>

gcc/analyzer/ChangeLog:

	* region.cc (cast_region::get_relative_concrete_offset):
	New overloaded method.
	* region.h: Add cast_region::get_relative_concrete_offset.

gcc/testsuite/ChangeLog:

	* gcc.dg/analyzer/fold-string-to-char.c: New test.

---
 gcc/analyzer/region.cc                              | 10 ++++++++++
 gcc/analyzer/region.h                               |  2 ++
 gcc/testsuite/gcc.dg/analyzer/fold-string-to-char.c |  8 ++++++++
 3 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fold-string-to-char.c

diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index f4aba6b9c88..9c8279b130d 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -1556,6 +1556,16 @@ cast_region::dump_to_pp (pretty_printer *pp, bool simple) const
     }
 }
 
+/* Implementation of region::get_relative_concrete_offset vfunc
+   for cast_region.  */
+
+bool
+cast_region::get_relative_concrete_offset (bit_offset_t *out) const
+{
+  *out = (int) 0;
+  return true;
+}
+
 /* class heap_allocated_region : public region.  */
 
 /* Implementation of region::dump_to_pp vfunc for heap_allocated_region.  */
diff --git a/gcc/analyzer/region.h b/gcc/analyzer/region.h
index d37584b7285..34ce1fa1714 100644
--- a/gcc/analyzer/region.h
+++ b/gcc/analyzer/region.h
@@ -1087,6 +1087,8 @@ public:
   void accept (visitor *v) const final override;
   void dump_to_pp (pretty_printer *pp, bool simple) const final override;
 
+  bool get_relative_concrete_offset (bit_offset_t *out) const final override;
+
   const region *get_original_region () const { return m_original_region; }
 
 private:
diff --git a/gcc/testsuite/gcc.dg/analyzer/fold-string-to-char.c b/gcc/testsuite/gcc.dg/analyzer/fold-string-to-char.c
new file mode 100644
index 00000000000..46139216bba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/fold-string-to-char.c
@@ -0,0 +1,8 @@
+#include "analyzer-decls.h"
+
+void test_1 (void)
+{
+  char str[] = "Hello";
+  char *ptr = str;
+  __analyzer_eval (ptr[0] == 'H'); /* { dg-warning "TRUE" } */
+}
-- 
2.37.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2/2] analyzer: strcpy and strncpy semantics
  2022-09-02 14:08 [PATCH 1/2] analyzer: return a concrete offset for cast_regions Tim Lange
@ 2022-09-02 14:08 ` Tim Lange
  2022-09-02 15:22   ` David Malcolm
  2022-09-02 14:36 ` [PATCH 1/2] analyzer: return a concrete offset for cast_regions David Malcolm
  1 sibling, 1 reply; 6+ messages in thread
From: Tim Lange @ 2022-09-02 14:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: dmalcolm, Tim Lange

Hi,

below is my patch for the strcpy and strncpy semantics inside the
analyzer, enabling the out-of-bounds checker to also complain about
overflows caused by those two functions.

As the plan is to reason about the inequality of symbolic values in the
future, I decided to use eval_condition to compare the number of bytes and
the string size for strncpy [0].

- Tim

[0] instead of only trying to handle cases where svalues are constant;
    which was how I did it in an earlier draft discussed off-list.


This patch adds modelling for the semantics of strcpy and strncpy in the
simple case where the analyzer is able to reason about the inequality of
the size argument and the string size.

Regrtested on Linux x86_64.

2022-09-02  Tim Lange  <mail@tim-lange.me>

gcc/analyzer/ChangeLog:

	* region-model-impl-calls.cc (region_model::impl_call_strncpy):
	New function.
	* region-model.cc (region_model::on_call_pre):
	Add call to impl_call_strncpy.
	(region_model::get_string_size): New function.
	* region-model.h (class region_model):
	Add impl_call_strncpy and get_string_size.

gcc/testsuite/ChangeLog:

	* gcc.dg/analyzer/out-of-bounds-4.c: New test.
	* gcc.dg/analyzer/strcpy-3.c: New test.
	* gcc.dg/analyzer/strncpy-1.c: New test.

---
 gcc/analyzer/region-model-impl-calls.cc       |  62 ++++++++-
 gcc/analyzer/region-model.cc                  |  33 +++++
 gcc/analyzer/region-model.h                   |   4 +
 .../gcc.dg/analyzer/out-of-bounds-4.c         | 122 ++++++++++++++++++
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c      |  23 ++++
 gcc/testsuite/gcc.dg/analyzer/strncpy-1.c     |  23 ++++
 6 files changed, 264 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strncpy-1.c

diff --git a/gcc/analyzer/region-model-impl-calls.cc b/gcc/analyzer/region-model-impl-calls.cc
index 8eebd122d42..9f1ae020f4f 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -1019,13 +1019,69 @@ region_model::impl_call_strcpy (const call_details &cd)
   const svalue *dest_sval = cd.get_arg_svalue (0);
   const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree (0),
 					 cd.get_ctxt ());
+  const svalue *src_sval = cd.get_arg_svalue (1);
+  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree (1),
+					cd.get_ctxt ());
+  const svalue *src_contents_sval = get_store_value (src_reg,
+						     cd.get_ctxt ());
 
   cd.maybe_set_lhs (dest_sval);
 
-  check_region_for_write (dest_reg, cd.get_ctxt ());
+  /* Try to get the string size if SRC_REG is a string_region.  */
+  const svalue *copied_bytes_sval = get_string_size (src_reg);
+  /* Otherwise, check if the contents of SRC_REG is a string.  */
+  if (copied_bytes_sval->get_kind () == SK_UNKNOWN)
+    copied_bytes_sval = get_string_size (src_contents_sval);
+
+  const region *sized_dest_reg
+    = m_mgr->get_sized_region (dest_reg, NULL_TREE, copied_bytes_sval);
+  set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
+}
+
+/* Handle the on_call_pre part of "strncpy" and "__builtin_strncpy_chk".  */
 
-  /* For now, just mark region's contents as unknown.  */
-  mark_region_as_unknown (dest_reg, cd.get_uncertainty ());
+void
+region_model::impl_call_strncpy (const call_details &cd)
+{
+  const svalue *dest_sval = cd.get_arg_svalue (0);
+  const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree (0),
+					 cd.get_ctxt ());
+  const svalue *src_sval = cd.get_arg_svalue (1);
+  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree (1),
+					cd.get_ctxt ());
+  const svalue *src_contents_sval = get_store_value (src_reg,
+						     cd.get_ctxt ());
+  const svalue *num_bytes_sval = cd.get_arg_svalue (2);
+
+  cd.maybe_set_lhs (dest_sval);
+
+  const svalue *string_size_sval = get_string_size (src_reg);
+  if (string_size_sval->get_kind () == SK_UNKNOWN)
+    string_size_sval = get_string_size (src_contents_sval);
+
+  /* strncpy copies until a zero terminator is reached or n bytes were copied.
+     Determine the lesser of both here.  */
+  tristate ts = eval_condition (string_size_sval, LT_EXPR, num_bytes_sval);
+  const svalue *copied_bytes_sval;
+  switch (ts.get_value ())
+    {
+      case tristate::TS_TRUE:
+	copied_bytes_sval = string_size_sval;
+	break;
+      case tristate::TS_FALSE:
+	copied_bytes_sval = num_bytes_sval;
+	break;
+      case tristate::TS_UNKNOWN:
+	copied_bytes_sval
+	  = m_mgr->get_or_create_unknown_svalue (size_type_node);
+	break;
+      default:
+	gcc_unreachable ();
+    }
+
+  const region *sized_dest_reg = m_mgr->get_sized_region (dest_reg, NULL_TREE,
+							  copied_bytes_sval);
+  set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
 }
 
 /* Handle the on_call_pre part of "strlen".  */
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index ec29be259b5..c87c79e3f1b 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1858,6 +1858,10 @@ region_model::on_call_pre (const gcall *call, region_model_context *ctxt,
 	  case BUILT_IN_STRCPY_CHK:
 	    impl_call_strcpy (cd);
 	    return false;
+	  case BUILT_IN_STRNCPY:
+	  case BUILT_IN_STRNCPY_CHK:
+	    impl_call_strncpy (cd);
+	    return false;
 	  case BUILT_IN_STRLEN:
 	    impl_call_strlen (cd);
 	    return false;
@@ -3218,6 +3222,35 @@ region_model::get_capacity (const region *reg) const
   return m_mgr->get_or_create_unknown_svalue (sizetype);
 }
 
+/* Return the string size, including the 0-terminator, if SVAL is a
+   constant_svalue holding a string.  Otherwise, return an unknown_svalue.  */
+
+const svalue *
+region_model::get_string_size (const svalue *sval) const
+{
+  tree cst = sval->maybe_get_constant ();
+  if (!cst || TREE_CODE (cst) != STRING_CST)
+    return m_mgr->get_or_create_unknown_svalue (size_type_node);
+
+  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
+  return m_mgr->get_or_create_constant_svalue (out);
+}
+
+/* Return the string size, including the 0-terminator, if REG is a
+   string_region.  Otherwise, return an unknown_svalue.  */
+
+const svalue *
+region_model::get_string_size (const region *reg) const
+{
+  const string_region *str_reg = dyn_cast <const string_region *> (reg);
+  if (!str_reg)
+    return m_mgr->get_or_create_unknown_svalue (size_type_node);
+
+  tree cst = str_reg->get_string_cst ();
+  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
+  return m_mgr->get_or_create_constant_svalue (out);
+}
+
 /* If CTXT is non-NULL, use it to warn about any problems accessing REG,
    using DIR to determine if this access is a read or write.  */
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 7ce832f6ce4..a6a92d46d91 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -634,6 +634,7 @@ class region_model
   void impl_call_realloc (const call_details &cd);
   void impl_call_strchr (const call_details &cd);
   void impl_call_strcpy (const call_details &cd);
+  void impl_call_strncpy (const call_details &cd);
   void impl_call_strlen (const call_details &cd);
   void impl_call_operator_new (const call_details &cd);
   void impl_call_operator_delete (const call_details &cd);
@@ -793,6 +794,9 @@ class region_model
 
   const svalue *get_capacity (const region *reg) const;
 
+  const svalue *get_string_size (const svalue *sval) const;
+  const svalue *get_string_size (const region *reg) const;
+
   /* Implemented in sm-malloc.cc  */
   void on_realloc_with_move (const call_details &cd,
 			     const svalue *old_ptr_sval,
diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
new file mode 100644
index 00000000000..382f0fb5ef4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
@@ -0,0 +1,122 @@
+/* { dg-additional-options "-Wno-stringop-overflow -Wno-stringop-truncation" } */
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+
+/* Wanalyzer-out-of-bounds tests for str(n)py-related overflows.
+  
+   The intra-procedural tests are all catched by Wstringop-overflow.
+   The inter-procedural out-of-bounds are only found by the analyzer.  */
+
+void test1 (void)
+{
+  char dst[5];
+  strcpy (dst, "Hello"); /* { dg-line test1 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test1 } */
+  /* { dg-message "dst" "note" { target *-*-* } test1 } */
+}
+
+void test2 (void)
+{
+  char dst[6];
+  strcpy (dst, "Hello");
+}
+
+void test3 (void)
+{
+  char dst[5];
+  strncpy (dst, "Hello", 6); /* { dg-line test3 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test3 } */
+  /* { dg-message "dst" "note" { target *-*-* } test3 } */
+}
+
+void test4 (void)
+{
+  char dst[5];
+  strncpy (dst, "Hello", 5);
+}
+
+void test5 (void)
+{
+  char *src = "Hello";
+  char dst[5];
+  strcpy (dst, src); /* { dg-line test5 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test5 } */
+  /* { dg-message "dst" "note" { target *-*-* } test5 } */
+}
+
+void test6 (void)
+{
+  char *src = "Hello";
+  char dst[6];
+  strcpy (dst, src);
+}
+
+void test7 (void)
+{
+  int n = 6;
+  char *src = "Hello";
+  char dst[5];
+  strncpy (dst, src, n); /* { dg-line test7 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test7 } */
+  /* { dg-message "dst" "note" { target *-*-* } test7 } */
+}
+
+void test8 (void)
+{
+  int n = 5;
+  char *src = "Hello";
+  char dst[n];
+  strncpy (dst, src, n);
+}
+
+char *return_hello (void)
+{
+  return "hello";
+}
+
+void test9 (void)
+{
+  char *str = return_hello ();
+  if (!str)
+    return;
+  char dst[5];
+  strcpy (dst, str); /* { dg-line test9 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test9 } */
+  /* { dg-message "dst" "note" { target *-*-* } test9 } */
+}
+
+void test10 (void)
+{
+  char *str = return_hello ();
+  if (!str)
+    return;
+  char dst[6];
+  strcpy (dst, str);
+}
+
+void test11 (void)
+{
+  char *str = return_hello ();
+  if (!str)
+    return;
+    char dst[5];
+    strncpy (dst, str, 6); /* { dg-line test11 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test11 } */
+  /* { dg-message "dst" "note" { target *-*-* } test11 } */
+}
+
+void test12 (void)
+{
+  char *str = return_hello ();
+  if (!str)
+    return;
+  char dst[6];
+  strcpy (dst, str);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
new file mode 100644
index 00000000000..a38f9a7641f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
@@ -0,0 +1,23 @@
+#include <string.h>
+#include "analyzer-decls.h"
+
+void test_1 (void)
+{
+  char str[] = "Hello";
+  char buf[6];
+  char *result = strcpy (buf, str);
+  __analyzer_describe (1, result); /* { dg-warning "region_svalue.*?'buf'" } */
+  __analyzer_eval (result == buf); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[0] == 'H'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[1] == 'e'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[2] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[3] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[4] == 'o'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[5] == 0); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[0] == 'H'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[1] == 'e'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[2] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[3] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[4] == 'o'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[5] == 0); /* { dg-warning "TRUE" } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c b/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
new file mode 100644
index 00000000000..ea051eb761a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
@@ -0,0 +1,23 @@
+#include <string.h>
+#include "analyzer-decls.h"
+
+void test_1 (void)
+{
+  char str[] = "Hello";
+  char buf[6];
+  char *result = strncpy (buf, str, 6);
+  __analyzer_describe (1, result); /* { dg-warning "region_svalue.*?'buf'" } */
+  __analyzer_eval (result == buf); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[0] == 'H'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[1] == 'e'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[2] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[3] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[4] == 'o'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[5] == 0); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[0] == 'H'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[1] == 'e'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[2] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[3] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[4] == 'o'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[5] == 0); /* { dg-warning "TRUE" } */
+}
-- 
2.37.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] analyzer: return a concrete offset for cast_regions
  2022-09-02 14:08 [PATCH 1/2] analyzer: return a concrete offset for cast_regions Tim Lange
  2022-09-02 14:08 ` [PATCH 2/2] analyzer: strcpy and strncpy semantics Tim Lange
@ 2022-09-02 14:36 ` David Malcolm
  1 sibling, 0 replies; 6+ messages in thread
From: David Malcolm @ 2022-09-02 14:36 UTC (permalink / raw)
  To: Tim Lange, gcc-patches

On Fri, 2022-09-02 at 16:08 +0200, Tim Lange wrote:
> This patch fixes a bug where maybe_fold_sub_svalue did not fold the
> access of a single char from a string to a char when the offset was
> zero
> because get_relative_concrete_offset did return false for
> cast_regions.
> 
> Regrtested on Linux x86_64.

Thanks; this patch is OK for trunk.

Dave


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] analyzer: strcpy and strncpy semantics
  2022-09-02 14:08 ` [PATCH 2/2] analyzer: strcpy and strncpy semantics Tim Lange
@ 2022-09-02 15:22   ` David Malcolm
  2022-09-04 19:17     ` [PATCH 2/2 v2] analyzer: strcpy semantics Tim Lange
  0 siblings, 1 reply; 6+ messages in thread
From: David Malcolm @ 2022-09-02 15:22 UTC (permalink / raw)
  To: Tim Lange, gcc-patches

On Fri, 2022-09-02 at 16:08 +0200, Tim Lange wrote:
> Hi,
> 
> below is my patch for the strcpy and strncpy semantics inside the
> analyzer, enabling the out-of-bounds checker to also complain about
> overflows caused by those two functions.
> 
> As the plan is to reason about the inequality of symbolic values in
> the
> future, I decided to use eval_condition to compare the number of
> bytes and
> the string size for strncpy [0].
> 
> - Tim
> 
> [0] instead of only trying to handle cases where svalues are
> constant;
>     which was how I did it in an earlier draft discussed off-list.
> 
> 
> This patch adds modelling for the semantics of strcpy and strncpy in
> the
> simple case where the analyzer is able to reason about the inequality
> of
> the size argument and the string size.
> 
> Regrtested on Linux x86_64.

Thanks for the patch.

The strcpy part looks great, but strncpy has some tricky behavior that
isn't fully modeled by the patch; see:

https://en.cppreference.com/w/c/string/byte/strncpy

For example: if the src string with null terminator is shorter than
"count", then dest is padded with additional null characters up to
"count".

You could split out the strcpy part of the patch, as that seems to be
ready to go as-is, and do the strncpy part as a followup if you like;
some further notes below...

[...snip...]
> 
> +/* Handle the on_call_pre part of "strncpy" and
> "__builtin_strncpy_chk".  */
>  
> -  /* For now, just mark region's contents as unknown.  */
> -  mark_region_as_unknown (dest_reg, cd.get_uncertainty ());
> +void
> +region_model::impl_call_strncpy (const call_details &cd)
> +{
> +  const svalue *dest_sval = cd.get_arg_svalue (0);
> +  const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree
> (0),
> +                                        cd.get_ctxt ());
> +  const svalue *src_sval = cd.get_arg_svalue (1);
> +  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree
> (1),
> +                                       cd.get_ctxt ());
> +  const svalue *src_contents_sval = get_store_value (src_reg,
> +                                                    cd.get_ctxt ());
> +  const svalue *num_bytes_sval = cd.get_arg_svalue (2);
> +
> +  cd.maybe_set_lhs (dest_sval);
> +
> +  const svalue *string_size_sval = get_string_size (src_reg);
> +  if (string_size_sval->get_kind () == SK_UNKNOWN)
> +    string_size_sval = get_string_size (src_contents_sval);
> +
> +  /* strncpy copies until a zero terminator is reached or n bytes
> were copied.
> +     Determine the lesser of both here.  */
> +  tristate ts = eval_condition (string_size_sval, LT_EXPR,
> num_bytes_sval);
> +  const svalue *copied_bytes_sval;
> +  switch (ts.get_value ())
> +    {
> +      case tristate::TS_TRUE:
> +       copied_bytes_sval = string_size_sval;

This is the
  strlen(src) + 1 < count
case, and thus we want to do a zero-fill of size:
   bin_op(num_bytes_sval, MINUS_EXPR, copied_bytes_sval)
in the relevant subregion of dest_reg.
Or perhaps this could be expressed by first doing a full zero-fill of
the num_bytes_sval-sized region of dest_reg, and then copying the src
contents.

> +       break;
> +      case tristate::TS_FALSE:
> +       copied_bytes_sval = num_bytes_sval;
> +       break;
> +      case tristate::TS_UNKNOWN:
> +       copied_bytes_sval
> +         = m_mgr->get_or_create_unknown_svalue (size_type_node);
> +       break;
> +      default:
> +       gcc_unreachable ();
> +    }
> +
> +  const region *sized_dest_reg = m_mgr->get_sized_region (dest_reg,
> NULL_TREE,
> +                                                        
> copied_bytes_sval);
> +  set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
>  }
> 

[...snip...]
> 

> b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
> new file mode 100644
> index 00000000000..382f0fb5ef4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
> @@ -0,0 +1,122 @@
> +/* { dg-additional-options "-Wno-stringop-overflow -Wno-stringop-
> truncation" } */
> +#include <stdlib.h>
> +#include <string.h>
> +#include <stdint.h>
> +
> +/* Wanalyzer-out-of-bounds tests for str(n)py-related overflows.
> +  
> +   The intra-procedural tests are all catched by Wstringop-overflow.

Nit: "catched" -> "caught".

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
> b/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
> new file mode 100644
> index 00000000000..ea051eb761a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/strncpy-1.c
> @@ -0,0 +1,23 @@
> +#include <string.h>
> +#include "analyzer-decls.h"
> +
> +void test_1 (void)
> +{
> +  char str[] = "Hello";
> +  char buf[6];
> +  char *result = strncpy (buf, str, 6);
> +  __analyzer_describe (1, result); /* { dg-warning
> "region_svalue.*?'buf'" } */
> +  __analyzer_eval (result == buf); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[0] == 'H'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[1] == 'e'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[2] == 'l'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[3] == 'l'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[4] == 'o'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (buf[5] == 0); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[0] == 'H'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[1] == 'e'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[2] == 'l'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[3] == 'l'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[4] == 'o'); /* { dg-warning "TRUE" } */
> +  __analyzer_eval (result[5] == 0); /* { dg-warning "TRUE" } */
> +}

This covers the case where "count" (the 3rd arg of strncpy) is strlen
(src) + 1, but it would be good to have test coverage for the following
additional cases:

* count == strlen(src) (and thus the dst buf doesn't have a nul 
terminator)

* count < strlen(src)  (and thus we have only a prefix of the src
string writtn)

* count == 0  (with my QA hat on; a no-op, I think, but let's make sure
we don't crash)

* count == strlen(src) + 2  (so that there's an extra 0 byte written to
the end of the dst buffer)

* count is much bigger than strlen(src)

for all of the valid cases (where the dst buffer is big enough).  We'll
eventually also want coverage for out-of-bounds versions of these,
where the dst buffer isn't big enough.


Do we properly handle the case where the destination pointer is at some
nonzero constant offset into a buffer? (for both strcpy and strncpy)

Hope this is constructive
Dave


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 2/2 v2] analyzer: strcpy semantics
  2022-09-02 15:22   ` David Malcolm
@ 2022-09-04 19:17     ` Tim Lange
  2022-09-04 22:26       ` David Malcolm
  0 siblings, 1 reply; 6+ messages in thread
From: Tim Lange @ 2022-09-04 19:17 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc-patches, Tim Lange

Hi Dave,

sorry about the strncpy thing, I should've been more careful. Below is the
patch with just the strcpy part.

- Tim

This patch adds modelling for the semantics of strcpy in the simple case
where the analyzer is able to infer a concrete string size.

Regrtested on Linux x86_64.

2022-09-04  Tim Lange  <mail@tim-lange.me>

gcc/analyzer/ChangeLog:

	* region-model-impl-calls.cc (region_model::impl_call_strcpy):
	Handle the constant string case.
	* region-model.cc (region_model::get_string_size):
	New function to get the string size from a region or svalue.
	* region-model.h (class region_model): Add get_string_size.

gcc/testsuite/ChangeLog:

	* gcc.dg/analyzer/out-of-bounds-4.c: New test.
	* gcc.dg/analyzer/strcpy-3.c: New test.

---
 gcc/analyzer/region-model-impl-calls.cc       | 16 ++++-
 gcc/analyzer/region-model.cc                  | 29 +++++++++
 gcc/analyzer/region-model.h                   |  3 +
 .../gcc.dg/analyzer/out-of-bounds-4.c         | 65 +++++++++++++++++++
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c      | 23 +++++++
 5 files changed, 133 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c

diff --git a/gcc/analyzer/region-model-impl-calls.cc b/gcc/analyzer/region-model-impl-calls.cc
index 8eebd122d42..3790eaf2d79 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -1019,13 +1019,23 @@ region_model::impl_call_strcpy (const call_details &cd)
   const svalue *dest_sval = cd.get_arg_svalue (0);
   const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree (0),
 					 cd.get_ctxt ());
+  const svalue *src_sval = cd.get_arg_svalue (1);
+  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree (1),
+					cd.get_ctxt ());
+  const svalue *src_contents_sval = get_store_value (src_reg,
+						     cd.get_ctxt ());
 
   cd.maybe_set_lhs (dest_sval);
 
-  check_region_for_write (dest_reg, cd.get_ctxt ());
+  /* Try to get the string size if SRC_REG is a string_region.  */
+  const svalue *copied_bytes_sval = get_string_size (src_reg);
+  /* Otherwise, check if the contents of SRC_REG is a string.  */
+  if (copied_bytes_sval->get_kind () == SK_UNKNOWN)
+    copied_bytes_sval = get_string_size (src_contents_sval);
 
-  /* For now, just mark region's contents as unknown.  */
-  mark_region_as_unknown (dest_reg, cd.get_uncertainty ());
+  const region *sized_dest_reg
+    = m_mgr->get_sized_region (dest_reg, NULL_TREE, copied_bytes_sval);
+  set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
 }
 
 /* Handle the on_call_pre part of "strlen".  */
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index ec29be259b5..4ec18c86774 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3218,6 +3218,35 @@ region_model::get_capacity (const region *reg) const
   return m_mgr->get_or_create_unknown_svalue (sizetype);
 }
 
+/* Return the string size, including the 0-terminator, if SVAL is a
+   constant_svalue holding a string.  Otherwise, return an unknown_svalue.  */
+
+const svalue *
+region_model::get_string_size (const svalue *sval) const
+{
+  tree cst = sval->maybe_get_constant ();
+  if (!cst || TREE_CODE (cst) != STRING_CST)
+    return m_mgr->get_or_create_unknown_svalue (size_type_node);
+
+  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
+  return m_mgr->get_or_create_constant_svalue (out);
+}
+
+/* Return the string size, including the 0-terminator, if REG is a
+   string_region.  Otherwise, return an unknown_svalue.  */
+
+const svalue *
+region_model::get_string_size (const region *reg) const
+{
+  const string_region *str_reg = dyn_cast <const string_region *> (reg);
+  if (!str_reg)
+    return m_mgr->get_or_create_unknown_svalue (size_type_node);
+
+  tree cst = str_reg->get_string_cst ();
+  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
+  return m_mgr->get_or_create_constant_svalue (out);
+}
+
 /* If CTXT is non-NULL, use it to warn about any problems accessing REG,
    using DIR to determine if this access is a read or write.  */
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 7ce832f6ce4..a1f2165e145 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -793,6 +793,9 @@ class region_model
 
   const svalue *get_capacity (const region *reg) const;
 
+  const svalue *get_string_size (const svalue *sval) const;
+  const svalue *get_string_size (const region *reg) const;
+
   /* Implemented in sm-malloc.cc  */
   void on_realloc_with_move (const call_details &cd,
 			     const svalue *old_ptr_sval,
diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
new file mode 100644
index 00000000000..46f600de658
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
@@ -0,0 +1,65 @@
+/* { dg-additional-options "-Wno-stringop-overflow -Wno-stringop-truncation" } */
+#include <string.h>
+
+/* Wanalyzer-out-of-bounds tests for strpy-related overflows.
+  
+   The intra-procedural tests are all caught by Wstringop-overflow.
+   The inter-procedural out-of-bounds are only found by the analyzer.  */
+
+void test1 (void)
+{
+  char dst[5];
+  strcpy (dst, "Hello"); /* { dg-line test1 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test1 } */
+  /* { dg-message "dst" "note" { target *-*-* } test1 } */
+}
+
+void test2 (void)
+{
+  char dst[6];
+  strcpy (dst, "Hello");
+}
+
+void test3 (void)
+{
+  char *src = "Hello";
+  char dst[5];
+  strcpy (dst, src); /* { dg-line test3 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test3 } */
+  /* { dg-message "dst" "note" { target *-*-* } test3 } */
+}
+
+void test4 (void)
+{
+  char *src = "Hello";
+  char dst[6];
+  strcpy (dst, src);
+}
+
+const char *return_hello (void)
+{
+  return "hello";
+}
+
+void test5 (void)
+{
+  const char *str = return_hello ();
+  if (!str)
+    return;
+  char dst[5];
+  strcpy (dst, str); /* { dg-line test5 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test5 } */
+  /* { dg-message "dst" "note" { target *-*-* } test5 } */
+}
+
+void test6 (void)
+{
+  const char *str = return_hello ();
+  if (!str)
+    return;
+  char dst[6];
+  strcpy (dst, str);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
new file mode 100644
index 00000000000..a38f9a7641f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
@@ -0,0 +1,23 @@
+#include <string.h>
+#include "analyzer-decls.h"
+
+void test_1 (void)
+{
+  char str[] = "Hello";
+  char buf[6];
+  char *result = strcpy (buf, str);
+  __analyzer_describe (1, result); /* { dg-warning "region_svalue.*?'buf'" } */
+  __analyzer_eval (result == buf); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[0] == 'H'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[1] == 'e'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[2] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[3] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[4] == 'o'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[5] == 0); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[0] == 'H'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[1] == 'e'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[2] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[3] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[4] == 'o'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[5] == 0); /* { dg-warning "TRUE" } */
+}
-- 
2.37.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2 v2] analyzer: strcpy semantics
  2022-09-04 19:17     ` [PATCH 2/2 v2] analyzer: strcpy semantics Tim Lange
@ 2022-09-04 22:26       ` David Malcolm
  0 siblings, 0 replies; 6+ messages in thread
From: David Malcolm @ 2022-09-04 22:26 UTC (permalink / raw)
  To: Tim Lange; +Cc: gcc-patches

On Sun, 2022-09-04 at 21:17 +0200, Tim Lange wrote:
> Hi Dave,
> 
> sorry about the strncpy thing, I should've been more careful. Below
> is the
> patch with just the strcpy part.

Thanks - this patch looks OK for trunk.

Dave


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-09-04 22:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-02 14:08 [PATCH 1/2] analyzer: return a concrete offset for cast_regions Tim Lange
2022-09-02 14:08 ` [PATCH 2/2] analyzer: strcpy and strncpy semantics Tim Lange
2022-09-02 15:22   ` David Malcolm
2022-09-04 19:17     ` [PATCH 2/2 v2] analyzer: strcpy semantics Tim Lange
2022-09-04 22:26       ` David Malcolm
2022-09-02 14:36 ` [PATCH 1/2] analyzer: return a concrete offset for cast_regions David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).