public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899]
@ 2023-08-24 14:38 David Malcolm
  2023-08-24 14:38 ` [PATCH 1/9] analyzer: add logging to impl_path_context David Malcolm
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-24 14:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch kit makes improvements to the analyzer's new strlen
implementation, and wires it up to strcpy and strcat.

For example, given:

  #include <string.h>

  void test (void)
  {
    char buf[10];
    strcpy (buf, "hello world!");
  }

we now emit:

demo.c: In function ‘test’:
demo.c:6:3: warning: stack-based buffer overflow [CWE-121] [-Wanalyzer-out-of-bounds]
    6 |   strcpy (buf, "hello world!");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ‘test’: events 1-2
    |
    |    5 |   char buf[10];
    |      |        ^~~
    |      |        |
    |      |        (1) capacity: 10 bytes
    |    6 |   strcpy (buf, "hello world!");
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (2) out-of-bounds write from byte 10 till byte 12 but ‘buf’ ends at byte 10
    |
demo.c:6:3: note: write of 3 bytes to beyond the end of ‘buf’
    6 |   strcpy (buf, "hello world!");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
demo.c:6:3: note: valid subscripts for ‘buf’ are ‘[0]’ to ‘[9]’

  ┌─────┬─────┬────┬────┬────┬────┬────┬────┬────┬────┐┌─────┬─────┬─────┐
  │ [0] │ [1] │[2] │[3] │[4] │[5] │[6] │[7] │[8] │[9] ││[10] │[11] │[12] │
  ├─────┼─────┼────┼────┼────┼────┼────┼────┼────┼────┤├─────┼─────┼─────┤
  │ ‘h’ │ ‘e’ │‘l’ │‘l’ │‘o’ │‘ ’ │‘w’ │‘o’ │‘r’ │‘l’ ││ ‘d’ │ ‘!’ │ NUL │
  ├─────┴─────┴────┴────┴────┴────┴────┴────┴────┴────┴┴─────┴─────┴─────┤
  │                  string literal (type: ‘char[13]’)                   │
  └──────────────────────────────────────────────────────────────────────┘
     │     │    │    │    │    │    │    │    │    │      │     │     │
     │     │    │    │    │    │    │    │    │    │      │     │     │
     v     v    v    v    v    v    v    v    v    v      v     v     v
  ┌─────┬────────────────────────────────────────┬────┐┌─────────────────┐
  │ [0] │                  ...                   │[9] ││                 │
  ├─────┴────────────────────────────────────────┴────┤│after valid range│
  │             ‘buf’ (type: ‘char[10]’)              ││                 │
  └───────────────────────────────────────────────────┘└─────────────────┘
  ├─────────────────────────┬─────────────────────────┤├────────┬────────┤
                            │                                   │
                  ╭─────────┴────────╮              ╭───────────┴──────────╮
                  │capacity: 10 bytes│              │⚠️  overflow of 3 bytes│
                  ╰──────────────────╯              ╰──────────────────────╯

in addition to the pre-existing:

demo.c:6:3: warning: ‘__builtin_memcpy’ writing 13 bytes into a region of size 10 overflows the destination [-Wstringop-overflow=]
demo.c:5:8: note: destination object ‘buf’ of size 10
    5 |   char buf[10];
      |        ^~~

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-3461-g9aaec66917c96a through to
r14-3469-gbbdc0e0d0042ae.

David Malcolm (9):
  analyzer: add logging to impl_path_context
  analyzer: handle symbolic bindings in scan_for_null_terminator
    [PR105899]
  analyzer: reimplement kf_strcpy [PR105899]
  analyzer: eliminate region_model::get_string_size [PR105899]
  analyzer: reimplement kf_memcpy_memmove
  analyzer: handle strlen(INIT_VAL(STRING_REG)) [PR105899]
  analyzer: handle INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL)
    [PR105899]
  analyzer: handle strlen(BITS_WITHIN) [PR105899]
  analyzer: implement kf_strcat [PR105899]

 gcc/analyzer/call-details.cc                  |  12 +-
 gcc/analyzer/call-details.h                   |   5 +-
 gcc/analyzer/engine.cc                        |  13 +-
 gcc/analyzer/kf.cc                            | 116 +++++---
 gcc/analyzer/region-model-manager.cc          |  19 ++
 gcc/analyzer/region-model.cc                  | 261 +++++++++++++-----
 gcc/analyzer/region-model.h                   |  22 +-
 gcc/doc/invoke.texi                           |   1 +
 .../analyzer/out-of-bounds-diagram-16.c       |  31 +++
 gcc/testsuite/gcc.dg/analyzer/sprintf-1.c     |  11 +
 gcc/testsuite/gcc.dg/analyzer/strcat-1.c      | 136 +++++++++
 gcc/testsuite/gcc.dg/analyzer/strcpy-1.c      |  22 ++
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c      |   8 +
 gcc/testsuite/gcc.dg/analyzer/strcpy-4.c      |  51 ++++
 14 files changed, 601 insertions(+), 107 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-16.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcat-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcpy-4.c

-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/9] analyzer: add logging to impl_path_context
  2023-08-24 14:38 [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899] David Malcolm
@ 2023-08-24 14:38 ` David Malcolm
  2023-08-24 14:38 ` [PATCH 2/9] analyzer: handle symbolic bindings in scan_for_null_terminator [PR105899] David Malcolm
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-24 14:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

gcc/analyzer/ChangeLog:
	* engine.cc (impl_path_context::impl_path_context): Add logger
	param.
	(impl_path_context::bifurcate): Add log message.
	(impl_path_context::terminate_path): Likewise.
	(impl_path_context::m_logger): New field.
	(exploded_graph::process_node): Pass logger to path_ctxt ctor.
---
 gcc/analyzer/engine.cc | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 3700154eec2c..a1908cdb364e 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -3848,8 +3848,10 @@ exploded_graph::maybe_create_dynamic_call (const gcall *call,
 class impl_path_context : public path_context
 {
 public:
-  impl_path_context (const program_state *cur_state)
+  impl_path_context (const program_state *cur_state,
+		     logger *logger)
   : m_cur_state (cur_state),
+    m_logger (logger),
     m_terminate_path (false)
   {
   }
@@ -3868,6 +3870,9 @@ public:
   void
   bifurcate (std::unique_ptr<custom_edge_info> info) final override
   {
+    if (m_logger)
+      m_logger->log ("bifurcating path");
+
     if (m_state_at_bifurcation)
       /* Verify that the state at bifurcation is consistent when we
 	 split into multiple out-edges.  */
@@ -3884,6 +3889,8 @@ public:
 
   void terminate_path () final override
   {
+    if (m_logger)
+      m_logger->log ("terminating path");
     m_terminate_path = true;
   }
 
@@ -3900,6 +3907,8 @@ public:
 private:
   const program_state *m_cur_state;
 
+  logger *m_logger;
+
   /* Lazily-created copy of the state before the split.  */
   std::unique_ptr<program_state> m_state_at_bifurcation;
 
@@ -4044,7 +4053,7 @@ exploded_graph::process_node (exploded_node *node)
 	   exactly one stmt, the one that caused the change. */
 	program_state next_state (state);
 
-	impl_path_context path_ctxt (&next_state);
+	impl_path_context path_ctxt (&next_state, logger);
 
 	uncertainty_t uncertainty;
 	const supernode *snode = point.get_supernode ();
-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/9] analyzer: handle symbolic bindings in scan_for_null_terminator [PR105899]
  2023-08-24 14:38 [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899] David Malcolm
  2023-08-24 14:38 ` [PATCH 1/9] analyzer: add logging to impl_path_context David Malcolm
@ 2023-08-24 14:38 ` David Malcolm
  2023-08-24 14:38 ` [PATCH 3/9] analyzer: reimplement kf_strcpy [PR105899] David Malcolm
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-24 14:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

gcc/analyzer/ChangeLog:
	PR analyzer/105899
	* region-model.cc (iterable_cluster::iterable_cluster): Add
	symbolic binding keys to m_symbolic_bindings.
	(iterable_cluster::has_symbolic_bindings_p): New.
	(iterable_cluster::m_symbolic_bindings): New field.
	(region_model::scan_for_null_terminator): Treat clusters with
	symbolic bindings as having unknown strlen.

gcc/testsuite/ChangeLog:
	PR analyzer/105899
	* gcc.dg/analyzer/sprintf-1.c: Include "analyzer-decls.h".
	(test_strlen_1): New.
---
 gcc/analyzer/region-model.cc              | 15 +++++++++++++++
 gcc/testsuite/gcc.dg/analyzer/sprintf-1.c | 11 +++++++++++
 2 files changed, 26 insertions(+)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 99817aee3a93..7a2f81f36e0f 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3420,6 +3420,8 @@ public:
 	    if (concrete_key->get_byte_range (&fragment_bytes))
 	      m_fragments.safe_push (fragment (fragment_bytes, sval));
 	  }
+	else
+	  m_symbolic_bindings.safe_push (key);
       }
     m_fragments.qsort (fragment::cmp_ptrs);
   }
@@ -3440,8 +3442,14 @@ public:
     return false;
   }
 
+  bool has_symbolic_bindings_p () const
+  {
+    return !m_symbolic_bindings.is_empty ();
+  }
+
 private:
   auto_vec<fragment> m_fragments;
+  auto_vec<const binding_key *> m_symbolic_bindings;
 };
 
 /* Simulate reading the bytes at BYTES from BASE_REG.
@@ -3610,6 +3618,13 @@ region_model::scan_for_null_terminator (const region *reg,
   /* No binding for this base_region, or no binding at src_byte_offset
      (or a symbolic binding).  */
 
+  if (c.has_symbolic_bindings_p ())
+    {
+      if (out_sval)
+	*out_sval = m_mgr->get_or_create_unknown_svalue (NULL_TREE);
+      return m_mgr->get_or_create_unknown_svalue (size_type_node);
+    }
+
   /* TODO: the various special-cases seen in
      region_model::get_store_value.  */
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c b/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c
index f8dc806d6192..e7c2b3089c5b 100644
--- a/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c
@@ -1,6 +1,8 @@
 /* See e.g. https://en.cppreference.com/w/c/io/fprintf
    and https://www.man7.org/linux/man-pages/man3/sprintf.3.html */
 
+#include "analyzer-decls.h"
+
 extern int
 sprintf(char* dst, const char* fmt, ...)
   __attribute__((__nothrow__));
@@ -64,3 +66,12 @@ test_fmt_not_terminated (char *dst)
   return sprintf (dst, fmt); /* { dg-warning "stack-based buffer over-read" } */
   /* { dg-message "while looking for null terminator for argument 2 \\('&fmt'\\) of 'sprintf'..." "event" { target *-*-* } .-1 } */
 }
+
+void
+test_strlen_1 (void)
+{
+  char buf[10];
+  sprintf (buf, "msg: %s\n", "abc");
+  __analyzer_eval (__builtin_strlen (buf) == 8); /* { dg-warning "UNKNOWN" } */
+  // TODO: ideally would be TRUE  
+}
-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3/9] analyzer: reimplement kf_strcpy [PR105899]
  2023-08-24 14:38 [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899] David Malcolm
  2023-08-24 14:38 ` [PATCH 1/9] analyzer: add logging to impl_path_context David Malcolm
  2023-08-24 14:38 ` [PATCH 2/9] analyzer: handle symbolic bindings in scan_for_null_terminator [PR105899] David Malcolm
@ 2023-08-24 14:38 ` David Malcolm
  2023-08-24 14:38 ` [PATCH 4/9] analyzer: eliminate region_model::get_string_size [PR105899] David Malcolm
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-24 14:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch reimplements the analyzer's implementation of strcpy using
the region_model::scan_for_null_terminator infrastructure, so that e.g.
it can complain about out-of-bounds reads/writes, unterminated strings,
etc.

gcc/analyzer/ChangeLog:
	PR analyzer/105899
	* kf.cc (kf_strcpy::impl_call_pre): Reimplement using
	check_for_null_terminated_string_arg.
	* region-model.cc (region_model::get_store_bytes): Shortcut
	reading all of a string_region.
	(region_model::scan_for_null_terminator): Use get_store_value for
	the bytes rather than "unknown" when returning an unknown length.
	(region_model::write_bytes): New.
	* region-model.h (region_model::write_bytes): New decl.

gcc/testsuite/ChangeLog:
	PR analyzer/105899
	* gcc.dg/analyzer/out-of-bounds-diagram-16.c: New test.
	* gcc.dg/analyzer/strcpy-1.c: Add test coverage.
	* gcc.dg/analyzer/strcpy-3.c: Likewise.
	* gcc.dg/analyzer/strcpy-4.c: New test.
---
 gcc/analyzer/kf.cc                            | 32 +++++-------
 gcc/analyzer/region-model.cc                  | 32 ++++++++++--
 gcc/analyzer/region-model.h                   |  4 ++
 .../analyzer/out-of-bounds-diagram-16.c       | 31 +++++++++++
 gcc/testsuite/gcc.dg/analyzer/strcpy-1.c      | 22 ++++++++
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c      |  1 +
 gcc/testsuite/gcc.dg/analyzer/strcpy-4.c      | 51 +++++++++++++++++++
 7 files changed, 150 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-16.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcpy-4.c

diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index 59f46bab581c..6b33cd159dac 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -1135,29 +1135,25 @@ void
 kf_strcpy::impl_call_pre (const call_details &cd) const
 {
   region_model *model = cd.get_model ();
-  region_model_manager *mgr = cd.get_manager ();
+  region_model_context *ctxt = cd.get_ctxt ();
 
   const svalue *dest_sval = cd.get_arg_svalue (0);
   const region *dest_reg = model->deref_rvalue (dest_sval, cd.get_arg_tree (0),
-					 cd.get_ctxt ());
-  const svalue *src_sval = cd.get_arg_svalue (1);
-  const region *src_reg = model->deref_rvalue (src_sval, cd.get_arg_tree (1),
-					cd.get_ctxt ());
-  const svalue *src_contents_sval = model->get_store_value (src_reg,
-							    cd.get_ctxt ());
-  cd.check_for_null_terminated_string_arg (1);
-
+						    ctxt);
+  /* strcpy returns the initial param.  */
   cd.maybe_set_lhs (dest_sval);
 
-  /* Try to get the string size if SRC_REG is a string_region.  */
-  const svalue *copied_bytes_sval = model->get_string_size (src_reg);
-  /* Otherwise, check if the contents of SRC_REG is a string.  */
-  if (copied_bytes_sval->get_kind () == SK_UNKNOWN)
-    copied_bytes_sval = model->get_string_size (src_contents_sval);
-
-  const region *sized_dest_reg
-    = mgr->get_sized_region (dest_reg, NULL_TREE, copied_bytes_sval);
-  model->set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
+  const svalue *bytes_to_copy;
+  if (const svalue *num_bytes_read_sval
+	= cd.check_for_null_terminated_string_arg (1, &bytes_to_copy))
+    {
+      model->write_bytes (dest_reg, num_bytes_read_sval, bytes_to_copy, ctxt);
+    }
+  else
+    {
+      if (cd.get_ctxt ())
+	cd.get_ctxt ()->terminate_path ();
+    }
 }
 
 /* Handler for "strdup" and "__builtin_strdup".  */
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 7a2f81f36e0f..cc8d895d9665 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3460,6 +3460,13 @@ region_model::get_store_bytes (const region *base_reg,
 			       const byte_range &bytes,
 			       region_model_context *ctxt) const
 {
+  /* Shortcut reading all of a string_region.  */
+  if (bytes.get_start_byte_offset () == 0)
+    if (const string_region *string_reg = base_reg->dyn_cast_string_region ())
+      if (bytes.m_size_in_bytes
+	  == TREE_STRING_LENGTH (string_reg->get_string_cst ()))
+	return m_mgr->get_or_create_initial_value (base_reg);
+
   const svalue *index_sval
     = m_mgr->get_or_create_int_cst (size_type_node,
 				    bytes.get_start_byte_offset ());
@@ -3533,14 +3540,14 @@ region_model::scan_for_null_terminator (const region *reg,
   if (offset.symbolic_p ())
     {
       if (out_sval)
-	*out_sval = m_mgr->get_or_create_unknown_svalue (NULL_TREE);
+	*out_sval = get_store_value (reg, nullptr);
       return m_mgr->get_or_create_unknown_svalue (size_type_node);
     }
   byte_offset_t src_byte_offset;
   if (!offset.get_concrete_byte_offset (&src_byte_offset))
     {
       if (out_sval)
-	*out_sval = m_mgr->get_or_create_unknown_svalue (NULL_TREE);
+	*out_sval = get_store_value (reg, nullptr);
       return m_mgr->get_or_create_unknown_svalue (size_type_node);
     }
   const byte_offset_t initial_src_byte_offset = src_byte_offset;
@@ -3582,7 +3589,7 @@ region_model::scan_for_null_terminator (const region *reg,
 	  if (is_terminated.is_unknown ())
 	    {
 	      if (out_sval)
-		*out_sval = m_mgr->get_or_create_unknown_svalue (NULL_TREE);
+		*out_sval = get_store_value (reg, nullptr);
 	      return m_mgr->get_or_create_unknown_svalue (size_type_node);
 	    }
 
@@ -3621,7 +3628,7 @@ region_model::scan_for_null_terminator (const region *reg,
   if (c.has_symbolic_bindings_p ())
     {
       if (out_sval)
-	*out_sval = m_mgr->get_or_create_unknown_svalue (NULL_TREE);
+	*out_sval = get_store_value (reg, nullptr);
       return m_mgr->get_or_create_unknown_svalue (size_type_node);
     }
 
@@ -3638,7 +3645,7 @@ region_model::scan_for_null_terminator (const region *reg,
   if (base_reg->can_have_initial_svalue_p ())
     {
       if (out_sval)
-	*out_sval = m_mgr->get_or_create_unknown_svalue (NULL_TREE);
+	*out_sval = get_store_value (reg, nullptr);
       return m_mgr->get_or_create_unknown_svalue (size_type_node);
     }
   else
@@ -3801,6 +3808,21 @@ region_model::zero_fill_region (const region *reg)
   m_store.zero_fill_region (m_mgr->get_store_manager(), reg);
 }
 
+/* Copy NUM_BYTES_SVAL of SVAL to DEST_REG.
+   Use CTXT to report any warnings associated with the copy
+   (e.g. out-of-bounds writes).  */
+
+void
+region_model::write_bytes (const region *dest_reg,
+			   const svalue *num_bytes_sval,
+			   const svalue *sval,
+			   region_model_context *ctxt)
+{
+  const region *sized_dest_reg
+    = m_mgr->get_sized_region (dest_reg, NULL_TREE, num_bytes_sval);
+  set_value (sized_dest_reg, sval, ctxt);
+}
+
 /* Mark REG as having unknown content.  */
 
 void
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 3979bf124783..9c6e60bbe824 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -367,6 +367,10 @@ class region_model
   void purge_region (const region *reg);
   void fill_region (const region *reg, const svalue *sval);
   void zero_fill_region (const region *reg);
+  void write_bytes (const region *dest_reg,
+		    const svalue *num_bytes_sval,
+		    const svalue *sval,
+		    region_model_context *ctxt);
   void mark_region_as_unknown (const region *reg, uncertainty_t *uncertainty);
 
   tristate eval_condition (const svalue *lhs,
diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-16.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-16.c
new file mode 100644
index 000000000000..b0fb409267ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-16.c
@@ -0,0 +1,31 @@
+/* { dg-additional-options "-fdiagnostics-text-art-charset=unicode" } */
+
+#include <string.h>
+#include "analyzer-decls.h"
+
+char *test_fixed_size_heap_2_invalid (void)
+{
+  char str[] = "abc";
+  char *p = __builtin_malloc (strlen (str)); /* { dg-message "\\(1\\) capacity: 3 bytes" } */
+  if (!p)
+    return NULL;
+  strcpy (p, str); /* { dg-warning "heap-based buffer overflow" } */
+  return p;
+}
+
+/* { dg-begin-multiline-output "" }
+  ┌──────────────────────────────────────────────────────────────────────┐
+  │                           write of 4 bytes                           │
+  └──────────────────────────────────────────────────────────────────────┘
+                            │                                   │
+                            │                                   │
+                            v                                   v
+  ┌───────────────────────────────────────────────────┐┌─────────────────┐
+  │          buffer allocated on heap at (1)          ││after valid range│
+  └───────────────────────────────────────────────────┘└─────────────────┘
+  ├─────────────────────────┬─────────────────────────┤├────────┬────────┤
+                            │                                   │
+                   ╭────────┴────────╮                ╭─────────┴────────╮
+                   │capacity: 3 bytes│                │overflow of 1 byte│
+                   ╰─────────────────╯                ╰──────────────────╯
+   { dg-end-multiline-output "" } */
diff --git a/gcc/testsuite/gcc.dg/analyzer/strcpy-1.c b/gcc/testsuite/gcc.dg/analyzer/strcpy-1.c
index d21e77175119..30341061f4cc 100644
--- a/gcc/testsuite/gcc.dg/analyzer/strcpy-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/strcpy-1.c
@@ -30,3 +30,25 @@ char *test_uninitialized (char *dst)
   return strcpy (dst, buf); /* { dg-warning "use of uninitialized value 'buf\\\[0\\\]'" } */
   /* { dg-message "while looking for null terminator for argument 2 \\('&buf'\\) of 'strcpy'..." "event" { target *-*-* } .-1 } */
 }
+
+extern void external_fn (void *ptr);
+
+char *test_external_fn (void)
+{
+  char src[10];
+  char dst[10];
+  external_fn (src);
+  strcpy (dst, src);
+  __analyzer_eval (strlen (dst) == strlen (src)); /* { dg-warning "UNKNOWN" } */
+  // TODO: ideally would be TRUE  
+}
+
+void test_sprintf_strcpy (const char *a, const char *b)
+{
+  char buf_1[10];
+  char buf_2[10];
+  __builtin_sprintf (buf_1, "%s/%s", a, b);
+  strcpy (buf_2, buf_1);
+  __analyzer_eval (strlen (buf_1) == strlen (buf_2)); /* { dg-warning "UNKNOWN" } */
+  // TODO: ideally would be TRUE  
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
index a38f9a7641fe..abb49bc39f27 100644
--- a/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
+++ b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
@@ -20,4 +20,5 @@ void test_1 (void)
   __analyzer_eval (result[3] == 'l'); /* { dg-warning "TRUE" } */
   __analyzer_eval (result[4] == 'o'); /* { dg-warning "TRUE" } */
   __analyzer_eval (result[5] == 0); /* { dg-warning "TRUE" } */
+  __analyzer_eval (strlen (result) == 5); /* { dg-warning "TRUE" } */
 }
diff --git a/gcc/testsuite/gcc.dg/analyzer/strcpy-4.c b/gcc/testsuite/gcc.dg/analyzer/strcpy-4.c
new file mode 100644
index 000000000000..435a4cadee9d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/strcpy-4.c
@@ -0,0 +1,51 @@
+/* { dg-additional-options "-Wno-stringop-overflow" } */
+
+#include <string.h>
+#include "analyzer-decls.h"
+
+void
+test_fixed_size_stack_1 (void)
+{
+  char buf[3];
+  strcpy (buf, "abc"); /* { dg-warning "stack-based buffer overflow" } */
+}
+
+char *test_fixed_size_heap_1 (void)
+{
+  char str[] = "abc";
+  char *p = __builtin_malloc (3);
+  if (!p)
+    return NULL;
+  strcpy (p, str); /* { dg-warning "heap-based buffer overflow" } */
+  return p;
+}
+
+char *test_fixed_size_heap_2_invalid (void)
+{
+  char str[] = "abc";
+  char *p = __builtin_malloc (strlen (str));
+  if (!p)
+    return NULL;
+  strcpy (p, str); /* { dg-warning "heap-based buffer overflow" } */
+  return p;
+}
+
+char *test_fixed_size_heap_2_valid (void)
+{
+  char str[] = "abc";
+  char *p = __builtin_malloc (strlen (str) + 1);
+  if (!p)
+    return NULL;
+  strcpy (p, str); /* { dg-bogus "" } */
+  __analyzer_eval (strlen (p) == 3); /* { dg-warning "TRUE" } */
+  return p;
+}
+
+char *test_dynamic_size_heap_1 (const char *src)
+{
+  char *p = __builtin_malloc (strlen (src));
+  if (!p)
+    return NULL;
+  strcpy (p, src); // TODO: write of null terminator is oob
+  return p;
+}
-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/9] analyzer: eliminate region_model::get_string_size [PR105899]
  2023-08-24 14:38 [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899] David Malcolm
                   ` (2 preceding siblings ...)
  2023-08-24 14:38 ` [PATCH 3/9] analyzer: reimplement kf_strcpy [PR105899] David Malcolm
@ 2023-08-24 14:38 ` David Malcolm
  2023-08-24 14:38 ` [PATCH 5/9] analyzer: reimplement kf_memcpy_memmove David Malcolm
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-24 14:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

gcc/analyzer/ChangeLog:
	PR analyzer/105899
	* region-model.cc (region_model::get_string_size): Delete both.
	* region-model.h (region_model::get_string_size): Delete both
	decls.
---
 gcc/analyzer/region-model.cc | 29 -----------------------------
 gcc/analyzer/region-model.h  |  3 ---
 2 files changed, 32 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index cc8d895d9665..1fe66f4719fa 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -2794,35 +2794,6 @@ region_model::get_capacity (const region *reg) const
   return m_mgr->get_or_create_unknown_svalue (sizetype);
 }
 
-/* Return the string size, including the 0-terminator, if SVAL is a
-   constant_svalue holding a string.  Otherwise, return an unknown_svalue.  */
-
-const svalue *
-region_model::get_string_size (const svalue *sval) const
-{
-  tree cst = sval->maybe_get_constant ();
-  if (!cst || TREE_CODE (cst) != STRING_CST)
-    return m_mgr->get_or_create_unknown_svalue (size_type_node);
-
-  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
-  return m_mgr->get_or_create_constant_svalue (out);
-}
-
-/* Return the string size, including the 0-terminator, if REG is a
-   string_region.  Otherwise, return an unknown_svalue.  */
-
-const svalue *
-region_model::get_string_size (const region *reg) const
-{
-  const string_region *str_reg = dyn_cast <const string_region *> (reg);
-  if (!str_reg)
-    return m_mgr->get_or_create_unknown_svalue (size_type_node);
-
-  tree cst = str_reg->get_string_cst ();
-  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
-  return m_mgr->get_or_create_constant_svalue (out);
-}
-
 /* If CTXT is non-NULL, use it to warn about any problems accessing REG,
    using DIR to determine if this access is a read or write.
    Return TRUE if an OOB access was detected.
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 9c6e60bbe824..41df1885ad5b 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -469,9 +469,6 @@ class region_model
 
   const svalue *get_capacity (const region *reg) const;
 
-  const svalue *get_string_size (const svalue *sval) const;
-  const svalue *get_string_size (const region *reg) const;
-
   bool replay_call_summary (call_summary_replay &r,
 			    const region_model &summary);
 
-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 5/9] analyzer: reimplement kf_memcpy_memmove
  2023-08-24 14:38 [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899] David Malcolm
                   ` (3 preceding siblings ...)
  2023-08-24 14:38 ` [PATCH 4/9] analyzer: eliminate region_model::get_string_size [PR105899] David Malcolm
@ 2023-08-24 14:38 ` David Malcolm
  2023-08-24 14:39 ` [PATCH 6/9] analyzer: handle strlen(INIT_VAL(STRING_REG)) [PR105899] David Malcolm
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-24 14:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

gcc/analyzer/ChangeLog:
	* kf.cc (kf_memcpy_memmove::impl_call_pre): Reimplement using
	region_model::copy_bytes.
	* region-model.cc (region_model::read_bytes): New.
	(region_model::copy_bytes): New.
	* region-model.h (region_model::read_bytes): New decl.
	(region_model::copy_bytes): New decl.
---
 gcc/analyzer/kf.cc           | 14 ++++----------
 gcc/analyzer/region-model.cc | 35 +++++++++++++++++++++++++++++++++++
 gcc/analyzer/region-model.h  |  9 +++++++++
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index 6b33cd159dac..3eddbe200387 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -541,7 +541,6 @@ kf_memcpy_memmove::impl_call_pre (const call_details &cd) const
   const svalue *num_bytes_sval = cd.get_arg_svalue (2);
 
   region_model *model = cd.get_model ();
-  region_model_manager *mgr = cd.get_manager ();
 
   const region *dest_reg
     = model->deref_rvalue (dest_ptr_sval, cd.get_arg_tree (0), cd.get_ctxt ());
@@ -550,15 +549,10 @@ kf_memcpy_memmove::impl_call_pre (const call_details &cd) const
 
   cd.maybe_set_lhs (dest_ptr_sval);
 
-  const region *sized_src_reg
-    = mgr->get_sized_region (src_reg, NULL_TREE, num_bytes_sval);
-  const region *sized_dest_reg
-    = mgr->get_sized_region (dest_reg, NULL_TREE, num_bytes_sval);
-  const svalue *src_contents_sval
-    = model->get_store_value (sized_src_reg, cd.get_ctxt ());
-  model->check_for_poison (src_contents_sval, cd.get_arg_tree (1),
-			   sized_src_reg, cd.get_ctxt ());
-  model->set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
+  model->copy_bytes (dest_reg,
+		     src_reg, cd.get_arg_tree (1),
+		     num_bytes_sval,
+		     cd.get_ctxt ());
 }
 
 /* Handler for "memset" and "__builtin_memset".  */
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 1fe66f4719fa..00c306ab7dae 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3794,6 +3794,41 @@ region_model::write_bytes (const region *dest_reg,
   set_value (sized_dest_reg, sval, ctxt);
 }
 
+/* Read NUM_BYTES_SVAL from SRC_REG.
+   Use CTXT to report any warnings associated with the copy
+   (e.g. out-of-bounds reads, copying of uninitialized values, etc).  */
+
+const svalue *
+region_model::read_bytes (const region *src_reg,
+			  tree src_ptr_expr,
+			  const svalue *num_bytes_sval,
+			  region_model_context *ctxt) const
+{
+  const region *sized_src_reg
+    = m_mgr->get_sized_region (src_reg, NULL_TREE, num_bytes_sval);
+  const svalue *src_contents_sval = get_store_value (sized_src_reg, ctxt);
+  check_for_poison (src_contents_sval, src_ptr_expr,
+		    sized_src_reg, ctxt);
+  return src_contents_sval;
+}
+
+/* Copy NUM_BYTES_SVAL bytes from SRC_REG to DEST_REG.
+   Use CTXT to report any warnings associated with the copy
+   (e.g. out-of-bounds reads/writes, copying of uninitialized values,
+   etc).  */
+
+void
+region_model::copy_bytes (const region *dest_reg,
+			  const region *src_reg,
+			  tree src_ptr_expr,
+			  const svalue *num_bytes_sval,
+			  region_model_context *ctxt)
+{
+  const svalue *data_sval
+    = read_bytes (src_reg, src_ptr_expr, num_bytes_sval, ctxt);
+  write_bytes (dest_reg, num_bytes_sval, data_sval, ctxt);
+}
+
 /* Mark REG as having unknown content.  */
 
 void
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 41df1885ad5b..b1c705e22c28 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -371,6 +371,15 @@ class region_model
 		    const svalue *num_bytes_sval,
 		    const svalue *sval,
 		    region_model_context *ctxt);
+  const svalue *read_bytes (const region *src_reg,
+			    tree src_ptr_expr,
+			    const svalue *num_bytes_sval,
+			    region_model_context *ctxt) const;
+  void copy_bytes (const region *dest_reg,
+		   const region *src_reg,
+		   tree src_ptr_expr,
+		   const svalue *num_bytes_sval,
+		   region_model_context *ctxt);
   void mark_region_as_unknown (const region *reg, uncertainty_t *uncertainty);
 
   tristate eval_condition (const svalue *lhs,
-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 6/9] analyzer: handle strlen(INIT_VAL(STRING_REG)) [PR105899]
  2023-08-24 14:38 [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899] David Malcolm
                   ` (4 preceding siblings ...)
  2023-08-24 14:38 ` [PATCH 5/9] analyzer: reimplement kf_memcpy_memmove David Malcolm
@ 2023-08-24 14:39 ` David Malcolm
  2023-08-24 14:39 ` [PATCH 7/9] analyzer: handle INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL) [PR105899] David Malcolm
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-24 14:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

gcc/analyzer/ChangeLog:
	PR analyzer/105899
	* region-model.cc (fragment::has_null_terminator): Move STRING_CST
	handling to fragment::string_cst_has_null_terminator; also use it to
	handle INIT_VAL(STRING_REG).
	(fragment::string_cst_has_null_terminator): New, from above.

gcc/testsuite/ChangeLog:
	PR analyzer/105899
	* gcc.dg/analyzer/strcpy-3.c (test_2): New.
---
 gcc/analyzer/region-model.cc             | 68 ++++++++++++++++--------
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c |  7 +++
 2 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 00c306ab7dae..6574ec140074 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3310,27 +3310,10 @@ struct fragment
 	  switch (TREE_CODE (cst))
 	    {
 	    case STRING_CST:
-	      {
-		/* Look for the first 0 byte within STRING_CST
-		   from START_READ_OFFSET onwards.  */
-		const HOST_WIDE_INT num_bytes_to_search
-		  = std::min<HOST_WIDE_INT> ((TREE_STRING_LENGTH (cst)
-					      - rel_start_read_offset_hwi),
-					     available_bytes_hwi);
-		const char *start = (TREE_STRING_POINTER (cst)
-				     + rel_start_read_offset_hwi);
-		if (num_bytes_to_search >= 0)
-		  if (const void *p = memchr (start, 0,
-					      num_bytes_to_search))
-		    {
-		      *out_bytes_read = (const char *)p - start + 1;
-		      return tristate (true);
-		    }
-
-		*out_bytes_read = available_bytes;
-		return tristate (false);
-	      }
-	      break;
+	      return string_cst_has_null_terminator (cst,
+						     rel_start_read_offset_hwi,
+						     available_bytes_hwi,
+						     out_bytes_read);
 	    case INTEGER_CST:
 	      if (rel_start_read_offset_hwi == 0
 		  && integer_onep (TYPE_SIZE_UNIT (TREE_TYPE (cst))))
@@ -3357,12 +3340,55 @@ struct fragment
 	    }
 	}
 	break;
+
+      case SK_INITIAL:
+	{
+	  const initial_svalue *initial_sval = (const initial_svalue *)m_sval;
+	  const region *reg = initial_sval->get_region ();
+	  if (const string_region *string_reg = reg->dyn_cast_string_region ())
+	    {
+	      tree string_cst = string_reg->get_string_cst ();
+	      return string_cst_has_null_terminator (string_cst,
+						     rel_start_read_offset_hwi,
+						     available_bytes_hwi,
+						     out_bytes_read);
+	    }
+	  return tristate::TS_UNKNOWN;
+	}
+	break;
+
       default:
 	// TODO: it may be possible to handle other cases here.
 	return tristate::TS_UNKNOWN;
       }
   }
 
+  static tristate
+  string_cst_has_null_terminator (tree string_cst,
+				  HOST_WIDE_INT rel_start_read_offset_hwi,
+				  HOST_WIDE_INT available_bytes_hwi,
+				  byte_offset_t *out_bytes_read)
+  {
+    /* Look for the first 0 byte within STRING_CST
+       from START_READ_OFFSET onwards.  */
+    const HOST_WIDE_INT num_bytes_to_search
+      = std::min<HOST_WIDE_INT> ((TREE_STRING_LENGTH (string_cst)
+				  - rel_start_read_offset_hwi),
+				 available_bytes_hwi);
+    const char *start = (TREE_STRING_POINTER (string_cst)
+			 + rel_start_read_offset_hwi);
+    if (num_bytes_to_search >= 0)
+      if (const void *p = memchr (start, 0,
+				  num_bytes_to_search))
+	{
+	  *out_bytes_read = (const char *)p - start + 1;
+	  return tristate (true);
+	}
+
+    *out_bytes_read = available_bytes_hwi;
+    return tristate (false);
+  }
+
   byte_range m_byte_range;
   const svalue *m_sval;
 };
diff --git a/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
index abb49bc39f27..a7b324fc445e 100644
--- a/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
+++ b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
@@ -22,3 +22,10 @@ void test_1 (void)
   __analyzer_eval (result[5] == 0); /* { dg-warning "TRUE" } */
   __analyzer_eval (strlen (result) == 5); /* { dg-warning "TRUE" } */
 }
+
+void test_2 (void)
+{
+  char buf[16];
+  __builtin_strcpy (buf, "abc");
+  __analyzer_eval (strlen (buf) == 3); /* { dg-warning "TRUE" } */
+}
-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 7/9] analyzer: handle INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL) [PR105899]
  2023-08-24 14:38 [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899] David Malcolm
                   ` (5 preceding siblings ...)
  2023-08-24 14:39 ` [PATCH 6/9] analyzer: handle strlen(INIT_VAL(STRING_REG)) [PR105899] David Malcolm
@ 2023-08-24 14:39 ` David Malcolm
  2023-08-24 14:39 ` [PATCH 8/9] analyzer: handle strlen(BITS_WITHIN) [PR105899] David Malcolm
  2023-08-24 14:39 ` [PATCH 9/9] analyzer: implement kf_strcat [PR105899] David Malcolm
  8 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-24 14:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

gcc/analyzer/ChangeLog:
	PR analyzer/105899
	* region-model-manager.cc
	(region_model_manager::get_or_create_initial_value): Simplify
	INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL) to
	CONSTANT_SVAL(STRING[N]).
---
 gcc/analyzer/region-model-manager.cc | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc
index 65b719056c84..22246876f8f9 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -310,6 +310,25 @@ region_model_manager::get_or_create_initial_value (const region *reg,
 				 get_or_create_initial_value (original_reg));
     }
 
+  /* Simplify:
+       INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL)
+     to:
+       CONSTANT_SVAL(STRING[N]).  */
+  if (const element_region *element_reg = reg->dyn_cast_element_region ())
+    if (tree cst_idx = element_reg->get_index ()->maybe_get_constant ())
+      if (const string_region *string_reg
+	  = element_reg->get_parent_region ()->dyn_cast_string_region ())
+	if (tree_fits_shwi_p (cst_idx))
+	  {
+	    HOST_WIDE_INT idx = tree_to_shwi (cst_idx);
+	    tree string_cst = string_reg->get_string_cst ();
+	    if (idx >= 0 && idx <= TREE_STRING_LENGTH (string_cst))
+	      {
+		int ch = TREE_STRING_POINTER (string_cst)[idx];
+		return get_or_create_int_cst (reg->get_type (), ch);
+	      }
+	  }
+
   /* INIT_VAL (*UNKNOWN_PTR) -> UNKNOWN_VAL.  */
   if (reg->symbolic_for_unknown_ptr_p ())
     return get_or_create_unknown_svalue (reg->get_type ());
-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 8/9] analyzer: handle strlen(BITS_WITHIN) [PR105899]
  2023-08-24 14:38 [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899] David Malcolm
                   ` (6 preceding siblings ...)
  2023-08-24 14:39 ` [PATCH 7/9] analyzer: handle INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL) [PR105899] David Malcolm
@ 2023-08-24 14:39 ` David Malcolm
  2023-08-24 14:39 ` [PATCH 9/9] analyzer: implement kf_strcat [PR105899] David Malcolm
  8 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-24 14:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

gcc/analyzer/ChangeLog:
	PR analyzer/105899
	* region-model.cc (fragment::has_null_terminator): Handle
	SK_BITS_WITHIN.
---
 gcc/analyzer/region-model.cc | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6574ec140074..025b555d7b97 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3357,10 +3357,29 @@ struct fragment
 	}
 	break;
 
+      case SK_BITS_WITHIN:
+	{
+	  const bits_within_svalue *bits_within_sval
+	    = (const bits_within_svalue *)m_sval;
+	  byte_range bytes (0, 0);
+	  if (bits_within_sval->get_bits ().as_byte_range (&bytes))
+	    {
+	      const svalue *inner_sval = bits_within_sval->get_inner_svalue ();
+	      fragment f (byte_range
+			  (start_read_offset - bytes.get_start_bit_offset (),
+			   std::max<byte_size_t> (bytes.m_size_in_bytes,
+						  available_bytes)),
+			  inner_sval);
+	      return f.has_null_terminator (start_read_offset, out_bytes_read);
+	    }
+	}
+	break;
+
       default:
 	// TODO: it may be possible to handle other cases here.
-	return tristate::TS_UNKNOWN;
+	break;
       }
+    return tristate::TS_UNKNOWN;
   }
 
   static tristate
-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 9/9] analyzer: implement kf_strcat [PR105899]
  2023-08-24 14:38 [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899] David Malcolm
                   ` (7 preceding siblings ...)
  2023-08-24 14:39 ` [PATCH 8/9] analyzer: handle strlen(BITS_WITHIN) [PR105899] David Malcolm
@ 2023-08-24 14:39 ` David Malcolm
  8 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-24 14:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

gcc/analyzer/ChangeLog:
	PR analyzer/105899
	* call-details.cc
	(call_details::check_for_null_terminated_string_arg): Split into
	overloads, one taking just an arg_idx, the other a new
	"include_terminator" param.
	* call-details.h: Likewise.
	* kf.cc (class kf_strcat): New.
	(kf_strcpy::impl_call_pre): Update for change to
	check_for_null_terminated_string_arg.
	(register_known_functions): Register kf_strcat.
	* region-model.cc
	(region_model::check_for_null_terminated_string_arg): Split into
	overloads, one taking just an arg_idx, the other a new
	"include_terminator" param.  When returning an svalue, handle
	"include_terminator" being false by subtracting one.
	* region-model.h
	(region_model::check_for_null_terminated_string_arg): Split into
	overloads, one taking just an arg_idx, the other a new
	"include_terminator" param.

gcc/ChangeLog:
	PR analyzer/105899
	* doc/invoke.texi (Static Analyzer Options): Add "strcat" to the
	list of functions known to the analyzer.

gcc/testsuite/ChangeLog:
	PR analyzer/105899
	* gcc.dg/analyzer/strcat-1.c: New test.
---
 gcc/analyzer/call-details.cc             |  12 +-
 gcc/analyzer/call-details.h              |   5 +-
 gcc/analyzer/kf.cc                       |  72 ++++++++++--
 gcc/analyzer/region-model.cc             |  63 +++++++++--
 gcc/analyzer/region-model.h              |   6 +-
 gcc/doc/invoke.texi                      |   1 +
 gcc/testsuite/gcc.dg/analyzer/strcat-1.c | 136 +++++++++++++++++++++++
 7 files changed, 275 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcat-1.c

diff --git a/gcc/analyzer/call-details.cc b/gcc/analyzer/call-details.cc
index 8f5b28ce6c26..ce1f859c9996 100644
--- a/gcc/analyzer/call-details.cc
+++ b/gcc/analyzer/call-details.cc
@@ -386,13 +386,23 @@ call_details::lookup_function_attribute (const char *attr_name) const
   return lookup_attribute (attr_name, TYPE_ATTRIBUTES (allocfntype));
 }
 
+void
+call_details::check_for_null_terminated_string_arg (unsigned arg_idx) const
+{
+  check_for_null_terminated_string_arg (arg_idx, false, nullptr);
+}
+
 const svalue *
 call_details::
 check_for_null_terminated_string_arg (unsigned arg_idx,
+				      bool include_terminator,
 				      const svalue **out_sval) const
 {
   region_model *model = get_model ();
-  return model->check_for_null_terminated_string_arg (*this, arg_idx, out_sval);
+  return model->check_for_null_terminated_string_arg (*this,
+						      arg_idx,
+						      include_terminator,
+						      out_sval);
 }
 
 } // namespace ana
diff --git a/gcc/analyzer/call-details.h b/gcc/analyzer/call-details.h
index 58b5ccd2acde..ae528e4ab116 100644
--- a/gcc/analyzer/call-details.h
+++ b/gcc/analyzer/call-details.h
@@ -72,9 +72,12 @@ public:
 
   tree lookup_function_attribute (const char *attr_name) const;
 
+  void
+  check_for_null_terminated_string_arg (unsigned arg_idx) const;
   const svalue *
   check_for_null_terminated_string_arg (unsigned arg_idx,
-					const svalue **out_sval = nullptr) const;
+					bool include_terminator,
+					const svalue **out_sval) const;
 
 private:
   const gcall *m_call;
diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index 3eddbe200387..36d9d10bb013 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -1106,6 +1106,61 @@ public:
   /* Currently a no-op.  */
 };
 
+/* Handler for "strcat" and "__builtin_strcat_chk".  */
+
+class kf_strcat : public known_function
+{
+public:
+  kf_strcat (unsigned int num_args) : m_num_args (num_args) {}
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == m_num_args
+	    && cd.arg_is_pointer_p (0)
+	    && cd.arg_is_pointer_p (1));
+  }
+
+  void impl_call_pre (const call_details &cd) const final override
+  {
+    region_model *model = cd.get_model ();
+    region_model_manager *mgr = cd.get_manager ();
+
+    const svalue *dest_sval = cd.get_arg_svalue (0);
+    const region *dest_reg = model->deref_rvalue (dest_sval, cd.get_arg_tree (0),
+						  cd.get_ctxt ());
+
+    const svalue *dst_strlen_sval
+      = cd.check_for_null_terminated_string_arg (0, false, nullptr);
+    if (!dst_strlen_sval)
+      {
+	if (cd.get_ctxt ())
+	  cd.get_ctxt ()->terminate_path ();
+	return;
+      }
+
+    const svalue *bytes_to_copy;
+    const svalue *num_src_bytes_read_sval
+      = cd.check_for_null_terminated_string_arg (1, true, &bytes_to_copy);
+    if (!num_src_bytes_read_sval)
+      {
+	if (cd.get_ctxt ())
+	  cd.get_ctxt ()->terminate_path ();
+	return;
+      }
+
+    cd.maybe_set_lhs (dest_sval);
+
+    const region *offset_reg
+      = mgr->get_offset_region (dest_reg, NULL_TREE, dst_strlen_sval);
+    model->write_bytes (offset_reg,
+			num_src_bytes_read_sval,
+			bytes_to_copy,
+			cd.get_ctxt ());
+  }
+
+private:
+  unsigned int m_num_args;
+};
+
 /* Handler for "strcpy" and "__builtin_strcpy_chk".  */
 
 class kf_strcpy : public known_function
@@ -1139,7 +1194,7 @@ kf_strcpy::impl_call_pre (const call_details &cd) const
 
   const svalue *bytes_to_copy;
   if (const svalue *num_bytes_read_sval
-	= cd.check_for_null_terminated_string_arg (1, &bytes_to_copy))
+      = cd.check_for_null_terminated_string_arg (1, true, &bytes_to_copy))
     {
       model->write_bytes (dest_reg, num_bytes_read_sval, bytes_to_copy, ctxt);
     }
@@ -1188,16 +1243,10 @@ public:
   }
   void impl_call_pre (const call_details &cd) const final override
   {
-    if (const svalue *bytes_read = cd.check_for_null_terminated_string_arg (0))
-      if (bytes_read->get_kind () != SK_UNKNOWN)
+    if (const svalue *strlen_sval
+	  = cd.check_for_null_terminated_string_arg (0, false, nullptr))
+      if (strlen_sval->get_kind () != SK_UNKNOWN)
 	{
-	  region_model_manager *mgr = cd.get_manager ();
-	  /* strlen is (bytes_read - 1).  */
-	  const svalue *one = mgr->get_or_create_int_cst (size_type_node, 1);
-	  const svalue *strlen_sval = mgr->get_or_create_binop (size_type_node,
-								MINUS_EXPR,
-								bytes_read,
-								one);
 	  cd.maybe_set_lhs (strlen_sval);
 	  return;
 	}
@@ -1415,6 +1464,8 @@ register_known_functions (known_function_manager &kfm)
     kfm.add (BUILT_IN_SPRINTF, make_unique<kf_sprintf> ());
     kfm.add (BUILT_IN_STACK_RESTORE, make_unique<kf_stack_restore> ());
     kfm.add (BUILT_IN_STACK_SAVE, make_unique<kf_stack_save> ());
+    kfm.add (BUILT_IN_STRCAT, make_unique<kf_strcat> (2));
+    kfm.add (BUILT_IN_STRCAT_CHK, make_unique<kf_strcat> (3));
     kfm.add (BUILT_IN_STRCHR, make_unique<kf_strchr> ());
     kfm.add (BUILT_IN_STRCPY, make_unique<kf_strcpy> (2));
     kfm.add (BUILT_IN_STRCPY_CHK, make_unique<kf_strcpy> (3));
@@ -1429,6 +1480,7 @@ register_known_functions (known_function_manager &kfm)
   /* Known builtins and C standard library functions.  */
   {
     kfm.add ("memset", make_unique<kf_memset> ());
+    kfm.add ("strcat", make_unique<kf_strcat> (2));
     kfm.add ("strdup", make_unique<kf_strdup> ());
     kfm.add ("strndup", make_unique<kf_strndup> ());
   }
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 025b555d7b97..02c073c15bcc 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3679,9 +3679,41 @@ region_model::scan_for_null_terminator (const region *reg,
    - the buffer pointed to has any uninitalized bytes before any 0-terminator
    - any of the reads aren't within the bounds of the underlying base region
 
-   Otherwise, return a svalue for the number of bytes read (strlen + 1),
-   and, if OUT_SVAL is non-NULL, write to *OUT_SVAL with an svalue
-   representing the content of the buffer up to and including the terminator.
+   Otherwise, return a svalue for strlen of the buffer (*not* including
+   the null terminator).
+
+   TODO: we should also complain if:
+   - the pointer is NULL (or could be).  */
+
+void
+region_model::check_for_null_terminated_string_arg (const call_details &cd,
+						    unsigned arg_idx)
+{
+  check_for_null_terminated_string_arg (cd,
+					arg_idx,
+					false, /* include_terminator */
+					nullptr); // out_sval
+}
+
+
+/* Check that argument ARG_IDX (0-based) to the call described by CD
+   is a pointer to a valid null-terminated string.
+
+   Simulate scanning through the buffer, reading until we find a 0 byte
+   (equivalent to calling strlen).
+
+   Complain and return NULL if:
+   - the buffer pointed to isn't null-terminated
+   - the buffer pointed to has any uninitalized bytes before any 0-terminator
+   - any of the reads aren't within the bounds of the underlying base region
+
+   Otherwise, return a svalue.  This will be the number of bytes read
+   (including the null terminator) if INCLUDE_TERMINATOR is true, or strlen
+   of the buffer (not including the null terminator) if it is false.
+
+   Also, when returning an svalue, if OUT_SVAL is non-NULL, write to
+   *OUT_SVAL with an svalue representing the content of the buffer up to
+   and including the terminator.
 
    TODO: we should also complain if:
    - the pointer is NULL (or could be).  */
@@ -3689,6 +3721,7 @@ region_model::scan_for_null_terminator (const region *reg,
 const svalue *
 region_model::check_for_null_terminated_string_arg (const call_details &cd,
 						    unsigned arg_idx,
+						    bool include_terminator,
 						    const svalue **out_sval)
 {
   class null_terminator_check_event : public custom_event
@@ -3786,10 +3819,26 @@ region_model::check_for_null_terminated_string_arg (const call_details &cd,
   const region *buf_reg
     = deref_rvalue (arg_sval, cd.get_arg_tree (arg_idx), &my_ctxt);
 
-  return scan_for_null_terminator (buf_reg,
-				   cd.get_arg_tree (arg_idx),
-				   out_sval,
-				   &my_ctxt);
+  if (const svalue *num_bytes_read_sval
+      = scan_for_null_terminator (buf_reg,
+				  cd.get_arg_tree (arg_idx),
+				  out_sval,
+				  &my_ctxt))
+    {
+      if (include_terminator)
+	return num_bytes_read_sval;
+      else
+	{
+	  /* strlen is (bytes_read - 1).  */
+	  const svalue *one = m_mgr->get_or_create_int_cst (size_type_node, 1);
+	  return m_mgr->get_or_create_binop (size_type_node,
+					     MINUS_EXPR,
+					     num_bytes_read_sval,
+					     one);
+	}
+    }
+  else
+    return nullptr;
 }
 
 /* Remove all bindings overlapping REG within the store.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index b1c705e22c28..40259625fb06 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -519,10 +519,14 @@ class region_model
 			       const svalue *sval_hint,
 			       region_model_context *ctxt) const;
 
+  void
+  check_for_null_terminated_string_arg (const call_details &cd,
+					unsigned idx);
   const svalue *
   check_for_null_terminated_string_arg (const call_details &cd,
 					unsigned idx,
-					const svalue **out_sval = nullptr);
+					bool include_terminator,
+					const svalue **out_sval);
 
 private:
   const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ef3f40989860..209d6b0ce4d3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11109,6 +11109,7 @@ and of the following functions:
 @item @code{siglongjmp}
 @item @code{signal}
 @item @code{sigsetjmp}
+@item @code{strcat}
 @item @code{strchr}
 @item @code{strlen}
 @end itemize
diff --git a/gcc/testsuite/gcc.dg/analyzer/strcat-1.c b/gcc/testsuite/gcc.dg/analyzer/strcat-1.c
new file mode 100644
index 000000000000..e3b698ae73d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/strcat-1.c
@@ -0,0 +1,136 @@
+/* See e.g. https://en.cppreference.com/w/c/string/byte/strcat */
+
+#include "analyzer-decls.h"
+
+char *strcat (char *dest, const char *src);
+#define NULL ((void *)0)
+
+char *
+test_passthrough (char *dest, const char *src)
+{
+  return strcat (dest, src);
+}
+
+char *
+test_null_dest (const char *src)
+{
+  return strcat (NULL, src); /* { dg-warning "use of NULL where non-null expected" } */
+}
+
+char *
+test_null_src (char *dest)
+{
+  return strcat (dest, NULL); /* { dg-warning "use of NULL where non-null expected" } */
+}
+
+char *
+test_uninit_dest (const char *src)
+{
+  char dest[10];
+  return strcat (dest, src); /* { dg-warning "use of uninitialized value 'dest\\\[0\\\]'" } */
+}
+
+char *
+test_uninit_src (char *dest)
+{
+  const char src[10];
+  return strcat (dest, src); /* { dg-warning "use of uninitialized value 'src\\\[0\\\]'" } */
+}
+
+char *
+test_dest_not_terminated (char *src)
+{
+  char dest[3] = "foo";
+  return strcat (dest, src); /* { dg-warning "stack-based buffer over-read" } */
+  /* { dg-message "while looking for null terminator for argument 1 \\('&dest'\\) of 'strcat'" "" { target *-*-* } .-1 } */
+}
+
+char *
+test_src_not_terminated (char *dest)
+{
+  const char src[3] = "foo";
+  return strcat (dest, src); /* { dg-warning "stack-based buffer over-read" } */
+  /* { dg-message "while looking for null terminator for argument 2 \\('&src'\\) of 'strcat'" "" { target *-*-* } .-1 } */
+}
+
+char * __attribute__((noinline))
+call_strcat (char *dest, const char *src)
+{
+  return strcat (dest, src);
+}
+
+void
+test_concrete_valid_static_size (void)
+{
+  char buf[16];
+  char *p1 = __builtin_strcpy (buf, "abc");
+  char *p2 = call_strcat (buf, "def");
+  __analyzer_eval (p1 == buf); /* { dg-warning "TRUE" } */
+  __analyzer_eval (p2 == buf); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[0] == 'a'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[1] == 'b'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[2] == 'c'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[3] == 'd'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[4] == 'e'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[5] == 'f'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[6] == '\0'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf) == 6);  /* { dg-warning "TRUE" } */
+}
+
+void
+test_concrete_valid_static_size_2 (void)
+{
+  char buf[16];
+  char *p1 = __builtin_strcpy (buf, "abc");
+  char *p2 = call_strcat (buf, "def");
+  char *p3 = call_strcat (buf, "ghi");
+  __analyzer_eval (p1 == buf); /* { dg-warning "TRUE" } */
+  __analyzer_eval (p2 == buf); /* { dg-warning "TRUE" } */
+  __analyzer_eval (p3 == buf); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[0] == 'a'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[1] == 'b'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[2] == 'c'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[3] == 'd'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[4] == 'e'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[5] == 'f'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[6] == 'g'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[7] == 'h'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[8] == 'i'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[9] == '\0'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf) == 9);  /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf + 1) == 8);  /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf + 2) == 7);  /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf + 3) == 6);  /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf + 4) == 5);  /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf + 5) == 4);  /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf + 6) == 3);  /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf + 7) == 2);  /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf + 8) == 1);  /* { dg-warning "TRUE" } */
+  __analyzer_eval (__builtin_strlen (buf + 9) == 0);  /* { dg-warning "TRUE" } */
+}
+
+char * __attribute__((noinline))
+call_strcat_invalid (char *dest, const char *src)
+{
+  return strcat (dest, src); /* { dg-warning "stack-based buffer overflow" } */
+}
+
+void
+test_concrete_invalid_static_size (void)
+{
+  char buf[3];
+  buf[0] = '\0';
+  call_strcat_invalid (buf, "abc");
+}
+
+void
+test_concrete_symbolic (const char *suffix)
+{
+  char buf[10];
+  buf[0] = '\0';
+  call_strcat (buf, suffix);
+}
+
+/* TODO:
+     - "The behavior is undefined if the strings overlap."
+*/
-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-08-24 14:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-24 14:38 [pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899] David Malcolm
2023-08-24 14:38 ` [PATCH 1/9] analyzer: add logging to impl_path_context David Malcolm
2023-08-24 14:38 ` [PATCH 2/9] analyzer: handle symbolic bindings in scan_for_null_terminator [PR105899] David Malcolm
2023-08-24 14:38 ` [PATCH 3/9] analyzer: reimplement kf_strcpy [PR105899] David Malcolm
2023-08-24 14:38 ` [PATCH 4/9] analyzer: eliminate region_model::get_string_size [PR105899] David Malcolm
2023-08-24 14:38 ` [PATCH 5/9] analyzer: reimplement kf_memcpy_memmove David Malcolm
2023-08-24 14:39 ` [PATCH 6/9] analyzer: handle strlen(INIT_VAL(STRING_REG)) [PR105899] David Malcolm
2023-08-24 14:39 ` [PATCH 7/9] analyzer: handle INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL) [PR105899] David Malcolm
2023-08-24 14:39 ` [PATCH 8/9] analyzer: handle strlen(BITS_WITHIN) [PR105899] David Malcolm
2023-08-24 14:39 ` [PATCH 9/9] analyzer: implement kf_strcat [PR105899] David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).