public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Tim Lange <mail@tim-lange.me>
To: dmalcolm@redhat.com
Cc: gcc-patches@gcc.gnu.org, Tim Lange <mail@tim-lange.me>
Subject: [PATCH 2/2 v2] analyzer: strcpy semantics
Date: Sun,  4 Sep 2022 21:17:25 +0200	[thread overview]
Message-ID: <20220904191725.81158-1-mail@tim-lange.me> (raw)
In-Reply-To: <e434af61fcc629aa28a10e0878040c83f4f887be.camel@redhat.com>

Hi Dave,

sorry about the strncpy thing, I should've been more careful. Below is the
patch with just the strcpy part.

- Tim

This patch adds modelling for the semantics of strcpy in the simple case
where the analyzer is able to infer a concrete string size.

Regrtested on Linux x86_64.

2022-09-04  Tim Lange  <mail@tim-lange.me>

gcc/analyzer/ChangeLog:

	* region-model-impl-calls.cc (region_model::impl_call_strcpy):
	Handle the constant string case.
	* region-model.cc (region_model::get_string_size):
	New function to get the string size from a region or svalue.
	* region-model.h (class region_model): Add get_string_size.

gcc/testsuite/ChangeLog:

	* gcc.dg/analyzer/out-of-bounds-4.c: New test.
	* gcc.dg/analyzer/strcpy-3.c: New test.

---
 gcc/analyzer/region-model-impl-calls.cc       | 16 ++++-
 gcc/analyzer/region-model.cc                  | 29 +++++++++
 gcc/analyzer/region-model.h                   |  3 +
 .../gcc.dg/analyzer/out-of-bounds-4.c         | 65 +++++++++++++++++++
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c      | 23 +++++++
 5 files changed, 133 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c

diff --git a/gcc/analyzer/region-model-impl-calls.cc b/gcc/analyzer/region-model-impl-calls.cc
index 8eebd122d42..3790eaf2d79 100644
--- a/gcc/analyzer/region-model-impl-calls.cc
+++ b/gcc/analyzer/region-model-impl-calls.cc
@@ -1019,13 +1019,23 @@ region_model::impl_call_strcpy (const call_details &cd)
   const svalue *dest_sval = cd.get_arg_svalue (0);
   const region *dest_reg = deref_rvalue (dest_sval, cd.get_arg_tree (0),
 					 cd.get_ctxt ());
+  const svalue *src_sval = cd.get_arg_svalue (1);
+  const region *src_reg = deref_rvalue (src_sval, cd.get_arg_tree (1),
+					cd.get_ctxt ());
+  const svalue *src_contents_sval = get_store_value (src_reg,
+						     cd.get_ctxt ());
 
   cd.maybe_set_lhs (dest_sval);
 
-  check_region_for_write (dest_reg, cd.get_ctxt ());
+  /* Try to get the string size if SRC_REG is a string_region.  */
+  const svalue *copied_bytes_sval = get_string_size (src_reg);
+  /* Otherwise, check if the contents of SRC_REG is a string.  */
+  if (copied_bytes_sval->get_kind () == SK_UNKNOWN)
+    copied_bytes_sval = get_string_size (src_contents_sval);
 
-  /* For now, just mark region's contents as unknown.  */
-  mark_region_as_unknown (dest_reg, cd.get_uncertainty ());
+  const region *sized_dest_reg
+    = m_mgr->get_sized_region (dest_reg, NULL_TREE, copied_bytes_sval);
+  set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
 }
 
 /* Handle the on_call_pre part of "strlen".  */
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index ec29be259b5..4ec18c86774 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3218,6 +3218,35 @@ region_model::get_capacity (const region *reg) const
   return m_mgr->get_or_create_unknown_svalue (sizetype);
 }
 
+/* Return the string size, including the 0-terminator, if SVAL is a
+   constant_svalue holding a string.  Otherwise, return an unknown_svalue.  */
+
+const svalue *
+region_model::get_string_size (const svalue *sval) const
+{
+  tree cst = sval->maybe_get_constant ();
+  if (!cst || TREE_CODE (cst) != STRING_CST)
+    return m_mgr->get_or_create_unknown_svalue (size_type_node);
+
+  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
+  return m_mgr->get_or_create_constant_svalue (out);
+}
+
+/* Return the string size, including the 0-terminator, if REG is a
+   string_region.  Otherwise, return an unknown_svalue.  */
+
+const svalue *
+region_model::get_string_size (const region *reg) const
+{
+  const string_region *str_reg = dyn_cast <const string_region *> (reg);
+  if (!str_reg)
+    return m_mgr->get_or_create_unknown_svalue (size_type_node);
+
+  tree cst = str_reg->get_string_cst ();
+  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
+  return m_mgr->get_or_create_constant_svalue (out);
+}
+
 /* If CTXT is non-NULL, use it to warn about any problems accessing REG,
    using DIR to determine if this access is a read or write.  */
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 7ce832f6ce4..a1f2165e145 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -793,6 +793,9 @@ class region_model
 
   const svalue *get_capacity (const region *reg) const;
 
+  const svalue *get_string_size (const svalue *sval) const;
+  const svalue *get_string_size (const region *reg) const;
+
   /* Implemented in sm-malloc.cc  */
   void on_realloc_with_move (const call_details &cd,
 			     const svalue *old_ptr_sval,
diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
new file mode 100644
index 00000000000..46f600de658
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-4.c
@@ -0,0 +1,65 @@
+/* { dg-additional-options "-Wno-stringop-overflow -Wno-stringop-truncation" } */
+#include <string.h>
+
+/* Wanalyzer-out-of-bounds tests for strpy-related overflows.
+  
+   The intra-procedural tests are all caught by Wstringop-overflow.
+   The inter-procedural out-of-bounds are only found by the analyzer.  */
+
+void test1 (void)
+{
+  char dst[5];
+  strcpy (dst, "Hello"); /* { dg-line test1 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test1 } */
+  /* { dg-message "dst" "note" { target *-*-* } test1 } */
+}
+
+void test2 (void)
+{
+  char dst[6];
+  strcpy (dst, "Hello");
+}
+
+void test3 (void)
+{
+  char *src = "Hello";
+  char dst[5];
+  strcpy (dst, src); /* { dg-line test3 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test3 } */
+  /* { dg-message "dst" "note" { target *-*-* } test3 } */
+}
+
+void test4 (void)
+{
+  char *src = "Hello";
+  char dst[6];
+  strcpy (dst, src);
+}
+
+const char *return_hello (void)
+{
+  return "hello";
+}
+
+void test5 (void)
+{
+  const char *str = return_hello ();
+  if (!str)
+    return;
+  char dst[5];
+  strcpy (dst, str); /* { dg-line test5 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test5 } */
+  /* { dg-message "dst" "note" { target *-*-* } test5 } */
+}
+
+void test6 (void)
+{
+  const char *str = return_hello ();
+  if (!str)
+    return;
+  char dst[6];
+  strcpy (dst, str);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
new file mode 100644
index 00000000000..a38f9a7641f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
@@ -0,0 +1,23 @@
+#include <string.h>
+#include "analyzer-decls.h"
+
+void test_1 (void)
+{
+  char str[] = "Hello";
+  char buf[6];
+  char *result = strcpy (buf, str);
+  __analyzer_describe (1, result); /* { dg-warning "region_svalue.*?'buf'" } */
+  __analyzer_eval (result == buf); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[0] == 'H'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[1] == 'e'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[2] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[3] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[4] == 'o'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (buf[5] == 0); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[0] == 'H'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[1] == 'e'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[2] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[3] == 'l'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[4] == 'o'); /* { dg-warning "TRUE" } */
+  __analyzer_eval (result[5] == 0); /* { dg-warning "TRUE" } */
+}
-- 
2.37.2


  reply	other threads:[~2022-09-04 19:23 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-02 14:08 [PATCH 1/2] analyzer: return a concrete offset for cast_regions Tim Lange
2022-09-02 14:08 ` [PATCH 2/2] analyzer: strcpy and strncpy semantics Tim Lange
2022-09-02 15:22   ` David Malcolm
2022-09-04 19:17     ` Tim Lange [this message]
2022-09-04 22:26       ` [PATCH 2/2 v2] analyzer: strcpy semantics David Malcolm
2022-09-02 14:36 ` [PATCH 1/2] analyzer: return a concrete offset for cast_regions David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220904191725.81158-1-mail@tim-lange.me \
    --to=mail@tim-lange.me \
    --cc=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).