public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed
* [gcc r14-5591] analyzer: new warning: -Wanalyzer-undefined-behavior-strtok [PR107573]
@ 2023-11-19  1:40 David Malcolm
  0 siblings, 0 replies; only message in thread
From: David Malcolm @ 2023-11-19  1:40 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:f65f63c4d86a48be042a3ad242fffe5fe8347ff0

commit r14-5591-gf65f63c4d86a48be042a3ad242fffe5fe8347ff0
Author: David Malcolm <dmalcolm@redhat.com>
Date:   Sat Nov 18 20:35:59 2023 -0500

    analyzer: new warning: -Wanalyzer-undefined-behavior-strtok [PR107573]
    
    This patch:
    - adds support to the analyzer for tracking API-private state
      or which we don't have a decl (such as strtok's internal state),
    - uses it to implement a new -Wanalyzer-undefined-behavior-strtok which
      warns when strtok (NULL, delim) is called as the first call to
      strtok after main.
    
    gcc/analyzer/ChangeLog:
            PR analyzer/107573
            * analyzer.h (register_known_functions): Add region_model_manager
            param.
            * analyzer.opt (Wanalyzer-undefined-behavior-strtok): New.
            * call-summary.cc
            (call_summary_replay::convert_region_from_summary_1): Handle
            RK_PRIVATE.
            * engine.cc (impl_run_checkers): Pass model manager to
            register_known_functions.
            * kf.cc (class undefined_function_behavior): New.
            (class kf_strtok): New.
            (register_known_functions): Add region_model_manager param.
            Use it to register "strtok".
            * region-model-manager.cc
            (region_model_manager::get_or_create_conjured_svalue): Add "idx"
            param.
            * region-model-manager.h
            (region_model_manager::get_or_create_conjured_svalue): Add "idx"
            param.
            (region_model_manager::get_root_region): New accessor.
            * region-model.cc (region_model::scan_for_null_terminator): Handle
            "expr" being null.
            (region_model::get_representative_path_var_1): Handle RK_PRIVATE.
            * region-model.h (region_model::called_from_main_p): Make public.
            * region.cc (region::get_memory_space): Handle RK_PRIVATE.
            (region::can_have_initial_svalue_p): Handle MEMSPACE_PRIVATE.
            (private_region::dump_to_pp): New.
            * region.h (MEMSPACE_PRIVATE): New.
            (RK_PRIVATE): New.
            (class private_region): New.
            (is_a_helper <const private_region *>::test): New.
            * store.cc (store::replay_call_summary_cluster): Handle
            RK_PRIVATE.
            * svalue.h (struct conjured_svalue::key_t): Add "idx" param to
            ctor and "m_idx" field.
            (class conjured_svalue::conjured_svalue): Likewise.
    
    gcc/ChangeLog:
            PR analyzer/107573
            * doc/invoke.texi: Add -Wanalyzer-undefined-behavior-strtok.
    
    gcc/testsuite/ChangeLog:
            PR analyzer/107573
            * c-c++-common/analyzer/strtok-1.c: New test.
            * c-c++-common/analyzer/strtok-2.c: New test.
            * c-c++-common/analyzer/strtok-3.c: New test.
            * c-c++-common/analyzer/strtok-4.c: New test.
            * c-c++-common/analyzer/strtok-cppreference.c: New test.
    
    Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Diff:
---
 gcc/analyzer/analyzer.h                            |   3 +-
 gcc/analyzer/analyzer.opt                          |   4 +
 gcc/analyzer/call-summary.cc                       |   1 +
 gcc/analyzer/engine.cc                             |   3 +-
 gcc/analyzer/kf.cc                                 | 320 ++++++++++++++++++++-
 gcc/analyzer/region-model-manager.cc               |  10 +-
 gcc/analyzer/region-model-manager.h                |   4 +-
 gcc/analyzer/region-model.cc                       |   7 +-
 gcc/analyzer/region-model.h                        |   3 +-
 gcc/analyzer/region.cc                             |  14 +
 gcc/analyzer/region.h                              |  41 ++-
 gcc/analyzer/store.cc                              |   1 +
 gcc/analyzer/svalue.h                              |  13 +-
 gcc/doc/invoke.texi                                |  14 +
 gcc/testsuite/c-c++-common/analyzer/strtok-1.c     |  62 ++++
 gcc/testsuite/c-c++-common/analyzer/strtok-2.c     |  18 ++
 gcc/testsuite/c-c++-common/analyzer/strtok-3.c     |  26 ++
 gcc/testsuite/c-c++-common/analyzer/strtok-4.c     |  42 +++
 .../c-c++-common/analyzer/strtok-cppreference.c    |  50 ++++
 19 files changed, 619 insertions(+), 17 deletions(-)

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 777293ff4b9..f08572bb633 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -326,7 +326,8 @@ public:
   void impl_call_pre (const call_details &cd) const override;
 };
 
-extern void register_known_functions (known_function_manager &mgr);
+extern void register_known_functions (known_function_manager &kfm,
+				      region_model_manager &rmm);
 extern void register_known_analyzer_functions (known_function_manager &kfm);
 extern void register_known_fd_functions (known_function_manager &kfm);
 extern void register_known_file_functions (known_function_manager &kfm);
diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index fae2649389a..a3c30caf2ab 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -222,6 +222,10 @@ Wanalyzer-tainted-size
 Common Var(warn_analyzer_tainted_size) Init(1) Warning
 Warn about code paths in which an unsanitized value is used as a size.
 
+Wanalyzer-undefined-behavior-strtok
+Common Var(warn_analyzer_undefined_behavior_strtok) Init(1) Warning
+Warn about code paths in in which a call is made to strtok with undefined behavior.
+
 Wanalyzer-use-after-free
 Common Var(warn_analyzer_use_after_free) Init(1) Warning
 Warn about code paths in which a freed value is used.
diff --git a/gcc/analyzer/call-summary.cc b/gcc/analyzer/call-summary.cc
index a18a1b1b40a..ecb6fb13c9e 100644
--- a/gcc/analyzer/call-summary.cc
+++ b/gcc/analyzer/call-summary.cc
@@ -585,6 +585,7 @@ call_summary_replay::convert_region_from_summary_1 (const region *summary_reg)
     case RK_STRING:
     case RK_ERRNO:
     case RK_UNKNOWN:
+    case RK_PRIVATE:
       /* We can reuse these regions directly.  */
       return summary_reg;
 
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 041fe6a1d40..b4e855fcf24 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -6174,7 +6174,8 @@ impl_run_checkers (logger *logger)
   auto_delete_vec <state_machine> checkers;
   make_checkers (checkers, logger);
 
-  register_known_functions (*eng.get_known_function_manager ());
+  register_known_functions (*eng.get_known_function_manager (),
+			    *eng.get_model_manager ());
 
   plugin_analyzer_init_impl data (&checkers,
 				  eng.get_known_function_manager (),
diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index 92959891fe4..5d8e04d9726 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -40,6 +40,40 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace ana {
 
+/* Abstract subclass for describing undefined behavior of an API.  */
+
+class undefined_function_behavior
+  : public pending_diagnostic_subclass<undefined_function_behavior>
+{
+public:
+  undefined_function_behavior (const call_details &cd)
+  : m_call_stmt (cd.get_call_stmt ()),
+    m_callee_fndecl (cd.get_fndecl_for_call ())
+  {
+    gcc_assert (m_call_stmt);
+    gcc_assert (m_callee_fndecl);
+  }
+
+  const char *get_kind () const final override
+  {
+    return "undefined_behavior";
+  }
+
+  bool operator== (const undefined_function_behavior &other) const
+  {
+    return (m_call_stmt == other.m_call_stmt
+	    && m_callee_fndecl == other.m_callee_fndecl);
+  }
+
+  bool terminate_path_p () const final override { return true; }
+
+  tree get_callee_fndecl () const { return m_callee_fndecl; }
+
+private:
+  const gimple *m_call_stmt;
+  tree m_callee_fndecl;
+};
+
 /* class pure_known_function_with_default_return : public known_function.  */
 
 void
@@ -1679,6 +1713,288 @@ kf_strstr::impl_call_post (const call_details &cd) const
     }
 }
 
+/* Handle calls to "strtok".
+   See e.g.
+     https://en.cppreference.com/w/c/string/byte/strtok
+     https://man7.org/linux/man-pages/man3/strtok.3.html  */
+
+class kf_strtok : public known_function
+{
+public:
+  class undefined_behavior : public undefined_function_behavior
+  {
+  public:
+    undefined_behavior (const call_details &cd)
+    : undefined_function_behavior (cd)
+    {
+    }
+    int get_controlling_option () const final override
+    {
+      return OPT_Wanalyzer_undefined_behavior_strtok;
+    }
+
+    bool emit (rich_location *rich_loc, logger *) final override
+    {
+      /* CWE-476: NULL Pointer Dereference.  */
+      diagnostic_metadata m;
+      m.add_cwe (476);
+      if (warning_meta
+	  (rich_loc, m, get_controlling_option (),
+	   "calling %qD for first time with NULL as argument 1"
+	   " has undefined behavior",
+	   get_callee_fndecl ()))
+	{
+	  inform (rich_loc->get_loc (),
+		  "some implementations of %qD may crash on such input",
+		  get_callee_fndecl ());
+	  return true;
+	}
+      return false;
+    }
+
+    label_text describe_final_event (const evdesc::final_event &ev)
+      final override
+    {
+      return ev.formatted_print
+	("calling %qD for first time with NULL as argument 1"
+	 " has undefined behavior",
+	 get_callee_fndecl ());
+    }
+  };
+
+  /* An outcome of a "strtok" call.
+     We have a four-way bifurcation of the analysis via the
+     4 combinations of two flags:
+     - m_nonnull_str covers whether the "str" param was null or non-null
+     - m_found covers whether the result is null or non-null
+   */
+  class strtok_call_info : public call_info
+  {
+  public:
+    strtok_call_info (const call_details &cd,
+		      const private_region &private_reg,
+		      bool nonnull_str,
+		      bool found)
+    : call_info (cd),
+      m_private_reg (private_reg),
+      m_nonnull_str (nonnull_str),
+      m_found (found)
+    {
+    }
+
+    label_text get_desc (bool can_colorize) const final override
+    {
+      if (m_nonnull_str)
+	{
+	  if (m_found)
+	    return make_label_text
+	      (can_colorize,
+	       "when %qE on non-NULL string returns non-NULL",
+	       get_fndecl ());
+	  else
+	    return make_label_text
+	      (can_colorize,
+	       "when %qE on non-NULL string returns NULL",
+	       get_fndecl ());
+	}
+      else
+	{
+	  if (m_found)
+	    return make_label_text
+	      (can_colorize,
+	       "when %qE with NULL string (using prior) returns non-NULL",
+	       get_fndecl ());
+	  else
+	    return make_label_text
+	      (can_colorize,
+	       "when %qE with NULL string (using prior) returns NULL",
+	       get_fndecl ());
+	}
+    }
+
+    bool update_model (region_model *model,
+		       const exploded_edge *,
+		       region_model_context *ctxt) const final override
+    {
+      region_model_manager *mgr = model->get_manager ();
+      const call_details cd (get_call_details (model, ctxt));
+      const svalue *str_sval = cd.get_arg_svalue (0);
+      /* const svalue *delim_sval = cd.get_arg_svalue (1); */
+
+      cd.check_for_null_terminated_string_arg (1);
+      /* We check that either arg 0 or the private region is null
+	 terminated below.  */
+
+      const svalue *null_ptr_sval
+	= mgr->get_or_create_null_ptr (cd.get_arg_type (0));;
+      if (!model->add_constraint (str_sval,
+				  m_nonnull_str ? NE_EXPR : EQ_EXPR,
+				  null_ptr_sval,
+				  cd.get_ctxt ()))
+	return false;
+
+      if (m_nonnull_str)
+	{
+	  /* Update internal buffer.  */
+	  model->set_value (&m_private_reg,
+			    mgr->get_or_create_unmergeable (str_sval),
+			    ctxt);
+	}
+      else
+	{
+	  /* Read from internal buffer.  */
+	  str_sval = model->get_store_value (&m_private_reg, ctxt);
+
+	  /* The initial value of the private region is NULL when we're
+	     on a path from main.  */
+	  if (const initial_svalue *initial_sval
+		= str_sval->dyn_cast_initial_svalue ())
+	    if (initial_sval->get_region () == &m_private_reg
+		&& model->called_from_main_p ())
+	      {
+		/* Implementations of strtok do not necessarily check for NULL
+		   here, and may crash; see PR analyzer/107573.
+		   Warn for this, if we were definitely passed NULL.  */
+		if (cd.get_arg_svalue (0)->all_zeroes_p ())
+		  {
+		    if (ctxt)
+		      ctxt->warn (::make_unique<undefined_behavior> (cd));
+		  }
+
+		/* Assume that "str" was actually non-null; terminate
+		   this path.  */
+		return false;
+	      }
+
+	  /* Now assume str_sval is non-null.  */
+	  if (!model->add_constraint (str_sval,
+				      NE_EXPR,
+				      null_ptr_sval,
+				      cd.get_ctxt ()))
+	    return false;
+	}
+
+      const region *buf_reg = model->deref_rvalue (str_sval, NULL_TREE, ctxt);
+      model->scan_for_null_terminator (buf_reg,
+				       NULL_TREE,
+				       nullptr,
+				       ctxt);
+
+      if (m_found)
+	{
+	  const region *str_reg
+	    = model->deref_rvalue (str_sval, cd.get_arg_tree (0),
+				   cd.get_ctxt ());
+	  /* We want to figure out the start and nul terminator
+	     for the token.
+	     For each, we want str_sval + OFFSET for some unknown OFFSET.
+	     Use a conjured_svalue to represent the offset,
+	     using the str_reg as the id of the conjured_svalue.  */
+	  const svalue *start_offset
+	    = mgr->get_or_create_conjured_svalue (size_type_node,
+						  cd.get_call_stmt (),
+						  str_reg,
+						  conjured_purge (model,
+								  ctxt),
+						  0);
+	  const svalue *nul_offset
+	    = mgr->get_or_create_conjured_svalue (size_type_node,
+						  cd.get_call_stmt (),
+						  str_reg,
+						  conjured_purge (model,
+								  ctxt),
+						  1);
+
+	  tree char_ptr_type = build_pointer_type (char_type_node);
+	  const svalue *result
+	    = mgr->get_or_create_binop (char_ptr_type, POINTER_PLUS_EXPR,
+					str_sval, start_offset);
+	  cd.maybe_set_lhs (result);
+
+	  /* nul_offset + 1; the offset to use for the next call.  */
+	  const svalue *next_offset
+	    = mgr->get_or_create_binop (size_type_node, PLUS_EXPR,
+					nul_offset,
+					mgr->get_or_create_int_cst
+					(char_type_node, 1));
+
+	  /* Write '\0' to str_sval[nul_offset].  */
+	  const svalue *ptr_to_term
+	    = mgr->get_or_create_binop (char_ptr_type, POINTER_PLUS_EXPR,
+					str_sval, nul_offset);
+	  const region *terminator_reg
+	    = model->deref_rvalue (ptr_to_term, NULL_TREE, cd.get_ctxt ());
+	  model->set_value (terminator_reg,
+			    mgr->get_or_create_unmergeable
+			    (mgr->get_or_create_int_cst (char_type_node,
+							 0)),
+			    cd.get_ctxt ());
+
+	  /* Update saved ptr to be at [nul_offset + 1].  */
+	  const svalue *ptr_to_next
+	    = mgr->get_or_create_binop (cd.get_lhs_type (), POINTER_PLUS_EXPR,
+					str_sval, next_offset);
+	  model->set_value (&m_private_reg, ptr_to_next, ctxt);
+	}
+      else
+	if (tree lhs_type = cd.get_lhs_type ())
+	  {
+	    const svalue *result
+	      = mgr->get_or_create_int_cst (lhs_type, 0);
+	    cd.maybe_set_lhs (result);
+	  }
+      return true;
+    }
+  private:
+    const private_region &m_private_reg;
+    bool m_nonnull_str;
+    bool m_found;
+  }; // class strtok_call_info
+
+  kf_strtok (region_model_manager &mgr)
+  : m_private_reg (mgr.alloc_symbol_id (),
+		   mgr.get_root_region (),
+		   get_region_type (),
+		   "strtok buffer")
+  {
+  }
+
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == 2
+	    && POINTER_TYPE_P (cd.get_arg_type (0))
+	    && POINTER_TYPE_P (cd.get_arg_type (1)));
+  }
+
+  void impl_call_post (const call_details &cd) const final override
+  {
+    if (cd.get_ctxt ())
+      {
+	/* Four-way bifurcation, based on whether:
+	   - the str is non-null
+	   - the result is non-null
+	   Typically the str is either null or non-null at a particular site,
+	   so hopefully this will generally just lead to two out-edges.  */
+	cd.get_ctxt ()->bifurcate
+	  (make_unique<strtok_call_info> (cd, m_private_reg, false, false));
+	cd.get_ctxt ()->bifurcate
+	  (make_unique<strtok_call_info> (cd, m_private_reg, false, true));
+	cd.get_ctxt ()->bifurcate
+	  (make_unique<strtok_call_info> (cd, m_private_reg, true, false));
+	cd.get_ctxt ()->bifurcate
+	  (make_unique<strtok_call_info> (cd, m_private_reg, true, true));
+	cd.get_ctxt ()->terminate_path ();
+      }
+  }
+
+private:
+  static tree get_region_type ()
+  {
+    return build_pointer_type (char_type_node);
+  }
+  const private_region m_private_reg;
+};
+
 class kf_ubsan_bounds : public internal_known_function
 {
   /* Empty.  */
@@ -1823,7 +2139,8 @@ register_atomic_builtins (known_function_manager &kfm)
    analyzer (as opposed to plugins).  */
 
 void
-register_known_functions (known_function_manager &kfm)
+register_known_functions (known_function_manager &kfm,
+			  region_model_manager &rmm)
 {
   /* Debugging/test support functions, all  with a "__analyzer_" prefix.  */
   register_known_analyzer_functions (kfm);
@@ -1911,6 +2228,7 @@ register_known_functions (known_function_manager &kfm)
   {
     kfm.add ("fopen", make_unique<kf_fopen> ());
     kfm.add ("putenv", make_unique<kf_putenv> ());
+    kfm.add ("strtok", make_unique<kf_strtok> (rmm));
 
     register_known_fd_functions (kfm);
     register_known_file_functions (kfm);
diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc
index 22246876f8f..921edc55868 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -1261,7 +1261,8 @@ conjured_purge::purge (const conjured_svalue *sval) const
 }
 
 /* Return the svalue * of type TYPE for the value conjured for ID_REG
-   at STMT, creating it if necessary.
+   at STMT (using IDX for any further disambiguation),
+   creating it if necessary.
    Use P to purge existing state from the svalue, for the case where a
    conjured_svalue would be reused along an execution path.  */
 
@@ -1269,9 +1270,10 @@ const svalue *
 region_model_manager::get_or_create_conjured_svalue (tree type,
 						     const gimple *stmt,
 						     const region *id_reg,
-						     const conjured_purge &p)
+						     const conjured_purge &p,
+						     unsigned idx)
 {
-  conjured_svalue::key_t key (type, stmt, id_reg);
+  conjured_svalue::key_t key (type, stmt, id_reg, idx);
   if (conjured_svalue **slot = m_conjured_values_map.get (key))
     {
       const conjured_svalue *sval = *slot;
@@ -1283,7 +1285,7 @@ region_model_manager::get_or_create_conjured_svalue (tree type,
       return sval;
     }
   conjured_svalue *conjured_sval
-    = new conjured_svalue (alloc_symbol_id (), type, stmt, id_reg);
+    = new conjured_svalue (alloc_symbol_id (), type, stmt, id_reg, idx);
   RETURN_UNKNOWN_IF_TOO_COMPLEX (conjured_sval);
   m_conjured_values_map.put (key, conjured_sval);
   return conjured_sval;
diff --git a/gcc/analyzer/region-model-manager.h b/gcc/analyzer/region-model-manager.h
index a5281819a69..2ddd7de7ccf 100644
--- a/gcc/analyzer/region-model-manager.h
+++ b/gcc/analyzer/region-model-manager.h
@@ -79,7 +79,8 @@ public:
 					       const binding_map &map);
   const svalue *get_or_create_conjured_svalue (tree type, const gimple *stmt,
 					       const region *id_reg,
-					       const conjured_purge &p);
+					       const conjured_purge &p,
+					       unsigned idx = 0);
   const svalue *
   get_or_create_asm_output_svalue (tree type,
 				   const gasm *asm_stmt,
@@ -105,6 +106,7 @@ public:
   const svalue *create_unique_svalue (tree type);
 
   /* region consolidation.  */
+  const root_region *get_root_region () const { return &m_root_region; }
   const stack_region * get_stack_region () const { return &m_stack_region; }
   const heap_region *get_heap_region () const { return &m_heap_region; }
   const code_region *get_code_region () const { return &m_code_region; }
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 052c47d38f1..420c10380a4 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3884,8 +3884,10 @@ region_model::scan_for_null_terminator (const region *reg,
   byte_range bytes_to_read (src_byte_offset, 1);
   const svalue *sval = get_store_bytes (base_reg, bytes_to_read, ctxt);
   tree byte_expr
-    = get_tree_for_byte_offset (expr,
-				src_byte_offset - initial_src_byte_offset);
+    = (expr
+       ? get_tree_for_byte_offset (expr,
+				   src_byte_offset - initial_src_byte_offset)
+       : NULL_TREE);
   check_for_poison (sval, byte_expr, nullptr, ctxt);
   if (base_reg->can_have_initial_svalue_p ())
     {
@@ -5077,6 +5079,7 @@ region_model::get_representative_path_var_1 (const region *reg,
     case RK_VAR_ARG:
     case RK_ERRNO:
     case RK_UNKNOWN:
+    case RK_PRIVATE:
       return path_var (NULL_TREE, 0);
     }
 }
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index e26d18713b1..2e15924fddb 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -559,6 +559,8 @@ class region_model
 	callback (model, prev_model, retval, ctxt);
   }
 
+  bool called_from_main_p () const;
+
 private:
   const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
   const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
@@ -606,7 +608,6 @@ private:
 			   bool nonnull,
 			   region_model_context *ctxt);
 
-  bool called_from_main_p () const;
   const svalue *get_initial_value_for_global (const region *reg) const;
 
   const region * get_region_for_poisoned_expr (tree expr) const;
diff --git a/gcc/analyzer/region.cc b/gcc/analyzer/region.cc
index 730dab3d707..4feb9721ae7 100644
--- a/gcc/analyzer/region.cc
+++ b/gcc/analyzer/region.cc
@@ -514,6 +514,8 @@ region::get_memory_space () const
 	  return MEMSPACE_HEAP;
 	case RK_STRING:
 	  return MEMSPACE_READONLY_DATA;
+	case RK_PRIVATE:
+	  return MEMSPACE_PRIVATE;
 	}
       if (iter->get_kind () == RK_CAST)
 	iter = iter->dyn_cast_cast_region ()->get_original_region ();
@@ -543,6 +545,7 @@ region::can_have_initial_svalue_p () const
     case MEMSPACE_CODE:
     case MEMSPACE_GLOBALS:
     case MEMSPACE_READONLY_DATA:
+    case MEMSPACE_PRIVATE:
       /* Such regions have initial_svalues.  */
       return true;
 
@@ -2259,6 +2262,17 @@ errno_region::dump_to_pp (pretty_printer *pp, bool simple) const
     pp_string (pp, "errno_region()");
 }
 
+/* class private_region : public region.  */
+
+void
+private_region::dump_to_pp (pretty_printer *pp, bool simple) const
+{
+  if (simple)
+    pp_printf (pp, "PRIVATE_REG(%qs)", m_desc);
+  else
+    pp_printf (pp, "private_region(%qs)", m_desc);
+}
+
 /* class unknown_region : public region.  */
 
 /* Implementation of region::dump_to_pp vfunc for unknown_region.  */
diff --git a/gcc/analyzer/region.h b/gcc/analyzer/region.h
index 47242385fd1..fee161cf863 100644
--- a/gcc/analyzer/region.h
+++ b/gcc/analyzer/region.h
@@ -35,7 +35,8 @@ enum memory_space
   MEMSPACE_STACK,
   MEMSPACE_HEAP,
   MEMSPACE_READONLY_DATA,
-  MEMSPACE_THREAD_LOCAL
+  MEMSPACE_THREAD_LOCAL,
+  MEMSPACE_PRIVATE
 };
 
 /* An enum for discriminating between the different concrete subclasses
@@ -65,6 +66,7 @@ enum region_kind
   RK_BIT_RANGE,
   RK_VAR_ARG,
   RK_ERRNO,
+  RK_PRIVATE,
   RK_UNKNOWN,
 };
 
@@ -108,6 +110,7 @@ enum region_kind
      var_arg_region (RK_VAR_ARG): a region for the N-th vararg within a
 				  frame_region for a variadic call
      errno_region (RK_ERRNO): a region for holding "errno"
+     private_region (RK_PRIVATE): a region for internal state of an API
      unknown_region (RK_UNKNOWN): for handling unimplemented tree codes.  */
 
 /* Abstract base class for representing ways of accessing chunks of memory.
@@ -1434,6 +1437,42 @@ is_a_helper <const errno_region *>::test (const region *reg)
 
 namespace ana {
 
+/* Similar to a decl region, but we don't have the decl.
+   For implementing e.g. static buffers of known_functions,
+   or other internal state of an API.
+
+   These are owned by known_function instances, rather than the
+   region_model_manager.  */
+
+class private_region : public region
+{
+public:
+  private_region (unsigned id, const region *parent, tree type,
+		  const char *desc)
+  : region (complexity (parent), id, parent, type),
+    m_desc (desc)
+  {}
+
+  enum region_kind get_kind () const final override { return RK_PRIVATE; }
+
+  void dump_to_pp (pretty_printer *pp, bool simple) const final override;
+
+private:
+  const char *m_desc;
+};
+
+} // namespace ana
+
+template <>
+template <>
+inline bool
+is_a_helper <const private_region *>::test (const region *reg)
+{
+  return reg->get_kind () == RK_PRIVATE;
+}
+
+namespace ana {
+
 /* An unknown region, for handling unimplemented tree codes.  */
 
 class unknown_region : public region
diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index aeea6931137..602508598cc 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -3372,6 +3372,7 @@ store::replay_call_summary_cluster (call_summary_replay &r,
     case RK_HEAP_ALLOCATED:
     case RK_DECL:
     case RK_ERRNO:
+    case RK_PRIVATE:
       {
 	const region *caller_dest_reg
 	  = r.convert_region_from_summary (summary_base_reg);
diff --git a/gcc/analyzer/svalue.h b/gcc/analyzer/svalue.h
index 263a0d7af6f..9415e980895 100644
--- a/gcc/analyzer/svalue.h
+++ b/gcc/analyzer/svalue.h
@@ -1360,8 +1360,8 @@ public:
   /* A support class for uniquifying instances of conjured_svalue.  */
   struct key_t
   {
-    key_t (tree type, const gimple *stmt, const region *id_reg)
-    : m_type (type), m_stmt (stmt), m_id_reg (id_reg)
+    key_t (tree type, const gimple *stmt, const region *id_reg, unsigned idx)
+    : m_type (type), m_stmt (stmt), m_id_reg (id_reg), m_idx (idx)
     {}
 
     hashval_t hash () const
@@ -1377,7 +1377,8 @@ public:
     {
       return (m_type == other.m_type
 	      && m_stmt == other.m_stmt
-	      && m_id_reg == other.m_id_reg);
+	      && m_id_reg == other.m_id_reg
+	      && m_idx == other.m_idx);
     }
 
     /* Use m_stmt to mark empty/deleted, as m_type can be NULL for
@@ -1393,12 +1394,13 @@ public:
     tree m_type;
     const gimple *m_stmt;
     const region *m_id_reg;
+    unsigned m_idx;
   };
 
   conjured_svalue (symbol::id_t id, tree type, const gimple *stmt,
-		   const region *id_reg)
+		   const region *id_reg, unsigned idx)
   : svalue (complexity (id_reg), id, type),
-    m_stmt (stmt), m_id_reg (id_reg)
+    m_stmt (stmt), m_id_reg (id_reg), m_idx (idx)
   {
     gcc_assert (m_stmt != NULL);
   }
@@ -1419,6 +1421,7 @@ public:
  private:
   const gimple *m_stmt;
   const region *m_id_reg;
+  unsigned m_idx;
 };
 
 } // namespace ana
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 557d613a1e6..1f109679c70 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -490,6 +490,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wno-analyzer-tainted-offset
 -Wno-analyzer-tainted-size
 -Wanalyzer-too-complex
+-Wno-analyzer-undefined-behavior-strtok
 -Wno-analyzer-unsafe-call-within-signal-handler
 -Wno-analyzer-use-after-free
 -Wno-analyzer-use-of-pointer-in-stale-stack-frame
@@ -10424,6 +10425,7 @@ Enabling this option effectively enables the following warnings:
 -Wanalyzer-tainted-divisor
 -Wanalyzer-tainted-offset
 -Wanalyzer-tainted-size
+-Wanalyzer-undefined-behavior-strtok
 -Wanalyzer-unsafe-call-within-signal-handler
 -Wanalyzer-use-after-free
 -Wanalyzer-use-of-pointer-in-stale-stack-frame
@@ -11060,6 +11062,18 @@ attacker could inject an out-of-bounds access.
 
 See @uref{https://cwe.mitre.org/data/definitions/129.html, CWE-129: Improper Validation of Array Index}.
 
+@opindex Wanalyzer-undefined-behavior-strtok
+@opindex Wno-analyzer-undefined-behavior-strtok
+@item -Wno-analyzer-undefined-behavior-strtok
+This warning requires @option{-fanalyzer}, which enables it; use
+@option{-Wno-analyzer-undefined-behavior-strtok} to disable it.
+
+This diagnostic warns for paths through the code in which a
+call is made to @code{strtok} with undefined behavior.
+
+Specifically, passing NULL as the first parameter for the initial
+call to @code{strtok} within a process has undefined behavior.
+
 @opindex Wanalyzer-unsafe-call-within-signal-handler
 @opindex Wno-analyzer-unsafe-call-within-signal-handler
 @item -Wno-analyzer-unsafe-call-within-signal-handler
diff --git a/gcc/testsuite/c-c++-common/analyzer/strtok-1.c b/gcc/testsuite/c-c++-common/analyzer/strtok-1.c
new file mode 100644
index 00000000000..33150ced90a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/strtok-1.c
@@ -0,0 +1,62 @@
+/* { dg-additional-options "-fpermissive" { target c++ } } */
+
+#include "../../gcc.dg/analyzer/analyzer-decls.h"
+
+typedef __SIZE_TYPE__ size_t;
+
+extern char *strtok (char *str, const char *delim)
+  __attribute__((nonnull (2)));
+
+char *
+test_passthrough (char *str, const char *delim)
+{
+  return strtok (str, delim);
+}
+
+char *
+test_null_str (const char *delim)
+{
+  return strtok (NULL, delim);
+}
+
+char *
+test_null_delim (char *str)
+{
+  return strtok (str, NULL); /* { dg-warning "use of NULL where non-null expected" } */
+  /* This is from the attribute.  */
+}
+
+char *
+test_write_to_literal (void)
+{
+  const char *str = "hello world";
+  return strtok ((char *)str, " "); /* { dg-warning "write to string literal" } */
+}
+
+char *
+test_unterminated_str (const char *delim)
+{
+  char str[3] = "abc"; /* { dg-warning "initializer-string for '\[^\n\]*' is too long" "" { target c++ } } */
+  return strtok (str, delim); /* { dg-warning "stack-based buffer over-read" } */
+}
+
+char *
+test_unterminated_delimstr (char *str)
+{
+  char delim[3] = "abc"; /* { dg-warning "initializer-string for '\[^\n\]*' is too long" "" { target c++ } } */
+  return strtok (str, delim); /* { dg-warning "stack-based buffer over-read" } */
+  /* { dg-message "while looking for null terminator for argument 2" "" { target *-*-* } .-1 } */
+}
+
+size_t
+test_use_after_free_via_null_2 (char *p)
+{
+  strtok (p, " ");
+  __builtin_free (p);
+
+  char *q = strtok (NULL, " "); /* TODO: should complain about this.  */
+  if (q)
+    return __builtin_strlen (q); /* TODO: should complain about this.  */
+  else
+    return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/analyzer/strtok-2.c b/gcc/testsuite/c-c++-common/analyzer/strtok-2.c
new file mode 100644
index 00000000000..0336bf0cfe9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/strtok-2.c
@@ -0,0 +1,18 @@
+#include "../../gcc.dg/analyzer/analyzer-decls.h"
+
+extern char *strtok (char *str, const char *delim)
+  __attribute__((nonnull (2)));
+
+int
+main(int argc, char *argv[])
+{
+  char *cmd;
+  char *arg;
+
+  if (argc < 2)
+    return -1;
+
+  cmd = strtok (argv[1], " "); /* { dg-bogus "undefined behavior" } */
+  arg = strtok (NULL, " ");
+  return 0;  
+}
diff --git a/gcc/testsuite/c-c++-common/analyzer/strtok-3.c b/gcc/testsuite/c-c++-common/analyzer/strtok-3.c
new file mode 100644
index 00000000000..f18f1a9eebe
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/strtok-3.c
@@ -0,0 +1,26 @@
+#include "../../gcc.dg/analyzer/analyzer-decls.h"
+
+extern char *strtok (char *str, const char *delim)
+  __attribute__((nonnull (2)));
+
+int
+main (int argc, char *argv[])
+{
+  char *cmd;
+  char *arg;
+
+  if (argc < 2)
+    return -1;
+
+  cmd = strtok (NULL, " "); /* { dg-line "first_call" } */
+  arg = strtok (NULL, " ");
+  return 0;
+
+  /* C:
+     { dg-warning "calling 'strtok' for first time with NULL as argument 1 has undefined behavior \\\[CWE-476\\\] \\\[-Wanalyzer-undefined-behavior-strtok\\\]" "" { target c } first_call }
+     { dg-message "some implementations of 'strtok' may crash on such input" "" { target c } first_call } */
+
+  /* C++:
+     { dg-warning "calling 'char\\* strtok\\(char\\*, const char\\*\\)' for first time with NULL as argument 1 has undefined behavior \\\[CWE-476\\\] \\\[-Wanalyzer-undefined-behavior-strtok\\\]" "" { target c++ } first_call }
+     { dg-message "some implementations of 'char\\* strtok\\(char\\*, const char\\*\\)' may crash on such input" "" { target c++ } first_call }  */
+}
diff --git a/gcc/testsuite/c-c++-common/analyzer/strtok-4.c b/gcc/testsuite/c-c++-common/analyzer/strtok-4.c
new file mode 100644
index 00000000000..b6b7d49e3c3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/strtok-4.c
@@ -0,0 +1,42 @@
+#include "../../gcc.dg/analyzer/analyzer-decls.h"
+
+extern char *strtok (char *str, const char *delim);
+
+void test (void)
+{
+  char buffer[] = { 'a', 'x', 'b', 'y', 'c', '\0' };
+
+  char *p1 = strtok (buffer, "x");
+  /* Should result in:
+      | buffer[] = { 'a', '\0', 'b', 'y', 'c', '\0' },
+      |               ^    ^     ^
+      |               |    |     |
+      |               |    |     internal ptr
+      |               p1   modified.  */
+
+  char *p2 = strtok (NULL, "y"); /* note new delimiter.  */
+  /* Should result in:
+      | buffer[] = { 'a', '\0', 'b', '\0', 'c', '\0' },
+      |                          ^    ^     ^
+      |                          |    |     |
+      |                          |    |     internal ptr
+      |                          p2   modified.  */
+
+  char *p3 = strtok (NULL, "z"); /* again new delimiter.  */
+  /* Should result in:
+      | buffer[] = { 'a', '\0', 'b', '\0', 'c', '\0' },
+      |                                     ^    ^~
+      |                                     |    |
+      |                                     |    internal ptr
+      |                                     p3.  */
+
+  char *p4 = strtok (NULL, "z");
+  /* Should result in p4 == NULL, and:
+      | buffer[] = { 'a', '\0', 'b', '\0', 'c', '\0' },
+      |                                          ^~
+      |                                          |
+      |                                         internal ptr.  */
+
+  /* We don't yet model strtok closely enough to capture
+     these exact behaviors.  */
+}
diff --git a/gcc/testsuite/c-c++-common/analyzer/strtok-cppreference.c b/gcc/testsuite/c-c++-common/analyzer/strtok-cppreference.c
new file mode 100644
index 00000000000..a2e912341d6
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/strtok-cppreference.c
@@ -0,0 +1,50 @@
+/* Example of strtok adapted from:
+     https://en.cppreference.com/w/c/string/byte/strtok
+   which is 
+    "licensed under Creative Commons Attribution-Sharealike 3.0
+     Unported License (CC-BY-SA) and by the GNU Free Documentation License
+     (GFDL) (unversioned, with no invariant sections, front-cover texts, or
+     back-cover texts). That means that you can use this site in almost any way
+     you like, including mirroring, copying, translating, etc. All we would ask
+     is to provide link back to cppreference.com so that people know where to
+     get the most up-to-date content. In addition to that, any modified content
+     should be released under an equivalent license so that everyone could
+     benefit from the modified versions. "  */
+
+#define __STDC_WANT_LIB_EXT1__ 0
+#include <string.h>
+#include <stdio.h>
+ 
+int main(void)
+{
+    char input[] = "A bird came down the walk";
+    printf("Parsing the input string '%s'\n", input);
+    char *token = strtok(input, " ");
+    while(token) {
+        puts(token);
+        token = strtok(NULL, " ");
+    }
+ 
+    printf("Contents of the input string now: '");
+    for(size_t n = 0; n < sizeof input; ++n)
+        input[n] ? putchar(input[n]) : fputs("\\0", stdout);
+    puts("'");
+ 
+#ifdef __STDC_LIB_EXT1__
+    char str[] = "A bird came down the walk";
+    rsize_t strmax = sizeof str;
+    const char *delim = " ";
+    char *next_token;
+    printf("Parsing the input string '%s'\n", str);
+    token = strtok_s(str, &strmax, delim, &next_token);
+    while(token) {
+        puts(token);
+        token = strtok_s(NULL, &strmax, delim, &next_token);
+    }
+ 
+    printf("Contents of the input string now: '");
+    for(size_t n = 0; n < sizeof str; ++n)
+        str[n] ? putchar(str[n]) : fputs("\\0", stdout);
+    puts("'");
+#endif
+}

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2023-11-19  1:40 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-19  1:40 [gcc r14-5591] analyzer: new warning: -Wanalyzer-undefined-behavior-strtok [PR107573] David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).