public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Eric Feng <ef2648@columbia.edu>
To: dmalcolm@redhat.com
Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org,
	Eric Feng <ef2648@columbia.edu>
Subject: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
Date: Tue, 29 Aug 2023 00:31:55 -0400	[thread overview]
Message-ID: <20230829043155.17651-1-ef2648@columbia.edu> (raw)
In-Reply-To: <c720fc39d86cb3ae8041fe5c86dad34d71466cec.camel@redhat.com>

Hi Dave,

Thanks for the feedback. I've addressed the changes you mentioned in
addition to adding more test cases. I've also taken this chance to 
split the test files according to known function subclasses, as you previously 
suggested. Since there were also some changes to the core analyzer, I've done a
bootstrap and regtested the patch as well. Does it look OK for trunk?

Best,
Eric

---

This patch introduces initial support for reference count checking of
PyObjects in relation to the Python/C API for the CPython plugin.
Additionally, the core analyzer underwent several modifications to
accommodate this feature. These include:

- Introducing support for callbacks at the end of
  region_model::pop_frame. This is our current point of validation for
  the reference count of PyObjects.
- An added optional custom stmt_finder parameter to
  region_model_context::warn. This aids in emitting a diagnostic
  concerning the reference count, especially when the stmt_finder is
  NULL, which is currently the case during region_model::pop_frame.

The current diagnostic we emit relating to the reference count
appears as follows:

rc3.c:23:10: warning: expected <variable name belonging to m_base_region> to have reference count: ‘1’ but ob_refcnt field is: ‘2’
   23 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
    |......
    |   23 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

This is a WIP in several ways:
- Enhancing the diagnostic for better clarity. For instance, users should
  expect to see the variable name 'item' instead of the placeholder in the
  diagnostic above.
- Currently, functions returning PyObject * are assumed to always produce
  a new reference.
- The validation of reference count is only for PyObjects created within a
  function body. Verifying reference counts for PyObjects passed as
  parameters is not supported in this patch.

gcc/analyzer/ChangeLog:
  PR analyzer/107646
	* engine.cc (impl_region_model_context::warn): New optional parameter.
	* exploded-graph.h (class impl_region_model_context): Likewise.
	* region-model.cc (region_model::pop_frame): New callback feature for
  * region_model::pop_frame.
	* region-model.h (struct append_regions_cb_data): Likewise.
	(class region_model): Likewise.
	(class region_model_context): New optional parameter.
	(class region_model_context_decorator): Likewise.

gcc/testsuite/ChangeLog:
  PR analyzer/107646
	* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
  * checking for PyObjects.
	* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
	* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
  * added more tests).
	* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
	* gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
  * more tests).
	* gcc.dg/plugin/plugin.exp: New tests.
	* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
	* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
	* gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

Signed-off-by: Eric Feng <ef2648@columbia.edu>

---
 gcc/analyzer/engine.cc                        |   8 +-
 gcc/analyzer/exploded-graph.h                 |   4 +-
 gcc/analyzer/region-model.cc                  |   3 +
 gcc/analyzer/region-model.h                   |  48 ++-
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-
 ....c => cpython-plugin-test-PyList_Append.c} |  56 +--
 .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
 .../cpython-plugin-test-PyLong_FromLong.c     |  38 ++
 ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
 .../cpython-plugin-test-refcnt-checking.c     |  78 ++++
 gcc/testsuite/gcc.dg/plugin/plugin.exp        |   5 +-
 11 files changed, 612 insertions(+), 42 deletions(-)
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => cpython-plugin-test-PyList_Append.c} (64%)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c => cpython-plugin-test-no-plugin.c} (100%)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index a1908cdb364..736a41ecdaf 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -115,10 +115,12 @@ impl_region_model_context (program_state *state,
 }
 
 bool
-impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
+impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d,
+				 const stmt_finder *custom_finder)
 {
   LOG_FUNC (get_logger ());
-  if (m_stmt == NULL && m_stmt_finder == NULL)
+  auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder;
+  if (m_stmt == NULL && curr_stmt_finder == NULL)
     {
       if (get_logger ())
 	get_logger ()->log ("rejecting diagnostic: no stmt");
@@ -129,7 +131,7 @@ impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
       bool terminate_path = d->terminate_path_p ();
       if (m_eg->get_diagnostic_manager ().add_diagnostic
 	  (m_enode_for_diag, m_enode_for_diag->get_supernode (),
-	   m_stmt, m_stmt_finder, std::move (d)))
+	   m_stmt, curr_stmt_finder, std::move (d)))
 	{
 	  if (m_path_ctxt
 	      && terminate_path
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 5a7ab645bfe..6e9a5ef58c7 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context
 			     uncertainty_t *uncertainty,
 			     logger *logger = NULL);
 
-  bool warn (std::unique_ptr<pending_diagnostic> d) final override;
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder = NULL) final override;
   void add_note (std::unique_ptr<pending_note> pn) final override;
   void add_event (std::unique_ptr<checker_event> event) final override;
   void on_svalue_leak (const svalue *) override;
@@ -107,6 +108,7 @@ class impl_region_model_context : public region_model_context
 			 std::unique_ptr<sm_context> *out_sm_context) override;
 
   const gimple *get_stmt () const override { return m_stmt; }
+  const exploded_graph *get_eg () const override { return m_eg; }
 
   exploded_graph *m_eg;
   log_user m_logger;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 4f31a6dcf0f..eb4f976b83a 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace ana {
 
+auto_vec<pop_frame_callback> region_model::pop_frame_callbacks;
+
 /* Dump T to PP in language-independent form, for debugging/logging/dumping
    purposes.  */
 
@@ -5422,6 +5424,7 @@ region_model::pop_frame (tree result_lvalue,
     }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
+  notify_on_pop_frame (this, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 10b2a59e787..440ea6d828d 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -236,6 +236,10 @@ public:
 
 struct append_regions_cb_data;
 
+typedef void (*pop_frame_callback) (const region_model *model,
+				    const svalue *retval,
+				    region_model_context *ctxt);
+
 /* A region_model encapsulates a representation of the state of memory, with
    a tree of regions, along with their associated values.
    The representation is graph-like because values can be pointers to
@@ -532,6 +536,20 @@ class region_model
   get_builtin_kf (const gcall *call,
 		  region_model_context *ctxt = NULL) const;
 
+  static void
+  register_pop_frame_callback (const pop_frame_callback &callback)
+  {
+    pop_frame_callbacks.safe_push (callback);
+  }
+
+  static void
+  notify_on_pop_frame (const region_model *model, const svalue *retval,
+		       region_model_context *ctxt)
+  {
+    for (auto &callback : pop_frame_callbacks)
+	callback (model, retval, ctxt);
+  }
+
 private:
   const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
   const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
@@ -621,6 +639,7 @@ private:
 						tree callee_fndecl,
 						region_model_context *ctxt) const;
 
+  static auto_vec<pop_frame_callback> pop_frame_callbacks;
   /* Storing this here to avoid passing it around everywhere.  */
   region_model_manager *const m_mgr;
 
@@ -649,8 +668,10 @@ class region_model_context
 {
  public:
   /* Hook for clients to store pending diagnostics.
-     Return true if the diagnostic was stored, or false if it was deleted.  */
-  virtual bool warn (std::unique_ptr<pending_diagnostic> d) = 0;
+     Return true if the diagnostic was stored, or false if it was deleted.
+     Optionally provide a custom stmt_finder.  */
+  virtual bool warn (std::unique_ptr<pending_diagnostic> d,
+		     const stmt_finder *custom_finder = NULL) = 0;
 
   /* Hook for clients to add a note to the last previously stored
      pending diagnostic.  */
@@ -757,6 +778,8 @@ class region_model_context
 
   /* Get the current statement, if any.  */
   virtual const gimple *get_stmt () const = 0;
+
+  virtual const exploded_graph *get_eg () const = 0;
 };
 
 /* A "do nothing" subclass of region_model_context.  */
@@ -764,7 +787,8 @@ class region_model_context
 class noop_region_model_context : public region_model_context
 {
 public:
-  bool warn (std::unique_ptr<pending_diagnostic>) override { return false; }
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder) override { return false; }
   void add_note (std::unique_ptr<pending_note>) override;
   void add_event (std::unique_ptr<checker_event>) override;
   void on_svalue_leak (const svalue *) override {}
@@ -812,6 +836,7 @@ public:
   }
 
   const gimple *get_stmt () const override { return NULL; }
+  const exploded_graph *get_eg () const override { return NULL; }
 };
 
 /* A subclass of region_model_context for determining if operations fail
@@ -840,7 +865,8 @@ private:
 class region_model_context_decorator : public region_model_context
 {
  public:
-  bool warn (std::unique_ptr<pending_diagnostic> d) override
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder)
   {
     if (m_inner)
       return m_inner->warn (std::move (d));
@@ -978,6 +1004,14 @@ class region_model_context_decorator : public region_model_context
       return nullptr;
   }
 
+  const exploded_graph *get_eg () const override
+  {
+    if (m_inner)
+	return m_inner->get_eg ();
+    else
+	return nullptr;
+  }
+
 protected:
   region_model_context_decorator (region_model_context *inner)
   : m_inner (inner)
@@ -993,7 +1027,8 @@ protected:
 class annotating_context : public region_model_context_decorator
 {
 public:
-  bool warn (std::unique_ptr<pending_diagnostic> d) override
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder) override
   {
     if (m_inner)
       if (m_inner->warn (std::move (d)))
@@ -1158,7 +1193,8 @@ using namespace ::selftest;
 class test_region_model_context : public noop_region_model_context
 {
 public:
-  bool warn (std::unique_ptr<pending_diagnostic> d) final override
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder) final override
   {
     m_diagnostics.safe_push (d.release ());
     return true;
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7cd72e8a886..b2caed8fc1b 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -44,6 +44,7 @@
 #include "analyzer/region-model.h"
 #include "analyzer/call-details.h"
 #include "analyzer/call-info.h"
+#include "analyzer/exploded-graph.h"
 #include "make-unique.h"
 
 int plugin_is_GPL_compatible;
@@ -191,6 +192,372 @@ public:
   }
 };
 
+/* This is just a copy of leak_stmt_finder for now (subject to change if
+ * necssary)  */
+
+class refcnt_stmt_finder : public stmt_finder
+{
+public:
+  refcnt_stmt_finder (const exploded_graph &eg, tree var)
+      : m_eg (eg), m_var (var)
+  {
+  }
+
+  std::unique_ptr<stmt_finder>
+  clone () const final override
+  {
+    return make_unique<refcnt_stmt_finder> (m_eg, m_var);
+  }
+
+  const gimple *
+  find_stmt (const exploded_path &epath) final override
+  {
+    logger *const logger = m_eg.get_logger ();
+    LOG_FUNC (logger);
+
+    if (m_var && TREE_CODE (m_var) == SSA_NAME)
+      {
+	/* Locate the final write to this SSA name in the path.  */
+	const gimple *def_stmt = SSA_NAME_DEF_STMT (m_var);
+
+	int idx_of_def_stmt;
+	bool found = epath.find_stmt_backwards (def_stmt, &idx_of_def_stmt);
+	if (!found)
+	  goto not_found;
+
+	/* What was the next write to the underlying var
+	   after the SSA name was set? (if any).  */
+
+	for (unsigned idx = idx_of_def_stmt + 1; idx < epath.m_edges.length ();
+	     ++idx)
+	  {
+	    const exploded_edge *eedge = epath.m_edges[idx];
+	    if (logger)
+		    logger->log ("eedge[%i]: EN %i -> EN %i", idx,
+				 eedge->m_src->m_index,
+				 eedge->m_dest->m_index);
+	    const exploded_node *dst_node = eedge->m_dest;
+	    const program_point &dst_point = dst_node->get_point ();
+	    const gimple *stmt = dst_point.get_stmt ();
+	    if (!stmt)
+		    continue;
+	    if (const gassign *assign = dyn_cast<const gassign *> (stmt))
+		    {
+			    tree lhs = gimple_assign_lhs (assign);
+			    if (TREE_CODE (lhs) == SSA_NAME
+				&& SSA_NAME_VAR (lhs) == SSA_NAME_VAR (m_var))
+				    return assign;
+		    }
+	  }
+      }
+
+  not_found:
+
+    /* Look backwards for the first statement with a location.  */
+    int i;
+    const exploded_edge *eedge;
+    FOR_EACH_VEC_ELT_REVERSE (epath.m_edges, i, eedge)
+    {
+      if (logger)
+	logger->log ("eedge[%i]: EN %i -> EN %i", i, eedge->m_src->m_index,
+		     eedge->m_dest->m_index);
+      const exploded_node *dst_node = eedge->m_dest;
+      const program_point &dst_point = dst_node->get_point ();
+      const gimple *stmt = dst_point.get_stmt ();
+      if (stmt)
+	if (get_pure_location (stmt->location) != UNKNOWN_LOCATION)
+	  return stmt;
+    }
+
+    gcc_unreachable ();
+    return NULL;
+  }
+
+private:
+  const exploded_graph &m_eg;
+  tree m_var;
+};
+
+class refcnt_mismatch : public pending_diagnostic_subclass<refcnt_mismatch>
+{
+public:
+  refcnt_mismatch (const region *base_region,
+				const svalue *ob_refcnt,
+				const svalue *actual_refcnt,
+        tree reg_tree)
+      : m_base_region (base_region), m_ob_refcnt (ob_refcnt),
+	m_actual_refcnt (actual_refcnt), m_reg_tree(reg_tree)
+  {
+  }
+
+  const char *
+  get_kind () const final override
+  {
+    return "refcnt_mismatch";
+  }
+
+  bool
+  operator== (const refcnt_mismatch &other) const
+  {
+    return (m_base_region == other.m_base_region
+	    && m_ob_refcnt == other.m_ob_refcnt
+	    && m_actual_refcnt == other.m_actual_refcnt);
+  }
+
+  int get_controlling_option () const final override
+  {
+    return 0;
+  }
+
+  bool
+  emit (rich_location *rich_loc, logger *) final override
+  {
+    diagnostic_metadata m;
+    bool warned;
+    // just assuming constants for now
+    auto actual_refcnt
+	= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
+    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
+    warned = warning_meta (
+	rich_loc, m, get_controlling_option (),
+	"expected <variable name belonging to m_base_region> to have "
+	"reference count: %qE but ob_refcnt field is: %qE",
+	actual_refcnt, ob_refcnt);
+
+    // location_t loc = rich_loc->get_loc ();
+    // foo (loc);
+    return warned;
+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    if (m_base_region)
+      interest->add_region_creation (m_base_region);
+  }
+
+private:
+
+  void foo(location_t loc) const 
+  {
+    inform(loc, "something is up right here");
+  }
+  const region *m_base_region;
+  const svalue *m_ob_refcnt;
+  const svalue *m_actual_refcnt;
+  tree m_reg_tree;
+};
+
+/* Retrieves the svalue associated with the ob_refcnt field of the base region.
+ */
+static const svalue *
+retrieve_ob_refcnt_sval (const region *base_reg, const region_model *model,
+			 region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+  const region *ob_refcnt_region
+      = mgr->get_field_region (base_reg, ob_refcnt_tree);
+  const svalue *ob_refcnt_sval
+      = model->get_store_value (ob_refcnt_region, ctxt);
+  return ob_refcnt_sval;
+}
+
+static void
+increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
+{
+  bool existed;
+  auto &refcnt = map.get_or_insert (key, &existed);
+  refcnt = existed ? refcnt + 1 : 1;
+}
+
+
+/* Recursively fills in region_to_refcnt with the references owned by
+   pyobj_ptr_sval.  */
+static void
+count_expected_pyobj_references (const region_model *model,
+			   hash_map<const region *, int> &region_to_refcnt,
+			   const svalue *pyobj_ptr_sval,
+			   hash_set<const region *> &seen)
+{
+  if (!pyobj_ptr_sval)
+    return;
+
+  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
+  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
+  if (!pyobj_region_sval && !pyobj_initial_sval)
+    return;
+
+  // todo: support initial sval (e.g passed in as parameter)
+  if (pyobj_initial_sval)
+    {
+  //     increment_region_refcnt (region_to_refcnt,
+	// 		       pyobj_initial_sval->get_region ());
+      return;
+    }
+
+  const region *pyobj_region = pyobj_region_sval->get_pointee ();
+  if (!pyobj_region || seen.contains (pyobj_region))
+    return;
+
+  seen.add (pyobj_region);
+
+  if (pyobj_ptr_sval->get_type () == pyobj_ptr_tree)
+    increment_region_refcnt (region_to_refcnt, pyobj_region);
+
+  const auto *curr_store = model->get_store ();
+  const auto *retval_cluster = curr_store->get_cluster (pyobj_region);
+  if (!retval_cluster)
+    return;
+
+  const auto &retval_binding_map = retval_cluster->get_map ();
+
+  for (const auto &binding : retval_binding_map)
+    {
+      const svalue *binding_sval = binding.second;
+      const svalue *unwrapped_sval = binding_sval->unwrap_any_unmergeable ();
+      const region *pointee = unwrapped_sval->maybe_get_region ();
+
+      if (pointee && pointee->get_kind () == RK_HEAP_ALLOCATED)
+	count_expected_pyobj_references (model, region_to_refcnt, binding_sval,
+					 seen);
+    }
+}
+
+/* Compare ob_refcnt field vs the actual reference count of a region */
+static void
+check_refcnt (const region_model *model, region_model_context *ctxt,
+	      const hash_map<const ana::region *,
+			     int>::iterator::reference_pair region_refcnt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  const auto &curr_region = region_refcnt.first;
+  const auto &actual_refcnt = region_refcnt.second;
+  const svalue *ob_refcnt_sval = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
+  const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+      ob_refcnt_sval->get_type (), actual_refcnt);
+
+  if (ob_refcnt_sval != actual_refcnt_sval)
+  {
+    // todo: fix this
+    tree reg_tree = model->get_representative_tree (curr_region);
+
+    const auto &eg = ctxt->get_eg ();
+    refcnt_stmt_finder finder (*eg, reg_tree);
+    auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
+					    actual_refcnt_sval, reg_tree);
+    if (pd && eg)
+    ctxt->warn (std::move (pd), &finder);
+  }
+}
+
+static void
+check_refcnts (const region_model *model, const svalue *retval,
+	    region_model_context *ctxt,
+	    hash_map<const region *, int> &region_to_refcnt)
+{
+  for (const auto &region_refcnt : region_to_refcnt)
+  {
+    check_refcnt(model, ctxt, region_refcnt);
+  }
+}
+
+/* Validates the reference count of all Python objects. */
+void
+pyobj_refcnt_checker (const region_model *model, const svalue *retval,
+		    region_model_context *ctxt)
+{
+  if (!ctxt)
+  return;
+
+  auto region_to_refcnt = hash_map<const region *, int> ();
+  auto seen_regions = hash_set<const region *> ();
+
+  count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
+  check_refcnts (model, retval, ctxt, region_to_refcnt);
+}
+
+/* Counts the actual pyobject references from all clusters in the model's
+ * store. */
+static void
+count_all_references (const region_model *model,
+		      hash_map<const region *, int> &region_to_refcnt)
+{
+  for (const auto &cluster : *model->get_store ())
+  {
+    auto curr_region = cluster.first;
+    if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
+    continue;
+
+    increment_region_refcnt (region_to_refcnt, curr_region);
+
+    auto binding_cluster = cluster.second;
+    for (const auto &binding : binding_cluster->get_map ())
+    {
+	  const svalue *binding_sval = binding.second;
+
+	  const svalue *unwrapped_sval
+	      = binding_sval->unwrap_any_unmergeable ();
+	  // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
+	  // continue;
+
+	  const region *pointee = unwrapped_sval->maybe_get_region ();
+	  if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
+	    continue;
+
+	  increment_region_refcnt (region_to_refcnt, pointee);
+    }
+  }
+}
+
+static void
+dump_refcnt_info (const hash_map<const region *, int> &region_to_refcnt,
+		  const region_model *model, region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  pretty_printer pp;
+  pp_format_decoder (&pp) = default_tree_printer;
+  pp_show_color (&pp) = pp_show_color (global_dc->printer);
+  pp.buffer->stream = stderr;
+
+  for (const auto &region_refcnt : region_to_refcnt)
+  {
+    auto region = region_refcnt.first;
+    auto actual_refcnt = region_refcnt.second;
+    const svalue *ob_refcnt_sval
+	= retrieve_ob_refcnt_sval (region, model, ctxt);
+    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+	ob_refcnt_sval->get_type (), actual_refcnt);
+
+    region->dump_to_pp (&pp, true);
+    pp_string (&pp, " — ob_refcnt: ");
+    ob_refcnt_sval->dump_to_pp (&pp, true);
+    pp_string (&pp, " actual refcnt: ");
+    actual_refcnt_sval->dump_to_pp (&pp, true);
+    pp_newline (&pp);
+  }
+  pp_string (&pp, "~~~~~~~~\n");
+  pp_flush (&pp);
+}
+
+class kf_analyzer_cpython_dump_refcounts : public known_function
+{
+public:
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+    return cd.num_args () == 0;
+  }
+  void impl_call_pre (const call_details &cd) const final override
+  {
+    region_model_context *ctxt = cd.get_ctxt ();
+    if (!ctxt)
+      return;
+    region_model *model = cd.get_model ();
+    auto region_to_refcnt = hash_map<const region *, int> ();
+    count_all_references(model, region_to_refcnt);
+    dump_refcnt_info(region_to_refcnt, model, ctxt);
+  }
+};
+
 /* Some concessions were made to
 simplify the analysis process when comparing kf_PyList_Append with the
 real implementation. In particular, PyList_Append performs some
@@ -927,6 +1294,10 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
   iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
   iface->register_known_function ("PyLong_FromLong",
                                   make_unique<kf_PyLong_FromLong> ());
+
+  iface->register_known_function (
+      "__analyzer_cpython_dump_refcounts",
+      make_unique<kf_analyzer_cpython_dump_refcounts> ());
 }
 } // namespace ana
 
@@ -940,8 +1311,9 @@ plugin_init (struct plugin_name_args *plugin_info,
   const char *plugin_name = plugin_info->base_name;
   if (0)
     inform (input_location, "got here; %qs", plugin_name);
-  ana::register_finish_translation_unit_callback (&stash_named_types);
-  ana::register_finish_translation_unit_callback (&stash_global_vars);
+  register_finish_translation_unit_callback (&stash_named_types);
+  register_finish_translation_unit_callback (&stash_global_vars);
+  region_model::register_pop_frame_callback(pyobj_refcnt_checker);
   register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT,
                      ana::cpython_analyzer_init_cb,
                      NULL); /* void *user_data */
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
similarity index 64%
rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
index 19b5c17428a..9912f9105d4 100644
--- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
@@ -8,34 +8,6 @@
 #include <Python.h>
 #include "../analyzer/analyzer-decls.h"
 
-PyObject *
-test_PyList_New (Py_ssize_t len)
-{
-  PyObject *obj = PyList_New (len);
-  if (obj)
-    {
-     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
-     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
-    }
-  else
-    __analyzer_dump_path (); /* { dg-message "path" } */
-  return obj;
-}
-
-PyObject *
-test_PyLong_New (long n)
-{
-  PyObject *obj = PyLong_FromLong (n);
-  if (obj)
-    {
-     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
-     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
-    }
-  else
-    __analyzer_dump_path (); /* { dg-message "path" } */
-  return obj;
-}
-
 PyObject *
 test_PyListAppend (long n)
 {
@@ -43,6 +15,7 @@ test_PyListAppend (long n)
   PyObject *list = PyList_New (0);
   PyList_Append(list, item);
   return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
 }
 
 PyObject *
@@ -67,6 +40,7 @@ test_PyListAppend_2 (long n)
   else
     __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
   return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
 }
 
 
@@ -75,4 +49,30 @@ test_PyListAppend_3 (PyObject *item, PyObject *list)
 {
   PyList_Append (list, item);
   return list;
+}
+
+PyObject *
+test_PyListAppend_4 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_5 ()
+{
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, NULL);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_6 ()
+{
+  PyObject *item = NULL;
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
 }
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
new file mode 100644
index 00000000000..492d4f7d58d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyList_New (Py_ssize_t len)
+{
+  PyObject *obj = PyList_New (len);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+void
+test_PyList_New_2 ()
+{
+  PyObject *obj = PyList_New (0);
+} /* { dg-warning "leak of 'obj'" } */
+
+PyObject *test_stray_incref_PyList ()
+{
+  PyObject *p = PyList_New (2);
+  if (p)
+    Py_INCREF (p);
+  return p;
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
new file mode 100644
index 00000000000..97b29849302
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyLong_New (long n)
+{
+  PyObject *obj = PyLong_FromLong (n);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+void
+test_PyLong_New_2 (long n)
+{
+  PyObject *obj = PyLong_FromLong (n);
+} /* { dg-warning "leak of 'obj'" } */
+
+PyObject *test_stray_incref_PyLong (long val)
+{
+  PyObject *p = PyLong_FromLong (val);
+  if (p)
+    Py_INCREF (p);
+  return p;
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
similarity index 100%
rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c
rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
new file mode 100644
index 00000000000..9912f9105d4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyListAppend (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, item);
+  return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
+
+PyObject *
+test_PyListAppend_2 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  if (!item)
+	return NULL;
+
+  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  PyObject *list = PyList_New (n);
+  if (!list)
+  {
+	Py_DECREF(item);
+	return NULL;
+  }
+
+  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+
+  if (PyList_Append (list, item) < 0)
+    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  else
+    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
+  return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
+
+
+PyObject *
+test_PyListAppend_3 (PyObject *item, PyObject *list)
+{
+  PyList_Append (list, item);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_4 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_5 ()
+{
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, NULL);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_6 ()
+{
+  PyObject *item = NULL;
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index e1ed2d2589e..cbef6da8d86 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -161,8 +161,9 @@ set plugin_test_list [list \
 	  taint-CVE-2011-0521-6.c \
 	  taint-antipatterns-1.c } \
     { analyzer_cpython_plugin.c \
-	  cpython-plugin-test-1.c \
-	  cpython-plugin-test-2.c } \
+	  cpython-plugin-test-PyList_Append.c \
+	  cpython-plugin-test-PyList_New.c \
+	  cpython-plugin-test-PyLong_FromLong.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
-- 
2.30.2


       reply	other threads:[~2023-08-29  4:32 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <c720fc39d86cb3ae8041fe5c86dad34d71466cec.camel@redhat.com>
2023-08-29  4:31 ` Eric Feng [this message]
2023-08-29  4:35   ` Eric Feng
2023-08-29 17:28     ` Eric Feng
2023-08-29 21:14       ` David Malcolm
2023-08-30 22:15         ` Eric Feng
2023-08-31 17:01           ` David Malcolm
2023-08-29 21:08   ` David Malcolm
2023-09-01  2:49   ` Hans-Peter Nilsson
2023-09-01 14:51     ` David Malcolm
2023-09-01 21:07       ` Eric Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230829043155.17651-1-ef2648@columbia.edu \
    --to=ef2648@columbia.edu \
    --cc=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).