public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
       [not found] <c720fc39d86cb3ae8041fe5c86dad34d71466cec.camel@redhat.com>
@ 2023-08-29  4:31 ` Eric Feng
  2023-08-29  4:35   ` Eric Feng
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Eric Feng @ 2023-08-29  4:31 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, gcc-patches, Eric Feng

Hi Dave,

Thanks for the feedback. I've addressed the changes you mentioned in
addition to adding more test cases. I've also taken this chance to 
split the test files according to known function subclasses, as you previously 
suggested. Since there were also some changes to the core analyzer, I've done a
bootstrap and regtested the patch as well. Does it look OK for trunk?

Best,
Eric

---

This patch introduces initial support for reference count checking of
PyObjects in relation to the Python/C API for the CPython plugin.
Additionally, the core analyzer underwent several modifications to
accommodate this feature. These include:

- Introducing support for callbacks at the end of
  region_model::pop_frame. This is our current point of validation for
  the reference count of PyObjects.
- An added optional custom stmt_finder parameter to
  region_model_context::warn. This aids in emitting a diagnostic
  concerning the reference count, especially when the stmt_finder is
  NULL, which is currently the case during region_model::pop_frame.

The current diagnostic we emit relating to the reference count
appears as follows:

rc3.c:23:10: warning: expected <variable name belonging to m_base_region> to have reference count: ‘1’ but ob_refcnt field is: ‘2’
   23 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
    |......
    |   23 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

This is a WIP in several ways:
- Enhancing the diagnostic for better clarity. For instance, users should
  expect to see the variable name 'item' instead of the placeholder in the
  diagnostic above.
- Currently, functions returning PyObject * are assumed to always produce
  a new reference.
- The validation of reference count is only for PyObjects created within a
  function body. Verifying reference counts for PyObjects passed as
  parameters is not supported in this patch.

gcc/analyzer/ChangeLog:
  PR analyzer/107646
	* engine.cc (impl_region_model_context::warn): New optional parameter.
	* exploded-graph.h (class impl_region_model_context): Likewise.
	* region-model.cc (region_model::pop_frame): New callback feature for
  * region_model::pop_frame.
	* region-model.h (struct append_regions_cb_data): Likewise.
	(class region_model): Likewise.
	(class region_model_context): New optional parameter.
	(class region_model_context_decorator): Likewise.

gcc/testsuite/ChangeLog:
  PR analyzer/107646
	* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
  * checking for PyObjects.
	* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
	* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
  * added more tests).
	* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
	* gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
  * more tests).
	* gcc.dg/plugin/plugin.exp: New tests.
	* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
	* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
	* gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

Signed-off-by: Eric Feng <ef2648@columbia.edu>

---
 gcc/analyzer/engine.cc                        |   8 +-
 gcc/analyzer/exploded-graph.h                 |   4 +-
 gcc/analyzer/region-model.cc                  |   3 +
 gcc/analyzer/region-model.h                   |  48 ++-
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-
 ....c => cpython-plugin-test-PyList_Append.c} |  56 +--
 .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
 .../cpython-plugin-test-PyLong_FromLong.c     |  38 ++
 ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
 .../cpython-plugin-test-refcnt-checking.c     |  78 ++++
 gcc/testsuite/gcc.dg/plugin/plugin.exp        |   5 +-
 11 files changed, 612 insertions(+), 42 deletions(-)
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => cpython-plugin-test-PyList_Append.c} (64%)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c => cpython-plugin-test-no-plugin.c} (100%)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index a1908cdb364..736a41ecdaf 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -115,10 +115,12 @@ impl_region_model_context (program_state *state,
 }
 
 bool
-impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
+impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d,
+				 const stmt_finder *custom_finder)
 {
   LOG_FUNC (get_logger ());
-  if (m_stmt == NULL && m_stmt_finder == NULL)
+  auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder;
+  if (m_stmt == NULL && curr_stmt_finder == NULL)
     {
       if (get_logger ())
 	get_logger ()->log ("rejecting diagnostic: no stmt");
@@ -129,7 +131,7 @@ impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
       bool terminate_path = d->terminate_path_p ();
       if (m_eg->get_diagnostic_manager ().add_diagnostic
 	  (m_enode_for_diag, m_enode_for_diag->get_supernode (),
-	   m_stmt, m_stmt_finder, std::move (d)))
+	   m_stmt, curr_stmt_finder, std::move (d)))
 	{
 	  if (m_path_ctxt
 	      && terminate_path
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 5a7ab645bfe..6e9a5ef58c7 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context
 			     uncertainty_t *uncertainty,
 			     logger *logger = NULL);
 
-  bool warn (std::unique_ptr<pending_diagnostic> d) final override;
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder = NULL) final override;
   void add_note (std::unique_ptr<pending_note> pn) final override;
   void add_event (std::unique_ptr<checker_event> event) final override;
   void on_svalue_leak (const svalue *) override;
@@ -107,6 +108,7 @@ class impl_region_model_context : public region_model_context
 			 std::unique_ptr<sm_context> *out_sm_context) override;
 
   const gimple *get_stmt () const override { return m_stmt; }
+  const exploded_graph *get_eg () const override { return m_eg; }
 
   exploded_graph *m_eg;
   log_user m_logger;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 4f31a6dcf0f..eb4f976b83a 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace ana {
 
+auto_vec<pop_frame_callback> region_model::pop_frame_callbacks;
+
 /* Dump T to PP in language-independent form, for debugging/logging/dumping
    purposes.  */
 
@@ -5422,6 +5424,7 @@ region_model::pop_frame (tree result_lvalue,
     }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
+  notify_on_pop_frame (this, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 10b2a59e787..440ea6d828d 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -236,6 +236,10 @@ public:
 
 struct append_regions_cb_data;
 
+typedef void (*pop_frame_callback) (const region_model *model,
+				    const svalue *retval,
+				    region_model_context *ctxt);
+
 /* A region_model encapsulates a representation of the state of memory, with
    a tree of regions, along with their associated values.
    The representation is graph-like because values can be pointers to
@@ -532,6 +536,20 @@ class region_model
   get_builtin_kf (const gcall *call,
 		  region_model_context *ctxt = NULL) const;
 
+  static void
+  register_pop_frame_callback (const pop_frame_callback &callback)
+  {
+    pop_frame_callbacks.safe_push (callback);
+  }
+
+  static void
+  notify_on_pop_frame (const region_model *model, const svalue *retval,
+		       region_model_context *ctxt)
+  {
+    for (auto &callback : pop_frame_callbacks)
+	callback (model, retval, ctxt);
+  }
+
 private:
   const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
   const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
@@ -621,6 +639,7 @@ private:
 						tree callee_fndecl,
 						region_model_context *ctxt) const;
 
+  static auto_vec<pop_frame_callback> pop_frame_callbacks;
   /* Storing this here to avoid passing it around everywhere.  */
   region_model_manager *const m_mgr;
 
@@ -649,8 +668,10 @@ class region_model_context
 {
  public:
   /* Hook for clients to store pending diagnostics.
-     Return true if the diagnostic was stored, or false if it was deleted.  */
-  virtual bool warn (std::unique_ptr<pending_diagnostic> d) = 0;
+     Return true if the diagnostic was stored, or false if it was deleted.
+     Optionally provide a custom stmt_finder.  */
+  virtual bool warn (std::unique_ptr<pending_diagnostic> d,
+		     const stmt_finder *custom_finder = NULL) = 0;
 
   /* Hook for clients to add a note to the last previously stored
      pending diagnostic.  */
@@ -757,6 +778,8 @@ class region_model_context
 
   /* Get the current statement, if any.  */
   virtual const gimple *get_stmt () const = 0;
+
+  virtual const exploded_graph *get_eg () const = 0;
 };
 
 /* A "do nothing" subclass of region_model_context.  */
@@ -764,7 +787,8 @@ class region_model_context
 class noop_region_model_context : public region_model_context
 {
 public:
-  bool warn (std::unique_ptr<pending_diagnostic>) override { return false; }
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder) override { return false; }
   void add_note (std::unique_ptr<pending_note>) override;
   void add_event (std::unique_ptr<checker_event>) override;
   void on_svalue_leak (const svalue *) override {}
@@ -812,6 +836,7 @@ public:
   }
 
   const gimple *get_stmt () const override { return NULL; }
+  const exploded_graph *get_eg () const override { return NULL; }
 };
 
 /* A subclass of region_model_context for determining if operations fail
@@ -840,7 +865,8 @@ private:
 class region_model_context_decorator : public region_model_context
 {
  public:
-  bool warn (std::unique_ptr<pending_diagnostic> d) override
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder)
   {
     if (m_inner)
       return m_inner->warn (std::move (d));
@@ -978,6 +1004,14 @@ class region_model_context_decorator : public region_model_context
       return nullptr;
   }
 
+  const exploded_graph *get_eg () const override
+  {
+    if (m_inner)
+	return m_inner->get_eg ();
+    else
+	return nullptr;
+  }
+
 protected:
   region_model_context_decorator (region_model_context *inner)
   : m_inner (inner)
@@ -993,7 +1027,8 @@ protected:
 class annotating_context : public region_model_context_decorator
 {
 public:
-  bool warn (std::unique_ptr<pending_diagnostic> d) override
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder) override
   {
     if (m_inner)
       if (m_inner->warn (std::move (d)))
@@ -1158,7 +1193,8 @@ using namespace ::selftest;
 class test_region_model_context : public noop_region_model_context
 {
 public:
-  bool warn (std::unique_ptr<pending_diagnostic> d) final override
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder) final override
   {
     m_diagnostics.safe_push (d.release ());
     return true;
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7cd72e8a886..b2caed8fc1b 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -44,6 +44,7 @@
 #include "analyzer/region-model.h"
 #include "analyzer/call-details.h"
 #include "analyzer/call-info.h"
+#include "analyzer/exploded-graph.h"
 #include "make-unique.h"
 
 int plugin_is_GPL_compatible;
@@ -191,6 +192,372 @@ public:
   }
 };
 
+/* This is just a copy of leak_stmt_finder for now (subject to change if
+ * necssary)  */
+
+class refcnt_stmt_finder : public stmt_finder
+{
+public:
+  refcnt_stmt_finder (const exploded_graph &eg, tree var)
+      : m_eg (eg), m_var (var)
+  {
+  }
+
+  std::unique_ptr<stmt_finder>
+  clone () const final override
+  {
+    return make_unique<refcnt_stmt_finder> (m_eg, m_var);
+  }
+
+  const gimple *
+  find_stmt (const exploded_path &epath) final override
+  {
+    logger *const logger = m_eg.get_logger ();
+    LOG_FUNC (logger);
+
+    if (m_var && TREE_CODE (m_var) == SSA_NAME)
+      {
+	/* Locate the final write to this SSA name in the path.  */
+	const gimple *def_stmt = SSA_NAME_DEF_STMT (m_var);
+
+	int idx_of_def_stmt;
+	bool found = epath.find_stmt_backwards (def_stmt, &idx_of_def_stmt);
+	if (!found)
+	  goto not_found;
+
+	/* What was the next write to the underlying var
+	   after the SSA name was set? (if any).  */
+
+	for (unsigned idx = idx_of_def_stmt + 1; idx < epath.m_edges.length ();
+	     ++idx)
+	  {
+	    const exploded_edge *eedge = epath.m_edges[idx];
+	    if (logger)
+		    logger->log ("eedge[%i]: EN %i -> EN %i", idx,
+				 eedge->m_src->m_index,
+				 eedge->m_dest->m_index);
+	    const exploded_node *dst_node = eedge->m_dest;
+	    const program_point &dst_point = dst_node->get_point ();
+	    const gimple *stmt = dst_point.get_stmt ();
+	    if (!stmt)
+		    continue;
+	    if (const gassign *assign = dyn_cast<const gassign *> (stmt))
+		    {
+			    tree lhs = gimple_assign_lhs (assign);
+			    if (TREE_CODE (lhs) == SSA_NAME
+				&& SSA_NAME_VAR (lhs) == SSA_NAME_VAR (m_var))
+				    return assign;
+		    }
+	  }
+      }
+
+  not_found:
+
+    /* Look backwards for the first statement with a location.  */
+    int i;
+    const exploded_edge *eedge;
+    FOR_EACH_VEC_ELT_REVERSE (epath.m_edges, i, eedge)
+    {
+      if (logger)
+	logger->log ("eedge[%i]: EN %i -> EN %i", i, eedge->m_src->m_index,
+		     eedge->m_dest->m_index);
+      const exploded_node *dst_node = eedge->m_dest;
+      const program_point &dst_point = dst_node->get_point ();
+      const gimple *stmt = dst_point.get_stmt ();
+      if (stmt)
+	if (get_pure_location (stmt->location) != UNKNOWN_LOCATION)
+	  return stmt;
+    }
+
+    gcc_unreachable ();
+    return NULL;
+  }
+
+private:
+  const exploded_graph &m_eg;
+  tree m_var;
+};
+
+class refcnt_mismatch : public pending_diagnostic_subclass<refcnt_mismatch>
+{
+public:
+  refcnt_mismatch (const region *base_region,
+				const svalue *ob_refcnt,
+				const svalue *actual_refcnt,
+        tree reg_tree)
+      : m_base_region (base_region), m_ob_refcnt (ob_refcnt),
+	m_actual_refcnt (actual_refcnt), m_reg_tree(reg_tree)
+  {
+  }
+
+  const char *
+  get_kind () const final override
+  {
+    return "refcnt_mismatch";
+  }
+
+  bool
+  operator== (const refcnt_mismatch &other) const
+  {
+    return (m_base_region == other.m_base_region
+	    && m_ob_refcnt == other.m_ob_refcnt
+	    && m_actual_refcnt == other.m_actual_refcnt);
+  }
+
+  int get_controlling_option () const final override
+  {
+    return 0;
+  }
+
+  bool
+  emit (rich_location *rich_loc, logger *) final override
+  {
+    diagnostic_metadata m;
+    bool warned;
+    // just assuming constants for now
+    auto actual_refcnt
+	= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
+    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
+    warned = warning_meta (
+	rich_loc, m, get_controlling_option (),
+	"expected <variable name belonging to m_base_region> to have "
+	"reference count: %qE but ob_refcnt field is: %qE",
+	actual_refcnt, ob_refcnt);
+
+    // location_t loc = rich_loc->get_loc ();
+    // foo (loc);
+    return warned;
+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    if (m_base_region)
+      interest->add_region_creation (m_base_region);
+  }
+
+private:
+
+  void foo(location_t loc) const 
+  {
+    inform(loc, "something is up right here");
+  }
+  const region *m_base_region;
+  const svalue *m_ob_refcnt;
+  const svalue *m_actual_refcnt;
+  tree m_reg_tree;
+};
+
+/* Retrieves the svalue associated with the ob_refcnt field of the base region.
+ */
+static const svalue *
+retrieve_ob_refcnt_sval (const region *base_reg, const region_model *model,
+			 region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+  const region *ob_refcnt_region
+      = mgr->get_field_region (base_reg, ob_refcnt_tree);
+  const svalue *ob_refcnt_sval
+      = model->get_store_value (ob_refcnt_region, ctxt);
+  return ob_refcnt_sval;
+}
+
+static void
+increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
+{
+  bool existed;
+  auto &refcnt = map.get_or_insert (key, &existed);
+  refcnt = existed ? refcnt + 1 : 1;
+}
+
+
+/* Recursively fills in region_to_refcnt with the references owned by
+   pyobj_ptr_sval.  */
+static void
+count_expected_pyobj_references (const region_model *model,
+			   hash_map<const region *, int> &region_to_refcnt,
+			   const svalue *pyobj_ptr_sval,
+			   hash_set<const region *> &seen)
+{
+  if (!pyobj_ptr_sval)
+    return;
+
+  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
+  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
+  if (!pyobj_region_sval && !pyobj_initial_sval)
+    return;
+
+  // todo: support initial sval (e.g passed in as parameter)
+  if (pyobj_initial_sval)
+    {
+  //     increment_region_refcnt (region_to_refcnt,
+	// 		       pyobj_initial_sval->get_region ());
+      return;
+    }
+
+  const region *pyobj_region = pyobj_region_sval->get_pointee ();
+  if (!pyobj_region || seen.contains (pyobj_region))
+    return;
+
+  seen.add (pyobj_region);
+
+  if (pyobj_ptr_sval->get_type () == pyobj_ptr_tree)
+    increment_region_refcnt (region_to_refcnt, pyobj_region);
+
+  const auto *curr_store = model->get_store ();
+  const auto *retval_cluster = curr_store->get_cluster (pyobj_region);
+  if (!retval_cluster)
+    return;
+
+  const auto &retval_binding_map = retval_cluster->get_map ();
+
+  for (const auto &binding : retval_binding_map)
+    {
+      const svalue *binding_sval = binding.second;
+      const svalue *unwrapped_sval = binding_sval->unwrap_any_unmergeable ();
+      const region *pointee = unwrapped_sval->maybe_get_region ();
+
+      if (pointee && pointee->get_kind () == RK_HEAP_ALLOCATED)
+	count_expected_pyobj_references (model, region_to_refcnt, binding_sval,
+					 seen);
+    }
+}
+
+/* Compare ob_refcnt field vs the actual reference count of a region */
+static void
+check_refcnt (const region_model *model, region_model_context *ctxt,
+	      const hash_map<const ana::region *,
+			     int>::iterator::reference_pair region_refcnt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  const auto &curr_region = region_refcnt.first;
+  const auto &actual_refcnt = region_refcnt.second;
+  const svalue *ob_refcnt_sval = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
+  const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+      ob_refcnt_sval->get_type (), actual_refcnt);
+
+  if (ob_refcnt_sval != actual_refcnt_sval)
+  {
+    // todo: fix this
+    tree reg_tree = model->get_representative_tree (curr_region);
+
+    const auto &eg = ctxt->get_eg ();
+    refcnt_stmt_finder finder (*eg, reg_tree);
+    auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
+					    actual_refcnt_sval, reg_tree);
+    if (pd && eg)
+    ctxt->warn (std::move (pd), &finder);
+  }
+}
+
+static void
+check_refcnts (const region_model *model, const svalue *retval,
+	    region_model_context *ctxt,
+	    hash_map<const region *, int> &region_to_refcnt)
+{
+  for (const auto &region_refcnt : region_to_refcnt)
+  {
+    check_refcnt(model, ctxt, region_refcnt);
+  }
+}
+
+/* Validates the reference count of all Python objects. */
+void
+pyobj_refcnt_checker (const region_model *model, const svalue *retval,
+		    region_model_context *ctxt)
+{
+  if (!ctxt)
+  return;
+
+  auto region_to_refcnt = hash_map<const region *, int> ();
+  auto seen_regions = hash_set<const region *> ();
+
+  count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
+  check_refcnts (model, retval, ctxt, region_to_refcnt);
+}
+
+/* Counts the actual pyobject references from all clusters in the model's
+ * store. */
+static void
+count_all_references (const region_model *model,
+		      hash_map<const region *, int> &region_to_refcnt)
+{
+  for (const auto &cluster : *model->get_store ())
+  {
+    auto curr_region = cluster.first;
+    if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
+    continue;
+
+    increment_region_refcnt (region_to_refcnt, curr_region);
+
+    auto binding_cluster = cluster.second;
+    for (const auto &binding : binding_cluster->get_map ())
+    {
+	  const svalue *binding_sval = binding.second;
+
+	  const svalue *unwrapped_sval
+	      = binding_sval->unwrap_any_unmergeable ();
+	  // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
+	  // continue;
+
+	  const region *pointee = unwrapped_sval->maybe_get_region ();
+	  if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
+	    continue;
+
+	  increment_region_refcnt (region_to_refcnt, pointee);
+    }
+  }
+}
+
+static void
+dump_refcnt_info (const hash_map<const region *, int> &region_to_refcnt,
+		  const region_model *model, region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  pretty_printer pp;
+  pp_format_decoder (&pp) = default_tree_printer;
+  pp_show_color (&pp) = pp_show_color (global_dc->printer);
+  pp.buffer->stream = stderr;
+
+  for (const auto &region_refcnt : region_to_refcnt)
+  {
+    auto region = region_refcnt.first;
+    auto actual_refcnt = region_refcnt.second;
+    const svalue *ob_refcnt_sval
+	= retrieve_ob_refcnt_sval (region, model, ctxt);
+    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+	ob_refcnt_sval->get_type (), actual_refcnt);
+
+    region->dump_to_pp (&pp, true);
+    pp_string (&pp, " — ob_refcnt: ");
+    ob_refcnt_sval->dump_to_pp (&pp, true);
+    pp_string (&pp, " actual refcnt: ");
+    actual_refcnt_sval->dump_to_pp (&pp, true);
+    pp_newline (&pp);
+  }
+  pp_string (&pp, "~~~~~~~~\n");
+  pp_flush (&pp);
+}
+
+class kf_analyzer_cpython_dump_refcounts : public known_function
+{
+public:
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+    return cd.num_args () == 0;
+  }
+  void impl_call_pre (const call_details &cd) const final override
+  {
+    region_model_context *ctxt = cd.get_ctxt ();
+    if (!ctxt)
+      return;
+    region_model *model = cd.get_model ();
+    auto region_to_refcnt = hash_map<const region *, int> ();
+    count_all_references(model, region_to_refcnt);
+    dump_refcnt_info(region_to_refcnt, model, ctxt);
+  }
+};
+
 /* Some concessions were made to
 simplify the analysis process when comparing kf_PyList_Append with the
 real implementation. In particular, PyList_Append performs some
@@ -927,6 +1294,10 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
   iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
   iface->register_known_function ("PyLong_FromLong",
                                   make_unique<kf_PyLong_FromLong> ());
+
+  iface->register_known_function (
+      "__analyzer_cpython_dump_refcounts",
+      make_unique<kf_analyzer_cpython_dump_refcounts> ());
 }
 } // namespace ana
 
@@ -940,8 +1311,9 @@ plugin_init (struct plugin_name_args *plugin_info,
   const char *plugin_name = plugin_info->base_name;
   if (0)
     inform (input_location, "got here; %qs", plugin_name);
-  ana::register_finish_translation_unit_callback (&stash_named_types);
-  ana::register_finish_translation_unit_callback (&stash_global_vars);
+  register_finish_translation_unit_callback (&stash_named_types);
+  register_finish_translation_unit_callback (&stash_global_vars);
+  region_model::register_pop_frame_callback(pyobj_refcnt_checker);
   register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT,
                      ana::cpython_analyzer_init_cb,
                      NULL); /* void *user_data */
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
similarity index 64%
rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
index 19b5c17428a..9912f9105d4 100644
--- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
@@ -8,34 +8,6 @@
 #include <Python.h>
 #include "../analyzer/analyzer-decls.h"
 
-PyObject *
-test_PyList_New (Py_ssize_t len)
-{
-  PyObject *obj = PyList_New (len);
-  if (obj)
-    {
-     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
-     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
-    }
-  else
-    __analyzer_dump_path (); /* { dg-message "path" } */
-  return obj;
-}
-
-PyObject *
-test_PyLong_New (long n)
-{
-  PyObject *obj = PyLong_FromLong (n);
-  if (obj)
-    {
-     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
-     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
-    }
-  else
-    __analyzer_dump_path (); /* { dg-message "path" } */
-  return obj;
-}
-
 PyObject *
 test_PyListAppend (long n)
 {
@@ -43,6 +15,7 @@ test_PyListAppend (long n)
   PyObject *list = PyList_New (0);
   PyList_Append(list, item);
   return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
 }
 
 PyObject *
@@ -67,6 +40,7 @@ test_PyListAppend_2 (long n)
   else
     __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
   return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
 }
 
 
@@ -75,4 +49,30 @@ test_PyListAppend_3 (PyObject *item, PyObject *list)
 {
   PyList_Append (list, item);
   return list;
+}
+
+PyObject *
+test_PyListAppend_4 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_5 ()
+{
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, NULL);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_6 ()
+{
+  PyObject *item = NULL;
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
 }
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
new file mode 100644
index 00000000000..492d4f7d58d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyList_New (Py_ssize_t len)
+{
+  PyObject *obj = PyList_New (len);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+void
+test_PyList_New_2 ()
+{
+  PyObject *obj = PyList_New (0);
+} /* { dg-warning "leak of 'obj'" } */
+
+PyObject *test_stray_incref_PyList ()
+{
+  PyObject *p = PyList_New (2);
+  if (p)
+    Py_INCREF (p);
+  return p;
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
new file mode 100644
index 00000000000..97b29849302
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyLong_New (long n)
+{
+  PyObject *obj = PyLong_FromLong (n);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+void
+test_PyLong_New_2 (long n)
+{
+  PyObject *obj = PyLong_FromLong (n);
+} /* { dg-warning "leak of 'obj'" } */
+
+PyObject *test_stray_incref_PyLong (long val)
+{
+  PyObject *p = PyLong_FromLong (val);
+  if (p)
+    Py_INCREF (p);
+  return p;
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
similarity index 100%
rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c
rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
new file mode 100644
index 00000000000..9912f9105d4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyListAppend (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, item);
+  return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
+
+PyObject *
+test_PyListAppend_2 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  if (!item)
+	return NULL;
+
+  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  PyObject *list = PyList_New (n);
+  if (!list)
+  {
+	Py_DECREF(item);
+	return NULL;
+  }
+
+  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+
+  if (PyList_Append (list, item) < 0)
+    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  else
+    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
+  return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
+
+
+PyObject *
+test_PyListAppend_3 (PyObject *item, PyObject *list)
+{
+  PyList_Append (list, item);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_4 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_5 ()
+{
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, NULL);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_6 ()
+{
+  PyObject *item = NULL;
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index e1ed2d2589e..cbef6da8d86 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -161,8 +161,9 @@ set plugin_test_list [list \
 	  taint-CVE-2011-0521-6.c \
 	  taint-antipatterns-1.c } \
     { analyzer_cpython_plugin.c \
-	  cpython-plugin-test-1.c \
-	  cpython-plugin-test-2.c } \
+	  cpython-plugin-test-PyList_Append.c \
+	  cpython-plugin-test-PyList_New.c \
+	  cpython-plugin-test-PyLong_FromLong.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
-- 
2.30.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29  4:31 ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
@ 2023-08-29  4:35   ` Eric Feng
  2023-08-29 17:28     ` Eric Feng
  2023-08-29 21:08   ` David Malcolm
  2023-09-01  2:49   ` Hans-Peter Nilsson
  2 siblings, 1 reply; 10+ messages in thread
From: Eric Feng @ 2023-08-29  4:35 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, gcc-patches

On Tue, Aug 29, 2023 at 12:32 AM Eric Feng <ef2648@columbia.edu> wrote:
>
> Hi Dave,
>
> Thanks for the feedback. I've addressed the changes you mentioned in
> addition to adding more test cases. I've also taken this chance to
> split the test files according to known function subclasses, as you previously
> suggested. Since there were also some changes to the core analyzer, I've done a
> bootstrap and regtested the patch as well. Does it look OK for trunk?
Apologies — I forgot to mention that bootstrap and regtest was done on
aarch64-unknown-linux-gnu.
>
> Best,
> Eric
>
> ---
>
> This patch introduces initial support for reference count checking of
> PyObjects in relation to the Python/C API for the CPython plugin.
> Additionally, the core analyzer underwent several modifications to
> accommodate this feature. These include:
>
> - Introducing support for callbacks at the end of
>   region_model::pop_frame. This is our current point of validation for
>   the reference count of PyObjects.
> - An added optional custom stmt_finder parameter to
>   region_model_context::warn. This aids in emitting a diagnostic
>   concerning the reference count, especially when the stmt_finder is
>   NULL, which is currently the case during region_model::pop_frame.
>
> The current diagnostic we emit relating to the reference count
> appears as follows:
>
> rc3.c:23:10: warning: expected <variable name belonging to m_base_region> to have reference count: ‘1’ but ob_refcnt field is: ‘2’
>    23 |   return list;
>       |          ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(3);
>     |      |                    ^~~~~~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (1) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(1);
>     |      |                    ~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (2) when ‘PyList_New’ succeeds
>     |......
>     |   14 |   PyList_Append(list, item);
>     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |   |
>     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
>     |......
>     |   23 |   return list;
>     |      |          ~~~~
>     |      |          |
>     |      |          (4) here
>     |
>
> This is a WIP in several ways:
> - Enhancing the diagnostic for better clarity. For instance, users should
>   expect to see the variable name 'item' instead of the placeholder in the
>   diagnostic above.
> - Currently, functions returning PyObject * are assumed to always produce
>   a new reference.
> - The validation of reference count is only for PyObjects created within a
>   function body. Verifying reference counts for PyObjects passed as
>   parameters is not supported in this patch.
>
> gcc/analyzer/ChangeLog:
>   PR analyzer/107646
>         * engine.cc (impl_region_model_context::warn): New optional parameter.
>         * exploded-graph.h (class impl_region_model_context): Likewise.
>         * region-model.cc (region_model::pop_frame): New callback feature for
>   * region_model::pop_frame.
>         * region-model.h (struct append_regions_cb_data): Likewise.
>         (class region_model): Likewise.
>         (class region_model_context): New optional parameter.
>         (class region_model_context_decorator): Likewise.
>
> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
>         * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   * checking for PyObjects.
>         * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
>         * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   * added more tests).
>         * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
>         * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   * more tests).
>         * gcc.dg/plugin/plugin.exp: New tests.
>         * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
>         * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
>         * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.
>
> Signed-off-by: Eric Feng <ef2648@columbia.edu>
>
> ---
>  gcc/analyzer/engine.cc                        |   8 +-
>  gcc/analyzer/exploded-graph.h                 |   4 +-
>  gcc/analyzer/region-model.cc                  |   3 +
>  gcc/analyzer/region-model.h                   |  48 ++-
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-
>  ....c => cpython-plugin-test-PyList_Append.c} |  56 +--
>  .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
>  .../cpython-plugin-test-PyLong_FromLong.c     |  38 ++
>  ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
>  .../cpython-plugin-test-refcnt-checking.c     |  78 ++++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp        |   5 +-
>  11 files changed, 612 insertions(+), 42 deletions(-)
>  rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => cpython-plugin-test-PyList_Append.c} (64%)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
>  rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c => cpython-plugin-test-no-plugin.c} (100%)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
>
> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
> index a1908cdb364..736a41ecdaf 100644
> --- a/gcc/analyzer/engine.cc
> +++ b/gcc/analyzer/engine.cc
> @@ -115,10 +115,12 @@ impl_region_model_context (program_state *state,
>  }
>
>  bool
> -impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
> +impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d,
> +                                const stmt_finder *custom_finder)
>  {
>    LOG_FUNC (get_logger ());
> -  if (m_stmt == NULL && m_stmt_finder == NULL)
> +  auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder;
> +  if (m_stmt == NULL && curr_stmt_finder == NULL)
>      {
>        if (get_logger ())
>         get_logger ()->log ("rejecting diagnostic: no stmt");
> @@ -129,7 +131,7 @@ impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
>        bool terminate_path = d->terminate_path_p ();
>        if (m_eg->get_diagnostic_manager ().add_diagnostic
>           (m_enode_for_diag, m_enode_for_diag->get_supernode (),
> -          m_stmt, m_stmt_finder, std::move (d)))
> +          m_stmt, curr_stmt_finder, std::move (d)))
>         {
>           if (m_path_ctxt
>               && terminate_path
> diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
> index 5a7ab645bfe..6e9a5ef58c7 100644
> --- a/gcc/analyzer/exploded-graph.h
> +++ b/gcc/analyzer/exploded-graph.h
> @@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context
>                              uncertainty_t *uncertainty,
>                              logger *logger = NULL);
>
> -  bool warn (std::unique_ptr<pending_diagnostic> d) final override;
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder = NULL) final override;
>    void add_note (std::unique_ptr<pending_note> pn) final override;
>    void add_event (std::unique_ptr<checker_event> event) final override;
>    void on_svalue_leak (const svalue *) override;
> @@ -107,6 +108,7 @@ class impl_region_model_context : public region_model_context
>                          std::unique_ptr<sm_context> *out_sm_context) override;
>
>    const gimple *get_stmt () const override { return m_stmt; }
> +  const exploded_graph *get_eg () const override { return m_eg; }
>
>    exploded_graph *m_eg;
>    log_user m_logger;
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index 4f31a6dcf0f..eb4f976b83a 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
>
>  namespace ana {
>
> +auto_vec<pop_frame_callback> region_model::pop_frame_callbacks;
> +
>  /* Dump T to PP in language-independent form, for debugging/logging/dumping
>     purposes.  */
>
> @@ -5422,6 +5424,7 @@ region_model::pop_frame (tree result_lvalue,
>      }
>
>    unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
> +  notify_on_pop_frame (this, retval, ctxt);
>  }
>
>  /* Get the number of frames in this region_model's stack.  */
> diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
> index 10b2a59e787..440ea6d828d 100644
> --- a/gcc/analyzer/region-model.h
> +++ b/gcc/analyzer/region-model.h
> @@ -236,6 +236,10 @@ public:
>
>  struct append_regions_cb_data;
>
> +typedef void (*pop_frame_callback) (const region_model *model,
> +                                   const svalue *retval,
> +                                   region_model_context *ctxt);
> +
>  /* A region_model encapsulates a representation of the state of memory, with
>     a tree of regions, along with their associated values.
>     The representation is graph-like because values can be pointers to
> @@ -532,6 +536,20 @@ class region_model
>    get_builtin_kf (const gcall *call,
>                   region_model_context *ctxt = NULL) const;
>
> +  static void
> +  register_pop_frame_callback (const pop_frame_callback &callback)
> +  {
> +    pop_frame_callbacks.safe_push (callback);
> +  }
> +
> +  static void
> +  notify_on_pop_frame (const region_model *model, const svalue *retval,
> +                      region_model_context *ctxt)
> +  {
> +    for (auto &callback : pop_frame_callbacks)
> +       callback (model, retval, ctxt);
> +  }
> +
>  private:
>    const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
>    const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
> @@ -621,6 +639,7 @@ private:
>                                                 tree callee_fndecl,
>                                                 region_model_context *ctxt) const;
>
> +  static auto_vec<pop_frame_callback> pop_frame_callbacks;
>    /* Storing this here to avoid passing it around everywhere.  */
>    region_model_manager *const m_mgr;
>
> @@ -649,8 +668,10 @@ class region_model_context
>  {
>   public:
>    /* Hook for clients to store pending diagnostics.
> -     Return true if the diagnostic was stored, or false if it was deleted.  */
> -  virtual bool warn (std::unique_ptr<pending_diagnostic> d) = 0;
> +     Return true if the diagnostic was stored, or false if it was deleted.
> +     Optionally provide a custom stmt_finder.  */
> +  virtual bool warn (std::unique_ptr<pending_diagnostic> d,
> +                    const stmt_finder *custom_finder = NULL) = 0;
>
>    /* Hook for clients to add a note to the last previously stored
>       pending diagnostic.  */
> @@ -757,6 +778,8 @@ class region_model_context
>
>    /* Get the current statement, if any.  */
>    virtual const gimple *get_stmt () const = 0;
> +
> +  virtual const exploded_graph *get_eg () const = 0;
>  };
>
>  /* A "do nothing" subclass of region_model_context.  */
> @@ -764,7 +787,8 @@ class region_model_context
>  class noop_region_model_context : public region_model_context
>  {
>  public:
> -  bool warn (std::unique_ptr<pending_diagnostic>) override { return false; }
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder) override { return false; }
>    void add_note (std::unique_ptr<pending_note>) override;
>    void add_event (std::unique_ptr<checker_event>) override;
>    void on_svalue_leak (const svalue *) override {}
> @@ -812,6 +836,7 @@ public:
>    }
>
>    const gimple *get_stmt () const override { return NULL; }
> +  const exploded_graph *get_eg () const override { return NULL; }
>  };
>
>  /* A subclass of region_model_context for determining if operations fail
> @@ -840,7 +865,8 @@ private:
>  class region_model_context_decorator : public region_model_context
>  {
>   public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder)
>    {
>      if (m_inner)
>        return m_inner->warn (std::move (d));
> @@ -978,6 +1004,14 @@ class region_model_context_decorator : public region_model_context
>        return nullptr;
>    }
>
> +  const exploded_graph *get_eg () const override
> +  {
> +    if (m_inner)
> +       return m_inner->get_eg ();
> +    else
> +       return nullptr;
> +  }
> +
>  protected:
>    region_model_context_decorator (region_model_context *inner)
>    : m_inner (inner)
> @@ -993,7 +1027,8 @@ protected:
>  class annotating_context : public region_model_context_decorator
>  {
>  public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder) override
>    {
>      if (m_inner)
>        if (m_inner->warn (std::move (d)))
> @@ -1158,7 +1193,8 @@ using namespace ::selftest;
>  class test_region_model_context : public noop_region_model_context
>  {
>  public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) final override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder) final override
>    {
>      m_diagnostics.safe_push (d.release ());
>      return true;
> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 7cd72e8a886..b2caed8fc1b 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> @@ -44,6 +44,7 @@
>  #include "analyzer/region-model.h"
>  #include "analyzer/call-details.h"
>  #include "analyzer/call-info.h"
> +#include "analyzer/exploded-graph.h"
>  #include "make-unique.h"
>
>  int plugin_is_GPL_compatible;
> @@ -191,6 +192,372 @@ public:
>    }
>  };
>
> +/* This is just a copy of leak_stmt_finder for now (subject to change if
> + * necssary)  */
> +
> +class refcnt_stmt_finder : public stmt_finder
> +{
> +public:
> +  refcnt_stmt_finder (const exploded_graph &eg, tree var)
> +      : m_eg (eg), m_var (var)
> +  {
> +  }
> +
> +  std::unique_ptr<stmt_finder>
> +  clone () const final override
> +  {
> +    return make_unique<refcnt_stmt_finder> (m_eg, m_var);
> +  }
> +
> +  const gimple *
> +  find_stmt (const exploded_path &epath) final override
> +  {
> +    logger *const logger = m_eg.get_logger ();
> +    LOG_FUNC (logger);
> +
> +    if (m_var && TREE_CODE (m_var) == SSA_NAME)
> +      {
> +       /* Locate the final write to this SSA name in the path.  */
> +       const gimple *def_stmt = SSA_NAME_DEF_STMT (m_var);
> +
> +       int idx_of_def_stmt;
> +       bool found = epath.find_stmt_backwards (def_stmt, &idx_of_def_stmt);
> +       if (!found)
> +         goto not_found;
> +
> +       /* What was the next write to the underlying var
> +          after the SSA name was set? (if any).  */
> +
> +       for (unsigned idx = idx_of_def_stmt + 1; idx < epath.m_edges.length ();
> +            ++idx)
> +         {
> +           const exploded_edge *eedge = epath.m_edges[idx];
> +           if (logger)
> +                   logger->log ("eedge[%i]: EN %i -> EN %i", idx,
> +                                eedge->m_src->m_index,
> +                                eedge->m_dest->m_index);
> +           const exploded_node *dst_node = eedge->m_dest;
> +           const program_point &dst_point = dst_node->get_point ();
> +           const gimple *stmt = dst_point.get_stmt ();
> +           if (!stmt)
> +                   continue;
> +           if (const gassign *assign = dyn_cast<const gassign *> (stmt))
> +                   {
> +                           tree lhs = gimple_assign_lhs (assign);
> +                           if (TREE_CODE (lhs) == SSA_NAME
> +                               && SSA_NAME_VAR (lhs) == SSA_NAME_VAR (m_var))
> +                                   return assign;
> +                   }
> +         }
> +      }
> +
> +  not_found:
> +
> +    /* Look backwards for the first statement with a location.  */
> +    int i;
> +    const exploded_edge *eedge;
> +    FOR_EACH_VEC_ELT_REVERSE (epath.m_edges, i, eedge)
> +    {
> +      if (logger)
> +       logger->log ("eedge[%i]: EN %i -> EN %i", i, eedge->m_src->m_index,
> +                    eedge->m_dest->m_index);
> +      const exploded_node *dst_node = eedge->m_dest;
> +      const program_point &dst_point = dst_node->get_point ();
> +      const gimple *stmt = dst_point.get_stmt ();
> +      if (stmt)
> +       if (get_pure_location (stmt->location) != UNKNOWN_LOCATION)
> +         return stmt;
> +    }
> +
> +    gcc_unreachable ();
> +    return NULL;
> +  }
> +
> +private:
> +  const exploded_graph &m_eg;
> +  tree m_var;
> +};
> +
> +class refcnt_mismatch : public pending_diagnostic_subclass<refcnt_mismatch>
> +{
> +public:
> +  refcnt_mismatch (const region *base_region,
> +                               const svalue *ob_refcnt,
> +                               const svalue *actual_refcnt,
> +        tree reg_tree)
> +      : m_base_region (base_region), m_ob_refcnt (ob_refcnt),
> +       m_actual_refcnt (actual_refcnt), m_reg_tree(reg_tree)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "refcnt_mismatch";
> +  }
> +
> +  bool
> +  operator== (const refcnt_mismatch &other) const
> +  {
> +    return (m_base_region == other.m_base_region
> +           && m_ob_refcnt == other.m_ob_refcnt
> +           && m_actual_refcnt == other.m_actual_refcnt);
> +  }
> +
> +  int get_controlling_option () const final override
> +  {
> +    return 0;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc, logger *) final override
> +  {
> +    diagnostic_metadata m;
> +    bool warned;
> +    // just assuming constants for now
> +    auto actual_refcnt
> +       = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> +    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> +    warned = warning_meta (
> +       rich_loc, m, get_controlling_option (),
> +       "expected <variable name belonging to m_base_region> to have "
> +       "reference count: %qE but ob_refcnt field is: %qE",
> +       actual_refcnt, ob_refcnt);
> +
> +    // location_t loc = rich_loc->get_loc ();
> +    // foo (loc);
> +    return warned;
> +  }
> +
> +  void mark_interesting_stuff (interesting_t *interest) final override
> +  {
> +    if (m_base_region)
> +      interest->add_region_creation (m_base_region);
> +  }
> +
> +private:
> +
> +  void foo(location_t loc) const
> +  {
> +    inform(loc, "something is up right here");
> +  }
> +  const region *m_base_region;
> +  const svalue *m_ob_refcnt;
> +  const svalue *m_actual_refcnt;
> +  tree m_reg_tree;
> +};
> +
> +/* Retrieves the svalue associated with the ob_refcnt field of the base region.
> + */
> +static const svalue *
> +retrieve_ob_refcnt_sval (const region *base_reg, const region_model *model,
> +                        region_model_context *ctxt)
> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
> +  const region *ob_refcnt_region
> +      = mgr->get_field_region (base_reg, ob_refcnt_tree);
> +  const svalue *ob_refcnt_sval
> +      = model->get_store_value (ob_refcnt_region, ctxt);
> +  return ob_refcnt_sval;
> +}
> +
> +static void
> +increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
> +{
> +  bool existed;
> +  auto &refcnt = map.get_or_insert (key, &existed);
> +  refcnt = existed ? refcnt + 1 : 1;
> +}
> +
> +
> +/* Recursively fills in region_to_refcnt with the references owned by
> +   pyobj_ptr_sval.  */
> +static void
> +count_expected_pyobj_references (const region_model *model,
> +                          hash_map<const region *, int> &region_to_refcnt,
> +                          const svalue *pyobj_ptr_sval,
> +                          hash_set<const region *> &seen)
> +{
> +  if (!pyobj_ptr_sval)
> +    return;
> +
> +  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
> +  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
> +  if (!pyobj_region_sval && !pyobj_initial_sval)
> +    return;
> +
> +  // todo: support initial sval (e.g passed in as parameter)
> +  if (pyobj_initial_sval)
> +    {
> +  //     increment_region_refcnt (region_to_refcnt,
> +       //                     pyobj_initial_sval->get_region ());
> +      return;
> +    }
> +
> +  const region *pyobj_region = pyobj_region_sval->get_pointee ();
> +  if (!pyobj_region || seen.contains (pyobj_region))
> +    return;
> +
> +  seen.add (pyobj_region);
> +
> +  if (pyobj_ptr_sval->get_type () == pyobj_ptr_tree)
> +    increment_region_refcnt (region_to_refcnt, pyobj_region);
> +
> +  const auto *curr_store = model->get_store ();
> +  const auto *retval_cluster = curr_store->get_cluster (pyobj_region);
> +  if (!retval_cluster)
> +    return;
> +
> +  const auto &retval_binding_map = retval_cluster->get_map ();
> +
> +  for (const auto &binding : retval_binding_map)
> +    {
> +      const svalue *binding_sval = binding.second;
> +      const svalue *unwrapped_sval = binding_sval->unwrap_any_unmergeable ();
> +      const region *pointee = unwrapped_sval->maybe_get_region ();
> +
> +      if (pointee && pointee->get_kind () == RK_HEAP_ALLOCATED)
> +       count_expected_pyobj_references (model, region_to_refcnt, binding_sval,
> +                                        seen);
> +    }
> +}
> +
> +/* Compare ob_refcnt field vs the actual reference count of a region */
> +static void
> +check_refcnt (const region_model *model, region_model_context *ctxt,
> +             const hash_map<const ana::region *,
> +                            int>::iterator::reference_pair region_refcnt)
> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  const auto &curr_region = region_refcnt.first;
> +  const auto &actual_refcnt = region_refcnt.second;
> +  const svalue *ob_refcnt_sval = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
> +  const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> +      ob_refcnt_sval->get_type (), actual_refcnt);
> +
> +  if (ob_refcnt_sval != actual_refcnt_sval)
> +  {
> +    // todo: fix this
> +    tree reg_tree = model->get_representative_tree (curr_region);
> +
> +    const auto &eg = ctxt->get_eg ();
> +    refcnt_stmt_finder finder (*eg, reg_tree);
> +    auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
> +                                           actual_refcnt_sval, reg_tree);
> +    if (pd && eg)
> +    ctxt->warn (std::move (pd), &finder);
> +  }
> +}
> +
> +static void
> +check_refcnts (const region_model *model, const svalue *retval,
> +           region_model_context *ctxt,
> +           hash_map<const region *, int> &region_to_refcnt)
> +{
> +  for (const auto &region_refcnt : region_to_refcnt)
> +  {
> +    check_refcnt(model, ctxt, region_refcnt);
> +  }
> +}
> +
> +/* Validates the reference count of all Python objects. */
> +void
> +pyobj_refcnt_checker (const region_model *model, const svalue *retval,
> +                   region_model_context *ctxt)
> +{
> +  if (!ctxt)
> +  return;
> +
> +  auto region_to_refcnt = hash_map<const region *, int> ();
> +  auto seen_regions = hash_set<const region *> ();
> +
> +  count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
> +  check_refcnts (model, retval, ctxt, region_to_refcnt);
> +}
> +
> +/* Counts the actual pyobject references from all clusters in the model's
> + * store. */
> +static void
> +count_all_references (const region_model *model,
> +                     hash_map<const region *, int> &region_to_refcnt)
> +{
> +  for (const auto &cluster : *model->get_store ())
> +  {
> +    auto curr_region = cluster.first;
> +    if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
> +    continue;
> +
> +    increment_region_refcnt (region_to_refcnt, curr_region);
> +
> +    auto binding_cluster = cluster.second;
> +    for (const auto &binding : binding_cluster->get_map ())
> +    {
> +         const svalue *binding_sval = binding.second;
> +
> +         const svalue *unwrapped_sval
> +             = binding_sval->unwrap_any_unmergeable ();
> +         // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> +         // continue;
> +
> +         const region *pointee = unwrapped_sval->maybe_get_region ();
> +         if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
> +           continue;
> +
> +         increment_region_refcnt (region_to_refcnt, pointee);
> +    }
> +  }
> +}
> +
> +static void
> +dump_refcnt_info (const hash_map<const region *, int> &region_to_refcnt,
> +                 const region_model *model, region_model_context *ctxt)
> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  pretty_printer pp;
> +  pp_format_decoder (&pp) = default_tree_printer;
> +  pp_show_color (&pp) = pp_show_color (global_dc->printer);
> +  pp.buffer->stream = stderr;
> +
> +  for (const auto &region_refcnt : region_to_refcnt)
> +  {
> +    auto region = region_refcnt.first;
> +    auto actual_refcnt = region_refcnt.second;
> +    const svalue *ob_refcnt_sval
> +       = retrieve_ob_refcnt_sval (region, model, ctxt);
> +    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> +       ob_refcnt_sval->get_type (), actual_refcnt);
> +
> +    region->dump_to_pp (&pp, true);
> +    pp_string (&pp, " — ob_refcnt: ");
> +    ob_refcnt_sval->dump_to_pp (&pp, true);
> +    pp_string (&pp, " actual refcnt: ");
> +    actual_refcnt_sval->dump_to_pp (&pp, true);
> +    pp_newline (&pp);
> +  }
> +  pp_string (&pp, "~~~~~~~~\n");
> +  pp_flush (&pp);
> +}
> +
> +class kf_analyzer_cpython_dump_refcounts : public known_function
> +{
> +public:
> +  bool matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return cd.num_args () == 0;
> +  }
> +  void impl_call_pre (const call_details &cd) const final override
> +  {
> +    region_model_context *ctxt = cd.get_ctxt ();
> +    if (!ctxt)
> +      return;
> +    region_model *model = cd.get_model ();
> +    auto region_to_refcnt = hash_map<const region *, int> ();
> +    count_all_references(model, region_to_refcnt);
> +    dump_refcnt_info(region_to_refcnt, model, ctxt);
> +  }
> +};
> +
>  /* Some concessions were made to
>  simplify the analysis process when comparing kf_PyList_Append with the
>  real implementation. In particular, PyList_Append performs some
> @@ -927,6 +1294,10 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
>    iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
>    iface->register_known_function ("PyLong_FromLong",
>                                    make_unique<kf_PyLong_FromLong> ());
> +
> +  iface->register_known_function (
> +      "__analyzer_cpython_dump_refcounts",
> +      make_unique<kf_analyzer_cpython_dump_refcounts> ());
>  }
>  } // namespace ana
>
> @@ -940,8 +1311,9 @@ plugin_init (struct plugin_name_args *plugin_info,
>    const char *plugin_name = plugin_info->base_name;
>    if (0)
>      inform (input_location, "got here; %qs", plugin_name);
> -  ana::register_finish_translation_unit_callback (&stash_named_types);
> -  ana::register_finish_translation_unit_callback (&stash_global_vars);
> +  register_finish_translation_unit_callback (&stash_named_types);
> +  register_finish_translation_unit_callback (&stash_global_vars);
> +  region_model::register_pop_frame_callback(pyobj_refcnt_checker);
>    register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT,
>                       ana::cpython_analyzer_init_cb,
>                       NULL); /* void *user_data */
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> similarity index 64%
> rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> index 19b5c17428a..9912f9105d4 100644
> --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> @@ -8,34 +8,6 @@
>  #include <Python.h>
>  #include "../analyzer/analyzer-decls.h"
>
> -PyObject *
> -test_PyList_New (Py_ssize_t len)
> -{
> -  PyObject *obj = PyList_New (len);
> -  if (obj)
> -    {
> -     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> -     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> -    }
> -  else
> -    __analyzer_dump_path (); /* { dg-message "path" } */
> -  return obj;
> -}
> -
> -PyObject *
> -test_PyLong_New (long n)
> -{
> -  PyObject *obj = PyLong_FromLong (n);
> -  if (obj)
> -    {
> -     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> -     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
> -    }
> -  else
> -    __analyzer_dump_path (); /* { dg-message "path" } */
> -  return obj;
> -}
> -
>  PyObject *
>  test_PyListAppend (long n)
>  {
> @@ -43,6 +15,7 @@ test_PyListAppend (long n)
>    PyObject *list = PyList_New (0);
>    PyList_Append(list, item);
>    return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
>  }
>
>  PyObject *
> @@ -67,6 +40,7 @@ test_PyListAppend_2 (long n)
>    else
>      __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
>    return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
>  }
>
>
> @@ -75,4 +49,30 @@ test_PyListAppend_3 (PyObject *item, PyObject *list)
>  {
>    PyList_Append (list, item);
>    return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_4 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_5 ()
> +{
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, NULL);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_6 ()
> +{
> +  PyObject *item = NULL;
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
>  }
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
> new file mode 100644
> index 00000000000..492d4f7d58d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyList_New (Py_ssize_t len)
> +{
> +  PyObject *obj = PyList_New (len);
> +  if (obj)
> +    {
> +     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> +    }
> +  else
> +    __analyzer_dump_path (); /* { dg-message "path" } */
> +  return obj;
> +}
> +
> +void
> +test_PyList_New_2 ()
> +{
> +  PyObject *obj = PyList_New (0);
> +} /* { dg-warning "leak of 'obj'" } */
> +
> +PyObject *test_stray_incref_PyList ()
> +{
> +  PyObject *p = PyList_New (2);
> +  if (p)
> +    Py_INCREF (p);
> +  return p;
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
> new file mode 100644
> index 00000000000..97b29849302
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyLong_New (long n)
> +{
> +  PyObject *obj = PyLong_FromLong (n);
> +  if (obj)
> +    {
> +     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
> +    }
> +  else
> +    __analyzer_dump_path (); /* { dg-message "path" } */
> +  return obj;
> +}
> +
> +void
> +test_PyLong_New_2 (long n)
> +{
> +  PyObject *obj = PyLong_FromLong (n);
> +} /* { dg-warning "leak of 'obj'" } */
> +
> +PyObject *test_stray_incref_PyLong (long val)
> +{
> +  PyObject *p = PyLong_FromLong (val);
> +  if (p)
> +    Py_INCREF (p);
> +  return p;
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
> similarity index 100%
> rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c
> rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
> new file mode 100644
> index 00000000000..9912f9105d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyListAppend (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, item);
> +  return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> +
> +PyObject *
> +test_PyListAppend_2 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  if (!item)
> +       return NULL;
> +
> +  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +  PyObject *list = PyList_New (n);
> +  if (!list)
> +  {
> +       Py_DECREF(item);
> +       return NULL;
> +  }
> +
> +  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +
> +  if (PyList_Append (list, item) < 0)
> +    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +  else
> +    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
> +  return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> +
> +
> +PyObject *
> +test_PyListAppend_3 (PyObject *item, PyObject *list)
> +{
> +  PyList_Append (list, item);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_4 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_5 ()
> +{
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, NULL);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_6 ()
> +{
> +  PyObject *item = NULL;
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
> +}
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> index e1ed2d2589e..cbef6da8d86 100644
> --- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
> +++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> @@ -161,8 +161,9 @@ set plugin_test_list [list \
>           taint-CVE-2011-0521-6.c \
>           taint-antipatterns-1.c } \
>      { analyzer_cpython_plugin.c \
> -         cpython-plugin-test-1.c \
> -         cpython-plugin-test-2.c } \
> +         cpython-plugin-test-PyList_Append.c \
> +         cpython-plugin-test-PyList_New.c \
> +         cpython-plugin-test-PyLong_FromLong.c } \
>  ]
>
>  foreach plugin_test $plugin_test_list {
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29  4:35   ` Eric Feng
@ 2023-08-29 17:28     ` Eric Feng
  2023-08-29 21:14       ` David Malcolm
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Feng @ 2023-08-29 17:28 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, gcc-patches, Eric Feng

Additionally, by using the old model and the pointer per your suggestion,
we are able to find the representative tree and emit a more accurate diagnostic!

rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but ob_refcnt field is: ‘2’
   23 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
    |......
    |   23 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

If a representative tree is not found, I decided we should just bail out
of emitting a diagnostic for now, to avoid confusing the user on what
the problem is.

I've attached the patch for this (on top of the previous one) below. If
it also looks good, I can merge it with the last patch and push it in at
the same time.

Best,
Eric

---
 gcc/analyzer/region-model.cc                  |  3 +-
 gcc/analyzer/region-model.h                   |  7 ++--
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 35 +++++++++++--------
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index eb4f976b83a..c1d266d351b 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5391,6 +5391,7 @@ region_model::pop_frame (tree result_lvalue,
 {
   gcc_assert (m_current_frame);
 
+  const region_model pre_popped_model = *this;
   const frame_region *frame_reg = m_current_frame;
 
   /* Notify state machines.  */
@@ -5424,7 +5425,7 @@ region_model::pop_frame (tree result_lvalue,
     }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
-  notify_on_pop_frame (this, retval, ctxt);
+  notify_on_pop_frame (this, &pre_popped_model, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 440ea6d828d..b89c6f6c649 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -237,6 +237,7 @@ public:
 struct append_regions_cb_data;
 
 typedef void (*pop_frame_callback) (const region_model *model,
+				    const region_model *prev_model,
 				    const svalue *retval,
 				    region_model_context *ctxt);
 
@@ -543,11 +544,13 @@ class region_model
   }
 
   static void
-  notify_on_pop_frame (const region_model *model, const svalue *retval,
+  notify_on_pop_frame (const region_model *model,
+		       const region_model *prev_model,
+          const svalue *retval,
 		       region_model_context *ctxt)
   {
     for (auto &callback : pop_frame_callbacks)
-	callback (model, retval, ctxt);
+	callback (model, prev_model, retval, ctxt);
   }
 
 private:
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index b2caed8fc1b..6f0a355fe30 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -318,11 +318,10 @@ public:
     auto actual_refcnt
 	= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
     auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-    warned = warning_meta (
-	rich_loc, m, get_controlling_option (),
-	"expected <variable name belonging to m_base_region> to have "
-	"reference count: %qE but ob_refcnt field is: %qE",
-	actual_refcnt, ob_refcnt);
+    warned = warning_meta (rich_loc, m, get_controlling_option (),
+			   "expected %qE to have "
+			   "reference count: %qE but ob_refcnt field is: %qE",
+			   m_reg_tree, actual_refcnt, ob_refcnt);
 
     // location_t loc = rich_loc->get_loc ();
     // foo (loc);
@@ -425,7 +424,8 @@ count_expected_pyobj_references (const region_model *model,
 
 /* Compare ob_refcnt field vs the actual reference count of a region */
 static void
-check_refcnt (const region_model *model, region_model_context *ctxt,
+check_refcnt (const region_model *model, const region_model *old_model,
+	      region_model_context *ctxt,
 	      const hash_map<const ana::region *,
 			     int>::iterator::reference_pair region_refcnt)
 {
@@ -438,8 +438,11 @@ check_refcnt (const region_model *model, region_model_context *ctxt,
 
   if (ob_refcnt_sval != actual_refcnt_sval)
   {
-    // todo: fix this
-    tree reg_tree = model->get_representative_tree (curr_region);
+    const svalue *curr_reg_sval
+	= mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region);
+    tree reg_tree = old_model->get_representative_tree (curr_reg_sval);
+    if (!reg_tree)
+	    return;
 
     const auto &eg = ctxt->get_eg ();
     refcnt_stmt_finder finder (*eg, reg_tree);
@@ -451,20 +454,22 @@ check_refcnt (const region_model *model, region_model_context *ctxt,
 }
 
 static void
-check_refcnts (const region_model *model, const svalue *retval,
-	    region_model_context *ctxt,
-	    hash_map<const region *, int> &region_to_refcnt)
+check_refcnts (const region_model *model, const region_model *old_model,
+	       const svalue *retval, region_model_context *ctxt,
+	       hash_map<const region *, int> &region_to_refcnt)
 {
   for (const auto &region_refcnt : region_to_refcnt)
   {
-    check_refcnt(model, ctxt, region_refcnt);
+    check_refcnt(model, old_model, ctxt, region_refcnt);
   }
 }
 
 /* Validates the reference count of all Python objects. */
 void
-pyobj_refcnt_checker (const region_model *model, const svalue *retval,
-		    region_model_context *ctxt)
+pyobj_refcnt_checker (const region_model *model,
+		      const region_model *old_model,
+         const svalue *retval,
+		      region_model_context *ctxt)
 {
   if (!ctxt)
   return;
@@ -473,7 +478,7 @@ pyobj_refcnt_checker (const region_model *model, const svalue *retval,
   auto seen_regions = hash_set<const region *> ();
 
   count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
-  check_refcnts (model, retval, ctxt, region_to_refcnt);
+  check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
 }
 
 /* Counts the actual pyobject references from all clusters in the model's
-- 
2.30.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29  4:31 ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
  2023-08-29  4:35   ` Eric Feng
@ 2023-08-29 21:08   ` David Malcolm
  2023-09-01  2:49   ` Hans-Peter Nilsson
  2 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-29 21:08 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc, gcc-patches

On Tue, 2023-08-29 at 00:31 -0400, Eric Feng wrote:
> Hi Dave,

Hi Eric.

Thanks for the updated patch.

A few nits below; this is OK for trunk with them fixed...

[...snip...]

> 
> gcc/analyzer/ChangeLog:
>   PR analyzer/107646
> 	* engine.cc (impl_region_model_context::warn): New optional parameter.
> 	* exploded-graph.h (class impl_region_model_context): Likewise.
> 	* region-model.cc (region_model::pop_frame): New callback feature for
>   * region_model::pop_frame.
> 	* region-model.h (struct append_regions_cb_data): Likewise.
> 	(class region_model): Likewise.
> 	(class region_model_context): New optional parameter.
> 	(class region_model_context_decorator): Likewise.
> 
> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
> 	* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   * checking for PyObjects.
> 	* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> 	* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   * added more tests).
> 	* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> 	* gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   * more tests).
> 	* gcc.dg/plugin/plugin.exp: New tests.
> 	* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> 	* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
> 	* gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

The ChangeLog formatting here seems wrong; lines starting with a '*'
should refer to a filename.  Continuation lines begin with just a tab
character.

[...snip...]

> diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
> index 10b2a59e787..440ea6d828d 100644
> --- a/gcc/analyzer/region-model.h
> +++ b/gcc/analyzer/region-model.h

[...snip...]

> @@ -840,7 +865,8 @@ private:
>  class region_model_context_decorator : public region_model_context
>  {
>   public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +	     const stmt_finder *custom_finder)
>    {
>      if (m_inner)
>        return m_inner->warn (std::move (d));

This should presumably pass the custom_finder on to the 2nd argument of
m_inner->warn, rather than have the inner call to warn implicitly use
the NULL default arg.

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
> similarity index 100%
> rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c
> rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c

Looks like
  "-no-Python-h.c"
would be a better suffix than
  "-no-plugin.c"
as it's the include that's missing, not the plugin.

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> index e1ed2d2589e..cbef6da8d86 100644
> --- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
> +++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> @@ -161,8 +161,9 @@ set plugin_test_list [list \
>  	  taint-CVE-2011-0521-6.c \
>  	  taint-antipatterns-1.c } \
>      { analyzer_cpython_plugin.c \
> -	  cpython-plugin-test-1.c \
> -	  cpython-plugin-test-2.c } \
> +	  cpython-plugin-test-PyList_Append.c \
> +	  cpython-plugin-test-PyList_New.c \
> +	  cpython-plugin-test-PyLong_FromLong.c } \

Looks like this is missing:
  cpython-plugin-test-no-plugin.c
and
  cpython-plugin-test-refcnt-checking.c
(though as noted above"cpython-plugin-test-no-Python-h.c" would be a
better name for the former)

so it wasn't actually compiling these tests.

Be sure to doublecheck that these tests pass when updating.

[...snip...]

OK for trunk with the above nits fixed.

Dave


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29 17:28     ` Eric Feng
@ 2023-08-29 21:14       ` David Malcolm
  2023-08-30 22:15         ` Eric Feng
  0 siblings, 1 reply; 10+ messages in thread
From: David Malcolm @ 2023-08-29 21:14 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc, gcc-patches

On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> Additionally, by using the old model and the pointer per your
> suggestion,
> we are able to find the representative tree and emit a more accurate
> diagnostic!
> 
> rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’
> but ob_refcnt field is: ‘2’
>    23 |   return list;
>       |          ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(3);
>     |      |                    ^~~~~~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (1) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(1);
>     |      |                    ~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (2) when ‘PyList_New’ succeeds
>     |......
>     |   14 |   PyList_Append(list, item);
>     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |   |
>     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
>     |......
>     |   23 |   return list;
>     |      |          ~~~~
>     |      |          |
>     |      |          (4) here
>     |

Excellent, that's a big improvement.

> 
> If a representative tree is not found, I decided we should just bail
> out
> of emitting a diagnostic for now, to avoid confusing the user on what
> the problem is.

Fair enough.

> 
> I've attached the patch for this (on top of the previous one) below.
> If
> it also looks good, I can merge it with the last patch and push it in
> at
> the same time.

I don't mind either way, but please can you update the tests so that we
have some automated test coverage that the correct name is being
printed in the warning.

Thanks
Dave


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29 21:14       ` David Malcolm
@ 2023-08-30 22:15         ` Eric Feng
  2023-08-31 17:01           ` David Malcolm
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Feng @ 2023-08-30 22:15 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc, gcc-patches

On Tue, Aug 29, 2023 at 5:14 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > Additionally, by using the old model and the pointer per your
> > suggestion,
> > we are able to find the representative tree and emit a more accurate
> > diagnostic!
> >
> > rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’
> > but ob_refcnt field is: ‘2’
> >    23 |   return list;
> >       |          ^~~~
> >   ‘create_py_object’: events 1-4
> >     |
> >     |    4 |   PyObject* item = PyLong_FromLong(3);
> >     |      |                    ^~~~~~~~~~~~~~~~~~
> >     |      |                    |
> >     |      |                    (1) when ‘PyLong_FromLong’ succeeds
> >     |    5 |   PyObject* list = PyList_New(1);
> >     |      |                    ~~~~~~~~~~~~~
> >     |      |                    |
> >     |      |                    (2) when ‘PyList_New’ succeeds
> >     |......
> >     |   14 |   PyList_Append(list, item);
> >     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
> >     |      |   |
> >     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
> >     |......
> >     |   23 |   return list;
> >     |      |          ~~~~
> >     |      |          |
> >     |      |          (4) here
> >     |
>
> Excellent, that's a big improvement.
>
> >
> > If a representative tree is not found, I decided we should just bail
> > out
> > of emitting a diagnostic for now, to avoid confusing the user on what
> > the problem is.
>
> Fair enough.
>
> >
> > I've attached the patch for this (on top of the previous one) below.
> > If
> > it also looks good, I can merge it with the last patch and push it in
> > at
> > the same time.
>
> I don't mind either way, but please can you update the tests so that we
> have some automated test coverage that the correct name is being
> printed in the warning.
>
> Thanks
> Dave
>

Sorry — forgot to hit 'reply all' in the previous e-mail. Resending to
preserve our chain on the list:

---

Thanks; pushed to trunk with nits fixed:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.

Incidentally, I updated my formatting settings in VSCode, which I've
previously mentioned in passing. In case anyone is interested:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
AlignAfterOpenBracket: Align,
AllowAllParametersOfDeclarationOnNextLine: true }",

This fixes some issues with the indent width and also ensures function
parameters of appropriate length are aligned properly and on a new
line each (like the rest of the analyzer code).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-30 22:15         ` Eric Feng
@ 2023-08-31 17:01           ` David Malcolm
  0 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2023-08-31 17:01 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc, gcc-patches

On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
> On Tue, Aug 29, 2023 at 5:14 PM David Malcolm <dmalcolm@redhat.com>
> wrote:
> > 
> > On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > > Additionally, by using the old model and the pointer per your
> > > suggestion,
> > > we are able to find the representative tree and emit a more
> > > accurate
> > > diagnostic!
> > > 
> > > rc3.c:23:10: warning: expected ‘item’ to have reference count:
> > > ‘1’
> > > but ob_refcnt field is: ‘2’
> > >    23 |   return list;
> > >       |          ^~~~
> > >   ‘create_py_object’: events 1-4
> > >     |
> > >     |    4 |   PyObject* item = PyLong_FromLong(3);
> > >     |      |                    ^~~~~~~~~~~~~~~~~~
> > >     |      |                    |
> > >     |      |                    (1) when ‘PyLong_FromLong’
> > > succeeds
> > >     |    5 |   PyObject* list = PyList_New(1);
> > >     |      |                    ~~~~~~~~~~~~~
> > >     |      |                    |
> > >     |      |                    (2) when ‘PyList_New’ succeeds
> > >     |......
> > >     |   14 |   PyList_Append(list, item);
> > >     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
> > >     |      |   |
> > >     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > >     |......
> > >     |   23 |   return list;
> > >     |      |          ~~~~
> > >     |      |          |
> > >     |      |          (4) here
> > >     |
> > 
> > Excellent, that's a big improvement.
> > 
> > > 
> > > If a representative tree is not found, I decided we should just
> > > bail
> > > out
> > > of emitting a diagnostic for now, to avoid confusing the user on
> > > what
> > > the problem is.
> > 
> > Fair enough.
> > 
> > > 
> > > I've attached the patch for this (on top of the previous one)
> > > below.
> > > If
> > > it also looks good, I can merge it with the last patch and push
> > > it in
> > > at
> > > the same time.
> > 
> > I don't mind either way, but please can you update the tests so
> > that we
> > have some automated test coverage that the correct name is being
> > printed in the warning.
> > 
> > Thanks
> > Dave
> > 
> 
> Sorry — forgot to hit 'reply all' in the previous e-mail. Resending
> to
> preserve our chain on the list:
> 
> ---
> 
> Thanks; pushed to trunk with nits fixed:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4
> .

Thanks; looks good.

Do you want to add this to the GCC 14 part of the "History" section on
the wiki page:
  https://gcc.gnu.org/wiki/StaticAnalyzer
or should I?

> 
> Incidentally, I updated my formatting settings in VSCode, which I've
> previously mentioned in passing. In case anyone is interested:
> 
> "C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
> TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
> AlignAfterOpenBracket: Align,
> AllowAllParametersOfDeclarationOnNextLine: true }",
> 
> This fixes some issues with the indent width and also ensures
> function
> parameters of appropriate length are aligned properly and on a new
> line each (like the rest of the analyzer code).

Thanks
Dave



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29  4:31 ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
  2023-08-29  4:35   ` Eric Feng
  2023-08-29 21:08   ` David Malcolm
@ 2023-09-01  2:49   ` Hans-Peter Nilsson
  2023-09-01 14:51     ` David Malcolm
  2 siblings, 1 reply; 10+ messages in thread
From: Hans-Peter Nilsson @ 2023-09-01  2:49 UTC (permalink / raw)
  To: Eric Feng; +Cc: dmalcolm, gcc, gcc-patches, ef2648

(Looks like this was committed as r14-3580-g597b9ec69bca8a)

> Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org, Eric Feng <ef2648@columbia.edu>
> From: Eric Feng via Gcc <gcc@gcc.gnu.org>

> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
> 	* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   * checking for PyObjects.
> 	* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> 	* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   * added more tests).
> 	* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> 	* gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   * more tests).
> 	* gcc.dg/plugin/plugin.exp: New tests.
> 	* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> 	* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
> 	* gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

It seems this was more or less a rewrite, but that said,
it's generally preferable to always *add* tests, never *modify* them.

>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-

^^^ Ouch!  Was it not within reason to keep that test as it
was, and just add another test?

Anyway, the test after rewrite fails, and for some targets
like cris-elf and apparently m68k-linux, yields an error.
I see a PR was already opened.

Also, mostly for future reference, several files in the
patch miss a final newline, as seen by a "\ No newline at
end of file"-marker.

I think I found the problem; a mismatch between default C++
language standard between host-gcc and target-gcc.

(It's actually *not* as simple as "auto var = typeofvar<bar>()"
not being recognized in C++11 --or else there'd be an error
for the hash_set declaration too, which I just changed for
consistency-- but it's close enough for me.)

With this, retesting plugin.exp for cris-elf works.

Ok to commit?

-- >8 --
From: Hans-Peter Nilsson <hp@axis.com>
Date: Fri, 1 Sep 2023 04:36:03 +0200
Subject: [PATCH] testsuite: Fix analyzer_cpython_plugin.c declarations, PR testsuite/111264

Also, add missing newline at end of file.

	PR testsuite/111264
	* gcc.dg/plugin/analyzer_cpython_plugin.c: Make declarations
	C++11-compatible.
---
 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7af520436549..bf1982e79c37 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -477,8 +477,8 @@ pyobj_refcnt_checker (const region_model *model,
   if (!ctxt)
     return;
 
-  auto region_to_refcnt = hash_map<const region *, int> ();
-  auto seen_regions = hash_set<const region *> ();
+  hash_map<const region *, int> region_to_refcnt;
+  hash_set<const region *> seen_regions;
 
   count_pyobj_references (model, region_to_refcnt, retval, seen_regions);
   check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
@@ -561,7 +561,7 @@ public:
     if (!ctxt)
       return;
     region_model *model = cd.get_model ();
-    auto region_to_refcnt = hash_map<const region *, int> ();
+    hash_map<const region *, int> region_to_refcnt;
     count_all_references(model, region_to_refcnt);
     dump_refcnt_info(region_to_refcnt, model, ctxt);
   }
@@ -1330,4 +1330,4 @@ plugin_init (struct plugin_name_args *plugin_info,
   sorry_no_analyzer ();
 #endif
   return 0;
-}
\ No newline at end of file
+}
-- 
2.30.2

brgds, H-P

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-09-01  2:49   ` Hans-Peter Nilsson
@ 2023-09-01 14:51     ` David Malcolm
  2023-09-01 21:07       ` Eric Feng
  0 siblings, 1 reply; 10+ messages in thread
From: David Malcolm @ 2023-09-01 14:51 UTC (permalink / raw)
  To: Hans-Peter Nilsson, Eric Feng; +Cc: gcc, gcc-patches

On Fri, 2023-09-01 at 04:49 +0200, Hans-Peter Nilsson wrote:
> (Looks like this was committed as r14-3580-g597b9ec69bca8a)
> 
> > Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org, Eric Feng
> > <ef2648@columbia.edu>
> > From: Eric Feng via Gcc <gcc@gcc.gnu.org>
> 
> > gcc/testsuite/ChangeLog:
> >   PR analyzer/107646
> >         * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements
> > reference count
> >   * checking for PyObjects.
> >         * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> >         * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c:
> > ...here (and
> >   * added more tests).
> >         * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> >         * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here
> > (and added
> >   * more tests).
> >         * gcc.dg/plugin/plugin.exp: New tests.
> >         * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> >         * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New
> > test.
> >         * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New
> > test.
> 
> It seems this was more or less a rewrite, but that said,
> it's generally preferable to always *add* tests, never *modify* them.
> 
> >  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376
> > +++++++++++++++++-
> 
> ^^^ Ouch!  Was it not within reason to keep that test as it
> was, and just add another test?
> 
> Anyway, the test after rewrite fails, and for some targets
> like cris-elf and apparently m68k-linux, yields an error.
> I see a PR was already opened.
> 
> Also, mostly for future reference, several files in the
> patch miss a final newline, as seen by a "\ No newline at
> end of file"-marker.
> 
> I think I found the problem; a mismatch between default C++
> language standard between host-gcc and target-gcc.
> 
> (It's actually *not* as simple as "auto var = typeofvar<bar>()"
> not being recognized in C++11 --or else there'd be an error
> for the hash_set declaration too, which I just changed for
> consistency-- but it's close enough for me.)
> 
> With this, retesting plugin.exp for cris-elf works.
> 
> Ok to commit?

Sorry about the failing tests.

Thanks for the patch; please go ahead and commit.

Dave

> 
> -- >8 --
> From: Hans-Peter Nilsson <hp@axis.com>
> Date: Fri, 1 Sep 2023 04:36:03 +0200
> Subject: [PATCH] testsuite: Fix analyzer_cpython_plugin.c
> declarations, PR testsuite/111264
> 
> Also, add missing newline at end of file.
> 
>         PR testsuite/111264
>         * gcc.dg/plugin/analyzer_cpython_plugin.c: Make declarations
>         C++11-compatible.
> ---
>  gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 7af520436549..bf1982e79c37 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> @@ -477,8 +477,8 @@ pyobj_refcnt_checker (const region_model *model,
>    if (!ctxt)
>      return;
>  
> -  auto region_to_refcnt = hash_map<const region *, int> ();
> -  auto seen_regions = hash_set<const region *> ();
> +  hash_map<const region *, int> region_to_refcnt;
> +  hash_set<const region *> seen_regions;
>  
>    count_pyobj_references (model, region_to_refcnt, retval,
> seen_regions);
>    check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
> @@ -561,7 +561,7 @@ public:
>      if (!ctxt)
>        return;
>      region_model *model = cd.get_model ();
> -    auto region_to_refcnt = hash_map<const region *, int> ();
> +    hash_map<const region *, int> region_to_refcnt;
>      count_all_references(model, region_to_refcnt);
>      dump_refcnt_info(region_to_refcnt, model, ctxt);
>    }
> @@ -1330,4 +1330,4 @@ plugin_init (struct plugin_name_args
> *plugin_info,
>    sorry_no_analyzer ();
>  #endif
>    return 0;
> -}
> \ No newline at end of file
> +}


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-09-01 14:51     ` David Malcolm
@ 2023-09-01 21:07       ` Eric Feng
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Feng @ 2023-09-01 21:07 UTC (permalink / raw)
  To: David Malcolm; +Cc: Hans-Peter Nilsson, gcc, gcc-patches

Thank you for the patch!

On Fri, Sep 1, 2023 at 10:51 AM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Fri, 2023-09-01 at 04:49 +0200, Hans-Peter Nilsson wrote:
> > (Looks like this was committed as r14-3580-g597b9ec69bca8a)
> >
> > > Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org, Eric Feng
> > > <ef2648@columbia.edu>
> > > From: Eric Feng via Gcc <gcc@gcc.gnu.org>
> >
> > > gcc/testsuite/ChangeLog:
> > >   PR analyzer/107646
> > >         * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements
> > > reference count
> > >   * checking for PyObjects.
> > >         * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> > >         * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c:
> > > ...here (and
> > >   * added more tests).
> > >         * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> > >         * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here
> > > (and added
> > >   * more tests).
> > >         * gcc.dg/plugin/plugin.exp: New tests.
> > >         * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> > >         * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New
> > > test.
> > >         * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New
> > > test.
> >
> > It seems this was more or less a rewrite, but that said,
> > it's generally preferable to always *add* tests, never *modify* them.
> >
> > >  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376
> > > +++++++++++++++++-
> >
> > ^^^ Ouch!  Was it not within reason to keep that test as it
> > was, and just add another test?
Thanks for the feedback. To clarify, 'analyzer_cpython_plugin.c' is
not a test itself but rather a plugin that currently lives within the
testsuite. The core of the test cases were also not modified, rather I
renamed certain filenames containing them for clarity (unless this is
what you meant in terms of modification, in which case noted) and
added to them. However, I understand the preference and will keep that
in mind.
> >
> > Anyway, the test after rewrite fails, and for some targets
> > like cris-elf and apparently m68k-linux, yields an error.
> > I see a PR was already opened.
> >
> > Also, mostly for future reference, several files in the
> > patch miss a final newline, as seen by a "\ No newline at
> > end of file"-marker.
Noted.
> >
> > I think I found the problem; a mismatch between default C++
> > language standard between host-gcc and target-gcc.
> >
> > (It's actually *not* as simple as "auto var = typeofvar<bar>()"
> > not being recognized in C++11 --or else there'd be an error
> > for the hash_set declaration too, which I just changed for
> > consistency-- but it's close enough for me.)
> >
> > With this, retesting plugin.exp for cris-elf works.
Sounds good, thanks again! I was also curious about why hash_map had
an issue here with that syntax whilst hash_set did not, so I tried to
investigate a bit further. I believe the issue was due to the compiler
having trouble disambiguating between the hash_map constructors in
C++11.

From the error message we received:

test/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c:480:58:
error: no matching function for call to 'hash_map<const ana::region*,
int>::hash_map(hash_map<const ana::region*, int>)'
   auto region_to_refcnt = hash_map<const region *, int> ();

I think the compiler is mistakenly interpreting the call here as we
would like to create a new hash_map object using its copy constructor,
but we "forgot" to provide the object to be copied, rather than our
intention of using the default constructor.

Looking at hash_map.h and hash_set.h seems to support this hypothesis,
as hash_map has two constructors, one of which resembles a copy
constructor with additional arguments:
https://github.com/gcc-mirror/gcc/blob/master/gcc/hash-map.h#L147.
Perhaps the default arguments here further complicated the ambiguity
as to which constructor to use in the presence of the empty
parenthesis.

On the other hand, hash_set has only the default constructor with
default parameters, and thus there is no ambiguity:
https://github.com/gcc-mirror/gcc/blob/master/gcc/hash-set.h#L40.

I assume this ambiguity was cleared up by later versions, and thus we
observed no problems in C++17. However, I am certainly still a
relative newbie of C++, so please anyone feel free to correct my
reasoning and chime in!
> >
> > Ok to commit?
>
> Sorry about the failing tests.
>
> Thanks for the patch; please go ahead and commit.
>
> Dave
>
> >
> > -- >8 --
> > From: Hans-Peter Nilsson <hp@axis.com>
> > Date: Fri, 1 Sep 2023 04:36:03 +0200
> > Subject: [PATCH] testsuite: Fix analyzer_cpython_plugin.c
> > declarations, PR testsuite/111264
> >
> > Also, add missing newline at end of file.
> >
> >         PR testsuite/111264
> >         * gcc.dg/plugin/analyzer_cpython_plugin.c: Make declarations
> >         C++11-compatible.
> > ---
> >  gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > index 7af520436549..bf1982e79c37 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > @@ -477,8 +477,8 @@ pyobj_refcnt_checker (const region_model *model,
> >    if (!ctxt)
> >      return;
> >
> > -  auto region_to_refcnt = hash_map<const region *, int> ();
> > -  auto seen_regions = hash_set<const region *> ();
> > +  hash_map<const region *, int> region_to_refcnt;
> > +  hash_set<const region *> seen_regions;
> >
> >    count_pyobj_references (model, region_to_refcnt, retval,
> > seen_regions);
> >    check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
> > @@ -561,7 +561,7 @@ public:
> >      if (!ctxt)
> >        return;
> >      region_model *model = cd.get_model ();
> > -    auto region_to_refcnt = hash_map<const region *, int> ();
> > +    hash_map<const region *, int> region_to_refcnt;
> >      count_all_references(model, region_to_refcnt);
> >      dump_refcnt_info(region_to_refcnt, model, ctxt);
> >    }
> > @@ -1330,4 +1330,4 @@ plugin_init (struct plugin_name_args
> > *plugin_info,
> >    sorry_no_analyzer ();
> >  #endif
> >    return 0;
> > -}
> > \ No newline at end of file
> > +}
>

Best,
Eric

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-09-01 21:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <c720fc39d86cb3ae8041fe5c86dad34d71466cec.camel@redhat.com>
2023-08-29  4:31 ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
2023-08-29  4:35   ` Eric Feng
2023-08-29 17:28     ` Eric Feng
2023-08-29 21:14       ` David Malcolm
2023-08-30 22:15         ` Eric Feng
2023-08-31 17:01           ` David Malcolm
2023-08-29 21:08   ` David Malcolm
2023-09-01  2:49   ` Hans-Peter Nilsson
2023-09-01 14:51     ` David Malcolm
2023-09-01 21:07       ` Eric Feng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).