public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Eric Feng <ef2648@columbia.edu>
To: dmalcolm@redhat.com
Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
Date: Tue, 29 Aug 2023 00:35:06 -0400	[thread overview]
Message-ID: <CANGHATV4fXHOfew8ZC0btUYpoNFFTy1-7U9drDULM0g+_cAc8A@mail.gmail.com> (raw)
In-Reply-To: <20230829043155.17651-1-ef2648@columbia.edu>

On Tue, Aug 29, 2023 at 12:32 AM Eric Feng <ef2648@columbia.edu> wrote:
>
> Hi Dave,
>
> Thanks for the feedback. I've addressed the changes you mentioned in
> addition to adding more test cases. I've also taken this chance to
> split the test files according to known function subclasses, as you previously
> suggested. Since there were also some changes to the core analyzer, I've done a
> bootstrap and regtested the patch as well. Does it look OK for trunk?
Apologies — I forgot to mention that bootstrap and regtest was done on
aarch64-unknown-linux-gnu.
>
> Best,
> Eric
>
> ---
>
> This patch introduces initial support for reference count checking of
> PyObjects in relation to the Python/C API for the CPython plugin.
> Additionally, the core analyzer underwent several modifications to
> accommodate this feature. These include:
>
> - Introducing support for callbacks at the end of
>   region_model::pop_frame. This is our current point of validation for
>   the reference count of PyObjects.
> - An added optional custom stmt_finder parameter to
>   region_model_context::warn. This aids in emitting a diagnostic
>   concerning the reference count, especially when the stmt_finder is
>   NULL, which is currently the case during region_model::pop_frame.
>
> The current diagnostic we emit relating to the reference count
> appears as follows:
>
> rc3.c:23:10: warning: expected <variable name belonging to m_base_region> to have reference count: ‘1’ but ob_refcnt field is: ‘2’
>    23 |   return list;
>       |          ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(3);
>     |      |                    ^~~~~~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (1) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(1);
>     |      |                    ~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (2) when ‘PyList_New’ succeeds
>     |......
>     |   14 |   PyList_Append(list, item);
>     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |   |
>     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
>     |......
>     |   23 |   return list;
>     |      |          ~~~~
>     |      |          |
>     |      |          (4) here
>     |
>
> This is a WIP in several ways:
> - Enhancing the diagnostic for better clarity. For instance, users should
>   expect to see the variable name 'item' instead of the placeholder in the
>   diagnostic above.
> - Currently, functions returning PyObject * are assumed to always produce
>   a new reference.
> - The validation of reference count is only for PyObjects created within a
>   function body. Verifying reference counts for PyObjects passed as
>   parameters is not supported in this patch.
>
> gcc/analyzer/ChangeLog:
>   PR analyzer/107646
>         * engine.cc (impl_region_model_context::warn): New optional parameter.
>         * exploded-graph.h (class impl_region_model_context): Likewise.
>         * region-model.cc (region_model::pop_frame): New callback feature for
>   * region_model::pop_frame.
>         * region-model.h (struct append_regions_cb_data): Likewise.
>         (class region_model): Likewise.
>         (class region_model_context): New optional parameter.
>         (class region_model_context_decorator): Likewise.
>
> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
>         * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   * checking for PyObjects.
>         * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
>         * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   * added more tests).
>         * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
>         * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   * more tests).
>         * gcc.dg/plugin/plugin.exp: New tests.
>         * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
>         * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
>         * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.
>
> Signed-off-by: Eric Feng <ef2648@columbia.edu>
>
> ---
>  gcc/analyzer/engine.cc                        |   8 +-
>  gcc/analyzer/exploded-graph.h                 |   4 +-
>  gcc/analyzer/region-model.cc                  |   3 +
>  gcc/analyzer/region-model.h                   |  48 ++-
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-
>  ....c => cpython-plugin-test-PyList_Append.c} |  56 +--
>  .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
>  .../cpython-plugin-test-PyLong_FromLong.c     |  38 ++
>  ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
>  .../cpython-plugin-test-refcnt-checking.c     |  78 ++++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp        |   5 +-
>  11 files changed, 612 insertions(+), 42 deletions(-)
>  rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => cpython-plugin-test-PyList_Append.c} (64%)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
>  rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c => cpython-plugin-test-no-plugin.c} (100%)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
>
> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
> index a1908cdb364..736a41ecdaf 100644
> --- a/gcc/analyzer/engine.cc
> +++ b/gcc/analyzer/engine.cc
> @@ -115,10 +115,12 @@ impl_region_model_context (program_state *state,
>  }
>
>  bool
> -impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
> +impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d,
> +                                const stmt_finder *custom_finder)
>  {
>    LOG_FUNC (get_logger ());
> -  if (m_stmt == NULL && m_stmt_finder == NULL)
> +  auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder;
> +  if (m_stmt == NULL && curr_stmt_finder == NULL)
>      {
>        if (get_logger ())
>         get_logger ()->log ("rejecting diagnostic: no stmt");
> @@ -129,7 +131,7 @@ impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
>        bool terminate_path = d->terminate_path_p ();
>        if (m_eg->get_diagnostic_manager ().add_diagnostic
>           (m_enode_for_diag, m_enode_for_diag->get_supernode (),
> -          m_stmt, m_stmt_finder, std::move (d)))
> +          m_stmt, curr_stmt_finder, std::move (d)))
>         {
>           if (m_path_ctxt
>               && terminate_path
> diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
> index 5a7ab645bfe..6e9a5ef58c7 100644
> --- a/gcc/analyzer/exploded-graph.h
> +++ b/gcc/analyzer/exploded-graph.h
> @@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context
>                              uncertainty_t *uncertainty,
>                              logger *logger = NULL);
>
> -  bool warn (std::unique_ptr<pending_diagnostic> d) final override;
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder = NULL) final override;
>    void add_note (std::unique_ptr<pending_note> pn) final override;
>    void add_event (std::unique_ptr<checker_event> event) final override;
>    void on_svalue_leak (const svalue *) override;
> @@ -107,6 +108,7 @@ class impl_region_model_context : public region_model_context
>                          std::unique_ptr<sm_context> *out_sm_context) override;
>
>    const gimple *get_stmt () const override { return m_stmt; }
> +  const exploded_graph *get_eg () const override { return m_eg; }
>
>    exploded_graph *m_eg;
>    log_user m_logger;
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index 4f31a6dcf0f..eb4f976b83a 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
>
>  namespace ana {
>
> +auto_vec<pop_frame_callback> region_model::pop_frame_callbacks;
> +
>  /* Dump T to PP in language-independent form, for debugging/logging/dumping
>     purposes.  */
>
> @@ -5422,6 +5424,7 @@ region_model::pop_frame (tree result_lvalue,
>      }
>
>    unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
> +  notify_on_pop_frame (this, retval, ctxt);
>  }
>
>  /* Get the number of frames in this region_model's stack.  */
> diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
> index 10b2a59e787..440ea6d828d 100644
> --- a/gcc/analyzer/region-model.h
> +++ b/gcc/analyzer/region-model.h
> @@ -236,6 +236,10 @@ public:
>
>  struct append_regions_cb_data;
>
> +typedef void (*pop_frame_callback) (const region_model *model,
> +                                   const svalue *retval,
> +                                   region_model_context *ctxt);
> +
>  /* A region_model encapsulates a representation of the state of memory, with
>     a tree of regions, along with their associated values.
>     The representation is graph-like because values can be pointers to
> @@ -532,6 +536,20 @@ class region_model
>    get_builtin_kf (const gcall *call,
>                   region_model_context *ctxt = NULL) const;
>
> +  static void
> +  register_pop_frame_callback (const pop_frame_callback &callback)
> +  {
> +    pop_frame_callbacks.safe_push (callback);
> +  }
> +
> +  static void
> +  notify_on_pop_frame (const region_model *model, const svalue *retval,
> +                      region_model_context *ctxt)
> +  {
> +    for (auto &callback : pop_frame_callbacks)
> +       callback (model, retval, ctxt);
> +  }
> +
>  private:
>    const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
>    const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
> @@ -621,6 +639,7 @@ private:
>                                                 tree callee_fndecl,
>                                                 region_model_context *ctxt) const;
>
> +  static auto_vec<pop_frame_callback> pop_frame_callbacks;
>    /* Storing this here to avoid passing it around everywhere.  */
>    region_model_manager *const m_mgr;
>
> @@ -649,8 +668,10 @@ class region_model_context
>  {
>   public:
>    /* Hook for clients to store pending diagnostics.
> -     Return true if the diagnostic was stored, or false if it was deleted.  */
> -  virtual bool warn (std::unique_ptr<pending_diagnostic> d) = 0;
> +     Return true if the diagnostic was stored, or false if it was deleted.
> +     Optionally provide a custom stmt_finder.  */
> +  virtual bool warn (std::unique_ptr<pending_diagnostic> d,
> +                    const stmt_finder *custom_finder = NULL) = 0;
>
>    /* Hook for clients to add a note to the last previously stored
>       pending diagnostic.  */
> @@ -757,6 +778,8 @@ class region_model_context
>
>    /* Get the current statement, if any.  */
>    virtual const gimple *get_stmt () const = 0;
> +
> +  virtual const exploded_graph *get_eg () const = 0;
>  };
>
>  /* A "do nothing" subclass of region_model_context.  */
> @@ -764,7 +787,8 @@ class region_model_context
>  class noop_region_model_context : public region_model_context
>  {
>  public:
> -  bool warn (std::unique_ptr<pending_diagnostic>) override { return false; }
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder) override { return false; }
>    void add_note (std::unique_ptr<pending_note>) override;
>    void add_event (std::unique_ptr<checker_event>) override;
>    void on_svalue_leak (const svalue *) override {}
> @@ -812,6 +836,7 @@ public:
>    }
>
>    const gimple *get_stmt () const override { return NULL; }
> +  const exploded_graph *get_eg () const override { return NULL; }
>  };
>
>  /* A subclass of region_model_context for determining if operations fail
> @@ -840,7 +865,8 @@ private:
>  class region_model_context_decorator : public region_model_context
>  {
>   public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder)
>    {
>      if (m_inner)
>        return m_inner->warn (std::move (d));
> @@ -978,6 +1004,14 @@ class region_model_context_decorator : public region_model_context
>        return nullptr;
>    }
>
> +  const exploded_graph *get_eg () const override
> +  {
> +    if (m_inner)
> +       return m_inner->get_eg ();
> +    else
> +       return nullptr;
> +  }
> +
>  protected:
>    region_model_context_decorator (region_model_context *inner)
>    : m_inner (inner)
> @@ -993,7 +1027,8 @@ protected:
>  class annotating_context : public region_model_context_decorator
>  {
>  public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder) override
>    {
>      if (m_inner)
>        if (m_inner->warn (std::move (d)))
> @@ -1158,7 +1193,8 @@ using namespace ::selftest;
>  class test_region_model_context : public noop_region_model_context
>  {
>  public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) final override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder) final override
>    {
>      m_diagnostics.safe_push (d.release ());
>      return true;
> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 7cd72e8a886..b2caed8fc1b 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> @@ -44,6 +44,7 @@
>  #include "analyzer/region-model.h"
>  #include "analyzer/call-details.h"
>  #include "analyzer/call-info.h"
> +#include "analyzer/exploded-graph.h"
>  #include "make-unique.h"
>
>  int plugin_is_GPL_compatible;
> @@ -191,6 +192,372 @@ public:
>    }
>  };
>
> +/* This is just a copy of leak_stmt_finder for now (subject to change if
> + * necssary)  */
> +
> +class refcnt_stmt_finder : public stmt_finder
> +{
> +public:
> +  refcnt_stmt_finder (const exploded_graph &eg, tree var)
> +      : m_eg (eg), m_var (var)
> +  {
> +  }
> +
> +  std::unique_ptr<stmt_finder>
> +  clone () const final override
> +  {
> +    return make_unique<refcnt_stmt_finder> (m_eg, m_var);
> +  }
> +
> +  const gimple *
> +  find_stmt (const exploded_path &epath) final override
> +  {
> +    logger *const logger = m_eg.get_logger ();
> +    LOG_FUNC (logger);
> +
> +    if (m_var && TREE_CODE (m_var) == SSA_NAME)
> +      {
> +       /* Locate the final write to this SSA name in the path.  */
> +       const gimple *def_stmt = SSA_NAME_DEF_STMT (m_var);
> +
> +       int idx_of_def_stmt;
> +       bool found = epath.find_stmt_backwards (def_stmt, &idx_of_def_stmt);
> +       if (!found)
> +         goto not_found;
> +
> +       /* What was the next write to the underlying var
> +          after the SSA name was set? (if any).  */
> +
> +       for (unsigned idx = idx_of_def_stmt + 1; idx < epath.m_edges.length ();
> +            ++idx)
> +         {
> +           const exploded_edge *eedge = epath.m_edges[idx];
> +           if (logger)
> +                   logger->log ("eedge[%i]: EN %i -> EN %i", idx,
> +                                eedge->m_src->m_index,
> +                                eedge->m_dest->m_index);
> +           const exploded_node *dst_node = eedge->m_dest;
> +           const program_point &dst_point = dst_node->get_point ();
> +           const gimple *stmt = dst_point.get_stmt ();
> +           if (!stmt)
> +                   continue;
> +           if (const gassign *assign = dyn_cast<const gassign *> (stmt))
> +                   {
> +                           tree lhs = gimple_assign_lhs (assign);
> +                           if (TREE_CODE (lhs) == SSA_NAME
> +                               && SSA_NAME_VAR (lhs) == SSA_NAME_VAR (m_var))
> +                                   return assign;
> +                   }
> +         }
> +      }
> +
> +  not_found:
> +
> +    /* Look backwards for the first statement with a location.  */
> +    int i;
> +    const exploded_edge *eedge;
> +    FOR_EACH_VEC_ELT_REVERSE (epath.m_edges, i, eedge)
> +    {
> +      if (logger)
> +       logger->log ("eedge[%i]: EN %i -> EN %i", i, eedge->m_src->m_index,
> +                    eedge->m_dest->m_index);
> +      const exploded_node *dst_node = eedge->m_dest;
> +      const program_point &dst_point = dst_node->get_point ();
> +      const gimple *stmt = dst_point.get_stmt ();
> +      if (stmt)
> +       if (get_pure_location (stmt->location) != UNKNOWN_LOCATION)
> +         return stmt;
> +    }
> +
> +    gcc_unreachable ();
> +    return NULL;
> +  }
> +
> +private:
> +  const exploded_graph &m_eg;
> +  tree m_var;
> +};
> +
> +class refcnt_mismatch : public pending_diagnostic_subclass<refcnt_mismatch>
> +{
> +public:
> +  refcnt_mismatch (const region *base_region,
> +                               const svalue *ob_refcnt,
> +                               const svalue *actual_refcnt,
> +        tree reg_tree)
> +      : m_base_region (base_region), m_ob_refcnt (ob_refcnt),
> +       m_actual_refcnt (actual_refcnt), m_reg_tree(reg_tree)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "refcnt_mismatch";
> +  }
> +
> +  bool
> +  operator== (const refcnt_mismatch &other) const
> +  {
> +    return (m_base_region == other.m_base_region
> +           && m_ob_refcnt == other.m_ob_refcnt
> +           && m_actual_refcnt == other.m_actual_refcnt);
> +  }
> +
> +  int get_controlling_option () const final override
> +  {
> +    return 0;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc, logger *) final override
> +  {
> +    diagnostic_metadata m;
> +    bool warned;
> +    // just assuming constants for now
> +    auto actual_refcnt
> +       = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> +    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> +    warned = warning_meta (
> +       rich_loc, m, get_controlling_option (),
> +       "expected <variable name belonging to m_base_region> to have "
> +       "reference count: %qE but ob_refcnt field is: %qE",
> +       actual_refcnt, ob_refcnt);
> +
> +    // location_t loc = rich_loc->get_loc ();
> +    // foo (loc);
> +    return warned;
> +  }
> +
> +  void mark_interesting_stuff (interesting_t *interest) final override
> +  {
> +    if (m_base_region)
> +      interest->add_region_creation (m_base_region);
> +  }
> +
> +private:
> +
> +  void foo(location_t loc) const
> +  {
> +    inform(loc, "something is up right here");
> +  }
> +  const region *m_base_region;
> +  const svalue *m_ob_refcnt;
> +  const svalue *m_actual_refcnt;
> +  tree m_reg_tree;
> +};
> +
> +/* Retrieves the svalue associated with the ob_refcnt field of the base region.
> + */
> +static const svalue *
> +retrieve_ob_refcnt_sval (const region *base_reg, const region_model *model,
> +                        region_model_context *ctxt)
> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
> +  const region *ob_refcnt_region
> +      = mgr->get_field_region (base_reg, ob_refcnt_tree);
> +  const svalue *ob_refcnt_sval
> +      = model->get_store_value (ob_refcnt_region, ctxt);
> +  return ob_refcnt_sval;
> +}
> +
> +static void
> +increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
> +{
> +  bool existed;
> +  auto &refcnt = map.get_or_insert (key, &existed);
> +  refcnt = existed ? refcnt + 1 : 1;
> +}
> +
> +
> +/* Recursively fills in region_to_refcnt with the references owned by
> +   pyobj_ptr_sval.  */
> +static void
> +count_expected_pyobj_references (const region_model *model,
> +                          hash_map<const region *, int> &region_to_refcnt,
> +                          const svalue *pyobj_ptr_sval,
> +                          hash_set<const region *> &seen)
> +{
> +  if (!pyobj_ptr_sval)
> +    return;
> +
> +  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
> +  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
> +  if (!pyobj_region_sval && !pyobj_initial_sval)
> +    return;
> +
> +  // todo: support initial sval (e.g passed in as parameter)
> +  if (pyobj_initial_sval)
> +    {
> +  //     increment_region_refcnt (region_to_refcnt,
> +       //                     pyobj_initial_sval->get_region ());
> +      return;
> +    }
> +
> +  const region *pyobj_region = pyobj_region_sval->get_pointee ();
> +  if (!pyobj_region || seen.contains (pyobj_region))
> +    return;
> +
> +  seen.add (pyobj_region);
> +
> +  if (pyobj_ptr_sval->get_type () == pyobj_ptr_tree)
> +    increment_region_refcnt (region_to_refcnt, pyobj_region);
> +
> +  const auto *curr_store = model->get_store ();
> +  const auto *retval_cluster = curr_store->get_cluster (pyobj_region);
> +  if (!retval_cluster)
> +    return;
> +
> +  const auto &retval_binding_map = retval_cluster->get_map ();
> +
> +  for (const auto &binding : retval_binding_map)
> +    {
> +      const svalue *binding_sval = binding.second;
> +      const svalue *unwrapped_sval = binding_sval->unwrap_any_unmergeable ();
> +      const region *pointee = unwrapped_sval->maybe_get_region ();
> +
> +      if (pointee && pointee->get_kind () == RK_HEAP_ALLOCATED)
> +       count_expected_pyobj_references (model, region_to_refcnt, binding_sval,
> +                                        seen);
> +    }
> +}
> +
> +/* Compare ob_refcnt field vs the actual reference count of a region */
> +static void
> +check_refcnt (const region_model *model, region_model_context *ctxt,
> +             const hash_map<const ana::region *,
> +                            int>::iterator::reference_pair region_refcnt)
> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  const auto &curr_region = region_refcnt.first;
> +  const auto &actual_refcnt = region_refcnt.second;
> +  const svalue *ob_refcnt_sval = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
> +  const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> +      ob_refcnt_sval->get_type (), actual_refcnt);
> +
> +  if (ob_refcnt_sval != actual_refcnt_sval)
> +  {
> +    // todo: fix this
> +    tree reg_tree = model->get_representative_tree (curr_region);
> +
> +    const auto &eg = ctxt->get_eg ();
> +    refcnt_stmt_finder finder (*eg, reg_tree);
> +    auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
> +                                           actual_refcnt_sval, reg_tree);
> +    if (pd && eg)
> +    ctxt->warn (std::move (pd), &finder);
> +  }
> +}
> +
> +static void
> +check_refcnts (const region_model *model, const svalue *retval,
> +           region_model_context *ctxt,
> +           hash_map<const region *, int> &region_to_refcnt)
> +{
> +  for (const auto &region_refcnt : region_to_refcnt)
> +  {
> +    check_refcnt(model, ctxt, region_refcnt);
> +  }
> +}
> +
> +/* Validates the reference count of all Python objects. */
> +void
> +pyobj_refcnt_checker (const region_model *model, const svalue *retval,
> +                   region_model_context *ctxt)
> +{
> +  if (!ctxt)
> +  return;
> +
> +  auto region_to_refcnt = hash_map<const region *, int> ();
> +  auto seen_regions = hash_set<const region *> ();
> +
> +  count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
> +  check_refcnts (model, retval, ctxt, region_to_refcnt);
> +}
> +
> +/* Counts the actual pyobject references from all clusters in the model's
> + * store. */
> +static void
> +count_all_references (const region_model *model,
> +                     hash_map<const region *, int> &region_to_refcnt)
> +{
> +  for (const auto &cluster : *model->get_store ())
> +  {
> +    auto curr_region = cluster.first;
> +    if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
> +    continue;
> +
> +    increment_region_refcnt (region_to_refcnt, curr_region);
> +
> +    auto binding_cluster = cluster.second;
> +    for (const auto &binding : binding_cluster->get_map ())
> +    {
> +         const svalue *binding_sval = binding.second;
> +
> +         const svalue *unwrapped_sval
> +             = binding_sval->unwrap_any_unmergeable ();
> +         // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> +         // continue;
> +
> +         const region *pointee = unwrapped_sval->maybe_get_region ();
> +         if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
> +           continue;
> +
> +         increment_region_refcnt (region_to_refcnt, pointee);
> +    }
> +  }
> +}
> +
> +static void
> +dump_refcnt_info (const hash_map<const region *, int> &region_to_refcnt,
> +                 const region_model *model, region_model_context *ctxt)
> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  pretty_printer pp;
> +  pp_format_decoder (&pp) = default_tree_printer;
> +  pp_show_color (&pp) = pp_show_color (global_dc->printer);
> +  pp.buffer->stream = stderr;
> +
> +  for (const auto &region_refcnt : region_to_refcnt)
> +  {
> +    auto region = region_refcnt.first;
> +    auto actual_refcnt = region_refcnt.second;
> +    const svalue *ob_refcnt_sval
> +       = retrieve_ob_refcnt_sval (region, model, ctxt);
> +    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> +       ob_refcnt_sval->get_type (), actual_refcnt);
> +
> +    region->dump_to_pp (&pp, true);
> +    pp_string (&pp, " — ob_refcnt: ");
> +    ob_refcnt_sval->dump_to_pp (&pp, true);
> +    pp_string (&pp, " actual refcnt: ");
> +    actual_refcnt_sval->dump_to_pp (&pp, true);
> +    pp_newline (&pp);
> +  }
> +  pp_string (&pp, "~~~~~~~~\n");
> +  pp_flush (&pp);
> +}
> +
> +class kf_analyzer_cpython_dump_refcounts : public known_function
> +{
> +public:
> +  bool matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return cd.num_args () == 0;
> +  }
> +  void impl_call_pre (const call_details &cd) const final override
> +  {
> +    region_model_context *ctxt = cd.get_ctxt ();
> +    if (!ctxt)
> +      return;
> +    region_model *model = cd.get_model ();
> +    auto region_to_refcnt = hash_map<const region *, int> ();
> +    count_all_references(model, region_to_refcnt);
> +    dump_refcnt_info(region_to_refcnt, model, ctxt);
> +  }
> +};
> +
>  /* Some concessions were made to
>  simplify the analysis process when comparing kf_PyList_Append with the
>  real implementation. In particular, PyList_Append performs some
> @@ -927,6 +1294,10 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
>    iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
>    iface->register_known_function ("PyLong_FromLong",
>                                    make_unique<kf_PyLong_FromLong> ());
> +
> +  iface->register_known_function (
> +      "__analyzer_cpython_dump_refcounts",
> +      make_unique<kf_analyzer_cpython_dump_refcounts> ());
>  }
>  } // namespace ana
>
> @@ -940,8 +1311,9 @@ plugin_init (struct plugin_name_args *plugin_info,
>    const char *plugin_name = plugin_info->base_name;
>    if (0)
>      inform (input_location, "got here; %qs", plugin_name);
> -  ana::register_finish_translation_unit_callback (&stash_named_types);
> -  ana::register_finish_translation_unit_callback (&stash_global_vars);
> +  register_finish_translation_unit_callback (&stash_named_types);
> +  register_finish_translation_unit_callback (&stash_global_vars);
> +  region_model::register_pop_frame_callback(pyobj_refcnt_checker);
>    register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT,
>                       ana::cpython_analyzer_init_cb,
>                       NULL); /* void *user_data */
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> similarity index 64%
> rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> index 19b5c17428a..9912f9105d4 100644
> --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> @@ -8,34 +8,6 @@
>  #include <Python.h>
>  #include "../analyzer/analyzer-decls.h"
>
> -PyObject *
> -test_PyList_New (Py_ssize_t len)
> -{
> -  PyObject *obj = PyList_New (len);
> -  if (obj)
> -    {
> -     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> -     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> -    }
> -  else
> -    __analyzer_dump_path (); /* { dg-message "path" } */
> -  return obj;
> -}
> -
> -PyObject *
> -test_PyLong_New (long n)
> -{
> -  PyObject *obj = PyLong_FromLong (n);
> -  if (obj)
> -    {
> -     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> -     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
> -    }
> -  else
> -    __analyzer_dump_path (); /* { dg-message "path" } */
> -  return obj;
> -}
> -
>  PyObject *
>  test_PyListAppend (long n)
>  {
> @@ -43,6 +15,7 @@ test_PyListAppend (long n)
>    PyObject *list = PyList_New (0);
>    PyList_Append(list, item);
>    return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
>  }
>
>  PyObject *
> @@ -67,6 +40,7 @@ test_PyListAppend_2 (long n)
>    else
>      __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
>    return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
>  }
>
>
> @@ -75,4 +49,30 @@ test_PyListAppend_3 (PyObject *item, PyObject *list)
>  {
>    PyList_Append (list, item);
>    return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_4 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_5 ()
> +{
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, NULL);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_6 ()
> +{
> +  PyObject *item = NULL;
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
>  }
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
> new file mode 100644
> index 00000000000..492d4f7d58d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyList_New (Py_ssize_t len)
> +{
> +  PyObject *obj = PyList_New (len);
> +  if (obj)
> +    {
> +     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> +    }
> +  else
> +    __analyzer_dump_path (); /* { dg-message "path" } */
> +  return obj;
> +}
> +
> +void
> +test_PyList_New_2 ()
> +{
> +  PyObject *obj = PyList_New (0);
> +} /* { dg-warning "leak of 'obj'" } */
> +
> +PyObject *test_stray_incref_PyList ()
> +{
> +  PyObject *p = PyList_New (2);
> +  if (p)
> +    Py_INCREF (p);
> +  return p;
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
> new file mode 100644
> index 00000000000..97b29849302
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyLong_New (long n)
> +{
> +  PyObject *obj = PyLong_FromLong (n);
> +  if (obj)
> +    {
> +     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
> +    }
> +  else
> +    __analyzer_dump_path (); /* { dg-message "path" } */
> +  return obj;
> +}
> +
> +void
> +test_PyLong_New_2 (long n)
> +{
> +  PyObject *obj = PyLong_FromLong (n);
> +} /* { dg-warning "leak of 'obj'" } */
> +
> +PyObject *test_stray_incref_PyLong (long val)
> +{
> +  PyObject *p = PyLong_FromLong (val);
> +  if (p)
> +    Py_INCREF (p);
> +  return p;
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
> similarity index 100%
> rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c
> rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
> new file mode 100644
> index 00000000000..9912f9105d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyListAppend (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, item);
> +  return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> +
> +PyObject *
> +test_PyListAppend_2 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  if (!item)
> +       return NULL;
> +
> +  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +  PyObject *list = PyList_New (n);
> +  if (!list)
> +  {
> +       Py_DECREF(item);
> +       return NULL;
> +  }
> +
> +  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +
> +  if (PyList_Append (list, item) < 0)
> +    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +  else
> +    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
> +  return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> +
> +
> +PyObject *
> +test_PyListAppend_3 (PyObject *item, PyObject *list)
> +{
> +  PyList_Append (list, item);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_4 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_5 ()
> +{
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, NULL);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_6 ()
> +{
> +  PyObject *item = NULL;
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
> +}
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> index e1ed2d2589e..cbef6da8d86 100644
> --- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
> +++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> @@ -161,8 +161,9 @@ set plugin_test_list [list \
>           taint-CVE-2011-0521-6.c \
>           taint-antipatterns-1.c } \
>      { analyzer_cpython_plugin.c \
> -         cpython-plugin-test-1.c \
> -         cpython-plugin-test-2.c } \
> +         cpython-plugin-test-PyList_Append.c \
> +         cpython-plugin-test-PyList_New.c \
> +         cpython-plugin-test-PyLong_FromLong.c } \
>  ]
>
>  foreach plugin_test $plugin_test_list {
> --
> 2.30.2
>

  reply	other threads:[~2023-08-29  4:35 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <c720fc39d86cb3ae8041fe5c86dad34d71466cec.camel@redhat.com>
2023-08-29  4:31 ` Eric Feng
2023-08-29  4:35   ` Eric Feng [this message]
2023-08-29 17:28     ` Eric Feng
2023-08-29 21:14       ` David Malcolm
2023-08-30 22:15         ` Eric Feng
2023-08-31 17:01           ` David Malcolm
2023-08-29 21:08   ` David Malcolm
2023-09-01  2:49   ` Hans-Peter Nilsson
2023-09-01 14:51     ` David Malcolm
2023-09-01 21:07       ` Eric Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANGHATV4fXHOfew8ZC0btUYpoNFFTy1-7U9drDULM0g+_cAc8A@mail.gmail.com \
    --to=ef2648@columbia.edu \
    --cc=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).