public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Eric Feng <ef2648@columbia.edu>
To: David Malcolm <dmalcolm@redhat.com>
Cc: gcc@gcc.gnu.org
Subject: Re: Update on CPython Extension Module -fanalyzer plugin development
Date: Mon, 21 Aug 2023 10:05:56 -0400	[thread overview]
Message-ID: <CANGHATV6Z7zzG5_Gr+g0nAJwRgRaAEh61o01ypZT91COfhP_sQ@mail.gmail.com> (raw)
In-Reply-To: <CANGHATWtt60xdShpk=U44Kh_ex8fF4tY=hpSzAb0sbz65GPL8w@mail.gmail.com>

Hi Dave,

Just wanted to give you and everyone else a short update on how
reference count checking is going — we can now observe the refcnt
diagnostic being emitted:

rc3.c:22:10: warning: REF COUNT PROBLEM
   22 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ fails
    |......
    |   22 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

I will fix up and refactor the logic for counting the actual ref count
before coming back and refining the diagnostic to give much more
detailed information.

Best,
Eric


On Wed, Aug 16, 2023 at 9:47 PM Eric Feng <ef2648@columbia.edu> wrote:
>
> Hi Dave,
>
> Thanks for the feedback!
>
>
> On Wed, Aug 16, 2023 at 5:29 PM David Malcolm <dmalcolm@redhat.com> wrote:
> >
> > On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote:
> > > Hi everyone,
> >
> > [fixing typo in my email address]
> >
> > Hi Eric, thanks for the update, and the WIP patch.
> >
> > >
> > > After pushing the code that supports various known function classes last week,
> > > I've turned my attention back to the core reference count checking
> > > functionality. This functionality used to reside in region_model, which
> > > wasn't ideal. To address this, I've introduced a hook to register callbacks
> > > to pop_frame. Specifically, this allows the code that checks the reference
> > > count and emits diagnostics to be housed within the plugin, rather than the
> > > core analyzer.
> > >
> > > As of now, the parameters of pop_frame_callback are tailored specifically to
> > > our needs. If the use of callbacks at the end of pop_frame becomes more
> > > prevalent, we can revisit the setup to potentially make it more general.
> > >
> > > Moreover, the core reference count checking logic was previously somewhat
> > > bloated, contained in one extensive function. I've since refactored it,
> > > breaking it down into several helper functions to simplify and reduce
> > > complexity. There are still some aspects that need refinement, especially
> > > since the plugin has seen changes since I last worked on this logic. However,
> > > I believe that there aren't any significant problems.
> >
> > Suggestion: introduce some more decls into analyzer-decls.h and
> > known_functions for them into the plugin so that you can run/test/debug
> > the helper functions independently (similar to the existing ones in kf-
> > analyzer.cc).
> >
> > e.g.
> >   extern void __analyzer_cpython_dump_real_refcounts (void);
> >   extern void __analyzer_cpython_dump_ob_refcnt (void);
> >
> > >
> Thanks for the suggestion. This will be even more helpful now that we
> have split the logic into helper functions. I will look into these
> when I come back to the "is there a refcount problem" side of the
> equation.
> > > Currently, I've started working a custom stmt_finder similar to leak_stmt_finder
> > > to address the issue of m_stmt and m_stmt_finder being NULL at the time of
> > > region_model::pop_frame. This approach was discussed as a viable solution in
> > > a previous email, and I'll keep everyone posted on my progress. Afterwards, I
> > > will go back to address the refinements necessary mentioned above.
> >
> > You might want to experiment with splitting out
> > (a) "is there a refcount problem" from
> > (b) "emit a refcount problem".
> >
> > For example, you could hardcode (a) to true, so we always complain with
> > (b) on every heap-allocated object, just to debug the stmt_finder
> > workaround.
> >
> >
> > [...snip...]
> >
> > BTW, you don't need to bother to write ChangeLog entries if you're just
> > sending a work-in-progress for me.
> >
> > > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > > index 7cd72e8a886..918bb5a5587 100644
> > > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> >
> > [...]
> >
> > > +/* For PyListObjects: processes the ob_item field within the current region and
> > > + * increments the reference count if conditions are met. */
> > > +void
> > > +process_ob_item_region (const region_model *model, region_model_manager *mgr,
> > > +                       region_model_context *ctxt, const region *curr_region,
> > > +                       const svalue *pylist_type_ptr, const region *base_reg,
> > > +                       int &actual_refcnt)
> >
> > You seem to be special-casing PyListObject here; why?  That seems like
> > it's not going to be scalable tothe general case.
> >
> > Am I right in thinking the intent of this code is to count the actual
> > number of pointers in memory that point to a particular region?
> >
> > Doesn't the ob_item buffer show up in the store as another cluster?
> > Can't you just look at the bindings in the clusters and tally up the
> > pointers for each heap_allocated_region? (accumulating a result map
> > from region to int of the actual reference counts).
> Yes, you're correct, I hadn't thought of that ... This simplifies a
> lot of things. Thanks for the great suggestion!
> >
> > Or am I missing something?
> >
> > What does
> >   model->debug ();
> > show in your examples?
> >
> >
> > > +{
> > > +  tree ob_item_field_tree = get_field_by_name (pylistobj_record, "ob_item");
> > > +  const region *ob_item_field_reg
> > > +      = mgr->get_field_region (curr_region, ob_item_field_tree);
> > > +  const svalue *ob_item_ptr = model->get_store_value (ob_item_field_reg, ctxt);
> > > +
> > > +  if (const auto &cast_ob_item_reg = ob_item_ptr->dyn_cast_region_svalue ())
> > > +    {
> > > +      const region *ob_item_reg = cast_ob_item_reg->get_pointee ();
> > > +      const svalue *allocated_bytes = model->get_dynamic_extents (ob_item_reg);
> > > +      const region *ob_item_sized = mgr->get_sized_region (
> > > +         ob_item_reg, pyobj_ptr_ptr, allocated_bytes);
> > > +      const svalue *buffer_contents_sval
> > > +         = model->get_store_value (ob_item_sized, ctxt);
> > > +
> > > +      if (const auto &buffer_contents
> > > +         = buffer_contents_sval->dyn_cast_compound_svalue ())
> > > +       {
> > > +         for (const auto &buffer_content : buffer_contents->get_map ())
> > > +           {
> > > +                   const auto &content_value = buffer_content.second;
> > > +                   if (const auto &content_region
> > > +                       = content_value->dyn_cast_region_svalue ())
> > > +                           if (content_region->get_pointee () == base_reg)
> > > +                                   actual_refcnt++;
> > > +           }
> > > +       }
> > > +    }
> > > +}
> > > +
> > > +/* Counts the actual references from all clusters in the model's store. */
> > > +int
> > > +count_actual_references (const region_model *model, region_model_manager *mgr,
> > > +                        region_model_context *ctxt, const region *base_reg,
> > > +                        const svalue *pylist_type_ptr, tree ob_type_field)
> > > +{
> > > +  int actual_refcnt = 0;
> > > +  for (const auto &other_cluster : *model->get_store ())
> > > +    {
> > > +      for (const auto &binding : other_cluster.second->get_map ())
> > > +       {
> > > +         const auto &sval = binding.second;
> > > +         const auto &curr_region = sval->maybe_get_region ();
> > > +
> > > +         if (!curr_region || curr_region->get_kind () != RK_HEAP_ALLOCATED)
> > > +           continue;
> > > +
> > > +         increment_count_if_base_matches (curr_region, base_reg,
> > > +                                           actual_refcnt);
> > > +
> > > +         const region *ob_type_region
> > > +             = mgr->get_field_region (curr_region, ob_type_field);
> > > +         const svalue *stored_sval
> > > +             = model->get_store_value (ob_type_region, ctxt);
> > > +         const auto &remove_cast = stored_sval->dyn_cast_unaryop_svalue ();
> > > +
> > > +         if (!remove_cast)
> > > +           continue;
> > > +
> > > +         const svalue *type = remove_cast->get_arg ();
> > > +         if (type == pylist_type_ptr)
> > > +           process_ob_item_region (model, mgr, ctxt, curr_region,
> > > +                                   pylist_type_ptr, base_reg, actual_refcnt);
> > > +       }
> > > +    }
> > > +  return actual_refcnt;
> > > +}
> > >
> >
> > Hope the above is constructive.
> > Dave
> >
>
> Best,
> Eric

  reply	other threads:[~2023-08-21 14:06 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-25  4:49 Update and Questions " Eric Feng
2023-07-25 14:41 ` David Malcolm
2023-07-27 22:13   ` Eric Feng
2023-07-27 22:35     ` David Malcolm
2023-07-30 17:52       ` Eric Feng
2023-07-30 23:44         ` David Malcolm
2023-08-01 13:57           ` Eric Feng
2023-08-01 17:06             ` David Malcolm
2023-08-04 15:02               ` Eric Feng
2023-08-04 15:39                 ` David Malcolm
2023-08-04 20:48                   ` Eric Feng
2023-08-04 22:42                     ` David Malcolm
2023-08-04 22:46                       ` David Malcolm
2023-08-07 18:31                         ` Eric Feng
2023-08-07 23:16                           ` David Malcolm
2023-08-08 16:51                             ` [PATCH] WIP for dg-require-python-h [PR107646] Eric Feng
2023-08-08 18:08                               ` David Malcolm
2023-08-08 18:51                               ` David Malcolm
2023-08-09 19:22                                 ` [PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646] Eric Feng
2023-08-09 21:36                                   ` David Malcolm
2023-08-11 17:47                                     ` [COMMITTED] " Eric Feng
2023-08-11 20:23                                       ` Eric Feng
2023-08-16 19:17                                         ` Update on CPython Extension Module -fanalyzer plugin development Eric Feng
2023-08-16 21:28                                           ` David Malcolm
2023-08-17  1:47                                             ` Eric Feng
2023-08-21 14:05                                               ` Eric Feng [this message]
2023-08-21 15:04                                                 ` David Malcolm
2023-08-23 21:15                                                   ` Eric Feng
2023-08-23 23:16                                                     ` David Malcolm
2023-08-24 14:45                                                       ` Eric Feng
2023-08-25 12:50                                                         ` Eric Feng
2023-08-25 19:50                                                           ` David Malcolm
2023-08-29  4:31                                                             ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
2023-08-29  4:35                                                               ` Eric Feng
2023-08-29 17:28                                                                 ` Eric Feng
2023-08-29 21:14                                                                   ` David Malcolm
2023-08-30 22:15                                                                     ` Eric Feng
2023-08-31 17:01                                                                       ` David Malcolm
2023-08-31 19:09                                                                         ` Eric Feng
2023-08-31 20:19                                                                           ` David Malcolm
2023-09-01  1:25                                                                             ` Eric Feng
2023-09-01 11:57                                                                               ` David Malcolm
2023-09-05  2:13                                                                                 ` [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646] Eric Feng
2023-09-07 17:28                                                                                   ` David Malcolm
2023-09-11  2:12                                                                                     ` Eric Feng
2023-09-11 19:00                                                                                       ` David Malcolm
2023-08-29 21:08                                                               ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] David Malcolm
2023-09-01  2:49                                                               ` Hans-Peter Nilsson
2023-09-01 14:51                                                                 ` David Malcolm
2023-09-01 21:07                                                                   ` Eric Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANGHATV6Z7zzG5_Gr+g0nAJwRgRaAEh61o01ypZT91COfhP_sQ@mail.gmail.com \
    --to=ef2648@columbia.edu \
    --cc=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).