From: Eric Feng <ef2648@columbia.edu>
To: David Malcolm <dmalcolm@redhat.com>
Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]
Date: Sun, 10 Sep 2023 22:12:29 -0400 [thread overview]
Message-ID: <CANGHATVjJ775Kg=jrR73qXuNwy9XOX9_XEHM0sPDqGhr5c07wg@mail.gmail.com> (raw)
In-Reply-To: <14870fdc12b8e67bd1e8bd97880653198cfe485d.camel@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 12148 bytes --]
On Thu, Sep 7, 2023 at 1:28 PM David Malcolm <dmalcolm@redhat.com> wrote:
> On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote:
>
> > Hi Dave,
>
> Hi Eric, thanks for the patch.
>
> >
> > Recently I've been working on symbolic value support for the reference
> > count checker. I've attached a patch for it below; let me know it looks
> > OK for trunk. Thanks!
> >
> > Best,
> > Eric
> >
> > ---
> >
> > This patch enhances the reference count checker in the CPython plugin by
> > adding support for symbolic values. Whereas previously we were only able
> > to check the reference count of PyObject* objects created in the scope
> > of the function; we are now able to emit diagnostics on reference count
> > mismatch of objects that were, for example, passed in as a function
> > parameter.
> >
> > rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but
> ob_refcnt field is N + ‘2’
> > 6 | return obj;
> > | ^~~
>
> [...snip...]
>
> > create mode 100644
> gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > index bf1982e79c3..d7ecd7fce09 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > @@ -314,17 +314,20 @@ public:
> > {
> > diagnostic_metadata m;
> > bool warned;
> > - // just assuming constants for now
> > - auto actual_refcnt
> > - = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> > - auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue
> ()->get_constant ();
> > - warned = warning_meta (rich_loc, m, get_controlling_option (),
> > - "expected %qE to have "
> > - "reference count: %qE but ob_refcnt field is:
> %qE",
> > - m_reg_tree, actual_refcnt, ob_refcnt);
> > -
> > - // location_t loc = rich_loc->get_loc ();
> > - // foo (loc);
> > +
> > + const auto *actual_refcnt_constant
> > + = m_actual_refcnt->dyn_cast_constant_svalue ();
> > + const auto *ob_refcnt_constant =
> m_ob_refcnt->dyn_cast_constant_svalue ();
> > + if (!actual_refcnt_constant || !ob_refcnt_constant)
> > + return false;
> > +
> > + auto actual_refcnt = actual_refcnt_constant->get_constant ();
> > + auto ob_refcnt = ob_refcnt_constant->get_constant ();
> > + warned = warning_meta (
> > + rich_loc, m, get_controlling_option (),
> > + "expected %qE to have "
> > + "reference count: N + %qE but ob_refcnt field is N + %qE",
> > + m_reg_tree, actual_refcnt, ob_refcnt);
> > return warned;
>
> I know you're emulating the old behavior I implemented way back in
> cpychecker, but I don't like that behavior :(
>
> Specifically, although the patch improves the behavior for symbolic
> values, it regresses the precision of wording for the concrete values
> case. If we have e.g. a concrete ob_refcnt of 2, whereas we only have
> 1 pointer, then it's more readable to say:
>
> warning: expected ‘obj’ to have reference count: ‘1’ but ob_refcnt
> field is ‘2’
>
> than:
>
> warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt
> field is N + ‘2’
>
> ...and we shouldn't quote concrete numbers, the message should be:
>
> warning: expected ‘obj’ to have reference count of 1 but ob_refcnt field
> is 2
> or better:
>
> warning: ‘*obj’ is pointed to by 1 pointer but 'ob_refcnt' field is 2
>
>
> Can you move the unwrapping of the svalue from the tests below into the
> emit vfunc? That way the m_actual_refcnt doesn't have to be a
> constant_svalue; you could have logic in the emit vfunc to print
> readable messages based on what kind of svalue it is.
>
> Rather than 'N', it might be better to say 'initial'; how about:
>
> warning: ‘*obj’ is pointed to by 0 additional pointers but 'ob_refcnt'
> field has increased by 1
> warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt'
> field has increased by 2
> warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt'
> field is unchanged
> warning: ‘*obj’ is pointed to by 2 additional pointers but 'ob_refcnt'
> field has decreased by 1
> warning: ‘*obj’ is pointed to by 1 fewer pointers but 'ob_refcnt' field
> is unchanged
>
> and similar?
>
That makes sense to me as well (indeed I was just emulating the old
behavior)! Will experiment and keep you posted on a revised patch with this
in mind. This is somewhat of a minor detail but can we emit ‘*obj’ as
bolded text in the diagnostic message? Currently, I can emit this
(including the asterisk) like so: '*%E'. But unlike using %qE, it doesn't
bold the body of the single quotations. Is this possible?
>
> Maybe have a flag that tracks whether we're talking about a concrete
> value that's absolute versus a concrete value that's relative to the
> initial value?
>
>
> [...snip...]
>
>
> > @@ -369,6 +368,19 @@ increment_region_refcnt (hash_map<const region *,
> int> &map, const region *key)
> > refcnt = existed ? refcnt + 1 : 1;
> > }
> >
> > +static const region *
> > +get_region_from_svalue (const svalue *sval, region_model_manager *mgr)
> > +{
> > + const auto *region_sval = sval->dyn_cast_region_svalue ();
> > + if (region_sval)
> > + return region_sval->get_pointee ();
> > +
> > + const auto *initial_sval = sval->dyn_cast_initial_svalue ();
> > + if (initial_sval)
> > + return mgr->get_symbolic_region (initial_sval);
> > +
> > + return nullptr;
> > +}
>
> This is dereferencing a pointer, right?
>
> Can the caller use region_model::deref_rvalue instead?
>
>
> [...snip...]
>
> > +static void
> > +unwrap_any_ob_refcnt_sval (const svalue *&ob_refcnt_sval)
> > +{
> > + if (ob_refcnt_sval->get_kind () != SK_CONSTANT)
> > + {
> > + auto unwrap_cast = ob_refcnt_sval->maybe_undo_cast ();
> > + if (!unwrap_cast)
> > + unwrap_cast = ob_refcnt_sval;
> > +
> > + if (unwrap_cast->get_kind () == SK_BINOP)
> > + ob_refcnt_sval = unwrap_cast->dyn_cast_binop_svalue ()->get_arg1
> ();
>
> This would be better spelled as:
>
> if (const binop_svalue *binop_sval =
> unwrap_cast->dyn_cast_binop_svalue ())
> ob_refcnt_sval = binop_sval->get_arg1 ();
>
> [...snip...]
>
> > /* Compare ob_refcnt field vs the actual reference count of a region */
> > static void
> > check_refcnt (const region_model *model,
> > const region_model *old_model,
> > region_model_context *ctxt,
> > const hash_map<const ana::region *,
> > - int>::iterator::reference_pair region_refcnt)
> > + int>::iterator::reference_pair ®ion_refcnt)
>
> Could really use a typedef for
> const hash_map<const ana::region *, int>
> to simplify this code.
>
> > {
> > region_model_manager *mgr = model->get_manager ();
> > const auto &curr_region = region_refcnt.first;
> > const auto &actual_refcnt = region_refcnt.second;
> > +
> > const svalue *ob_refcnt_sval
> > = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
> > + if (!ob_refcnt_sval)
> > + return;
> > +
> > + unwrap_any_ob_refcnt_sval (ob_refcnt_sval);
>
> As noted above, can the diagnostic store the pre-unwrapped
> ob_refcnt_sval? Might mean you have to do the unwrapping both here,
> and later when displaying the diagnostic. Or (probably best) track
> both the original and unwrapped ob_refcnt_sval, and store both in the
> pending_diagnostic.
>
> > +
> > const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> > ob_refcnt_sval->get_type (), actual_refcnt);
> > -
> > if (ob_refcnt_sval != actual_refcnt_sval)
> > - {
> > - const svalue *curr_reg_sval
> > - = mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region);
> > - tree reg_tree = old_model->get_representative_tree
> (curr_reg_sval);
> > - if (!reg_tree)
> > - return;
> > -
> > - const auto &eg = ctxt->get_eg ();
> > - refcnt_stmt_finder finder (*eg, reg_tree);
> > - auto pd = make_unique<refcnt_mismatch> (curr_region,
> ob_refcnt_sval,
> > - actual_refcnt_sval,
> reg_tree);
> > - if (pd && eg)
> > - ctxt->warn (std::move (pd), &finder);
> > - }
> > + handle_refcnt_mismatch (old_model, curr_region, ob_refcnt_sval,
> > + actual_refcnt_sval, ctxt);
> > }
> >
> > static void
> > @@ -493,8 +520,6 @@ count_all_references (const region_model *model,
> > for (const auto &cluster : *model->get_store ())
> > {
> > auto curr_region = cluster.first;
> > - if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
> > - continue;
> >
> > increment_region_refcnt (region_to_refcnt, curr_region);
> >
> > @@ -505,8 +530,8 @@ count_all_references (const region_model *model,
> >
> > const svalue *unwrapped_sval
> > = binding_sval->unwrap_any_unmergeable ();
> > - // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> > - // continue;
> > + if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> > + continue;
>
> We'll probably want a smarter test for this, that the type "inherits"
> C-style from PyObject (e.g. PyLongObject).
>
>
> >
> > const region *pointee = unwrapped_sval->maybe_get_region ();
> > if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
> > diff --git
> a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> > index e1efd9efda5..46daf2f8975 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> > @@ -75,4 +75,23 @@ test_PyListAppend_6 ()
> > PyObject *list = NULL;
> > PyList_Append(list, item);
> > return list;
> > -}
> > \ No newline at end of file
> > +}
> > +
> > +PyObject *
> > +test_PyListAppend_7 (PyObject *item)
> > +{
> > + PyObject *list = PyList_New (0);
> > + Py_INCREF(item);
> > + PyList_Append(list, item);
> > + return list;
> > + /* { dg-warning "expected 'item' to have reference count" "" { target
> *-*-* } .-1 } */
>
> It would be good if these dg-warning directives regexp contained the
> actual and expected counts; I find I can't easily tell what the
> intended output is meant to be.
>
>
> > +}
> > +
> > +PyObject *
> > +test_PyListAppend_8 (PyObject *item, PyObject *list)
> > +{
> > + Py_INCREF(item);
> > + Py_INCREF(item);
> > + PyList_Append(list, item);
> > + return list;
> > +}
>
> Should we complain here about item->ob_refcnt being too high?
>
> > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> > new file mode 100644
> > index 00000000000..a7f39509d6d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target analyzer } */
> > +/* { dg-options "-fanalyzer" } */
> > +/* { dg-require-python-h "" } */
> > +
> > +
> > +#define PY_SSIZE_T_CLEAN
> > +#include <Python.h>
> > +#include "../analyzer/analyzer-decls.h"
> > +
> > +PyObject *
> > +test_refcnt_1 (PyObject *obj)
> > +{
> > + Py_INCREF(obj);
> > + Py_INCREF(obj);
> > + return obj;
> > + /* { dg-warning "expected 'obj' to have reference count" "" { target
> *-*-* } .-1 } */
>
> Likewise, it would be better for the dg-warning directive's expressed
> the expected "actual vs expected" values.
>
> [...snip...]
>
> Thanks again for the patch, hope this is constructive
>
> Dave
>
>
next prev parent reply other threads:[~2023-09-11 2:12 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-25 4:49 Update and Questions on CPython Extension Module -fanalyzer plugin development Eric Feng
2023-07-25 14:41 ` David Malcolm
2023-07-27 22:13 ` Eric Feng
2023-07-27 22:35 ` David Malcolm
2023-07-30 17:52 ` Eric Feng
2023-07-30 23:44 ` David Malcolm
2023-08-01 13:57 ` Eric Feng
2023-08-01 17:06 ` David Malcolm
2023-08-04 15:02 ` Eric Feng
2023-08-04 15:39 ` David Malcolm
2023-08-04 20:48 ` Eric Feng
2023-08-04 22:42 ` David Malcolm
2023-08-04 22:46 ` David Malcolm
2023-08-07 18:31 ` Eric Feng
2023-08-07 23:16 ` David Malcolm
2023-08-08 16:51 ` [PATCH] WIP for dg-require-python-h [PR107646] Eric Feng
2023-08-08 18:08 ` David Malcolm
2023-08-08 18:51 ` David Malcolm
2023-08-09 19:22 ` [PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646] Eric Feng
2023-08-09 21:36 ` David Malcolm
2023-08-11 17:47 ` [COMMITTED] " Eric Feng
2023-08-11 20:23 ` Eric Feng
2023-08-16 19:17 ` Update on CPython Extension Module -fanalyzer plugin development Eric Feng
2023-08-16 21:28 ` David Malcolm
2023-08-17 1:47 ` Eric Feng
2023-08-21 14:05 ` Eric Feng
2023-08-21 15:04 ` David Malcolm
2023-08-23 21:15 ` Eric Feng
2023-08-23 23:16 ` David Malcolm
2023-08-24 14:45 ` Eric Feng
2023-08-25 12:50 ` Eric Feng
2023-08-25 19:50 ` David Malcolm
2023-08-29 4:31 ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
2023-08-29 4:35 ` Eric Feng
2023-08-29 17:28 ` Eric Feng
2023-08-29 21:14 ` David Malcolm
2023-08-30 22:15 ` Eric Feng
2023-08-31 17:01 ` David Malcolm
2023-08-31 19:09 ` Eric Feng
2023-08-31 20:19 ` David Malcolm
2023-09-01 1:25 ` Eric Feng
2023-09-01 11:57 ` David Malcolm
2023-09-05 2:13 ` [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646] Eric Feng
2023-09-07 17:28 ` David Malcolm
2023-09-11 2:12 ` Eric Feng [this message]
2023-09-11 19:00 ` David Malcolm
2023-08-29 21:08 ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] David Malcolm
2023-09-01 2:49 ` Hans-Peter Nilsson
2023-09-01 14:51 ` David Malcolm
2023-09-01 21:07 ` Eric Feng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CANGHATVjJ775Kg=jrR73qXuNwy9XOX9_XEHM0sPDqGhr5c07wg@mail.gmail.com' \
--to=ef2648@columbia.edu \
--cc=dmalcolm@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=gcc@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).