From: David Malcolm <dmalcolm@redhat.com>
To: Eric Feng <ef2648@columbia.edu>
Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]
Date: Thu, 07 Sep 2023 13:28:02 -0400 [thread overview]
Message-ID: <14870fdc12b8e67bd1e8bd97880653198cfe485d.camel@redhat.com> (raw)
In-Reply-To: <20230905021334.39124-1-ef2648@columbia.edu>
On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote:
> Hi Dave,
Hi Eric, thanks for the patch.
>
> Recently I've been working on symbolic value support for the reference
> count checker. I've attached a patch for it below; let me know it looks
> OK for trunk. Thanks!
>
> Best,
> Eric
>
> ---
>
> This patch enhances the reference count checker in the CPython plugin by
> adding support for symbolic values. Whereas previously we were only able
> to check the reference count of PyObject* objects created in the scope
> of the function; we are now able to emit diagnostics on reference count
> mismatch of objects that were, for example, passed in as a function
> parameter.
>
> rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt field is N + ‘2’
> 6 | return obj;
> | ^~~
[...snip...]
> create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
>
> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index bf1982e79c3..d7ecd7fce09 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> @@ -314,17 +314,20 @@ public:
> {
> diagnostic_metadata m;
> bool warned;
> - // just assuming constants for now
> - auto actual_refcnt
> - = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> - auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> - warned = warning_meta (rich_loc, m, get_controlling_option (),
> - "expected %qE to have "
> - "reference count: %qE but ob_refcnt field is: %qE",
> - m_reg_tree, actual_refcnt, ob_refcnt);
> -
> - // location_t loc = rich_loc->get_loc ();
> - // foo (loc);
> +
> + const auto *actual_refcnt_constant
> + = m_actual_refcnt->dyn_cast_constant_svalue ();
> + const auto *ob_refcnt_constant = m_ob_refcnt->dyn_cast_constant_svalue ();
> + if (!actual_refcnt_constant || !ob_refcnt_constant)
> + return false;
> +
> + auto actual_refcnt = actual_refcnt_constant->get_constant ();
> + auto ob_refcnt = ob_refcnt_constant->get_constant ();
> + warned = warning_meta (
> + rich_loc, m, get_controlling_option (),
> + "expected %qE to have "
> + "reference count: N + %qE but ob_refcnt field is N + %qE",
> + m_reg_tree, actual_refcnt, ob_refcnt);
> return warned;
I know you're emulating the old behavior I implemented way back in
cpychecker, but I don't like that behavior :(
Specifically, although the patch improves the behavior for symbolic
values, it regresses the precision of wording for the concrete values
case. If we have e.g. a concrete ob_refcnt of 2, whereas we only have
1 pointer, then it's more readable to say:
warning: expected ‘obj’ to have reference count: ‘1’ but ob_refcnt
field is ‘2’
than:
warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt field is N + ‘2’
...and we shouldn't quote concrete numbers, the message should be:
warning: expected ‘obj’ to have reference count of 1 but ob_refcnt field is 2
or better:
warning: ‘*obj’ is pointed to by 1 pointer but 'ob_refcnt' field is 2
Can you move the unwrapping of the svalue from the tests below into the
emit vfunc? That way the m_actual_refcnt doesn't have to be a
constant_svalue; you could have logic in the emit vfunc to print
readable messages based on what kind of svalue it is.
Rather than 'N', it might be better to say 'initial'; how about:
warning: ‘*obj’ is pointed to by 0 additional pointers but 'ob_refcnt' field has increased by 1
warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field has increased by 2
warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field is unchanged
warning: ‘*obj’ is pointed to by 2 additional pointers but 'ob_refcnt' field has decreased by 1
warning: ‘*obj’ is pointed to by 1 fewer pointers but 'ob_refcnt' field is unchanged
and similar?
Maybe have a flag that tracks whether we're talking about a concrete
value that's absolute versus a concrete value that's relative to the
initial value?
[...snip...]
> @@ -369,6 +368,19 @@ increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
> refcnt = existed ? refcnt + 1 : 1;
> }
>
> +static const region *
> +get_region_from_svalue (const svalue *sval, region_model_manager *mgr)
> +{
> + const auto *region_sval = sval->dyn_cast_region_svalue ();
> + if (region_sval)
> + return region_sval->get_pointee ();
> +
> + const auto *initial_sval = sval->dyn_cast_initial_svalue ();
> + if (initial_sval)
> + return mgr->get_symbolic_region (initial_sval);
> +
> + return nullptr;
> +}
This is dereferencing a pointer, right?
Can the caller use region_model::deref_rvalue instead?
[...snip...]
> +static void
> +unwrap_any_ob_refcnt_sval (const svalue *&ob_refcnt_sval)
> +{
> + if (ob_refcnt_sval->get_kind () != SK_CONSTANT)
> + {
> + auto unwrap_cast = ob_refcnt_sval->maybe_undo_cast ();
> + if (!unwrap_cast)
> + unwrap_cast = ob_refcnt_sval;
> +
> + if (unwrap_cast->get_kind () == SK_BINOP)
> + ob_refcnt_sval = unwrap_cast->dyn_cast_binop_svalue ()->get_arg1 ();
This would be better spelled as:
if (const binop_svalue *binop_sval = unwrap_cast->dyn_cast_binop_svalue ())
ob_refcnt_sval = binop_sval->get_arg1 ();
[...snip...]
> /* Compare ob_refcnt field vs the actual reference count of a region */
> static void
> check_refcnt (const region_model *model,
> const region_model *old_model,
> region_model_context *ctxt,
> const hash_map<const ana::region *,
> - int>::iterator::reference_pair region_refcnt)
> + int>::iterator::reference_pair ®ion_refcnt)
Could really use a typedef for
const hash_map<const ana::region *, int>
to simplify this code.
> {
> region_model_manager *mgr = model->get_manager ();
> const auto &curr_region = region_refcnt.first;
> const auto &actual_refcnt = region_refcnt.second;
> +
> const svalue *ob_refcnt_sval
> = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
> + if (!ob_refcnt_sval)
> + return;
> +
> + unwrap_any_ob_refcnt_sval (ob_refcnt_sval);
As noted above, can the diagnostic store the pre-unwrapped
ob_refcnt_sval? Might mean you have to do the unwrapping both here,
and later when displaying the diagnostic. Or (probably best) track
both the original and unwrapped ob_refcnt_sval, and store both in the
pending_diagnostic.
> +
> const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> ob_refcnt_sval->get_type (), actual_refcnt);
> -
> if (ob_refcnt_sval != actual_refcnt_sval)
> - {
> - const svalue *curr_reg_sval
> - = mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region);
> - tree reg_tree = old_model->get_representative_tree (curr_reg_sval);
> - if (!reg_tree)
> - return;
> -
> - const auto &eg = ctxt->get_eg ();
> - refcnt_stmt_finder finder (*eg, reg_tree);
> - auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
> - actual_refcnt_sval, reg_tree);
> - if (pd && eg)
> - ctxt->warn (std::move (pd), &finder);
> - }
> + handle_refcnt_mismatch (old_model, curr_region, ob_refcnt_sval,
> + actual_refcnt_sval, ctxt);
> }
>
> static void
> @@ -493,8 +520,6 @@ count_all_references (const region_model *model,
> for (const auto &cluster : *model->get_store ())
> {
> auto curr_region = cluster.first;
> - if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
> - continue;
>
> increment_region_refcnt (region_to_refcnt, curr_region);
>
> @@ -505,8 +530,8 @@ count_all_references (const region_model *model,
>
> const svalue *unwrapped_sval
> = binding_sval->unwrap_any_unmergeable ();
> - // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> - // continue;
> + if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> + continue;
We'll probably want a smarter test for this, that the type "inherits"
C-style from PyObject (e.g. PyLongObject).
>
> const region *pointee = unwrapped_sval->maybe_get_region ();
> if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> index e1efd9efda5..46daf2f8975 100644
> --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> @@ -75,4 +75,23 @@ test_PyListAppend_6 ()
> PyObject *list = NULL;
> PyList_Append(list, item);
> return list;
> -}
> \ No newline at end of file
> +}
> +
> +PyObject *
> +test_PyListAppend_7 (PyObject *item)
> +{
> + PyObject *list = PyList_New (0);
> + Py_INCREF(item);
> + PyList_Append(list, item);
> + return list;
> + /* { dg-warning "expected 'item' to have reference count" "" { target *-*-* } .-1 } */
It would be good if these dg-warning directives regexp contained the
actual and expected counts; I find I can't easily tell what the
intended output is meant to be.
> +}
> +
> +PyObject *
> +test_PyListAppend_8 (PyObject *item, PyObject *list)
> +{
> + Py_INCREF(item);
> + Py_INCREF(item);
> + PyList_Append(list, item);
> + return list;
> +}
Should we complain here about item->ob_refcnt being too high?
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> new file mode 100644
> index 00000000000..a7f39509d6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_refcnt_1 (PyObject *obj)
> +{
> + Py_INCREF(obj);
> + Py_INCREF(obj);
> + return obj;
> + /* { dg-warning "expected 'obj' to have reference count" "" { target *-*-* } .-1 } */
Likewise, it would be better for the dg-warning directive's expressed
the expected "actual vs expected" values.
[...snip...]
Thanks again for the patch, hope this is constructive
Dave
next prev parent reply other threads:[~2023-09-07 17:28 UTC|newest]
Thread overview: 50+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-25 4:49 Update and Questions on CPython Extension Module -fanalyzer plugin development Eric Feng
2023-07-25 14:41 ` David Malcolm
2023-07-27 22:13 ` Eric Feng
2023-07-27 22:35 ` David Malcolm
2023-07-30 17:52 ` Eric Feng
2023-07-30 23:44 ` David Malcolm
2023-08-01 13:57 ` Eric Feng
2023-08-01 17:06 ` David Malcolm
2023-08-04 15:02 ` Eric Feng
2023-08-04 15:39 ` David Malcolm
2023-08-04 20:48 ` Eric Feng
2023-08-04 22:42 ` David Malcolm
2023-08-04 22:46 ` David Malcolm
2023-08-07 18:31 ` Eric Feng
2023-08-07 23:16 ` David Malcolm
2023-08-08 16:51 ` [PATCH] WIP for dg-require-python-h [PR107646] Eric Feng
2023-08-08 18:08 ` David Malcolm
2023-08-08 18:51 ` David Malcolm
2023-08-09 19:22 ` [PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646] Eric Feng
2023-08-09 21:36 ` David Malcolm
2023-08-11 17:47 ` [COMMITTED] " Eric Feng
2023-08-11 20:23 ` Eric Feng
2023-08-16 19:17 ` Update on CPython Extension Module -fanalyzer plugin development Eric Feng
2023-08-16 21:28 ` David Malcolm
2023-08-17 1:47 ` Eric Feng
2023-08-21 14:05 ` Eric Feng
2023-08-21 15:04 ` David Malcolm
2023-08-23 21:15 ` Eric Feng
2023-08-23 23:16 ` David Malcolm
2023-08-24 14:45 ` Eric Feng
2023-08-25 12:50 ` Eric Feng
2023-08-25 19:50 ` David Malcolm
2023-08-29 4:31 ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
2023-08-29 4:35 ` Eric Feng
2023-08-29 17:28 ` Eric Feng
2023-08-29 21:14 ` David Malcolm
2023-08-30 22:15 ` Eric Feng
2023-08-31 17:01 ` David Malcolm
2023-08-31 19:09 ` Eric Feng
2023-08-31 20:19 ` David Malcolm
2023-09-01 1:25 ` Eric Feng
2023-09-01 11:57 ` David Malcolm
2023-09-05 2:13 ` [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646] Eric Feng
2023-09-07 17:28 ` David Malcolm [this message]
2023-09-11 2:12 ` Eric Feng
2023-09-11 19:00 ` David Malcolm
2023-08-29 21:08 ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] David Malcolm
2023-09-01 2:49 ` Hans-Peter Nilsson
2023-09-01 14:51 ` David Malcolm
2023-09-01 21:07 ` Eric Feng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=14870fdc12b8e67bd1e8bd97880653198cfe485d.camel@redhat.com \
--to=dmalcolm@redhat.com \
--cc=ef2648@columbia.edu \
--cc=gcc-patches@gcc.gnu.org \
--cc=gcc@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).