From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) by sourceware.org (Postfix) with ESMTPS id EAB493858D3C for ; Tue, 29 Aug 2023 04:35:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org EAB493858D3C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=columbia.edu Received: from pps.filterd (m0167073.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 37T3s8Ni015538 for ; Tue, 29 Aug 2023 00:35:19 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h=mime-version : references : in-reply-to : from : date : message-id : subject : to : cc : content-type : content-transfer-encoding; s=pps01; bh=1/zwuA/mB634UqIscRd/xM0HzbLzqNhDRYm87ktOtRI=; b=nVB5iEz6BOdR8DG1MgiTQzPG7cq2BelNYFYuRVNhoTVwjPG8VZzl+Y3n1UUE0d5XqKoK iC2OT/b+x02ZvNSvrm5oyB/9T3TqY0Tpl5Y2n3s5nTdwkNR6nCepeq6fj/4c812uUdZx swrtNiJYkhhT3qimHQvyZsdC8Y0Srw7Q/lcYEE1QW276MV2B9iVkCsptxEs2uQ4xQS3q eUWLj/E7i2R9eQ8wHTL3BqdbKSGfKuEmdQqak9KNwKZ3hi1HUZXe+9dkcq67VwArQbf+ vkkQLx4GAWyFXVEzWE8zIu14dfzBA7KKtINj4uq2B7liqB1kXbzLP8YENcRfNpxk8f1B Tg== Received: from mail-vk1-f197.google.com (mail-vk1-f197.google.com [209.85.221.197]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 3sqd9f0tsg-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 29 Aug 2023 00:35:19 -0400 Received: by mail-vk1-f197.google.com with SMTP id 71dfb90a1353d-48cfb3672fdso1317667e0c.2 for ; Mon, 28 Aug 2023 21:35:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693283718; x=1693888518; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1/zwuA/mB634UqIscRd/xM0HzbLzqNhDRYm87ktOtRI=; b=QZC8oHrgMjU3mSCI/sOWTltrmxyL4z53X/OzFjbKPh6yjQ9eaaBi75j3diSjI5OCuA HnSjs6sjuPmLmMG/+Wmf1TXz3nl14fvdBT3CfWNUhzWkHPx35+wIO2awuLZLstvsjtt5 91/Hqg4qxjnPpJvk3RCJJqFaxXZzsPT/EwZpf/rUP/Hpbdvy+ySX2TchhYWDyFCXBKz1 BNFvSJczj03nrdq5VEplbMI5/LA2PefC4VNgTuP64rYtQk7uiHf4UyG8/vfeA22oF+U8 I6fT9fejdyNEuAKIggv4ASYW6L45bsTCrjWJBa5V7xLgRvYyYsf8kuUu58KT16+k6gp/ Fckg== X-Gm-Message-State: AOJu0YykxVWFMjWIDeP34ey1vvLy+4YxIHkOMqN3Fo7tDb64o28IWYl0 TWrUO0zXgxjMurkcChw6xK8Dp8iklV54umXj9va3w/n33M/aFzJfYh+zBZbYJzJHGa/1ql6DJ96 LNMIQsO8xKQx0CBShExZD X-Received: by 2002:a1f:728e:0:b0:48d:eaa:45c4 with SMTP id n136-20020a1f728e000000b0048d0eaa45c4mr23752276vkc.7.1693283718231; Mon, 28 Aug 2023 21:35:18 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEpVsiXvLG/ipHjwHHOHK7B0GYP3c+QBSEdSM6S227t3ZZff/9z+roQFZ6VRhJ9VEEXnadRmsvRC7YFJW4+fdQ= X-Received: by 2002:a1f:728e:0:b0:48d:eaa:45c4 with SMTP id n136-20020a1f728e000000b0048d0eaa45c4mr23752256vkc.7.1693283717672; Mon, 28 Aug 2023 21:35:17 -0700 (PDT) MIME-Version: 1.0 References: <20230829043155.17651-1-ef2648@columbia.edu> In-Reply-To: <20230829043155.17651-1-ef2648@columbia.edu> From: Eric Feng Date: Tue, 29 Aug 2023 00:35:06 -0400 Message-ID: Subject: Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] To: dmalcolm@redhat.com Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: Wj7aqlfn30xVQp128hHYGrzoFOYF6pFr X-Proofpoint-ORIG-GUID: Wj7aqlfn30xVQp128hHYGrzoFOYF6pFr X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.957,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-08-29_01,2023-08-28_04,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 clxscore=1015 malwarescore=0 suspectscore=0 spamscore=0 impostorscore=10 mlxlogscore=999 adultscore=0 mlxscore=0 bulkscore=10 phishscore=0 lowpriorityscore=10 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2308290040 X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Aug 29, 2023 at 12:32=E2=80=AFAM Eric Feng wr= ote: > > Hi Dave, > > Thanks for the feedback. I've addressed the changes you mentioned in > addition to adding more test cases. I've also taken this chance to > split the test files according to known function subclasses, as you previ= ously > suggested. Since there were also some changes to the core analyzer, I've = done a > bootstrap and regtested the patch as well. Does it look OK for trunk? Apologies =E2=80=94 I forgot to mention that bootstrap and regtest was done= on aarch64-unknown-linux-gnu. > > Best, > Eric > > --- > > This patch introduces initial support for reference count checking of > PyObjects in relation to the Python/C API for the CPython plugin. > Additionally, the core analyzer underwent several modifications to > accommodate this feature. These include: > > - Introducing support for callbacks at the end of > region_model::pop_frame. This is our current point of validation for > the reference count of PyObjects. > - An added optional custom stmt_finder parameter to > region_model_context::warn. This aids in emitting a diagnostic > concerning the reference count, especially when the stmt_finder is > NULL, which is currently the case during region_model::pop_frame. > > The current diagnostic we emit relating to the reference count > appears as follows: > > rc3.c:23:10: warning: expected = to have reference count: =E2=80=981=E2=80=99 but ob_refcnt field is: =E2= =80=982=E2=80=99 > 23 | return list; > | ^~~~ > =E2=80=98create_py_object=E2=80=99: events 1-4 > | > | 4 | PyObject* item =3D PyLong_FromLong(3); > | | ^~~~~~~~~~~~~~~~~~ > | | | > | | (1) when =E2=80=98PyLong_FromLong=E2=80= =99 succeeds > | 5 | PyObject* list =3D PyList_New(1); > | | ~~~~~~~~~~~~~ > | | | > | | (2) when =E2=80=98PyList_New=E2=80=99 suc= ceeds > |...... > | 14 | PyList_Append(list, item); > | | ~~~~~~~~~~~~~~~~~~~~~~~~~ > | | | > | | (3) when =E2=80=98PyList_Append=E2=80=99 succeeds, moving = buffer > |...... > | 23 | return list; > | | ~~~~ > | | | > | | (4) here > | > > This is a WIP in several ways: > - Enhancing the diagnostic for better clarity. For instance, users should > expect to see the variable name 'item' instead of the placeholder in th= e > diagnostic above. > - Currently, functions returning PyObject * are assumed to always produce > a new reference. > - The validation of reference count is only for PyObjects created within = a > function body. Verifying reference counts for PyObjects passed as > parameters is not supported in this patch. > > gcc/analyzer/ChangeLog: > PR analyzer/107646 > * engine.cc (impl_region_model_context::warn): New optional param= eter. > * exploded-graph.h (class impl_region_model_context): Likewise. > * region-model.cc (region_model::pop_frame): New callback feature= for > * region_model::pop_frame. > * region-model.h (struct append_regions_cb_data): Likewise. > (class region_model): Likewise. > (class region_model_context): New optional parameter. > (class region_model_context_decorator): Likewise. > > gcc/testsuite/ChangeLog: > PR analyzer/107646 > * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference c= ount > * checking for PyObjects. > * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to... > * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and > * added more tests). > * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to... > * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and add= ed > * more tests). > * gcc.dg/plugin/plugin.exp: New tests. > * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test. > * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test. > * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test. > > Signed-off-by: Eric Feng > > --- > gcc/analyzer/engine.cc | 8 +- > gcc/analyzer/exploded-graph.h | 4 +- > gcc/analyzer/region-model.cc | 3 + > gcc/analyzer/region-model.h | 48 ++- > .../gcc.dg/plugin/analyzer_cpython_plugin.c | 376 +++++++++++++++++- > ....c =3D> cpython-plugin-test-PyList_Append.c} | 56 +-- > .../plugin/cpython-plugin-test-PyList_New.c | 38 ++ > .../cpython-plugin-test-PyLong_FromLong.c | 38 ++ > ...st-1.c =3D> cpython-plugin-test-no-plugin.c} | 0 > .../cpython-plugin-test-refcnt-checking.c | 78 ++++ > gcc/testsuite/gcc.dg/plugin/plugin.exp | 5 +- > 11 files changed, 612 insertions(+), 42 deletions(-) > rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c =3D> cpython= -plugin-test-PyList_Append.c} (64%) > create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLis= t_New.c > create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLon= g_FromLong.c > rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c =3D> cpython= -plugin-test-no-plugin.c} (100%) > create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcn= t-checking.c > > diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc > index a1908cdb364..736a41ecdaf 100644 > --- a/gcc/analyzer/engine.cc > +++ b/gcc/analyzer/engine.cc > @@ -115,10 +115,12 @@ impl_region_model_context (program_state *state, > } > > bool > -impl_region_model_context::warn (std::unique_ptr d) > +impl_region_model_context::warn (std::unique_ptr d, > + const stmt_finder *custom_finder) > { > LOG_FUNC (get_logger ()); > - if (m_stmt =3D=3D NULL && m_stmt_finder =3D=3D NULL) > + auto curr_stmt_finder =3D custom_finder ? custom_finder : m_stmt_finde= r; > + if (m_stmt =3D=3D NULL && curr_stmt_finder =3D=3D NULL) > { > if (get_logger ()) > get_logger ()->log ("rejecting diagnostic: no stmt"); > @@ -129,7 +131,7 @@ impl_region_model_context::warn (std::unique_ptr d) > bool terminate_path =3D d->terminate_path_p (); > if (m_eg->get_diagnostic_manager ().add_diagnostic > (m_enode_for_diag, m_enode_for_diag->get_supernode (), > - m_stmt, m_stmt_finder, std::move (d))) > + m_stmt, curr_stmt_finder, std::move (d))) > { > if (m_path_ctxt > && terminate_path > diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.= h > index 5a7ab645bfe..6e9a5ef58c7 100644 > --- a/gcc/analyzer/exploded-graph.h > +++ b/gcc/analyzer/exploded-graph.h > @@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_c= ontext > uncertainty_t *uncertainty, > logger *logger =3D NULL); > > - bool warn (std::unique_ptr d) final override; > + bool warn (std::unique_ptr d, > + const stmt_finder *custom_finder =3D NULL) final override; > void add_note (std::unique_ptr pn) final override; > void add_event (std::unique_ptr event) final override; > void on_svalue_leak (const svalue *) override; > @@ -107,6 +108,7 @@ class impl_region_model_context : public region_model= _context > std::unique_ptr *out_sm_context) ove= rride; > > const gimple *get_stmt () const override { return m_stmt; } > + const exploded_graph *get_eg () const override { return m_eg; } > > exploded_graph *m_eg; > log_user m_logger; > diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc > index 4f31a6dcf0f..eb4f976b83a 100644 > --- a/gcc/analyzer/region-model.cc > +++ b/gcc/analyzer/region-model.cc > @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3. If not see > > namespace ana { > > +auto_vec region_model::pop_frame_callbacks; > + > /* Dump T to PP in language-independent form, for debugging/logging/dump= ing > purposes. */ > > @@ -5422,6 +5424,7 @@ region_model::pop_frame (tree result_lvalue, > } > > unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK); > + notify_on_pop_frame (this, retval, ctxt); > } > > /* Get the number of frames in this region_model's stack. */ > diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h > index 10b2a59e787..440ea6d828d 100644 > --- a/gcc/analyzer/region-model.h > +++ b/gcc/analyzer/region-model.h > @@ -236,6 +236,10 @@ public: > > struct append_regions_cb_data; > > +typedef void (*pop_frame_callback) (const region_model *model, > + const svalue *retval, > + region_model_context *ctxt); > + > /* A region_model encapsulates a representation of the state of memory, = with > a tree of regions, along with their associated values. > The representation is graph-like because values can be pointers to > @@ -532,6 +536,20 @@ class region_model > get_builtin_kf (const gcall *call, > region_model_context *ctxt =3D NULL) const; > > + static void > + register_pop_frame_callback (const pop_frame_callback &callback) > + { > + pop_frame_callbacks.safe_push (callback); > + } > + > + static void > + notify_on_pop_frame (const region_model *model, const svalue *retval, > + region_model_context *ctxt) > + { > + for (auto &callback : pop_frame_callbacks) > + callback (model, retval, ctxt); > + } > + > private: > const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) c= onst; > const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) c= onst; > @@ -621,6 +639,7 @@ private: > tree callee_fndecl, > region_model_context *ctx= t) const; > > + static auto_vec pop_frame_callbacks; > /* Storing this here to avoid passing it around everywhere. */ > region_model_manager *const m_mgr; > > @@ -649,8 +668,10 @@ class region_model_context > { > public: > /* Hook for clients to store pending diagnostics. > - Return true if the diagnostic was stored, or false if it was delete= d. */ > - virtual bool warn (std::unique_ptr d) =3D 0; > + Return true if the diagnostic was stored, or false if it was delete= d. > + Optionally provide a custom stmt_finder. */ > + virtual bool warn (std::unique_ptr d, > + const stmt_finder *custom_finder =3D NULL) =3D 0; > > /* Hook for clients to add a note to the last previously stored > pending diagnostic. */ > @@ -757,6 +778,8 @@ class region_model_context > > /* Get the current statement, if any. */ > virtual const gimple *get_stmt () const =3D 0; > + > + virtual const exploded_graph *get_eg () const =3D 0; > }; > > /* A "do nothing" subclass of region_model_context. */ > @@ -764,7 +787,8 @@ class region_model_context > class noop_region_model_context : public region_model_context > { > public: > - bool warn (std::unique_ptr) override { return fals= e; } > + bool warn (std::unique_ptr d, > + const stmt_finder *custom_finder) override { return false; } > void add_note (std::unique_ptr) override; > void add_event (std::unique_ptr) override; > void on_svalue_leak (const svalue *) override {} > @@ -812,6 +836,7 @@ public: > } > > const gimple *get_stmt () const override { return NULL; } > + const exploded_graph *get_eg () const override { return NULL; } > }; > > /* A subclass of region_model_context for determining if operations fail > @@ -840,7 +865,8 @@ private: > class region_model_context_decorator : public region_model_context > { > public: > - bool warn (std::unique_ptr d) override > + bool warn (std::unique_ptr d, > + const stmt_finder *custom_finder) > { > if (m_inner) > return m_inner->warn (std::move (d)); > @@ -978,6 +1004,14 @@ class region_model_context_decorator : public regio= n_model_context > return nullptr; > } > > + const exploded_graph *get_eg () const override > + { > + if (m_inner) > + return m_inner->get_eg (); > + else > + return nullptr; > + } > + > protected: > region_model_context_decorator (region_model_context *inner) > : m_inner (inner) > @@ -993,7 +1027,8 @@ protected: > class annotating_context : public region_model_context_decorator > { > public: > - bool warn (std::unique_ptr d) override > + bool warn (std::unique_ptr d, > + const stmt_finder *custom_finder) override > { > if (m_inner) > if (m_inner->warn (std::move (d))) > @@ -1158,7 +1193,8 @@ using namespace ::selftest; > class test_region_model_context : public noop_region_model_context > { > public: > - bool warn (std::unique_ptr d) final override > + bool warn (std::unique_ptr d, > + const stmt_finder *custom_finder) final override > { > m_diagnostics.safe_push (d.release ()); > return true; > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/= testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > index 7cd72e8a886..b2caed8fc1b 100644 > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > @@ -44,6 +44,7 @@ > #include "analyzer/region-model.h" > #include "analyzer/call-details.h" > #include "analyzer/call-info.h" > +#include "analyzer/exploded-graph.h" > #include "make-unique.h" > > int plugin_is_GPL_compatible; > @@ -191,6 +192,372 @@ public: > } > }; > > +/* This is just a copy of leak_stmt_finder for now (subject to change if > + * necssary) */ > + > +class refcnt_stmt_finder : public stmt_finder > +{ > +public: > + refcnt_stmt_finder (const exploded_graph &eg, tree var) > + : m_eg (eg), m_var (var) > + { > + } > + > + std::unique_ptr > + clone () const final override > + { > + return make_unique (m_eg, m_var); > + } > + > + const gimple * > + find_stmt (const exploded_path &epath) final override > + { > + logger *const logger =3D m_eg.get_logger (); > + LOG_FUNC (logger); > + > + if (m_var && TREE_CODE (m_var) =3D=3D SSA_NAME) > + { > + /* Locate the final write to this SSA name in the path. */ > + const gimple *def_stmt =3D SSA_NAME_DEF_STMT (m_var); > + > + int idx_of_def_stmt; > + bool found =3D epath.find_stmt_backwards (def_stmt, &idx_of_def_s= tmt); > + if (!found) > + goto not_found; > + > + /* What was the next write to the underlying var > + after the SSA name was set? (if any). */ > + > + for (unsigned idx =3D idx_of_def_stmt + 1; idx < epath.m_edges.le= ngth (); > + ++idx) > + { > + const exploded_edge *eedge =3D epath.m_edges[idx]; > + if (logger) > + logger->log ("eedge[%i]: EN %i -> EN %i", idx, > + eedge->m_src->m_index, > + eedge->m_dest->m_index); > + const exploded_node *dst_node =3D eedge->m_dest; > + const program_point &dst_point =3D dst_node->get_point (); > + const gimple *stmt =3D dst_point.get_stmt (); > + if (!stmt) > + continue; > + if (const gassign *assign =3D dyn_cast (stmt= )) > + { > + tree lhs =3D gimple_assign_lhs (assign); > + if (TREE_CODE (lhs) =3D=3D SSA_NAME > + && SSA_NAME_VAR (lhs) =3D=3D SSA_NAME_VAR= (m_var)) > + return assign; > + } > + } > + } > + > + not_found: > + > + /* Look backwards for the first statement with a location. */ > + int i; > + const exploded_edge *eedge; > + FOR_EACH_VEC_ELT_REVERSE (epath.m_edges, i, eedge) > + { > + if (logger) > + logger->log ("eedge[%i]: EN %i -> EN %i", i, eedge->m_src->m_inde= x, > + eedge->m_dest->m_index); > + const exploded_node *dst_node =3D eedge->m_dest; > + const program_point &dst_point =3D dst_node->get_point (); > + const gimple *stmt =3D dst_point.get_stmt (); > + if (stmt) > + if (get_pure_location (stmt->location) !=3D UNKNOWN_LOCATION) > + return stmt; > + } > + > + gcc_unreachable (); > + return NULL; > + } > + > +private: > + const exploded_graph &m_eg; > + tree m_var; > +}; > + > +class refcnt_mismatch : public pending_diagnostic_subclass > +{ > +public: > + refcnt_mismatch (const region *base_region, > + const svalue *ob_refcnt, > + const svalue *actual_refcnt, > + tree reg_tree) > + : m_base_region (base_region), m_ob_refcnt (ob_refcnt), > + m_actual_refcnt (actual_refcnt), m_reg_tree(reg_tree) > + { > + } > + > + const char * > + get_kind () const final override > + { > + return "refcnt_mismatch"; > + } > + > + bool > + operator=3D=3D (const refcnt_mismatch &other) const > + { > + return (m_base_region =3D=3D other.m_base_region > + && m_ob_refcnt =3D=3D other.m_ob_refcnt > + && m_actual_refcnt =3D=3D other.m_actual_refcnt); > + } > + > + int get_controlling_option () const final override > + { > + return 0; > + } > + > + bool > + emit (rich_location *rich_loc, logger *) final override > + { > + diagnostic_metadata m; > + bool warned; > + // just assuming constants for now > + auto actual_refcnt > + =3D m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ()= ; > + auto ob_refcnt =3D m_ob_refcnt->dyn_cast_constant_svalue ()->get_con= stant (); > + warned =3D warning_meta ( > + rich_loc, m, get_controlling_option (), > + "expected to have " > + "reference count: %qE but ob_refcnt field is: %qE", > + actual_refcnt, ob_refcnt); > + > + // location_t loc =3D rich_loc->get_loc (); > + // foo (loc); > + return warned; > + } > + > + void mark_interesting_stuff (interesting_t *interest) final override > + { > + if (m_base_region) > + interest->add_region_creation (m_base_region); > + } > + > +private: > + > + void foo(location_t loc) const > + { > + inform(loc, "something is up right here"); > + } > + const region *m_base_region; > + const svalue *m_ob_refcnt; > + const svalue *m_actual_refcnt; > + tree m_reg_tree; > +}; > + > +/* Retrieves the svalue associated with the ob_refcnt field of the base = region. > + */ > +static const svalue * > +retrieve_ob_refcnt_sval (const region *base_reg, const region_model *mod= el, > + region_model_context *ctxt) > +{ > + region_model_manager *mgr =3D model->get_manager (); > + tree ob_refcnt_tree =3D get_field_by_name (pyobj_record, "ob_refcnt"); > + const region *ob_refcnt_region > + =3D mgr->get_field_region (base_reg, ob_refcnt_tree); > + const svalue *ob_refcnt_sval > + =3D model->get_store_value (ob_refcnt_region, ctxt); > + return ob_refcnt_sval; > +} > + > +static void > +increment_region_refcnt (hash_map &map, const regio= n *key) > +{ > + bool existed; > + auto &refcnt =3D map.get_or_insert (key, &existed); > + refcnt =3D existed ? refcnt + 1 : 1; > +} > + > + > +/* Recursively fills in region_to_refcnt with the references owned by > + pyobj_ptr_sval. */ > +static void > +count_expected_pyobj_references (const region_model *model, > + hash_map ®ion_to_refcn= t, > + const svalue *pyobj_ptr_sval, > + hash_set &seen) > +{ > + if (!pyobj_ptr_sval) > + return; > + > + const auto *pyobj_region_sval =3D pyobj_ptr_sval->dyn_cast_region_sval= ue (); > + const auto *pyobj_initial_sval =3D pyobj_ptr_sval->dyn_cast_initial_sv= alue (); > + if (!pyobj_region_sval && !pyobj_initial_sval) > + return; > + > + // todo: support initial sval (e.g passed in as parameter) > + if (pyobj_initial_sval) > + { > + // increment_region_refcnt (region_to_refcnt, > + // pyobj_initial_sval->get_region ()); > + return; > + } > + > + const region *pyobj_region =3D pyobj_region_sval->get_pointee (); > + if (!pyobj_region || seen.contains (pyobj_region)) > + return; > + > + seen.add (pyobj_region); > + > + if (pyobj_ptr_sval->get_type () =3D=3D pyobj_ptr_tree) > + increment_region_refcnt (region_to_refcnt, pyobj_region); > + > + const auto *curr_store =3D model->get_store (); > + const auto *retval_cluster =3D curr_store->get_cluster (pyobj_region); > + if (!retval_cluster) > + return; > + > + const auto &retval_binding_map =3D retval_cluster->get_map (); > + > + for (const auto &binding : retval_binding_map) > + { > + const svalue *binding_sval =3D binding.second; > + const svalue *unwrapped_sval =3D binding_sval->unwrap_any_unmergea= ble (); > + const region *pointee =3D unwrapped_sval->maybe_get_region (); > + > + if (pointee && pointee->get_kind () =3D=3D RK_HEAP_ALLOCATED) > + count_expected_pyobj_references (model, region_to_refcnt, binding= _sval, > + seen); > + } > +} > + > +/* Compare ob_refcnt field vs the actual reference count of a region */ > +static void > +check_refcnt (const region_model *model, region_model_context *ctxt, > + const hash_map + int>::iterator::reference_pair region_refcnt= ) > +{ > + region_model_manager *mgr =3D model->get_manager (); > + const auto &curr_region =3D region_refcnt.first; > + const auto &actual_refcnt =3D region_refcnt.second; > + const svalue *ob_refcnt_sval =3D retrieve_ob_refcnt_sval (curr_region,= model, ctxt); > + const svalue *actual_refcnt_sval =3D mgr->get_or_create_int_cst ( > + ob_refcnt_sval->get_type (), actual_refcnt); > + > + if (ob_refcnt_sval !=3D actual_refcnt_sval) > + { > + // todo: fix this > + tree reg_tree =3D model->get_representative_tree (curr_region); > + > + const auto &eg =3D ctxt->get_eg (); > + refcnt_stmt_finder finder (*eg, reg_tree); > + auto pd =3D make_unique (curr_region, ob_refcnt_sva= l, > + actual_refcnt_sval, reg_tree)= ; > + if (pd && eg) > + ctxt->warn (std::move (pd), &finder); > + } > +} > + > +static void > +check_refcnts (const region_model *model, const svalue *retval, > + region_model_context *ctxt, > + hash_map ®ion_to_refcnt) > +{ > + for (const auto ®ion_refcnt : region_to_refcnt) > + { > + check_refcnt(model, ctxt, region_refcnt); > + } > +} > + > +/* Validates the reference count of all Python objects. */ > +void > +pyobj_refcnt_checker (const region_model *model, const svalue *retval, > + region_model_context *ctxt) > +{ > + if (!ctxt) > + return; > + > + auto region_to_refcnt =3D hash_map (); > + auto seen_regions =3D hash_set (); > + > + count_expected_pyobj_references (model, region_to_refcnt, retval, seen= _regions); > + check_refcnts (model, retval, ctxt, region_to_refcnt); > +} > + > +/* Counts the actual pyobject references from all clusters in the model'= s > + * store. */ > +static void > +count_all_references (const region_model *model, > + hash_map ®ion_to_refcnt) > +{ > + for (const auto &cluster : *model->get_store ()) > + { > + auto curr_region =3D cluster.first; > + if (curr_region->get_kind () !=3D RK_HEAP_ALLOCATED) > + continue; > + > + increment_region_refcnt (region_to_refcnt, curr_region); > + > + auto binding_cluster =3D cluster.second; > + for (const auto &binding : binding_cluster->get_map ()) > + { > + const svalue *binding_sval =3D binding.second; > + > + const svalue *unwrapped_sval > + =3D binding_sval->unwrap_any_unmergeable (); > + // if (unwrapped_sval->get_type () !=3D pyobj_ptr_tree) > + // continue; > + > + const region *pointee =3D unwrapped_sval->maybe_get_region (); > + if (!pointee || pointee->get_kind () !=3D RK_HEAP_ALLOCATED) > + continue; > + > + increment_region_refcnt (region_to_refcnt, pointee); > + } > + } > +} > + > +static void > +dump_refcnt_info (const hash_map ®ion_to_refcnt, > + const region_model *model, region_model_context *ctxt) > +{ > + region_model_manager *mgr =3D model->get_manager (); > + pretty_printer pp; > + pp_format_decoder (&pp) =3D default_tree_printer; > + pp_show_color (&pp) =3D pp_show_color (global_dc->printer); > + pp.buffer->stream =3D stderr; > + > + for (const auto ®ion_refcnt : region_to_refcnt) > + { > + auto region =3D region_refcnt.first; > + auto actual_refcnt =3D region_refcnt.second; > + const svalue *ob_refcnt_sval > + =3D retrieve_ob_refcnt_sval (region, model, ctxt); > + const svalue *actual_refcnt_sval =3D mgr->get_or_create_int_cst ( > + ob_refcnt_sval->get_type (), actual_refcnt); > + > + region->dump_to_pp (&pp, true); > + pp_string (&pp, " =E2=80=94 ob_refcnt: "); > + ob_refcnt_sval->dump_to_pp (&pp, true); > + pp_string (&pp, " actual refcnt: "); > + actual_refcnt_sval->dump_to_pp (&pp, true); > + pp_newline (&pp); > + } > + pp_string (&pp, "~~~~~~~~\n"); > + pp_flush (&pp); > +} > + > +class kf_analyzer_cpython_dump_refcounts : public known_function > +{ > +public: > + bool matches_call_types_p (const call_details &cd) const final overrid= e > + { > + return cd.num_args () =3D=3D 0; > + } > + void impl_call_pre (const call_details &cd) const final override > + { > + region_model_context *ctxt =3D cd.get_ctxt (); > + if (!ctxt) > + return; > + region_model *model =3D cd.get_model (); > + auto region_to_refcnt =3D hash_map (); > + count_all_references(model, region_to_refcnt); > + dump_refcnt_info(region_to_refcnt, model, ctxt); > + } > +}; > + > /* Some concessions were made to > simplify the analysis process when comparing kf_PyList_Append with the > real implementation. In particular, PyList_Append performs some > @@ -927,6 +1294,10 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*= user_data */) > iface->register_known_function ("PyList_New", make_unique ()); > iface->register_known_function ("PyLong_FromLong", > make_unique ()); > + > + iface->register_known_function ( > + "__analyzer_cpython_dump_refcounts", > + make_unique ()); > } > } // namespace ana > > @@ -940,8 +1311,9 @@ plugin_init (struct plugin_name_args *plugin_info, > const char *plugin_name =3D plugin_info->base_name; > if (0) > inform (input_location, "got here; %qs", plugin_name); > - ana::register_finish_translation_unit_callback (&stash_named_types); > - ana::register_finish_translation_unit_callback (&stash_global_vars); > + register_finish_translation_unit_callback (&stash_named_types); > + register_finish_translation_unit_callback (&stash_global_vars); > + region_model::register_pop_frame_callback(pyobj_refcnt_checker); > register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT, > ana::cpython_analyzer_init_cb, > NULL); /* void *user_data */ > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/te= stsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > similarity index 64% > rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c > rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > index 19b5c17428a..9912f9105d4 100644 > --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > @@ -8,34 +8,6 @@ > #include > #include "../analyzer/analyzer-decls.h" > > -PyObject * > -test_PyList_New (Py_ssize_t len) > -{ > - PyObject *obj =3D PyList_New (len); > - if (obj) > - { > - __analyzer_eval (obj->ob_refcnt =3D=3D 1); /* { dg-warning "TRUE" }= */ > - __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" }= */ > - } > - else > - __analyzer_dump_path (); /* { dg-message "path" } */ > - return obj; > -} > - > -PyObject * > -test_PyLong_New (long n) > -{ > - PyObject *obj =3D PyLong_FromLong (n); > - if (obj) > - { > - __analyzer_eval (obj->ob_refcnt =3D=3D 1); /* { dg-warning "TRUE" }= */ > - __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" }= */ > - } > - else > - __analyzer_dump_path (); /* { dg-message "path" } */ > - return obj; > -} > - > PyObject * > test_PyListAppend (long n) > { > @@ -43,6 +15,7 @@ test_PyListAppend (long n) > PyObject *list =3D PyList_New (0); > PyList_Append(list, item); > return list; /* { dg-warning "leak of 'item'" } */ > + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ > } > > PyObject * > @@ -67,6 +40,7 @@ test_PyListAppend_2 (long n) > else > __analyzer_eval (item->ob_refcnt =3D=3D 2); /* { dg-warning "TRUE" }= */ > return list; /* { dg-warning "leak of 'item'" } */ > + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ > } > > > @@ -75,4 +49,30 @@ test_PyListAppend_3 (PyObject *item, PyObject *list) > { > PyList_Append (list, item); > return list; > +} > + > +PyObject * > +test_PyListAppend_4 (long n) > +{ > + PyObject *item =3D PyLong_FromLong (n); > + PyObject *list =3D NULL; > + PyList_Append(list, item); > + return list; > +} > + > +PyObject * > +test_PyListAppend_5 () > +{ > + PyObject *list =3D PyList_New (0); > + PyList_Append(list, NULL); > + return list; > +} > + > +PyObject * > +test_PyListAppend_6 () > +{ > + PyObject *item =3D NULL; > + PyObject *list =3D NULL; > + PyList_Append(list, item); > + return list; > } > \ No newline at end of file > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c= b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c > new file mode 100644 > index 00000000000..492d4f7d58d > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c > @@ -0,0 +1,38 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target analyzer } */ > +/* { dg-options "-fanalyzer" } */ > +/* { dg-require-python-h "" } */ > + > + > +#define PY_SSIZE_T_CLEAN > +#include > +#include "../analyzer/analyzer-decls.h" > + > +PyObject * > +test_PyList_New (Py_ssize_t len) > +{ > + PyObject *obj =3D PyList_New (len); > + if (obj) > + { > + __analyzer_eval (obj->ob_refcnt =3D=3D 1); /* { dg-warning "TRUE" }= */ > + __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" }= */ > + } > + else > + __analyzer_dump_path (); /* { dg-message "path" } */ > + return obj; > +} > + > +void > +test_PyList_New_2 () > +{ > + PyObject *obj =3D PyList_New (0); > +} /* { dg-warning "leak of 'obj'" } */ > + > +PyObject *test_stray_incref_PyList () > +{ > + PyObject *p =3D PyList_New (2); > + if (p) > + Py_INCREF (p); > + return p; > + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ > +} > \ No newline at end of file > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromL= ong.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c > new file mode 100644 > index 00000000000..97b29849302 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c > @@ -0,0 +1,38 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target analyzer } */ > +/* { dg-options "-fanalyzer" } */ > +/* { dg-require-python-h "" } */ > + > + > +#define PY_SSIZE_T_CLEAN > +#include > +#include "../analyzer/analyzer-decls.h" > + > +PyObject * > +test_PyLong_New (long n) > +{ > + PyObject *obj =3D PyLong_FromLong (n); > + if (obj) > + { > + __analyzer_eval (obj->ob_refcnt =3D=3D 1); /* { dg-warning "TRUE" }= */ > + __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" }= */ > + } > + else > + __analyzer_dump_path (); /* { dg-message "path" } */ > + return obj; > +} > + > +void > +test_PyLong_New_2 (long n) > +{ > + PyObject *obj =3D PyLong_FromLong (n); > +} /* { dg-warning "leak of 'obj'" } */ > + > +PyObject *test_stray_incref_PyLong (long val) > +{ > + PyObject *p =3D PyLong_FromLong (val); > + if (p) > + Py_INCREF (p); > + return p; > + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ > +} > \ No newline at end of file > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c b/gcc/te= stsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c > similarity index 100% > rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c > rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-check= ing.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c > new file mode 100644 > index 00000000000..9912f9105d4 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c > @@ -0,0 +1,78 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target analyzer } */ > +/* { dg-options "-fanalyzer" } */ > +/* { dg-require-python-h "" } */ > + > + > +#define PY_SSIZE_T_CLEAN > +#include > +#include "../analyzer/analyzer-decls.h" > + > +PyObject * > +test_PyListAppend (long n) > +{ > + PyObject *item =3D PyLong_FromLong (n); > + PyObject *list =3D PyList_New (0); > + PyList_Append(list, item); > + return list; /* { dg-warning "leak of 'item'" } */ > + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ > +} > + > +PyObject * > +test_PyListAppend_2 (long n) > +{ > + PyObject *item =3D PyLong_FromLong (n); > + if (!item) > + return NULL; > + > + __analyzer_eval (item->ob_refcnt =3D=3D 1); /* { dg-warning "TRUE" } *= / > + PyObject *list =3D PyList_New (n); > + if (!list) > + { > + Py_DECREF(item); > + return NULL; > + } > + > + __analyzer_eval (list->ob_refcnt =3D=3D 1); /* { dg-warning "TRUE" } *= / > + > + if (PyList_Append (list, item) < 0) > + __analyzer_eval (item->ob_refcnt =3D=3D 1); /* { dg-warning "TRUE" }= */ > + else > + __analyzer_eval (item->ob_refcnt =3D=3D 2); /* { dg-warning "TRUE" }= */ > + return list; /* { dg-warning "leak of 'item'" } */ > + /* { dg-warning "reference count" "" { target *-*-* } .-1 } */ > +} > + > + > +PyObject * > +test_PyListAppend_3 (PyObject *item, PyObject *list) > +{ > + PyList_Append (list, item); > + return list; > +} > + > +PyObject * > +test_PyListAppend_4 (long n) > +{ > + PyObject *item =3D PyLong_FromLong (n); > + PyObject *list =3D NULL; > + PyList_Append(list, item); > + return list; > +} > + > +PyObject * > +test_PyListAppend_5 () > +{ > + PyObject *list =3D PyList_New (0); > + PyList_Append(list, NULL); > + return list; > +} > + > +PyObject * > +test_PyListAppend_6 () > +{ > + PyObject *item =3D NULL; > + PyObject *list =3D NULL; > + PyList_Append(list, item); > + return list; > +} > \ No newline at end of file > diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.d= g/plugin/plugin.exp > index e1ed2d2589e..cbef6da8d86 100644 > --- a/gcc/testsuite/gcc.dg/plugin/plugin.exp > +++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp > @@ -161,8 +161,9 @@ set plugin_test_list [list \ > taint-CVE-2011-0521-6.c \ > taint-antipatterns-1.c } \ > { analyzer_cpython_plugin.c \ > - cpython-plugin-test-1.c \ > - cpython-plugin-test-2.c } \ > + cpython-plugin-test-PyList_Append.c \ > + cpython-plugin-test-PyList_New.c \ > + cpython-plugin-test-PyLong_FromLong.c } \ > ] > > foreach plugin_test $plugin_test_list { > -- > 2.30.2 >