From: Eric Feng
To: dmalcolm@redhat.com
Cc: gcc@gcc.gnu.org, Eric Feng
Subject: Update on CPython Extension Module -fanalyzer plugin development
Date: Fri, 25 Aug 2023 08:50:56 -0400
Message-Id: <20230825125056.99826-1-ef2648@columbia.edu>

Hi Dave,

Please find an updated WIP patch on reference count checking below. Some
parts aren't properly formatted yet; I apologize for that.

Since the last WIP patch, the major updates include:
- Updated certain areas of the core analyzer to support custom stmt_finder.
- A significant revamp of the reference count checking logic.
- __analyzer_cpython_dump_refcounts: This dumps reference count related
  information.
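To make the first item concrete: with this patch a caller can hand warn a
stmt_finder of its own, and the context's stored finder is used only when
none is given. The following is a minimal standalone sketch of that
selection rule, not the analyzer code itself. StmtFinder, Context,
pick_finder and can_warn are hypothetical stand-ins chosen for illustration.

```cpp
#include <string>

// Hypothetical stand-in for the analyzer's stmt_finder interface.
struct StmtFinder {
  std::string name;
};

// Hypothetical stand-in for a region_model_context implementation.
struct Context {
  const StmtFinder *default_finder;  // may be null
  bool have_stmt;                    // whether a current statement is known

  // Prefer the caller's finder; otherwise fall back to the stored one.
  const StmtFinder *pick_finder(const StmtFinder *custom) const {
    return custom ? custom : default_finder;
  }

  // Models the rejection rule: a diagnostic needs a statement or a finder.
  bool can_warn(const StmtFinder *custom = nullptr) const {
    return have_stmt || pick_finder(custom) != nullptr;
  }
};
```

In this model, a context with neither a statement nor a stored finder
rejects a diagnostic unless the caller supplies a finder.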

Here's an updated look at the current WIP diagnostic we issue:

rc3.c:25:10: warning: Expected to have reference count: ‘1’ but ob_refcnt field is: ‘2’
   25 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   16 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
    |......
    |   25 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

The reference count checking logic for v1 should be almost complete.
Currently, it supports situations where the returned object is newly
created. It doesn't yet support the other case (i.e., incremented by 1 from
what it was previously, if not newly created).

This week, I've focused primarily on the reference count checking logic. I
plan to shift my focus to refining the diagnostic next. As seen in the
placeholder diagnostic message above, I believe we should at least inform
the user about the variable name associated with the region that has an
unexpected reference count issue (in this case, 'item'). Initially, I
suspect the issue might be that:

  tree reg_tree = model->get_representative_tree (curr_region);

returns NULL since curr_region is heap allocated and thus the path_var
returned would be:

  path_var (NULL_TREE, 0);

Which means that:

  refcnt_stmt_finder finder (*eg, reg_tree);

always receives a NULL_TREE, causing it to always default to the not_found
case. A workaround might be necessary, but I haven't delved too deeply into
this yet, so my suspicion could be off. Additionally, I think it would be
helpful to show users what the ob_refcnt looks like in each event as well.
I'll keep you updated on both these points and welcome any feedback.
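
For readers following along, the check behind the diagnostic above can be
modeled outside GCC. This is a toy sketch under stated assumptions, not the
plugin's code: Obj, count_expected and find_mismatches are hypothetical
names, each object records its ob_refcnt plus the objects it holds pointers
to, and an object's expected count is taken to be the number of pointers to
it that are reachable from the returned value.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical toy model of a heap-allocated Python object.
struct Obj {
  std::string name;
  long ob_refcnt;                 // what the object's own field claims
  std::vector<const Obj *> refs;  // objects this one holds pointers to
};

// Count the references reachable from `ptr`. Every pointer to an object adds
// one to its expected count; an object's contents are walked only once.
void count_expected(const Obj *ptr,
                    std::unordered_map<const Obj *, long> &expected,
                    std::unordered_set<const Obj *> &seen) {
  if (!ptr)
    return;
  ++expected[ptr];
  if (!seen.insert(ptr).second)
    return;  // already walked this object's contents
  for (const Obj *held : ptr->refs)
    count_expected(held, expected, seen);
}

// Return the names of objects whose ob_refcnt differs from the expected count
// when `retval` is the only reference that survives the function.
std::vector<std::string> find_mismatches(const Obj *retval) {
  std::unordered_map<const Obj *, long> expected;
  std::unordered_set<const Obj *> seen;
  count_expected(retval, expected, seen);

  std::vector<std::string> bad;
  for (const auto &entry : expected)
    if (entry.first->ob_refcnt != entry.second)
      bad.push_back(entry.first->name);
  return bad;
}
```

Modeling rc3.c as a list with ob_refcnt 1 that holds an item with ob_refcnt
2, and returning the list, gives an expected count of 1 for both objects, so
only the item is flagged. That agrees with the warning text, which expects 1
where the field holds 2.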

Best,
Eric

---
 gcc/analyzer/engine.cc                        |   8 +-
 gcc/analyzer/exploded-graph.h                 |   4 +-
 gcc/analyzer/region-model.cc                  |   3 +
 gcc/analyzer/region-model.h                   |  38 +-
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-
 5 files changed, 421 insertions(+), 8 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 61685f43fba..f9e239128a0 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -115,10 +115,12 @@ impl_region_model_context (program_state *state,
 }

 bool
-impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
+impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d,
+                                 const stmt_finder *custom_finder)
 {
   LOG_FUNC (get_logger ());
-  if (m_stmt == NULL && m_stmt_finder == NULL)
+  auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder;
+  if (m_stmt == NULL && curr_stmt_finder == NULL)
     {
       if (get_logger ())
         get_logger ()->log ("rejecting diagnostic: no stmt");
@@ -129,7 +131,7 @@ impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
   bool terminate_path = d->terminate_path_p ();
   if (m_eg->get_diagnostic_manager ().add_diagnostic
       (m_enode_for_diag, m_enode_for_diag->get_supernode (),
-       m_stmt, m_stmt_finder, std::move (d)))
+       m_stmt, curr_stmt_finder, std::move (d)))
     {
       if (m_path_ctxt
           && terminate_path
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 4a4ef9d12b4..633f8c263fc 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context
                              uncertainty_t *uncertainty,
                              logger *logger = NULL);

-  bool warn (std::unique_ptr<pending_diagnostic> d) final override;
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+             const stmt_finder *custom_finder = NULL) final override;
   void add_note (std::unique_ptr<pending_note> pn) final override;
   void on_svalue_leak (const svalue *) override;
   void on_liveness_change (const svalue_set &live_svalues,
@@ -106,6 +107,7 @@ class impl_region_model_context : public region_model_context
                               std::unique_ptr<sm_context> *out_sm_context)
     override;
   const gimple *get_stmt () const override { return m_stmt; }
+  const exploded_graph *get_eg () const override { return m_eg; }

   exploded_graph *m_eg;
   log_user m_logger;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 494a9cdf149..18cea279e53 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see

 namespace ana {

+auto_vec<pop_frame_callback> region_model::pop_frame_callbacks;
+
 /* Dump T to PP in language-independent form, for debugging/logging/dumping
    purposes.  */

@@ -4813,6 +4815,7 @@ region_model::pop_frame (tree result_lvalue,
     }

   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
+  notify_on_pop_frame (this, retval, ctxt);
 }

 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 4f09f2e585a..fd99b987a69 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -236,6 +236,10 @@ public:

 struct append_regions_cb_data;

+typedef void (*pop_frame_callback) (const region_model *model,
+                                    const svalue *retval,
+                                    region_model_context *ctxt);
+
 /* A region_model encapsulates a representation of the state of memory, with
    a tree of regions, along with their associated values.
    The representation is graph-like because values can be pointers to
@@ -505,6 +509,20 @@ class region_model
   void check_for_null_terminated_string_arg (const call_details &cd,
                                              unsigned idx);

+  static void
+  register_pop_frame_callback (const pop_frame_callback &callback)
+  {
+    pop_frame_callbacks.safe_push (callback);
+  }
+
+  static void
+  notify_on_pop_frame (const region_model *model, const svalue *retval,
+                       region_model_context *ctxt)
+  {
+    for (auto &callback : pop_frame_callbacks)
+      callback (model, retval, ctxt);
+  }
+
 private:
   const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
   const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
@@ -592,6 +610,7 @@ private:
                                 tree callee_fndecl,
                                 region_model_context *ctxt) const;

+  static auto_vec<pop_frame_callback> pop_frame_callbacks;
   /* Storing this here to avoid passing it around everywhere.  */
   region_model_manager *const m_mgr;

@@ -620,8 +639,15 @@ class region_model_context
 {
  public:
   /* Hook for clients to store pending diagnostics.
-     Return true if the diagnostic was stored, or false if it was deleted.  */
-  virtual bool warn (std::unique_ptr<pending_diagnostic> d) = 0;
+     Return true if the diagnostic was stored, or false if it was deleted.
+     Optionally provide a custom stmt_finder.  */
+  virtual bool warn(std::unique_ptr<pending_diagnostic> d) {
+    return warn(std::move(d), nullptr);
+  }
+
+  virtual bool warn(std::unique_ptr<pending_diagnostic> d, const stmt_finder *custom_finder) {
+    return false;
+  }

   /* Hook for clients to add a note to the last previously stored
      pending diagnostic.  */
@@ -724,6 +750,8 @@ class region_model_context

   /* Get the current statement, if any.  */
   virtual const gimple *get_stmt () const = 0;
+
+  virtual const exploded_graph *get_eg () const = 0;
 };

 /* A "do nothing" subclass of region_model_context.  */
@@ -778,6 +806,7 @@ public:
   }

   const gimple *get_stmt () const override { return NULL; }
+  const exploded_graph *get_eg () const override { return NULL; }
 };

 /* A subclass of region_model_context for determining if operations fail
@@ -912,6 +941,11 @@ class region_model_context_decorator : public region_model_context
     return m_inner->get_stmt ();
   }

+  const exploded_graph *get_eg () const override
+  {
+    return m_inner->get_eg ();
+  }
+
 protected:
   region_model_context_decorator (region_model_context *inner)
   : m_inner (inner)
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7cd72e8a886..a3274ced4a8 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -44,6 +44,7 @@
 #include "analyzer/region-model.h"
 #include "analyzer/call-details.h"
 #include "analyzer/call-info.h"
+#include "analyzer/exploded-graph.h"
 #include "make-unique.h"

 int plugin_is_GPL_compatible;
@@ -191,6 +192,372 @@ public:
   }
 };

+/* This is just a copy of leak_stmt_finder for now (subject to change if
+ * necessary)  */
+
+class refcnt_stmt_finder : public stmt_finder
+{
+public:
+  refcnt_stmt_finder (const exploded_graph &eg, tree var)
+      : m_eg (eg), m_var (var)
+  {
+  }
+
+  std::unique_ptr<stmt_finder>
+  clone () const final override
+  {
+    return make_unique<refcnt_stmt_finder> (m_eg, m_var);
+  }
+
+  const gimple *
+  find_stmt (const exploded_path &epath) final override
+  {
+    logger *const logger = m_eg.get_logger ();
+    LOG_FUNC (logger);
+
+    if (m_var && TREE_CODE (m_var) == SSA_NAME)
+      {
+        /* Locate the final write to this SSA name in the path.  */
+        const gimple *def_stmt = SSA_NAME_DEF_STMT (m_var);
+
+        int idx_of_def_stmt;
+        bool found = epath.find_stmt_backwards (def_stmt, &idx_of_def_stmt);
+        if (!found)
+          goto not_found;
+
+        /* What was the next write to the underlying var
+           after the SSA name was set? (if any).  */
+
+        for (unsigned idx = idx_of_def_stmt + 1; idx < epath.m_edges.length ();
+             ++idx)
+          {
+            const exploded_edge *eedge = epath.m_edges[idx];
+            if (logger)
+              logger->log ("eedge[%i]: EN %i -> EN %i", idx,
+                           eedge->m_src->m_index,
+                           eedge->m_dest->m_index);
+            const exploded_node *dst_node = eedge->m_dest;
+            const program_point &dst_point = dst_node->get_point ();
+            const gimple *stmt = dst_point.get_stmt ();
+            if (!stmt)
+              continue;
+            if (const gassign *assign = dyn_cast <const gassign *> (stmt))
+              {
+                tree lhs = gimple_assign_lhs (assign);
+                if (TREE_CODE (lhs) == SSA_NAME
+                    && SSA_NAME_VAR (lhs) == SSA_NAME_VAR (m_var))
+                  return assign;
+              }
+          }
+      }
+
+  not_found:
+
+    /* Look backwards for the first statement with a location.  */
+    int i;
+    const exploded_edge *eedge;
+    FOR_EACH_VEC_ELT_REVERSE (epath.m_edges, i, eedge)
+    {
+      if (logger)
+        logger->log ("eedge[%i]: EN %i -> EN %i", i, eedge->m_src->m_index,
+                     eedge->m_dest->m_index);
+      const exploded_node *dst_node = eedge->m_dest;
+      const program_point &dst_point = dst_node->get_point ();
+      const gimple *stmt = dst_point.get_stmt ();
+      if (stmt)
+        if (get_pure_location (stmt->location) != UNKNOWN_LOCATION)
+          return stmt;
+    }
+
+    gcc_unreachable ();
+    return NULL;
+  }
+
+private:
+  const exploded_graph &m_eg;
+  tree m_var;
+};
+
+class refcnt_mismatch : public pending_diagnostic_subclass<refcnt_mismatch>
+{
+public:
+  refcnt_mismatch (const region *base_region,
+                   const svalue *ob_refcnt,
+                   const svalue *actual_refcnt,
+                   tree reg_tree)
+      : m_base_region (base_region), m_ob_refcnt (ob_refcnt),
+        m_actual_refcnt (actual_refcnt), m_reg_tree(reg_tree)
+  {
+  }
+
+  const char *
+  get_kind () const final override
+  {
+    return "refcnt_mismatch";
+  }
+
+  bool
+  operator== (const refcnt_mismatch &other) const
+  {
+    return (m_base_region == other.m_base_region
+            && m_ob_refcnt == other.m_ob_refcnt
+            && m_actual_refcnt == other.m_actual_refcnt);
+  }
+
+  int get_controlling_option () const final override
+  {
+    return 0;
+  }
+
+  bool
+  emit (rich_location *rich_loc, logger *) final override
+  {
+    diagnostic_metadata m;
+    bool warned;
+    // just assuming constants for now
+    auto actual_refcnt
+        = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
+    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
+    warned = warning_meta (
+        rich_loc, m, get_controlling_option (),
+        "Expected to have "
+        "reference count: %qE but ob_refcnt field is: %qE",
+        actual_refcnt, ob_refcnt);
+
+    // location_t loc = rich_loc->get_loc ();
+    // foo (loc);
+    return warned;
+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    if (m_base_region)
+      interest->add_region_creation (m_base_region);
+  }
+
+private:
+
+  void foo(location_t loc) const
+  {
+    inform(loc, "something is up right here");
+  }
+  const region *m_base_region;
+  const svalue *m_ob_refcnt;
+  const svalue *m_actual_refcnt;
+  tree m_reg_tree;
+};
+
+/* Retrieves the svalue associated with the ob_refcnt field of the base region.
+ */
+const svalue *
+retrieve_ob_refcnt_sval (const region *base_reg, const region_model *model,
+                         region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+  const region *ob_refcnt_region
+      = mgr->get_field_region (base_reg, ob_refcnt_tree);
+  const svalue *ob_refcnt_sval
+      = model->get_store_value (ob_refcnt_region, ctxt);
+  return ob_refcnt_sval;
+}
+
+void
+increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
+{
+  bool existed;
+  auto &refcnt = map.get_or_insert (key, &existed);
+  refcnt = existed ? refcnt + 1 : 1;
+}
+
+
+/* Recursively fills in region_to_refcnt with the references owned by
+   pyobj_ptr_sval.  */
+void
+count_expected_pyobj_references (const region_model *model,
+                                 hash_map<const region *, int> &region_to_refcnt,
+                                 const svalue *pyobj_ptr_sval,
+                                 hash_set<const region *> &seen)
+{
+  if (!pyobj_ptr_sval)
+    return;
+
+  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
+  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
+  if (!pyobj_region_sval && !pyobj_initial_sval)
+    return;
+
+  // todo: support initial sval (e.g passed in as parameter)
+  if (pyobj_initial_sval)
+    {
+      // increment_region_refcnt (region_to_refcnt,
+      //                          pyobj_initial_sval->get_region ());
+      return;
+    }
+
+  const region *pyobj_region = pyobj_region_sval->get_pointee ();
+  if (!pyobj_region || seen.contains (pyobj_region))
+    return;
+
+  seen.add (pyobj_region);
+
+  if (pyobj_ptr_sval->get_type () == pyobj_ptr_tree)
+    increment_region_refcnt (region_to_refcnt, pyobj_region);
+
+  const auto *curr_store = model->get_store ();
+  const auto *retval_cluster = curr_store->get_cluster (pyobj_region);
+  if (!retval_cluster)
+    return;
+
+  const auto &retval_binding_map = retval_cluster->get_map ();
+
+  for (const auto &binding : retval_binding_map)
+    {
+      const svalue *binding_sval = binding.second;
+      const svalue *unwrapped_sval = binding_sval->unwrap_any_unmergeable ();
+      const region *pointee = unwrapped_sval->maybe_get_region ();
+
+      if (pointee && pointee->get_kind () == RK_HEAP_ALLOCATED)
+        count_expected_pyobj_references (model, region_to_refcnt, binding_sval,
+                                         seen);
+    }
+}
+
+/* Compare ob_refcnt field vs the actual reference count of a region */
+void
+check_refcnt (const region_model *model, region_model_context *ctxt,
+              const hash_map<const region *, int>::iterator::reference_pair region_refcnt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  const auto &curr_region = region_refcnt.first;
+  const auto &actual_refcnt = region_refcnt.second;
+  const svalue *ob_refcnt_sval = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
+  const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+      ob_refcnt_sval->get_type (), actual_refcnt);
+
+  if (ob_refcnt_sval != actual_refcnt_sval)
+    {
+      // todo: fix this (always null)
+      tree reg_tree = model->get_representative_tree (curr_region);
+
+      const auto &eg = ctxt->get_eg ();
+      refcnt_stmt_finder finder (*eg, reg_tree);
+      auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
+                                              actual_refcnt_sval, reg_tree);
+      if (pd && eg)
+        ctxt->warn (std::move (pd), &finder);
+    }
+}
+
+void
+check_refcnts (const region_model *model, const svalue *retval,
+               region_model_context *ctxt,
+               hash_map<const region *, int> &region_to_refcnt)
+{
+  for (const auto &region_refcnt : region_to_refcnt)
+    {
+      check_refcnt(model, ctxt, region_refcnt);
+    }
+}
+
+/* Validates the reference count of all Python objects.  */
+void
+pyobj_refcnt_checker (const region_model *model, const svalue *retval,
+                      region_model_context *ctxt)
+{
+  if (!ctxt)
+    return;
+
+  auto region_to_refcnt = hash_map<const region *, int> ();
+  auto seen_regions = hash_set<const region *> ();
+
+  count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
+  check_refcnts (model, retval, ctxt, region_to_refcnt);
+}
+
+/* Counts the actual pyobject references from all clusters in the model's
+ * store.  */
+void
+count_all_references (const region_model *model,
+                      hash_map<const region *, int> &region_to_refcnt)
+{
+  for (const auto &cluster : *model->get_store ())
+    {
+      auto curr_region = cluster.first;
+      if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
+        continue;
+
+      increment_region_refcnt (region_to_refcnt, curr_region);
+
+      auto binding_cluster = cluster.second;
+      for (const auto &binding : binding_cluster->get_map ())
+        {
+          const svalue *binding_sval = binding.second;
+
+          const svalue *unwrapped_sval
+              = binding_sval->unwrap_any_unmergeable ();
+          // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
+          // continue;
+
+          const region *pointee = unwrapped_sval->maybe_get_region ();
+          if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
+            continue;
+
+          increment_region_refcnt (region_to_refcnt, pointee);
+        }
+    }
+}
+
+void
+dump_refcnt_info (const hash_map<const region *, int> &region_to_refcnt,
+                  const region_model *model, region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  pretty_printer pp;
+  pp_format_decoder (&pp) = default_tree_printer;
+  pp_show_color (&pp) = pp_show_color (global_dc->printer);
+  pp.buffer->stream = stderr;
+
+  for (const auto &region_refcnt : region_to_refcnt)
+    {
+      auto region = region_refcnt.first;
+      auto actual_refcnt = region_refcnt.second;
+      const svalue *ob_refcnt_sval
+          = retrieve_ob_refcnt_sval (region, model, ctxt);
+      const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+          ob_refcnt_sval->get_type (), actual_refcnt);
+
+      region->dump_to_pp (&pp, true);
+      pp_string (&pp, " — ob_refcnt: ");
+      ob_refcnt_sval->dump_to_pp (&pp, true);
+      pp_string (&pp, " actual refcnt: ");
+      actual_refcnt_sval->dump_to_pp (&pp, true);
+      pp_newline (&pp);
+    }
+  pp_string (&pp, "~~~~~~~~\n");
+  pp_flush (&pp);
+}
+
+class kf_analyzer_cpython_dump_refcounts : public known_function
+{
+public:
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+    return cd.num_args () == 0;
+  }
+  void impl_call_pre (const call_details &cd) const final override
+  {
+    region_model_context *ctxt = cd.get_ctxt ();
+    if (!ctxt)
+      return;
+    region_model *model = cd.get_model ();
+    auto region_to_refcnt = hash_map<const region *, int> ();
+    count_all_references(model, region_to_refcnt);
+    dump_refcnt_info(region_to_refcnt, model, ctxt);
+  }
+};
+
 /* Some concessions were made to
 simplify the analysis process when comparing kf_PyList_Append with the
  real implementation. In particular, PyList_Append performs some
@@ -927,6 +1294,10 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
   iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
   iface->register_known_function ("PyLong_FromLong",
                                   make_unique<kf_PyLong_FromLong> ());
+
+  iface->register_known_function (
+      "__analyzer_cpython_dump_refcounts",
+      make_unique<kf_analyzer_cpython_dump_refcounts> ());
 }
 } // namespace ana

@@ -940,8 +1311,9 @@ plugin_init (struct plugin_name_args *plugin_info,
   const char *plugin_name = plugin_info->base_name;
   if (0)
     inform (input_location, "got here; %qs", plugin_name);
-  ana::register_finish_translation_unit_callback (&stash_named_types);
-  ana::register_finish_translation_unit_callback (&stash_global_vars);
+  register_finish_translation_unit_callback (&stash_named_types);
+  register_finish_translation_unit_callback (&stash_global_vars);
+  region_model::register_pop_frame_callback(pyobj_refcnt_checker);
   register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT,
                      ana::cpython_analyzer_init_cb,
                      NULL); /* void *user_data */
-- 
2.30.2