From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-00364e01.pphosted.com (mx0a-00364e01.pphosted.com [148.163.135.74]) by sourceware.org (Postfix) with ESMTPS id 37E613857C44 for ; Thu, 17 Aug 2023 01:47:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 37E613857C44 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=columbia.edu Received: from pps.filterd (m0167068.ppops.net [127.0.0.1]) by mx0a-00364e01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 37H1ZhsG032428 for ; Wed, 16 Aug 2023 21:47:53 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h=mime-version : references : in-reply-to : from : date : message-id : subject : to : cc : content-type : content-transfer-encoding; s=pps01; bh=DwR+R4pYI9Z1pmM/uUmcfzbUW72plZoBdCo3IelnDNc=; b=fGgFSRWOTD8VYfKe5yAVG70HwaWn1TkS35MYhcNJJDrDRtr/i3Sf8oywiDzWeXU3A3zN ZVhp/1SEk2YaXF5YfNhytVU5W5uLNG6wCV/QS5k6KfY2TDhE6KYE9mlvY1Bl5Dh3OncQ VOWcWGHeACVsAncR5QHFShTzaQHmfENSeD+1YQjH1HlNbObfcSOJz2LgcAjz3z6QJZlN L4sjR3xY36/TuJIOwKzy1bcXK6ZcS66T/kui4ZebS7walCGSn9h5f2nNYjvIk2xMaeuY CmY2etI9a7PNkRWdVGwPJb4f8FEQsenao+O2T/l19by5bqDeTifOqEe85jq6Bm8HCC3v NA== Received: from mail-ua1-f69.google.com (mail-ua1-f69.google.com [209.85.222.69]) by mx0a-00364e01.pphosted.com (PPS) with ESMTPS id 3sgy0b570b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Wed, 16 Aug 2023 21:47:53 -0400 Received: by mail-ua1-f69.google.com with SMTP id a1e0cc1a2514c-79b02ab5194so1836003241.1 for ; Wed, 16 Aug 2023 18:47:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692236872; x=1692841672; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=DwR+R4pYI9Z1pmM/uUmcfzbUW72plZoBdCo3IelnDNc=; b=gPM0tMX3a/q2MZMq/FM9TnYTq6vHOV+CdOqNGOR3F19FAyjfLSaTIwGUScF2TsvEcp OBuCvzF3nTO2lXb1IXxlMaOZteBSwCL+LtxibAFAoqlwZrgm0WrpkesP8nLtlYD5C8zx p9eeDjNwHlZKwCtcifSMq4Hfhhx1/cC+lYQWNqkSvre8xnmiSzNCPof4JzBsCtpCXuG8 MmulP8xIFVQe5M2RHlRXBV3T65JQamghr6CMp1QLCjhuH4F+vZ7DlrWZpstaHyWU1FWS UspmeBf7eICTlRG8LJgpOF+patSeALwfKI/H5lP3tsSNUvTQlNGRp5z6C72cOgRaRamE S0sA== X-Gm-Message-State: AOJu0YwZ07ZkQ/k0DlHUlCqCl9xL13/hDW+3mF0yQqiUrkovOxs+2LE0 I2umGSOJm7WKBEgkAXmigId8BHVXYhy3MH5e87/hbpSQ/eWH4by8x72ieHtpKZ6ZPihEK+l+l04 +JX0Yn3ZiMru2AYdIcO2KIJAItyBc X-Received: by 2002:a1f:e1c2:0:b0:487:2bd4:f93e with SMTP id y185-20020a1fe1c2000000b004872bd4f93emr2767324vkg.10.1692236872212; Wed, 16 Aug 2023 18:47:52 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFsZbBRgwKnqKtD8gUXhWOfXrhFi4E51G42xpF6gwrMqGG5jeXy6PhqSOgLKfWoBJiCLiVQrvSjSWRu7sVbTcE= X-Received: by 2002:a1f:e1c2:0:b0:487:2bd4:f93e with SMTP id y185-20020a1fe1c2000000b004872bd4f93emr2767315vkg.10.1692236871789; Wed, 16 Aug 2023 18:47:51 -0700 (PDT) MIME-Version: 1.0 References: <20230816191711.39268-1-ef2648@columbia.edu> <755975d612a449027a1f605d950161ef9b62a72f.camel@redhat.com> In-Reply-To: <755975d612a449027a1f605d950161ef9b62a72f.camel@redhat.com> From: Eric Feng Date: Wed, 16 Aug 2023 21:47:41 -0400 Message-ID: Subject: Re: Update on CPython Extension Module -fanalyzer plugin development To: David Malcolm Cc: gcc@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: x_3qWCWM5yM_awRzYAsozyfXVxOnsuzQ X-Proofpoint-ORIG-GUID: x_3qWCWM5yM_awRzYAsozyfXVxOnsuzQ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.957,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-08-16_21,2023-08-15_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 lowpriorityscore=10 phishscore=0 spamscore=0 suspectscore=0 mlxscore=0 adultscore=0 impostorscore=10 priorityscore=1501 mlxlogscore=999 bulkscore=10 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2306200000 definitions=main-2308170013 X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Dave, Thanks for the feedback! On Wed, Aug 16, 2023 at 5:29=E2=80=AFPM David Malcolm = wrote: > > On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote: > > Hi everyone, > > [fixing typo in my email address] > > Hi Eric, thanks for the update, and the WIP patch. > > > > > After pushing the code that supports various known function classes las= t week, > > I've turned my attention back to the core reference count checking > > functionality. This functionality used to reside in region_model, which > > wasn't ideal. To address this, I've introduced a hook to register callb= acks > > to pop_frame. Specifically, this allows the code that checks the refere= nce > > count and emits diagnostics to be housed within the plugin, rather than= the > > core analyzer. > > > > As of now, the parameters of pop_frame_callback are tailored specifical= ly to > > our needs. If the use of callbacks at the end of pop_frame becomes more > > prevalent, we can revisit the setup to potentially make it more general= . > > > > Moreover, the core reference count checking logic was previously somewh= at > > bloated, contained in one extensive function. I've since refactored it, > > breaking it down into several helper functions to simplify and reduce > > complexity. There are still some aspects that need refinement, especial= ly > > since the plugin has seen changes since I last worked on this logic. Ho= wever, > > I believe that there aren't any significant problems. > > Suggestion: introduce some more decls into analyzer-decls.h and > known_functions for them into the plugin so that you can run/test/debug > the helper functions independently (similar to the existing ones in kf- > analyzer.cc). > > e.g. > extern void __analyzer_cpython_dump_real_refcounts (void); > extern void __analyzer_cpython_dump_ob_refcnt (void); > > > Thanks for the suggestion. This will be even more helpful now that we have split the logic into helper functions. I will look into these when I come back to the "is there a refcount problem" side of the equation. > > Currently, I've started working a custom stmt_finder similar to leak_st= mt_finder > > to address the issue of m_stmt and m_stmt_finder being NULL at the time= of > > region_model::pop_frame. This approach was discussed as a viable soluti= on in > > a previous email, and I'll keep everyone posted on my progress. Afterwa= rds, I > > will go back to address the refinements necessary mentioned above. > > You might want to experiment with splitting out > (a) "is there a refcount problem" from > (b) "emit a refcount problem". > > For example, you could hardcode (a) to true, so we always complain with > (b) on every heap-allocated object, just to debug the stmt_finder > workaround. > > > [...snip...] > > BTW, you don't need to bother to write ChangeLog entries if you're just > sending a work-in-progress for me. > > > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gc= c/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > index 7cd72e8a886..918bb5a5587 100644 > > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > [...] > > > +/* For PyListObjects: processes the ob_item field within the current r= egion and > > + * increments the reference count if conditions are met. */ > > +void > > +process_ob_item_region (const region_model *model, region_model_manage= r *mgr, > > + region_model_context *ctxt, const region *curr_= region, > > + const svalue *pylist_type_ptr, const region *ba= se_reg, > > + int &actual_refcnt) > > You seem to be special-casing PyListObject here; why? That seems like > it's not going to be scalable tothe general case. > > Am I right in thinking the intent of this code is to count the actual > number of pointers in memory that point to a particular region? > > Doesn't the ob_item buffer show up in the store as another cluster? > Can't you just look at the bindings in the clusters and tally up the > pointers for each heap_allocated_region? (accumulating a result map > from region to int of the actual reference counts). Yes, you're correct, I hadn't thought of that ... This simplifies a lot of things. Thanks for the great suggestion! > > Or am I missing something? > > What does > model->debug (); > show in your examples? > > > > +{ > > + tree ob_item_field_tree =3D get_field_by_name (pylistobj_record, "ob= _item"); > > + const region *ob_item_field_reg > > + =3D mgr->get_field_region (curr_region, ob_item_field_tree); > > + const svalue *ob_item_ptr =3D model->get_store_value (ob_item_field_= reg, ctxt); > > + > > + if (const auto &cast_ob_item_reg =3D ob_item_ptr->dyn_cast_region_sv= alue ()) > > + { > > + const region *ob_item_reg =3D cast_ob_item_reg->get_pointee (); > > + const svalue *allocated_bytes =3D model->get_dynamic_extents (ob= _item_reg); > > + const region *ob_item_sized =3D mgr->get_sized_region ( > > + ob_item_reg, pyobj_ptr_ptr, allocated_bytes); > > + const svalue *buffer_contents_sval > > + =3D model->get_store_value (ob_item_sized, ctxt); > > + > > + if (const auto &buffer_contents > > + =3D buffer_contents_sval->dyn_cast_compound_svalue ()) > > + { > > + for (const auto &buffer_content : buffer_contents->get_map ()= ) > > + { > > + const auto &content_value =3D buffer_content.second= ; > > + if (const auto &content_region > > + =3D content_value->dyn_cast_region_svalue ()) > > + if (content_region->get_pointee () =3D=3D b= ase_reg) > > + actual_refcnt++; > > + } > > + } > > + } > > +} > > + > > +/* Counts the actual references from all clusters in the model's store= . */ > > +int > > +count_actual_references (const region_model *model, region_model_manag= er *mgr, > > + region_model_context *ctxt, const region *base= _reg, > > + const svalue *pylist_type_ptr, tree ob_type_fi= eld) > > +{ > > + int actual_refcnt =3D 0; > > + for (const auto &other_cluster : *model->get_store ()) > > + { > > + for (const auto &binding : other_cluster.second->get_map ()) > > + { > > + const auto &sval =3D binding.second; > > + const auto &curr_region =3D sval->maybe_get_region (); > > + > > + if (!curr_region || curr_region->get_kind () !=3D RK_HEAP_ALL= OCATED) > > + continue; > > + > > + increment_count_if_base_matches (curr_region, base_reg, > > + actual_refcnt); > > + > > + const region *ob_type_region > > + =3D mgr->get_field_region (curr_region, ob_type_field); > > + const svalue *stored_sval > > + =3D model->get_store_value (ob_type_region, ctxt); > > + const auto &remove_cast =3D stored_sval->dyn_cast_unaryop_sva= lue (); > > + > > + if (!remove_cast) > > + continue; > > + > > + const svalue *type =3D remove_cast->get_arg (); > > + if (type =3D=3D pylist_type_ptr) > > + process_ob_item_region (model, mgr, ctxt, curr_region, > > + pylist_type_ptr, base_reg, actual_r= efcnt); > > + } > > + } > > + return actual_refcnt; > > +} > > > > Hope the above is constructive. > Dave > Best, Eric