From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) by sourceware.org (Postfix) with ESMTPS id 5443A3858416 for ; Mon, 21 Aug 2023 14:06:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5443A3858416 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=columbia.edu Received: from pps.filterd (m0167073.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 37LE5Non003359 for ; Mon, 21 Aug 2023 10:06:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h=mime-version : references : in-reply-to : from : date : message-id : subject : to : cc : content-type : content-transfer-encoding; s=pps01; bh=yB/wZlSPNe5qUUooMTsJi8hyGc0n97vBBwBsMfVktL4=; b=pJW1pwMJckvII6FfZBrIky1/C3puFiFITQowvOm+tLn/rr84QRT/7SY+uW5/U9hjodzL bwKy999fB1srBgsLukCIBCL7hWXRVReawfHegFRRfW4SZXMtIkGFUjh0eU0zJO+fQOvM Li+fPKj+ykiwCRgAQx8+B+tlPiuR6DmuibXarKpaBkrpwWQSIXF+5KvnlvBuCiZfoeHo DuDpHqtIksOoEO2FKL+CTeDFW0alZOfebXu/owbOMSegeSUS4KFoxNUpAFmMPw6Dl237 EsD/sMbotln/X41ZrcTxbDgz6HTvY+xCfHrEL/a9qeLc6IlcdieepX3qYa9+cfSy1ODu wA== Received: from mail-oo1-f69.google.com (mail-oo1-f69.google.com [209.85.161.69]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 3sjsmet2fr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Mon, 21 Aug 2023 10:06:09 -0400 Received: by mail-oo1-f69.google.com with SMTP id 006d021491bc7-570bac0d9ddso2222586eaf.0 for ; Mon, 21 Aug 2023 07:06:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692626768; x=1693231568; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=yB/wZlSPNe5qUUooMTsJi8hyGc0n97vBBwBsMfVktL4=; b=DAaHFJ4Ae0r7KD/m7CG0PGFDyYn56xMUfG5m/w67b+RBkcriddWPRz2+tHgNnvy54u 9xLs3tUYu7Y4cvXk/hxUiMpMJpv82SHSwRP3GpcwdCA8dfWNcXAHjakDsl8uvFqKcZC+ gWRMF6BBtYuP5ZbiEgWUdxivKrwRB4F+DFay/GSSK/d/ISkxKq7nP2Ma3Uwm9JVmoEeF PVj7DJuQme9LpODZ7PboN8sX3bf1xAgTuj7qkj3MSLgiW5lKtK2kxJIFrLH/AlOJm7St wUkhI+gAd53xZEjGnwCOyFEGkcA2WYvu1ziNcWcH0axuWi06MPA3juUA7dmQFbeHYbHl ds1w== X-Gm-Message-State: AOJu0YyHhkzH4hhbLExp/RagrteG0uO1Opu80S8ShAvOeqsQOg8gzVj6 mrvVKlkIacxNeF6+gr2UPnJr6bpf+/rBH1fQNkqEDMA95+p8V0A88cMB77GWONbGLRLxbvgmPe7 QXZ9HqhAXhTUW4VCgMGb0 X-Received: by 2002:a05:6358:248b:b0:132:d42f:8e19 with SMTP id m11-20020a056358248b00b00132d42f8e19mr5833026rwc.31.1692626768554; Mon, 21 Aug 2023 07:06:08 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHd96JamilJhFZaArY3SO+7agW3rDynIUFZbBEO6kNPjnLKEAM6q0GOr3UuKFYbflE7iEyp3UgucxxLamfoXSY= X-Received: by 2002:a05:6358:248b:b0:132:d42f:8e19 with SMTP id m11-20020a056358248b00b00132d42f8e19mr5832984rwc.31.1692626768004; Mon, 21 Aug 2023 07:06:08 -0700 (PDT) MIME-Version: 1.0 References: <20230816191711.39268-1-ef2648@columbia.edu> <755975d612a449027a1f605d950161ef9b62a72f.camel@redhat.com> In-Reply-To: From: Eric Feng Date: Mon, 21 Aug 2023 10:05:56 -0400 Message-ID: Subject: Re: Update on CPython Extension Module -fanalyzer plugin development To: David Malcolm Cc: gcc@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Proofpoint-GUID: -Efj2aVJ0pQ5GWurEyHWN3B81U65npPa X-Proofpoint-ORIG-GUID: -Efj2aVJ0pQ5GWurEyHWN3B81U65npPa X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.957,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-08-21_01,2023-08-18_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 mlxscore=0 clxscore=1015 phishscore=0 suspectscore=0 adultscore=0 mlxlogscore=999 impostorscore=10 bulkscore=10 spamscore=0 malwarescore=0 lowpriorityscore=10 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2306200000 definitions=main-2308210131 X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Dave, Just wanted to give you and everyone else a short update on how reference count checking is going =E2=80=94 we can now observe the refcnt diagnostic being emitted: rc3.c:22:10: warning: REF COUNT PROBLEM 22 | return list; | ^~~~ =E2=80=98create_py_object=E2=80=99: events 1-4 | | 4 | PyObject* item =3D PyLong_FromLong(3); | | ^~~~~~~~~~~~~~~~~~ | | | | | (1) when =E2=80=98PyLong_FromLong=E2=80=99 = succeeds | 5 | PyObject* list =3D PyList_New(1); | | ~~~~~~~~~~~~~ | | | | | (2) when =E2=80=98PyList_New=E2=80=99 succe= eds |...... | 14 | PyList_Append(list, item); | | ~~~~~~~~~~~~~~~~~~~~~~~~~ | | | | | (3) when =E2=80=98PyList_Append=E2=80=99 fails |...... | 22 | return list; | | ~~~~ | | | | | (4) here | I will fix up and refactor the logic for counting the actual ref count before coming back and refining the diagnostic to give much more detailed information. Best, Eric On Wed, Aug 16, 2023 at 9:47=E2=80=AFPM Eric Feng wro= te: > > Hi Dave, > > Thanks for the feedback! > > > On Wed, Aug 16, 2023 at 5:29=E2=80=AFPM David Malcolm wrote: > > > > On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote: > > > Hi everyone, > > > > [fixing typo in my email address] > > > > Hi Eric, thanks for the update, and the WIP patch. > > > > > > > > After pushing the code that supports various known function classes l= ast week, > > > I've turned my attention back to the core reference count checking > > > functionality. This functionality used to reside in region_model, whi= ch > > > wasn't ideal. To address this, I've introduced a hook to register cal= lbacks > > > to pop_frame. Specifically, this allows the code that checks the refe= rence > > > count and emits diagnostics to be housed within the plugin, rather th= an the > > > core analyzer. > > > > > > As of now, the parameters of pop_frame_callback are tailored specific= ally to > > > our needs. If the use of callbacks at the end of pop_frame becomes mo= re > > > prevalent, we can revisit the setup to potentially make it more gener= al. > > > > > > Moreover, the core reference count checking logic was previously some= what > > > bloated, contained in one extensive function. I've since refactored i= t, > > > breaking it down into several helper functions to simplify and reduce > > > complexity. There are still some aspects that need refinement, especi= ally > > > since the plugin has seen changes since I last worked on this logic. = However, > > > I believe that there aren't any significant problems. > > > > Suggestion: introduce some more decls into analyzer-decls.h and > > known_functions for them into the plugin so that you can run/test/debug > > the helper functions independently (similar to the existing ones in kf- > > analyzer.cc). > > > > e.g. > > extern void __analyzer_cpython_dump_real_refcounts (void); > > extern void __analyzer_cpython_dump_ob_refcnt (void); > > > > > > Thanks for the suggestion. This will be even more helpful now that we > have split the logic into helper functions. I will look into these > when I come back to the "is there a refcount problem" side of the > equation. > > > Currently, I've started working a custom stmt_finder similar to leak_= stmt_finder > > > to address the issue of m_stmt and m_stmt_finder being NULL at the ti= me of > > > region_model::pop_frame. This approach was discussed as a viable solu= tion in > > > a previous email, and I'll keep everyone posted on my progress. After= wards, I > > > will go back to address the refinements necessary mentioned above. > > > > You might want to experiment with splitting out > > (a) "is there a refcount problem" from > > (b) "emit a refcount problem". > > > > For example, you could hardcode (a) to true, so we always complain with > > (b) on every heap-allocated object, just to debug the stmt_finder > > workaround. > > > > > > [...snip...] > > > > BTW, you don't need to bother to write ChangeLog entries if you're just > > sending a work-in-progress for me. > > > > > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/= gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > > index 7cd72e8a886..918bb5a5587 100644 > > > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > > > [...] > > > > > +/* For PyListObjects: processes the ob_item field within the current= region and > > > + * increments the reference count if conditions are met. */ > > > +void > > > +process_ob_item_region (const region_model *model, region_model_mana= ger *mgr, > > > + region_model_context *ctxt, const region *cur= r_region, > > > + const svalue *pylist_type_ptr, const region *= base_reg, > > > + int &actual_refcnt) > > > > You seem to be special-casing PyListObject here; why? That seems like > > it's not going to be scalable tothe general case. > > > > Am I right in thinking the intent of this code is to count the actual > > number of pointers in memory that point to a particular region? > > > > Doesn't the ob_item buffer show up in the store as another cluster? > > Can't you just look at the bindings in the clusters and tally up the > > pointers for each heap_allocated_region? (accumulating a result map > > from region to int of the actual reference counts). > Yes, you're correct, I hadn't thought of that ... This simplifies a > lot of things. Thanks for the great suggestion! > > > > Or am I missing something? > > > > What does > > model->debug (); > > show in your examples? > > > > > > > +{ > > > + tree ob_item_field_tree =3D get_field_by_name (pylistobj_record, "= ob_item"); > > > + const region *ob_item_field_reg > > > + =3D mgr->get_field_region (curr_region, ob_item_field_tree); > > > + const svalue *ob_item_ptr =3D model->get_store_value (ob_item_fiel= d_reg, ctxt); > > > + > > > + if (const auto &cast_ob_item_reg =3D ob_item_ptr->dyn_cast_region_= svalue ()) > > > + { > > > + const region *ob_item_reg =3D cast_ob_item_reg->get_pointee ()= ; > > > + const svalue *allocated_bytes =3D model->get_dynamic_extents (= ob_item_reg); > > > + const region *ob_item_sized =3D mgr->get_sized_region ( > > > + ob_item_reg, pyobj_ptr_ptr, allocated_bytes); > > > + const svalue *buffer_contents_sval > > > + =3D model->get_store_value (ob_item_sized, ctxt); > > > + > > > + if (const auto &buffer_contents > > > + =3D buffer_contents_sval->dyn_cast_compound_svalue ()) > > > + { > > > + for (const auto &buffer_content : buffer_contents->get_map = ()) > > > + { > > > + const auto &content_value =3D buffer_content.seco= nd; > > > + if (const auto &content_region > > > + =3D content_value->dyn_cast_region_svalue ()) > > > + if (content_region->get_pointee () =3D=3D= base_reg) > > > + actual_refcnt++; > > > + } > > > + } > > > + } > > > +} > > > + > > > +/* Counts the actual references from all clusters in the model's sto= re. */ > > > +int > > > +count_actual_references (const region_model *model, region_model_man= ager *mgr, > > > + region_model_context *ctxt, const region *ba= se_reg, > > > + const svalue *pylist_type_ptr, tree ob_type_= field) > > > +{ > > > + int actual_refcnt =3D 0; > > > + for (const auto &other_cluster : *model->get_store ()) > > > + { > > > + for (const auto &binding : other_cluster.second->get_map ()) > > > + { > > > + const auto &sval =3D binding.second; > > > + const auto &curr_region =3D sval->maybe_get_region (); > > > + > > > + if (!curr_region || curr_region->get_kind () !=3D RK_HEAP_A= LLOCATED) > > > + continue; > > > + > > > + increment_count_if_base_matches (curr_region, base_reg, > > > + actual_refcnt); > > > + > > > + const region *ob_type_region > > > + =3D mgr->get_field_region (curr_region, ob_type_field); > > > + const svalue *stored_sval > > > + =3D model->get_store_value (ob_type_region, ctxt); > > > + const auto &remove_cast =3D stored_sval->dyn_cast_unaryop_s= value (); > > > + > > > + if (!remove_cast) > > > + continue; > > > + > > > + const svalue *type =3D remove_cast->get_arg (); > > > + if (type =3D=3D pylist_type_ptr) > > > + process_ob_item_region (model, mgr, ctxt, curr_region, > > > + pylist_type_ptr, base_reg, actual= _refcnt); > > > + } > > > + } > > > + return actual_refcnt; > > > +} > > > > > > > Hope the above is constructive. > > Dave > > > > Best, > Eric