From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-00364e01.pphosted.com (mx0b-00364e01.pphosted.com [148.163.139.74]) by sourceware.org (Postfix) with ESMTPS id C44EE3858D3C for ; Mon, 11 Sep 2023 02:12:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C44EE3858D3C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=columbia.edu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=columbia.edu Received: from pps.filterd (m0167076.ppops.net [127.0.0.1]) by mx0b-00364e01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38B158Mp017474 for ; Sun, 10 Sep 2023 22:12:41 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=columbia.edu; h=mime-version : references : in-reply-to : from : date : message-id : subject : to : cc : content-type; s=pps01; bh=6T1QFHHhV+Glw/pGJOSjVNASpcFSXRTSbXUD1FsODhU=; b=Mmat7Ow5mrqxeptdnB/D8xPbWtLBMo/3j/aFjeXjI6cx4LihGBjUWb+T3CsQFEEWJajn 0YiT2Rl1F9EP6uBRZhbmbWO8lH3+9kmhsH7iMQbDHGROK/ywljDmvwKY/l8uRgWeia9V ddHFK/1jkaRwZo9A7xLNMIGmFoKeyJnzv4ZTIidLQZ9XH1YEniN6WFbL6V3tiYRT8fPx 3xE/DmIAbD6m4+8mW3FBGBaSabYuEWw6v53h2SOrBQj8RTIYvKYAL+DzrAVdKzwYCqr4 y5cIJXMprDKkEGbVeI8al1WwSmNFJsM5HDFrvTcW0ji/8lXrJXaX7PDJ6U0vTzJ1op7H CA== Received: from mail-vk1-f199.google.com (mail-vk1-f199.google.com [209.85.221.199]) by mx0b-00364e01.pphosted.com (PPS) with ESMTPS id 3t16bkm9v2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Sun, 10 Sep 2023 22:12:40 -0400 Received: by mail-vk1-f199.google.com with SMTP id 71dfb90a1353d-490e62cf3f7so1547997e0c.2 for ; Sun, 10 Sep 2023 19:12:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694398360; x=1695003160; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=6T1QFHHhV+Glw/pGJOSjVNASpcFSXRTSbXUD1FsODhU=; b=oYkblvgI1UyMZlIpFAnjzA31kgjWacqxjgOMcqVnRbYbien7Iko4IbMnLAHX4CBi2z MP7B4K+vIVcdbpzj1O63sQTxPKPLkCZiSUN8XwPk455eLKKrzzC8fSUoTF6Bsbx17PBC Z1bfJGUBSIy+Nz5f06Kns2/kKeb5wGNOXnjFGoxQ1Dt0BMqgKr3IN0fFe30ZKjNZCHwl tih5AaMcxyMwuuqRpHJhDdKBVh+ZJb/HEMq0u3y7WQlqawstNw4kQz8S/lWANOcQc4Ug 2C+h2gWL7kNmXnaWYVcNqR1aq5FMuzH5VpT7lE9CUQzumPPVMROGDyD652WWAnfupSYG Yq+A== X-Gm-Message-State: AOJu0YyfAPlpKsmG1HwEzI3m65gbvlQaabjF3SHH3sKNhRNdqtAtuZTK 9aFlzPR6DNpW6n+q/4F6SrMwofSt0aCz486ufDxRfWbULdlhrvIs11Ue+3s4oCHtXMMsMM0RCJd gnCJ+81H+q1zCbtVmhivH X-Received: by 2002:a1f:ec84:0:b0:48f:dc94:1b37 with SMTP id k126-20020a1fec84000000b0048fdc941b37mr6830890vkh.5.1694398360305; Sun, 10 Sep 2023 19:12:40 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGQa9EA0BecT9tAUE5Xk2pnNinY+6g9990OgsljkZ69zuoG/4djLYinNOSs6HIF4zNe5hdMF/BCKh5Bs+ffJJk= X-Received: by 2002:a1f:ec84:0:b0:48f:dc94:1b37 with SMTP id k126-20020a1fec84000000b0048fdc941b37mr6830879vkh.5.1694398360027; Sun, 10 Sep 2023 19:12:40 -0700 (PDT) MIME-Version: 1.0 References: <20230905021334.39124-1-ef2648@columbia.edu> <14870fdc12b8e67bd1e8bd97880653198cfe485d.camel@redhat.com> In-Reply-To: <14870fdc12b8e67bd1e8bd97880653198cfe485d.camel@redhat.com> From: Eric Feng Date: Sun, 10 Sep 2023 22:12:29 -0400 Message-ID: Subject: Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646] To: David Malcolm Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org Content-Type: multipart/alternative; boundary="0000000000005b989f06050bdc7b" X-Proofpoint-ORIG-GUID: wgQFe3eaJ0HZMPANJ7guJHZURYwJJaEL X-Proofpoint-GUID: wgQFe3eaJ0HZMPANJ7guJHZURYwJJaEL X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.957,Hydra:6.0.601,FMLib:17.11.176.26 definitions=2023-09-10_18,2023-09-05_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 spamscore=0 mlxlogscore=965 bulkscore=10 impostorscore=10 clxscore=1015 lowpriorityscore=10 malwarescore=0 priorityscore=1501 adultscore=0 mlxscore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2308100000 definitions=main-2309110019 X-Spam-Status: No, score=-13.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --0000000000005b989f06050bdc7b Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Sep 7, 2023 at 1:28=E2=80=AFPM David Malcolm = wrote: > On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote: > > > Hi Dave, > > Hi Eric, thanks for the patch. > > > > > Recently I've been working on symbolic value support for the reference > > count checker. I've attached a patch for it below; let me know it looks > > OK for trunk. Thanks! > > > > Best, > > Eric > > > > --- > > > > This patch enhances the reference count checker in the CPython plugin by > > adding support for symbolic values. Whereas previously we were only able > > to check the reference count of PyObject* objects created in the scope > > of the function; we are now able to emit diagnostics on reference count > > mismatch of objects that were, for example, passed in as a function > > parameter. > > > > rc6.c:6:10: warning: expected =E2=80=98obj=E2=80=99 to have reference c= ount: N + =E2=80=981=E2=80=99 but > ob_refcnt field is N + =E2=80=982=E2=80=99 > > 6 | return obj; > > | ^~~ > > [...snip...] > > > create mode 100644 > gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > > > > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > index bf1982e79c3..d7ecd7fce09 100644 > > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > > @@ -314,17 +314,20 @@ public: > > { > > diagnostic_metadata m; > > bool warned; > > - // just assuming constants for now > > - auto actual_refcnt > > - =3D m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant (); > > - auto ob_refcnt =3D m_ob_refcnt->dyn_cast_constant_svalue > ()->get_constant (); > > - warned =3D warning_meta (rich_loc, m, get_controlling_option (), > > - "expected %qE to have " > > - "reference count: %qE but ob_refcnt field is: > %qE", > > - m_reg_tree, actual_refcnt, ob_refcnt); > > - > > - // location_t loc =3D rich_loc->get_loc (); > > - // foo (loc); > > + > > + const auto *actual_refcnt_constant > > + =3D m_actual_refcnt->dyn_cast_constant_svalue (); > > + const auto *ob_refcnt_constant =3D > m_ob_refcnt->dyn_cast_constant_svalue (); > > + if (!actual_refcnt_constant || !ob_refcnt_constant) > > + return false; > > + > > + auto actual_refcnt =3D actual_refcnt_constant->get_constant (); > > + auto ob_refcnt =3D ob_refcnt_constant->get_constant (); > > + warned =3D warning_meta ( > > + rich_loc, m, get_controlling_option (), > > + "expected %qE to have " > > + "reference count: N + %qE but ob_refcnt field is N + %qE", > > + m_reg_tree, actual_refcnt, ob_refcnt); > > return warned; > > I know you're emulating the old behavior I implemented way back in > cpychecker, but I don't like that behavior :( > > Specifically, although the patch improves the behavior for symbolic > values, it regresses the precision of wording for the concrete values > case. If we have e.g. a concrete ob_refcnt of 2, whereas we only have > 1 pointer, then it's more readable to say: > > warning: expected =E2=80=98obj=E2=80=99 to have reference count: =E2=80= =981=E2=80=99 but ob_refcnt > field is =E2=80=982=E2=80=99 > > than: > > warning: expected =E2=80=98obj=E2=80=99 to have reference count: N + = =E2=80=981=E2=80=99 but ob_refcnt > field is N + =E2=80=982=E2=80=99 > > ...and we shouldn't quote concrete numbers, the message should be: > > warning: expected =E2=80=98obj=E2=80=99 to have reference count of 1 bu= t ob_refcnt field > is 2 > or better: > > warning: =E2=80=98*obj=E2=80=99 is pointed to by 1 pointer but 'ob_refc= nt' field is 2 > > > Can you move the unwrapping of the svalue from the tests below into the > emit vfunc? That way the m_actual_refcnt doesn't have to be a > constant_svalue; you could have logic in the emit vfunc to print > readable messages based on what kind of svalue it is. > > Rather than 'N', it might be better to say 'initial'; how about: > > warning: =E2=80=98*obj=E2=80=99 is pointed to by 0 additional pointers = but 'ob_refcnt' > field has increased by 1 > warning: =E2=80=98*obj=E2=80=99 is pointed to by 1 additional pointer b= ut 'ob_refcnt' > field has increased by 2 > warning: =E2=80=98*obj=E2=80=99 is pointed to by 1 additional pointer b= ut 'ob_refcnt' > field is unchanged > warning: =E2=80=98*obj=E2=80=99 is pointed to by 2 additional pointers = but 'ob_refcnt' > field has decreased by 1 > warning: =E2=80=98*obj=E2=80=99 is pointed to by 1 fewer pointers but '= ob_refcnt' field > is unchanged > > and similar? > That makes sense to me as well (indeed I was just emulating the old behavior)! Will experiment and keep you posted on a revised patch with this in mind. This is somewhat of a minor detail but can we emit =E2=80=98*obj= =E2=80=99 as bolded text in the diagnostic message? Currently, I can emit this (including the asterisk) like so: '*%E'. But unlike using %qE, it doesn't bold the body of the single quotations. Is this possible? > > Maybe have a flag that tracks whether we're talking about a concrete > value that's absolute versus a concrete value that's relative to the > initial value? > > > [...snip...] > > > > @@ -369,6 +368,19 @@ increment_region_refcnt (hash_map int> &map, const region *key) > > refcnt =3D existed ? refcnt + 1 : 1; > > } > > > > +static const region * > > +get_region_from_svalue (const svalue *sval, region_model_manager *mgr) > > +{ > > + const auto *region_sval =3D sval->dyn_cast_region_svalue (); > > + if (region_sval) > > + return region_sval->get_pointee (); > > + > > + const auto *initial_sval =3D sval->dyn_cast_initial_svalue (); > > + if (initial_sval) > > + return mgr->get_symbolic_region (initial_sval); > > + > > + return nullptr; > > +} > > This is dereferencing a pointer, right? > > Can the caller use region_model::deref_rvalue instead? > > > [...snip...] > > > +static void > > +unwrap_any_ob_refcnt_sval (const svalue *&ob_refcnt_sval) > > +{ > > + if (ob_refcnt_sval->get_kind () !=3D SK_CONSTANT) > > + { > > + auto unwrap_cast =3D ob_refcnt_sval->maybe_undo_cast (); > > + if (!unwrap_cast) > > + unwrap_cast =3D ob_refcnt_sval; > > + > > + if (unwrap_cast->get_kind () =3D=3D SK_BINOP) > > + ob_refcnt_sval =3D unwrap_cast->dyn_cast_binop_svalue ()->get_arg1 > (); > > This would be better spelled as: > > if (const binop_svalue *binop_sval =3D > unwrap_cast->dyn_cast_binop_svalue ()) > ob_refcnt_sval =3D binop_sval->get_arg1 (); > > [...snip...] > > > /* Compare ob_refcnt field vs the actual reference count of a region */ > > static void > > check_refcnt (const region_model *model, > > const region_model *old_model, > > region_model_context *ctxt, > > const hash_map > - int>::iterator::reference_pair region_refcnt) > > + int>::iterator::reference_pair ®ion_refcn= t) > > Could really use a typedef for > const hash_map > to simplify this code. > > > { > > region_model_manager *mgr =3D model->get_manager (); > > const auto &curr_region =3D region_refcnt.first; > > const auto &actual_refcnt =3D region_refcnt.second; > > + > > const svalue *ob_refcnt_sval > > =3D retrieve_ob_refcnt_sval (curr_region, model, ctxt); > > + if (!ob_refcnt_sval) > > + return; > > + > > + unwrap_any_ob_refcnt_sval (ob_refcnt_sval); > > As noted above, can the diagnostic store the pre-unwrapped > ob_refcnt_sval? Might mean you have to do the unwrapping both here, > and later when displaying the diagnostic. Or (probably best) track > both the original and unwrapped ob_refcnt_sval, and store both in the > pending_diagnostic. > > > + > > const svalue *actual_refcnt_sval =3D mgr->get_or_create_int_cst ( > > ob_refcnt_sval->get_type (), actual_refcnt); > > - > > if (ob_refcnt_sval !=3D actual_refcnt_sval) > > - { > > - const svalue *curr_reg_sval > > - =3D mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region); > > - tree reg_tree =3D old_model->get_representative_tree > (curr_reg_sval); > > - if (!reg_tree) > > - return; > > - > > - const auto &eg =3D ctxt->get_eg (); > > - refcnt_stmt_finder finder (*eg, reg_tree); > > - auto pd =3D make_unique (curr_region, > ob_refcnt_sval, > > - actual_refcnt_sval, > reg_tree); > > - if (pd && eg) > > - ctxt->warn (std::move (pd), &finder); > > - } > > + handle_refcnt_mismatch (old_model, curr_region, ob_refcnt_sval, > > + actual_refcnt_sval, ctxt); > > } > > > > static void > > @@ -493,8 +520,6 @@ count_all_references (const region_model *model, > > for (const auto &cluster : *model->get_store ()) > > { > > auto curr_region =3D cluster.first; > > - if (curr_region->get_kind () !=3D RK_HEAP_ALLOCATED) > > - continue; > > > > increment_region_refcnt (region_to_refcnt, curr_region); > > > > @@ -505,8 +530,8 @@ count_all_references (const region_model *model, > > > > const svalue *unwrapped_sval > > =3D binding_sval->unwrap_any_unmergeable (); > > - // if (unwrapped_sval->get_type () !=3D pyobj_ptr_tree) > > - // continue; > > + if (unwrapped_sval->get_type () !=3D pyobj_ptr_tree) > > + continue; > > We'll probably want a smarter test for this, that the type "inherits" > C-style from PyObject (e.g. PyLongObject). > > > > > > const region *pointee =3D unwrapped_sval->maybe_get_region (); > > if (!pointee || pointee->get_kind () !=3D RK_HEAP_ALLOCATED) > > diff --git > a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > > index e1efd9efda5..46daf2f8975 100644 > > --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > > @@ -75,4 +75,23 @@ test_PyListAppend_6 () > > PyObject *list =3D NULL; > > PyList_Append(list, item); > > return list; > > -} > > \ No newline at end of file > > +} > > + > > +PyObject * > > +test_PyListAppend_7 (PyObject *item) > > +{ > > + PyObject *list =3D PyList_New (0); > > + Py_INCREF(item); > > + PyList_Append(list, item); > > + return list; > > + /* { dg-warning "expected 'item' to have reference count" "" { target > *-*-* } .-1 } */ > > It would be good if these dg-warning directives regexp contained the > actual and expected counts; I find I can't easily tell what the > intended output is meant to be. > > > > +} > > + > > +PyObject * > > +test_PyListAppend_8 (PyObject *item, PyObject *list) > > +{ > > + Py_INCREF(item); > > + Py_INCREF(item); > > + PyList_Append(list, item); > > + return list; > > +} > > Should we complain here about item->ob_refcnt being too high? > > > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > > new file mode 100644 > > index 00000000000..a7f39509d6d > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > > @@ -0,0 +1,18 @@ > > +/* { dg-do compile } */ > > +/* { dg-require-effective-target analyzer } */ > > +/* { dg-options "-fanalyzer" } */ > > +/* { dg-require-python-h "" } */ > > + > > + > > +#define PY_SSIZE_T_CLEAN > > +#include > > +#include "../analyzer/analyzer-decls.h" > > + > > +PyObject * > > +test_refcnt_1 (PyObject *obj) > > +{ > > + Py_INCREF(obj); > > + Py_INCREF(obj); > > + return obj; > > + /* { dg-warning "expected 'obj' to have reference count" "" { target > *-*-* } .-1 } */ > > Likewise, it would be better for the dg-warning directive's expressed > the expected "actual vs expected" values. > > [...snip...] > > Thanks again for the patch, hope this is constructive > > Dave > > --0000000000005b989f06050bdc7b--