From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id D4F573858D1E for ; Thu, 7 Sep 2023 17:28:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D4F573858D1E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1694107686; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=b9aFpDgRW0ZN4u6jZsrKoc1nDiHUSNAZpDi6U839bkc=; b=SPTZL0cWfg4HhZHnk+pf+Nbt2n8ANhFUTmvDvLHue1uPoZuKghPrKWRXb4yob5SmqAMt3e P1Dh7IDjl50wTlHVSTxC31m5fISOk+CejWEH+TFiTAkUHjsPqXFwqJhbTZ1DrRq8Kywv1v f0KKvtKM+N2hyWMcUXc6f0ZY4FnXbJ8= Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-652-nBk7X8FoNeiBfEBV3icv4g-1; Thu, 07 Sep 2023 13:28:04 -0400 X-MC-Unique: nBk7X8FoNeiBfEBV3icv4g-1 Received: by mail-qt1-f198.google.com with SMTP id d75a77b69052e-414b72208a1so14006121cf.3 for ; Thu, 07 Sep 2023 10:28:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1694107684; x=1694712484; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=aj1h7ifGZYM8oEXUg3imR7OCiGZeGDLUHoW0f6YhalA=; b=ISzr4dqYsQU2KazRN9ewb2pn1I5vazCHJSYIO8HWpWfKJ4ExJptZ1+yAfwTUkawIPx W1feNPr+TTzzR6NUXa6eFZmv/cvL1rbD2LiJp2l/VR9RfV1VeXI7AAU3AUXApixW4lPc 3KNXN1xcjQHYdQ1jd28TiAKbvGcO5IQOOkVmHPgwVhRXYNDjUWe28XinCsN5Xj1lIivi /qHdvZdHvvfgfT4/deC+g09wAarcCQ8Owb7WXAwaEkBFrKuXkjzpsDwaXlJ5Q59Vuv4v AFvliNQXsMECCeGjGEN/sFK5fnWnJXLUFLkX7yXNMtSgTMCKUo2WePd5+10RPoxPcO3a w/+A== X-Gm-Message-State: AOJu0YyCIsex9G/lUuB8ro0lgzywYAiPjTjVNyXjkUqVqqY8li6CbagR J3zfBCAoGF85fcM42K+RBxbuSeKSi9KB2a7jFPbLVw+zdDc7UmV6vauW+O0qR6gQJ1IAjygHKRw EN14910k= X-Received: by 2002:a05:622a:587:b0:403:66f7:ae66 with SMTP id c7-20020a05622a058700b0040366f7ae66mr167522qtb.13.1694107683946; Thu, 07 Sep 2023 10:28:03 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFdWZ7/oym/ffwWsBhN9XqIkXZJLYAsq7DSH8hHbOFnKD6e+Qk8VaoN7zSjNJs9EEHgE8OJ4g== X-Received: by 2002:a05:622a:587:b0:403:66f7:ae66 with SMTP id c7-20020a05622a058700b0040366f7ae66mr167507qtb.13.1694107683568; Thu, 07 Sep 2023 10:28:03 -0700 (PDT) Received: from t14s.localdomain (c-76-28-97-5.hsd1.ma.comcast.net. [76.28.97.5]) by smtp.gmail.com with ESMTPSA id b3-20020ac844c3000000b00410929fe3b9sm6350329qto.58.2023.09.07.10.28.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 07 Sep 2023 10:28:03 -0700 (PDT) Message-ID: <14870fdc12b8e67bd1e8bd97880653198cfe485d.camel@redhat.com> Subject: Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646] From: David Malcolm To: Eric Feng Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org Date: Thu, 07 Sep 2023 13:28:02 -0400 In-Reply-To: <20230905021334.39124-1-ef2648@columbia.edu> References: <20230905021334.39124-1-ef2648@columbia.edu> User-Agent: Evolution 3.44.4 (3.44.4-2.fc36) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote: > Hi Dave, Hi Eric, thanks for the patch. >=20 > Recently I've been working on symbolic value support for the reference > count checker. I've attached a patch for it below; let me know it looks > OK for trunk. Thanks! >=20 > Best, > Eric >=20 > --- >=20 > This patch enhances the reference count checker in the CPython plugin by > adding support for symbolic values. Whereas previously we were only able > to check the reference count of PyObject* objects created in the scope > of the function; we are now able to emit diagnostics on reference count > mismatch of objects that were, for example, passed in as a function > parameter. >=20 > rc6.c:6:10: warning: expected =E2=80=98obj=E2=80=99 to have reference cou= nt: N + =E2=80=981=E2=80=99 but ob_refcnt field is N + =E2=80=982=E2=80=99 > 6 | return obj; > | ^~~ [...snip...] > create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcn= t.c >=20 > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/= testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > index bf1982e79c3..d7ecd7fce09 100644 > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c > @@ -314,17 +314,20 @@ public: > { > diagnostic_metadata m; > bool warned; > - // just assuming constants for now > - auto actual_refcnt > -=09=3D m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant (); > - auto ob_refcnt =3D m_ob_refcnt->dyn_cast_constant_svalue ()->get_con= stant (); > - warned =3D warning_meta (rich_loc, m, get_controlling_option (), > -=09=09=09 "expected %qE to have " > -=09=09=09 "reference count: %qE but ob_refcnt field is: %qE", > -=09=09=09 m_reg_tree, actual_refcnt, ob_refcnt); > - > - // location_t loc =3D rich_loc->get_loc (); > - // foo (loc); > + > + const auto *actual_refcnt_constant > +=09=3D m_actual_refcnt->dyn_cast_constant_svalue (); > + const auto *ob_refcnt_constant =3D m_ob_refcnt->dyn_cast_constant_sv= alue (); > + if (!actual_refcnt_constant || !ob_refcnt_constant) > + return false; > + > + auto actual_refcnt =3D actual_refcnt_constant->get_constant (); > + auto ob_refcnt =3D ob_refcnt_constant->get_constant (); > + warned =3D warning_meta ( > +=09rich_loc, m, get_controlling_option (), > +=09"expected %qE to have " > +=09"reference count: N + %qE but ob_refcnt field is N + %qE", > +=09m_reg_tree, actual_refcnt, ob_refcnt); > return warned; I know you're emulating the old behavior I implemented way back in cpychecker, but I don't like that behavior :( Specifically, although the patch improves the behavior for symbolic values, it regresses the precision of wording for the concrete values case. If we have e.g. a concrete ob_refcnt of 2, whereas we only have 1 pointer, then it's more readable to say: warning: expected =E2=80=98obj=E2=80=99 to have reference count: =E2=80= =981=E2=80=99 but ob_refcnt field is =E2=80=982=E2=80=99 than: warning: expected =E2=80=98obj=E2=80=99 to have reference count: N + =E2= =80=981=E2=80=99 but ob_refcnt field is N + =E2=80=982=E2=80=99 ...and we shouldn't quote concrete numbers, the message should be: warning: expected =E2=80=98obj=E2=80=99 to have reference count of 1 but = ob_refcnt field is 2 or better: warning: =E2=80=98*obj=E2=80=99 is pointed to by 1 pointer but 'ob_refcnt= ' field is 2 Can you move the unwrapping of the svalue from the tests below into the emit vfunc? That way the m_actual_refcnt doesn't have to be a constant_svalue; you could have logic in the emit vfunc to print readable messages based on what kind of svalue it is. Rather than 'N', it might be better to say 'initial'; how about: warning: =E2=80=98*obj=E2=80=99 is pointed to by 0 additional pointers bu= t 'ob_refcnt' field has increased by 1 warning: =E2=80=98*obj=E2=80=99 is pointed to by 1 additional pointer but= 'ob_refcnt' field has increased by 2 warning: =E2=80=98*obj=E2=80=99 is pointed to by 1 additional pointer but= 'ob_refcnt' field is unchanged warning: =E2=80=98*obj=E2=80=99 is pointed to by 2 additional pointers bu= t 'ob_refcnt' field has decreased by 1 warning: =E2=80=98*obj=E2=80=99 is pointed to by 1 fewer pointers but 'ob= _refcnt' field is unchanged and similar? Maybe have a flag that tracks whether we're talking about a concrete value that's absolute versus a concrete value that's relative to the initial value? [...snip...] > @@ -369,6 +368,19 @@ increment_region_refcnt (hash_map &map, const region *key) > refcnt =3D existed ? refcnt + 1 : 1; > } > =20 > +static const region * > +get_region_from_svalue (const svalue *sval, region_model_manager *mgr) > +{ > + const auto *region_sval =3D sval->dyn_cast_region_svalue (); > + if (region_sval) > + return region_sval->get_pointee (); > + > + const auto *initial_sval =3D sval->dyn_cast_initial_svalue (); > + if (initial_sval) > + return mgr->get_symbolic_region (initial_sval); > + > + return nullptr; > +} This is dereferencing a pointer, right? Can the caller use region_model::deref_rvalue instead? [...snip...] > +static void > +unwrap_any_ob_refcnt_sval (const svalue *&ob_refcnt_sval) > +{ > + if (ob_refcnt_sval->get_kind () !=3D SK_CONSTANT) > + { > + auto unwrap_cast =3D ob_refcnt_sval->maybe_undo_cast (); > + if (!unwrap_cast) > +=09unwrap_cast =3D ob_refcnt_sval; > + > + if (unwrap_cast->get_kind () =3D=3D SK_BINOP) > +=09ob_refcnt_sval =3D unwrap_cast->dyn_cast_binop_svalue ()->get_arg1 ()= ; This would be better spelled as: if (const binop_svalue *binop_sval =3D unwrap_cast->dyn_cast_binop= _svalue ()) =09 ob_refcnt_sval =3D binop_sval->get_arg1 (); [...snip...] > /* Compare ob_refcnt field vs the actual reference count of a region */ > static void > check_refcnt (const region_model *model, > =09 const region_model *old_model, > =09 region_model_context *ctxt, > =09 const hash_map -=09=09=09 int>::iterator::reference_pair region_refcnt) > +=09=09=09 int>::iterator::reference_pair ®ion_refcnt) Could really use a typedef for const hash_map to simplify this code. > { > region_model_manager *mgr =3D model->get_manager (); > const auto &curr_region =3D region_refcnt.first; > const auto &actual_refcnt =3D region_refcnt.second; > + > const svalue *ob_refcnt_sval > =3D retrieve_ob_refcnt_sval (curr_region, model, ctxt); > + if (!ob_refcnt_sval) > + return; > + > + unwrap_any_ob_refcnt_sval (ob_refcnt_sval); As noted above, can the diagnostic store the pre-unwrapped ob_refcnt_sval? Might mean you have to do the unwrapping both here, and later when displaying the diagnostic. Or (probably best) track both the original and unwrapped ob_refcnt_sval, and store both in the pending_diagnostic. > + > const svalue *actual_refcnt_sval =3D mgr->get_or_create_int_cst ( > ob_refcnt_sval->get_type (), actual_refcnt); > - > if (ob_refcnt_sval !=3D actual_refcnt_sval) > - { > - const svalue *curr_reg_sval > -=09 =3D mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region); > - tree reg_tree =3D old_model->get_representative_tree (curr_reg_sva= l); > - if (!reg_tree) > -=09return; > - > - const auto &eg =3D ctxt->get_eg (); > - refcnt_stmt_finder finder (*eg, reg_tree); > - auto pd =3D make_unique (curr_region, ob_refcnt_s= val, > -=09=09=09=09=09 actual_refcnt_sval, reg_tree); > - if (pd && eg) > -=09ctxt->warn (std::move (pd), &finder); > - } > + handle_refcnt_mismatch (old_model, curr_region, ob_refcnt_sval, > +=09=09=09 actual_refcnt_sval, ctxt); > } > =20 > static void > @@ -493,8 +520,6 @@ count_all_references (const region_model *model, > for (const auto &cluster : *model->get_store ()) > { > auto curr_region =3D cluster.first; > - if (curr_region->get_kind () !=3D RK_HEAP_ALLOCATED) > -=09continue; > =20 > increment_region_refcnt (region_to_refcnt, curr_region); > =20 > @@ -505,8 +530,8 @@ count_all_references (const region_model *model, > =20 > =09 const svalue *unwrapped_sval > =09 =3D binding_sval->unwrap_any_unmergeable (); > -=09 // if (unwrapped_sval->get_type () !=3D pyobj_ptr_tree) > -=09 // continue; > +=09 if (unwrapped_sval->get_type () !=3D pyobj_ptr_tree) > +=09 continue; We'll probably want a smarter test for this, that the type "inherits" C-style from PyObject (e.g. PyLongObject). > =20 > =09 const region *pointee =3D unwrapped_sval->maybe_get_region (); > =09 if (!pointee || pointee->get_kind () !=3D RK_HEAP_ALLOCATED) > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Appen= d.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > index e1efd9efda5..46daf2f8975 100644 > --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c > @@ -75,4 +75,23 @@ test_PyListAppend_6 () > PyObject *list =3D NULL; > PyList_Append(list, item); > return list; > -} > \ No newline at end of file > +} > + > +PyObject * > +test_PyListAppend_7 (PyObject *item) > +{ > + PyObject *list =3D PyList_New (0); > + Py_INCREF(item); > + PyList_Append(list, item); > + return list; > + /* { dg-warning "expected 'item' to have reference count" "" { target = *-*-* } .-1 } */ It would be good if these dg-warning directives regexp contained the actual and expected counts; I find I can't easily tell what the intended output is meant to be. > +} > + > +PyObject * > +test_PyListAppend_8 (PyObject *item, PyObject *list) > +{ > + Py_INCREF(item); > + Py_INCREF(item); > + PyList_Append(list, item); > + return list; > +} Should we complain here about item->ob_refcnt being too high? > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c b/g= cc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > new file mode 100644 > index 00000000000..a7f39509d6d > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c > @@ -0,0 +1,18 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target analyzer } */ > +/* { dg-options "-fanalyzer" } */ > +/* { dg-require-python-h "" } */ > + > + > +#define PY_SSIZE_T_CLEAN > +#include > +#include "../analyzer/analyzer-decls.h" > + > +PyObject * > +test_refcnt_1 (PyObject *obj) > +{ > + Py_INCREF(obj); > + Py_INCREF(obj); > + return obj; > + /* { dg-warning "expected 'obj' to have reference count" "" { target *= -*-* } .-1 } */ Likewise, it would be better for the dg-warning directive's expressed the expected "actual vs expected" values. [...snip...] Thanks again for the patch, hope this is constructive Dave