Message-ID: <5ac8582d2f76f48133d4b933574d775863a347bf.camel@redhat.com>
Subject: Re: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin
From: David Malcolm
To: Eric Feng
Cc: gcc@gcc.gnu.org
Date: Sun, 02 Apr 2023 19:28:09 -0400
References: <9698600391b2cb611dfa8fee5540258ed0cafb1e.camel@redhat.com>

On Sat, 2023-04-01 at 19:49 -0400, Eric Feng wrote:
> > For the task above, I think it's almost all there, it's "just" a
> > case of implementing the special-case knowledge about the CPython
> > API, mostly via known_function subclasses.
>
> Sounds good.
>
>
> > In cpychecker I added some custom function attributes:
> >   https://gcc-python-plugin.readthedocs.io/en/latest/cpychecker.html
> > which were:
> >   __attribute__((cpychecker_returns_borrowed_ref))
> >   __attribute__((cpychecker_steals_reference_to_arg(n)))
> >
> [...]
> >
> > But exactly what these macros would look like would be a decision
> > for the CPython community (hence do it via PEP, based on a sample
> > implementation).
>
> Ok, I see what you mean now. Thanks for clarifying!
>
>
> > Yeah, this sounds like a big project.  Fortunately there are a lot
> > of possible subtasks in this one, and the project has benefits to
> > GCC and to CPython even if you only get a subset of the ideas done
> > in the time available (refcount checking being probably the
> > highest-value subtask).
>
> Sounds good.
>
> I refactored the project description and timeline sections of the
> proposal according to our conversation. Notably, I moved format
> string checking to task #2 in the timeline since its subtasks are
> particularly beneficial. I also suggest in the timeline section to
> reach out to the CPython community via PEP about the specifics of new
> attributes in week 9/10 since I think we should have a somewhat
> mature prototype by that point. Let me know if you think it should be
> done earlier/later. Please find the changed sections below (I omitted
> unchanged sections for brevity)
> _______
>
> Describe the project and clearly define its goals:
> One pertinent use case of the gcc-python plugin was as a static
> analysis tool for CPython extension modules.
> The main goal of the plugin was to help programmers writing
> extensions identify common coding errors. The gcc-python-plugin has
> bitrotted over the years and, in particular, cpychecker stopped
> working some GCC releases ago. Broadly, the goal of this project is
> to port the functionalities of cpychecker to a -fanalyzer plugin.
>
> Below is a brief description of the functionalities of the static
> analysis tool for which I will work on porting over to a -fanalyzer
> plugin. The structure of the objectives is based on the
> gcc-python-plugin documentation:
>
> Reference count checking:
>
> Format string checking: Some CPython APIs such as PyArg_ParseTuple,
> PyArg_ParseTupleAndKeywords, etc take format strings as arguments.
> This check involves verifying that the format strings taken in by
> these APIs are correct with respect to the number and types of
> arguments passed in. In particular, I will work on integrating the
> analyzer with -Wformat
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107017) and adding
> plugin support for -Wformat
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100121). We should then
> be able to specify our own archetype which reflects the format string
> syntax for the relevant CPython APIs and take advantage of the
> integrated analyzer to check them.
>
> Associating PyTypeObject instances with compile-time-types:
> from original proposal>
>
> Error-handling checking (including errors in exception handling):
> Common errors such as dereferencing a NULL value are already checked
> by the analyzer. I will extend this functionality by implementing
> special-case knowledge about the CPython API.
>
> Verification of PyMethodDef tables: proposal>
>
> Provide an expected timeline:
> Please find a rough estimate of the weekly progress in relation to
> the features described below. Tasks that I expect to take longer than
> one week are broken down in more detail.
> In addition to what’s described,
> each task also involves adding test coverage pertaining to its
> specific feature to a regression test suite.
>
> Week 1 - 7: Reference counting checking
>     Week 1: Set up the overall infrastructure of the plugin and begin
> building core functionality
>     Week 1 - 6: Core reference counting functionality
>     Week 7: Refine prototype
> Week 8 - 10.5: Format string checking (including associating
> PyTypeObject instances with compile-time-types)
>     Week 8 - ~9: RFE: support printf-style formatted functions in
> -fanalyzer
>     Week ~9 - 10.5: RFE: plugin support for -Wformat via
> __attribute__((format()))
>     Additionally, begin conversing with CPython community via PEP
> about the exact form of new attributes on CPython headers which may
> be helpful for both humans and the static analyzer. Present ideas
> based on work done so far.
> Week 10.5 - 12: Error-handling checking, errors in exception
> handling, and verification of PyMethodDef tables
>

Sounds great.

Note that the deadline for submitting proposals to the official GSoC
website is April 4 - 18:00 UTC (i.e. this coming Tuesday) and that
Google are very strict about that deadline; see:
https://developers.google.com/open-source/gsoc/timeline

Please include the biographical detail on yourself in the proposal that
you posted on the list, and if you can, link to C++ code you've
written.

I don't know if you saw the emails from Sun Steven, but they're also
interested in this project, perhaps as a collaboration with you. Given
that the project is large and could be chopped up into several
components that might be a possibility - but don't feel like you need
to do that yourself in your proposal; as noted in the email I just
sent, we don't know how many slots we'll get from the GSoC program.

Good luck
Dave