From: Eric Feng
Date: Mon, 3 Apr 2023 10:29:55 -0400
Subject: Re: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin
To: David Malcolm
Cc: gcc@gcc.gnu.org
References: <9698600391b2cb611dfa8fee5540258ed0cafb1e.camel@redhat.com> <5ac8582d2f76f48133d4b933574d775863a347bf.camel@redhat.com>
In-Reply-To: <5ac8582d2f76f48133d4b933574d775863a347bf.camel@redhat.com>

Thanks for bringing this to my attention Dave! I’m happy to collaborate
on this project with Steven. I will reply in more detail in the other
thread.

Best,
Eric

On Sun, Apr 2, 2023 at 7:28 PM David Malcolm wrote:
>
> On Sat, 2023-04-01 at 19:49 -0400, Eric Feng wrote:
> > > For the task above, I think it's almost all there, it's "just" a
> > > case of implementing the special-case knowledge about the CPython
> > > API, mostly via known_function subclasses.
> >
> > Sounds good.
> >
> > >
> > > In cpychecker I added some custom function attributes:
> > >
> > > https://gcc-python-plugin.readthedocs.io/en/latest/cpychecker.html
> > > which were:
> > >   __attribute__((cpychecker_returns_borrowed_ref))
> > >   __attribute__((cpychecker_steals_reference_to_arg(n)))
> > >
> > [...]
> > >
> > > But exactly what these macros would look like would be a decision
> > > for the CPython community (hence do it via PEP, based on a sample
> > > implementation).
> >
> > Ok, I see what you mean now. Thanks for clarifying!
> >
> > >
> > > Yeah, this sounds like a big project.
> > > Fortunately there are a lot of possible subtasks in this one,
> > > and the project has benefits to GCC and to CPython even if you
> > > only get a subset of the ideas done in the time available
> > > (refcount checking being probably the highest-value subtask).
> >
> > Sounds good.
> >
> > I refactored the project description and timeline sections of the
> > proposal according to our conversation. Notably, I moved format
> > string checking to task #2 in the timeline since its subtasks are
> > particularly beneficial. I also suggest in the timeline section to
> > reach out to the CPython community via PEP about the specifics of
> > new attributes in week 9/10 since I think we should have a somewhat
> > mature prototype by that point. Let me know if you think it should
> > be done earlier/later. Please find the changed sections below (I
> > omitted unchanged sections for brevity)
> > _______
> >
> > Describe the project and clearly define its goals:
> > One pertinent use case of the gcc-python-plugin was as a static
> > analysis tool for CPython extension modules. The main goal of the
> > plugin was to help programmers writing extensions identify common
> > coding errors. The gcc-python-plugin has bitrotted over the years
> > and, in particular, cpychecker stopped working some GCC releases
> > ago. Broadly, the goal of this project is to port the
> > functionalities of cpychecker to a -fanalyzer plugin.
> >
> > Below is a brief description of the functionalities of the static
> > analysis tool that I will work on porting over to a -fanalyzer
> > plugin. The structure of the objectives is based on the
> > gcc-python-plugin documentation:
> >
> > Reference count checking:
> >
> > Format string checking: Some CPython APIs such as PyArg_ParseTuple,
> > PyArg_ParseTupleAndKeywords, etc., take format strings as arguments.
> > This check involves verifying that the format strings taken in by
> > these APIs are correct with respect to the number and types of
> > arguments passed in. In particular, I will work on integrating the
> > analyzer with -Wformat
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107017) and adding
> > plugin support for -Wformat
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100121). We should
> > then be able to specify our own archetype which reflects the format
> > string syntax for the relevant CPython APIs and take advantage of
> > the integrated analyzer to check them.
> >
> > Associating PyTypeObject instances with compile-time-types:
> > <unchanged from original proposal>
> >
> > Error-handling checking (including errors in exception handling):
> > Common errors such as dereferencing a NULL value are already checked
> > by the analyzer. I will extend this functionality by implementing
> > special-case knowledge about the CPython API.
> >
> > Verification of PyMethodDef tables:
> > <unchanged from original proposal>
> >
> > Provide an expected timeline:
> > Please find a rough estimate of the weekly progress in relation to
> > the features described below. Tasks that I expect to take longer
> > than one week are broken down in more detail. In addition to what’s
> > described, each task also involves adding test coverage pertaining
> > to its specific feature to a regression test suite.
> >
> > Week 1 - 7: Reference counting checking
> >   Week 1: Set up the overall infrastructure of the plugin and begin
> >   building core functionality
> >   Week 1 - 6: Core reference counting functionality
> >   Week 7: Refine prototype
> > Week 8 - 10.5: Format string checking (including associating
> > PyTypeObject instances with compile-time-types)
> >   Week 8 - ~9: RFE: support printf-style formatted functions in
> >   -fanalyzer
> >   Week ~9 - 10.5: RFE: plugin support for -Wformat via
> >   __attribute__((format()))
> >   Additionally, begin conversing with CPython community via PEP
> >   about the exact form of new attributes on CPython headers which
> >   may be helpful for both humans and the static analyzer. Present
> >   ideas based on work done so far.
> > Week 10.5 - 12: Error-handling checking, errors in exception
> > handling, and verification of PyMethodDef tables
> >
>
> Sounds great.
>
> Note that the deadline for submitting proposals to the official GSoC
> website is April 4 - 18:00 UTC (i.e. this coming Tuesday) and that
> Google are very strict about that deadline; see:
> https://developers.google.com/open-source/gsoc/timeline
>
> Please include the biographical detail on yourself in the proposal that
> you posted on the list, and if you can, link to C++ code you've
> written.
>
>
> I don't know if you saw the emails from Sun Steven, but they're also
> interested in this project, perhaps as a collaboration with you. Given
> that the project is large and could be chopped up into several
> components that might be a possibility - but don't feel like you need
> to do that yourself in your proposal; as noted in the email I just
> sent, we don't know how many slots we'll get from the GSoC program.
>
> Good luck
> Dave
>
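
For illustration only, the argument-count half of the format string check
described in the quoted proposal can be modelled roughly as below. This is
a toy Python sketch under stated assumptions, not how -Wformat or
-fanalyzer work internally: expected_arg_count and check_call are
hypothetical names, only a small subset of the PyArg_ParseTuple format
units is handled, and type checking is left out entirely.

```python
# Toy model of counting the C arguments a PyArg_ParseTuple-style format
# string expects. Hypothetical helper names; a simplified subset of the
# real format-unit syntax, not the GCC implementation.

# Format units that consume exactly one C argument on their own.
SINGLE_ARG_UNITS = set("bBhHiIlkLKncCfdDszyUOSYp")

# Suffixes that make a unit consume one additional C argument:
# "s#" takes a pointer and a length; "O!" and "O&" take a type object or
# a converter function in addition to the output address.
EXTRA_ARG_SUFFIXES = {"s": "#", "z": "#", "y": "#", "O": "!&"}

# Units that accept a "*" suffix (Py_buffer output, still one argument).
BUFFER_UNITS = "szy"


def expected_arg_count(fmt):
    """Return how many variadic C arguments the format string expects."""
    # Everything after ':' (function name) or ';' (error message) is not
    # part of the argument specification.
    spec = fmt.split(":", 1)[0].split(";", 1)[0]
    count = 0
    i = 0
    while i < len(spec):
        ch = spec[i]
        # '|' and '$' are markers; parentheses group units for a nested
        # tuple. None of them consume a C argument.
        if ch in "|$()":
            i += 1
            continue
        if ch not in SINGLE_ARG_UNITS:
            raise ValueError(f"unsupported format unit {ch!r} in {fmt!r}")
        count += 1
        nxt = spec[i + 1] if i + 1 < len(spec) else ""
        if nxt and nxt in EXTRA_ARG_SUFFIXES.get(ch, ""):
            count += 1  # the suffix adds one more C argument
            i += 1
        elif nxt == "*" and ch in BUFFER_UNITS:
            i += 1      # "s*", "z*", "y*" still take a single argument
        i += 1
    return count


def check_call(fmt, n_args):
    """Return a diagnostic string on a count mismatch, else None."""
    expected = expected_arg_count(fmt)
    if expected != n_args:
        return f"format {fmt!r} expects {expected} argument(s), got {n_args}"
    return None
```

For example, check_call("is", 1) returns a diagnostic because "is" needs
two output pointers, while check_call("O!|i:func", 3) returns None. A real
checker would also have to compare each unit against the C type of the
corresponding argument, which is the part the proposal delegates to
-Wformat archetypes.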