From: David Malcolm <dmalcolm@redhat.com>
To: Eric Feng <ef2648@columbia.edu>
Cc: gcc@gcc.gnu.org
Subject: Re: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin
Date: Sun, 02 Apr 2023 19:28:09 -0400
Message-ID: <5ac8582d2f76f48133d4b933574d775863a347bf.camel@redhat.com>
In-Reply-To: <CANGHATVBcN-qX3sBsXbfp1R7rnLM-8HFXX5dWNAHkiT5pdPkmA@mail.gmail.com>
On Sat, 2023-04-01 at 19:49 -0400, Eric Feng wrote:
> > For the task above, I think it's almost all there, it's "just" a
> > case
> > of implementing the special-case knowledge about the CPython API,
> > mostly via known_function subclasses.
>
> Sounds good.
>
>
> > In cpychecker I added some custom function attributes:
> >
> > https://gcc-python-plugin.readthedocs.io/en/latest/cpychecker.html
> > which were:
> > __attribute__((cpychecker_returns_borrowed_ref))
> > __attribute__((cpychecker_steals_reference_to_arg(n)))
> >
> [...]
> >
> > But exactly what these macros would look like would be a decision
> > for
> > the CPython community (hence do it via PEP, based on a sample
> > implementation).
>
> Ok, I see what you mean now. Thanks for clarifying!
>
>
> > Yeah, this sounds like a big project. Fortunately there are a lot
> > of
> > possible subtasks in this one, and the project has benefits to GCC
> > and
> > to CPython even if you only get a subset of the ideas done in the
> > time
> > available (refcount checking being probably the highest-value
> > subtask).
>
> Sounds good.
>
> I refactored the project description and timeline sections of the
> proposal according to our conversation. Notably, I moved format
> string
> checking to task #2 in the timeline since its subtasks are
> particularly beneficial. I also suggest in the timeline section to
> reach out to the CPython community via PEP about the specifics of new
> attributes in week 9/10 since I think we should have a somewhat
> mature
> prototype by that point. Let me know if you think it should be done
> earlier/later. Please find the changed sections below (I omitted
> unchanged sections for brevity).
> _______
>
> Describe the project and clearly define its goals:
> One pertinent use case of the gcc-python-plugin was as a static
> analysis tool for CPython extension modules. The main goal of the
> plugin was to help programmers writing extensions identify common
> coding errors. The gcc-python-plugin has bitrotted over the years
> and,
> in particular, cpychecker stopped working some GCC releases ago.
> Broadly, the goal of this project is to port the functionalities of
> cpychecker to a -fanalyzer plugin.
>
> Below is a brief description of the functionalities of the static
> analysis tool that I will work on porting over to a -fanalyzer
> plugin. The structure of the objectives is based on the
> gcc-python-plugin documentation:
>
> Reference count checking: <Unchanged from original proposal>
>
> Format string checking: Some CPython APIs such as PyArg_ParseTuple,
> PyArg_ParseTupleAndKeywords, etc. take format strings as arguments.
> This check involves verifying that the format strings taken in by
> these APIs are correct with respect to the number and types of
> arguments passed in. In particular, I will work on integrating the
> analyzer with -Wformat
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107017) and adding
> plugin support for -Wformat
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100121). We should
> then be able to specify our own archetype which reflects the format
> string syntax for the relevant CPython APIs and take advantage of
> the integrated analyzer to check them.
>
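To make the "number of arguments" half of that check concrete, here is a
minimal sketch in Python of the kind of bookkeeping involved. It is only a
model, not how cpychecker or -fanalyzer implements it: the names
ARGS_PER_UNIT, expected_arg_count and check_call are hypothetical, and the
table covers just a small subset of the PyArg_ParseTuple format units (no
nested tuples, no type checking of the varargs).

```python
# Simplified model: how many C varargs each format unit consumes.
# Assumption: only a small subset of PyArg_ParseTuple units is covered.
ARGS_PER_UNIT = {
    "i": 1, "l": 1, "d": 1, "f": 1, "s": 1, "O": 1,
    "s#": 2,  # buffer pointer + length
    "O!": 2,  # PyTypeObject * + PyObject **
    "O&": 2,  # converter function + void *
}


def expected_arg_count(fmt):
    """Return the number of varargs the format string expects."""
    # Anything after ':' (function name) or ';' (error message)
    # is not part of the format units.
    fmt = fmt.split(":", 1)[0].split(";", 1)[0]
    count = 0
    i = 0
    while i < len(fmt):
        if fmt[i] in "|$":  # optional / keyword-only markers take no args
            i += 1
            continue
        two = fmt[i:i + 2]
        if two in ARGS_PER_UNIT:  # prefer two-character units like "s#"
            count += ARGS_PER_UNIT[two]
            i += 2
        elif fmt[i] in ARGS_PER_UNIT:
            count += ARGS_PER_UNIT[fmt[i]]
            i += 1
        else:
            raise ValueError("unsupported format unit: %r" % fmt[i])
    return count


def check_call(fmt, num_varargs):
    """Return a diagnostic string on mismatch, or None if counts agree."""
    expected = expected_arg_count(fmt)
    if expected != num_varargs:
        return "format %r expects %d argument(s), got %d" % (
            fmt, expected, num_varargs)
    return None
```

For example, check_call("ii", 1) would report a mismatch, whereas
check_call("O!|i:func", 3) would not. A real checker would also have to
compare the C types of the varargs against each unit, which is the part the
-Wformat integration is meant to cover.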
> Associating PyTypeObject instances with compile-time-types:
> <Unchanged
> from original proposal>
>
> Error-handling checking (including errors in exception handling):
> Common errors such as dereferencing a NULL value are already checked
> by the analyzer. I will extend this functionality by implementing
> special-case knowledge about the CPython API.
>
> Verification of PyMethodDef tables: <Unchanged from original
> proposal>
>
> Provide an expected timeline:
> Please find a rough estimate of the weekly progress in relation to
> the
> features described below. Tasks that I expect to take longer than one
> week are broken down in more detail. In addition to what’s described,
> each task also involves adding test coverage pertaining to its specific
> feature to a regression test suite.
>
> Week 1 - 7: Reference count checking
> Week 1: Set up the overall infrastructure of the plugin and begin
> building core functionality
> Week 1 - 6: Core reference counting functionality
> Week 7: Refine prototype
> Week 8 - 10.5: Format string checking (including associating
> PyTypeObject instances with compile-time-types)
> Week 8 - ~9: RFE: support printf-style formatted functions in
> -fanalyzer
> Week ~9 - 10.5: RFE: plugin support for -Wformat via
> __attribute__((format()))
> Additionally, begin conversing with CPython community via PEP
> about the exact form of new attributes on CPython headers which may
> be
> helpful for both humans and the static analyzer. Present ideas based
> on work done so far.
> Week 10.5 - 12: Error-handling checking, errors in exception
> handling,
> and verification of PyMethodDef tables
>
Sounds great.
Note that the deadline for submitting proposals to the official GSoC
website is April 4 - 18:00 UTC (i.e. this coming Tuesday) and that
Google are very strict about that deadline; see:
https://developers.google.com/open-source/gsoc/timeline
Please include the biographical detail on yourself in the proposal that
you posted on the list, and if you can, link to C++ code you've
written.
I don't know if you saw the emails from Sun Steven, but they're also
interested in this project, perhaps as a collaboration with you. Given
that the project is large and could be chopped up into several
components that might be a possibility - but don't feel like you need
to do that yourself in your proposal; as noted in the email I just
sent, we don't know how many slots we'll get from the GSoC program.
Good luck
Dave