public inbox for gcc@gcc.gnu.org
From: David Malcolm <dmalcolm@redhat.com>
To: Eric Feng <ef2648@columbia.edu>
Cc: gcc@gcc.gnu.org
Subject: Re: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin
Date: Sun, 02 Apr 2023 19:28:09 -0400
Message-ID: <5ac8582d2f76f48133d4b933574d775863a347bf.camel@redhat.com>
In-Reply-To: <CANGHATVBcN-qX3sBsXbfp1R7rnLM-8HFXX5dWNAHkiT5pdPkmA@mail.gmail.com>

On Sat, 2023-04-01 at 19:49 -0400, Eric Feng wrote:
> > For the task above, I think it's almost all there, it's "just" a
> > case
> > of implementing the special-case knowledge about the CPython API,
> > mostly via known_function subclasses.
> 
> Sounds good.
> 
> 
> > In cpychecker I added some custom function attributes:
> >  
> > https://gcc-python-plugin.readthedocs.io/en/latest/cpychecker.html
> > which were:
> >   __attribute__((cpychecker_returns_borrowed_ref))
> >   __attribute__((cpychecker_steals_reference_to_arg(n)))
> > 
> [...]
> > 
> > But exactly what these macros would look like would be a decision
> > for
> > the CPython community (hence do it via PEP, based on a sample
> > implementation).
> 
> Ok, I see what you mean now. Thanks for clarifying!
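
For reference, here is a rough sketch of how a header might wrap the two attributes above so that it still compiles when cpychecker isn't loaded. The WITH_CPYCHECKER_* feature-test macros are as I recall them from the cpychecker documentation linked above, so treat them as an assumption; demo_object, demo_get_singleton and demo_set_slot are hypothetical stand-ins for PyObject and real API entrypoints, used only to keep the sketch self-contained.

```c
/* Wrapper macros: expand to the cpychecker attribute when the checker
   advertises support for it, and to nothing otherwise.
   ASSUMPTION: the WITH_CPYCHECKER_* names follow the cpychecker docs. */
#if defined(WITH_CPYCHECKER_RETURNS_BORROWED_REF_ATTRIBUTE)
#  define CPYCHECKER_RETURNS_BORROWED_REF \
     __attribute__((cpychecker_returns_borrowed_ref))
#else
#  define CPYCHECKER_RETURNS_BORROWED_REF
#endif

#if defined(WITH_CPYCHECKER_STEALS_REFERENCE_TO_ARG_ATTRIBUTE)
#  define CPYCHECKER_STEALS_REFERENCE_TO_ARG(n) \
     __attribute__((cpychecker_steals_reference_to_arg(n)))
#else
#  define CPYCHECKER_STEALS_REFERENCE_TO_ARG(n)
#endif

/* Hypothetical stand-in for PyObject, so the sketch needs no Python.h. */
typedef struct demo_object {
    long refcnt;
} demo_object;

static demo_object demo_singleton = { 1 };
static demo_object *demo_slot;

/* The caller does not own the returned reference and must not release it. */
demo_object *demo_get_singleton(void)
    CPYCHECKER_RETURNS_BORROWED_REF;

/* Ownership of argument 1 passes to the callee. */
void demo_set_slot(demo_object *item)
    CPYCHECKER_STEALS_REFERENCE_TO_ARG(1);

demo_object *demo_get_singleton(void)
{
    return &demo_singleton;   /* borrowed: refcnt is left untouched */
}

void demo_set_slot(demo_object *item)
{
    demo_slot = item;         /* stolen: stored without a new reference */
}
```

Whatever spelling a PEP ends up choosing, the idea would be the same: the annotation sits on the declaration, and compilers that don't understand it see nothing.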
> 
> 
> > Yeah, this sounds like a big project.  Fortunately there are a lot
> > of
> > possible subtasks in this one, and the project has benefits to GCC
> > and
> > to CPython even if you only get a subset of the ideas done in the
> > time
> > available (refcount checking being probably the highest-value
> > subtask).
> 
> Sounds good.
> 
> I refactored the project description and timeline sections of the
> proposal according to our conversation. Notably, I moved format
> string
> checking to task #2 in the timeline since its subtasks are
> particularly beneficial. I also suggest in the timeline section to
> reach out to the CPython community via PEP about the specifics of new
> attributes in week 9/10 since I think we should have a somewhat
> mature
> prototype by that point. Let me know if you think it should be done
> earlier/later. Please find the changed sections below (I omitted
> unchanged sections for brevity)
> _______
> 
> Describe the project and clearly define its goals:
> One pertinent use case of the gcc-python-plugin was as a static
> analysis tool for CPython extension modules. The main goal of the
> plugin was to help programmers writing extensions identify common
> coding errors. The gcc-python-plugin has bitrotted over the years
> and,
> in particular, cpychecker stopped working some GCC releases ago.
> Broadly, the goal of this project is to port the functionalities of
> cpychecker to a -fanalyzer plugin.
> 
> Below is a brief description of the functionalities of the static
> analysis tool that I will work on porting to a -fanalyzer plugin.
> The structure of the objectives is based on the
> gcc-python-plugin documentation:
> 
> Reference count checking: <Unchanged from original proposal>
> 
> Format string checking: Some CPython APIs such as PyArg_ParseTuple,
> PyArg_ParseTupleAndKeywords, etc. take format strings as arguments.
> This check involves verifying that the format strings taken in by
> these APIs are correct with respect to the number and types of
> arguments passed in. In particular, I will work on integrating the
> analyzer with -Wformat
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107017) and adding
> plugin support for -Wformat
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100121). We should
> then
> be able to specify our own archetype which reflects the format string
> syntax for the relevant CPython APIs and take advantage of the
> integrated analyzer to check them.
> 
> Associating PyTypeObject instances with compile-time-types:
> <Unchanged
> from original proposal>
> 
> Error-handling checking (including errors in exception handling):
> Common errors such as dereferencing a NULL value are already checked
> by the analyzer. I will extend this functionality by implementing
> special-case knowledge about the CPython API.
> 
> Verification of PyMethodDef tables: <Unchanged from original
> proposal>
> 
> Provide an expected timeline:
> Please find a rough estimate of the weekly progress in relation to
> the
> features described below. Tasks that I expect to take longer than one
> week are broken down in more detail. In addition to what’s described,
> each task also involves adding test coverage pertaining to its specific
> feature to a regression test suite.
> 
> Week 1 - 7: Reference count checking
>     Week 1: Set up the overall infrastructure of the plugin and begin
> building core functionality
>     Week 1 - 6: Core reference counting functionality
>     Week 7: Refine prototype
> Week 8 - 10.5: Format string checking (including associating
> PyTypeObject instances with compile-time-types)
>     Week 8 - ~9: RFE: support printf-style formatted functions in
> -fanalyzer
>     Week ~9 - 10.5: RFE: plugin support for -Wformat via
> __attribute__((format()))
>     Additionally, begin conversing with CPython community via PEP
> about the exact form of new attributes on CPython headers which may
> be
> helpful for both humans and the static analyzer. Present ideas based
> on work done so far.
> Week 10.5 - 12: Error-handling checking, errors in exception
> handling,
> and verification of PyMethodDef tables
> 

Sounds great.
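
To make the format string check concrete, here is a minimal sketch of the "number of arguments" half of it: walk a PyArg_ParseTuple-style format string and count how many variadic arguments it expects. It only models a subset of the format units, it doesn't check which modifiers are legal on which unit, and it says nothing about argument types; count_expected_args and format_matches_argc are hypothetical names, not anything that exists in the analyzer or in -Wformat.

```c
#include <string.h>

/* Count the variadic C arguments that a PyArg_ParseTuple-style format
   string expects.  Returns -1 on a unit this sketch does not model.
   SIMPLIFICATION: modifiers are accepted after any base unit. */
int count_expected_args(const char *fmt)
{
    int count = 0;
    for (const char *p = fmt; *p != '\0'; ++p) {
        char c = *p;
        if (c == ':' || c == ';')
            break;      /* the rest is a function name or error message */
        if (c == '|' || c == '$' || c == '(' || c == ')')
            continue;   /* optional/keyword markers and tuple grouping */
        if (c == 'e') {
            /* "es" and "et" take an encoding name plus a buffer pointer */
            if (p[1] != 's' && p[1] != 't')
                return -1;
            ++p;
            count += 2;
            if (p[1] == '#') {   /* "es#" / "et#" also take a length */
                ++p;
                count += 1;
            }
            continue;
        }
        if (strchr("szyuUwbBhHiIlkLKnCcfdDOSYp", c) == NULL)
            return -1;  /* unknown unit */
        count += 1;
        if (p[1] == '#' || p[1] == '!' || p[1] == '&') {
            /* "s#" adds a length; "O!" and "O&" add a type or converter */
            ++p;
            count += 1;
        } else if (p[1] == '*') {
            ++p;        /* "s*", "y*", "w*" fill one Py_buffer: no extra */
        }
    }
    return count;
}

/* Nonzero if the format string is understood and expects exactly argc
   variadic arguments. */
int format_matches_argc(const char *fmt, int argc)
{
    int expected = count_expected_args(fmt);
    return expected >= 0 && expected == argc;
}
```

For example, a call that passes "O!|i:func" needs three trailing arguments (a type object, an object address, and an int address), so a call site with only two would be flagged. A real implementation would presumably hang this off the -Wformat archetype machinery rather than re-parse the string by hand.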

Note that the deadline for submitting proposals to the official GSoC
website is April 4 - 18:00 UTC (i.e. this coming Tuesday) and that
Google are very strict about that deadline; see:
https://developers.google.com/open-source/gsoc/timeline

Please include in the proposal the biographical detail about yourself
that you posted on the list, and, if you can, link to C++ code you've
written.


I don't know if you saw the emails from Sun Steven, but they're also
interested in this project, perhaps as a collaboration with you.  Given
that the project is large and could be chopped up into several
components, that might be a possibility - but don't feel like you need
to do that yourself in your proposal; as noted in the email I just
sent, we don't know how many slots we'll get from the GSoC program.

Good luck
Dave

