public inbox for gcc@gcc.gnu.org
From: Eric Feng <ef2648@columbia.edu>
To: gcc@gcc.gnu.org
Subject: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin
Date: Sat, 25 Mar 2023 15:38:09 -0400
Message-ID: <CANGHATW9MARRSmMmrAr266LymWn8ERTCbs+Hh6sbFU+RR95_qA@mail.gmail.com>

Hi GCC community,

For GSoC, I am very interested in working on the listed project idea
of extending the static analysis pass: in particular, porting
gcc-python-plugin's cpychecker to a plugin for GCC's -fanalyzer, as
described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646.
Please find an initial draft of my proposal below and let me know
whether it is a reasonable starting point. Please also correct me if I
have misunderstood any of the tasks, and let me know where I should
add more information or what else I can do to prepare.

_______

Describe the project and clearly define its goals:
One pertinent use case of gcc-python-plugin is as a static analysis
tool for CPython extension modules (cpychecker), whose main goal is to
help programmers writing extensions identify common coding errors.
Broadly, the goal of this project is to port the functionality of
cpychecker to a -fanalyzer plugin.

Below is a brief description of the static analysis functionality that
I will work on porting to a -fanalyzer plugin. The structure of the
objectives follows the gcc-python-plugin documentation:

Reference count checking: PyObjects are manipulated via the CPython
API, which tracks each object's reference count. When an object's
reference count drops to zero, the object is deallocated and the
resources associated with it are freed. This check helps programmers
identify incorrect reference count handling: for example, memory leaks
caused by forgetting to decrement an object's reference count
(analogous to malloc() without a corresponding free()), or accessing
an object after its reference count has reached zero (analogous to
use after free()).
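To make the reference-counting bugs above concrete, here is a minimal,
self-contained C sketch. It does not use the real CPython API:
toy_object, toy_new, toy_incref and toy_decref are hypothetical
stand-ins for PyObject, object creation, Py_INCREF and Py_DECREF, and
toy_live_objects is a hypothetical counter used only to make leaks
observable. leaky() shows the pattern the checker should report;
balanced() shows the corrected form.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyObject: only the reference count matters. */
typedef struct toy_object {
    long refcnt;
} toy_object;

/* Hypothetical counter of objects allocated but not yet freed. */
static int toy_live_objects = 0;

static toy_object *toy_new(void) {
    toy_object *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;          /* the caller owns the new reference */
    toy_live_objects++;
    return op;
}

static void toy_incref(toy_object *op) { op->refcnt++; }

static void toy_decref(toy_object *op) {
    assert(op->refcnt > 0);  /* a live object always holds at least one reference */
    if (--op->refcnt == 0) {
        free(op);            /* analogous to CPython deallocating at zero */
        toy_live_objects--;
    }
}

/* Leaks: the new reference is never released (like malloc() without free()). */
static void leaky(void) {
    toy_object *op = toy_new();
    (void)op;
}

/* Correct: every owned reference is released exactly once. */
static void balanced(void) {
    toy_object *op = toy_new();
    if (op == NULL)
        return;
    toy_incref(op);          /* e.g. taking a second reference */
    toy_decref(op);
    toy_decref(op);          /* count reaches zero; object is freed */
}
```

A real checker would find the leak statically, by tracking the count
along each path through the function, rather than by counting live
objects at run time as this sketch does.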

Error-handling checking: Various checks for common errors, such as
dereferencing a NULL value returned by a CPython API call that can
fail.
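The core idea behind this check is that every call that can fail has
two outcomes, and the analysis should explore both. A small sketch in
plain C, assuming a hypothetical fallible call toy_alloc() that, like
many CPython functions (e.g. PyList_New), returns NULL on failure; the
simulate_failure parameter is hypothetical and selects the outcome
explicitly so both paths can be exercised.

```c
#include <stddef.h>

static int toy_storage[4];

/* Hypothetical fallible call: returns NULL on failure. */
static int *toy_alloc(int simulate_failure) {
    return simulate_failure ? NULL : toy_storage;
}

/* Unchecked: dereferences the result on every path, including the
   failure path. A checker that explores both outcomes flags this. */
static int unchecked_caller(int simulate_failure) {
    int *p = toy_alloc(simulate_failure);
    return p[0];             /* NULL dereference when the call fails */
}

/* Checked: propagates the failure instead of dereferencing NULL. */
static int checked_caller(int simulate_failure) {
    int *p = toy_alloc(simulate_failure);
    if (p == NULL)
        return -1;
    p[0] = 42;
    return p[0];
}
```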

Errors in exception-handling: Checks for paths through functions
returning PyObject* that return NULL without setting the per-thread
exception state.
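A sketch of the invariant behind this check, in plain C: on every path
that returns NULL, the exception indicator must already be set.
toy_err_set, toy_err_occurred and toy_err_clear are hypothetical,
single-threaded stand-ins for PyErr_SetString, PyErr_Occurred and
PyErr_Clear; bad_lookup, good_lookup and null_implies_error are
illustrative helpers, not CPython API.

```c
#include <stddef.h>

/* Single-threaded toy stand-in for CPython's per-thread exception state. */
static const char *toy_exc_message = NULL;

static void toy_err_set(const char *msg) { toy_exc_message = msg; }
static int toy_err_occurred(void) { return toy_exc_message != NULL; }
static void toy_err_clear(void) { toy_exc_message = NULL; }

/* Returns NULL to signal failure but never sets an exception:
   this is the pattern the check reports. */
static const char *bad_lookup(int key) {
    if (key < 0)
        return NULL;
    return "found";
}

/* Sets the exception state before returning NULL. */
static const char *good_lookup(int key) {
    if (key < 0) {
        toy_err_set("negative key");
        return NULL;
    }
    return "found";
}

/* The invariant a checker would enforce on every path:
   a NULL result implies the exception indicator is set. */
static int null_implies_error(const void *result) {
    return result != NULL || toy_err_occurred();
}
```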

Format string checking: Verify that the arguments passed to CPython
APIs that take format strings (e.g. PyArg_ParseTuple and
Py_BuildValue) are correct.
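As a rough illustration of what format string checking involves, here
is a toy checker that matches a few PyArg_ParseTuple format units
against the pointer types passed for them. The arg_kind tags and
check_format() are hypothetical. The sketch models only the units i,
l, d, s and O plus the '|', ':' and ';' markers, whereas the real check
must handle the full format language and work on the argument types
GCC sees at compile time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags for the pointer types passed to a
   PyArg_ParseTuple-style function. */
typedef enum {
    ARG_INT_PTR, ARG_LONG_PTR, ARG_DOUBLE_PTR,
    ARG_CONST_CHAR_PTR_PTR, ARG_OBJECT_PTR_PTR
} arg_kind;

/* Expected argument kind for a few format units; returns false for
   units this sketch does not model. */
static bool expected_kind(char unit, arg_kind *out) {
    switch (unit) {
    case 'i': *out = ARG_INT_PTR; return true;             /* int *         */
    case 'l': *out = ARG_LONG_PTR; return true;            /* long *        */
    case 'd': *out = ARG_DOUBLE_PTR; return true;          /* double *      */
    case 's': *out = ARG_CONST_CHAR_PTR_PTR; return true;  /* const char ** */
    case 'O': *out = ARG_OBJECT_PTR_PTR; return true;      /* PyObject **   */
    default:  return false;
    }
}

/* Returns true if the argument kinds match the format string's units. */
static bool check_format(const char *fmt, const arg_kind *args, size_t nargs) {
    size_t i = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p == '|')                /* start of optional arguments */
            continue;
        if (*p == ':' || *p == ';')   /* rest is a function name or message */
            break;
        arg_kind want;
        if (!expected_kind(*p, &want))
            return false;             /* unknown unit: treated as a mismatch */
        if (i >= nargs || args[i] != want)
            return false;             /* too few arguments or wrong type */
        i++;
    }
    return i == nargs;                /* too many arguments is also an error */
}
```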

Associating PyTypeObject instances with compile-time types: Verify
that the run-time type described by a PyTypeObject matches its
corresponding compile-time type, for inputs where both are provided.

Verification of PyMethodDef tables: Verify that the function in each
PyMethodDef table entry matches the calling convention implied by its
ml_flags.
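A toy version of the PyMethodDef check, assuming (hypothetically) that
each table entry records how many parameters its C function declares.
The TOY_METH_* constants are named after CPython's METH_* flags but
their values are placeholders, and toy_method_def is not the real
PyMethodDef. The sketch flags, for example, a
METH_VARARGS | METH_KEYWORDS entry whose function takes only
(self, args).

```c
#include <stddef.h>

/* Placeholder flag bits named after CPython's METH_* flags;
   the values are not CPython's definitions. */
enum {
    TOY_METH_VARARGS = 1, TOY_METH_KEYWORDS = 2,
    TOY_METH_NOARGS = 4, TOY_METH_O = 8
};

/* Hypothetical table entry: records the declared parameter count
   instead of a function pointer. */
typedef struct {
    const char *name;
    int flags;
    int nparams;
} toy_method_def;

/* Parameter count implied by the flags: (self, args, kwargs) for
   VARARGS|KEYWORDS, otherwise two parameters. Returns -1 for flag
   combinations this sketch does not model, so they are reported. */
static int expected_params(int flags) {
    switch (flags) {
    case TOY_METH_VARARGS | TOY_METH_KEYWORDS:
        return 3;
    case TOY_METH_VARARGS:   /* (self, args)   */
    case TOY_METH_NOARGS:    /* (self, unused) */
    case TOY_METH_O:         /* (self, arg)    */
        return 2;
    default:
        return -1;
    }
}

/* Counts entries whose function does not match its flags. */
static size_t count_mismatches(const toy_method_def *table) {
    size_t bad = 0;
    /* The table is terminated by a NULL-name sentinel, like PyMethodDef. */
    for (; table->name != NULL; table++)
        if (expected_params(table->flags) != table->nparams)
            bad++;
    return bad;
}
```

In the real check, the parameter list would come from the function's
declared type rather than a hand-written count, since the function
pointer is typically cast to PyCFunction in the table.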

I suspect a good starting point would be the existing proof-of-concept
-fanalyzer plugins, such as the CPython GIL example
(analyzer_gil_plugin). Please let me know of any additional pointers.
If there is time to spare, I think it would be reasonable to extend the
capabilities of the original checker as well (more details in the
expected timeline below).

Provide an expected timeline:
I expect the first task to take the longest, since it is relatively
involved and also includes setting up the initial infrastructure of
the plugin. With the experience gained from the first task, I hope the
remaining tasks will go faster. I also understand that the timeline
outlined below is still vague; I wanted to check in with the community
for feedback on whether I am in the right ballpark before committing
to more details.

Week 1 - 7: Reference counting checking
Week 8: Error-handling checking
Week 9: Errors in exception handling
Week 10: Format string checking
Week 11: Verification of PyMethodDef tables
Week 12: I am planning to use the last week as a buffer in case any of
the above tasks take longer than expected. If everything goes smoothly
and there is time to spare, I think it would be reasonable to spend it
extending the capabilities of the original pass. Some ideas include
extending the subset of the CPython API that cpychecker currently
supports, matching up similar traces to address duplicate error
reports, and/or addressing any of the other caveats currently
mentioned in the cpychecker documentation. Additional ideas are of
course welcome.

Briefly introduce yourself and your skills and/or accomplishments:
I am currently a Master's student in Computer Science at Columbia
University. I did my undergraduate studies at Bates College (B.A.
Math) and Columbia University (B.S. Computer Science). My interests
are primarily in systems, programming languages, and compilers.

At Columbia, I work in Professor Stephen Edwards's group on the SSLANG
programming language, a language built atop the Sparse Synchronous
Model. For a more formal treatment of the Sparse Synchronous Model,
please see "The Sparse Synchronous Model on Real Hardware" (2022).
Please find our repo, along with my contributions, here:
https://github.com/ssm-lang/sslang (my GitHub handle is @efric). My
main contribution to the compiler so far has been adding a static
inlining optimization pass together with another SSLANG team member.
Our implementation is largely based on the work described in "Secrets
of the Glasgow Haskell Compiler Inliner" (2002), with modifications as
needed to better fit our goals. The current implementation is a work
in progress, and we are still adding (many) more features to it. My
work on this project is written in Haskell.

I also conduct research in the Columbia Systems Lab. Specifically, my
group and I, advised by Professor Jason Nieh, work on secure
containerization with respect to untrusted software systems. Armv9-A
introduced Realms, secure execution environments that are opaque to
untrusted operating systems, as part of the Arm Confidential Compute
Architecture (CCA). Please find more information on CCA in "Design and
Verification of the Arm Confidential Compute Architecture" (2022).
Introduced alongside Realms was the Realm Management Monitor (RMM), an
interface for hypervisors that enables secure virtualization using
Realms and the new hardware support. Currently, the Realm isolation
boundary is at the level of entire VMs. We are working on applying
Realms to secure containers. Work on this project is mostly at the
kernel and firmware level and is written in C and ARM assembly.

Regarding compiler experience beyond SSLANG, my undergraduate
education included a compilers course that involved writing passes
for Clang/LLVM. Currently, I am taking a graduate-level class on
Types, Languages, and Compilers, where my partner and I are working on
a plugin for our own small toy-language compiler that would perform
type inference. The plugin would generate the relevant constraints and
solve them on behalf of the compiler. This project is still in its
early stages, but the idea is to delegate type inference to a generic
library, given some information, instead of having to write one's own
constraint solver.

_______

Thank you for reviewing my proposal!

Best,
Eric Feng

