public inbox for gcc@gcc.gnu.org
* Re: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin
@ 2023-04-02 17:24 Sun Steven
  2023-04-02 23:14 ` David Malcolm
  0 siblings, 1 reply; 9+ messages in thread
From: Sun Steven @ 2023-04-02 17:24 UTC (permalink / raw)
  To: gcc


Hi Eric, Malcolm,

Sorry that I didn't check this thread before.

It sounds like there are a lot of things to do. I want to offer some help.

Let me add some background on memory management in Python here.


## Intro (for people unfamiliar with CPython)

Unlike programs written in C++, where the compiler automatically inserts
destructor calls on all exit paths, CPython's C API requires manual memory
management of PyObject* references.

CPython currently has two memory management mechanisms: reference
counting and a cycle-detecting garbage collector that handles reference
cycles. Reference counting is the primary mechanism: a PyObject is
deallocated when its refcount drops to zero.
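
To make the two mechanisms concrete, here is a minimal sketch that observes them from Python code. It assumes CPython (other implementations manage memory differently), and the `Node` class and the helper names are made up for illustration.

```python
import gc
import sys
import weakref


class Node:
    """Plain object that can hold a reference to another Node."""

    def __init__(self):
        self.other = None


def refcount_delta_for_one_extra_reference():
    """Return how much the refcount grows when one more reference is stored."""
    obj = Node()
    holders = []
    before = sys.getrefcount(obj)
    holders.append(obj)  # one additional strong reference
    after = sys.getrefcount(obj)
    return after - before


def freed_without_gc(make_cycle):
    """Drop all names for two nodes; report (freed by refcount, freed after gc)."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cycle collector out of the picture for now
    try:
        a, b = Node(), Node()
        if make_cycle:
            a.other, b.other = b, a  # a -> b -> a keeps both refcounts above zero
        probe = weakref.ref(a)  # lets us ask later whether `a` still exists
        del a, b
        freed_by_refcount = probe() is None
        gc.collect()  # explicit run of the cycle collector
        freed_after_collect = probe() is None
        return freed_by_refcount, freed_after_collect
    finally:
        if was_enabled:
            gc.enable()
```

Without a cycle, dropping the last name frees the object immediately; with a cycle, the objects should survive until the collector runs.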

## CPython has made great efforts to reduce memory errors.

When built with specific compile flags (a debug build), the CPython
interpreter tracks the total refcount and aborts when a refcount drops
below zero (for example, after a double free). This helps to discover
memory leaks. PEP 683 (implemented in 3.12) also introduced "immortal
objects" (such as small integers), which carry a special, very large
refcount that prevents them from being accidentally freed.
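
Both features can be probed from Python. The sketch below assumes CPython; `sys.gettotalrefcount` only exists in debug builds, and the helper names are illustrative rather than part of any API.

```python
import sys


def is_debug_build():
    """Debug builds of CPython expose sys.gettotalrefcount()."""
    return hasattr(sys, "gettotalrefcount")


def refcount_growth_of_none(extra_refs=1000):
    """Create extra references to None and return how much its refcount moved."""
    before = sys.getrefcount(None)
    keep_alive = [None] * extra_refs  # each list slot holds a reference to None
    after = sys.getrefcount(None)
    del keep_alive
    return after - before
```

On interpreters that implement PEP 683 the growth is expected to be 0, since reference operations leave immortal objects untouched; on older versions it should equal `extra_refs`.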

Even with these features, memory management in CPython extensions is still
a problem, since most errors occur on error-handling paths, which are less
likely to be covered by tests. Most users also do not run a debug build of
CPython, which keeps these errors hidden.

## Why I want to participate

I am currently working on the initial implementation of PEP 701 (a new
f-string parser). During testing, I discovered (and fixed) 3 memory leaks.
As you can see, even the most experienced CPython developers sometimes
forget to properly decrement refcounts. I think it would be inspiring if a
new analysis tool were made available as a compiler builtin. It would lead
to a better CPython.


I do not know if GSoC allows collaboration. Maybe the headcount is limited,
or maybe I am too senior for GSoC. But I think I am still a rookie when it
comes to GCC.


I want to contribute, in whatever form.

Yours


* [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin
@ 2023-03-25 19:38 Eric Feng
  2023-03-26 15:58 ` David Malcolm
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Feng @ 2023-03-25 19:38 UTC (permalink / raw)
  To: gcc

Hi GCC community,

For GSoC, I am extremely interested in working on the selected project
idea of extending the static analysis pass. In particular, I would like
to port gcc-python-plugin's cpychecker to a plugin for GCC's
-fanalyzer, as described in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646. Please find an
initial draft of my proposal below and let me know if it is a
reasonable starting point. Please also correct me if I am
misunderstanding any particular tasks and let me know what areas I
should add more information for or what else I may do in preparation.

_______

Describe the project and clearly define its goals:
One pertinent use case of gcc-python-plugin is as a static
analysis tool for CPython extension modules. The main goal is to help
programmers writing extensions identify common coding errors. Broadly,
the goal of this project is to port the functionality of cpychecker
to a -fanalyzer plugin.

Below is a brief description of the functionality of the static
analysis tool that I will work on porting to a -fanalyzer
plugin. The structure of the objectives is taken from the
gcc-python-plugin documentation:

Reference count checking: Manipulation of PyObjects is done via the
CPython API and in particular with respect to the objects' reference
count. When the reference count belonging to an object drops to zero,
we should free all resources associated with it. This check helps
ensure programmers identify problems with the reference count
associated with an object. For example, memory leaks with respect to
forgetting to decrement the reference count of an object (analogous to
malloc() without corresponding free()) or perhaps object access after
the object's reference count is zero (analogous to access after
free()).
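
As a rough illustration of what such a check has to reason about, the sketch below walks hand-written execution paths of a hypothetical extension function and compares the references still owned at the end of each path with the number handed back to the caller. The event names and the `Path` structure are invented for this example and are not cpychecker's or -fanalyzer's actual representation; it also assumes, for simplicity, that the function only touches objects it owns (no borrowed references).

```python
from dataclasses import dataclass

# Events on a single execution path, all concerning one tracked object.
NEW_REF = "new_ref"  # object freshly created: the function owns one reference
INCREF = "incref"    # models Py_INCREF: one more owned reference
DECREF = "decref"    # models Py_DECREF: one fewer owned reference
USE = "use"          # any access to the object


@dataclass
class Path:
    name: str
    events: list
    returns_object: bool  # True if the caller receives (and then owns) the object


def check_path(path):
    """Return diagnostics for one path; an empty list means nothing was found."""
    problems = []
    owned = 0  # references currently owned by the function on this path
    for event in path.events:
        if event in (NEW_REF, INCREF):
            owned += 1
        elif event == DECREF:
            owned -= 1
            if owned < 0:
                problems.append(f"{path.name}: decref of a reference not owned")
        elif event == USE and owned <= 0:
            problems.append(f"{path.name}: use after the last reference was dropped")
    expected = 1 if path.returns_object else 0
    if owned > expected:
        problems.append(f"{path.name}: leak of {owned - expected} reference(s)")
    elif owned < expected and path.returns_object:
        problems.append(f"{path.name}: returns a reference it does not own")
    return problems
```

For instance, a success path that creates an object and returns it is clean, while an error path that creates the same object and returns without a decref is reported as a leak.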

Error-handling checking: Various checks for common errors such as
dereferencing a NULL value.

Errors in exception-handling: Checks for situations in which a NULL
PyObject* returned by a function is not handled gracefully.

Format string checking: Verify that arguments to various CPython APIs
which take format strings are correct.
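
A minimal sketch of the idea, using a small subset of the `PyArg_ParseTuple` format units: each unit implies the C type of the pointer passed for it, and the check compares that against what the call site actually passes. The table covers only a handful of units, and describing argument types as plain strings is a simplification made for this example.

```python
# Subset of PyArg_ParseTuple format units and the C pointer type each expects.
EXPECTED_ARG_TYPE = {
    "i": "int *",
    "l": "long *",
    "f": "float *",
    "d": "double *",
    "s": "const char **",
    "O": "PyObject **",
}


def format_units(fmt):
    """Yield the format units in fmt, skipping markers that consume no argument."""
    for ch in fmt:
        if ch in ":;":
            break  # what follows is a function name or an error message
        if ch in "|$":
            continue  # optional / keyword-only markers
        yield ch


def check_format_call(fmt, arg_types):
    """Compare a format string with the C types of the variadic arguments."""
    units = list(format_units(fmt))
    problems = []
    if len(units) != len(arg_types):
        problems.append(
            f"format {fmt!r} expects {len(units)} argument(s), got {len(arg_types)}"
        )
    for index, (unit, actual) in enumerate(zip(units, arg_types)):
        expected = EXPECTED_ARG_TYPE.get(unit)
        if expected is None:
            problems.append(f"argument {index}: unit {unit!r} is outside this sketch")
        elif expected != actual:
            problems.append(
                f"argument {index}: {unit!r} expects {expected!r}, got {actual!r}"
            )
    return problems
```

A call such as `check_format_call("id", ["int *", "float *"])` would flag the second argument, since `d` writes a `double`.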

Associating PyTypeObject instances with compile-time-types: Verify
that the run-time type of a PyTypeObject matches its corresponding
compile-time type for inputs where both are provided.

Verification of PyMethodDef tables: Verify that the function in
PyMethodDef tables matches the calling convention of the ml_flags set.
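
To sketch what this verification amounts to: each `ml_flags` combination implies a C signature for the callback, so a checker can compare the function's parameter list against it. The example below only compares parameter counts for the four classic conventions, which is a deliberate simplification; the table layout and the function name are hypothetical.

```python
# Flag values as defined in CPython's methodobject.h.
METH_VARARGS = 0x0001
METH_KEYWORDS = 0x0002
METH_NOARGS = 0x0004
METH_O = 0x0008

# Number of C parameters the callback takes for each supported combination.
EXPECTED_PARAM_COUNT = {
    METH_VARARGS: 2,                  # (self, args)
    METH_VARARGS | METH_KEYWORDS: 3,  # (self, args, kwargs)
    METH_NOARGS: 2,                   # (self, ignored)
    METH_O: 2,                        # (self, arg)
}


def check_method_table(table):
    """Check (ml_name, c_param_count, ml_flags) entries against their flags."""
    problems = []
    for name, param_count, flags in table:
        expected = EXPECTED_PARAM_COUNT.get(flags)
        if expected is None:
            problems.append(f"{name}: flags {flags:#06x} not covered by this sketch")
        elif param_count != expected:
            problems.append(
                f"{name}: ml_flags imply {expected} parameters, "
                f"function takes {param_count}"
            )
    return problems
```

The typical mistake this would catch is a two-parameter function registered with `METH_VARARGS | METH_KEYWORDS`, which the interpreter would call with three arguments.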

I suspect a good starting point would be existing proof-of-concept
-fanalyzer plugins such as the CPython GIL example
(analyzer_gil_plugin). Please let me know of any additional pointers.
If there is time to spare, I think it is reasonable to extend the
capabilities of the original checker as well (more details in the
expected timeline below).

Provide an expected timeline:
I expect the first task to take the longest, since it is relatively
involved and also includes getting the initial infrastructure of
the plugin up. With the experience from the first task, I hope the rest
of the tasks can be implemented faster. Additionally, I understand
that the current timeline outline below is too vague. I wanted to
check in with the community for some feedback on whether I am in the
right ballpark before committing to more details.

Week 1 - 7: Reference counting checking
Week 8: Error-handling checking
Week 9: Errors in exception handling
Week 10: Format string checking
Week 11: Verification of PyMethodDef tables
Week 12: I am planning the last week as a safety buffer in case any of the
above tasks take longer than initially expected. In case everything
goes smoothly and there is time to spare, I think it is reasonable to
spend the time extending the capabilities of the original pass. Some
ideas include extending the subset of the CPython API that cpychecker
currently supports, matching up similar traces to solve the issue of
duplicate error reports, and/or addressing any of the other caveats
currently mentioned in the cpychecker documentation. Additional ideas
are welcome of course.

Briefly introduce yourself and your skills and/or accomplishments:
I am currently a Master's student in Computer Science at Columbia
University. I did my undergraduate studies at Bates College (B.A. Math) and
Columbia University (B.S. Computer Science) respectively. My interests
are primarily in systems, programming languages, and compilers.

At Columbia, I work in the group led by Professor Stephen Edwards on
the SSLANG programming language: a language built atop the Sparse
Synchronous Model. For more formal information on the Sparse
Synchronous Model, please take a look at "The Sparse Synchronous Model
on Real Hardware" (2022). Please find our repo, along with my
contributions, here: https://github.com/ssm-lang/sslang (my GitHub
handle is @efric). My main contribution to the compiler so far
involved adding a static inlining optimization pass with another
SSLANG team member. Our implementation is mostly based on the work
shown in "Secrets of the Glasgow Haskell Compiler Inliner" (2002),
with modifications as necessary to better fit our goals. The current
implementation is a work in progress and we are still working on
adding (many) more features to it. My work in this project is written
in Haskell.

I also conduct research in the Columbia Systems Lab. Specifically, my
group and I, advised by Professor Jason Nieh, work on secure
containerization with respect to untrusted software systems. Armv9-A
introduced Realms, secure execution environments that are opaque to
untrusted operating systems, as part of the Arm Confidential Compute
Architecture (CCA). Please find more information on CCA in "Design and
Verification of the Arm Confidential Compute Architecture" (2022).
Introduced together was the Realm Management Monitor (RMM), an
interface for hypervisors to allow secure virtualization utilizing
Realms and the new hardware support. Currently, the Realm isolation
boundary is at the level of entire VMs. We are working on applying
Realms to secure containers. Work in this project is mostly at the
kernel and firmware level and is written in C and ARM assembly.

As for compiler experience beyond SSLANG, my
undergraduate education included a class on compilers that involved
writing passes for Clang/LLVM. Currently, I am taking a
graduate-level class on Types, Languages, and Compilers where my
partner and I are working on a plugin for our own small toy language
compiler which would be able to perform type inference. The plugin
would generate relevant constraints and solve them on behalf of the
compiler. This project is still in its early stages, but the idea is
to delegate type inference functionalities to a generic library given
some information instead of having to write your own constraint
solver.

_______

Thank you for reviewing my proposal!

Best,
Eric Feng


Thread overview: 9+ messages
2023-04-02 17:24 [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin Sun Steven
2023-04-02 23:14 ` David Malcolm
  -- strict thread matches above, loose matches on Subject: below --
2023-03-25 19:38 Eric Feng
2023-03-26 15:58 ` David Malcolm
2023-03-28 12:08   ` Eric Feng
2023-03-28 19:14     ` David Malcolm
2023-04-01 23:49       ` Eric Feng
2023-04-02 23:28         ` David Malcolm
2023-04-03 14:29           ` Eric Feng
