From: Eric Feng
Date: Sat, 25 Mar 2023 15:38:09 -0400
Subject: [GSoC] Interest and initial proposal for project on reimplementing cpychecker as -fanalyzer plugin
To: gcc@gcc.gnu.org

Hi GCC community,

For GSoC, I am extremely interested in working on the selected project idea of extending the static analysis pass, in particular porting gcc-python-plugin's cpychecker to a plugin for GCC -fanalyzer as described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107646. Please find an initial draft of my proposal below and let me know if it is a reasonable starting point. Please also correct me if I am misunderstanding any particular tasks, and let me know which areas need more information or what else I can do in preparation.

_______

Describe the project and clearly define its goals:

One pertinent use case of gcc-python-plugin is as a static analysis tool for CPython extension modules. Its main goal is to help programmers writing extensions identify common coding errors. Broadly, the goal of this project is to port the functionality of cpychecker to a -fanalyzer plugin.

Below is a brief description of the checks that I will work on porting to a -fanalyzer plugin. The structure of the objectives is taken from the gcc-python-plugin documentation:

Reference count checking: PyObjects are manipulated via the CPython API, in particular with respect to their reference counts. When an object's reference count drops to zero, all resources associated with it should be freed. This check helps programmers identify problems with an object's reference count.
Examples include a memory leak caused by forgetting to decrement an object's reference count (analogous to malloc() without a corresponding free()), or an access to an object after its reference count has dropped to zero (analogous to an access after free()).

Error-handling checking: Various checks for common errors such as dereferencing a NULL value.

Errors in exception-handling: Checks for situations in which a function returning PyObject* returns NULL and this is not handled gracefully.

Format string checking: Verify that the arguments to the various CPython APIs that take format strings are correct.

Associating PyTypeObject instances with compile-time types: Verify that the run-time type of a PyTypeObject matches its corresponding compile-time type for inputs where both are provided.

Verification of PyMethodDef tables: Verify that each function in a PyMethodDef table matches the calling convention indicated by its ml_flags.

I suspect a good starting point would be existing proof-of-concept -fanalyzer plugins such as the CPython GIL example (analyzer_gil_plugin). Please let me know of any additional pointers. If there is time to spare, I think it is reasonable to extend the capabilities of the original checker as well (more details in the expected timeline below).

Provide an expected timeline:

I expect the first task to take the longest, since it is relatively involved and also includes getting the initial infrastructure of the plugin up. With the experience gained from the first task, I hope the remaining tasks can be implemented faster. I understand that the timeline below is still too vague; I wanted to check in with the community for feedback on whether I am in the right ballpark before committing to more details.
Week 1 - 7: Reference count checking
Week 8: Error-handling checking
Week 9: Errors in exception handling
Week 10: Format string checking
Week 11: Verification of PyMethodDef tables
Week 12: I am planning the last week as a safety buffer in case any of the above tasks take longer than initially expected.

In case everything goes smoothly and there is time to spare, I think it is reasonable to spend the time extending the capabilities of the original pass. Some ideas include extending the subset of the CPython API that cpychecker currently supports, matching up similar traces to solve the issue of duplicate error reports, and/or addressing any of the other caveats currently mentioned in the cpychecker documentation. Additional ideas are welcome, of course.

Briefly introduce yourself and your skills and/or accomplishments:

I am currently a Master's student in Computer Science at Columbia University. I completed my undergraduate degrees at Bates College (B.A. Math) and Columbia University (B.S. Computer Science). My interests are primarily in systems, programming languages, and compilers.

At Columbia, I work in the group led by Professor Stephen Edwards on the SSLANG programming language: a language built atop the Sparse Synchronous Model. For more formal information on the Sparse Synchronous Model, please take a look at "The Sparse Synchronous Model on Real Hardware" (2022). Please find our repo, along with my contributions, here: https://github.com/ssm-lang/sslang (my GitHub handle is @efric). My main contribution to the compiler so far has been adding a static inlining optimization pass with another SSLANG team member. Our implementation is mostly based on the work shown in "Secrets of the Glasgow Haskell Compiler Inliner" (2002), with modifications as necessary to better fit our goals. The current implementation is a work in progress and we are still adding (many) more features to it. My work in this project is written in Haskell.

I also conduct research in the Columbia Systems Lab. Specifically, my group and I, advised by Professor Jason Nieh, work on secure containerization with respect to untrusted software systems. Armv9-A introduced Realms, secure execution environments that are opaque to untrusted operating systems, as part of the Arm Confidential Compute Architecture (CCA). Please find more information on CCA in "Design and Verification of the Arm Confidential Compute Architecture" (2022). Introduced alongside Realms was the Realm Management Monitor (RMM), an interface for hypervisors to allow secure virtualization utilizing Realms and the new hardware support. Currently, the Realm isolation boundary is at the level of entire VMs. We are working on applying Realms to secure containers. Work in this project is mostly at the kernel and firmware level and is written in C and ARM assembly.

As for compiler experience beyond SSLANG, my undergraduate education included a class on compilers that involved writing passes for Clang/LLVM. Currently, I am taking a graduate-level class on Types, Languages, and Compilers, where my partner and I are working on a plugin for our own small toy language compiler that would perform type inference. The plugin would generate the relevant constraints and solve them on behalf of the compiler. This project is still in its early stages, but the idea is to delegate type inference to a generic library, given some information, instead of having to write your own constraint solver.

_______

Thank you for reviewing my proposal!

Best,
Eric Feng