From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Feng
Date: Sun, 30 Jul 2023 13:52:25 -0400
Subject: Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
To: David Malcolm
Cc: gcc@gcc.gnu.org
References: <969057b59e5cf472b73e8e1dedcc4a46630b31a0.camel@redhat.com>
In-Reply-To: <969057b59e5cf472b73e8e1dedcc4a46630b31a0.camel@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

[...]
> As noted in our chat earlier, I don't think we can easily make these
> work. Looking at CPython's implementation: PyList_Type's initializer
> here:
> https://github.com/python/cpython/blob/main/Objects/listobject.c#L3101
> initializes tp_flags with the flags, but:
> (a) we don't see that code when compiling a user's extension module
> (b) even if we did, PyList_Type is non-const, so the analyzer has to
> assume that tp_flags could have been written to since it was
> initialized
>
> In theory we could specialcase such lookups, so that, say, a plugin
> could register assumptions into the analyzer about the value of bits
> within (PyList_Type.tp_flags).
>
> However, this seems like a future feature.

I agree that it is more appropriate as a future feature.

Recently, in preparation for a patch, I have been focusing on migrating
as much of our plugin-specific functionality as possible into the
plugin itself. That functionality is currently scattered across core
analyzer files for convenience. Specifically, I am trying to move the
code related to stashing Python-specific types and global variables
into analyzer_cpython_plugin.c. This approach has three main benefits,
some of which I believe we have previously discussed:

1) We only need to search for these values when initializing our
plugin, instead of every time the analyzer is enabled.
2) We can extend the values that we stash by modifying only our
plugin, avoiding changes to core analyzer files such as
analyzer-language.cc, which seems a safer and more resilient approach.
3) Future analyzer plugins will have an easier time stashing values
relevant to their respective projects.

Let me know if my concerns or reasons appear unfounded.

My initial approach involved adding a hook to the end of
ana::on_finish_translation_unit which calls the relevant
stashing-related callbacks registered during plugin initialization.
Here's a rough sketch:

void on_finish_translation_unit (const translation_unit &tu)
{
  // ... existing code
  stash_named_constants (the_logger.get_logger (), tu);
  do_finish_translation_unit_callbacks (the_logger.get_logger (), tu);
}

Inside do_finish_translation_unit_callbacks we have a loop like so:

for (auto &callback : finish_translation_unit_callbacks)
  {
    callback (logger, tu);
  }

Where finish_translation_unit_callbacks is a vector defined as follows:

typedef void (*finish_translation_unit_callback)
  (logger *, const translation_unit &);
vec<finish_translation_unit_callback> *finish_translation_unit_callbacks;

To register a callback, we use:

void
register_finish_translation_unit_callback (
  finish_translation_unit_callback callback)
{
  if (!finish_translation_unit_callbacks)
    vec_alloc (finish_translation_unit_callbacks, 1);
  finish_translation_unit_callbacks->safe_push (callback);
}

And finally, from our plugin (or any other plugin), we can register
callbacks like so:

ana::register_finish_translation_unit_callback (&stash_named_types);
ana::register_finish_translation_unit_callback (&stash_global_vars);

However, on_finish_translation_unit runs before plugin initialization
occurs, so, unfortunately, with this method our callbacks would only be
registered after on_finish_translation_unit has already run. As a
workaround, I tried saving the translation unit like this:

void on_finish_translation_unit (const translation_unit &tu)
{
  // ... existing code
  stash_named_constants (the_logger.get_logger (), tu);
  saved_tu = &tu;
}

Then in our plugin:

ana::register_finish_translation_unit_callback (&stash_named_types);
ana::register_finish_translation_unit_callback (&stash_global_vars);
ana::do_finish_translation_unit_callbacks ();

Here do_finish_translation_unit_callbacks passes saved_tu to the
callbacks. Unfortunately, with this method we hit a segmentation fault
when calling the lookup functions within translation_unit at the time
of plugin initialization, even though the pointer to the translation
unit appears to be stored correctly. So it seems like the solution may
not be quite so simple. I'm currently investigating this issue, but if
there's an obvious solution that I might be missing or any general
suggestions, please let me know!

Thanks as always,
Eric
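
P.S. To make the ordering problem above concrete, here is a minimal
standalone model of the callback registry. It is only a sketch and uses
no GCC headers: translation_unit_stub, finish_tu_callback, the *_stub
callback and the two run_* helpers are hypothetical names made up for
illustration, and std::vector stands in for GCC's vec. It shows only
that a callback registered after the hook has fired is never invoked;
it does not model the segfault from the saved-pointer workaround.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for ana::translation_unit; NOT the GCC type.
struct translation_unit_stub
{
  std::string name;
};

// Simplified callback signature (the real sketch also passes a logger).
typedef void (*finish_tu_callback) (const translation_unit_stub &);

static std::vector<finish_tu_callback> finish_tu_callbacks;
static int num_callbacks_run = 0;

// Models register_finish_translation_unit_callback.
void
register_finish_tu_callback (finish_tu_callback cb)
{
  finish_tu_callbacks.push_back (cb);
}

// Models the hook called at the end of on_finish_translation_unit.
void
do_finish_tu_callbacks (const translation_unit_stub &tu)
{
  for (finish_tu_callback cb : finish_tu_callbacks)
    cb (tu);
}

// Hypothetical callback; it only counts how often it is invoked.
static void
stash_named_types_stub (const translation_unit_stub &)
{
  ++num_callbacks_run;
}

// The hook fires first and registration happens afterwards, as in the
// ordering described above.  Returns the number of callbacks that ran.
int
run_late_registration ()
{
  finish_tu_callbacks.clear ();
  num_callbacks_run = 0;
  translation_unit_stub tu = { "demo.c" };
  do_finish_tu_callbacks (tu);
  register_finish_tu_callback (&stash_named_types_stub);
  return num_callbacks_run;
}

// Registration happens before the hook fires.
int
run_early_registration ()
{
  finish_tu_callbacks.clear ();
  num_callbacks_run = 0;
  translation_unit_stub tu = { "demo.c" };
  register_finish_tu_callback (&stash_named_types_stub);
  do_finish_tu_callbacks (tu);
  return num_callbacks_run;
}
```

In this model run_late_registration () runs no callbacks, while
run_early_registration () runs the one registered callback, which is
the behaviour the hook needs.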