From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 17F603857710 for ; Tue, 25 Jul 2023 14:41:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 17F603857710 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1690296085; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aBY270CN08jkVHqjpMPAtxR0Kep9VQJndpGUkA7xXS8=; b=CBYbaDdN0XnDeVjzO3XcDjEJjtqjbbne0o6VRYdQRoV37gtmvnGXMWr0ZEPwG1f8fS7k9Y NEzqQNJOCNsmsq8h9FjJrvuP7N6o7DHYnIyX+ywPcVOmQB54Etly1vcgh35n0VVufnOm0z dSUM3D0CheQ2PmUfLDI31245DfIAl14= Received: from mail-ot1-f69.google.com (mail-ot1-f69.google.com [209.85.210.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-422-lPZfDK9JPtCvUe0QLaG3gQ-1; Tue, 25 Jul 2023 10:41:24 -0400 X-MC-Unique: lPZfDK9JPtCvUe0QLaG3gQ-1 Received: by mail-ot1-f69.google.com with SMTP id 46e09a7af769-6bc59b0fff5so435268a34.0 for ; Tue, 25 Jul 2023 07:41:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690296083; x=1690900883; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=aBY270CN08jkVHqjpMPAtxR0Kep9VQJndpGUkA7xXS8=; b=J1YeUsXrVz/JPOI+wqbxlvoRYRWf2xBi321j92EcLgrHA6CzQLve+oWEeCZGJZ5kdx gxA9g7Q4LbnNwV02kUGlXpvoZrqtkWyWxnsO4Gg7MXWIyZI0xZmU4UfLdIsVrCX5uc/T lROJMSWIkzKz+mfPeFejIaUNZGoJQxyA3VANm4oQhBpevNWrvwzu7n1tJ1ihiN3wDydx fexOroeqttG/ze0GM7sljAZCMC4USdkx3TQ1E2brguUpe89+LLtNRVHaa/G6trKpj1Ag Lrb6EoGeENQvxvkKvKcJMrVd7rp1jydwsSmBMDKWN6uqZKFjU7DR7Y7GUcD6uxcpMORq zKzw== X-Gm-Message-State: ABy/qLarlQalKHP3AzDtzD65Bym6IyeV3Q1Dl7lLcG8WHM79JmuOjJAp dZRmGNc/bT7S72baZ/1wNR3bOEC3rU/dtARpVTxaC5QcQq23OjJbltSft0mXsTc97Mz3lIo7XLn gdiLwKmasZineSqM= X-Received: by 2002:a9d:7344:0:b0:6aa:ecb5:f186 with SMTP id l4-20020a9d7344000000b006aaecb5f186mr11263231otk.7.1690296083406; Tue, 25 Jul 2023 07:41:23 -0700 (PDT) X-Google-Smtp-Source: APBJJlGOP+7eqNEHIO6C3aVziykr3UdR58GVXfHgWM2fAJaZspyDCKribq0Aqfl48KxPlq4QAgUjXg== X-Received: by 2002:a9d:7344:0:b0:6aa:ecb5:f186 with SMTP id l4-20020a9d7344000000b006aaecb5f186mr11263216otk.7.1690296083135; Tue, 25 Jul 2023 07:41:23 -0700 (PDT) Received: from t14s.localdomain (c-76-28-97-5.hsd1.ma.comcast.net. [76.28.97.5]) by smtp.gmail.com with ESMTPSA id ou26-20020a05620a621a00b00767b37256ecsm3703506qkn.107.2023.07.25.07.41.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Jul 2023 07:41:22 -0700 (PDT) Message-ID: Subject: Re: Update and Questions on CPython Extension Module -fanalyzer plugin development From: David Malcolm To: Eric Feng , gcc@gcc.gnu.org Date: Tue, 25 Jul 2023 10:41:21 -0400 In-Reply-To: References: User-Agent: Evolution 3.44.4 (3.44.4-2.fc36) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-5.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, 2023-07-25 at 00:49 -0400, Eric Feng wrote: > Hi all, Hi Eric, thanks for the update. Various comments inline below... >=20 > I would like to update everyone on the progress of the static > analyzer > plugin for CPython extension module code. > Since the last update, I > have implemented known function subclasses for PyList_New and > PyList_Append. The existing known function subclasses have also been > enhanced to provide more information. For instance, we are now > simulating object type specific fields in addition to just ob_refcnt > and ob_type, which are shared by all PyObjects. Do you have any DejaGnu tests for this functionality? For example, given PyList_New https://docs.python.org/3/c-api/list.html#c.PyList_New there could be a test like: /* { dg-require-effective-target python_h } */ #define PY_SSIZE_T_CLEAN #include #include "analyzer-decls.h" PyObject * test_PyList_New (Py_ssize_t len) { PyObject *obj =3D PyList_New (len); if (obj) { __analyzer_eval (obj->ob_refcnt =3D=3D 1); /* { dg-warning "TRUE" } */ __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" } */ __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */ } else __analyzer_dump_path (); /* { dg-warning "path" } */ return obj; } ...or similar, to verify that we simulate that the call can both succeed and fail, and to verify properties of the store along the "success" path. Caveat: I didn't look at exactly what properties you're simulating, so the above tests might need adjusting. The: /* { dg-require-effective-target python_h } */ allows the testcase to bail out with "UNSUPPORTED" on hosts that don't have a suitable Python.h header file installed, so that all this Python-specific functionality is optional. You could implement this via a new function "check_effective_target_python_h" in gcc/testsuite/lib/target-supports.exp, similar to the existing check_effective_target_ functions in that Tcl file. >=20 > Regarding reference count checking, I have implemented a naive > traversal of the store to count the actual reference count of > PyObjects, allowing us to compare it against the ob_refcnt fields of > the same PyObjects. Although we can compare the actual reference count > and the ob_refcnt field, I am still working on implementing a > diagnostic to alert about this issue. Sounds promising. >=20 > In addition to the progress update, I have some implementation > related > questions and would appreciate any input. The current moment at which > we run the algorithm for reference count checking, and thereby also > the moment at which we may want to issue > impl_region_model_context::warn, is within region_model::pop_frame. > However, it appears that m_stmt and m_stmt_finder are NULL at the > time > of region_model::pop_frame, which results in the diagnostic for the > reference count getting rejected. I am having trouble finding a > workaround for this issue, so any ideas would be welcome. FWIW I've run into a similar issue, and so did Tim last year. The whole stmt vs stmt_finder thing is rather ugly, and I have a local branch with a part-written reworking of it that I hope will solve these issues (adding a "class pending_location"), but it's very messy and far from being ready. Sorry about this. In theory you can provide an instance of your own custom stmt_finder subclass when you save the pending_diagnostic. There's an example of doing this in engine.cc: impl_region_model_context::on_state_leak, where the leak is going to be reported at the end of a function (similar to your diagnostic); it uses a leak_stmt_finder class, which simply scans backwards once it has an exploded_path to find the last stmt that had a useful location_t value. This is a nasty hack, but probably does what you need. >=20 > I am also currently examining some issues related to state merging. Note that you can use -fno-analyzer-state-merge to turn off the state merging code, which can be very useful when debugging this kind of thing. > Let's consider the following example which lacks error checking: >=20 > PyObject* foo() { > =C2=A0=C2=A0=C2=A0 PyObject item =3D PyLong_FromLong(10); > =C2=A0=C2=A0=C2=A0 PyObject list =3D PyList_New(5); > =C2=A0=C2=A0=C2=A0 return list; > } >=20 > The states for when PyLong_FromLong fails and when PyLong_FromLong > succeeds are merged before the call to PyObject* list =3D > PyList_New(5). Ideally we would emit a leak warning about the "success" case of PyLong_FromLong here. I think you're running into the problem of the "store" part of the program_state being separate from the "malloc" state machine part of program_state - I'm guessing that you're creating a heap_allocated_region for the new python object, but the "malloc" state machine isn't transitioning the pointer from "start" to "assumed- non-null". Such state machine states inhibit state-merging, and so this might solve your state-merging problem. I think we need a way to call malloc_state_machine::on_allocator_call from outside of sm-malloc.cc. See region_model::on_realloc_with_move for an example of how to do something similar. > I suspect this may be related to me not correctly handling behavior > that arises due to the analyzer deterministically selecting the IDs > for heap allocations. Since there is a heap allocation for PyList_New > following PyLong_FromLong, the success and fail cases for > PyLong_FromLong are merged. I believe this is so that in the scenario > where PyLong_FromLong fails and PyList_New succeeds, the ID for the > region allocated for PyList_New wouldn't be the same as the > PyLong_FromLong success case. Whatever the cause, due to this state > merge, the heap allocated region representing PyObject *item has all > its fields set to UNKNOWN, making it impossible to perform the > reference count checking functionality. I attempted to fix this by > wrapping the svalue representing PyLongObject with > get_or_create_unmergeable, but it didn't seem to help.=C2=A0 Strange; I would have thought that would have fixed it. Can you post the specific code you tried? > However, this > issue doesn't occur in all situations. For instance: >=20 > PyObject* foo() { > =C2=A0=C2=A0=C2=A0 PyObject item =3D PyLong_FromLong(10); > =C2=A0=C2=A0=C2=A0 PyObject list =3D PyList_New(5); > =C2=A0=C2=A0=C2=A0 PyList_Append(list, item); > =C2=A0=C2=A0=C2=A0 return list; > } >=20 > The above scenario is simulated as expected. I will continue to > search > for a solution, but any suggestions would be highly appreciated. > Thank > you! Thanks again for the update; hope the above is helpful Dave