Update and Questions on CPython Extension Module -fanalyzer plugin development

public inbox for gcc@gcc.gnu.org

* Update and Questions on CPython Extension Module -fanalyzer plugin development
@ 2023-07-25  4:49 Eric Feng
  2023-07-25 14:41 ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-07-25  4:49 UTC (permalink / raw)
  To: gcc; +Cc: David Malcolm

Hi all,

I would like to update everyone on the progress of the static analyzer
plugin for CPython extension module code. Since the last update, I
have implemented known function subclasses for PyList_New and
PyList_Append. The existing known function subclasses have also been
enhanced to provide more information. For instance, we are now
simulating object type specific fields in addition to just ob_refcnt
and ob_type, which are shared by all PyObjects.
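As a rough illustration of what simulating type-specific fields means for PyList_New, here is a standalone C++ model of the object a successful call produces: a new reference (ob_refcnt of 1), ob_type pointing at the list type, and len item slots that are all NULL. The struct and function names are hypothetical stand-ins, not CPython's or the plugin's definitions. The real PyListObject nests ob_size inside a PyVarObject header, which is flattened here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for PyTypeObject / PyObject / PyListObject.
struct type_object
{
  const char *tp_name;
};

struct object_head          // fields shared by every object
{
  std::ptrdiff_t ob_refcnt;
  type_object *ob_type;
};

struct list_object          // list-specific fields (layout simplified)
{
  object_head ob_base;
  std::ptrdiff_t ob_size;                // number of items
  std::vector<object_head *> ob_item;    // item slots
  std::ptrdiff_t allocated;              // capacity
};

type_object list_type = {"list"};

// Model of the "success" outcome of PyList_New (len): a new reference
// to a list of length len whose slots are all NULL.
list_object
model_PyList_New_success (std::ptrdiff_t len)
{
  list_object obj;
  obj.ob_base.ob_refcnt = 1;
  obj.ob_base.ob_type = &list_type;
  obj.ob_size = len;
  obj.ob_item.assign (static_cast<std::size_t> (len), nullptr);
  obj.allocated = len;
  return obj;
}
```

A known-function subclass would bind values like these into the store along the success path, alongside a separate failure path that returns NULL.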

Regarding reference count checking, I have implemented a naive
traversal of the store to count the actual reference count of
PyObjects, allowing us to compare it against the ob_refcnt fields of
the same PyObjects. Although we can compare the actual reference count
and the ob_refcnt field, I am still working on implementing a
diagnostic to alert about this issue.
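A minimal sketch of the comparison described above, under heavy simplifying assumptions: the store is modeled as a flat list of (region, pointee) bindings, and each object's simulated ob_refcnt is kept in a map. The names binding, refcnt_mismatch and check_refcounts are hypothetical; the plugin's actual traversal works over the analyzer's store and svalues instead.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified store binding: a region (a local, a list
// slot, ...) and the object it points at ("" if no object pointer).
struct binding
{
  std::string region;
  std::string pointee;
};

struct refcnt_mismatch
{
  std::string object;
  long ob_refcnt;   // value simulated in the object's ob_refcnt field
  long actual;      // number of pointers to it found in the store
};

// Count the pointers to each object in the store and report every
// object whose simulated ob_refcnt disagrees with that count.
std::vector<refcnt_mismatch>
check_refcounts (const std::vector<binding> &store,
                 const std::map<std::string, long> &ob_refcnt)
{
  std::map<std::string, long> actual;
  for (const binding &b : store)
    if (!b.pointee.empty ())
      ++actual[b.pointee];

  std::vector<refcnt_mismatch> result;
  for (const auto &entry : ob_refcnt)
    {
      auto it = actual.find (entry.first);
      long seen = (it == actual.end ()) ? 0 : it->second;
      if (seen != entry.second)
        result.push_back ({entry.first, entry.second, seen});
    }
  return result;
}
```

For example, with item bound both in a local and in a list slot and ob_refcnt 2, the check passes; dropping the local binding (as happens when foo returns) leaves one reference against an ob_refcnt of 2, which is the kind of leaked reference such a diagnostic would report.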

In addition to the progress update, I have some implementation related
questions and would appreciate any input. The current moment at which
we run the algorithm for reference count checking, and thereby also
the moment at which we may want to issue
impl_region_model_context::warn, is within region_model::pop_frame.
However, it appears that m_stmt and m_stmt_finder are NULL at the time
of region_model::pop_frame, which results in the diagnostic for the
reference count getting rejected. I am having trouble finding a
workaround for this issue, so any ideas would be welcome.

I am also currently examining some issues related to state merging.
Let's consider the following example which lacks error checking:

PyObject* foo() {
    PyObject *item = PyLong_FromLong(10);
    PyObject *list = PyList_New(5);
    return list;
}

The states for when PyLong_FromLong fails and when PyLong_FromLong
succeeds are merged before the call to PyObject* list = PyList_New(5).
I suspect this may be related to me not correctly handling behavior
that arises due to the analyzer deterministically selecting the IDs
for heap allocations. Since there is a heap allocation for PyList_New
following PyLong_FromLong, the success and fail cases for
PyLong_FromLong are merged. I believe this is so that in the scenario
where PyLong_FromLong fails and PyList_New succeeds, the ID for the
region allocated for PyList_New wouldn't be the same as the
PyLong_FromLong success case. Whatever the cause, due to this state
merge, the heap allocated region representing PyObject *item has all
its fields set to UNKNOWN, making it impossible to perform the
reference count checking functionality. I attempted to fix this by
wrapping the svalue representing PyLongObject with
get_or_create_unmergeable, but it didn't seem to help. However, this
issue doesn't occur in all situations. For instance:

PyObject* foo() {
    PyObject *item = PyLong_FromLong(10);
    PyObject *list = PyList_New(5);
    PyList_Append(list, item);
    return list;
}

The above scenario is simulated as expected. I will continue to search
for a solution, but any suggestions would be highly appreciated. Thank
you!

Best,
Eric


* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-07-25  4:49 Update and Questions on CPython Extension Module -fanalyzer plugin development Eric Feng
@ 2023-07-25 14:41 ` David Malcolm
  2023-07-27 22:13   ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-07-25 14:41 UTC (permalink / raw)
  To: Eric Feng, gcc

On Tue, 2023-07-25 at 00:49 -0400, Eric Feng wrote:
> Hi all,

Hi Eric, thanks for the update.

Various comments inline below...

> 
> I would like to update everyone on the progress of the static
> analyzer
> plugin for CPython extension module code.
>  Since the last update, I
> have implemented known function subclasses for PyList_New and
> PyList_Append. The existing known function subclasses have also been
> enhanced to provide more information. For instance, we are now
> simulating object type specific fields in addition to just ob_refcnt
> and ob_type, which are shared by all PyObjects.

Do you have any DejaGnu tests for this functionality?  For example,
given PyList_New
  https://docs.python.org/3/c-api/list.html#c.PyList_New
there could be a test like:

/* { dg-require-effective-target python_h } */

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include "analyzer-decls.h"

PyObject *
test_PyList_New (Py_ssize_t len)
{
  PyObject *obj = PyList_New (len);
  if (obj)
    {
     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
     __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" } */
     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
    }
  else
    __analyzer_dump_path (); /* { dg-warning "path" } */
  return obj;
}

...or similar, to verify that we simulate that the call can both
succeed and fail, and to verify properties of the store along the
"success" path.  Caveat: I didn't look at exactly what properties
you're simulating, so the above tests might need adjusting.

The:
  /* { dg-require-effective-target python_h } */
allows the testcase to bail out with "UNSUPPORTED" on hosts that don't
have a suitable Python.h header file installed, so that all this
Python-specific functionality is optional.  You could implement this
via a new function "check_effective_target_python_h" in
gcc/testsuite/lib/target-supports.exp, similar to the existing
check_effective_target_ functions in that Tcl file.

> 
> Regarding reference count checking, I have implemented a naive
> traversal of the store to count the actual reference count of
> PyObjects, allowing us to compare it against the ob_refcnt fields of
> the same PyObjects. Although we can compare the actual reference count
> and the ob_refcnt field, I am still working on implementing a
> diagnostic to alert about this issue.

Sounds promising.

> 
> In addition to the progress update, I have some implementation
> related
> questions and would appreciate any input. The current moment at which
> we run the algorithm for reference count checking, and thereby also
> the moment at which we may want to issue
> impl_region_model_context::warn, is within region_model::pop_frame.
> However, it appears that m_stmt and m_stmt_finder are NULL at the
> time
> of region_model::pop_frame, which results in the diagnostic for the
> reference count getting rejected. I am having trouble finding a
> workaround for this issue, so any ideas would be welcome.

FWIW I've run into a similar issue, and so did Tim last year.  The
whole stmt vs stmt_finder thing is rather ugly, and I have a local
branch with a part-written reworking of it that I hope will solve these
issues (adding a "class pending_location"), but it's very messy and far
from being ready.  Sorry about this.

In theory you can provide an instance of your own custom stmt_finder
subclass when you save the pending_diagnostic.  There's an example of
doing this in engine.cc: impl_region_model_context::on_state_leak,
where the leak is going to be reported at the end of a function
(similar to your diagnostic); it uses a leak_stmt_finder class, which
simply scans backwards once it has an exploded_path to find the last
stmt that had a useful location_t value.  This is a nasty hack, but
probably does what you need.
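The idea behind leak_stmt_finder can be sketched in isolation: walk the path from its end and return the last statement that carries a usable location. In this standalone C++ model, path_step and find_last_stmt_with_location are hypothetical names, a location of 0 stands in for UNKNOWN_LOCATION, and the real class operates on an exploded_path rather than a vector.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for one step of a path: the statement executed
// there and its source location (0 models UNKNOWN_LOCATION).
struct path_step
{
  int stmt_id;
  unsigned location;
};

// Scan backwards from the end of the path and return the last
// statement that has a usable location, if any.
std::optional<int>
find_last_stmt_with_location (const std::vector<path_step> &path)
{
  for (auto it = path.rbegin (); it != path.rend (); ++it)
    if (it->location != 0)
      return it->stmt_id;
  return std::nullopt;
}
```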

> 
> I am also currently examining some issues related to state merging.

Note that you can use -fno-analyzer-state-merge to turn off the state
merging code, which can be very useful when debugging this kind of
thing.

> Let's consider the following example which lacks error checking:
> 
> PyObject* foo() {
>     PyObject *item = PyLong_FromLong(10);
>     PyObject *list = PyList_New(5);
>     return list;
> }
> 
> The states for when PyLong_FromLong fails and when PyLong_FromLong
> succeeds are merged before the call to PyObject* list =
> PyList_New(5).

Ideally we would emit a leak warning about the "success" case of
PyLong_FromLong here.  I think you're running into the problem of the
"store" part of the program_state being separate from the "malloc"
state machine part of program_state - I'm guessing that you're creating
a heap_allocated_region for the new python object, but the "malloc"
state machine isn't transitioning the pointer from "start" to "assumed-
non-null".  Such state machine states inhibit state-merging, and so
this might solve your state-merging problem.

I think we need a way to call malloc_state_machine::on_allocator_call
from outside of sm-malloc.cc.  See region_model::on_realloc_with_move
for an example of how to do something similar.
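To make the merging argument concrete, here is a toy model (not the analyzer's real merging code): each state carries one store value (the new object's ob_refcnt, with std::nullopt meaning UNKNOWN) plus a malloc state-machine state for the pointer. States with different state-machine states are kept apart; otherwise, differing store values are widened to UNKNOWN. The names and state strings are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, much-simplified program state.
struct prog_state
{
  std::optional<long> refcnt;   // std::nullopt models UNKNOWN
  std::string sm_state;         // e.g. "start", "assumed-non-null", "null"
};

// Toy merge rule: differing state-machine states inhibit merging;
// otherwise, store values that disagree become UNKNOWN.
std::optional<prog_state>
try_merge (const prog_state &a, const prog_state &b)
{
  if (a.sm_state != b.sm_state)
    return std::nullopt;        // keep both states; nothing is lost
  prog_state merged = a;
  if (a.refcnt != b.refcnt)
    merged.refcnt = std::nullopt;
  return merged;
}
```

If both outcomes of PyLong_FromLong leave the pointer in "start", the model merges them and the refcount becomes UNKNOWN, matching the symptom described; distinguishing "assumed-non-null" from "null" keeps the two paths separate.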

> I suspect this may be related to me not correctly handling behavior
> that arises due to the analyzer deterministically selecting the IDs
> for heap allocations. Since there is a heap allocation for PyList_New
> following PyLong_FromLong, the success and fail cases for
> PyLong_FromLong are merged. I believe this is so that in the scenario
> where PyLong_FromLong fails and PyList_New succeeds, the ID for the
> region allocated for PyList_New wouldn't be the same as the
> PyLong_FromLong success case. Whatever the cause, due to this state
> merge, the heap allocated region representing PyObject *item has all
> its fields set to UNKNOWN, making it impossible to perform the
> reference count checking functionality. I attempted to fix this by
> wrapping the svalue representing PyLongObject with
> get_or_create_unmergeable, but it didn't seem to help. 

Strange; I would have thought that would have fixed it.  Can you post
the specific code you tried?

> However, this
> issue doesn't occur in all situations. For instance:
> 
> PyObject* foo() {
>     PyObject *item = PyLong_FromLong(10);
>     PyObject *list = PyList_New(5);
>     PyList_Append(list, item);
>     return list;
> }
> 
> The above scenario is simulated as expected. I will continue to
> search
> for a solution, but any suggestions would be highly appreciated.
> Thank
> you!

Thanks again for the update; hope the above is helpful
Dave


* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-07-25 14:41 ` David Malcolm
@ 2023-07-27 22:13   ` Eric Feng
  2023-07-27 22:35     ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-07-27 22:13 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

Hi Dave,

Thanks for the comments!

[...]
> Do you have any DejaGnu tests for this functionality?  For example,
> given PyList_New
>   https://docs.python.org/3/c-api/list.html#c.PyList_New
> there could be a test like:
>
> /* { dg-require-effective-target python_h } */
>
> #define PY_SSIZE_T_CLEAN
> #include <Python.h>
> #include "analyzer-decls.h"
>
> PyObject *
> test_PyList_New (Py_ssize_t len)
> {
>   PyObject *obj = PyList_New (len);
>   if (obj)
>     {
>      __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
>      __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" } */
>      __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
>     }
>   else
>     __analyzer_dump_path (); /* { dg-warning "path" } */
>   return obj;
> }
>
> ...or similar, to verify that we simulate that the call can both
> succeed and fail, and to verify properties of the store along the
> "success" path.  Caveat: I didn't look at exactly what properties
> you're simulating, so the above tests might need adjusting.
>

I am currently in the process of developing more tests. Specific to
the test you provided as an example, we are passing all cases except
for PyList_Check. PyList_Check does not pass because I have not yet
added support for the various definitions of tp_flags. I also
encountered a minor hiccup where PyList_CheckExact appeared to give
"UNKNOWN" rather than "TRUE", but this has since been fixed. The
problem was caused by accidentally using the tree representation of
struct PyList_Type as opposed to struct PyList_Type * when creating a
pointer sval to the region for PyList_Type.

[...]
>
> > Let's consider the following example which lacks error checking:
> >
> > PyObject* foo() {
> >     PyObject *item = PyLong_FromLong(10);
> >     PyObject *list = PyList_New(5);
> >     return list;
> > }
> >
> > The states for when PyLong_FromLong fails and when PyLong_FromLong
> > succeeds are merged before the call to PyObject* list =
> > PyList_New(5).
>
> Ideally we would emit a leak warning about the "success" case of
> PyLong_FromLong here.  I think you're running into the problem of the
> "store" part of the program_state being separate from the "malloc"
> state machine part of program_state - I'm guessing that you're creating
> a heap_allocated_region for the new python object, but the "malloc"
> state machine isn't transitioning the pointer from "start" to "assumed-
> non-null".  Such state machine states inhibit state-merging, and so
> this might solve your state-merging problem.
>
> I think we need a way to call malloc_state_machine::on_allocator_call
> from outside of sm-malloc.cc.  See region_model::on_realloc_with_move
> for an example of how to do something similar.
>

Thank you for the suggestion — this worked great and has solved the issue!

Best,
Eric


* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-07-27 22:13   ` Eric Feng
@ 2023-07-27 22:35     ` David Malcolm
  2023-07-30 17:52       ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-07-27 22:35 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Thu, 2023-07-27 at 18:13 -0400, Eric Feng wrote:
> Hi Dave,
> 
> Thanks for the comments!
> 
> [...]
> > Do you have any DejaGnu tests for this functionality?  For example,
> > given PyList_New
> >   https://docs.python.org/3/c-api/list.html#c.PyList_New
> > there could be a test like:
> > 
> > /* { dg-require-effective-target python_h } */
> > 
> > #define PY_SSIZE_T_CLEAN
> > #include <Python.h>
> > #include "analyzer-decls.h"
> > 
> > PyObject *
> > test_PyList_New (Py_ssize_t len)
> > {
> >   PyObject *obj = PyList_New (len);
> >   if (obj)
> >     {
> >      __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> >      __analyzer_eval (PyList_Check (obj)); /* { dg-warning "TRUE" } */
> >      __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> >     }
> >   else
> >     __analyzer_dump_path (); /* { dg-warning "path" } */
> >   return obj;
> > }
> > 
> > ...or similar, to verify that we simulate that the call can both
> > succeed and fail, and to verify properties of the store along the
> > "success" path.  Caveat: I didn't look at exactly what properties
> > you're simulating, so the above tests might need adjusting.
> > 
> 
> I am currently in the process of developing more tests. Specific to
> the test you provided as an example, we are passing all cases except
> for PyList_Check. PyList_Check does not pass because I have not yet
> added support for the various definitions of tp_flags.

As noted in our chat earlier, I don't think we can easily make these
work.  Looking at CPython's implementation: PyList_Type's initializer
here:
https://github.com/python/cpython/blob/main/Objects/listobject.c#L3101
initializes tp_flags with the flags, but:
(a) we don't see that code when compiling a user's extension module
(b) even if we did, PyList_Type is non-const, so the analyzer has to
assume that tp_flags could have been written to since it was
initialized

In theory we could special-case such lookups, so that, say, a plugin
could register assumptions into the analyzer about the value of bits
within (PyList_Type.tp_flags).

However, this seems like a future feature.
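The difference between the two checks can be illustrated with a small standalone model: an exact-type check is just a pointer comparison against the type object's address, whereas the subclass check reads a bit from tp_flags on a non-const global, which any code could have rewritten. The types, names, and flag value below are hypothetical; in CPython the bit is Py_TPFLAGS_LIST_SUBCLASS.

```cpp
#include <cassert>

// Hypothetical stand-ins modeled on PyTypeObject and PyObject.
struct type_object
{
  unsigned long tp_flags;
};

struct object
{
  type_object *ob_type;
};

// Illustrative bit value for the "is a list subclass" flag.
constexpr unsigned long list_subclass_flag = 1UL << 25;

// Non-const global, like PyList_Type: anything may write to tp_flags.
type_object list_type = {list_subclass_flag};

// Like PyList_CheckExact: decidable once ob_type is known.
bool
list_check_exact (const object *op)
{
  return op->ob_type == &list_type;
}

// Like PyList_Check: depends on the current contents of tp_flags.
bool
list_check (const object *op)
{
  return (op->ob_type->tp_flags & list_subclass_flag) != 0;
}
```

After a write that clears tp_flags, the exact check still holds but the flag-based check no longer does, which is why an analyzer that never sees the initializer cannot assume the flag is set.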

>  I also
> encountered a minor hiccup where PyList_CheckExact appeared to give
> "UNKNOWN" rather than "TRUE", but this has since been fixed. The
> problem was caused by accidentally using the tree representation of
> struct PyList_Type as opposed to struct PyList_Type * when creating a
> pointer sval to the region for PyList_Type.

Ah, good.

> 
> [...]
> > 
> > > Let's consider the following example which lacks error checking:
> > > 
> > > PyObject* foo() {
> > >     PyObject *item = PyLong_FromLong(10);
> > >     PyObject *list = PyList_New(5);
> > >     return list;
> > > }
> > > 
> > > The states for when PyLong_FromLong fails and when
> > > PyLong_FromLong
> > > succeeds are merged before the call to PyObject* list =
> > > PyList_New(5).
> > 
> > Ideally we would emit a leak warning about the "success" case of
> > PyLong_FromLong here.  I think you're running into the problem of
> > the
> > "store" part of the program_state being separate from the "malloc"
> > state machine part of program_state - I'm guessing that you're
> > creating
> > a heap_allocated_region for the new python object, but the "malloc"
> > state machine isn't transitioning the pointer from "start" to
> > "assumed-
> > non-null".  Such state machine states inhibit state-merging, and so
> > this might solve your state-merging problem.
> > 
> > I think we need a way to call
> > malloc_state_machine::on_allocator_call
> > from outside of sm-malloc.cc.  See
> > region_model::on_realloc_with_move
> > for an example of how to do something similar.
> > 
> 
> Thank you for the suggestion — this worked great and has solved the
> issue!

Excellent!

Thanks for the update
Dave



* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-07-27 22:35     ` David Malcolm
@ 2023-07-30 17:52       ` Eric Feng
  2023-07-30 23:44         ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-07-30 17:52 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

[...]
> As noted in our chat earlier, I don't think we can easily make these
> work.  Looking at CPython's implementation: PyList_Type's initializer
> here:
> https://github.com/python/cpython/blob/main/Objects/listobject.c#L3101
> initializes tp_flags with the flags, but:
> (a) we don't see that code when compiling a user's extension module
> (b) even if we did, PyList_Type is non-const, so the analyzer has to
> assume that tp_flags could have been written to since it was
> initialized
>
> In theory we could specialcase such lookups, so that, say, a plugin
> could register assumptions into the analyzer about the value of bits
> within (PyList_Type.tp_flags).
>
> However, this seems like a future feature.

I agree that it is more appropriate as a future feature.

Recently, in preparation for a patch, I have been focusing on
migrating as much of our plugin-specific functionality as possible,
which is currently scattered across core analyzer files for
convenience, into the plugin itself. Specifically, I am currently
trying to transfer the code related to stashing Python-specific types
and global variables into analyzer_cpython_plugin.c. This approach has
three main benefits, some of which I believe we have previously
discussed:

1) We only need to search for these values when initializing our
plugin, instead of every time the analyzer is enabled.
2) We can extend the values that we stash by modifying only our
plugin, avoiding changes to core analyzer files such as
analyzer-language.cc, which seems a safer and more resilient approach.
3) Future analyzer plugins will have an easier time stashing values
relevant to their respective projects.

Let me know if my concerns or reasons appear unfounded.

My initial approach involved adding a hook to the end of
ana::on_finish_translation_unit which calls the relevant
stashing-related callbacks registered during plugin initialization.
Here's a rough sketch:

void
on_finish_translation_unit (const translation_unit &tu)
{
  // ... existing code
  stash_named_constants (the_logger.get_logger (), tu);

  do_finish_translation_unit_callbacks(the_logger.get_logger (), tu);
}

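Inside do_finish_translation_unit_callbacks we have a loop like so:

/* The vector is only allocated once a callback has been registered.  */
if (finish_translation_unit_callbacks)
  {
    unsigned i;
    finish_translation_unit_callback callback;
    FOR_EACH_VEC_ELT (*finish_translation_unit_callbacks, i, callback)
      callback (logger, tu);
  }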

Where finish_translation_unit_callbacks is a vector defined as follows:

typedef void (*finish_translation_unit_callback) (logger *,
                                                  const translation_unit &);
vec<finish_translation_unit_callback> *finish_translation_unit_callbacks;

To register a callback, we use:

void
register_finish_translation_unit_callback (
    finish_translation_unit_callback callback)
{
  if (!finish_translation_unit_callbacks)
    vec_alloc (finish_translation_unit_callbacks, 1);
  finish_translation_unit_callbacks->safe_push (callback);
}

And finally, from our plugin (or any other plugin), we can register
callbacks like so:
ana::register_finish_translation_unit_callback (&stash_named_types);
ana::register_finish_translation_unit_callback (&stash_global_vars);
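For readers less familiar with GCC's vec, the same registration pattern can be sketched as standalone C++ with std::vector. Everything here is a hypothetical stand-in (logger, a translation_unit with a string-based has_type, stash_named_types); the analyzer's real translation_unit exposes tree-based lookup functions instead. The example callback performs its lookups while the translation unit is alive and keeps only the results.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the analyzer's logger and translation_unit.
struct logger {};

class translation_unit
{
public:
  virtual ~translation_unit () = default;
  virtual bool has_type (const std::string &name) const = 0;
};

typedef void (*finish_translation_unit_callback) (logger *,
                                                  const translation_unit &);

// std::vector instead of GCC's vec, to keep the sketch standalone.
static std::vector<finish_translation_unit_callback>
  finish_translation_unit_callbacks;

void
register_finish_translation_unit_callback (
  finish_translation_unit_callback callback)
{
  finish_translation_unit_callbacks.push_back (callback);
}

// Runs every registered callback while TU is still alive.
void
do_finish_translation_unit_callbacks (logger *lgr, const translation_unit &tu)
{
  for (finish_translation_unit_callback callback
         : finish_translation_unit_callbacks)
    callback (lgr, tu);
}

// Example plugin-side callback: record which types the TU declares.
static std::vector<std::string> stashed_types;

static void
stash_named_types (logger *, const translation_unit &tu)
{
  for (const char *name : {"PyObject", "PyListObject"})
    if (tu.has_type (name))
      stashed_types.push_back (name);
}
```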

However, on_finish_translation_unit runs before plugin initialization
occurs, so, unfortunately, we would be registering our callbacks after
on_finish_translation_unit with this method. As a workaround, I tried
saving the translation unit like this:

void
on_finish_translation_unit (const translation_unit &tu)
{
  // ... existing code
  stash_named_constants (the_logger.get_logger (), tu);

  saved_tu = &tu;
}

Then in our plugin:
ana::register_finish_translation_unit_callback (&stash_named_types);
ana::register_finish_translation_unit_callback (&stash_global_vars);
ana::do_finish_translation_unit_callbacks ();

With do_finish_translation_unit_callbacks passing saved_tu to the callbacks.

Unfortunately, with this method, it seems like we encounter a
segmentation fault when trying to call the lookup functions within
translation_unit at the time of plugin initialization, even though the
translation unit is stored correctly. So it seems like the solution
may not be quite so simple.

I'm currently investigating this issue, but if there's an obvious
solution that I might be missing or any general suggestions, please
let me know!

Thanks as always,
Eric


* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-07-30 17:52       ` Eric Feng
@ 2023-07-30 23:44         ` David Malcolm
  2023-08-01 13:57           ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-07-30 23:44 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Sun, 2023-07-30 at 13:52 -0400, Eric Feng wrote:
> [...]
> > As noted in our chat earlier, I don't think we can easily make
> > these
> > work.  Looking at CPython's implementation: PyList_Type's
> > initializer
> > here:
> > https://github.com/python/cpython/blob/main/Objects/listobject.c#L3101
> > initializes tp_flags with the flags, but:
> > (a) we don't see that code when compiling a user's extension module
> > (b) even if we did, PyList_Type is non-const, so the analyzer has
> > to
> > assume that tp_flags could have been written to since it was
> > initialized
> > 
> > In theory we could special-case such lookups, so that, say, a plugin
> > could register assumptions into the analyzer about the value of
> > bits
> > within (PyList_Type.tp_flags).
> > 
> > However, this seems like a future feature.
> 
> I agree that it is more appropriate as a future feature.
> 
> Recently, in preparation for a patch, I have been focusing on
> migrating as much of our plugin-specific functionality as possible,
> which is currently scattered across core analyzer files for
> convenience, into the plugin itself. Specifically, I am currently
> trying to transfer the code related to stashing Python-specific types
> and global variables into analyzer_cpython_plugin.c. This approach
> has
> three main benefits, some of which I believe we have previously
> discussed:
> 
> 1) We only need to search for these values when initializing our
> plugin, instead of every time the analyzer is enabled.
> 2) We can extend the values that we stash by modifying only our
> plugin, avoiding changes to core analyzer files such as
> analyzer-language.cc, which seems a safer and more resilient
> approach.
> 3) Future analyzer plugins will have an easier time stashing values
> relevant to their respective projects.

Sounds good, though I don't mind if the initial version of your patch
adds CPython-specific stuff to the core, if there are unexpected
hurdles in converting things to be more purely plugin based.

> 
> Let me know if my concerns or reasons appear unfounded.
> 
> My initial approach involved adding a hook to the end of
> ana::on_finish_translation_unit which calls the relevant
> stashing-related callbacks registered during plugin initialization.
> Here's a rough sketch:
> 
> void
> on_finish_translation_unit (const translation_unit &tu)
> {
>   // ... existing code
>   stash_named_constants (the_logger.get_logger (), tu);
> 
>   do_finish_translation_unit_callbacks(the_logger.get_logger (), tu);
> }
> 
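> Inside do_finish_translation_unit_callbacks we have a loop like so:
> 
> /* The vector is only allocated once a callback has been registered.  */
> if (finish_translation_unit_callbacks)
>   {
>     unsigned i;
>     finish_translation_unit_callback callback;
>     FOR_EACH_VEC_ELT (*finish_translation_unit_callbacks, i, callback)
>       callback (logger, tu);
>   }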
> 
> Where finish_translation_unit_callbacks is a vector defined as
> follows:
> typedef void (*finish_translation_unit_callback) (logger *, const
> translation_unit &);
> vec<finish_translation_unit_callback>
> *finish_translation_unit_callbacks;

Seems reasonable.

> 
> To register a callback, we use:
> 
> void
> register_finish_translation_unit_callback (
>     finish_translation_unit_callback callback)
> {
>   if (!finish_translation_unit_callbacks)
>     vec_alloc (finish_translation_unit_callbacks, 1);
>   finish_translation_unit_callbacks->safe_push (callback);
> }
> 
> And finally, from our plugin (or any other plugin), we can register
> callbacks like so:
> ana::register_finish_translation_unit_callback (&stash_named_types);
> ana::register_finish_translation_unit_callback (&stash_global_vars);
> 
> However, on_finish_translation_unit runs before plugin initialization
> occurs, so, unfortunately, we would be registering our callbacks
> after
> on_finish_translation_unit with this method.

Really?  I thought the plugin_init callback is called from
initialize_plugins, which is called from toplev::main fairly early on;
I thought on_finish_translation_unit is called from deep within
do_compile, which is called later on from toplev::main.

What happens if you put breakpoints on both the plugin_init hook and on
on_finish_translation_unit, and have a look at the backtrace at each?

Note that this is the "plugin_init" code, not the PLUGIN_ANALYZER_INIT
callback.  The latter *is* called after on_finish_translation_unit,
when the analyzer runs.  You'll need to put your code in the former.
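The ordering argument can be checked with a toy timeline. In this sketch (all names hypothetical; not GCC's actual control flow), plugin_init runs first, then the end-of-translation-unit hook runs whatever callbacks exist at that moment, and only then does the analyzer-init hook fire. A callback registered from the later hook is therefore never invoked.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one compilation's sequence of hooks.
struct compilation
{
  std::vector<std::function<void ()>> tu_callbacks;
  std::vector<std::string> events;

  void
  run (const std::function<void (compilation &)> &plugin_init,
       const std::function<void (compilation &)> &analyzer_init_hook)
  {
    plugin_init (*this);                   // early, from initialize_plugins
    events.push_back ("on_finish_translation_unit");
    for (const auto &cb : tu_callbacks)    // only those registered so far
      cb ();
    analyzer_init_hook (*this);            // later, when the analyzer runs
  }
};

// Registers a callback that records when it runs.
void
register_stash_callback (compilation &c)
{
  c.tu_callbacks.push_back ([&c] { c.events.push_back ("stash"); });
}

void
do_nothing (compilation &)
{
}
```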


>  As a workaround, I tried
> saving the translation unit like this:
> 
> void
> on_finish_translation_unit (const translation_unit &tu)
> {
>   // ... existing code
>   stash_named_constants (the_logger.get_logger (), tu);
> 
>   saved_tu = &tu;
> }

That's not going to work; the "tu" is a reference to an on-stack
object, i.e. essentially a pointer to a temporary on the stack.  If
saved_tu is a pointer, then it's going to be pointing at garbage when
the function returns; if it's an object, then it's going to take a copy
of just the base class, which isn't going to be usable either ("object
slicing").
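The dangling-pointer half of this cannot be demonstrated safely in a runnable example, since reading through saved_tu after the function returns is undefined behavior. The slicing half can: in this sketch, tu_base and c_tu are hypothetical classes, and copying through the base type silently discards the derived override, while a reference still dispatches to it.

```cpp
#include <cassert>
#include <string>

// Hypothetical base and derived "translation unit" classes.
class tu_base
{
public:
  virtual ~tu_base () = default;
  virtual std::string lookup (const std::string &) const { return ""; }
};

class c_tu : public tu_base
{
public:
  std::string lookup (const std::string &name) const override
  {
    return name == "PyObject" ? "struct _object" : "";
  }
};

// Copying into a tu_base keeps only the base subobject ("slicing").
std::string
lookup_via_copy (const tu_base &tu, const std::string &name)
{
  tu_base copy = tu;
  return copy.lookup (name);   // always tu_base::lookup
}

// Using the reference keeps dynamic dispatch to the derived class.
std::string
lookup_via_reference (const tu_base &tu, const std::string &name)
{
  return tu.lookup (name);
}
```

Under these assumptions, the practical takeaway is to do the lookups inside the hook while tu is still alive and stash only the results.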

> 
> Then in our plugin:
> ana::register_finish_translation_unit_callback (&stash_named_types);
> ana::register_finish_translation_unit_callback (&stash_global_vars);
> ana::do_finish_translation_unit_callbacks ();
> 
> With do_finish_translation_unit_callbacks passing saved_tu to the
> callbacks.
> 
> Unfortunately, with this method, it seems like we encounter a
> segmentation fault when trying to call the lookup functions within
> translation_unit at the time of plugin initialization, even though
> the
> translation unit is stored correctly. 

I don't think the tu is getting stored correctly, due to the reasons
described above.


> So it seems like the solution
> may not be quite so simple.
> 
> I'm currently investigating this issue, but if there's an obvious
> solution that I might be missing or any general suggestions, please
> let me know!

My guess is that you were trying to do it from the PLUGIN_ANALYZER_INIT
hook rather than from the plugin_init function, but it's hard to be
sure without seeing the code.

Hope this is helpful
Dave



* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-07-30 23:44         ` David Malcolm
@ 2023-08-01 13:57           ` Eric Feng
  2023-08-01 17:06             ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-01 13:57 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

>
> My guess is that you were trying to do it from the PLUGIN_ANALYZER_INIT
> hook rather than from the plugin_init function, but it's hard to be
> sure without seeing the code.
>

Thanks Dave, you are entirely right — I made the mistake of trying to
do it from PLUGIN_ANALYZER_INIT hook and not from the plugin_init
function. After following your suggestion, the callbacks are getting
registered as expected. I submitted a patch to review for this feature
on gcc-patches; please let me know if it looks OK.

Best,
Eric


* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-08-01 13:57           ` Eric Feng
@ 2023-08-01 17:06             ` David Malcolm
  2023-08-04 15:02               ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-01 17:06 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Tue, 2023-08-01 at 09:57 -0400, Eric Feng wrote:
> > 
> > My guess is that you were trying to do it from the
> > PLUGIN_ANALYZER_INIT
> > hook rather than from the plugin_init function, but it's hard to be
> > sure without seeing the code.
> > 
> 
> Thanks Dave, you are entirely right — I made the mistake of trying to
> do it from PLUGIN_ANALYZER_INIT hook and not from the plugin_init
> function. After following your suggestion, the callbacks are getting
> registered as expected. 

Ah, good.

> I submitted a patch to review for this feature
> on gcc-patches; please let me know if it looks OK.

Thanks Eric; I've posted a reply to your email there, so let's discuss
the details there.

Dave



* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-08-01 17:06             ` David Malcolm
@ 2023-08-04 15:02               ` Eric Feng
  2023-08-04 15:39                 ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-04 15:02 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

Hi Dave,

Tests related to our plugin which depend on Python-specific
definitions have been run by including /* { dg-options "-fanalyzer
-I/usr/include/python3.9" } */. This is undoubtedly not ideal; is it
best to approach this problem by adapting a subset of relevant
definitions like in gil.h?

Best,
Eric


* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-08-04 15:02               ` Eric Feng
@ 2023-08-04 15:39                 ` David Malcolm
  2023-08-04 20:48                   ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-04 15:39 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Fri, 2023-08-04 at 11:02 -0400, Eric Feng wrote:
> Hi Dave,
> 
> Tests related to our plugin which depend on Python-specific
> definitions have been run by including /* { dg-options "-fanalyzer
> -I/usr/include/python3.9" } */. This is undoubtedly not ideal; is it
> best to approach this problem by adapting a subset of relevant
> definitions like in gil.h?

That might be acceptable in the very short term, but to create a plugin
that's useful to end users (authors of CPython extension modules) we
want to be testing against real Python headers.

As I understand it, https://peps.python.org/pep-0394/ allows for
distributors of Python to symlink "python3-config" in the PATH to a
python3.X-config script (for some X).

So on such systems running:
  python3-config --includes
should emit the correct -I option.  On my box it emits:

-I/usr/include/python3.8 -I/usr/include/python3.8

It's more complicated, but I believe:
  python3-config --cflags
should emit the build flags that C/C++ extensions ought to use when
building.  On my box this emits:

-I/usr/include/python3.8 -I/usr/include/python3.8 -Wno-unused-result
-Wsign-compare -O2 -g -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
-fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
-D_GNU_SOURCE -fPIC -fwrapv -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG
-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
-Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
-grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables
-fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv

and it's likely going to vary from distribution to distribution.  Some
of those options *are* going to affect the gimple that -fanalyzer
"sees".

Does your installation of Python have such a script?

So in the short term you could hack in a minimal subset of the
decls/defns from Python.h, but I'd prefer it if target-supports.exp
gained a DejaGnu directive that invokes python3-config, captures the
result (or fails with UNSUPPORTED for systems without python3
development headers), and then adds the result to the build flags of
the file being tested.  The .exp files are implemented in Tcl, alas;
let me know if you want help with that.
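
For illustration, here is a minimal plain-Tcl sketch of the capture step. It assumes python3-config is on the PATH and uses Tcl's built-in exec rather than DejaGnu's remote_exec, so it runs under tclsh but is not how target-supports.exp itself would invoke the script; the proc name python3_config_flags is hypothetical.

```tcl
# Hypothetical helper (not part of GCC's testsuite): run python3-config,
# or another command when testing, with one option and return its output,
# or the string "UNSUPPORTED" if that fails.
proc python3_config_flags {{opt --includes} {cmd python3-config}} {
    # exec raises an error if the program cannot be found, exits with a
    # nonzero status, or writes to stderr; catch returns 1 in that case.
    if {[catch {exec $cmd $opt} output]} {
        return "UNSUPPORTED"
    }
    # exec already strips one trailing newline; trim any other stray
    # whitespace so the result can be appended to a list of flags.
    return [string trim $output]
}

# Usage: prints the -I options on a host with Python headers installed,
# or UNSUPPORTED otherwise.
puts [python3_config_flags --includes]
```

Passing --cflags instead of --includes would return the fuller, distribution-specific set of flags shown above.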

Dave


* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-08-04 15:39                 ` David Malcolm
@ 2023-08-04 20:48                   ` Eric Feng
  2023-08-04 22:42                     ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-04 20:48 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

On Fri, Aug 4, 2023 at 11:39 AM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Fri, 2023-08-04 at 11:02 -0400, Eric Feng wrote:
> > Hi Dave,
> >
> > Tests related to our plugin which depend on Python-specific
> > definitions have been run by including /* { dg-options "-fanalyzer
> > -I/usr/include/python3.9" } */. This is undoubtedly not ideal; is it
> > best to approach this problem by adapting a subset of relevant
> > definitions like in gil.h?
>
> That might be acceptable in the very short-term, but to create a plugin
> that's useful to end-user (authors of CPython extension modules) we
> want to be testing against real Python headers.
>
> As I understand it, https://peps.python.org/pep-0394/ allows for
> distributors of Python to symlink "python3-config" in the PATH to a
> python3.X-config script (for some X).
>
> So on such systems running:
>   python3-config --includes
> should emit the correct -I option.  On my box it emits:
>
> -I/usr/include/python3.8 -I/usr/include/python3.8
>
>
> It's more complicated, but I believe:
>   python3-config --cflags
> should emit the build flags that C/C++ extensions ought to use when
> building.  On my box this emits:
>
> -I/usr/include/python3.8 -I/usr/include/python3.8 -Wno-unused-result
> -Wsign-compare -O2 -g -pipe -Wall -Werror=format-security
> -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
> -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic
> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> -D_GNU_SOURCE -fPIC -fwrapv -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG
> -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
> -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables
> -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv
>
> and it's likely going to vary from distribution to distribution.  Some
> of those options *are* going to affect the gimple that -fanalyzer
> "sees".
>
> Does your installation of Python have such a script?
>
> So in the short term you could hack in a minimal subset of the
> decls/defns from Python.h, but I'd prefer it if target-supports.exp
> gained a DejaGnu directive that invokes python3-config, captures the
> result (or fails with UNSUPPORTED for systems without python3
> development headers), and then adds the result to the build flags of
> the file being tested.  The .exp files are implemented in Tcl, alas;
> let me know if you want help with that.
>
> Dave

Sounds good; thanks! Following existing examples in
target-supports.exp, the following works as expected for extracting
the build flags we are interested in.

In target-supports.exp:
proc check_python_flags { } {
    set result [remote_exec host "python3-config --cflags"]
    set status [lindex $result 0]
    if { $status == 0 } {
        return [lindex $result 1]
    } else {
        return "UNSUPPORTED"
    }
}

However, I'm having some trouble figuring out how to add those build
flags to our test cases. My intuition is something like the
following:

In plugin.exp:
foreach plugin_test $plugin_test_list {
    if {[lindex $plugin_test 0] eq "analyzer_cpython_plugin.c"} {
        set python_flags [check_python_flags]
        if { $python_flags ne "UNSUPPORTED" } {
           # append $python_flags to build flags here
        }
    }
....
}

How might we do so?

* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-08-04 20:48                   ` Eric Feng
@ 2023-08-04 22:42                     ` David Malcolm
  2023-08-04 22:46                       ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-04 22:42 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Fri, 2023-08-04 at 16:48 -0400, Eric Feng wrote:
> Sounds good; thanks! Following existing examples in
> target-supports.exp, the following works as expected in terms of
> extracting the build flags we are interested in.
> 
> In target-supports.exp:
> proc check_python_flags { } {
>     set result [remote_exec host "python3-config --cflags"]
>     set status [lindex $result 0]
>     if { $status == 0 } {
>         return [lindex $result 1]
>     } else {
>         return "UNSUPPORTED"
>     }
> }
> 
> However, I'm having some trouble figuring out the specifics as to how
> we may add the build flags to our test cases. My intuition looks like
> something like the following:
> 
> In plugin.exp:
> foreach plugin_test $plugin_test_list {
>     if {[lindex $plugin_test 0] eq "analyzer_cpython_plugin.c"} {
>         set python_flags [check_python_flags]
>         if { $python_flags ne "UNSUPPORTED" } {
>            // append $python_flags to build flags here
>         }
>     }
> ....
> }
> 
> How might we do so?

Good question.

Looking at plugin.exp I see it uses plugin-test-execute, which is
defined in gcc/testsuite/lib/plugin-support.exp.

Looking there, I see it attempts to build the plugin, and then if it
succeeds, it calls 
  dg-runtest $plugin_tests $plugin_enabling_flags $default_flags
where $plugin_tests is the list of source files to be compiled using
the plugin.  So one way to do this would be to modify that code from
plugin.exp to pass in a different value, rather than $default_flags. 
Though it seems hackish to special-case this.

As another way, that avoids adding special-casing to plugin.exp,
there's an existing directive:
   dg-additional-options
implemented in gcc/testsuite/lib/gcc-defs.exp which appends options to
the default options.  Unfortunately, it works via:
    upvar dg-extra-tool-flags extra-tool-flags
which is a nasty Tcl hack meaning: access the local variable named
"dg-extra-tool-flags" in *the frame above*, referring to it as
"extra-tool-flags".  (This is why I don't like Tcl.)
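
To make the upvar behavior concrete, here is a small self-contained Tcl demo that runs under plain tclsh. The procs append-flags-directive and fake-test-driver are made up; they only imitate the shape of the DejaGnu code, with a caller that owns dg-extra-tool-flags and a callee that appends to it through an alias.

```tcl
# Made-up procs that only imitate the shape of the DejaGnu code.
proc append-flags-directive { args } {
    # Bind the local name "extra-tool-flags" to the caller's variable
    # "dg-extra-tool-flags" (upvar defaults to one frame up).
    upvar dg-extra-tool-flags extra-tool-flags
    # Appending through the alias modifies the caller's variable.
    lappend extra-tool-flags {*}$args
}

proc fake-test-driver {} {
    set dg-extra-tool-flags {-fanalyzer}
    append-flags-directive -I/usr/include/python3.8
    # Braces are needed because the variable name contains hyphens.
    return ${dg-extra-tool-flags}
}

puts [fake-test-driver]
# prints: -fanalyzer -I/usr/include/python3.8
```

The callee never receives the flags variable as an argument; it reaches into whichever frame called it, which is why the directive only works when invoked from the right place on the stack.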

So I think what could be done is to invoke your "check_python_flags"
test as a directive from the test case, so that in target-supports.exp
you'd have something like:

  proc dg-require-python-h { args } { ... }

which could do the invocation/output-capture of python3-config, and
would also have code similar to that in dg-additional-options to append
to the options (or it could possibly just call dg-additional-options
provided there's an "upvar" before the callsite to make the nested
stack manipulation work).

The individual test cases could then have:

  /* { dg-require-python-h } */

in them.

That way the Tcl stack at the point where the new directive runs should
be similar enough to how dg-additional-options gets run for similar
option-injection code to work (yuck!).
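
A rough sketch of what such a directive might look like, built on assumptions rather than verified DejaGnu behavior: that it runs one frame below the code owning dg-extra-tool-flags and dg-do-what (as dg-additional-options does), that setting the second element of dg-do-what to "N" marks the test UNSUPPORTED, and that plain exec can stand in for remote_exec. The fake-dg-test proc is a hypothetical stand-in for the real harness, included only so the sketch runs under tclsh.

```tcl
# Hypothetical sketch, not working DejaGnu code.  Assumptions:
#  1. Like dg-additional-options, the directive runs with the frame that
#     owns "dg-extra-tool-flags" and "dg-do-what" one level up.
#  2. Setting the second element of dg-do-what to "N" makes the harness
#     report the test as UNSUPPORTED (the pattern some existing
#     dg-require-* procs appear to use).
#  3. Plain exec stands in for DejaGnu's remote_exec.
proc dg-require-python-h { args } {
    upvar dg-extra-tool-flags extra-tool-flags
    upvar dg-do-what dg-do-what

    if {[catch {exec python3-config --includes} python_flags]} {
        # No usable python3-config: skip the test instead of failing it.
        set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
        return
    }
    # {*} splits the captured string into separate -I options.
    lappend extra-tool-flags {*}$python_flags
}

# Stand-in for the harness frame that evaluates the directives.
proc fake-dg-test {} {
    set dg-extra-tool-flags {-fanalyzer}
    set dg-do-what [list compile "" P]
    # Assumed: directives get the source line number as their first
    # argument, hence "{ args }" above rather than "{}".
    dg-require-python-h 4
    return [list ${dg-extra-tool-flags} ${dg-do-what}]
}

puts [fake-dg-test]
```

The output depends on whether python3-config is installed on the host running the sketch.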

Maybe someone else on the list can see a less hackish way to get this
to work?

Let me know if any of the above is unclear.
Dave







* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-08-04 22:42                     ` David Malcolm
@ 2023-08-04 22:46                       ` David Malcolm
  2023-08-07 18:31                         ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-04 22:46 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Fri, 2023-08-04 at 18:42 -0400, David Malcolm wrote:
> On Fri, 2023-08-04 at 16:48 -0400, Eric Feng wrote:
> > Sounds good; thanks! Following existing examples in
> > target-supports.exp, the following works as expected in terms of
> > extracting the build flags we are interested in.
> > 
> > In target-supports.exp:
> > proc check_python_flags { } {
> >     set result [remote_exec host "python3-config --cflags"]
> >     set status [lindex $result 0]
> >     if { $status == 0 } {
> >         return [lindex $result 1]
> >     } else {
> >         return "UNSUPPORTED"
> >     }
> > }
> > 
> > However, I'm having some trouble figuring out the specifics as to
> > how
> > we may add the build flags to our test cases. My intuition looks
> > like
> > something like the following:
> > 
> > In plugin.exp:
> > foreach plugin_test $plugin_test_list {
> >     if {[lindex $plugin_test 0] eq "analyzer_cpython_plugin.c"} {
> >         set python_flags [check_python_flags]
> >         if { $python_flags ne "UNSUPPORTED" } {
> >            // append $python_flags to build flags here
> >         }
> >     }
> > ....
> > }
> > 
> > How might we do so?
> 
> Good question.
> 
> Looking at plugin.exp I see it uses plugin-test-execute, which is
> defined in gcc/testsuite/lib/plugin-support.exp.
> 
> Looking there, I see it attempts to build the plugin, and then if it
> succeeds, it calls 
>   dg-runtest $plugin_tests $plugin_enabling_flags $default_flags
> where $plugin_tests is the list of source files to be compiled using
> the plugin.  So one way to do this would be to modify that code from
> plugin.exp to pass in a different value, rather than $default_flags. 
> Though it seems hackish to special-case this.

Sorry, I think I misspoke here; that line that uses $default_flags is
from plugin-support.exp, not from plugin.exp; $default_flags is a
global variable.

So I think my 2nd approach below may be the one to try:

> 
> As another way, that avoids adding special-casing to plugin.exp,
> there's an existing directive:
>    dg-additional-options
> implemented in gcc/testsuite/lib/gcc-defs.exp which appends options
> to the default options.  Unfortunately, it works via:
>     upvar dg-extra-tool-flags extra-tool-flags
> which is a nasty Tcl hack meaning access the local variable named
> "dg-extra-tool-flags" in *the frame above*, referring to it as
> "extra-tool-flags".  (this is why I don't like Tcl)
> 
> So I think what could be done is to invoke your "check_python_flags"
> test as a directive from the test case, so that in target-supports.exp
> you'd have something like:
> 
>   proc dg-require-python-h {} {
> 
> which could do the invocation/output-capture of python3-config, and
> would also have code similar to that in dg-additional-options to
> append
> to the options (or it could possibly just call dg-additional-options
> provided there's an "upvar" before the callsite to make the nested
> stack manipulation work).
> 
> The individual test cases could then have:
> 
>   /* { dg-require-python-h } */
> 
> in them.
> 
> That way the Tcl stack at the point where the new directive runs
> should
> be similar enough to how dg-additional-options gets run for similar
> option-injection code to work (yuck!).
> 
> Maybe someone else on the list can see a less hackish way to get this
> to work?
> 
> Let me know if any of the above is unclear.
> Dave

* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-08-04 22:46                       ` David Malcolm
@ 2023-08-07 18:31                         ` Eric Feng
  2023-08-07 23:16                           ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-07 18:31 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

On Fri, Aug 4, 2023 at 6:46 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Fri, 2023-08-04 at 18:42 -0400, David Malcolm wrote:
> > On Fri, 2023-08-04 at 16:48 -0400, Eric Feng wrote:
> > > Sounds good; thanks! Following existing examples in
> > > target-supports.exp, the following works as expected in terms of
> > > extracting the build flags we are interested in.
> > >
> > > In target-supports.exp:
> > > proc check_python_flags { } {
> > >     set result [remote_exec host "python3-config --cflags"]
> > >     set status [lindex $result 0]
> > >     if { $status == 0 } {
> > >         return [lindex $result 1]
> > >     } else {
> > >         return "UNSUPPORTED"
> > >     }
> > > }
> > >
> > > However, I'm having some trouble figuring out the specifics as to
> > > how
> > > we may add the build flags to our test cases. My intuition looks
> > > like
> > > something like the following:
> > >
> > > In plugin.exp:
> > > foreach plugin_test $plugin_test_list {
> > >     if {[lindex $plugin_test 0] eq "analyzer_cpython_plugin.c"} {
> > >         set python_flags [check_python_flags]
> > >         if { $python_flags ne "UNSUPPORTED" } {
> > >            // append $python_flags to build flags here
> > >         }
> > >     }
> > > ....
> > > }
> > >
> > > How might we do so?
> >
> > Good question.
> >
> > Looking at plugin.exp I see it uses plugin-test-execute, which is
> > defined in gcc/testsuite/lib/plugin-support.exp.
> >
> > Looking there, I see it attempts to build the plugin, and then if it
> > succeeds, it calls
> >   dg-runtest $plugin_tests $plugin_enabling_flags $default_flags
> > where $plugin_tests is the list of source files to be compiled using
> > the plugin.  So one way to do this would be to modify that code from
> > plugin.exp to pass in a different value, rather than $default_flags.
> > Though it seems hackish to special-case this.
>
> Sorry, I think I misspoke here; that line that uses $default_flags is
> from plugin-support.exp, not from plugin.exp; $default_flags is a
> global variable.
>
> So I think my 2nd approach below may be the one to try:
>
> >
> > As another way, that avoids adding special-casing to plugin.exp,
> > there's an existing directive:
> >    dg-additional-options
> > implemented in gcc/testsuite/lib/gcc-defs.exp which appends options
> > to the default options.  Unfortunately, it works via:
> >     upvar dg-extra-tool-flags extra-tool-flags
> > which is a nasty Tcl hack meaning access the local variable named
> > "dg-extra-tool-flags" in *the frame above*, referring to it as
> > "extra-tool-flags".  (this is why I don't like Tcl)
> >
> > So I think what could be done is to invoke your "check_python_flags"
> > test as a directive from the test case, so that in target-supports.exp
> > you'd have something like:
> >
> >   proc dg-require-python-h {} {
> >
> > which could do the invocation/output-capture of python3-config, and
> > would also have code similar to that in dg-additional-options to
> > append
> > to the options (or it could possibly just call dg-additional-options
> > provided there's an "upvar" before the callsite to make the nested
> > stack manipulation work).
> >
> > The individual test cases could then have:
> >
> >   /* { dg-require-python-h } */
> >
> > in them.
> >
> > That way the Tcl stack at the point where the new directive runs
> > should
> > be similar enough to how dg-additional-options gets run for similar
> > option-injection code to work (yuck!).

Gotcha, thanks for the tip! I've been trying to make this approach
work, but despite trying many things, just having
/* { dg-require-python-h } */ in the relevant test cases does not seem
to invoke proc dg-require-python-h {} { ... } in target-supports.exp.
Am I missing something? Do we also need to "register" it somewhere
for it to be recognized as a directive? For context, I was previously
able to see the results of "check_python_flags" (i.e., the output of
python3-config) by invoking it in plugin.exp.
> >
> > Maybe someone else on the list can see a less hackish way to get this
> > to work?
> >
> > Let me know if any of the above is unclear.
> > Dave
> >
> >
> >
> >
> >
> >
> > > >
> > > >
> > > > >
> > > > > Best,
> > > > > Eric
> > > > >
> > > > > On Tue, Aug 1, 2023 at 1:06 PM David Malcolm
> > > > > <dmalcolm@redhat.com>
> > > > > wrote:
> > > > > >
> > > > > > On Tue, 2023-08-01 at 09:57 -0400, Eric Feng wrote:
> > > > > > > >
> > > > > > > > My guess is that you were trying to do it from the
> > > > > > > > PLUGIN_ANALYZER_INIT hook rather than from the plugin_init
> > > > > > > > function, but it's hard to be sure without seeing the code.
> > > > > > > >
> > > > > > >
> > > > > > > Thanks Dave, you are entirely right; I made the mistake of
> > > > > > > trying to do it from the PLUGIN_ANALYZER_INIT hook and not from
> > > > > > > the plugin_init function. After following your suggestion, the
> > > > > > > callbacks are getting registered as expected.
> > > > > >
> > > > > > Ah, good.
> > > > > >
> > > > > > > I submitted a patch to review for this feature
> > > > > > > on gcc-patches; please let me know if it looks OK.
> > > > > >
> > > > > > Thanks Eric; I've posted a reply to your email there, so let's
> > > > > > discuss the details there.
> > > > > >
> > > > > > Dave
> > > > > >
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Update and Questions on CPython Extension Module -fanalyzer plugin development
  2023-08-07 18:31                         ` Eric Feng
@ 2023-08-07 23:16                           ` David Malcolm
  2023-08-08 16:51                             ` [PATCH] WIP for dg-require-python-h [PR107646] Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-07 23:16 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Mon, 2023-08-07 at 14:31 -0400, Eric Feng wrote:
> On Fri, Aug 4, 2023 at 6:46 PM David Malcolm <dmalcolm@redhat.com>
> wrote:
> > 
> > On Fri, 2023-08-04 at 18:42 -0400, David Malcolm wrote:
> > > On Fri, 2023-08-04 at 16:48 -0400, Eric Feng wrote:
> > > > On Fri, Aug 4, 2023 at 11:39 AM David Malcolm
> > > > <dmalcolm@redhat.com>
> > > > wrote:
> > > > > 
> > > > > On Fri, 2023-08-04 at 11:02 -0400, Eric Feng wrote:
> > > > > > Hi Dave,
> > > > > > 
> > > > > > Tests related to our plugin which depend on Python-specific
> > > > > > definitions have been run by including
> > > > > > /* { dg-options "-fanalyzer -I/usr/include/python3.9" } */.
> > > > > > This is undoubtedly not ideal; is it best to approach this
> > > > > > problem by adapting a subset of relevant definitions like in
> > > > > > gil.h?
> > > > > 
> > > > > That might be acceptable in the very short term, but to create a
> > > > > plugin that's useful to end users (authors of CPython extension
> > > > > modules) we want to be testing against real Python headers.
> > > > > 
> > > > > As I understand it, https://peps.python.org/pep-0394/ allows for
> > > > > distributors of Python to symlink "python3-config" in the PATH to
> > > > > a python3.X-config script (for some X).
> > > > > 
> > > > > So on such systems running:
> > > > >   python3-config --includes
> > > > > should emit the correct -I option.  On my box it emits:
> > > > > 
> > > > > -I/usr/include/python3.8 -I/usr/include/python3.8
> > > > > 
> > > > > 
> > > > > It's more complicated, but I believe:
> > > > >   python3-config --cflags
> > > > > should emit the build flags that C/C++ extensions ought to use
> > > > > when building.  On my box this emits:
> > > > > 
> > > > > -I/usr/include/python3.8 -I/usr/include/python3.8  -Wno-unused-result
> > > > > -Wsign-compare  -O2 -g -pipe -Wall -Werror=format-security
> > > > > -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
> > > > > -fstack-protector-strong -grecord-gcc-switches   -m64 -mtune=generic
> > > > > -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> > > > > -D_GNU_SOURCE -fPIC -fwrapv  -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG
> > > > > -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> > > > > -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
> > > > > -grecord-gcc-switches   -m64 -mtune=generic -fasynchronous-unwind-tables
> > > > > -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv
> > > > > 
> > > > > and it's likely going to vary from distribution to distribution.
> > > > > Some of those options *are* going to affect the gimple that
> > > > > -fanalyzer "sees".
> > > > > 
> > > > > Does your installation of Python have such a script?
> > > > > 
> > > > > So in the short term you could hack in a minimal subset of the
> > > > > decls/defns from Python.h, but I'd prefer it if
> > > > > target-supports.exp gained a DejaGnu directive that invokes
> > > > > python3-config, captures the result (or fails with UNSUPPORTED
> > > > > for systems without python3 development headers), and then adds
> > > > > the result to the build flags of the file being tested.  The .exp
> > > > > files are implemented in Tcl, alas; let me know if you want help
> > > > > with that.
> > > > > 
> > > > > Dave
> > > > Sounds good; thanks! Following existing examples in
> > > > target-supports.exp, the following works as expected in terms of
> > > > extracting the build flags we are interested in.
> > > > 
> > > > In target-supports.exp:
> > > > proc check_python_flags { } {
> > > >     set result [remote_exec host "python3-config --cflags"]
> > > >     set status [lindex $result 0]
> > > >     if { $status == 0 } {
> > > >         return [lindex $result 1]
> > > >     } else {
> > > >         return "UNSUPPORTED"
> > > >     }
> > > > }
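For comparison, the same invoke, capture, or report-UNSUPPORTED logic as check_python_flags can be sketched in C with POSIX popen/pclose. This is only an illustration under stated assumptions: capture_python_config is a hypothetical helper name, it runs python3-config on the local machine only (unlike DejaGnu's remote_exec, which targets the configured host), and it returns NULL as the counterpart of "UNSUPPORTED" when the command is missing or exits with a non-zero status.

```c
#define _POSIX_C_SOURCE 200809L /* for popen and pclose */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: run "python3-config OPTION" and return a
   heap-allocated copy of its standard output (at most 4095 bytes,
   trailing whitespace stripped), or NULL if the command cannot be run
   or exits with a non-zero status.  The caller must free the result.  */
char *
capture_python_config (const char *option)
{
  char cmd[256];
  if (snprintf (cmd, sizeof cmd, "python3-config %s 2>/dev/null", option)
      >= (int) sizeof cmd)
    return NULL; /* option string too long for the command buffer */

  FILE *pipe = popen (cmd, "r");
  if (!pipe)
    return NULL;

  char buf[4096];
  size_t len = fread (buf, 1, sizeof buf - 1, pipe);
  buf[len] = '\0';

  /* pclose returns the shell's wait status; a missing command typically
     makes the shell exit with status 127, which is rejected here.  */
  int status = pclose (pipe);
  if (status == -1 || !WIFEXITED (status) || WEXITSTATUS (status) != 0)
    return NULL;

  /* Drop the trailing newline (and any trailing blanks).  */
  while (len > 0 && isspace ((unsigned char) buf[len - 1]))
    buf[--len] = '\0';

  char *result = malloc (len + 1);
  if (result)
    memcpy (result, buf, len + 1);
  return result;
}
```

On a system like the one described earlier in the thread, capture_python_config ("--includes") would be expected to return a string of -I options such as -I/usr/include/python3.8; on systems without python3-config it returns NULL, which is where the Tcl version would report UNSUPPORTED.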
> > > > 
> > > > However, I'm having some trouble figuring out the specifics of
> > > > how we may add the build flags to our test cases. My intuition is
> > > > something like the following:
> > > > 
> > > > In plugin.exp:
> > > > foreach plugin_test $plugin_test_list {
> > > >     if {[lindex $plugin_test 0] eq "analyzer_cpython_plugin.c"} {
> > > >         set python_flags [check_python_flags]
> > > >         if { $python_flags ne "UNSUPPORTED" } {
> > > >             # TODO: append $python_flags to the build flags here
> > > >         }
> > > >     }
> > > > ....
> > > > }
> > > > 
> > > > How might we do so?
> > > 
> > > Good question.
> > > 
> > > Looking at plugin.exp I see it uses plugin-test-execute, which is
> > > defined in gcc/testsuite/lib/plugin-support.exp.
> > > 
> > > Looking there, I see it attempts to build the plugin, and then if it
> > > succeeds, it calls
> > >   dg-runtest $plugin_tests $plugin_enabling_flags $default_flags
> > > where $plugin_tests is the list of source files to be compiled using
> > > the plugin.  So one way to do this would be to modify that code from
> > > plugin.exp to pass in a different value, rather than $default_flags.
> > > Though it seems hackish to special-case this.
> > 
> > Sorry, I think I misspoke here; that line that uses $default_flags is
> > from plugin-support.exp, not from plugin.exp; $default_flags is a
> > global variable.
> > 
> > So I think my 2nd approach below may be the one to try:
> > 
> > > 
> > > As another way, that avoids adding special-casing to plugin.exp,
> > > there's an existing directive:
> > >    dg-additional-options
> > > implemented in gcc/testsuite/lib/gcc-defs.exp which appends options
> > > to the default options.  Unfortunately, it works via:
> > >     upvar dg-extra-tool-flags extra-tool-flags
> > > which is a nasty Tcl hack meaning access the local variable named
> > > "dg-extra-tool-flags" in *the frame above*, referring to it as
> > > "extra-tool-flags".  (this is why I don't like Tcl)
> > > 
> > > So I think what could be done is to invoke your "check_python_flags"
> > > test as a directive from the test case, so that in
> > > target-supports.exp you'd have something like:
> > > 
> > >   proc dg-require-python-h {} {
> > > 
> > > which could do the invocation/output-capture of python3-config, and
> > > would also have code similar to that in dg-additional-options to
> > > append to the options (or it could possibly just call
> > > dg-additional-options provided there's an "upvar" before the
> > > callsite to make the nested stack manipulation work).
> > > 
> > > The individual test cases could then have:
> > > 
> > >   /* { dg-require-python-h } */
> > > 
> > > in them.
> > > 
> > > That way the Tcl stack at the point where the new directive runs
> > > should be similar enough to how dg-additional-options gets run for
> > > similar option-injection code to work (yuck!).
> Gotcha, thanks for the tip! I've been trying to make this approach
> work, but despite trying many things, just having
> /* { dg-require-python-h } */ in the relevant test cases does not seem
> to invoke proc dg-require-python-h {} { ... } in target-supports.exp.
> Am I missing something? Do we also need to "register" it somewhere
> for the command to be recognized? For context, I was previously able
> to see the results of "check_python_flags" (i.e. the output of
> python3-config) by invoking it in plugin.exp.

My understanding is that ultimately we call into DejaGnu's dg.exp's
dg-get-options, which has this grep:

    set tmp [grep $prog "{\[ \t\]\+dg-\[-a-z\]\+\[ \t\]\+.*\[ \t\]\+}" line]

to find directives in the source file, the first part of which are
function names.  plugin.exp has a load_lib of target-supports.exp, so
it *should* know about the dg-require-python-h function.
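One detail of that pattern is worth checking. Assuming Tcl's regexp and POSIX extended regular expressions agree on this simple pattern, it requires whitespace both before and after the .* that follows the directive name, i.e. at least one argument after the name. The sketch below compiles that translation with regcomp/regexec; looks_like_directive is a hypothetical helper, not part of DejaGnu. If the translation is faithful, a bare /* { dg-require-python-h } */ line would never be recognized as a directive, whereas one with a dummy argument such as "" would.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* POSIX ERE translation (an assumption) of the pattern that
   dg-get-options passes to grep:
     {[ \t]+dg-[-a-z]+[ \t]+.*[ \t]+}
   The braces are written as bracket expressions so that ERE cannot
   mistake them for an interval operator.  */
static const char directive_re[] = "[{][ \t]+dg-[-a-z]+[ \t]+.*[ \t]+[}]";

/* Hypothetical helper: return true if LINE contains text that the
   pattern above would pick up as a dg- directive.  */
bool
looks_like_directive (const char *line)
{
  regex_t re;
  if (regcomp (&re, directive_re, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

For example, looks_like_directive ("/* { dg-require-python-h } */") returns false because only a single space separates the name from the closing brace, while the same line written as { dg-require-python-h "" } matches, as does an ordinary { dg-options "-fanalyzer" } line.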

Have a look in the .log for lines that say "ERROR", which can appear if
the syntax of a directive is wrong.  Failing that, please post a patch
with the work-in-progress version of the various parts of this that
you're using, so I can see exactly what you're doing, and I can have a
go at debugging it.

Thanks
Dave



^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH] WIP for dg-require-python-h [PR107646]
  2023-08-07 23:16                           ` David Malcolm
@ 2023-08-08 16:51                             ` Eric Feng
  2023-08-08 18:08                               ` David Malcolm
  2023-08-08 18:51                               ` David Malcolm
  0 siblings, 2 replies; 50+ messages in thread
From: Eric Feng @ 2023-08-08 16:51 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, Eric Feng

Unfortunately, there don’t seem to be any ERRORs in the .log, nor any output from the debug print statements I’ve scattered within proc dg-require-python-h. I’ve attached the WIP below; thank you! Please note that in this version of the patch, I’ve removed the other (non-Python) test cases in plugin.exp for convenience.

Aside from the issues with dg-require-python-h, everything works as expected (when using /* { dg-options "-fanalyzer -I/usr/include/python3.9" } */). The patch includes support for PyList_New, PyLong_FromLong, and PyList_Append, as well as the optional parameters for get_or_create_region_for_heap_alloc that we previously discussed. I will submit the version of the patch sans dg-require-python-h to gcc-patches for review as soon as I confirm that regtests pass as expected; perhaps we can first push these changes to trunk and later push a separate patch for dg-require-python-h.

Best,
Eric
---
This patch adds known function subclasses for the following Python/C
API: PyList_New, PyLong_FromLong, PyList_Append. It also adds new
optional parameters to region_model::get_or_create_region_for_heap_alloc
so that, if desired, the newly allocated region can transition immediately
from the start state to the assumed non-null state in the malloc state
machine.

The main warnings gained from the known function subclasses mentioned
above are leak-related. For example:

rc3.c: In function ‘create_py_object’:
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
   21 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(10);
    |      |                    ^~~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) allocated here
    |      |                    (2) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(2);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (3) when ‘PyList_New’ fails
    |......
    |   21 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) ‘item’ leaks here; was allocated at (1)
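For readers without the Python headers at hand, the same leak shape can be reproduced with plain malloc. In this sketch, make_long and make_list are hypothetical stand-ins for PyLong_FromLong and PyList_New (both fallible allocators): if the second allocation fails, the early return drops the only pointer to the first one, which is the path events (1) to (4) above walk through. Running gcc -fanalyzer on create_object would be expected to produce a similar -Wanalyzer-malloc-leak warning (not verified here); create_object_fixed releases item on the failure path.

```c
#include <stdlib.h>

/* Hypothetical stand-in for PyLong_FromLong: a fallible allocation.  */
long *
make_long (long value)
{
  long *obj = malloc (sizeof *obj);
  if (obj)
    *obj = value;
  return obj;
}

/* Hypothetical stand-in for PyList_New: a fallible allocation of N slots.  */
void **
make_list (size_t n)
{
  return calloc (n, sizeof (void *));
}

/* Same shape as the diagnostic above: if make_list fails, 'item' leaks.  */
void **
create_object (void)
{
  long *item = make_long (10);  /* allocated here */
  void **list = make_list (2);
  if (!list)
    return list;                /* 'item' leaks on this path */
  list[0] = item;
  return list;
}

/* Fixed variant: release 'item' before bailing out.  */
void **
create_object_fixed (void)
{
  long *item = make_long (10);
  void **list = make_list (2);
  if (!list)
    {
      free (item);  /* free (NULL) is a no-op, so no extra check is needed */
      return NULL;
    }
  list[0] = item;
  return list;
}
```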

Some concessions were made to simplify the analysis compared with the
real implementation of PyList_Append. In particular, PyList_Append
internally tries to avoid calls to realloc where possible; for
simplicity, kf_PyList_Append assumes that realloc is called every time.
It also grows the size by exactly 1 (enough space for the new element)
rather than following the growth heuristics of the actual
implementation.
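To make those simplifications concrete, here is a plain C sketch of the behaviour the model assumes, independent of the analyzer API. toy_object, toy_list and toy_list_append are hypothetical stand-ins, not CPython's types; the three outcomes that impl_call_post models (realloc failure, success without moving the buffer, success with moving it) collapse here into a single call to C's realloc, which may or may not move the buffer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PyObject and PyListObject, keeping only the
   fields the simplified model touches.  */
typedef struct toy_object
{
  ptrdiff_t ob_refcnt;
} toy_object;

typedef struct toy_list
{
  toy_object ob_base;
  ptrdiff_t ob_size;     /* number of stored items */
  toy_object **ob_item;  /* heap buffer of item pointers */
} toy_list;

/* Simplified append following the concessions described above: realloc
   is called on every append and the buffer grows by exactly one slot.
   Returns 0 on success and -1 on failure, like PyList_Append.  */
int
toy_list_append (toy_list *list, toy_object *newitem)
{
  if (!newitem)
    return -1;  /* the patch's impl_call_pre requires a non-NULL item */

  size_t new_size = (size_t) list->ob_size + 1;
  toy_object **buf = realloc (list->ob_item, new_size * sizeof *buf);
  if (!buf)
    {
      /* C's realloc leaves the old buffer untouched on failure; the
         patch's failure edge instead marks it freed and sets ob_item
         to NULL.  */
      return -1;
    }

  list->ob_item = buf;
  buf[list->ob_size] = newitem;  /* slot at byte offset ob_size * sizeof (ptr) */
  list->ob_size = (ptrdiff_t) new_size;
  newitem->ob_refcnt++;          /* the list now holds a new reference */
  return 0;
}
```

Growing by one slot per call makes appends quadratic in practice, which is why the real implementation over-allocates; for the analyzer, the simpler rule keeps the modelled buffer size an exact function of ob_size.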

gcc/analyzer/ChangeLog:
  PR analyzer/107646
	* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
  New optional parameters.
	* region-model.h (class region_model): Likewise.
	* sm-malloc.cc (malloc_state_machine::move_ptr_sval_non_null): New function.
	(region_model::move_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
  PR analyzer/107646
	* gcc.dg/plugin/analyzer_cpython_plugin.c: New features for plugin.
	* gcc.dg/plugin/plugin.exp: New test.
	* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng <ef2648@columbia.edu>
---
 gcc/analyzer/region-model.cc                  |  15 +-
 gcc/analyzer/region-model.h                   |  10 +-
 gcc/analyzer/sm-malloc.cc                     |  40 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 710 ++++++++++++++++++
 .../gcc.dg/plugin/cpython-plugin-test-2.c     |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp        | 107 +--
 gcc/testsuite/lib/target-supports.exp         |  25 +
 7 files changed, 876 insertions(+), 109 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index e92b3f7b074..c53446b2afc 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5131,7 +5131,9 @@ region_model::check_dynamic_size_for_floats (const svalue *size_in_bytes,
 
 const region *
 region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
-						   region_model_context *ctxt)
+       region_model_context *ctxt,
+       bool register_alloc,
+       const call_details *cd)
 {
   /* Determine which regions are referenced in this region_model, so that
      we can reuse an existing heap_allocated_region if it's not in use on
@@ -5153,6 +5155,17 @@ region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
   if (size_in_bytes)
     if (compat_types_p (size_in_bytes->get_type (), size_type_node))
       set_dynamic_extents (reg, size_in_bytes, ctxt);
+
+	if (register_alloc && cd)
+		{
+			const svalue *ptr_sval = nullptr;
+			if (cd->get_lhs_type ())
+       ptr_sval = m_mgr->get_ptr_svalue (cd->get_lhs_type (), reg);
+			else
+       ptr_sval = m_mgr->get_ptr_svalue (NULL_TREE, reg);
+			move_ptr_sval_non_null (ctxt, ptr_sval);
+		}
+
   return reg;
 }
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 0cf38714c96..84c964fadc9 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -387,9 +387,9 @@ class region_model
 		       region_model_context *ctxt,
 		       rejected_constraint **out);
 
-  const region *
-  get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
-				       region_model_context *ctxt);
+  const region *get_or_create_region_for_heap_alloc (
+      const svalue *size_in_bytes, region_model_context *ctxt,
+      bool register_alloc = false, const call_details *cd = nullptr);
   const region *create_region_for_alloca (const svalue *size_in_bytes,
 					  region_model_context *ctxt);
   void get_referenced_base_regions (auto_bitmap &out_ids) const;
@@ -476,6 +476,10 @@ class region_model
 			     const svalue *old_ptr_sval,
 			     const svalue *new_ptr_sval);
 
+  /* Implemented in sm-malloc.cc.  */
+  void move_ptr_sval_non_null (region_model_context *ctxt,
+       const svalue *new_ptr_sval);
+
   /* Implemented in sm-taint.cc.  */
   void mark_as_tainted (const svalue *sval,
 			region_model_context *ctxt);
diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index a8c63eb1ce8..eb508768170 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -434,6 +434,10 @@ public:
 			     const svalue *new_ptr_sval,
 			     const extrinsic_state &ext_state) const;
 
+  void move_ptr_sval_non_null (region_model *model, sm_state_map *smap,
+       const svalue *new_ptr_sval,
+       const extrinsic_state &ext_state) const;
+
   standard_deallocator_set m_free;
   standard_deallocator_set m_scalar_delete;
   standard_deallocator_set m_vector_delete;
@@ -2504,6 +2508,16 @@ on_realloc_with_move (region_model *model,
 		   NULL, ext_state);
 }
 
+/* Hook for get_or_create_region_for_heap_alloc, for when we want to mark
+   NEW_PTR_SVAL (pointing to a new region) as assumed non-null in malloc SM.  */
+void
+malloc_state_machine::move_ptr_sval_non_null (
+    region_model *model, sm_state_map *smap, const svalue *new_ptr_sval,
+    const extrinsic_state &ext_state) const
+{
+  smap->set_state (model, new_ptr_sval, m_free.m_nonnull, NULL, ext_state);
+}
+
 } // anonymous namespace
 
 /* Internal interface to this file. */
@@ -2548,6 +2562,32 @@ region_model::on_realloc_with_move (const call_details &cd,
 				  *ext_state);
 }
 
+/* Moves ptr_sval from start to assumed non-null, for use by
+   region_model::get_or_create_region_for_heap_alloc.  */
+void
+region_model::move_ptr_sval_non_null (region_model_context *ctxt,
+const svalue *ptr_sval)
+{
+  if (!ctxt)
+    return;
+  const extrinsic_state *ext_state = ctxt->get_ext_state ();
+  if (!ext_state)
+    return;
+
+  sm_state_map *smap;
+  const state_machine *sm;
+  unsigned sm_idx;
+  if (!ctxt->get_malloc_map (&smap, &sm, &sm_idx))
+    return;
+
+  gcc_assert (smap);
+  gcc_assert (sm);
+
+  const malloc_state_machine &malloc_sm = (const malloc_state_machine &)*sm;
+
+  malloc_sm.move_ptr_sval_non_null (this, smap, ptr_sval, *ext_state);
+}
+
 } // namespace ana
 
 #endif /* #if ENABLE_ANALYZER */
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 9ecc42d4465..4d985620c01 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -55,6 +55,8 @@ static GTY (()) hash_map<tree, tree> *analyzer_stashed_globals;
 namespace ana
 {
 static tree pyobj_record = NULL_TREE;
+static tree pyobj_ptr_tree = NULL_TREE;
+static tree pyobj_ptr_ptr = NULL_TREE;
 static tree varobj_record = NULL_TREE;
 static tree pylistobj_record = NULL_TREE;
 static tree pylongobj_record = NULL_TREE;
@@ -76,6 +78,702 @@ get_field_by_name (tree type, const char *name)
   return NULL_TREE;
 }
 
+static const svalue *
+get_sizeof_pyobjptr (region_model_manager *mgr)
+{
+  tree size_tree = TYPE_SIZE_UNIT (pyobj_ptr_tree);
+  const svalue *sizeof_sval = mgr->get_or_create_constant_svalue (size_tree);
+  return sizeof_sval;
+}
+
+static bool
+arg_is_long_p (const call_details &cd, unsigned idx)
+{
+  return types_compatible_p (cd.get_arg_type (idx), long_integer_type_node);
+}
+
+class kf_PyList_Append : public known_function
+{
+public:
+  bool
+  matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == 2); // TODO: more checks here
+  }
+  void impl_call_pre (const call_details &cd) const final override;
+  void impl_call_post (const call_details &cd) const final override;
+};
+
+void
+kf_PyList_Append::impl_call_pre (const call_details &cd) const
+{
+  region_model_manager *mgr = cd.get_manager ();
+  region_model *model = cd.get_model ();
+
+  const svalue *pylist_sval = cd.get_arg_svalue (0);
+  const region *pylist_reg
+      = model->deref_rvalue (pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+  const svalue *newitem_sval = cd.get_arg_svalue (1);
+  const region *newitem_reg
+      = model->deref_rvalue (newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
+
+  // Skip checks if unknown etc
+  if (pylist_sval->get_kind () != SK_REGION
+      && pylist_sval->get_kind () != SK_CONSTANT)
+    return;
+
+  // PyList_Check
+  tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
+  const region *ob_type_region
+      = mgr->get_field_region (pylist_reg, ob_type_field);
+  const svalue *stored_sval
+      = model->get_store_value (ob_type_region, cd.get_ctxt ());
+  const region *pylist_type_region
+      = mgr->get_region_for_global (pylisttype_vardecl);
+  tree pylisttype_vardecl_ptr
+      = build_pointer_type (TREE_TYPE (pylisttype_vardecl));
+  const svalue *pylist_type_ptr
+      = mgr->get_ptr_svalue (pylisttype_vardecl_ptr, pylist_type_region);
+
+  if (stored_sval != pylist_type_ptr)
+    {
+      // TODO: emit diagnostic -Wanalyzer-type-error
+      cd.get_ctxt ()->terminate_path ();
+      return;
+    }
+
+  // Check that new_item is not null
+  {
+    const svalue *null_ptr
+        = mgr->get_or_create_int_cst (newitem_sval->get_type (), 0);
+    if (!model->add_constraint (newitem_sval, NE_EXPR, null_ptr,
+                                cd.get_ctxt ()))
+      {
+        // TODO: emit diagnostic here
+        cd.get_ctxt ()->terminate_path ();
+        return;
+      }
+  }
+}
+
+void
+kf_PyList_Append::impl_call_post (const call_details &cd) const
+{
+  /* Three custom subclasses of custom_edge_info, for handling the various
+     outcomes of "realloc".  */
+
+  /* Concrete custom_edge_info: a realloc call that fails, returning NULL.
+   */
+  class append_realloc_failure : public failed_call_info
+  {
+  public:
+    append_realloc_failure (const call_details &cd) : failed_call_info (cd) {}
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pylist_sval = cd.get_arg_svalue (0);
+      const region *pylist_reg = model->deref_rvalue (
+          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+      /* Identify ob_item field and set it to NULL. */
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_reg
+          = mgr->get_field_region (pylist_reg, ob_item_field);
+      const svalue *old_ptr_sval
+          = model->get_store_value (ob_item_reg, cd.get_ctxt ());
+
+      if (const region_svalue *old_reg
+          = old_ptr_sval->dyn_cast_region_svalue ())
+        {
+          const region *freed_reg = old_reg->get_pointee ();
+          model->unbind_region_and_descendents (freed_reg, POISON_KIND_FREED);
+          model->unset_dynamic_extents (freed_reg);
+        }
+
+      const svalue *null_sval = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+      model->set_value (ob_item_reg, null_sval, cd.get_ctxt ());
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *neg_one
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), -1);
+          model->set_value (cd.get_lhs_region (), neg_one, cd.get_ctxt ());
+        }
+      return true;
+    }
+  };
+
+  class realloc_success_no_move : public call_info
+  {
+  public:
+    realloc_success_no_move (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (
+          can_colorize, "when %qE succeeds, without moving underlying buffer",
+          get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pylist_sval = cd.get_arg_svalue (0);
+      const region *pylist_reg = model->deref_rvalue (
+          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+      const svalue *newitem_sval = cd.get_arg_svalue (1);
+      const region *newitem_reg = model->deref_rvalue (
+          newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
+
+      tree ob_size_field = get_field_by_name (varobj_record, "ob_size");
+      const region *ob_size_region
+          = mgr->get_field_region (pylist_reg, ob_size_field);
+      const svalue *ob_size_sval
+          = model->get_store_value (ob_size_region, cd.get_ctxt ());
+      const svalue *one_sval
+          = mgr->get_or_create_int_cst (integer_type_node, 1);
+      const svalue *new_size_sval = mgr->get_or_create_binop (
+          integer_type_node, PLUS_EXPR, ob_size_sval, one_sval);
+
+      const svalue *sizeof_sval = mgr->get_or_create_cast (
+          ob_size_sval->get_type (), get_sizeof_pyobjptr (mgr));
+      const svalue *num_allocated_bytes = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, new_size_sval);
+
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_region
+          = mgr->get_field_region (pylist_reg, ob_item_field);
+      const svalue *ob_item_ptr_sval
+          = model->get_store_value (ob_item_region, cd.get_ctxt ());
+
+      /* We can only grow in place with a non-NULL pointer and no unknown
+       */
+      {
+        const svalue *null_ptr = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+        if (!model->add_constraint (ob_item_ptr_sval, NE_EXPR, null_ptr,
+                                    cd.get_ctxt ()))
+          {
+            return false;
+          }
+      }
+
+      const unmergeable_svalue *underlying_svalue
+          = ob_item_ptr_sval->dyn_cast_unmergeable_svalue ();
+      const svalue *target_svalue = nullptr;
+      const region_svalue *target_region_svalue = nullptr;
+
+      if (underlying_svalue)
+        {
+          target_svalue = underlying_svalue->get_arg ();
+          if (target_svalue->get_kind () != SK_REGION)
+            {
+              return false;
+            }
+        }
+      else
+        {
+          if (ob_item_ptr_sval->get_kind () != SK_REGION)
+            {
+              return false;
+            }
+          target_svalue = ob_item_ptr_sval;
+        }
+
+      target_region_svalue = target_svalue->dyn_cast_region_svalue ();
+      const region *curr_reg = target_region_svalue->get_pointee ();
+
+      if (compat_types_p (num_allocated_bytes->get_type (), size_type_node))
+        model->set_dynamic_extents (curr_reg, num_allocated_bytes, ctxt);
+
+      model->set_value (ob_size_region, new_size_sval, ctxt);
+
+      const svalue *offset_sval = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, ob_size_sval);
+      const region *element_region
+          = mgr->get_offset_region (curr_reg, pyobj_ptr_ptr, offset_sval);
+      model->set_value (element_region, newitem_sval, cd.get_ctxt ());
+
+      tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+      const region *ob_refcnt_region
+          = mgr->get_field_region (newitem_reg, ob_refcnt_tree);
+      const svalue *curr_refcnt
+          = model->get_store_value (ob_refcnt_region, cd.get_ctxt ());
+      const svalue *refcnt_one_sval
+          = mgr->get_or_create_int_cst (size_type_node, 1);
+      const svalue *new_refcnt_sval = mgr->get_or_create_binop (
+          size_type_node, PLUS_EXPR, curr_refcnt, refcnt_one_sval);
+      model->set_value (ob_refcnt_region, new_refcnt_sval, cd.get_ctxt ());
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *zero
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
+          model->set_value (cd.get_lhs_region (), zero, cd.get_ctxt ());
+        }
+      return true;
+    }
+  };
+
+  class realloc_success_move : public call_info
+  {
+  public:
+    realloc_success_move (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (can_colorize, "when %qE succeeds, moving buffer",
+                              get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+      const svalue *pylist_sval = cd.get_arg_svalue (0);
+      const region *pylist_reg = model->deref_rvalue (
+          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+      const svalue *newitem_sval = cd.get_arg_svalue (1);
+      const region *newitem_reg = model->deref_rvalue (
+          newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
+
+      tree ob_size_field = get_field_by_name (varobj_record, "ob_size");
+      const region *ob_size_region
+          = mgr->get_field_region (pylist_reg, ob_size_field);
+      const svalue *old_ob_size_sval
+          = model->get_store_value (ob_size_region, cd.get_ctxt ());
+      const svalue *one_sval
+          = mgr->get_or_create_int_cst (integer_type_node, 1);
+      const svalue *new_ob_size_sval = mgr->get_or_create_binop (
+          integer_type_node, PLUS_EXPR, old_ob_size_sval, one_sval);
+
+      const svalue *sizeof_sval = mgr->get_or_create_cast (
+          old_ob_size_sval->get_type (), get_sizeof_pyobjptr (mgr));
+      const svalue *new_size_sval = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, new_ob_size_sval);
+
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_reg
+          = mgr->get_field_region (pylist_reg, ob_item_field);
+      const svalue *old_ptr_sval
+          = model->get_store_value (ob_item_reg, cd.get_ctxt ());
+
+      /* Create the new region.  */
+      const region *new_reg = model->get_or_create_region_for_heap_alloc (
+          new_size_sval, cd.get_ctxt ());
+      const svalue *new_ptr_sval
+          = mgr->get_ptr_svalue (pyobj_ptr_ptr, new_reg);
+      if (!model->add_constraint (new_ptr_sval, NE_EXPR, old_ptr_sval,
+                                  cd.get_ctxt ()))
+        return false;
+
+      if (const region_svalue *old_reg
+          = old_ptr_sval->dyn_cast_region_svalue ())
+        {
+          const region *freed_reg = old_reg->get_pointee ();
+          const svalue *old_size_sval = model->get_dynamic_extents (freed_reg);
+          if (old_size_sval)
+            {
+              const svalue *copied_size_sval
+                  = get_copied_size (model, old_size_sval, new_size_sval);
+              const region *copied_old_reg = mgr->get_sized_region (
+                  freed_reg, pyobj_ptr_ptr, copied_size_sval);
+              const svalue *buffer_content_sval
+                  = model->get_store_value (copied_old_reg, cd.get_ctxt ());
+              const region *copied_new_reg = mgr->get_sized_region (
+                  new_reg, pyobj_ptr_ptr, copied_size_sval);
+              model->set_value (copied_new_reg, buffer_content_sval,
+                                cd.get_ctxt ());
+            }
+          else
+            {
+              model->mark_region_as_unknown (freed_reg, cd.get_uncertainty ());
+            }
+
+          model->unbind_region_and_descendents (freed_reg, POISON_KIND_FREED);
+          model->unset_dynamic_extents (freed_reg);
+        }
+
+      const svalue *null_ptr = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+      if (!model->add_constraint (new_ptr_sval, NE_EXPR, null_ptr,
+                                  cd.get_ctxt ()))
+        return false;
+
+      model->set_value (ob_size_region, new_ob_size_sval, ctxt);
+      model->set_value (ob_item_reg, new_ptr_sval, cd.get_ctxt ());
+
+      const svalue *offset_sval = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, old_ob_size_sval);
+      const region *element_region
+          = mgr->get_offset_region (new_reg, pyobj_ptr_ptr, offset_sval);
+      model->set_value (element_region, newitem_sval, cd.get_ctxt ());
+
+      tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+      const region *ob_refcnt_region
+          = mgr->get_field_region (newitem_reg, ob_refcnt_tree);
+      const svalue *curr_refcnt
+          = model->get_store_value (ob_refcnt_region, cd.get_ctxt ());
+      const svalue *refcnt_one_sval
+          = mgr->get_or_create_int_cst (size_type_node, 1);
+      const svalue *new_refcnt_sval = mgr->get_or_create_binop (
+          size_type_node, PLUS_EXPR, curr_refcnt, refcnt_one_sval);
+      model->set_value (ob_refcnt_region, new_refcnt_sval, cd.get_ctxt ());
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *zero
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
+          model->set_value (cd.get_lhs_region (), zero, cd.get_ctxt ());
+        }
+      return true;
+    }
+
+  private:
+    /* Return the lesser of OLD_SIZE_SVAL and NEW_SIZE_SVAL.
+       If unknown, OLD_SIZE_SVAL is returned.  */
+    const svalue *
+    get_copied_size (region_model *model, const svalue *old_size_sval,
+                     const svalue *new_size_sval) const
+    {
+      tristate res
+          = model->eval_condition (old_size_sval, GT_EXPR, new_size_sval);
+      switch (res.get_value ())
+        {
+        case tristate::TS_TRUE:
+          return new_size_sval;
+        case tristate::TS_FALSE:
+        case tristate::TS_UNKNOWN:
+          return old_size_sval;
+        default:
+          gcc_unreachable ();
+        }
+    }
+  };
+
+  /* Body of kf_PyList_Append::impl_call_post.  */
+  if (cd.get_ctxt ())
+    {
+      cd.get_ctxt ()->bifurcate (make_unique<append_realloc_failure> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<realloc_success_no_move> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<realloc_success_move> (cd));
+      cd.get_ctxt ()->terminate_path ();
+    }
+}
+
+class kf_PyList_New : public known_function
+{
+public:
+  bool
+  matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == 1 && arg_is_long_p (cd, 0));
+  }
+  void impl_call_post (const call_details &cd) const final override;
+};
+
+void
+kf_PyList_New::impl_call_post (const call_details &cd) const
+{
+  class failure : public failed_call_info
+  {
+  public:
+    failure (const call_details &cd) : failed_call_info (cd) {}
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      /* Return NULL; everything else is unchanged.  */
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+      if (cd.get_lhs_type ())
+        {
+          const svalue *zero
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
+          model->set_value (cd.get_lhs_region (), zero, cd.get_ctxt ());
+        }
+      return true;
+    }
+  };
+
+  class success : public call_info
+  {
+  public:
+    success (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (can_colorize, "when %qE succeeds",
+                              get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pyobj_svalue
+          = mgr->get_or_create_unknown_svalue (pyobj_record);
+      const svalue *varobj_svalue
+          = mgr->get_or_create_unknown_svalue (varobj_record);
+      const svalue *pylist_svalue
+          = mgr->get_or_create_unknown_svalue (pylistobj_record);
+
+      const svalue *size_sval = cd.get_arg_svalue (0);
+
+      const svalue *tp_basicsize_sval
+          = mgr->get_or_create_unknown_svalue (NULL);
+      const region *pylist_region
+          = model->get_or_create_region_for_heap_alloc (
+              tp_basicsize_sval, cd.get_ctxt (), true, &cd);
+      model->set_value (pylist_region, pylist_svalue, cd.get_ctxt ());
+
+      /*
+      typedef struct
+      {
+        PyObject_VAR_HEAD
+        PyObject **ob_item;
+        Py_ssize_t allocated;
+      } PyListObject;
+      */
+      tree varobj_field = get_field_by_name (pylistobj_record, "ob_base");
+      const region *varobj_region
+          = mgr->get_field_region (pylist_region, varobj_field);
+      model->set_value (varobj_region, varobj_svalue, cd.get_ctxt ());
+
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_region
+          = mgr->get_field_region (pylist_region, ob_item_field);
+
+      const svalue *zero_sval = mgr->get_or_create_int_cst (size_type_node, 0);
+      const svalue *casted_size_sval
+          = mgr->get_or_create_cast (size_type_node, size_sval);
+      const svalue *size_cond_sval = mgr->get_or_create_binop (
+          size_type_node, LE_EXPR, casted_size_sval, zero_sval);
+
+      // if size <= 0, ob_item = NULL
+
+      if (tree_int_cst_equal (size_cond_sval->maybe_get_constant (),
+                              integer_one_node))
+        {
+          const svalue *null_sval
+              = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+          model->set_value (ob_item_region, null_sval, cd.get_ctxt ());
+        }
+      else // calloc
+        {
+          const svalue *sizeof_sval = mgr->get_or_create_cast (
+              size_sval->get_type (), get_sizeof_pyobjptr (mgr));
+          const svalue *prod_sval = mgr->get_or_create_binop (
+              size_type_node, MULT_EXPR, sizeof_sval, size_sval);
+          const region *ob_item_sized_region
+              = model->get_or_create_region_for_heap_alloc (prod_sval,
+                                                            cd.get_ctxt ());
+          model->zero_fill_region (ob_item_sized_region);
+          const svalue *ob_item_ptr_sval
+              = mgr->get_ptr_svalue (pyobj_ptr_ptr, ob_item_sized_region);
+          const svalue *ob_item_unmergeable
+              = mgr->get_or_create_unmergeable (ob_item_ptr_sval);
+          model->set_value (ob_item_region, ob_item_unmergeable,
+                            cd.get_ctxt ());
+        }
+
+      /*
+      typedef struct {
+      PyObject ob_base;
+      Py_ssize_t ob_size; // Number of items in variable part
+      } PyVarObject;
+      */
+      tree ob_base_tree = get_field_by_name (varobj_record, "ob_base");
+      const region *ob_base_region
+          = mgr->get_field_region (varobj_region, ob_base_tree);
+      model->set_value (ob_base_region, pyobj_svalue, cd.get_ctxt ());
+
+      tree ob_size_tree = get_field_by_name (varobj_record, "ob_size");
+      const region *ob_size_region
+          = mgr->get_field_region (varobj_region, ob_size_tree);
+      model->set_value (ob_size_region, size_sval, cd.get_ctxt ());
+
+      /*
+      typedef struct _object {
+          _PyObject_HEAD_EXTRA
+          Py_ssize_t ob_refcnt;
+          PyTypeObject *ob_type;
+      } PyObject;
+      */
+
+      tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+      const region *ob_refcnt_region
+          = mgr->get_field_region (ob_base_region, ob_refcnt_tree);
+      const svalue *refcnt_one_sval
+          = mgr->get_or_create_int_cst (size_type_node, 1);
+      model->set_value (ob_refcnt_region, refcnt_one_sval, cd.get_ctxt ());
+
+      // get pointer svalue for PyList_Type then assign it to ob_type field.
+      const region *pylist_type_region
+          = mgr->get_region_for_global (pylisttype_vardecl);
+      tree pylisttype_vardecl_ptr
+          = build_pointer_type (TREE_TYPE (pylisttype_vardecl));
+      const svalue *pylist_type_ptr_sval
+          = mgr->get_ptr_svalue (pylisttype_vardecl_ptr, pylist_type_region);
+      tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
+      const region *ob_type_region
+          = mgr->get_field_region (ob_base_region, ob_type_field);
+      model->set_value (ob_type_region, pylist_type_ptr_sval, cd.get_ctxt ());
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *ptr_sval
+              = mgr->get_ptr_svalue (cd.get_lhs_type (), pylist_region);
+          cd.maybe_set_lhs (ptr_sval);
+        }
+      return true;
+    }
+  };
+
+  if (cd.get_ctxt ())
+    {
+      cd.get_ctxt ()->bifurcate (make_unique<failure> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<success> (cd));
+      cd.get_ctxt ()->terminate_path ();
+    }
+}
+
+class kf_PyLong_FromLong : public known_function
+{
+public:
+  bool
+  matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == 1 && arg_is_long_p (cd, 0));
+  }
+  void impl_call_post (const call_details &cd) const final override;
+};
+
+void
+kf_PyLong_FromLong::impl_call_post (const call_details &cd) const
+{
+  class failure : public failed_call_info
+  {
+  public:
+    failure (const call_details &cd) : failed_call_info (cd) {}
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      /* Return NULL; everything else is unchanged.  */
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+      if (cd.get_lhs_type ())
+        {
+          const svalue *zero
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
+          model->set_value (cd.get_lhs_region (), zero, cd.get_ctxt ());
+        }
+      return true;
+    }
+  };
+
+  class success : public call_info
+  {
+  public:
+    success (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (can_colorize, "when %qE succeeds",
+                              get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pyobj_svalue
+          = mgr->get_or_create_unknown_svalue (pyobj_record);
+      const svalue *pylongobj_sval
+          = mgr->get_or_create_unknown_svalue (pylongobj_record);
+      tree pylongtype_vardecl_ptr
+          = build_pointer_type (TREE_TYPE (pylongtype_vardecl));
+
+      const svalue *tp_basicsize_sval
+          = mgr->get_or_create_unknown_svalue (NULL);
+      const region *new_pylong_region
+          = model->get_or_create_region_for_heap_alloc (
+              tp_basicsize_sval, cd.get_ctxt (), true, &cd);
+      model->set_value (new_pylong_region, pylongobj_sval, cd.get_ctxt ());
+
+      // Create a region for the base PyObject within the PyLongObject.
+      tree ob_base_tree = get_field_by_name (pylongobj_record, "ob_base");
+      const region *ob_base_region
+          = mgr->get_field_region (new_pylong_region, ob_base_tree);
+      model->set_value (ob_base_region, pyobj_svalue, cd.get_ctxt ());
+
+      tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+      const region *ob_refcnt_region
+          = mgr->get_field_region (ob_base_region, ob_refcnt_tree);
+      const svalue *refcnt_one_sval
+          = mgr->get_or_create_int_cst (size_type_node, 1);
+      model->set_value (ob_refcnt_region, refcnt_one_sval, cd.get_ctxt ());
+
+      // get pointer svalue for PyLong_Type then assign it to ob_type field.
+      const region *pylong_type_region
+          = mgr->get_region_for_global (pylongtype_vardecl);
+      const svalue *pylong_type_ptr_sval
+          = mgr->get_ptr_svalue (pylongtype_vardecl_ptr, pylong_type_region);
+      tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
+      const region *ob_type_region
+          = mgr->get_field_region (ob_base_region, ob_type_field);
+      model->set_value (ob_type_region, pylong_type_ptr_sval, cd.get_ctxt ());
+
+      // Set the PyLongObject value.
+      tree ob_digit_field = get_field_by_name (pylongobj_record, "ob_digit");
+      const region *ob_digit_region
+          = mgr->get_field_region (new_pylong_region, ob_digit_field);
+      const svalue *ob_digit_sval = cd.get_arg_svalue (0);
+      model->set_value (ob_digit_region, ob_digit_sval, cd.get_ctxt ());
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *ptr_sval
+              = mgr->get_ptr_svalue (cd.get_lhs_type (), new_pylong_region);
+          cd.maybe_set_lhs (ptr_sval);
+        }
+      return true;
+    }
+  };
+
+  if (cd.get_ctxt ())
+    {
+      cd.get_ctxt ()->bifurcate (make_unique<failure> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<success> (cd));
+      cd.get_ctxt ()->terminate_path ();
+    }
+}
+
 static void
 maybe_stash_named_type (logger *logger, const translation_unit &tu,
                         const char *name)
@@ -179,6 +877,12 @@ init_py_structs ()
   pylongobj_record = get_stashed_type_by_name ("PyLongObject");
   pylongtype_vardecl = get_stashed_global_var_by_name ("PyLong_Type");
   pylisttype_vardecl = get_stashed_global_var_by_name ("PyList_Type");
+
+  if (pyobj_record)
+    {
+      pyobj_ptr_tree = build_pointer_type (pyobj_record);
+      pyobj_ptr_ptr = build_pointer_type (pyobj_ptr_tree);
+    }
 }
 
 void
@@ -205,6 +909,12 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
       sorry_no_cpython_plugin ();
       return;
     }
+
+  iface->register_known_function ("PyLong_FromLong",
+                                  make_unique<kf_PyLong_FromLong> ());
+  iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
+  iface->register_known_function ("PyList_Append",
+                                  make_unique<kf_PyList_Append> ());
 }
 } // namespace ana
 
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
new file mode 100644
index 00000000000..9eb411316bd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h } */
+/* { dg-require-effective-target analyzer } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyList_New (Py_ssize_t len)
+{
+  PyObject *obj = PyList_New (len);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+PyObject *
+test_PyLong_New (long n)
+{
+  PyObject *obj = PyLong_FromLong (n);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+PyObject *
+test_PyListAppend (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, item);
+  return list; /* { dg-warning "leak of 'item'" } */
+}
+
+PyObject *
+test_PyListAppend_2 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  if (!item)
+	return NULL;
+
+  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  PyObject *list = PyList_New (n);
+  if (!list)
+  {
+	Py_DECREF(item);
+	return NULL;
+  }
+
+  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+
+  if (PyList_Append (list, item) < 0)
+    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  else
+    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
+  return list; /* { dg-warning "leak of 'item'" } */
+}
+
+
+PyObject *
+test_PyListAppend_3 (PyObject *item, PyObject *list)
+{
+  PyList_Append (list, item);
+  return list;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index 09c45394b1f..33cb7178a4f 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -56,112 +56,9 @@ gcc_parallel_test_enable 0
 # Specify the plugin source file and the associated test files in a list.
 # plugin_test_list={ {plugin1 test1 test2 ...} {plugin2 test1 ...} ... }
 set plugin_test_list [list \
-    { selfassign.c self-assign-test-1.c self-assign-test-2.c } \
-    { ggcplug.c ggcplug-test-1.c } \
-    { one_time_plugin.c one_time-test-1.c } \
-    { start_unit_plugin.c start_unit-test-1.c } \
-    { finish_unit_plugin.c finish_unit-test-1.c } \
-    { wide-int_plugin.c wide-int-test-1.c } \
-    { poly-int-01_plugin.c poly-int-test-1.c } \
-    { poly-int-02_plugin.c poly-int-test-1.c } \
-    { poly-int-03_plugin.c poly-int-test-1.c } \
-    { poly-int-04_plugin.c poly-int-test-1.c } \
-    { poly-int-05_plugin.c poly-int-test-1.c } \
-    { poly-int-06_plugin.c poly-int-test-1.c } \
-    { poly-int-07_plugin.c poly-int-test-1.c } \
-    { crash_test_plugin.c \
-	  crash-test-ice-stderr.c \
-	  crash-test-write-though-null-stderr.c \
-	  crash-test-ice-sarif.c \
-	  crash-test-write-though-null-sarif.c } \
-    { diagnostic_group_plugin.c \
-	  diagnostic-group-test-1.c } \
-    { diagnostic_plugin_test_show_locus.c \
-	  diagnostic-test-show-locus-bw.c \
-	  diagnostic-test-show-locus-color.c \
-	  diagnostic-test-show-locus-no-labels.c \
-	  diagnostic-test-show-locus-bw-line-numbers.c \
-	  diagnostic-test-show-locus-bw-line-numbers-2.c \
-	  diagnostic-test-show-locus-color-line-numbers.c \
-	  diagnostic-test-show-locus-parseable-fixits.c \
-	  diagnostic-test-show-locus-GCC_EXTRA_DIAGNOSTIC_OUTPUT-fixits-v1.c \
-	  diagnostic-test-show-locus-GCC_EXTRA_DIAGNOSTIC_OUTPUT-fixits-v2.c \
-	  diagnostic-test-show-locus-generate-patch.c }\
-    { diagnostic_plugin_test_tree_expression_range.c \
-	  diagnostic-test-expressions-1.c } \
-    { diagnostic_plugin_show_trees.c \
-	  diagnostic-test-show-trees-1.c } \
-    { diagnostic_plugin_test_string_literals.c \
-	  diagnostic-test-string-literals-1.c \
-	  diagnostic-test-string-literals-2.c \
-	  diagnostic-test-string-literals-3.c \
-	  diagnostic-test-string-literals-4.c } \
-    { diagnostic_plugin_test_inlining.c \
-	  diagnostic-test-inlining-1.c \
-	  diagnostic-test-inlining-2.c \
-	  diagnostic-test-inlining-3.c \
-	  diagnostic-test-inlining-4.c } \
-    { diagnostic_plugin_test_metadata.c diagnostic-test-metadata.c } \
-    { diagnostic_plugin_test_paths.c \
-	  diagnostic-test-paths-1.c \
-	  diagnostic-test-paths-2.c \
-	  diagnostic-test-paths-3.c \
-	  diagnostic-test-paths-4.c \
-	  diagnostic-test-paths-5.c \
-	  diagnostic-path-format-plain.c \
-	  diagnostic-path-format-none.c \
-	  diagnostic-path-format-separate-events.c \
-	  diagnostic-path-format-inline-events-1.c \
-	  diagnostic-path-format-inline-events-2.c \
-	  diagnostic-path-format-inline-events-3.c } \
-    { diagnostic_plugin_test_text_art.c \
-	  diagnostic-test-text-art-none.c \
-	  diagnostic-test-text-art-ascii-bw.c \
-	  diagnostic-test-text-art-ascii-color.c \
-	  diagnostic-test-text-art-unicode-bw.c \
-	  diagnostic-test-text-art-unicode-color.c } \
-    { location_overflow_plugin.c \
-	  location-overflow-test-1.c \
-	  location-overflow-test-2.c \
-	  location-overflow-test-pr83173.c } \
-    { must_tail_call_plugin.c \
-	  must-tail-call-1.c \
-	  must-tail-call-2.c } \
-    { expensive_selftests_plugin.c \
-	  expensive-selftests-1.c } \
-    { dump_plugin.c \
-	  dump-1.c \
-	  dump-2.c } \
-    { analyzer_gil_plugin.c \
-	  gil-1.c } \
-    { analyzer_known_fns_plugin.c \
-	  known-fns-1.c } \
-    { analyzer_kernel_plugin.c \
-	  copy_from_user-1.c \
-	  infoleak-1.c \
-	  infoleak-2.c \
-	  infoleak-3.c \
-	  infoleak-CVE-2011-1078-1.c \
-	  infoleak-CVE-2011-1078-2.c \
-	  infoleak-CVE-2017-18549-1.c \
-	  infoleak-CVE-2017-18550-1.c \
-	  infoleak-antipatterns-1.c \
-	  infoleak-fixit-1.c \
-	  infoleak-net-ethtool-ioctl.c \
-	  infoleak-vfio_iommu_type1.c \
-	  taint-CVE-2011-0521-1-fixed.c \
-	  taint-CVE-2011-0521-1.c \
-	  taint-CVE-2011-0521-2-fixed.c \
-	  taint-CVE-2011-0521-2.c \
-	  taint-CVE-2011-0521-3-fixed.c \
-	  taint-CVE-2011-0521-3.c \
-	  taint-CVE-2011-0521-4.c \
-	  taint-CVE-2011-0521-5.c \
-	  taint-CVE-2011-0521-5-fixed.c \
-	  taint-CVE-2011-0521-6.c \
-	  taint-antipatterns-1.c } \
     { analyzer_cpython_plugin.c \
-	  cpython-plugin-test-1.c } \
+	  cpython-plugin-test-1.c \
+	  cpython-plugin-test-2.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 7004711b384..a1d4f684f8e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12559,3 +12559,28 @@ proc check_effective_target_const_volatile_readonly_section { } {
     }
   return 1
 }
+
+proc dg-require-python-h { } {
+    puts "ENTER dg-require-python-h" ; 
+    set result [remote_exec host "python3-config --cflags"]
+    set status [lindex $result 0]
+    if { $status == 0 } {
+        set python_flags [lindex $result 1]
+    } else {
+        set python_flags "UNSUPPORTED"
+    }
+    
+    puts "Python flags are: $python_flags" ;
+
+    # Check if Python flags are unsupported
+    if { $python_flags eq "UNSUPPORTED" } {
+        puts "Python flags are unsupported" ;
+        error "Python flags are unsupported"
+        return
+    }
+
+    upvar dg-extra-tool-flags extra-tool-flags
+    puts "Before appending, extra-tool-flags: $extra-tool-flags" ;
+    eval lappend extra-tool-flags $python_flags
+    puts "After appending, extra-tool-flags: $extra-tool-flags" ;
+}
\ No newline at end of file
-- 
2.30.2



* Re: [PATCH] WIP for dg-require-python-h [PR107646]
  2023-08-08 16:51                             ` [PATCH] WIP for dg-require-python-h [PR107646] Eric Feng
@ 2023-08-08 18:08                               ` David Malcolm
  2023-08-08 18:51                               ` David Malcolm
  1 sibling, 0 replies; 50+ messages in thread
From: David Malcolm @ 2023-08-08 18:08 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 4597 bytes --]

On Tue, 2023-08-08 at 12:51 -0400, Eric Feng wrote:
> Unfortunately, there don’t seem to be any ERRORs in the .log, nor
> any output from the debug print statements I’ve scattered within proc
> dg-require-python-h when the test runs. I’ve attached the WIP below; thank you!
> Please note that in this version of the patch, I’ve removed the other
> (non-Python) test cases in plugin.exp for convenience.
> 
> Aside from the issues with dg-require-python-h, everything works as
> expected (when using /* { dg-options "-fanalyzer -I/usr/include/python3.9" } */).
> The patch includes support for PyList_New, PyLong_FromLong and
> PyList_Append, and also the optional parameters for
> get_or_create_region_for_heap_alloc that we previously discussed. I will
> submit the version of the patch sans dg-require-python-h to gcc-patches
> for review as soon as I confirm that regtests pass as expected; perhaps
> we can first push these changes to trunk and later push a separate patch
> for dg-require-python-h.
> 

Hi Eric.

I got dg-require-python-h working; I'm attaching a patch that seems to
fix it (on top of your WIP patch).

Looking in dg.exp, dg-test has this loop:
    set tmp [dg-get-options $prog]
    foreach op $tmp {
	verbose "Processing option: $op" 3
over the dg directives it finds, and at verbosity level 3 that message
wasn't being printed for dg-require-python-h.

The issue turned out to be that the grep in dg.exp's dg-get-options for
dg- directives requires them to have an argument.

So fixing it from:

/* { dg-require-python-h } */

to:

/* { dg-require-python-h "" } */

gets dg.exp to recognize it as a directive and call the new code.
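For illustration, here is a rough C++ approximation of that check. It assumes the dg-get-options grep pattern has roughly the shape "{, whitespace, dg-NAME, whitespace, anything, whitespace, }"; the exact Tcl pattern in dg.exp may differ, and looks_like_dg_directive is a hypothetical name, not a DejaGnu function. The point is that such a pattern needs whitespace both after the directive name and before the closing brace, so a bare { dg-require-python-h } has only one whitespace run and never matches, while adding the empty "" argument splits it into two.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical approximation of the dg.exp directive pattern (assumption):
// "{", whitespace, a dg-NAME token, whitespace, anything, whitespace, "}".
// A directive with no argument has only a single whitespace run between
// its name and the closing brace, so it cannot satisfy both runs.
bool looks_like_dg_directive (const std::string &comment)
{
  static const std::regex pattern ("[{][ \t]+dg-[-a-z]+[ \t]+.*[ \t]+[}]");
  return std::regex_search (comment, pattern);
}
```

With this sketch, "/* { dg-do compile } */" and "/* { dg-require-python-h "" } */" match, while "/* { dg-require-python-h } */" does not.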

Some other fixes:
- I put the /* { dg-require-effective-target analyzer } */
above the /* { dg-options "-fanalyzer" } */, since it seems to make
more logical sense
- within the new .exp code:
  - the new function needs to take "args" (but will happily ignore
them)
  - I put the upvar at the top of the function, as that's what everyone
else seems to do.  This may be "cargo cult programming" though.
  - I used verbose rather than "puts" for the debugging code
  - I reworked how the "unsupported" case works, copying what other
target-supports code does.  With this, hacking up the script invocation
to be "not-python-3-config" so that it fails makes the test exit
gracefully with UNSUPPORTED in the gcc.sum
  - As written the puts of $extra-tool-flags fails with:
ERROR: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so: can't read "extra": no such variable for " dg-require-python-h 3  "
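since AIUI Tcl's $name substitution stops at the first character that
isn't a letter, digit or underscore, so it looks for a variable named
"extra" (and would then append the literal text "-tool-flags").  Putting
the full variable name in braces, as ${extra-tool-flags}, fixes that.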
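The underlying rule: in a plain $name reference, Tcl takes the name to be the longest run of ASCII letters, digits, underscores and "::" namespace separators, whereas ${name} takes everything up to the closing brace. The C++ sketch below mimics only those two rules (it ignores array references such as $a(i) and other details); tcl_var_name is a hypothetical helper, not part of Tcl or DejaGnu.

```cpp
#include <cassert>
#include <string>

// Characters allowed in a plain (unbraced) Tcl variable name.
static bool is_tcl_name_char (char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
         || (c >= '0' && c <= '9') || c == '_';
}

// Hypothetical helper: given the text just after a '$', return the variable
// name Tcl would read (simplified model of Tcl's substitution rules).
std::string tcl_var_name (const std::string &s)
{
  if (!s.empty () && s[0] == '{')
    {
      // Braced form: everything up to the first '}' is the name.
      std::string::size_type close = s.find ('}');
      return close == std::string::npos ? std::string ()
                                        : s.substr (1, close - 1);
    }
  std::string::size_type i = 0;
  while (i < s.size ())
    {
      if (is_tcl_name_char (s[i]))
        ++i;
      else if (s[i] == ':' && i + 1 < s.size () && s[i + 1] == ':')
        {
          // Namespace separator: two or more colons.
          while (i < s.size () && s[i] == ':')
            ++i;
        }
      else
        break; // e.g. '-' ends the name.
    }
  return s.substr (0, i);
}
```

Under this sketch, tcl_var_name ("extra-tool-flags") yields "extra", whereas tcl_var_name ("{extra-tool-flags}") yields "extra-tool-flags".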

With this, I get the following results for cpython-plugin-test-2.c:

PASS: gcc.dg/plugin/analyzer_cpython_plugin.c compilation
PASS: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 17)
PASS: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 18)
PASS: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 21)
PASS: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 31)
PASS: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 32)
PASS: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 35)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 45)
PASS: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 55)
PASS: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 63)
PASS: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 66)
PASS: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 68)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so  (test for warnings, line 69)
FAIL: gcc.dg/plugin/cpython-plugin-test-2.c -fplugin=./analyzer_cpython_plugin.so (test for excess errors)

where the FAILs seem to be due to missing "leak of 'item'" messages,
which might be due to
- my possibly not having applied all of the patch?
- differences between Python 3.8 and Python 3.9
- differences in the --cflags affecting the gimple seen by the
analyzer

Anyway, hope this gets you unstuck.

Dave


[-- Attachment #2: 0001-Fixup-Eric-s-WIP-for-dg-require-python-h.patch --]
[-- Type: text/x-patch, Size: 2548 bytes --]

From 16ca49cb40c3d34b3547b2e0834bb51ae26e2eb5 Mon Sep 17 00:00:00 2001
From: David Malcolm <dmalcolm@redhat.com>
Date: Tue, 8 Aug 2023 13:53:39 -0400
Subject: [PATCH] Fixup Eric's WIP for dg-require-python-h

---
 .../gcc.dg/plugin/cpython-plugin-test-2.c     |  4 +--
 gcc/testsuite/lib/target-supports.exp         | 28 +++++++++----------
 2 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
index 9eb411316bd..19b5c17428a 100644
--- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-fanalyzer" } */
-/* { dg-require-python-h } */
 /* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
 
 
 #define PY_SSIZE_T_CLEAN
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a1d4f684f8e..99d62ab98ad 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12560,27 +12560,25 @@ proc check_effective_target_const_volatile_readonly_section { } {
   return 1
 }
 
-proc dg-require-python-h { } {
-    puts "ENTER dg-require-python-h" ; 
+proc dg-require-python-h { args } {
+    upvar dg-extra-tool-flags extra-tool-flags
+
+    verbose "ENTER dg-require-python-h" 2
+
     set result [remote_exec host "python3-config --cflags"]
     set status [lindex $result 0]
     if { $status == 0 } {
         set python_flags [lindex $result 1]
     } else {
-        set python_flags "UNSUPPORTED"
+	verbose "Python.h not supported" 2
+	upvar dg-do-what dg-do-what
+	set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+	return
     }
-    
-    puts "Python flags are: $python_flags" ;
 
-    # Check if Python flags are unsupported
-    if { $python_flags eq "UNSUPPORTED" } {
-        puts "Python flags are unsupported" ;
-        error "Python flags are unsupported"
-        return
-    }
+    verbose "Python flags are: $python_flags" 2
 
-    upvar dg-extra-tool-flags extra-tool-flags
-    puts "Before appending, extra-tool-flags: $extra-tool-flags" ;
+    verbose "Before appending, extra-tool-flags: ${extra-tool-flags}" 3
     eval lappend extra-tool-flags $python_flags
-    puts "After appending, extra-tool-flags: $extra-tool-flags" ;
-}
\ No newline at end of file
+    verbose "After appending, extra-tool-flags: ${extra-tool-flags}" 3
+}
-- 
2.26.3


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] WIP for dg-require-python-h [PR107646]
  2023-08-08 16:51                             ` [PATCH] WIP for dg-require-python-h [PR107646] Eric Feng
  2023-08-08 18:08                               ` David Malcolm
@ 2023-08-08 18:51                               ` David Malcolm
  2023-08-09 19:22                                 ` [PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646] Eric Feng
  1 sibling, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-08 18:51 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Tue, 2023-08-08 at 12:51 -0400, Eric Feng wrote:
> Unfortunately, there don’t seem to be any ERRORs in the .log, nor any
> output from the debug print statements which I’ve scattered within
> proc dg-require-python-h when it is run. I’ve attached the WIP below;
> thank you! Please note that in this version of the patch, I’ve
> removed the other (non-Python) test cases in plugin.exp for
> convenience.
> 
> Aside from issues with dg-require-python-h, everything works as
> expected when using
> /* { dg-options "-fanalyzer -I/usr/include/python3.9" } */.
> The patch includes support for PyList_New, PyLong_FromLong, and
> PyList_Append, and also the optional parameters for
> get_or_create_region_for_heap_alloc, as we previously discussed. I
> will submit the version of the patch sans dg-require-python-h to
> gcc-patches for review as soon as I confirm regtests pass as
> expected; perhaps we can first push these changes to trunk and later
> push a separate patch for dg-require-python-h.
> > 

[...snip...]

Various comments on the WIP patch inline below...


> ---
> This patch adds known function subclasses for the following Python/C
> API: PyList_New, PyLong_FromLong, PyList_Append. It also adds new
> optional parameters to region_model::get_or_create_region_for_heap_alloc
> so that the newly allocated region may transition from the start state
> to the assumed non null state on the malloc state machine immediately
> if desired.
> 
> The main warnings we gain in this patch with respect to the known
> function subclasses mentioned are leak related. For example:
> 
> rc3.c: In function ‘create_py_object’:
> rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
>    21 |   return list;
>       |          ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(10);
>     |      |                    ^~~~~~~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (1) allocated here
>     |      |                    (2) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(2);
>     |      |                    ~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (3) when ‘PyList_New’ fails
>     |......
>     |   21 |   return list;
>     |      |          ~~~~
>     |      |          |
>     |      |          (4) ‘item’ leaks here; was allocated at (1)
> 
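The leak reported above comes from the path where `PyList_New` fails and nothing releases `item`. Below is a minimal sketch of the usual fix pattern, written against a hypothetical toy object model (`toy_object`, `toy_new`, `toy_decref` and `create_toy_object` are illustrative stand-ins, not the CPython API) so that it compiles without Python.h. In real extension code the release step would be a `Py_DECREF (item)`.

```c
#include <stdlib.h>

/* Hypothetical stand-in for PyObject: only the reference count.  */
typedef struct toy_object
{
  long ob_refcnt;
} toy_object;

/* Hypothetical allocator: a new object starts with one reference and
   NULL is returned on failure, like PyLong_FromLong / PyList_New.  */
static toy_object *
toy_new (void)
{
  toy_object *obj = malloc (sizeof *obj);
  if (obj)
    obj->ob_refcnt = 1;
  return obj;
}

/* Drop one reference; free the object when the count reaches zero.  */
static void
toy_decref (toy_object *obj)
{
  if (--obj->ob_refcnt == 0)
    free (obj);
}

/* Same shape as create_py_object in the diagnostic, with the leak fixed.
   FAIL_LIST simulates the "PyList_New fails" path.  */
static toy_object *
create_toy_object (int fail_list)
{
  toy_object *item = toy_new ();
  if (!item)
    return NULL;
  toy_object *list = fail_list ? NULL : toy_new ();
  if (!list)
    {
      toy_decref (item); /* without this, ITEM leaks on this path */
      return NULL;
    }
  /* This sketch does not actually store ITEM in LIST; it simply
     releases the local reference so that nothing leaks.  */
  toy_decref (item);
  return list;
}
```

The point is only that every early return after a successful allocation has to release what was already allocated; that is exactly the path the analyzer's events (1) to (4) walk through.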
> Some concessions were made to simplify the analysis process when
> comparing kf_PyList_Append with the real implementation. In
> particular, PyList_Append performs some optimization internally to try
> and avoid calls to realloc if possible. For simplicity, we assume that
> realloc is called every time. Also, we grow the size by just 1 (to
> ensure enough space for adding a new element) rather than abide by the
> heuristics that the actual implementation follows.
> 
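To make the simplification concrete, here is a minimal sketch of the append behaviour being approximated, on a hypothetical toy list type (`toy_list` and `toy_list_append` are illustrative names, not CPython's PyListObject and not the plugin code): every append reallocs, the buffer grows by exactly one slot, the list takes a new reference to the item (as the real PyList_Append does), and ob_size is incremented. On failure it returns -1 and, following standard realloc semantics, leaves the old buffer untouched.

```c
#include <stdlib.h>

/* Hypothetical toy stand-ins; only the fields the model touches.  */
typedef struct toy_object
{
  long ob_refcnt;
} toy_object;

typedef struct toy_list
{
  toy_object ob_base;
  long ob_size;          /* number of items in use */
  toy_object **ob_item;  /* heap buffer of item pointers */
} toy_list;

/* Simplified append: realloc on every call and grow by exactly one slot
   (no over-allocation heuristics).  Returns 0 on success, -1 on failure.  */
static int
toy_list_append (toy_list *list, toy_object *item)
{
  if (!list || !item)
    return -1;
  size_t new_size = (size_t) list->ob_size + 1;
  toy_object **buf = realloc (list->ob_item, new_size * sizeof *buf);
  if (!buf)
    return -1;  /* old buffer is still owned by LIST */
  buf[list->ob_size] = item;
  item->ob_refcnt++;  /* the list holds a new reference to ITEM */
  list->ob_item = buf;
  list->ob_size = (long) new_size;
  return 0;
}
```

Growing by one slot keeps the number of symbolic sizes the analyzer has to track small, at the cost of not modelling the real implementation's amortized growth.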
> gcc/analyzer/ChangeLog:
>   PR analyzer/107646
>         * region-model.cc (region_model::get_or_create_region_for_heap_alloc):
>   New optional parameters.

>         * region-model.h (class region_model): Likewise.
>         * sm-malloc.cc (on_realloc_with_move): New function.
>         (region_model::move_ptr_sval_non_null): New function.
> 
> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
>         * gcc.dg/plugin/analyzer_cpython_plugin.c: New features for plugin.

I don't want a lot of detail, but please have the ChangeLog entry at
least list the new functions that are being simulated, since that will
likely be the most pertinent information when we go back and look at
the logs.

>         * gcc.dg/plugin/plugin.exp: New test.
>         * gcc.dg/plugin/cpython-plugin-test-2.c: New test.
> 
> Signed-off-by: Eric Feng <ef2648@columbia.edu>
> ---
>  gcc/analyzer/region-model.cc                  |  15 +-
>  gcc/analyzer/region-model.h                   |  10 +-
>  gcc/analyzer/sm-malloc.cc                     |  40 +
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 710 ++++++++++++++++++
>  .../gcc.dg/plugin/cpython-plugin-test-2.c     |  78 ++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp        | 107 +--
>  gcc/testsuite/lib/target-supports.exp         |  25 +
>  7 files changed, 876 insertions(+), 109 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> 
> diff --git a/gcc/analyzer/region-model.cc
> index e92b3f7b074..c53446b2afc 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -5131,7 +5131,9 @@ region_model::check_dynamic_size_for_floats (const svalue *size_in_bytes,
>  
>  const region *
>  region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
> -                                                   region_model_context *ctxt)
> +       region_model_context *ctxt,
> +       bool register_alloc,
> +       const call_details *cd)

When adding new params/functionality to an existing function, please
update the leading comment.


>  {
>    /* Determine which regions are referenced in this region_model, so that
>       we can reuse an existing heap_allocated_region if it's not in use on
> @@ -5153,6 +5155,17 @@ region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
>    if (size_in_bytes)
>      if (compat_types_p (size_in_bytes->get_type (), size_type_node))
>        set_dynamic_extents (reg, size_in_bytes, ctxt);
> +
> +       if (register_alloc && cd)
> +               {
> +                       const svalue *ptr_sval = nullptr;
> +                       if (cd->get_lhs_type ())
> +       ptr_sval = m_mgr->get_ptr_svalue (cd->get_lhs_type (), reg);
> +                       else
> +       ptr_sval = m_mgr->get_ptr_svalue (NULL_TREE, reg);
> +                       move_ptr_sval_non_null (ctxt, ptr_sval);
> +               }
> +
>    return reg;
>  }
>  
> diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
> index 0cf38714c96..84c964fadc9 100644
> --- a/gcc/analyzer/region-model.h
> +++ b/gcc/analyzer/region-model.h
> @@ -387,9 +387,9 @@ class region_model
>                        region_model_context *ctxt,
>                        rejected_constraint **out);
>  
> -  const region *
> -  get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
> -                                      region_model_context *ctxt);
> +  const region *get_or_create_region_for_heap_alloc (
> +      const svalue *size_in_bytes, region_model_context *ctxt,
> +      bool register_alloc = false, const call_details *cd = nullptr);

I don't like "register_alloc" as a param name as the word "register" is
overused in the context of a compiler.  Maybe "update_state_machine"?

>    const region *create_region_for_alloca (const svalue *size_in_bytes,
>                                            region_model_context *ctxt);
>    void get_referenced_base_regions (auto_bitmap &out_ids) const;
> @@ -476,6 +476,10 @@ class region_model
>                              const svalue *old_ptr_sval,
>                              const svalue *new_ptr_sval);
>  
> +  /* Implemented in sm-malloc.cc.  */
> +  void move_ptr_sval_non_null (region_model_context *ctxt,
> +       const svalue *new_ptr_sval);

"move" makes me think of C++11; how about
"transition_ptr_sval_to_non_null"?

> +
>    /* Implemented in sm-taint.cc.  */
>    void mark_as_tainted (const svalue *sval,
>                         region_model_context *ctxt);

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 9ecc42d4465..4d985620c01 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c

[...snip...]

The known_function implementations get very long, alas.  Some of that
may be inevitable, but there's a lot of repetition within them, which
ought to be split out into subroutines.

Some examples:
- grepping for PLUS_EXPR, there seem to be two places that spell out
how to incref an object; probably there should be a static helper
function for doing this.

- grepping for "ob_refcnt", there are two places which simulate a new
object having its reference count initialized to 1.  Again, this ought
to be a call to a static helper subroutine.

- grepping for "ob_type", there are two places that simulate
initializing the ob_type of a new object to point at a type object. 
Likewise, this ought to be a call to a static helper subroutine.

Hopefully doing so will help tame the complexity of these functions,
and as we add new ones, they'll probably be able to reuse some of these
subroutines.
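As an illustration of the suggested factoring, here is a minimal sketch on a hypothetical concrete toy object model (`toy_type`, `toy_object`, `new_toy_object` and friends are made up for this example). It is not the analyzer's region_model API, which the real helpers would operate on, but it shows the shape: "refcount starts at 1", "set ob_type" and "increment a field" each live in one static helper, and every constructor goes through a shared allocation path.

```c
#include <stdlib.h>

/* Hypothetical toy stand-ins for PyTypeObject and PyObject.  */
typedef struct toy_type
{
  const char *tp_name;
} toy_type;

typedef struct toy_object
{
  long ob_refcnt;
  const toy_type *ob_type;
  long value;  /* payload used by the "long" constructor */
} toy_object;

static const toy_type toy_long_type = { "int" };
static const toy_type toy_list_type = { "list" };

/* Helper: a newly created object starts with a reference count of 1.  */
static void
init_ob_refcnt (toy_object *obj)
{
  obj->ob_refcnt = 1;
}

/* Helper: point ob_type at the object's type object.  */
static void
set_ob_type (toy_object *obj, const toy_type *type)
{
  obj->ob_type = type;
}

/* Helper: increment a field by one (e.g. ob_refcnt or ob_size).  */
static void
inc_field (long *field)
{
  ++*field;
}

/* Shared allocation path used by every constructor.  */
static toy_object *
new_toy_object (const toy_type *type)
{
  toy_object *obj = calloc (1, sizeof *obj);
  if (!obj)
    return NULL;
  init_ob_refcnt (obj);
  set_ob_type (obj, type);
  return obj;
}

/* Two constructors that differ only in their type and payload.  */
static toy_object *
toy_long_from_long (long v)
{
  toy_object *obj = new_toy_object (&toy_long_type);
  if (obj)
    obj->value = v;
  return obj;
}

static toy_object *
toy_list_new (void)
{
  return new_toy_object (&toy_list_type);
}
```

Adding another constructor then only needs its type object and payload handling, which is the kind of reuse hoped for as more Py* known_functions are added.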

[...snip...]

> +class kf_PyLong_FromLong : public known_function
> +{
> +public:
> +  bool
> +  matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return (cd.num_args () == 1 && arg_is_long_p (cd, 0));
> +  }

I think it's probably enough here to just check that the arg's type is
integral, rather than a specific type of integer.

[...snip...]

> 
> @@ -205,6 +909,12 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
>        sorry_no_cpython_plugin ();
>        return;
>      }
> +
> +  iface->register_known_function ("PyLong_FromLong",
> +                                  make_unique<kf_PyLong_FromLong> ());
> +  iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
> +  iface->register_known_function ("PyList_Append",
> +                                  make_unique<kf_PyList_Append> ());

Hopefully there will eventually be a lot of Py* known_functions, so
shall we keep the functions alphabetized?  (both in terms of
registration, and in terms of declarations/definitions).  

[...snip...]

Hope this is constructive
Dave

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646]
  2023-08-08 18:51                               ` David Malcolm
@ 2023-08-09 19:22                                 ` Eric Feng
  2023-08-09 21:36                                   ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-09 19:22 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, gcc-patches, Eric Feng

Thank you for your help in getting dg-require-python-h working! I can
confirm that the FAILs were caused by the --cflags output affecting the
gimple seen by the analyzer. For this reason, I have switched to
--includes for now. To be sure, I also tested on Python 3.8, and it
works as expected. I have also addressed the following comments on the
WIP patch, as you described:

-- Update ChangeLog entry to list new functions being simulated.
-- Update region_model::get_or_create_region_for_heap_alloc leading
comment.
-- Change register_alloc to update_state_machine.
-- Change move_ptr_sval_non_null to transition_ptr_sval_non_null.
-- Static helper functions for:
	-- Initializing ob_refcnt field.
	-- Setting ob_type field.
	-- Getting ob_base field.
	-- Initializing heap allocated region for PyObjects.
	-- Incrementing a field by one.
-- Change arg_is_long_p to arg_is_integral_p.
-- Extract common failure scenario for reusability.

The initial WIP patch, using

/* { dg-options "-fanalyzer -I/usr/include/python3.9" } */

has been bootstrapped and regtested on aarch64-unknown-linux-gnu. Since
we did not change any core logic in the revision, and the only changes
within the analyzer core are renames, is it OK for trunk? In the
meantime, the revised patch is going through the bootstrap and regtest
process.

Best,
Eric

---
This patch adds known function subclasses for Python/C API functions
PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
optional parameters for
region_model::get_or_create_region_for_heap_alloc, allowing for the
newly allocated region to immediately transition from the start state to
the assumed non-null state in the malloc state machine if desired.
Finally, it adds a new procedure, dg-require-python-h, intended as a
directive in Python-related analyzer tests, to append necessary Python
flags during the tests' build process.

The main warnings we gain in this patch with respect to the known function
subclasses mentioned are leak related. For example:

rc3.c: In function ‘create_py_object’:
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
   21 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(10);
    |      |                    ^~~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) allocated here
    |      |                    (2) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(2);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (3) when ‘PyList_New’ fails
    |......
    |   21 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) ‘item’ leaks here; was allocated at (1)

Some concessions were made to simplify the analysis when comparing
kf_PyList_Append with the real implementation. In particular,
PyList_Append performs some optimizations internally to try to avoid
calls to realloc where possible; for simplicity, we assume that realloc
is called every time. Also, we grow the size by just 1 (to ensure there
is enough space for the new element) rather than following the
heuristics that the actual implementation uses.

gcc/analyzer/ChangeLog:
	PR analyzer/107646
	* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
	New optional parameters.
	* region-model.h (class region_model): New optional parameters.
	* sm-malloc.cc (on_realloc_with_move): New function.
	(region_model::transition_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
	PR analyzer/107646
	* gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
	PyList_New, PyList_Append, PyLong_FromLong.
	* gcc.dg/plugin/plugin.exp: New test.
	* lib/target-supports.exp: New procedure.
	* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng <ef2648@columbia.edu>
---
 gcc/analyzer/region-model.cc                  |  20 +-
 gcc/analyzer/region-model.h                   |  10 +-
 gcc/analyzer/sm-malloc.cc                     |  40 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 711 ++++++++++++++++++
 .../gcc.dg/plugin/cpython-plugin-test-2.c     |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp        |   3 +-
 gcc/testsuite/lib/target-supports.exp         |  25 +
 7 files changed, 881 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index e92b3f7b074..c338f045d92 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5127,11 +5127,16 @@ region_model::check_dynamic_size_for_floats (const svalue *size_in_bytes,
    Use CTXT to complain about tainted sizes.
 
    Reuse an existing heap_allocated_region if it's not being referenced by
-   this region_model; otherwise create a new one.  */
+   this region_model; otherwise create a new one.
+
+   Optionally (update_state_machine) transitions the pointer pointing to the
+   heap_allocated_region from start to assumed non-null.  */
 
 const region *
 region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
-						   region_model_context *ctxt)
+						   region_model_context *ctxt,
+						   bool update_state_machine,
+						   const call_details *cd)
 {
   /* Determine which regions are referenced in this region_model, so that
      we can reuse an existing heap_allocated_region if it's not in use on
@@ -5153,6 +5158,17 @@ region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
   if (size_in_bytes)
     if (compat_types_p (size_in_bytes->get_type (), size_type_node))
       set_dynamic_extents (reg, size_in_bytes, ctxt);
+
+  if (update_state_machine && cd)
+    {
+      const svalue *ptr_sval = nullptr;
+      if (cd->get_lhs_type ())
+	ptr_sval = m_mgr->get_ptr_svalue (cd->get_lhs_type (), reg);
+      else
+	ptr_sval = m_mgr->get_ptr_svalue (NULL_TREE, reg);
+      transition_ptr_sval_non_null (ctxt, ptr_sval);
+    }
+
   return reg;
 }
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 0cf38714c96..16c80a238bc 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -387,9 +387,9 @@ class region_model
 		       region_model_context *ctxt,
 		       rejected_constraint **out);
 
-  const region *
-  get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
-				       region_model_context *ctxt);
+  const region *get_or_create_region_for_heap_alloc (
+      const svalue *size_in_bytes, region_model_context *ctxt,
+      bool update_state_machine = false, const call_details *cd = nullptr);
   const region *create_region_for_alloca (const svalue *size_in_bytes,
 					  region_model_context *ctxt);
   void get_referenced_base_regions (auto_bitmap &out_ids) const;
@@ -476,6 +476,10 @@ class region_model
 			     const svalue *old_ptr_sval,
 			     const svalue *new_ptr_sval);
 
+  /* Implemented in sm-malloc.cc.  */
+  void transition_ptr_sval_non_null (region_model_context *ctxt,
+				     const svalue *new_ptr_sval);
+
   /* Implemented in sm-taint.cc.  */
   void mark_as_tainted (const svalue *sval,
 			region_model_context *ctxt);
diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index a8c63eb1ce8..bb8d83e4605 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -434,6 +434,10 @@ public:
 			     const svalue *new_ptr_sval,
 			     const extrinsic_state &ext_state) const;
 
+  void transition_ptr_sval_non_null (region_model *model, sm_state_map *smap,
+				     const svalue *new_ptr_sval,
+				     const extrinsic_state &ext_state) const;
+
   standard_deallocator_set m_free;
   standard_deallocator_set m_scalar_delete;
   standard_deallocator_set m_vector_delete;
@@ -2504,6 +2508,16 @@ on_realloc_with_move (region_model *model,
 		   NULL, ext_state);
 }
 
+/* Hook for get_or_create_region_for_heap_alloc: set NEW_PTR_SVAL, which
+   points to a newly created region, to the malloc SM's nonnull state.  */
+void
+malloc_state_machine::transition_ptr_sval_non_null (
+    region_model *model, sm_state_map *smap, const svalue *new_ptr_sval,
+    const extrinsic_state &ext_state) const
+{
+  smap->set_state (model, new_ptr_sval, m_free.m_nonnull, NULL, ext_state);
+}
+
 } // anonymous namespace
 
 /* Internal interface to this file. */
@@ -2548,6 +2562,32 @@ region_model::on_realloc_with_move (const call_details &cd,
 				  *ext_state);
 }
 
+/* Moves ptr_sval from start to assumed non-null, for use by
+   region_model::get_or_create_region_for_heap_alloc.  */
+void
+region_model::transition_ptr_sval_non_null (region_model_context *ctxt,
+					    const svalue *ptr_sval)
+{
+  if (!ctxt)
+    return;
+  const extrinsic_state *ext_state = ctxt->get_ext_state ();
+  if (!ext_state)
+    return;
+
+  sm_state_map *smap;
+  const state_machine *sm;
+  unsigned sm_idx;
+  if (!ctxt->get_malloc_map (&smap, &sm, &sm_idx))
+    return;
+
+  gcc_assert (smap);
+  gcc_assert (sm);
+
+  const malloc_state_machine &malloc_sm = (const malloc_state_machine &)*sm;
+
+  malloc_sm.transition_ptr_sval_non_null (this, smap, ptr_sval, *ext_state);
+}
+
 } // namespace ana
 
 #endif /* #if ENABLE_ANALYZER */
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 9ecc42d4465..42c8aff101e 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -55,6 +55,8 @@ static GTY (()) hash_map<tree, tree> *analyzer_stashed_globals;
 namespace ana
 {
 static tree pyobj_record = NULL_TREE;
+static tree pyobj_ptr_tree = NULL_TREE;
+static tree pyobj_ptr_ptr = NULL_TREE;
 static tree varobj_record = NULL_TREE;
 static tree pylistobj_record = NULL_TREE;
 static tree pylongobj_record = NULL_TREE;
@@ -76,6 +78,703 @@ get_field_by_name (tree type, const char *name)
   return NULL_TREE;
 }
 
+static const svalue *
+get_sizeof_pyobjptr (region_model_manager *mgr)
+{
+  tree size_tree = TYPE_SIZE_UNIT (pyobj_ptr_tree);
+  const svalue *sizeof_sval = mgr->get_or_create_constant_svalue (size_tree);
+  return sizeof_sval;
+}
+
+static bool
+arg_is_integral_p (const call_details &cd, unsigned idx)
+{
+  return INTEGRAL_TYPE_P (cd.get_arg_type (idx));
+}
+
+static void
+init_ob_refcnt_field (region_model_manager *mgr, region_model *model,
+                      const region *ob_base_region, tree pyobj_record,
+                      const call_details &cd)
+{
+  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+  const region *ob_refcnt_region
+      = mgr->get_field_region (ob_base_region, ob_refcnt_tree);
+  const svalue *refcnt_one_sval
+      = mgr->get_or_create_int_cst (size_type_node, 1);
+  model->set_value (ob_refcnt_region, refcnt_one_sval, cd.get_ctxt ());
+}
+
+static void
+set_ob_type_field (region_model_manager *mgr, region_model *model,
+                   const region *ob_base_region, tree pyobj_record,
+                   tree pytype_var_decl_ptr, const call_details &cd)
+{
+  const region *pylist_type_region
+      = mgr->get_region_for_global (pytype_var_decl_ptr);
+  tree pytype_var_decl_ptr_type
+      = build_pointer_type (TREE_TYPE (pytype_var_decl_ptr));
+  const svalue *pylist_type_ptr_sval
+      = mgr->get_ptr_svalue (pytype_var_decl_ptr_type, pylist_type_region);
+  tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
+  const region *ob_type_region
+      = mgr->get_field_region (ob_base_region, ob_type_field);
+  model->set_value (ob_type_region, pylist_type_ptr_sval, cd.get_ctxt ());
+}
+
+static const region *
+get_ob_base_region (region_model_manager *mgr, region_model *model,
+                   const region *new_object_region, tree object_record,
+                   const svalue *pyobj_svalue, const call_details &cd)
+{
+  tree ob_base_tree = get_field_by_name (object_record, "ob_base");
+  const region *ob_base_region
+      = mgr->get_field_region (new_object_region, ob_base_tree);
+  model->set_value (ob_base_region, pyobj_svalue, cd.get_ctxt ());
+  return ob_base_region;
+}
+
+static const region *
+init_pyobject_region (region_model_manager *mgr, region_model *model,
+                      const svalue *object_svalue, const call_details &cd)
+{
+  /* TODO: switch to actual tp_basicsize.  */
+  const svalue *tp_basicsize_sval = mgr->get_or_create_unknown_svalue (NULL);
+  const region *pyobject_region = model->get_or_create_region_for_heap_alloc (
+      tp_basicsize_sval, cd.get_ctxt (), true, &cd);
+  model->set_value (pyobject_region, object_svalue, cd.get_ctxt ());
+  return pyobject_region;
+}
+
+static void
+inc_field_val (region_model_manager *mgr, region_model *model,
+               const region *field_region, const tree type_node,
+               const call_details &cd, const svalue **old_sval = nullptr,
+               const svalue **new_sval = nullptr)
+{
+  const svalue *tmp_old_sval
+      = model->get_store_value (field_region, cd.get_ctxt ());
+  const svalue *one_sval = mgr->get_or_create_int_cst (type_node, 1);
+  const svalue *tmp_new_sval = mgr->get_or_create_binop (
+      type_node, PLUS_EXPR, tmp_old_sval, one_sval);
+
+  model->set_value (field_region, tmp_new_sval, cd.get_ctxt ());
+
+  if (old_sval)
+    *old_sval = tmp_old_sval;
+
+  if (new_sval)
+    *new_sval = tmp_new_sval;
+}
+
+class pyobj_init_fail : public failed_call_info
+{
+public:
+  pyobj_init_fail (const call_details &cd) : failed_call_info (cd) {}
+
+  bool
+  update_model (region_model *model, const exploded_edge *,
+                region_model_context *ctxt) const final override
+  {
+    /* Return NULL; everything else is unchanged. */
+    const call_details cd (get_call_details (model, ctxt));
+    region_model_manager *mgr = cd.get_manager ();
+    if (cd.get_lhs_type ())
+      {
+        const svalue *zero
+            = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
+        model->set_value (cd.get_lhs_region (), zero, cd.get_ctxt ());
+      }
+    return true;
+  }
+};
+
+class kf_PyList_Append : public known_function
+{
+public:
+  bool
+  matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == 2); // TODO: more checks here
+  }
+  void impl_call_pre (const call_details &cd) const final override;
+  void impl_call_post (const call_details &cd) const final override;
+};
+
+void
+kf_PyList_Append::impl_call_pre (const call_details &cd) const
+{
+  region_model_manager *mgr = cd.get_manager ();
+  region_model *model = cd.get_model ();
+
+  const svalue *pylist_sval = cd.get_arg_svalue (0);
+  const region *pylist_reg
+      = model->deref_rvalue (pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+  const svalue *newitem_sval = cd.get_arg_svalue (1);
+  const region *newitem_reg
+      = model->deref_rvalue (newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
+
+  // Skip checks if unknown etc
+  if (pylist_sval->get_kind () != SK_REGION
+      && pylist_sval->get_kind () != SK_CONSTANT)
+    return;
+
+  // PyList_Check
+  tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
+  const region *ob_type_region
+      = mgr->get_field_region (pylist_reg, ob_type_field);
+  const svalue *stored_sval
+      = model->get_store_value (ob_type_region, cd.get_ctxt ());
+  const region *pylist_type_region
+      = mgr->get_region_for_global (pylisttype_vardecl);
+  tree pylisttype_vardecl_ptr
+      = build_pointer_type (TREE_TYPE (pylisttype_vardecl));
+  const svalue *pylist_type_ptr
+      = mgr->get_ptr_svalue (pylisttype_vardecl_ptr, pylist_type_region);
+
+  if (stored_sval != pylist_type_ptr)
+    {
+      // TODO: emit diagnostic -Wanalyzer-type-error
+      cd.get_ctxt ()->terminate_path ();
+      return;
+    }
+
+  // Check that new_item is not null.
+  {
+    const svalue *null_ptr
+        = mgr->get_or_create_int_cst (newitem_sval->get_type (), 0);
+    if (!model->add_constraint (newitem_sval, NE_EXPR, null_ptr,
+                                cd.get_ctxt ()))
+      {
+        // TODO: emit diagnostic here
+        cd.get_ctxt ()->terminate_path ();
+        return;
+      }
+  }
+}
+
+void
+kf_PyList_Append::impl_call_post (const call_details &cd) const
+{
+  /* Three custom subclasses of custom_edge_info, for handling the various
+     outcomes of "realloc".  */
+
+  /* Concrete custom_edge_info: a realloc call that fails, returning NULL.
+   */
+  class realloc_failure : public failed_call_info
+  {
+  public:
+    realloc_failure (const call_details &cd) : failed_call_info (cd) {}
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pylist_sval = cd.get_arg_svalue (0);
+      const region *pylist_reg = model->deref_rvalue (
+          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+      /* Identify ob_item field and set it to NULL. */
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_reg
+          = mgr->get_field_region (pylist_reg, ob_item_field);
+      const svalue *old_ptr_sval
+          = model->get_store_value (ob_item_reg, cd.get_ctxt ());
+
+      if (const region_svalue *old_reg
+          = old_ptr_sval->dyn_cast_region_svalue ())
+        {
+          const region *freed_reg = old_reg->get_pointee ();
+          model->unbind_region_and_descendents (freed_reg, POISON_KIND_FREED);
+          model->unset_dynamic_extents (freed_reg);
+        }
+
+      const svalue *null_sval = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+      model->set_value (ob_item_reg, null_sval, cd.get_ctxt ());
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *neg_one
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), -1);
+          cd.maybe_set_lhs (neg_one);
+        }
+      return true;
+    }
+  };
+
+  class realloc_success_no_move : public call_info
+  {
+  public:
+    realloc_success_no_move (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (
+          can_colorize, "when %qE succeeds, without moving underlying buffer",
+          get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pylist_sval = cd.get_arg_svalue (0);
+      const region *pylist_reg = model->deref_rvalue (
+          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+      const svalue *newitem_sval = cd.get_arg_svalue (1);
+      const region *newitem_reg = model->deref_rvalue (
+          newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
+
+      tree ob_size_field = get_field_by_name (varobj_record, "ob_size");
+      const region *ob_size_region
+          = mgr->get_field_region (pylist_reg, ob_size_field);
+      const svalue *ob_size_sval = nullptr;
+      const svalue *new_size_sval = nullptr;
+      inc_field_val (mgr, model, ob_size_region, integer_type_node, cd,
+                     &ob_size_sval, &new_size_sval);
+
+      const svalue *sizeof_sval = mgr->get_or_create_cast (
+          ob_size_sval->get_type (), get_sizeof_pyobjptr (mgr));
+      const svalue *num_allocated_bytes = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, new_size_sval);
+
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_region
+          = mgr->get_field_region (pylist_reg, ob_item_field);
+      const svalue *ob_item_ptr_sval
+          = model->get_store_value (ob_item_region, cd.get_ctxt ());
+
+      /* We can only grow in place with a non-NULL pointer to a known
+         region.  */
+      {
+        const svalue *null_ptr = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+        if (!model->add_constraint (ob_item_ptr_sval, NE_EXPR, null_ptr,
+                                    cd.get_ctxt ()))
+          {
+            return false;
+          }
+      }
+
+      const unmergeable_svalue *underlying_svalue
+          = ob_item_ptr_sval->dyn_cast_unmergeable_svalue ();
+      const svalue *target_svalue = nullptr;
+      const region_svalue *target_region_svalue = nullptr;
+
+      if (underlying_svalue)
+        {
+          target_svalue = underlying_svalue->get_arg ();
+          if (target_svalue->get_kind () != SK_REGION)
+            {
+              return false;
+            }
+        }
+      else
+        {
+          if (ob_item_ptr_sval->get_kind () != SK_REGION)
+            {
+              return false;
+            }
+          target_svalue = ob_item_ptr_sval;
+        }
+
+      target_region_svalue = target_svalue->dyn_cast_region_svalue ();
+      const region *curr_reg = target_region_svalue->get_pointee ();
+
+      if (compat_types_p (num_allocated_bytes->get_type (), size_type_node))
+        model->set_dynamic_extents (curr_reg, num_allocated_bytes, ctxt);
+
+      model->set_value (ob_size_region, new_size_sval, ctxt);
+
+      const svalue *offset_sval = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, ob_size_sval);
+      const region *element_region
+          = mgr->get_offset_region (curr_reg, pyobj_ptr_ptr, offset_sval);
+      model->set_value (element_region, newitem_sval, cd.get_ctxt ());
+
+      // Increment ob_refcnt of appended item.
+      tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+      const region *ob_refcnt_region
+          = mgr->get_field_region (newitem_reg, ob_refcnt_tree);
+      inc_field_val (mgr, model, ob_refcnt_region, size_type_node, cd);
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *zero
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
+          cd.maybe_set_lhs (zero);
+        }
+      return true;
+    }
+  };
+
+  class realloc_success_move : public call_info
+  {
+  public:
+    realloc_success_move (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (can_colorize, "when %qE succeeds, moving buffer",
+                              get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+      const svalue *pylist_sval = cd.get_arg_svalue (0);
+      const region *pylist_reg = model->deref_rvalue (
+          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+      const svalue *newitem_sval = cd.get_arg_svalue (1);
+      const region *newitem_reg = model->deref_rvalue (
+          newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
+
+      tree ob_size_field = get_field_by_name (varobj_record, "ob_size");
+      const region *ob_size_region
+          = mgr->get_field_region (pylist_reg, ob_size_field);
+      const svalue *old_ob_size_sval = nullptr;
+      const svalue *new_ob_size_sval = nullptr;
+      inc_field_val (mgr, model, ob_size_region, integer_type_node, cd,
+                     &old_ob_size_sval, &new_ob_size_sval);
+
+      const svalue *sizeof_sval = mgr->get_or_create_cast (
+          old_ob_size_sval->get_type (), get_sizeof_pyobjptr (mgr));
+      const svalue *new_size_sval = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, new_ob_size_sval);
+
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_reg
+          = mgr->get_field_region (pylist_reg, ob_item_field);
+      const svalue *old_ptr_sval
+          = model->get_store_value (ob_item_reg, cd.get_ctxt ());
+
+      /* Create the new region.  */
+      const region *new_reg = model->get_or_create_region_for_heap_alloc (
+          new_size_sval, cd.get_ctxt ());
+      const svalue *new_ptr_sval
+          = mgr->get_ptr_svalue (pyobj_ptr_ptr, new_reg);
+      if (!model->add_constraint (new_ptr_sval, NE_EXPR, old_ptr_sval,
+                                  cd.get_ctxt ()))
+        return false;
+
+      if (const region_svalue *old_reg
+          = old_ptr_sval->dyn_cast_region_svalue ())
+        {
+          const region *freed_reg = old_reg->get_pointee ();
+          const svalue *old_size_sval = model->get_dynamic_extents (freed_reg);
+          if (old_size_sval)
+            {
+              const svalue *copied_size_sval
+                  = get_copied_size (model, old_size_sval, new_size_sval);
+              const region *copied_old_reg = mgr->get_sized_region (
+                  freed_reg, pyobj_ptr_ptr, copied_size_sval);
+              const svalue *buffer_content_sval
+                  = model->get_store_value (copied_old_reg, cd.get_ctxt ());
+              const region *copied_new_reg = mgr->get_sized_region (
+                  new_reg, pyobj_ptr_ptr, copied_size_sval);
+              model->set_value (copied_new_reg, buffer_content_sval,
+                                cd.get_ctxt ());
+            }
+          else
+            {
+              model->mark_region_as_unknown (freed_reg, cd.get_uncertainty ());
+            }
+
+          model->unbind_region_and_descendents (freed_reg, POISON_KIND_FREED);
+          model->unset_dynamic_extents (freed_reg);
+        }
+
+      const svalue *null_ptr = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+      if (!model->add_constraint (new_ptr_sval, NE_EXPR, null_ptr,
+                                  cd.get_ctxt ()))
+        return false;
+
+      model->set_value (ob_size_region, new_ob_size_sval, ctxt);
+      model->set_value (ob_item_reg, new_ptr_sval, cd.get_ctxt ());
+
+      const svalue *offset_sval = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, old_ob_size_sval);
+      const region *element_region
+          = mgr->get_offset_region (new_reg, pyobj_ptr_ptr, offset_sval);
+      model->set_value (element_region, newitem_sval, cd.get_ctxt ());
+
+      // Increment ob_refcnt of appended item.
+      tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+      const region *ob_refcnt_region
+          = mgr->get_field_region (newitem_reg, ob_refcnt_tree);
+      inc_field_val (mgr, model, ob_refcnt_region, size_type_node, cd);
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *zero
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
+          cd.maybe_set_lhs (zero);
+        }
+      return true;
+    }
+
+  private:
+    /* Return the lesser of OLD_SIZE_SVAL and NEW_SIZE_SVAL.
+       If unknown, OLD_SIZE_SVAL is returned.  */
+    const svalue *
+    get_copied_size (region_model *model, const svalue *old_size_sval,
+                     const svalue *new_size_sval) const
+    {
+      tristate res
+          = model->eval_condition (old_size_sval, GT_EXPR, new_size_sval);
+      switch (res.get_value ())
+        {
+        case tristate::TS_TRUE:
+          return new_size_sval;
+        case tristate::TS_FALSE:
+        case tristate::TS_UNKNOWN:
+          return old_size_sval;
+        default:
+          gcc_unreachable ();
+        }
+    }
+  };
+
+  /* Body of kf_PyList_Append::impl_call_post.  */
+  if (cd.get_ctxt ())
+    {
+      cd.get_ctxt ()->bifurcate (make_unique<realloc_failure> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<realloc_success_no_move> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<realloc_success_move> (cd));
+      cd.get_ctxt ()->terminate_path ();
+    }
+}
+
+class kf_PyList_New : public known_function
+{
+public:
+  bool
+  matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == 1 && arg_is_integral_p (cd, 0));
+  }
+  void impl_call_post (const call_details &cd) const final override;
+};
+
+void
+kf_PyList_New::impl_call_post (const call_details &cd) const
+{
+  class success : public call_info
+  {
+  public:
+    success (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (can_colorize, "when %qE succeeds",
+                              get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pyobj_svalue
+          = mgr->get_or_create_unknown_svalue (pyobj_record);
+      const svalue *varobj_svalue
+          = mgr->get_or_create_unknown_svalue (varobj_record);
+      const svalue *pylist_svalue
+          = mgr->get_or_create_unknown_svalue (pylistobj_record);
+
+      const svalue *size_sval = cd.get_arg_svalue (0);
+
+      const region *pylist_region
+          = init_pyobject_region (mgr, model, pylist_svalue, cd);
+
+      /*
+      typedef struct
+      {
+        PyObject_VAR_HEAD
+        PyObject **ob_item;
+        Py_ssize_t allocated;
+      } PyListObject;
+      */
+      tree varobj_field = get_field_by_name (pylistobj_record, "ob_base");
+      const region *varobj_region
+          = mgr->get_field_region (pylist_region, varobj_field);
+      model->set_value (varobj_region, varobj_svalue, cd.get_ctxt ());
+
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_region
+          = mgr->get_field_region (pylist_region, ob_item_field);
+
+      const svalue *zero_sval = mgr->get_or_create_int_cst (size_type_node, 0);
+      const svalue *casted_size_sval
+          = mgr->get_or_create_cast (size_type_node, size_sval);
+      const svalue *size_cond_sval = mgr->get_or_create_binop (
+          size_type_node, LE_EXPR, casted_size_sval, zero_sval);
+
+      // if size <= 0, ob_item = NULL
+
+      if (tree_int_cst_equal (size_cond_sval->maybe_get_constant (),
+                              integer_one_node))
+        {
+          const svalue *null_sval
+              = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+          model->set_value (ob_item_region, null_sval, cd.get_ctxt ());
+        }
+      else // calloc
+        {
+          const svalue *sizeof_sval = mgr->get_or_create_cast (
+              size_sval->get_type (), get_sizeof_pyobjptr (mgr));
+          const svalue *prod_sval = mgr->get_or_create_binop (
+              size_type_node, MULT_EXPR, sizeof_sval, size_sval);
+          const region *ob_item_sized_region
+              = model->get_or_create_region_for_heap_alloc (prod_sval,
+                                                            cd.get_ctxt ());
+          model->zero_fill_region (ob_item_sized_region);
+          const svalue *ob_item_ptr_sval
+              = mgr->get_ptr_svalue (pyobj_ptr_ptr, ob_item_sized_region);
+          const svalue *ob_item_unmergeable
+              = mgr->get_or_create_unmergeable (ob_item_ptr_sval);
+          model->set_value (ob_item_region, ob_item_unmergeable,
+                            cd.get_ctxt ());
+        }
+
+      /*
+      typedef struct {
+      PyObject ob_base;
+      Py_ssize_t ob_size; // Number of items in variable part
+      } PyVarObject;
+      */
+      const region *ob_base_region = get_ob_base_region (
+          mgr, model, varobj_region, varobj_record, pyobj_svalue, cd);
+
+      tree ob_size_tree = get_field_by_name (varobj_record, "ob_size");
+      const region *ob_size_region
+          = mgr->get_field_region (varobj_region, ob_size_tree);
+      model->set_value (ob_size_region, size_sval, cd.get_ctxt ());
+
+      /*
+      typedef struct _object {
+          _PyObject_HEAD_EXTRA
+          Py_ssize_t ob_refcnt;
+          PyTypeObject *ob_type;
+      } PyObject;
+      */
+
+      // Initialize ob_refcnt field to 1.
+      init_ob_refcnt_field (mgr, model, ob_base_region, pyobj_record, cd);
+
+      // Get pointer svalue for PyList_Type then assign it to ob_type field.
+      set_ob_type_field (mgr, model, ob_base_region, pyobj_record, pylisttype_vardecl, cd);
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *ptr_sval
+              = mgr->get_ptr_svalue (cd.get_lhs_type (), pylist_region);
+          cd.maybe_set_lhs (ptr_sval);
+        }
+      return true;
+    }
+  };
+
+  if (cd.get_ctxt ())
+    {
+      cd.get_ctxt ()->bifurcate (make_unique<pyobj_init_fail> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<success> (cd));
+      cd.get_ctxt ()->terminate_path ();
+    }
+}
+
+class kf_PyLong_FromLong : public known_function
+{
+public:
+  bool
+  matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == 1 && arg_is_integral_p (cd, 0));
+  }
+  void impl_call_post (const call_details &cd) const final override;
+};
+
+void
+kf_PyLong_FromLong::impl_call_post (const call_details &cd) const
+{
+  class success : public call_info
+  {
+  public:
+    success (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (can_colorize, "when %qE succeeds",
+                              get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pyobj_svalue
+          = mgr->get_or_create_unknown_svalue (pyobj_record);
+      const svalue *pylongobj_sval
+          = mgr->get_or_create_unknown_svalue (pylongobj_record);
+
+      const region *pylong_region
+          = init_pyobject_region (mgr, model, pylongobj_sval, cd);
+
+      // Create a region for the base PyObject within the PyLongObject.
+      const region *ob_base_region = get_ob_base_region (
+          mgr, model, pylong_region, pylongobj_record, pyobj_svalue, cd);
+
+      // Initialize ob_refcnt field to 1.
+      init_ob_refcnt_field (mgr, model, ob_base_region, pyobj_record, cd);
+
+      // Get pointer svalue for PyLong_Type then assign it to ob_type field.
+      set_ob_type_field (mgr, model, ob_base_region, pyobj_record, pylongtype_vardecl, cd);
+
+      // Set the PyLongObject value.
+      tree ob_digit_field = get_field_by_name (pylongobj_record, "ob_digit");
+      const region *ob_digit_region
+          = mgr->get_field_region (pylong_region, ob_digit_field);
+      const svalue *ob_digit_sval = cd.get_arg_svalue (0);
+      model->set_value (ob_digit_region, ob_digit_sval, cd.get_ctxt ());
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *ptr_sval
+              = mgr->get_ptr_svalue (cd.get_lhs_type (), pylong_region);
+          cd.maybe_set_lhs (ptr_sval);
+        }
+      return true;
+    }
+  };
+
+  if (cd.get_ctxt ())
+    {
+      cd.get_ctxt ()->bifurcate (make_unique<pyobj_init_fail> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<success> (cd));
+      cd.get_ctxt ()->terminate_path ();
+    }
+}
+
 static void
 maybe_stash_named_type (logger *logger, const translation_unit &tu,
                         const char *name)
@@ -179,6 +878,12 @@ init_py_structs ()
   pylongobj_record = get_stashed_type_by_name ("PyLongObject");
   pylongtype_vardecl = get_stashed_global_var_by_name ("PyLong_Type");
   pylisttype_vardecl = get_stashed_global_var_by_name ("PyList_Type");
+
+  if (pyobj_record)
+    {
+      pyobj_ptr_tree = build_pointer_type (pyobj_record);
+      pyobj_ptr_ptr = build_pointer_type (pyobj_ptr_tree);
+    }
 }
 
 void
@@ -205,6 +910,12 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
       sorry_no_cpython_plugin ();
       return;
     }
+
+  iface->register_known_function ("PyList_Append",
+                                  make_unique<kf_PyList_Append> ());
+  iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
+  iface->register_known_function ("PyLong_FromLong",
+                                  make_unique<kf_PyLong_FromLong> ());
 }
 } // namespace ana
 
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
new file mode 100644
index 00000000000..19b5c17428a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyList_New (Py_ssize_t len)
+{
+  PyObject *obj = PyList_New (len);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+PyObject *
+test_PyLong_New (long n)
+{
+  PyObject *obj = PyLong_FromLong (n);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+PyObject *
+test_PyListAppend (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = PyList_New (0);
+  PyList_Append (list, item);
+  return list; /* { dg-warning "leak of 'item'" } */
+}
+
+PyObject *
+test_PyListAppend_2 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  if (!item)
+	return NULL;
+
+  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  PyObject *list = PyList_New (n);
+  if (!list)
+  {
+	Py_DECREF(item);
+	return NULL;
+  }
+
+  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+
+  if (PyList_Append (list, item) < 0)
+    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  else
+    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
+  return list; /* { dg-warning "leak of 'item'" } */
+}
+
+
+PyObject *
+test_PyListAppend_3 (PyObject *item, PyObject *list)
+{
+  PyList_Append (list, item);
+  return list;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index 09c45394b1f..e1ed2d2589e 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -161,7 +161,8 @@ set plugin_test_list [list \
 	  taint-CVE-2011-0521-6.c \
 	  taint-antipatterns-1.c } \
     { analyzer_cpython_plugin.c \
-	  cpython-plugin-test-1.c } \
+	  cpython-plugin-test-1.c \
+	  cpython-plugin-test-2.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 7004711b384..eda53ff3a09 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12559,3 +12559,28 @@ proc check_effective_target_const_volatile_readonly_section { } {
     }
   return 1
 }
+
+# Appends necessary Python flags to extra-tool-flags if Python.h is supported.
+# Otherwise, modifies dg-do-what.
+proc dg-require-python-h { args } {
+    upvar dg-extra-tool-flags extra-tool-flags
+
+    verbose "ENTER dg-require-python-h" 2
+
+    set result [remote_exec host "python3-config --includes"]
+    set status [lindex $result 0]
+    if { $status == 0 } {
+        set python_flags [lindex $result 1]
+    } else {
+	verbose "Python.h not supported" 2
+	upvar dg-do-what dg-do-what
+	set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+	return
+    }
+
+    verbose "Python flags are: $python_flags" 2
+
+    verbose "Before appending, extra-tool-flags: ${extra-tool-flags}" 3
+    eval lappend extra-tool-flags $python_flags
+    verbose "After appending, extra-tool-flags: ${extra-tool-flags}" 3
+}
-- 
2.30.2
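For readers following realloc_success_move in the patch above: get_copied_size copies the lesser of the old and new buffer sizes, and falls back to the old size whenever the comparison cannot be decided. The following is a minimal standalone C++17 sketch of that decision, assuming a toy three-valued Tristate enum and std::optional to stand in for sizes the constraint solver cannot pin down; neither is the analyzer's real tristate or svalue API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Toy three-valued logic, loosely modelled on the analyzer's tristate
// (hypothetical; not the real class).
enum class Tristate { True, False, Unknown };

// Compare two sizes; std::nullopt stands in for a symbolic size whose
// value the solver cannot decide.
Tristate
greater_than (std::optional<std::size_t> a, std::optional<std::size_t> b)
{
  if (!a || !b)
    return Tristate::Unknown;
  return *a > *b ? Tristate::True : Tristate::False;
}

// Same shape as get_copied_size: take NEW_SIZE only when it is provably
// smaller than OLD_SIZE; otherwise fall back to OLD_SIZE (which may
// itself be unknown).
std::optional<std::size_t>
copied_size (std::optional<std::size_t> old_size,
             std::optional<std::size_t> new_size)
{
  switch (greater_than (old_size, new_size))
    {
    case Tristate::True:
      return new_size;
    case Tristate::False:
    case Tristate::Unknown:
      return old_size;
    }
  assert (false && "unreachable");
  return old_size;
}
```

Because kf_PyList_Append always grows ob_size by one, the new size is never smaller than the old one, so in this model the copy size is the old buffer size whether or not the comparison is decidable.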



* Re: [PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646]
  2023-08-09 19:22                                 ` [PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646] Eric Feng
@ 2023-08-09 21:36                                   ` David Malcolm
  2023-08-11 17:47                                     ` [COMMITTED] " Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-09 21:36 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc, gcc-patches

On Wed, 2023-08-09 at 15:22 -0400, Eric Feng wrote:
> Thank you for your help in getting dg-require-python-h working! I can
> confirm that the FAILs are related to differences between the --cflags
> affecting the gimple seen by the analyzer. For this reason, I have
> changed it to --includes for now. 

Sounds good.

Eventually we'll probably want to support --cflags, but given that
every distribution probably has its own set of flags, it's a recipe for
an unpleasantly large test matrix, so just using --includes is a good
compromise.

> To be sure, I tested on Python 3.8 as
> well and it works as expected. I have also addressed the following
> comments on the WIP patch as you described.
> 
> -- Update Changelog entry to list new functions being simulated.
> -- Update region_model::get_or_create_region_for_heap_alloc leading
> comment.
> -- Change register_alloc to update_state_machine.
> -- Change move_ptr_sval_non_null to transition_ptr_sval_non_null.
> -- Static helper functions for:
>         -- Initializing ob_refcnt field.
>         -- Setting ob_type field.
>         -- Getting ob_base field.
>         -- Initializing heap allocated region for PyObjects.
>         -- Incrementing a field by one.
> -- Change arg_is_long_p to arg_is_integral_p.
> -- Extract common failure scenario for reusability.
> 
> The initial WIP patch using 
> 
> /* { dg-options "-fanalyzer -I/usr/include/python3.9" }. */
> 
> has been bootstrapped and regtested on aarch64-unknown-linux-gnu.
> Since we did not change any core logic in the revision, and the only
> changes within the analyzer core are variable renames, is it OK for
> trunk? In the meantime, the revised patch is going through the
> bootstrap and regtest process.

Thanks for the updated patch.

Unfortunately I just pushed a largish analyzer patch
(r14-3114-g73da34a538ddc2) which may well conflict with your patch, so
please rebase to beyond that.

Sorry about this.

In particular note that there's no longer a default assignment to the
LHS at a call-site in region_model::on_call_pre; known_function
subclasses are now responsible for assigning to the LHS of the
callsite.  But I suspect that all the known_function subclasses in the
cpython plugin already do that.

Some nits inline below...

[...snip...]

> Some concessions were made to simplify the analysis process when
> comparing kf_PyList_Append with the real implementation. In
> particular, PyList_Append performs some optimization internally to try
> and avoid calls to realloc if possible. For simplicity, we assume that
> realloc is called every time. Also, we grow the size by just 1 (to
> ensure enough space for adding a new element) rather than abide by the
> heuristics that the actual implementation follows.

Might be worth capturing these notes as comments in the source (for
class kf_PyList_Append), rather than just within the commit message.
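As an illustration of how far that simplification departs from CPython, here is a small C++ sketch that counts reallocations for a run of appends under two policies: the plugin's grow-by-one model and an approximation of CPython's list_resize over-allocation. The CPythonLike formula is written from memory of the 3.9+ Objects/listobject.c and should be checked against the CPython sources before being relied on; both structs and reallocs_after are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Growth policy assumed by the plugin's kf_PyList_Append model:
// every append reallocates to exactly one more slot.
struct GrowByOne
{
  std::size_t size = 0, allocated = 0, reallocs = 0;
  void append ()
  {
    allocated = size + 1;
    ++reallocs;
    ++size;
    assert (size <= allocated);
  }
};

// Approximation of CPython's list_resize (assumed 3.9+ formula). The
// "don't over-allocate on large jumps" adjustment is omitted because a
// single append never triggers it.
struct CPythonLike
{
  std::size_t size = 0, allocated = 0, reallocs = 0;
  void append ()
  {
    std::size_t newsize = size + 1;
    if (allocated >= newsize && newsize >= (allocated >> 1))
      {
        size = newsize;  // fits in the current buffer: no realloc
        return;
      }
    // Over-allocate by ~1/8 plus padding, rounded down to a multiple of 4.
    allocated = (newsize + (newsize >> 3) + 6) & ~std::size_t (3);
    ++reallocs;
    size = newsize;
    assert (size <= allocated);
  }
};

// Number of reallocations after N consecutive appends to an empty list.
template <typename Policy>
std::size_t
reallocs_after (std::size_t n)
{
  Policy p;
  for (std::size_t i = 0; i < n; ++i)
    p.append ();
  return p.reallocs;
}
```

With this approximation, ten appends to an empty list trigger ten reallocations under GrowByOne but only three under CPythonLike (capacities 4, 8 and 16).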

[...snip...]
> 
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index e92b3f7b074..c338f045d92 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -5127,11 +5127,16 @@ region_model::check_dynamic_size_for_floats (const svalue *size_in_bytes,
>     Use CTXT to complain about tainted sizes.
>  
>     Reuse an existing heap_allocated_region if it's not being referenced by
> -   this region_model; otherwise create a new one.  */
> +   this region_model; otherwise create a new one.
> +
> +   Optionally (update_state_machine) transitions the pointer pointing to the
> +   heap_allocated_region from start to assumed non-null.  */
>  
>  const region *
> region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
> -                                                 region_model_context *ctxt)
> +       region_model_context *ctxt,
> +       bool update_state_machine,
> +       const call_details *cd)
>  {
>    /* Determine which regions are referenced in this region_model, so that
>       we can reuse an existing heap_allocated_region if it's not in use on
> @@ -5153,6 +5158,17 @@ region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
>    if (size_in_bytes)
>      if (compat_types_p (size_in_bytes->get_type (), size_type_node))
>        set_dynamic_extents (reg, size_in_bytes, ctxt);
> +
> +       if (update_state_machine && cd)
> +               {
> +                       const svalue *ptr_sval = nullptr;
> +                       if (cd->get_lhs_type ())
> +       ptr_sval = m_mgr->get_ptr_svalue (cd->get_lhs_type (), reg);
> +                       else
> +       ptr_sval = m_mgr->get_ptr_svalue (NULL_TREE, reg);
> +                       transition_ptr_sval_non_null (ctxt, ptr_sval);

This if/else is redundant: the "else" is only reached if
cd->get_lhs_type () is null, in which case you pass in NULL_TREE, so it
works the same either way.  Or am I missing something?

Also, it looks like something weird's happening with indentation in
this hunk.

> +               }
> +
>    return reg;
>  }
>  
> diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
> index 0cf38714c96..16c80a238bc 100644
> --- a/gcc/analyzer/region-model.h
> +++ b/gcc/analyzer/region-model.h
> @@ -387,9 +387,9 @@ class region_model
>                        region_model_context *ctxt,
>                        rejected_constraint **out);
>  
> -  const region *
> -  get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
> -                                      region_model_context *ctxt);
> +  const region *get_or_create_region_for_heap_alloc (
> +      const svalue *size_in_bytes, region_model_context *ctxt,
> +      bool update_state_machine = false, const call_details *cd = nullptr);

Nit: non-standard indentation here.

>    const region *create_region_for_alloca (const svalue *size_in_bytes,
>                                           region_model_context *ctxt);
>    void get_referenced_base_regions (auto_bitmap &out_ids) const;
> @@ -476,6 +476,10 @@ class region_model
>                              const svalue *old_ptr_sval,
>                              const svalue *new_ptr_sval);
>  
> +  /* Implemented in sm-malloc.cc.  */
> +  void transition_ptr_sval_non_null (region_model_context *ctxt,
> +       const svalue *new_ptr_sval);

Nit: non-standard indentation here.

> +
>    /* Implemented in sm-taint.cc.  */
>    void mark_as_tainted (const svalue *sval,
>                         region_model_context *ctxt);
> diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
> index a8c63eb1ce8..bb8d83e4605 100644
> --- a/gcc/analyzer/sm-malloc.cc
> +++ b/gcc/analyzer/sm-malloc.cc
> @@ -434,6 +434,10 @@ public:
>                              const svalue *new_ptr_sval,
>                              const extrinsic_state &ext_state) const;
>  
> +  void transition_ptr_sval_non_null (region_model *model, sm_state_map *smap,
> +       const svalue *new_ptr_sval,
> +       const extrinsic_state &ext_state) const;

Nit: non-standard indentation here.

> +
>    standard_deallocator_set m_free;
>    standard_deallocator_set m_scalar_delete;
>    standard_deallocator_set m_vector_delete;
> @@ -2504,6 +2508,16 @@ on_realloc_with_move (region_model *model,
>                    NULL, ext_state);
>  }
>  
> +/*  Hook for get_or_create_region_for_heap_alloc for the case when we want
> +   ptr_sval to mark a newly created region as assumed non null on malloc SM.  */
> +void
> +malloc_state_machine::transition_ptr_sval_non_null (
> +    region_model *model, sm_state_map *smap, const svalue *new_ptr_sval,
> +    const extrinsic_state &ext_state) const

Nit: non-standard indentation here.

> +{
> +  smap->set_state (model, new_ptr_sval, m_free.m_nonnull, NULL, ext_state);
> +}
> 
[...]

> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 9ecc42d4465..42c8aff101e 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c

[...]

> @@ -76,6 +78,703 @@ get_field_by_name (tree type, const char *name)
>    return NULL_TREE;
>  }
>  
> +static const svalue *
> +get_sizeof_pyobjptr (region_model_manager *mgr)
> +{
> +  tree size_tree = TYPE_SIZE_UNIT (pyobj_ptr_tree);
> +  const svalue *sizeof_sval = mgr->get_or_create_constant_svalue (size_tree);
> +  return sizeof_sval;
> +}
> +
> +static bool
> +arg_is_integral_p(const call_details &cd, unsigned idx)
> +{
> +  return INTEGRAL_TYPE_P(cd.get_arg_type(idx));
> +}

We already have a call_details::arg_is_pointer_p, so perhaps this
should be a call_details::arg_is_integral_p?

> +
> +static void
> +init_ob_refcnt_field (region_model_manager *mgr, region_model *model,
> +                      const region *ob_base_region, tree pyobj_record,
> +                      const call_details &cd)
> +{
> +  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
> +  const region *ob_refcnt_region
> +      = mgr->get_field_region (ob_base_region, ob_refcnt_tree);
> +  const svalue *refcnt_one_sval
> +      = mgr->get_or_create_int_cst (size_type_node, 1);
> +  model->set_value (ob_refcnt_region, refcnt_one_sval, cd.get_ctxt ());
> +}

Please add a leading comment to this function, something like:

/* Update MODEL to set OB_BASE_REGION's ob_refcnt to 1.  */

> +
> +static void
> +set_ob_type_field (region_model_manager *mgr, region_model *model,
> +                   const region *ob_base_region, tree pyobj_record,
> +                   tree pytype_var_decl_ptr, const call_details &cd)
> +{
> +  const region *pylist_type_region
> +      = mgr->get_region_for_global (pytype_var_decl_ptr);
> +  tree pytype_var_decl_ptr_type
> +      = build_pointer_type (TREE_TYPE (pytype_var_decl_ptr));
> +  const svalue *pylist_type_ptr_sval
> +      = mgr->get_ptr_svalue (pytype_var_decl_ptr_type, pylist_type_region);
> +  tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
> +  const region *ob_type_region
> +      = mgr->get_field_region (ob_base_region, ob_type_field);
> +  model->set_value (ob_type_region, pylist_type_ptr_sval, cd.get_ctxt ());
> +}

Likewise, this needs a leading comment, something like:

/* Update MODEL to set OB_BASE_REGION's ob_type to point to
   PYTYPE_VAR_DECL_PTR.  */

> +
> +static const region *
> +get_ob_base_region (region_model_manager *mgr, region_model *model,
> +                   const region *new_object_region, tree object_record,
> +                   const svalue *pyobj_svalue, const call_details &cd)
> +{
> +  tree ob_base_tree = get_field_by_name (object_record, "ob_base");
> +  const region *ob_base_region
> +      = mgr->get_field_region (new_object_region, ob_base_tree);
> +  model->set_value (ob_base_region, pyobj_svalue, cd.get_ctxt ());
> +  return ob_base_region;
> +}

Likewise, needs a leading comment.  It isn't clear to me what the
intent of this function is.  I see it used from
kf_PyLong_FromLong::impl_call_post's outcome handler, where it seems to
be used to set the ob_base_region to an unknown value.

> +
> +static const region *
> +init_pyobject_region (region_model_manager *mgr, region_model *model,
> +                      const svalue *object_svalue, const call_details &cd)
> +{
> +  /* TODO: switch to actual tp_basic_size */
> +  const svalue *tp_basicsize_sval = mgr->get_or_create_unknown_svalue (NULL);
> +  const region *pyobject_region = model->get_or_create_region_for_heap_alloc (
> +      tp_basicsize_sval, cd.get_ctxt (), true, &cd);
> +  model->set_value (pyobject_region, object_svalue, cd.get_ctxt ());
> +  return pyobject_region;
> +}

Likewise needs a leading comment, and the exact intent isn't quite
clear to me.  I believe that everywhere you're calling it, you're
passing in an unknown svalue for "object_svalue".

[...]

> +class kf_PyList_Append : public known_function
> +{
> +public:
> +  bool
> +  matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return (cd.num_args () == 2); // TODO: more checks here

Probably: 
  && cd.arg_is_pointer_p (0)
  && cd.arg_is_pointer_p (1)

> +  }
> +  void impl_call_pre (const call_details &cd) const final override;
> +  void impl_call_post (const call_details &cd) const final override;
> +};
> +

[...snip kf_PyList_Append implementation...]

I confess that my eyes started glazing over at the kf_PyList_Append
implementation.  Given that this is just within the test plugin, I'll
defer to you on this.


> +class kf_PyList_New : public known_function
> +{
> +public:
> +  bool
> +  matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return (cd.num_args () == 1 && arg_is_integral_p (cd, 0));
> +  }
> +  void impl_call_post (const call_details &cd) const final override;
> +};
> +
> +void
> +kf_PyList_New::impl_call_post (const call_details &cd) const
> +{
> +  class success : public call_info
> +  {
> +  public:
> +    success (const call_details &cd) : call_info (cd) {}
> +
> +    label_text
> +    get_desc (bool can_colorize) const final override
> +    {
> +      return make_label_text (can_colorize, "when %qE succeeds",
> +                              get_fndecl ());
> +    }
> +
> +    bool
> +    update_model (region_model *model, const exploded_edge *,
> +                  region_model_context *ctxt) const final override
> +    {
> +      const call_details cd (get_call_details (model, ctxt));
> +      region_model_manager *mgr = cd.get_manager ();
> +
> +      const svalue *pyobj_svalue
> +          = mgr->get_or_create_unknown_svalue (pyobj_record);
> +      const svalue *varobj_svalue
> +          = mgr->get_or_create_unknown_svalue (varobj_record);
> +      const svalue *pylist_svalue
> +          = mgr->get_or_create_unknown_svalue (pylistobj_record);
> +
> +      const svalue *size_sval = cd.get_arg_svalue (0);
> +
> +      const region *pylist_region
> +          = init_pyobject_region (mgr, model, pylist_svalue, cd);
> +
> +      /*
> +      typedef struct
> +      {
> +        PyObject_VAR_HEAD
> +        PyObject **ob_item;
> +        Py_ssize_t allocated;
> +      } PyListObject;
> +      */
> +      tree varobj_field = get_field_by_name (pylistobj_record, "ob_base");
> +      const region *varobj_region
> +          = mgr->get_field_region (pylist_region, varobj_field);
> +      model->set_value (varobj_region, varobj_svalue, cd.get_ctxt ());
> +
> +      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
> +      const region *ob_item_region
> +          = mgr->get_field_region (pylist_region, ob_item_field);
> +
> +      const svalue *zero_sval = mgr->get_or_create_int_cst (size_type_node, 0);
> +      const svalue *casted_size_sval
> +          = mgr->get_or_create_cast (size_type_node, size_sval);
> +      const svalue *size_cond_sval = mgr->get_or_create_binop (
> +          size_type_node, LE_EXPR, casted_size_sval, zero_sval);
> +
> +      // if size <= 0, ob_item = NULL
> +
> +      if (tree_int_cst_equal (size_cond_sval->maybe_get_constant (),
> +                              integer_one_node))
> +        {

I think the way I'd do this is to bifurcate on the <= 0 versus > 0
case, and add the constraints to the model, as this ought to better
handle non-constant values for the size.

But this is good enough for now.
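Here is a toy illustration of that suggestion. It is written in plain C rather than against the analyzer's API, and all names in it are hypothetical.

The size argument is modeled as a closed interval. Instead of testing whether `size <= 0` folds to a constant, the call is split into two outcomes:

- one constrained to `size <= 0`, where ob_item is NULL;
- one constrained to `size > 0`, where ob_item is allocated.

An outcome whose constraint cannot be satisfied is dropped. So a constant size still yields a single path, while a symbolic size yields both.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy abstraction (not GCC's API): the values the size argument may take,
   as a closed interval [lo, hi].  */
struct interval
{
  long lo, hi;
};

/* One successor state produced by the split.  */
struct outcome
{
  bool size_le_zero;      /* which branch: size <= 0, or size > 0 */
  struct interval size;   /* size range after adding the branch's constraint */
  bool ob_item_is_null;   /* what the modeled PyList_New stores in ob_item */
};

/* Split on "size <= 0" versus "size > 0" instead of testing for a known
   constant.  Each feasible branch narrows the interval; an infeasible branch
   is dropped.  Returns the number of outcomes written to OUT (1 or 2).  */
static int
bifurcate_on_size (struct interval size, struct outcome out[2])
{
  assert (size.lo <= size.hi);
  int n = 0;
  if (size.lo <= 0)  /* some value satisfies size <= 0 */
    {
      out[n].size_le_zero = true;
      out[n].size.lo = size.lo;
      out[n].size.hi = size.hi < 0 ? size.hi : 0;
      out[n].ob_item_is_null = true;
      n++;
    }
  if (size.hi > 0)  /* some value satisfies size > 0 */
    {
      out[n].size_le_zero = false;
      out[n].size.lo = size.lo > 1 ? size.lo : 1;
      out[n].size.hi = size.hi;
      out[n].ob_item_is_null = false;
      n++;
    }
  return n;
}
```

For example, a size known only to lie in [-3, 5] produces both outcomes, narrowed to [-3, 0] and [1, 5]. A size of exactly 0 produces only the NULL-ob_item outcome. In the plugin, the equivalent would presumably be two bifurcated outcomes, each calling add_constraint on its own model.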

> +          const svalue *null_sval
> +              = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
> +          model->set_value (ob_item_region, null_sval, cd.get_ctxt ());
> +        }
> +      else // calloc
> +        {
> +          const svalue *sizeof_sval = mgr->get_or_create_cast (
> +              size_sval->get_type (), get_sizeof_pyobjptr (mgr));
> +          const svalue *prod_sval = mgr->get_or_create_binop (
> +              size_type_node, MULT_EXPR, sizeof_sval, size_sval);
> +          const region *ob_item_sized_region
> +              = model->get_or_create_region_for_heap_alloc (prod_sval,
> +                                                            cd.get_ctxt ());
> +          model->zero_fill_region (ob_item_sized_region);
> +          const svalue *ob_item_ptr_sval
> +              = mgr->get_ptr_svalue (pyobj_ptr_ptr, ob_item_sized_region);
> +          const svalue *ob_item_unmergeable
> +              = mgr->get_or_create_unmergeable (ob_item_ptr_sval);
> +          model->set_value (ob_item_region, ob_item_unmergeable,
> +                            cd.get_ctxt ());
> +        }
> +
> +      /*
> +      typedef struct {
> +      PyObject ob_base;
> +      Py_ssize_t ob_size; // Number of items in variable part
> +      } PyVarObject;
> +      */
> +      const region *ob_base_region = get_ob_base_region (
> +          mgr, model, varobj_region, varobj_record, pyobj_svalue, cd);
> +
> +      tree ob_size_tree = get_field_by_name (varobj_record, "ob_size");
> +      const region *ob_size_region
> +          = mgr->get_field_region (varobj_region, ob_size_tree);
> +      model->set_value (ob_size_region, size_sval, cd.get_ctxt ());
> +
> +      /*
> +      typedef struct _object {
> +          _PyObject_HEAD_EXTRA
> +          Py_ssize_t ob_refcnt;
> +          PyTypeObject *ob_type;
> +      } PyObject;
> +      */
> +
> +      // Initialize ob_refcnt field to 1.
> +      init_ob_refcnt_field(mgr, model, ob_base_region, pyobj_record, cd);
> +
> +      // Get pointer svalue for PyList_Type then assign it to ob_type field.
> +      set_ob_type_field(mgr, model, ob_base_region, pyobj_record, pylisttype_vardecl, cd);
> +
> +      if (cd.get_lhs_type ())
> +        {
> +          const svalue *ptr_sval
> +              = mgr->get_ptr_svalue (cd.get_lhs_type (), pylist_region);
> +          cd.maybe_set_lhs (ptr_sval);
> +        }
> +      return true;
> +    }
> +  };
> +
> +  if (cd.get_ctxt ())
> +    {
> +      cd.get_ctxt ()->bifurcate (make_unique<pyobj_init_fail> (cd));
> +      cd.get_ctxt ()->bifurcate (make_unique<success> (cd));
> +      cd.get_ctxt ()->terminate_path ();
> +    }
> +}
> +
> +class kf_PyLong_FromLong : public known_function
> +{
> +public:
> +  bool
> +  matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return (cd.num_args () == 1 && arg_is_integral_p (cd, 0));
> +  }
> +  void impl_call_post (const call_details &cd) const final override;
> +};
> +
> +void
> +kf_PyLong_FromLong::impl_call_post (const call_details &cd) const
> +{
> +  class success : public call_info
> +  {
> +  public:
> +    success (const call_details &cd) : call_info (cd) {}
> +
> +    label_text
> +    get_desc (bool can_colorize) const final override
> +    {
> +      return make_label_text (can_colorize, "when %qE succeeds",
> +                              get_fndecl ());
> +    }

If you subclass from success_call_info then you get an equivalent
get_desc implementation "for free".

> +
> +    bool
> +    update_model (region_model *model, const exploded_edge *,
> +                  region_model_context *ctxt) const final override
> +    {

Could add a comment here that we *don't* attempt to simulate the
special-casing that CPython does for values -5 to 256 (see
https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong ).
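The special case being skipped can be sketched with a toy in plain C. The types and names are hypothetical; this is not CPython's code, and the real implementation differs in how the cache is stored and reference-counted.

- Values in the range documented on the page linked above come from a preallocated cache, and the caller gets another reference to a shared object.
- Other values are freshly allocated with a reference count of 1.

The plugin's model corresponds only to the second branch. That is why it treats every successful call as a fresh heap allocation with `ob_refcnt == 1`.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a PyLongObject: a refcount plus the value.  */
struct toy_object
{
  long ob_refcnt;
  long value;
};

#define SMALL_MIN (-5)
#define SMALL_MAX 256

/* Cache of preallocated small integers, indexed by value - SMALL_MIN.
   Each entry starts with one reference owned by the cache itself.  */
static struct toy_object small_ints[SMALL_MAX - SMALL_MIN + 1];
static int small_ints_ready;

static void
init_small_ints (void)
{
  for (long v = SMALL_MIN; v <= SMALL_MAX; v++)
    {
      small_ints[v - SMALL_MIN].ob_refcnt = 1;
      small_ints[v - SMALL_MIN].value = v;
    }
  small_ints_ready = 1;
}

/* Return a new reference to an object holding V.  Small values share a
   cached object, whose refcount is incremented (so it is not 1); other
   values get a fresh allocation with refcount 1, or NULL on failure.  */
static struct toy_object *
toy_long_from_long (long v)
{
  if (!small_ints_ready)
    init_small_ints ();
  if (v >= SMALL_MIN && v <= SMALL_MAX)
    {
      struct toy_object *obj = &small_ints[v - SMALL_MIN];
      assert (obj->value == v);  /* cache entry matches the request */
      obj->ob_refcnt++;
      return obj;
    }
  struct toy_object *obj = malloc (sizeof *obj);
  if (!obj)
    return NULL;
  obj->ob_refcnt = 1;
  obj->value = v;
  return obj;
}
```

Two calls with 10 return the same object, whose count goes up by one per call. Two calls with 1000 return distinct objects, each with a count of 1, which the caller owns and must release.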

> +      const call_details cd (get_call_details (model, ctxt));
> +      region_model_manager *mgr = cd.get_manager ();
> +
> +      const svalue *pyobj_svalue
> +          = mgr->get_or_create_unknown_svalue (pyobj_record);
> +      const svalue *pylongobj_sval
> +          = mgr->get_or_create_unknown_svalue (pylongobj_record);
> +
> +      const region *pylong_region
> +          = init_pyobject_region (mgr, model, pylongobj_sval, cd);
> +
> +      // Create a region for the base PyObject within the PyLongObject.
> +      const region *ob_base_region = get_ob_base_region (
> +          mgr, model, pylong_region, pylongobj_record, pyobj_svalue, cd);
> +
> +      // Initialize ob_refcnt field to 1.
> +      init_ob_refcnt_field(mgr, model, ob_base_region, pyobj_record, cd);
> +
> +      // Get pointer svalue for PyLong_Type then assign it to ob_type field.
> +      set_ob_type_field(mgr, model, ob_base_region, pyobj_record, pylongtype_vardecl, cd);
> +
> +      // Set the PyLongObject value.
> +      tree ob_digit_field = get_field_by_name (pylongobj_record, "ob_digit");
> +      const region *ob_digit_region
> +          = mgr->get_field_region (pylong_region, ob_digit_field);
> +      const svalue *ob_digit_sval = cd.get_arg_svalue (0);
> +      model->set_value (ob_digit_region, ob_digit_sval, cd.get_ctxt ());
> +
> +      if (cd.get_lhs_type ())
> +        {
> +          const svalue *ptr_sval
> +              = mgr->get_ptr_svalue (cd.get_lhs_type (), pylong_region);
> +          cd.maybe_set_lhs (ptr_sval);
> +        }
> +      return true;
> +    }
> +  };
> +
> +  if (cd.get_ctxt ())
> +    {
> +      cd.get_ctxt ()->bifurcate (make_unique<pyobj_init_fail> (cd));
> +      cd.get_ctxt ()->bifurcate (make_unique<success> (cd));
> +      cd.get_ctxt ()->terminate_path ();
> +    }
> +}

[...]


> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> new file mode 100644
> index 00000000000..19b5c17428a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyList_New (Py_ssize_t len)
> +{
> +  PyObject *obj = PyList_New (len);
> +  if (obj)
> +    {
> +     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> +    }
> +  else
> +    __analyzer_dump_path (); /* { dg-message "path" } */
> +  return obj;
> +}

There's lots of scope for extra test coverage here.

For example, for all these test_FOO_New cases, consider a variant with
"void" return and no "return obj;" at the end.  The analyzer ought to
report a leak when we fall off the end of these functions.

Similarly, it's good to try both:
- symbolic values for arguments (like you have here)
- constant values (for example, what happens with NULL for pointer
params?)

FWIW I like to organize test coverage for specific known functions into
test cases named after the function, so perhaps we could have:
  cpython-plugin-test-PyList_Append.c
  cpython-plugin-test-PyList_New.c
  cpython-plugin-test-PyLong_New.c
but that's up to you.

All this can be left to a follow-up, though.

> +
> +PyObject *
> +test_PyLong_New (long n)
> +{
> +  PyObject *obj = PyLong_FromLong (n);
> +  if (obj)
> +    {
> +     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
> +    }
> +  else
> +    __analyzer_dump_path (); /* { dg-message "path" } */
> +  return obj;
> +}
> +
> +PyObject *
> +test_PyListAppend (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, item);
> +  return list; /* { dg-warning "leak of 'item'" } */
> +}
> +
> +PyObject *
> +test_PyListAppend_2 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  if (!item)
> +       return NULL;
> +
> +  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +  PyObject *list = PyList_New (n);
> +  if (!list)
> +  {
> +       Py_DECREF(item);
> +       return NULL;
> +  }
> +
> +  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +
> +  if (PyList_Append (list, item) < 0)
> +    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +  else
> +    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
> +  return list; /* { dg-warning "leak of 'item'" } */
> +}
> +
> +
> +PyObject *
> +test_PyListAppend_3 (PyObject *item, PyObject *list)
> +{
> +  PyList_Append (list, item);
> +  return list;
> +}
> \ No newline at end of file

[...]

Overall, I think that, assuming the rebase is trivial, this is good for
trunk once the nits are fixed.  As noted above, there are some issues
with the known_function implementations in the plugin, but that's a
minor detail that doesn't impact anyone else, so let's not let the
perfect be the enemy of the good.

Hope the above makes sense; thanks again for the patch.
Dave




* [COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]
  2023-08-09 21:36                                   ` David Malcolm
@ 2023-08-11 17:47                                     ` Eric Feng
  2023-08-11 20:23                                       ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-11 17:47 UTC (permalink / raw)
  To: dmalcom; +Cc: gcc, gcc-patches, Eric Feng

Thanks for the feedback! I've incorporated the changes (aside from
expanding test coverage, which I plan to do in a follow-up), rebased,
and bootstrapped and regression-tested on aarch64-unknown-linux-gnu.
Since you said it was good for trunk once the nits were fixed and the
rebase was clean, I have pushed the patch.

Best,
Eric

---

This patch adds known function subclasses for the Python/C API functions
PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
optional parameters to
region_model::get_or_create_region_for_heap_alloc. These allow the
pointer to the newly allocated region to transition immediately from
the start state to the assumed-non-null state in the malloc state
machine, if desired. Finally, it adds a new procedure,
dg-require-python-h, intended as a directive in Python-related analyzer
tests, which appends the necessary Python flags when building the tests.
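A self-contained toy of what the optional transition amounts to, in plain C with hypothetical names. In the patch itself this goes through region_model::transition_ptr_sval_non_null and sm_state_map::set_state.

Only the state-machine state of the returned pointer is modeled. Passing `update_state_machine` moves it straight from the start state to assumed non-null. That shortcut is reasonable only because the failure (NULL) outcome is handled as a separate path, as pyobj_init_fail does in the plugin.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of malloc-state-machine states.  */
enum toy_sm_state
{
  TOY_START,    /* nothing known yet about the pointer */
  TOY_NONNULL,  /* assumed non-null */
  TOY_NULL,     /* known to be NULL */
  TOY_FREED     /* already released */
};

struct toy_ptr
{
  int region_id;            /* identifies the modeled heap region */
  enum toy_sm_state state;  /* state-machine state of the pointer */
};

static int next_region_id;

/* Model a heap allocation: a fresh region, with the pointer left in
   TOY_START unless UPDATE_STATE_MACHINE asks for an immediate transition to
   TOY_NONNULL (the NULL outcome being modeled on a separate path).  */
static struct toy_ptr
toy_heap_alloc (bool update_state_machine)
{
  struct toy_ptr p = { next_region_id++, TOY_START };
  if (update_state_machine)
    p.state = TOY_NONNULL;
  return p;
}

/* Model a free: a second free of the same pointer would be a double-free.  */
static void
toy_free (struct toy_ptr *p)
{
  assert (p->state != TOY_FREED);
  p->state = TOY_FREED;
}
```

An allocation made without the flag stays in the start state. One made with it is immediately treated as non-null, and can then be freed exactly once.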

The main new warnings enabled by these known function subclasses are
leak-related. For example:

rc3.c: In function ‘create_py_object’:
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
   21 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(10);
    |      |                    ^~~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) allocated here
    |      |                    (2) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(2);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (3) when ‘PyList_New’ fails
    |......
    |   21 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) ‘item’ leaks here; was allocated at (1)

To keep the analysis simple, kf_PyList_Append makes some concessions
compared with the real implementation. In particular, PyList_Append
internally tries to avoid calling realloc where possible; for
simplicity, we assume that realloc is called every time. Also, we grow
the size by just 1 (enough space to add the new element) rather than
following the growth heuristics of the actual implementation.
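To make the size of this simplification concrete, the sketch below counts reallocations over a run of appends under two growth policies:

- the simplified policy described above: grow by exactly one slot, realloc every time;
- an approximation of the over-allocation in CPython's list_resize (Objects/listobject.c).

The constants follow recent CPython versions and have changed over time, so treat the second policy as an assumption rather than a faithful copy. Only the counting logic is shown; no memory is allocated, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: each append grows the allocation by exactly one slot,
   so each append triggers a realloc.  */
static size_t
count_reallocs_grow_by_one (size_t n_appends)
{
  size_t allocated = 0, reallocs = 0;
  for (size_t size = 0; size < n_appends; size++)
    {
      allocated = size + 1;  /* room for exactly one more element */
      reallocs++;
    }
  assert (allocated == n_appends);
  return reallocs;
}

/* Approximation of the over-allocating policy in CPython's list_resize.  */
static size_t
count_reallocs_overallocating (size_t n_appends)
{
  size_t allocated = 0, reallocs = 0;
  for (size_t size = 0; size < n_appends; size++)
    {
      size_t newsize = size + 1;
      /* Existing allocation is big enough (and not oversized): no realloc.  */
      if (allocated >= newsize && newsize >= (allocated >> 1))
        continue;
      /* Over-allocate by about 1/8 plus 6, rounded down to a multiple of 4.
         (CPython also clamps large jumps in size; that never triggers when
         growing by one element, so it is omitted here.)  */
      allocated = (newsize + (newsize >> 3) + 6) & ~(size_t) 3;
      reallocs++;
    }
  return reallocs;
}
```

Under these assumptions:

- 10 appends cost 10 reallocs in the simplified model, versus 3 with over-allocation (capacities 4, 8, 16).
- 100 appends cost 100 reallocs versus 11.

In analyzer terms, the simplification mainly means the model considers a realloc-failure outcome on every append. That includes appends where the real implementation would not call realloc at all.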

gcc/analyzer/ChangeLog:
	PR analyzer/107646
	* call-details.h (call_details::arg_is_integral_p): New function.
	* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
	New optional parameters.
	* region-model.h (class region_model): New optional parameters.
	* sm-malloc.cc (malloc_state_machine::transition_ptr_sval_non_null):
	New function.
	(region_model::transition_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
	PR analyzer/107646
	* gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
	PyList_New, PyList_Append, PyLong_FromLong
	* gcc.dg/plugin/plugin.exp: New test.
	* lib/target-supports.exp: New procedure.
	* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng <ef2648@columbia.edu>
---
 gcc/analyzer/call-details.h                   |   4 +
 gcc/analyzer/region-model.cc                  |  17 +-
 gcc/analyzer/region-model.h                   |  14 +-
 gcc/analyzer/sm-malloc.cc                     |  42 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 722 ++++++++++++++++++
 .../gcc.dg/plugin/cpython-plugin-test-2.c     |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp        |   3 +-
 gcc/testsuite/lib/target-supports.exp         |  25 +
 8 files changed, 899 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/call-details.h b/gcc/analyzer/call-details.h
index 24be2247e63..bf2601151ea 100644
--- a/gcc/analyzer/call-details.h
+++ b/gcc/analyzer/call-details.h
@@ -49,6 +49,10 @@ public:
     return POINTER_TYPE_P (get_arg_type (idx));
   }
   bool arg_is_size_p (unsigned idx) const;
+  bool arg_is_integral_p (unsigned idx) const
+  {
+    return INTEGRAL_TYPE_P (get_arg_type (idx));
+  }
 
   const gcall *get_call_stmt () const { return m_call; }
   location_t get_location () const;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 094b7af3dbc..aa9fe008b9d 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4991,11 +4991,16 @@ region_model::check_dynamic_size_for_floats (const svalue *size_in_bytes,
    Use CTXT to complain about tainted sizes.
 
    Reuse an existing heap_allocated_region if it's not being referenced by
-   this region_model; otherwise create a new one.  */
+   this region_model; otherwise create a new one.
+
+   Optionally (update_state_machine) transitions the pointer pointing to the
+   heap_allocated_region from start to assumed non-null.  */
 
 const region *
 region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
-						   region_model_context *ctxt)
+       region_model_context *ctxt,
+       bool update_state_machine,
+       const call_details *cd)
 {
   /* Determine which regions are referenced in this region_model, so that
      we can reuse an existing heap_allocated_region if it's not in use on
@@ -5017,6 +5022,14 @@ region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
   if (size_in_bytes)
     if (compat_types_p (size_in_bytes->get_type (), size_type_node))
       set_dynamic_extents (reg, size_in_bytes, ctxt);
+
+	if (update_state_machine && cd)
+		{
+			const svalue *ptr_sval
+			= m_mgr->get_ptr_svalue (cd->get_lhs_type (), reg);
+      transition_ptr_sval_non_null (ctxt, ptr_sval);
+		}
+
   return reg;
 }
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 0cf38714c96..a8acad8b7b2 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -387,9 +387,12 @@ class region_model
 		       region_model_context *ctxt,
 		       rejected_constraint **out);
 
-  const region *
-  get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
-				       region_model_context *ctxt);
+	const region *
+	get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
+				region_model_context *ctxt,
+				bool update_state_machine = false,
+				const call_details *cd = nullptr);
+
   const region *create_region_for_alloca (const svalue *size_in_bytes,
 					  region_model_context *ctxt);
   void get_referenced_base_regions (auto_bitmap &out_ids) const;
@@ -476,6 +479,11 @@ class region_model
 			     const svalue *old_ptr_sval,
 			     const svalue *new_ptr_sval);
 
+  /* Implemented in sm-malloc.cc.  */
+  void
+  transition_ptr_sval_non_null (region_model_context *ctxt,
+      const svalue *new_ptr_sval);
+
   /* Implemented in sm-taint.cc.  */
   void mark_as_tainted (const svalue *sval,
 			region_model_context *ctxt);
diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index a8c63eb1ce8..ec763254b29 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -434,6 +434,11 @@ public:
 			     const svalue *new_ptr_sval,
 			     const extrinsic_state &ext_state) const;
 
+  void transition_ptr_sval_non_null (region_model *model,
+      sm_state_map *smap,
+      const svalue *new_ptr_sval,
+      const extrinsic_state &ext_state) const;
+
   standard_deallocator_set m_free;
   standard_deallocator_set m_scalar_delete;
   standard_deallocator_set m_vector_delete;
@@ -2504,6 +2509,17 @@ on_realloc_with_move (region_model *model,
 		   NULL, ext_state);
 }
 
+/*  Hook for get_or_create_region_for_heap_alloc for the case when we want
+   ptr_sval to mark a newly created region as assumed non null on malloc SM.  */
+void
+malloc_state_machine::transition_ptr_sval_non_null (region_model *model,
+    sm_state_map *smap,
+    const svalue *new_ptr_sval,
+    const extrinsic_state &ext_state) const
+{
+  smap->set_state (model, new_ptr_sval, m_free.m_nonnull, NULL, ext_state);
+}
+
 } // anonymous namespace
 
 /* Internal interface to this file. */
@@ -2548,6 +2564,32 @@ region_model::on_realloc_with_move (const call_details &cd,
 				  *ext_state);
 }
 
+/* Moves ptr_sval from start to assumed non-null, for use by
+   region_model::get_or_create_region_for_heap_alloc.  */
+void
+region_model::transition_ptr_sval_non_null (region_model_context *ctxt,
+const svalue *ptr_sval)
+{
+  if (!ctxt)
+    return;
+  const extrinsic_state *ext_state = ctxt->get_ext_state ();
+  if (!ext_state)
+    return;
+
+  sm_state_map *smap;
+  const state_machine *sm;
+  unsigned sm_idx;
+  if (!ctxt->get_malloc_map (&smap, &sm, &sm_idx))
+    return;
+
+  gcc_assert (smap);
+  gcc_assert (sm);
+
+  const malloc_state_machine &malloc_sm = (const malloc_state_machine &)*sm;
+
+  malloc_sm.transition_ptr_sval_non_null (this, smap, ptr_sval, *ext_state);
+}
+
 } // namespace ana
 
 #endif /* #if ENABLE_ANALYZER */
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 9ecc42d4465..7cd72e8a886 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -55,6 +55,8 @@ static GTY (()) hash_map<tree, tree> *analyzer_stashed_globals;
 namespace ana
 {
 static tree pyobj_record = NULL_TREE;
+static tree pyobj_ptr_tree = NULL_TREE;
+static tree pyobj_ptr_ptr = NULL_TREE;
 static tree varobj_record = NULL_TREE;
 static tree pylistobj_record = NULL_TREE;
 static tree pylongobj_record = NULL_TREE;
@@ -76,6 +78,714 @@ get_field_by_name (tree type, const char *name)
   return NULL_TREE;
 }
 
+static const svalue *
+get_sizeof_pyobjptr (region_model_manager *mgr)
+{
+  tree size_tree = TYPE_SIZE_UNIT (pyobj_ptr_tree);
+  const svalue *sizeof_sval = mgr->get_or_create_constant_svalue (size_tree);
+  return sizeof_sval;
+}
+
+/* Update MODEL to set OB_BASE_REGION's ob_refcnt to 1.  */
+static void
+init_ob_refcnt_field (region_model_manager *mgr, region_model *model,
+                      const region *ob_base_region, tree pyobj_record,
+                      const call_details &cd)
+{
+  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+  const region *ob_refcnt_region
+      = mgr->get_field_region (ob_base_region, ob_refcnt_tree);
+  const svalue *refcnt_one_sval
+      = mgr->get_or_create_int_cst (size_type_node, 1);
+  model->set_value (ob_refcnt_region, refcnt_one_sval, cd.get_ctxt ());
+}
+
+/* Update MODEL to set OB_BASE_REGION's ob_type to point to
+   PYTYPE_VAR_DECL_PTR.  */
+static void
+set_ob_type_field (region_model_manager *mgr, region_model *model,
+                   const region *ob_base_region, tree pyobj_record,
+                   tree pytype_var_decl_ptr, const call_details &cd)
+{
+  const region *pylist_type_region
+      = mgr->get_region_for_global (pytype_var_decl_ptr);
+  tree pytype_var_decl_ptr_type
+      = build_pointer_type (TREE_TYPE (pytype_var_decl_ptr));
+  const svalue *pylist_type_ptr_sval
+      = mgr->get_ptr_svalue (pytype_var_decl_ptr_type, pylist_type_region);
+  tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
+  const region *ob_type_region
+      = mgr->get_field_region (ob_base_region, ob_type_field);
+  model->set_value (ob_type_region, pylist_type_ptr_sval, cd.get_ctxt ());
+}
+
+/* Retrieve the "ob_base" field's region from OBJECT_RECORD within
+   NEW_OBJECT_REGION and set its value in the MODEL to PYOBJ_SVALUE. */
+static const region *
+get_ob_base_region (region_model_manager *mgr, region_model *model,
+                   const region *new_object_region, tree object_record,
+                   const svalue *pyobj_svalue, const call_details &cd)
+{
+  tree ob_base_tree = get_field_by_name (object_record, "ob_base");
+  const region *ob_base_region
+      = mgr->get_field_region (new_object_region, ob_base_tree);
+  model->set_value (ob_base_region, pyobj_svalue, cd.get_ctxt ());
+  return ob_base_region;
+}
+
+/* Initialize and retrieve a region within the MODEL for a PyObject 
+   and set its value to OBJECT_SVALUE. */
+static const region *
+init_pyobject_region (region_model_manager *mgr, region_model *model,
+                      const svalue *object_svalue, const call_details &cd)
+{
+  const region *pyobject_region = model->get_or_create_region_for_heap_alloc (
+      NULL, cd.get_ctxt (), true, &cd);
+  model->set_value (pyobject_region, object_svalue, cd.get_ctxt ());
+  return pyobject_region;
+}
+
+/* Increment the value of FIELD_REGION in the MODEL by 1. Optionally
+   capture the old and new svalues if OLD_SVAL and NEW_SVAL pointers are
+   provided. */
+static void
+inc_field_val (region_model_manager *mgr, region_model *model,
+               const region *field_region, const tree type_node,
+               const call_details &cd, const svalue **old_sval = nullptr,
+               const svalue **new_sval = nullptr)
+{
+  const svalue *tmp_old_sval
+      = model->get_store_value (field_region, cd.get_ctxt ());
+  const svalue *one_sval = mgr->get_or_create_int_cst (type_node, 1);
+  const svalue *tmp_new_sval = mgr->get_or_create_binop (
+      type_node, PLUS_EXPR, tmp_old_sval, one_sval);
+
+  model->set_value (field_region, tmp_new_sval, cd.get_ctxt ());
+
+  if (old_sval)
+    *old_sval = tmp_old_sval;
+
+  if (new_sval)
+    *new_sval = tmp_new_sval;
+}
+
+class pyobj_init_fail : public failed_call_info
+{
+public:
+  pyobj_init_fail (const call_details &cd) : failed_call_info (cd) {}
+
+  bool
+  update_model (region_model *model, const exploded_edge *,
+                region_model_context *ctxt) const final override
+  {
+    /* Return NULL; everything else is unchanged. */
+    const call_details cd (get_call_details (model, ctxt));
+    region_model_manager *mgr = cd.get_manager ();
+    if (cd.get_lhs_type ())
+      {
+        const svalue *zero
+            = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
+        model->set_value (cd.get_lhs_region (), zero, cd.get_ctxt ());
+      }
+    return true;
+  }
+};
+
+/* Some concessions were made to
+simplify the analysis process when comparing kf_PyList_Append with the
+real implementation. In particular, PyList_Append performs some
+optimization internally to try and avoid calls to realloc if
+possible. For simplicity, we assume that realloc is called every time.
+Also, we grow the size by just 1 (to ensure enough space for adding a
+new element) rather than abide by the heuristics that the actual implementation
+follows. */
+class kf_PyList_Append : public known_function
+{
+public:
+  bool
+  matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == 2 && cd.arg_is_pointer_p (0)
+            && cd.arg_is_pointer_p (1));
+  }
+  void impl_call_pre (const call_details &cd) const final override;
+  void impl_call_post (const call_details &cd) const final override;
+};
+
+void
+kf_PyList_Append::impl_call_pre (const call_details &cd) const
+{
+  region_model_manager *mgr = cd.get_manager ();
+  region_model *model = cd.get_model ();
+
+  const svalue *pylist_sval = cd.get_arg_svalue (0);
+  const region *pylist_reg
+      = model->deref_rvalue (pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+  const svalue *newitem_sval = cd.get_arg_svalue (1);
+  const region *newitem_reg
+      = model->deref_rvalue (pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+  // Skip checks if unknown etc
+  if (pylist_sval->get_kind () != SK_REGION
+      && pylist_sval->get_kind () != SK_CONSTANT)
+    return;
+
+  // PyList_Check
+  tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
+  const region *ob_type_region
+      = mgr->get_field_region (pylist_reg, ob_type_field);
+  const svalue *stored_sval
+      = model->get_store_value (ob_type_region, cd.get_ctxt ());
+  const region *pylist_type_region
+      = mgr->get_region_for_global (pylisttype_vardecl);
+  tree pylisttype_vardecl_ptr
+      = build_pointer_type (TREE_TYPE (pylisttype_vardecl));
+  const svalue *pylist_type_ptr
+      = mgr->get_ptr_svalue (pylisttype_vardecl_ptr, pylist_type_region);
+
+  if (stored_sval != pylist_type_ptr)
+    {
+      // TODO: emit diagnostic -Wanalyzer-type-error
+      cd.get_ctxt ()->terminate_path ();
+      return;
+    }
+
+  // Check that new_item is not null.
+  {
+    const svalue *null_ptr
+        = mgr->get_or_create_int_cst (newitem_sval->get_type (), 0);
+    if (!model->add_constraint (newitem_sval, NE_EXPR, null_ptr,
+                                cd.get_ctxt ()))
+      {
+        // TODO: emit diagnostic here
+        cd.get_ctxt ()->terminate_path ();
+        return;
+      }
+  }
+}
+
+void
+kf_PyList_Append::impl_call_post (const call_details &cd) const
+{
+  /* Three custom subclasses of custom_edge_info, for handling the various
+     outcomes of "realloc".  */
+
+  /* Concrete custom_edge_info: a realloc call that fails, returning NULL.
+   */
+  class realloc_failure : public failed_call_info
+  {
+  public:
+    realloc_failure (const call_details &cd) : failed_call_info (cd) {}
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pylist_sval = cd.get_arg_svalue (0);
+      const region *pylist_reg = model->deref_rvalue (
+          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+      /* Identify ob_item field and set it to NULL. */
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_reg
+          = mgr->get_field_region (pylist_reg, ob_item_field);
+      const svalue *old_ptr_sval
+          = model->get_store_value (ob_item_reg, cd.get_ctxt ());
+
+      if (const region_svalue *old_reg
+          = old_ptr_sval->dyn_cast_region_svalue ())
+        {
+          const region *freed_reg = old_reg->get_pointee ();
+          model->unbind_region_and_descendents (freed_reg, POISON_KIND_FREED);
+          model->unset_dynamic_extents (freed_reg);
+        }
+
+      const svalue *null_sval = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+      model->set_value (ob_item_reg, null_sval, cd.get_ctxt ());
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *neg_one
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), -1);
+          cd.maybe_set_lhs (neg_one);
+        }
+      return true;
+    }
+  };
+
+  class realloc_success_no_move : public call_info
+  {
+  public:
+    realloc_success_no_move (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (
+          can_colorize, "when %qE succeeds, without moving underlying buffer",
+          get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pylist_sval = cd.get_arg_svalue (0);
+      const region *pylist_reg = model->deref_rvalue (
+          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+      const svalue *newitem_sval = cd.get_arg_svalue (1);
+      const region *newitem_reg = model->deref_rvalue (
+          newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
+
+      tree ob_size_field = get_field_by_name (varobj_record, "ob_size");
+      const region *ob_size_region
+          = mgr->get_field_region (pylist_reg, ob_size_field);
+      const svalue *ob_size_sval = nullptr;
+      const svalue *new_size_sval = nullptr;
+      inc_field_val (mgr, model, ob_size_region, integer_type_node, cd,
+                     &ob_size_sval, &new_size_sval);
+
+      const svalue *sizeof_sval = mgr->get_or_create_cast (
+          ob_size_sval->get_type (), get_sizeof_pyobjptr (mgr));
+      const svalue *num_allocated_bytes = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, new_size_sval);
+
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_region
+          = mgr->get_field_region (pylist_reg, ob_item_field);
+      const svalue *ob_item_ptr_sval
+          = model->get_store_value (ob_item_region, cd.get_ctxt ());
+
+      /* We can only grow in place with a non-NULL pointer to a known
+         region.  */
+      {
+        const svalue *null_ptr = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+        if (!model->add_constraint (ob_item_ptr_sval, NE_EXPR, null_ptr,
+                                    cd.get_ctxt ()))
+          {
+            return false;
+          }
+      }
+
+      const unmergeable_svalue *underlying_svalue
+          = ob_item_ptr_sval->dyn_cast_unmergeable_svalue ();
+      const svalue *target_svalue = nullptr;
+      const region_svalue *target_region_svalue = nullptr;
+
+      if (underlying_svalue)
+        {
+          target_svalue = underlying_svalue->get_arg ();
+          if (target_svalue->get_kind () != SK_REGION)
+            {
+              return false;
+            }
+        }
+      else
+        {
+          if (ob_item_ptr_sval->get_kind () != SK_REGION)
+            {
+              return false;
+            }
+          target_svalue = ob_item_ptr_sval;
+        }
+
+      target_region_svalue = target_svalue->dyn_cast_region_svalue ();
+      const region *curr_reg = target_region_svalue->get_pointee ();
+
+      if (compat_types_p (num_allocated_bytes->get_type (), size_type_node))
+        model->set_dynamic_extents (curr_reg, num_allocated_bytes, ctxt);
+
+      model->set_value (ob_size_region, new_size_sval, ctxt);
+
+      const svalue *offset_sval = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, ob_size_sval);
+      const region *element_region
+          = mgr->get_offset_region (curr_reg, pyobj_ptr_ptr, offset_sval);
+      model->set_value (element_region, newitem_sval, cd.get_ctxt ());
+
+      // Increment ob_refcnt of appended item.
+      tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+      const region *ob_refcnt_region
+          = mgr->get_field_region (newitem_reg, ob_refcnt_tree);
+      inc_field_val (mgr, model, ob_refcnt_region, size_type_node, cd);
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *zero
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
+          cd.maybe_set_lhs (zero);
+        }
+      return true;
+    }
+  };
+
+  class realloc_success_move : public call_info
+  {
+  public:
+    realloc_success_move (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (can_colorize, "when %qE succeeds, moving buffer",
+                              get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+      const svalue *pylist_sval = cd.get_arg_svalue (0);
+      const region *pylist_reg = model->deref_rvalue (
+          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
+
+      const svalue *newitem_sval = cd.get_arg_svalue (1);
+      const region *newitem_reg = model->deref_rvalue (
+          newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
+
+      tree ob_size_field = get_field_by_name (varobj_record, "ob_size");
+      const region *ob_size_region
+          = mgr->get_field_region (pylist_reg, ob_size_field);
+      const svalue *old_ob_size_sval = nullptr;
+      const svalue *new_ob_size_sval = nullptr;
+      inc_field_val (mgr, model, ob_size_region, integer_type_node, cd,
+                     &old_ob_size_sval, &new_ob_size_sval);
+
+      const svalue *sizeof_sval = mgr->get_or_create_cast (
+          old_ob_size_sval->get_type (), get_sizeof_pyobjptr (mgr));
+      const svalue *new_size_sval = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, new_ob_size_sval);
+
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_reg
+          = mgr->get_field_region (pylist_reg, ob_item_field);
+      const svalue *old_ptr_sval
+          = model->get_store_value (ob_item_reg, cd.get_ctxt ());
+
+      /* Create the new region.  */
+      const region *new_reg = model->get_or_create_region_for_heap_alloc (
+          new_size_sval, cd.get_ctxt ());
+      const svalue *new_ptr_sval
+          = mgr->get_ptr_svalue (pyobj_ptr_ptr, new_reg);
+      if (!model->add_constraint (new_ptr_sval, NE_EXPR, old_ptr_sval,
+                                  cd.get_ctxt ()))
+        return false;
+
+      if (const region_svalue *old_reg
+          = old_ptr_sval->dyn_cast_region_svalue ())
+        {
+          const region *freed_reg = old_reg->get_pointee ();
+          const svalue *old_size_sval = model->get_dynamic_extents (freed_reg);
+          if (old_size_sval)
+            {
+              const svalue *copied_size_sval
+                  = get_copied_size (model, old_size_sval, new_size_sval);
+              const region *copied_old_reg = mgr->get_sized_region (
+                  freed_reg, pyobj_ptr_ptr, copied_size_sval);
+              const svalue *buffer_content_sval
+                  = model->get_store_value (copied_old_reg, cd.get_ctxt ());
+              const region *copied_new_reg = mgr->get_sized_region (
+                  new_reg, pyobj_ptr_ptr, copied_size_sval);
+              model->set_value (copied_new_reg, buffer_content_sval,
+                                cd.get_ctxt ());
+            }
+          else
+            {
+              model->mark_region_as_unknown (freed_reg, cd.get_uncertainty ());
+            }
+
+          model->unbind_region_and_descendents (freed_reg, POISON_KIND_FREED);
+          model->unset_dynamic_extents (freed_reg);
+        }
+
+      const svalue *null_ptr = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+      if (!model->add_constraint (new_ptr_sval, NE_EXPR, null_ptr,
+                                  cd.get_ctxt ()))
+        return false;
+
+      model->set_value (ob_size_region, new_ob_size_sval, ctxt);
+      model->set_value (ob_item_reg, new_ptr_sval, cd.get_ctxt ());
+
+      const svalue *offset_sval = mgr->get_or_create_binop (
+          size_type_node, MULT_EXPR, sizeof_sval, old_ob_size_sval);
+      const region *element_region
+          = mgr->get_offset_region (new_reg, pyobj_ptr_ptr, offset_sval);
+      model->set_value (element_region, newitem_sval, cd.get_ctxt ());
+
+      // Increment ob_refcnt of appended item.
+      tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+      const region *ob_refcnt_region
+          = mgr->get_field_region (newitem_reg, ob_refcnt_tree);
+      inc_field_val (mgr, model, ob_refcnt_region, size_type_node, cd);
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *zero
+              = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
+          cd.maybe_set_lhs (zero);
+        }
+      return true;
+    }
+
+  private:
+    /* Return the lesser of OLD_SIZE_SVAL and NEW_SIZE_SVAL.
+       If unknown, OLD_SIZE_SVAL is returned.  */
+    const svalue *
+    get_copied_size (region_model *model, const svalue *old_size_sval,
+                     const svalue *new_size_sval) const
+    {
+      tristate res
+          = model->eval_condition (old_size_sval, GT_EXPR, new_size_sval);
+      switch (res.get_value ())
+        {
+        case tristate::TS_TRUE:
+          return new_size_sval;
+        case tristate::TS_FALSE:
+        case tristate::TS_UNKNOWN:
+          return old_size_sval;
+        default:
+          gcc_unreachable ();
+        }
+    }
+  };
+
+  /* Body of kf_PyList_Append::impl_call_post.  */
+  if (cd.get_ctxt ())
+    {
+      cd.get_ctxt ()->bifurcate (make_unique<realloc_failure> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<realloc_success_no_move> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<realloc_success_move> (cd));
+      cd.get_ctxt ()->terminate_path ();
+    }
+}
+
+class kf_PyList_New : public known_function
+{
+public:
+  bool
+  matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == 1 && cd.arg_is_integral_p (0));
+  }
+  void impl_call_post (const call_details &cd) const final override;
+};
+
+void
+kf_PyList_New::impl_call_post (const call_details &cd) const
+{
+  class success : public call_info
+  {
+  public:
+    success (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (can_colorize, "when %qE succeeds",
+                              get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pyobj_svalue
+          = mgr->get_or_create_unknown_svalue (pyobj_record);
+      const svalue *varobj_svalue
+          = mgr->get_or_create_unknown_svalue (varobj_record);
+      const svalue *pylist_svalue
+          = mgr->get_or_create_unknown_svalue (pylistobj_record);
+
+      const svalue *size_sval = cd.get_arg_svalue (0);
+
+      const region *pylist_region
+          = init_pyobject_region (mgr, model, pylist_svalue, cd);
+
+      /*
+      typedef struct
+      {
+        PyObject_VAR_HEAD
+        PyObject **ob_item;
+        Py_ssize_t allocated;
+      } PyListObject;
+      */
+      tree varobj_field = get_field_by_name (pylistobj_record, "ob_base");
+      const region *varobj_region
+          = mgr->get_field_region (pylist_region, varobj_field);
+      model->set_value (varobj_region, varobj_svalue, cd.get_ctxt ());
+
+      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
+      const region *ob_item_region
+          = mgr->get_field_region (pylist_region, ob_item_field);
+
+      const svalue *zero_sval = mgr->get_or_create_int_cst (size_type_node, 0);
+      const svalue *casted_size_sval
+          = mgr->get_or_create_cast (size_type_node, size_sval);
+      const svalue *size_cond_sval = mgr->get_or_create_binop (
+          size_type_node, LE_EXPR, casted_size_sval, zero_sval);
+
+      // if size <= 0, ob_item = NULL
+
+      if (tree_int_cst_equal (size_cond_sval->maybe_get_constant (),
+                              integer_one_node))
+        {
+          const svalue *null_sval
+              = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
+          model->set_value (ob_item_region, null_sval, cd.get_ctxt ());
+        }
+      else // calloc
+        {
+          const svalue *sizeof_sval = mgr->get_or_create_cast (
+              size_sval->get_type (), get_sizeof_pyobjptr (mgr));
+          const svalue *prod_sval = mgr->get_or_create_binop (
+              size_type_node, MULT_EXPR, sizeof_sval, size_sval);
+          const region *ob_item_sized_region
+              = model->get_or_create_region_for_heap_alloc (prod_sval,
+                                                            cd.get_ctxt ());
+          model->zero_fill_region (ob_item_sized_region);
+          const svalue *ob_item_ptr_sval
+              = mgr->get_ptr_svalue (pyobj_ptr_ptr, ob_item_sized_region);
+          const svalue *ob_item_unmergeable
+              = mgr->get_or_create_unmergeable (ob_item_ptr_sval);
+          model->set_value (ob_item_region, ob_item_unmergeable,
+                            cd.get_ctxt ());
+        }
+
+      /*
+      typedef struct {
+      PyObject ob_base;
+      Py_ssize_t ob_size; // Number of items in variable part
+      } PyVarObject;
+      */
+      const region *ob_base_region = get_ob_base_region (
+          mgr, model, varobj_region, varobj_record, pyobj_svalue, cd);
+
+      tree ob_size_tree = get_field_by_name (varobj_record, "ob_size");
+      const region *ob_size_region
+          = mgr->get_field_region (varobj_region, ob_size_tree);
+      model->set_value (ob_size_region, size_sval, cd.get_ctxt ());
+
+      /*
+      typedef struct _object {
+          _PyObject_HEAD_EXTRA
+          Py_ssize_t ob_refcnt;
+          PyTypeObject *ob_type;
+      } PyObject;
+      */
+
+      // Initialize ob_refcnt field to 1.
+      init_ob_refcnt_field (mgr, model, ob_base_region, pyobj_record, cd);
+
+      // Get pointer svalue for PyList_Type then assign it to ob_type field.
+      set_ob_type_field (mgr, model, ob_base_region, pyobj_record, pylisttype_vardecl, cd);
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *ptr_sval
+              = mgr->get_ptr_svalue (cd.get_lhs_type (), pylist_region);
+          cd.maybe_set_lhs (ptr_sval);
+        }
+      return true;
+    }
+  };
+
+  if (cd.get_ctxt ())
+    {
+      cd.get_ctxt ()->bifurcate (make_unique<pyobj_init_fail> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<success> (cd));
+      cd.get_ctxt ()->terminate_path ();
+    }
+}
+
+class kf_PyLong_FromLong : public known_function
+{
+public:
+  bool
+  matches_call_types_p (const call_details &cd) const final override
+  {
+    return (cd.num_args () == 1 && cd.arg_is_integral_p (0));
+  }
+  void impl_call_post (const call_details &cd) const final override;
+};
+
+void
+kf_PyLong_FromLong::impl_call_post (const call_details &cd) const
+{
+  class success : public call_info
+  {
+  public:
+    success (const call_details &cd) : call_info (cd) {}
+
+    label_text
+    get_desc (bool can_colorize) const final override
+    {
+      return make_label_text (can_colorize, "when %qE succeeds",
+                              get_fndecl ());
+    }
+
+    bool
+    update_model (region_model *model, const exploded_edge *,
+                  region_model_context *ctxt) const final override
+    {
+      const call_details cd (get_call_details (model, ctxt));
+      region_model_manager *mgr = cd.get_manager ();
+
+      const svalue *pyobj_svalue
+          = mgr->get_or_create_unknown_svalue (pyobj_record);
+      const svalue *pylongobj_sval
+          = mgr->get_or_create_unknown_svalue (pylongobj_record);
+
+      const region *pylong_region
+          = init_pyobject_region (mgr, model, pylongobj_sval, cd);
+
+      // Create a region for the base PyObject within the PyLongObject.
+      const region *ob_base_region = get_ob_base_region (
+          mgr, model, pylong_region, pylongobj_record, pyobj_svalue, cd);
+
+      // Initialize ob_refcnt field to 1.
+      init_ob_refcnt_field (mgr, model, ob_base_region, pyobj_record, cd);
+
+      // Get pointer svalue for PyLong_Type then assign it to ob_type field.
+      set_ob_type_field (mgr, model, ob_base_region, pyobj_record, pylongtype_vardecl, cd);
+
+      // Set the PyLongObject value.
+      tree ob_digit_field = get_field_by_name (pylongobj_record, "ob_digit");
+      const region *ob_digit_region
+          = mgr->get_field_region (pylong_region, ob_digit_field);
+      const svalue *ob_digit_sval = cd.get_arg_svalue (0);
+      model->set_value (ob_digit_region, ob_digit_sval, cd.get_ctxt ());
+
+      if (cd.get_lhs_type ())
+        {
+          const svalue *ptr_sval
+              = mgr->get_ptr_svalue (cd.get_lhs_type (), pylong_region);
+          cd.maybe_set_lhs (ptr_sval);
+        }
+      return true;
+    }
+  };
+
+  if (cd.get_ctxt ())
+    {
+      cd.get_ctxt ()->bifurcate (make_unique<pyobj_init_fail> (cd));
+      cd.get_ctxt ()->bifurcate (make_unique<success> (cd));
+      cd.get_ctxt ()->terminate_path ();
+    }
+}
+
 static void
 maybe_stash_named_type (logger *logger, const translation_unit &tu,
                         const char *name)
@@ -179,6 +889,12 @@ init_py_structs ()
   pylongobj_record = get_stashed_type_by_name ("PyLongObject");
   pylongtype_vardecl = get_stashed_global_var_by_name ("PyLong_Type");
   pylisttype_vardecl = get_stashed_global_var_by_name ("PyList_Type");
+
+  if (pyobj_record)
+    {
+      pyobj_ptr_tree = build_pointer_type (pyobj_record);
+      pyobj_ptr_ptr = build_pointer_type (pyobj_ptr_tree);
+    }
 }
 
 void
@@ -205,6 +921,12 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
       sorry_no_cpython_plugin ();
       return;
     }
+
+  iface->register_known_function ("PyList_Append",
+                                  make_unique<kf_PyList_Append> ());
+  iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
+  iface->register_known_function ("PyLong_FromLong",
+                                  make_unique<kf_PyLong_FromLong> ());
 }
 } // namespace ana
 
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
new file mode 100644
index 00000000000..19b5c17428a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyList_New (Py_ssize_t len)
+{
+  PyObject *obj = PyList_New (len);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+PyObject *
+test_PyLong_New (long n)
+{
+  PyObject *obj = PyLong_FromLong (n);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+PyObject *
+test_PyListAppend (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, item);
+  return list; /* { dg-warning "leak of 'item'" } */
+}
+
+PyObject *
+test_PyListAppend_2 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  if (!item)
+	return NULL;
+
+  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  PyObject *list = PyList_New (n);
+  if (!list)
+  {
+	Py_DECREF(item);
+	return NULL;
+  }
+
+  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+
+  if (PyList_Append (list, item) < 0)
+    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  else
+    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
+  return list; /* { dg-warning "leak of 'item'" } */
+}
+
+
+PyObject *
+test_PyListAppend_3 (PyObject *item, PyObject *list)
+{
+  PyList_Append (list, item);
+  return list;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index 09c45394b1f..e1ed2d2589e 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -161,7 +161,8 @@ set plugin_test_list [list \
 	  taint-CVE-2011-0521-6.c \
 	  taint-antipatterns-1.c } \
     { analyzer_cpython_plugin.c \
-	  cpython-plugin-test-1.c } \
+	  cpython-plugin-test-1.c \
+	  cpython-plugin-test-2.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 7004711b384..eda53ff3a09 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12559,3 +12559,28 @@ proc check_effective_target_const_volatile_readonly_section { } {
     }
   return 1
 }
+
+# Appends necessary Python flags to extra-tool-flags if Python.h is supported.
+# Otherwise, modifies dg-do-what.
+proc dg-require-python-h { args } {
+    upvar dg-extra-tool-flags extra-tool-flags
+
+    verbose "ENTER dg-require-python-h" 2
+
+    set result [remote_exec host "python3-config --includes"]
+    set status [lindex $result 0]
+    if { $status == 0 } {
+        set python_flags [lindex $result 1]
+    } else {
+	verbose "Python.h not supported" 2
+	upvar dg-do-what dg-do-what
+	set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+	return
+    }
+
+    verbose "Python flags are: $python_flags" 2
+
+    verbose "Before appending, extra-tool-flags: ${extra-tool-flags}" 3
+    eval lappend extra-tool-flags $python_flags
+    verbose "After appending, extra-tool-flags: ${extra-tool-flags}" 3
+}
-- 
2.30.2


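The simplified PyList_Append semantics that kf_PyList_Append in the patch
above assumes (a realloc on every append, growth by exactly one slot, the
new item stored at index ob_size, and a new reference taken on success)
can be illustrated with a small standalone C sketch. toy_object, toy_list
and toy_list_append are hypothetical stand-ins, not CPython's
PyObject/PyListObject or its real implementation, and the sketch does not
reproduce the known function's failure modelling; it only shows the
success-path bookkeeping that test_PyListAppend_2 checks (item->ob_refcnt
going from 1 to 2, and staying at 1 when the append fails).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PyObject / PyListObject (assumption): only
   the fields the analyzer models here are kept.  */
typedef struct toy_object
{
  ptrdiff_t ob_refcnt;
} toy_object;

typedef struct toy_list
{
  toy_object ob_base;
  ptrdiff_t ob_size;    /* Number of items stored.  */
  toy_object **ob_item; /* Heap buffer holding ob_size item pointers.  */
} toy_list;

/* Simplified append: always realloc to exactly ob_size + 1 slots (no
   over-allocation heuristic), store ITEM in the new last slot and take
   a new reference to it.  Returns 0 on success, -1 on failure.  */
static int
toy_list_append (toy_list *list, toy_object *item)
{
  if (list == NULL || item == NULL)
    return -1;

  size_t new_size = (size_t) list->ob_size + 1;
  /* realloc may grow the buffer in place or move it; both count as
     "success" outcomes.  */
  toy_object **new_items
    = realloc (list->ob_item, new_size * sizeof (toy_object *));
  if (new_items == NULL)
    return -1; /* Failure: ITEM's refcount is left unchanged.  */

  new_items[list->ob_size] = item;
  list->ob_item = new_items;
  list->ob_size = (ptrdiff_t) new_size;
  item->ob_refcnt++; /* The list now owns a reference to ITEM.  */
  return 0;
}
```

Because realloc (NULL, n) behaves like malloc (n), a fresh list with
ob_item == NULL and ob_size == 0 (what PyList_New (0) produces in the
model) needs no special case.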

* Re: [COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]
  2023-08-11 17:47                                     ` [COMMITTED] " Eric Feng
@ 2023-08-11 20:23                                       ` Eric Feng
  2023-08-16 19:17                                         ` Update on CPython Extension Module -fanalyzer plugin development Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-11 20:23 UTC (permalink / raw)
  To: dmalcom; +Cc: gcc, gcc-patches

I've noticed there were still some strange indentations in the last
patch ... however, I think I've finally figured out a sane formatting
solution for me (fingers crossed). I will address them in the
follow-up patch at the same time as adding more test coverage.

---

In case anyone else using VSCode has been having issues with
formatting according to GNU/GCC conventions, these are the relevant
formatting settings that I've found work for me. Assuming the C/C++
extension is installed, add the following to settings.json:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always, TabWidth: 8, IndentWidth: 8 }"

Setting only the base style to GNU formats everything correctly, except
that indentation defaults to spaces (which I've been struggling to fix
manually in the last few patches). The remaining settings replace
blocks of 8 spaces with tabs, which check_GNU_style requires. In
combination, this works for everything except header files, for some
reason, but I'll defer that battle to another day.

On Fri, Aug 11, 2023 at 1:47 PM Eric Feng <ef2648@columbia.edu> wrote:
>
> Thanks for the feedback! I've incorporated the changes (aside from
> expanding test coverage, which I plan on releasing in a follow-up),
> rebased, and performed a bootstrap and regtest on
> aarch64-unknown-linux-gnu. Since you mentioned that it is good for trunk
> with nits fixed and no problems after rebase, the patch has now been pushed.
>
> Best,
> Eric
>
> ---
>
> This patch adds known function subclasses for the Python/C API functions
> PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
> optional parameters to
> region_model::get_or_create_region_for_heap_alloc, allowing the pointer
> to the newly allocated region to transition immediately from the start
> state to the assumed non-null state in the malloc state machine if
> desired. Finally, it adds a new procedure, dg-require-python-h, intended
> as a directive in Python-related analyzer tests, which appends the
> necessary Python flags during the tests' build process.
>
> The main new warnings gained from these known function subclasses are
> leak-related. For example:
>
> rc3.c: In function ‘create_py_object’:
> rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
>    21 |   return list;
>       |          ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(10);
>     |      |                    ^~~~~~~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (1) allocated here
>     |      |                    (2) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(2);
>     |      |                    ~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (3) when ‘PyList_New’ fails
>     |......
>     |   21 |   return list;
>     |      |          ~~~~
>     |      |          |
>     |      |          (4) ‘item’ leaks here; was allocated at (1)
>
> Some concessions were made to simplify the analysis, so kf_PyList_Append
> differs from the real implementation. In particular, PyList_Append
> performs some optimization internally to avoid calling realloc where
> possible; for simplicity, we assume realloc is called every time. We
> also grow the list by exactly one element (enough room for the new
> item) rather than following the growth heuristics of the actual
> implementation.
>
> gcc/analyzer/ChangeLog:
>         PR analyzer/107646
>         * call-details.h (call_details::arg_is_integral_p): New function.
>         * region-model.cc (region_model::get_or_create_region_for_heap_alloc):
>         New optional parameters.
>         * region-model.h (class region_model): New optional parameters.
>         * sm-malloc.cc (on_realloc_with_move): New function.
>         (region_model::transition_ptr_sval_non_null): New function.
>
> gcc/testsuite/ChangeLog:
>         PR analyzer/107646
>         * gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
>         PyList_New, PyList_Append, PyLong_FromLong
>         * gcc.dg/plugin/plugin.exp: New test.
>         * lib/target-supports.exp: New procedure.
>         * gcc.dg/plugin/cpython-plugin-test-2.c: New test.
>
> Signed-off-by: Eric Feng <ef2648@columbia.edu>
> ---
>  gcc/analyzer/call-details.h                   |   4 +
>  gcc/analyzer/region-model.cc                  |  17 +-
>  gcc/analyzer/region-model.h                   |  14 +-
>  gcc/analyzer/sm-malloc.cc                     |  42 +
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 722 ++++++++++++++++++
>  .../gcc.dg/plugin/cpython-plugin-test-2.c     |  78 ++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp        |   3 +-
>  gcc/testsuite/lib/target-supports.exp         |  25 +
>  8 files changed, 899 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
>
> diff --git a/gcc/analyzer/call-details.h b/gcc/analyzer/call-details.h
> index 24be2247e63..bf2601151ea 100644
> --- a/gcc/analyzer/call-details.h
> +++ b/gcc/analyzer/call-details.h
> @@ -49,6 +49,10 @@ public:
>      return POINTER_TYPE_P (get_arg_type (idx));
>    }
>    bool arg_is_size_p (unsigned idx) const;
> +  bool arg_is_integral_p (unsigned idx) const
> +  {
> +    return INTEGRAL_TYPE_P (get_arg_type (idx));
> +  }
>
>    const gcall *get_call_stmt () const { return m_call; }
>    location_t get_location () const;
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index 094b7af3dbc..aa9fe008b9d 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -4991,11 +4991,16 @@ region_model::check_dynamic_size_for_floats (const svalue *size_in_bytes,
>     Use CTXT to complain about tainted sizes.
>
>     Reuse an existing heap_allocated_region if it's not being referenced by
> -   this region_model; otherwise create a new one.  */
> +   this region_model; otherwise create a new one.
> +
> +   Optionally (update_state_machine) transitions the pointer pointing to the
> +   heap_allocated_region from start to assumed non-null.  */
>
>  const region *
>  region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
> -                                                  region_model_context *ctxt)
> +       region_model_context *ctxt,
> +       bool update_state_machine,
> +       const call_details *cd)
>  {
>    /* Determine which regions are referenced in this region_model, so that
>       we can reuse an existing heap_allocated_region if it's not in use on
> @@ -5017,6 +5022,14 @@ region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
>    if (size_in_bytes)
>      if (compat_types_p (size_in_bytes->get_type (), size_type_node))
>        set_dynamic_extents (reg, size_in_bytes, ctxt);
> +
> +       if (update_state_machine && cd)
> +               {
> +                       const svalue *ptr_sval
> +                       = m_mgr->get_ptr_svalue (cd->get_lhs_type (), reg);
> +      transition_ptr_sval_non_null (ctxt, ptr_sval);
> +               }
> +
>    return reg;
>  }
>
> diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
> index 0cf38714c96..a8acad8b7b2 100644
> --- a/gcc/analyzer/region-model.h
> +++ b/gcc/analyzer/region-model.h
> @@ -387,9 +387,12 @@ class region_model
>                        region_model_context *ctxt,
>                        rejected_constraint **out);
>
> -  const region *
> -  get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
> -                                      region_model_context *ctxt);
> +  const region *
> +  get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
> +                                       region_model_context *ctxt,
> +                                       bool update_state_machine = false,
> +                                       const call_details *cd = nullptr);
> +
>    const region *create_region_for_alloca (const svalue *size_in_bytes,
>                                           region_model_context *ctxt);
>    void get_referenced_base_regions (auto_bitmap &out_ids) const;
> @@ -476,6 +479,11 @@ class region_model
>                              const svalue *old_ptr_sval,
>                              const svalue *new_ptr_sval);
>
> +  /* Implemented in sm-malloc.cc.  */
> +  void
> +  transition_ptr_sval_non_null (region_model_context *ctxt,
> +      const svalue *new_ptr_sval);
> +
>    /* Implemented in sm-taint.cc.  */
>    void mark_as_tainted (const svalue *sval,
>                         region_model_context *ctxt);
> diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
> index a8c63eb1ce8..ec763254b29 100644
> --- a/gcc/analyzer/sm-malloc.cc
> +++ b/gcc/analyzer/sm-malloc.cc
> @@ -434,6 +434,11 @@ public:
>                              const svalue *new_ptr_sval,
>                              const extrinsic_state &ext_state) const;
>
> +  void transition_ptr_sval_non_null (region_model *model,
> +      sm_state_map *smap,
> +      const svalue *new_ptr_sval,
> +      const extrinsic_state &ext_state) const;
> +
>    standard_deallocator_set m_free;
>    standard_deallocator_set m_scalar_delete;
>    standard_deallocator_set m_vector_delete;
> @@ -2504,6 +2509,17 @@ on_realloc_with_move (region_model *model,
>                    NULL, ext_state);
>  }
>
> +/* Hook for get_or_create_region_for_heap_alloc: move NEW_PTR_SVAL, a
> +   pointer to a newly created heap region, into the "nonnull" state.  */
> +void
> +malloc_state_machine::transition_ptr_sval_non_null (region_model *model,
> +    sm_state_map *smap,
> +    const svalue *new_ptr_sval,
> +    const extrinsic_state &ext_state) const
> +{
> +  smap->set_state (model, new_ptr_sval, m_free.m_nonnull, NULL, ext_state);
> +}
> +
>  } // anonymous namespace
>
>  /* Internal interface to this file. */
> @@ -2548,6 +2564,32 @@ region_model::on_realloc_with_move (const call_details &cd,
>                                   *ext_state);
>  }
>
> +/* Move PTR_SVAL into the malloc state machine's "nonnull" state, for use
> +   by region_model::get_or_create_region_for_heap_alloc.  */
> +void
> +region_model::transition_ptr_sval_non_null (region_model_context *ctxt,
> +                                            const svalue *ptr_sval)
> +{
> +  if (!ctxt)
> +    return;
> +  const extrinsic_state *ext_state = ctxt->get_ext_state ();
> +  if (!ext_state)
> +    return;
> +
> +  sm_state_map *smap;
> +  const state_machine *sm;
> +  unsigned sm_idx;
> +  if (!ctxt->get_malloc_map (&smap, &sm, &sm_idx))
> +    return;
> +
> +  gcc_assert (smap);
> +  gcc_assert (sm);
> +
> +  const malloc_state_machine &malloc_sm = (const malloc_state_machine &)*sm;
> +
> +  malloc_sm.transition_ptr_sval_non_null (this, smap, ptr_sval, *ext_state);
> +}
> +
>  } // namespace ana
>
>  #endif /* #if ENABLE_ANALYZER */
> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 9ecc42d4465..7cd72e8a886 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> @@ -55,6 +55,8 @@ static GTY (()) hash_map<tree, tree> *analyzer_stashed_globals;
>  namespace ana
>  {
>  static tree pyobj_record = NULL_TREE;
> +static tree pyobj_ptr_tree = NULL_TREE;
> +static tree pyobj_ptr_ptr = NULL_TREE;
>  static tree varobj_record = NULL_TREE;
>  static tree pylistobj_record = NULL_TREE;
>  static tree pylongobj_record = NULL_TREE;
> @@ -76,6 +78,714 @@ get_field_by_name (tree type, const char *name)
>    return NULL_TREE;
>  }
>
> +static const svalue *
> +get_sizeof_pyobjptr (region_model_manager *mgr)
> +{
> +  tree size_tree = TYPE_SIZE_UNIT (pyobj_ptr_tree);
> +  const svalue *sizeof_sval = mgr->get_or_create_constant_svalue (size_tree);
> +  return sizeof_sval;
> +}
> +
> +/* Update MODEL to set OB_BASE_REGION's ob_refcnt to 1.  */
> +static void
> +init_ob_refcnt_field (region_model_manager *mgr, region_model *model,
> +                      const region *ob_base_region, tree pyobj_record,
> +                      const call_details &cd)
> +{
> +  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
> +  const region *ob_refcnt_region
> +      = mgr->get_field_region (ob_base_region, ob_refcnt_tree);
> +  const svalue *refcnt_one_sval
> +      = mgr->get_or_create_int_cst (size_type_node, 1);
> +  model->set_value (ob_refcnt_region, refcnt_one_sval, cd.get_ctxt ());
> +}
> +
> +/* Update MODEL to set OB_BASE_REGION's ob_type to point to
> +   PYTYPE_VAR_DECL_PTR.  */
> +static void
> +set_ob_type_field (region_model_manager *mgr, region_model *model,
> +                   const region *ob_base_region, tree pyobj_record,
> +                   tree pytype_var_decl_ptr, const call_details &cd)
> +{
> +  const region *pytype_region
> +      = mgr->get_region_for_global (pytype_var_decl_ptr);
> +  tree pytype_var_decl_ptr_type
> +      = build_pointer_type (TREE_TYPE (pytype_var_decl_ptr));
> +  const svalue *pytype_ptr_sval
> +      = mgr->get_ptr_svalue (pytype_var_decl_ptr_type, pytype_region);
> +  tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
> +  const region *ob_type_region
> +      = mgr->get_field_region (ob_base_region, ob_type_field);
> +  model->set_value (ob_type_region, pytype_ptr_sval, cd.get_ctxt ());
> +}
> +
> +/* Retrieve the "ob_base" field's region from OBJECT_RECORD within
> +   NEW_OBJECT_REGION and set its value in the MODEL to PYOBJ_SVALUE. */
> +static const region *
> +get_ob_base_region (region_model_manager *mgr, region_model *model,
> +                   const region *new_object_region, tree object_record,
> +                   const svalue *pyobj_svalue, const call_details &cd)
> +{
> +  tree ob_base_tree = get_field_by_name (object_record, "ob_base");
> +  const region *ob_base_region
> +      = mgr->get_field_region (new_object_region, ob_base_tree);
> +  model->set_value (ob_base_region, pyobj_svalue, cd.get_ctxt ());
> +  return ob_base_region;
> +}
> +
> +/* Initialize and retrieve a region within the MODEL for a PyObject
> +   and set its value to OBJECT_SVALUE. */
> +static const region *
> +init_pyobject_region (region_model_manager *mgr, region_model *model,
> +                      const svalue *object_svalue, const call_details &cd)
> +{
> +  const region *pyobject_region = model->get_or_create_region_for_heap_alloc (
> +      NULL, cd.get_ctxt (), true, &cd);
> +  model->set_value (pyobject_region, object_svalue, cd.get_ctxt ());
> +  return pyobject_region;
> +}
> +
> +/* Increment the value of FIELD_REGION in the MODEL by 1. Optionally
> +   capture the old and new svalues if OLD_SVAL and NEW_SVAL pointers are
> +   provided. */
> +static void
> +inc_field_val (region_model_manager *mgr, region_model *model,
> +               const region *field_region, const tree type_node,
> +               const call_details &cd, const svalue **old_sval = nullptr,
> +               const svalue **new_sval = nullptr)
> +{
> +  const svalue *tmp_old_sval
> +      = model->get_store_value (field_region, cd.get_ctxt ());
> +  const svalue *one_sval = mgr->get_or_create_int_cst (type_node, 1);
> +  const svalue *tmp_new_sval = mgr->get_or_create_binop (
> +      type_node, PLUS_EXPR, tmp_old_sval, one_sval);
> +
> +  model->set_value (field_region, tmp_new_sval, cd.get_ctxt ());
> +
> +  if (old_sval)
> +    *old_sval = tmp_old_sval;
> +
> +  if (new_sval)
> +    *new_sval = tmp_new_sval;
> +}
> +
> +class pyobj_init_fail : public failed_call_info
> +{
> +public:
> +  pyobj_init_fail (const call_details &cd) : failed_call_info (cd) {}
> +
> +  bool
> +  update_model (region_model *model, const exploded_edge *,
> +                region_model_context *ctxt) const final override
> +  {
> +    /* Return NULL; everything else is unchanged. */
> +    const call_details cd (get_call_details (model, ctxt));
> +    region_model_manager *mgr = cd.get_manager ();
> +    if (cd.get_lhs_type ())
> +      {
> +        const svalue *zero
> +            = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
> +        model->set_value (cd.get_lhs_region (), zero, cd.get_ctxt ());
> +      }
> +    return true;
> +  }
> +};
> +
> +/* Some concessions were made to simplify the analysis when comparing
> +   kf_PyList_Append with the real implementation.  In particular, the
> +   real PyList_Append performs some optimization internally to try to
> +   avoid calls to realloc where possible; for simplicity, we assume
> +   that realloc is called every time.  We also grow the list by just
> +   one element (ensuring there is room for the new item) rather than
> +   following the over-allocation heuristics that the actual
> +   implementation uses.  */
> +class kf_PyList_Append : public known_function
> +{
> +public:
> +  bool
> +  matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return (cd.num_args () == 2 && cd.arg_is_pointer_p (0)
> +            && cd.arg_is_pointer_p (1));
> +  }
> +  void impl_call_pre (const call_details &cd) const final override;
> +  void impl_call_post (const call_details &cd) const final override;
> +};
> +
> +void
> +kf_PyList_Append::impl_call_pre (const call_details &cd) const
> +{
> +  region_model_manager *mgr = cd.get_manager ();
> +  region_model *model = cd.get_model ();
> +
> +  const svalue *pylist_sval = cd.get_arg_svalue (0);
> +  const region *pylist_reg
> +      = model->deref_rvalue (pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
> +
> +  const svalue *newitem_sval = cd.get_arg_svalue (1);
> +  const region *newitem_reg
> +      = model->deref_rvalue (newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
> +
> +  // Skip the checks unless the list pointer is a known region or constant.
> +  if (pylist_sval->get_kind () != SK_REGION
> +      && pylist_sval->get_kind () != SK_CONSTANT)
> +    return;
> +
> +  // PyList_Check
> +  tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
> +  const region *ob_type_region
> +      = mgr->get_field_region (pylist_reg, ob_type_field);
> +  const svalue *stored_sval
> +      = model->get_store_value (ob_type_region, cd.get_ctxt ());
> +  const region *pylist_type_region
> +      = mgr->get_region_for_global (pylisttype_vardecl);
> +  tree pylisttype_vardecl_ptr
> +      = build_pointer_type (TREE_TYPE (pylisttype_vardecl));
> +  const svalue *pylist_type_ptr
> +      = mgr->get_ptr_svalue (pylisttype_vardecl_ptr, pylist_type_region);
> +
> +  if (stored_sval != pylist_type_ptr)
> +    {
> +      // TODO: emit diagnostic -Wanalyzer-type-error
> +      cd.get_ctxt ()->terminate_path ();
> +      return;
> +    }
> +
> +  // Check that new_item is not null.
> +  {
> +    const svalue *null_ptr
> +        = mgr->get_or_create_int_cst (newitem_sval->get_type (), 0);
> +    if (!model->add_constraint (newitem_sval, NE_EXPR, null_ptr,
> +                                cd.get_ctxt ()))
> +      {
> +        // TODO: emit diagnostic here
> +        cd.get_ctxt ()->terminate_path ();
> +        return;
> +      }
> +  }
> +}
> +
> +void
> +kf_PyList_Append::impl_call_post (const call_details &cd) const
> +{
> +  /* Three custom subclasses of custom_edge_info, for handling the various
> +     outcomes of "realloc".  */
> +
> +  /* Concrete custom_edge_info: the realloc call fails, so PyList_Append
> +     returns -1.  */
> +  class realloc_failure : public failed_call_info
> +  {
> +  public:
> +    realloc_failure (const call_details &cd) : failed_call_info (cd) {}
> +
> +    bool
> +    update_model (region_model *model, const exploded_edge *,
> +                  region_model_context *ctxt) const final override
> +    {
> +      const call_details cd (get_call_details (model, ctxt));
> +      region_model_manager *mgr = cd.get_manager ();
> +
> +      const svalue *pylist_sval = cd.get_arg_svalue (0);
> +      const region *pylist_reg = model->deref_rvalue (
> +          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
> +
> +      /* Identify ob_item field and set it to NULL. */
> +      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
> +      const region *ob_item_reg
> +          = mgr->get_field_region (pylist_reg, ob_item_field);
> +      const svalue *old_ptr_sval
> +          = model->get_store_value (ob_item_reg, cd.get_ctxt ());
> +
> +      if (const region_svalue *old_reg
> +          = old_ptr_sval->dyn_cast_region_svalue ())
> +        {
> +          const region *freed_reg = old_reg->get_pointee ();
> +          model->unbind_region_and_descendents (freed_reg, POISON_KIND_FREED);
> +          model->unset_dynamic_extents (freed_reg);
> +        }
> +
> +      const svalue *null_sval = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
> +      model->set_value (ob_item_reg, null_sval, cd.get_ctxt ());
> +
> +      if (cd.get_lhs_type ())
> +        {
> +          const svalue *neg_one
> +              = mgr->get_or_create_int_cst (cd.get_lhs_type (), -1);
> +          cd.maybe_set_lhs (neg_one);
> +        }
> +      return true;
> +    }
> +  };
> +
> +  class realloc_success_no_move : public call_info
> +  {
> +  public:
> +    realloc_success_no_move (const call_details &cd) : call_info (cd) {}
> +
> +    label_text
> +    get_desc (bool can_colorize) const final override
> +    {
> +      return make_label_text (
> +          can_colorize, "when %qE succeeds, without moving underlying buffer",
> +          get_fndecl ());
> +    }
> +
> +    bool
> +    update_model (region_model *model, const exploded_edge *,
> +                  region_model_context *ctxt) const final override
> +    {
> +      const call_details cd (get_call_details (model, ctxt));
> +      region_model_manager *mgr = cd.get_manager ();
> +
> +      const svalue *pylist_sval = cd.get_arg_svalue (0);
> +      const region *pylist_reg = model->deref_rvalue (
> +          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
> +
> +      const svalue *newitem_sval = cd.get_arg_svalue (1);
> +      const region *newitem_reg = model->deref_rvalue (
> +          newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
> +
> +      tree ob_size_field = get_field_by_name (varobj_record, "ob_size");
> +      const region *ob_size_region
> +          = mgr->get_field_region (pylist_reg, ob_size_field);
> +      const svalue *ob_size_sval = nullptr;
> +      const svalue *new_size_sval = nullptr;
> +      inc_field_val (mgr, model, ob_size_region, integer_type_node, cd,
> +                     &ob_size_sval, &new_size_sval);
> +
> +      const svalue *sizeof_sval = mgr->get_or_create_cast (
> +          ob_size_sval->get_type (), get_sizeof_pyobjptr (mgr));
> +      const svalue *num_allocated_bytes = mgr->get_or_create_binop (
> +          size_type_node, MULT_EXPR, sizeof_sval, new_size_sval);
> +
> +      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
> +      const region *ob_item_region
> +          = mgr->get_field_region (pylist_reg, ob_item_field);
> +      const svalue *ob_item_ptr_sval
> +          = model->get_store_value (ob_item_region, cd.get_ctxt ());
> +
> +      /* We can only grow in place with a non-NULL pointer to a known
> +         region.  */
> +      {
> +        const svalue *null_ptr = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
> +        if (!model->add_constraint (ob_item_ptr_sval, NE_EXPR, null_ptr,
> +                                    cd.get_ctxt ()))
> +          {
> +            return false;
> +          }
> +      }
> +
> +      const unmergeable_svalue *underlying_svalue
> +          = ob_item_ptr_sval->dyn_cast_unmergeable_svalue ();
> +      const svalue *target_svalue = nullptr;
> +      const region_svalue *target_region_svalue = nullptr;
> +
> +      if (underlying_svalue)
> +        {
> +          target_svalue = underlying_svalue->get_arg ();
> +          if (target_svalue->get_kind () != SK_REGION)
> +            {
> +              return false;
> +            }
> +        }
> +      else
> +        {
> +          if (ob_item_ptr_sval->get_kind () != SK_REGION)
> +            {
> +              return false;
> +            }
> +          target_svalue = ob_item_ptr_sval;
> +        }
> +
> +      target_region_svalue = target_svalue->dyn_cast_region_svalue ();
> +      const region *curr_reg = target_region_svalue->get_pointee ();
> +
> +      if (compat_types_p (num_allocated_bytes->get_type (), size_type_node))
> +        model->set_dynamic_extents (curr_reg, num_allocated_bytes, ctxt);
> +
> +      model->set_value (ob_size_region, new_size_sval, ctxt);
> +
> +      const svalue *offset_sval = mgr->get_or_create_binop (
> +          size_type_node, MULT_EXPR, sizeof_sval, ob_size_sval);
> +      const region *element_region
> +          = mgr->get_offset_region (curr_reg, pyobj_ptr_ptr, offset_sval);
> +      model->set_value (element_region, newitem_sval, cd.get_ctxt ());
> +
> +      // Increment ob_refcnt of appended item.
> +      tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
> +      const region *ob_refcnt_region
> +          = mgr->get_field_region (newitem_reg, ob_refcnt_tree);
> +      inc_field_val (mgr, model, ob_refcnt_region, size_type_node, cd);
> +
> +      if (cd.get_lhs_type ())
> +        {
> +          const svalue *zero
> +              = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
> +          cd.maybe_set_lhs (zero);
> +        }
> +      return true;
> +    }
> +  };
> +
> +  class realloc_success_move : public call_info
> +  {
> +  public:
> +    realloc_success_move (const call_details &cd) : call_info (cd) {}
> +
> +    label_text
> +    get_desc (bool can_colorize) const final override
> +    {
> +      return make_label_text (can_colorize, "when %qE succeeds, moving buffer",
> +                              get_fndecl ());
> +    }
> +
> +    bool
> +    update_model (region_model *model, const exploded_edge *,
> +                  region_model_context *ctxt) const final override
> +    {
> +      const call_details cd (get_call_details (model, ctxt));
> +      region_model_manager *mgr = cd.get_manager ();
> +      const svalue *pylist_sval = cd.get_arg_svalue (0);
> +      const region *pylist_reg = model->deref_rvalue (
> +          pylist_sval, cd.get_arg_tree (0), cd.get_ctxt ());
> +
> +      const svalue *newitem_sval = cd.get_arg_svalue (1);
> +      const region *newitem_reg = model->deref_rvalue (
> +          newitem_sval, cd.get_arg_tree (1), cd.get_ctxt ());
> +
> +      tree ob_size_field = get_field_by_name (varobj_record, "ob_size");
> +      const region *ob_size_region
> +          = mgr->get_field_region (pylist_reg, ob_size_field);
> +      const svalue *old_ob_size_sval = nullptr;
> +      const svalue *new_ob_size_sval = nullptr;
> +      inc_field_val (mgr, model, ob_size_region, integer_type_node, cd,
> +                     &old_ob_size_sval, &new_ob_size_sval);
> +
> +      const svalue *sizeof_sval = mgr->get_or_create_cast (
> +          old_ob_size_sval->get_type (), get_sizeof_pyobjptr (mgr));
> +      const svalue *new_size_sval = mgr->get_or_create_binop (
> +          size_type_node, MULT_EXPR, sizeof_sval, new_ob_size_sval);
> +
> +      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
> +      const region *ob_item_reg
> +          = mgr->get_field_region (pylist_reg, ob_item_field);
> +      const svalue *old_ptr_sval
> +          = model->get_store_value (ob_item_reg, cd.get_ctxt ());
> +
> +      /* Create the new region.  */
> +      const region *new_reg = model->get_or_create_region_for_heap_alloc (
> +          new_size_sval, cd.get_ctxt ());
> +      const svalue *new_ptr_sval
> +          = mgr->get_ptr_svalue (pyobj_ptr_ptr, new_reg);
> +      if (!model->add_constraint (new_ptr_sval, NE_EXPR, old_ptr_sval,
> +                                  cd.get_ctxt ()))
> +        return false;
> +
> +      if (const region_svalue *old_reg
> +          = old_ptr_sval->dyn_cast_region_svalue ())
> +        {
> +          const region *freed_reg = old_reg->get_pointee ();
> +          const svalue *old_size_sval = model->get_dynamic_extents (freed_reg);
> +          if (old_size_sval)
> +            {
> +              const svalue *copied_size_sval
> +                  = get_copied_size (model, old_size_sval, new_size_sval);
> +              const region *copied_old_reg = mgr->get_sized_region (
> +                  freed_reg, pyobj_ptr_ptr, copied_size_sval);
> +              const svalue *buffer_content_sval
> +                  = model->get_store_value (copied_old_reg, cd.get_ctxt ());
> +              const region *copied_new_reg = mgr->get_sized_region (
> +                  new_reg, pyobj_ptr_ptr, copied_size_sval);
> +              model->set_value (copied_new_reg, buffer_content_sval,
> +                                cd.get_ctxt ());
> +            }
> +          else
> +            {
> +              model->mark_region_as_unknown (freed_reg, cd.get_uncertainty ());
> +            }
> +
> +          model->unbind_region_and_descendents (freed_reg, POISON_KIND_FREED);
> +          model->unset_dynamic_extents (freed_reg);
> +        }
> +
> +      const svalue *null_ptr = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
> +      if (!model->add_constraint (new_ptr_sval, NE_EXPR, null_ptr,
> +                                  cd.get_ctxt ()))
> +        return false;
> +
> +      model->set_value (ob_size_region, new_ob_size_sval, ctxt);
> +      model->set_value (ob_item_reg, new_ptr_sval, cd.get_ctxt ());
> +
> +      const svalue *offset_sval = mgr->get_or_create_binop (
> +          size_type_node, MULT_EXPR, sizeof_sval, old_ob_size_sval);
> +      const region *element_region
> +          = mgr->get_offset_region (new_reg, pyobj_ptr_ptr, offset_sval);
> +      model->set_value (element_region, newitem_sval, cd.get_ctxt ());
> +
> +      // Increment ob_refcnt of appended item.
> +      tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
> +      const region *ob_refcnt_region
> +          = mgr->get_field_region (newitem_reg, ob_refcnt_tree);
> +      inc_field_val (mgr, model, ob_refcnt_region, size_type_node, cd);
> +
> +      if (cd.get_lhs_type ())
> +        {
> +          const svalue *zero
> +              = mgr->get_or_create_int_cst (cd.get_lhs_type (), 0);
> +          cd.maybe_set_lhs (zero);
> +        }
> +      return true;
> +    }
> +
> +  private:
> +    /* Return the lesser of OLD_SIZE_SVAL and NEW_SIZE_SVAL.
> +       If unknown, OLD_SIZE_SVAL is returned.  */
> +    const svalue *
> +    get_copied_size (region_model *model, const svalue *old_size_sval,
> +                     const svalue *new_size_sval) const
> +    {
> +      tristate res
> +          = model->eval_condition (old_size_sval, GT_EXPR, new_size_sval);
> +      switch (res.get_value ())
> +        {
> +        case tristate::TS_TRUE:
> +          return new_size_sval;
> +        case tristate::TS_FALSE:
> +        case tristate::TS_UNKNOWN:
> +          return old_size_sval;
> +        default:
> +          gcc_unreachable ();
> +        }
> +    }
> +  };
> +
> +  /* Body of kf_PyList_Append::impl_call_post.  */
> +  if (cd.get_ctxt ())
> +    {
> +      cd.get_ctxt ()->bifurcate (make_unique<realloc_failure> (cd));
> +      cd.get_ctxt ()->bifurcate (make_unique<realloc_success_no_move> (cd));
> +      cd.get_ctxt ()->bifurcate (make_unique<realloc_success_move> (cd));
> +      cd.get_ctxt ()->terminate_path ();
> +    }
> +}
> +
> +class kf_PyList_New : public known_function
> +{
> +public:
> +  bool
> +  matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return (cd.num_args () == 1 && cd.arg_is_integral_p (0));
> +  }
> +  void impl_call_post (const call_details &cd) const final override;
> +};
> +
> +void
> +kf_PyList_New::impl_call_post (const call_details &cd) const
> +{
> +  class success : public call_info
> +  {
> +  public:
> +    success (const call_details &cd) : call_info (cd) {}
> +
> +    label_text
> +    get_desc (bool can_colorize) const final override
> +    {
> +      return make_label_text (can_colorize, "when %qE succeeds",
> +                              get_fndecl ());
> +    }
> +
> +    bool
> +    update_model (region_model *model, const exploded_edge *,
> +                  region_model_context *ctxt) const final override
> +    {
> +      const call_details cd (get_call_details (model, ctxt));
> +      region_model_manager *mgr = cd.get_manager ();
> +
> +      const svalue *pyobj_svalue
> +          = mgr->get_or_create_unknown_svalue (pyobj_record);
> +      const svalue *varobj_svalue
> +          = mgr->get_or_create_unknown_svalue (varobj_record);
> +      const svalue *pylist_svalue
> +          = mgr->get_or_create_unknown_svalue (pylistobj_record);
> +
> +      const svalue *size_sval = cd.get_arg_svalue (0);
> +
> +      const region *pylist_region
> +          = init_pyobject_region (mgr, model, pylist_svalue, cd);
> +
> +      /*
> +      typedef struct
> +      {
> +        PyObject_VAR_HEAD
> +        PyObject **ob_item;
> +        Py_ssize_t allocated;
> +      } PyListObject;
> +      */
> +      tree varobj_field = get_field_by_name (pylistobj_record, "ob_base");
> +      const region *varobj_region
> +          = mgr->get_field_region (pylist_region, varobj_field);
> +      model->set_value (varobj_region, varobj_svalue, cd.get_ctxt ());
> +
> +      tree ob_item_field = get_field_by_name (pylistobj_record, "ob_item");
> +      const region *ob_item_region
> +          = mgr->get_field_region (pylist_region, ob_item_field);
> +
> +      const svalue *zero_sval = mgr->get_or_create_int_cst (size_type_node, 0);
> +      const svalue *casted_size_sval
> +          = mgr->get_or_create_cast (size_type_node, size_sval);
> +      const svalue *size_cond_sval = mgr->get_or_create_binop (
> +          size_type_node, LE_EXPR, casted_size_sval, zero_sval);
> +
> +      // if size <= 0, ob_item = NULL
> +
> +      if (tree_int_cst_equal (size_cond_sval->maybe_get_constant (),
> +                              integer_one_node))
> +        {
> +          const svalue *null_sval
> +              = mgr->get_or_create_null_ptr (pyobj_ptr_ptr);
> +          model->set_value (ob_item_region, null_sval, cd.get_ctxt ());
> +        }
> +      else // calloc
> +        {
> +          const svalue *sizeof_sval = mgr->get_or_create_cast (
> +              size_sval->get_type (), get_sizeof_pyobjptr (mgr));
> +          const svalue *prod_sval = mgr->get_or_create_binop (
> +              size_type_node, MULT_EXPR, sizeof_sval, size_sval);
> +          const region *ob_item_sized_region
> +              = model->get_or_create_region_for_heap_alloc (prod_sval,
> +                                                            cd.get_ctxt ());
> +          model->zero_fill_region (ob_item_sized_region);
> +          const svalue *ob_item_ptr_sval
> +              = mgr->get_ptr_svalue (pyobj_ptr_ptr, ob_item_sized_region);
> +          const svalue *ob_item_unmergeable
> +              = mgr->get_or_create_unmergeable (ob_item_ptr_sval);
> +          model->set_value (ob_item_region, ob_item_unmergeable,
> +                            cd.get_ctxt ());
> +        }
> +
> +      /*
> +      typedef struct {
> +      PyObject ob_base;
> +      Py_ssize_t ob_size; // Number of items in variable part
> +      } PyVarObject;
> +      */
> +      const region *ob_base_region = get_ob_base_region (
> +          mgr, model, varobj_region, varobj_record, pyobj_svalue, cd);
> +
> +      tree ob_size_tree = get_field_by_name (varobj_record, "ob_size");
> +      const region *ob_size_region
> +          = mgr->get_field_region (varobj_region, ob_size_tree);
> +      model->set_value (ob_size_region, size_sval, cd.get_ctxt ());
> +
> +      /*
> +      typedef struct _object {
> +          _PyObject_HEAD_EXTRA
> +          Py_ssize_t ob_refcnt;
> +          PyTypeObject *ob_type;
> +      } PyObject;
> +      */
> +
> +      // Initialize ob_refcnt field to 1.
> +      init_ob_refcnt_field (mgr, model, ob_base_region, pyobj_record, cd);
> +      // Get pointer svalue for PyList_Type then assign it to ob_type field.
> +      set_ob_type_field (mgr, model, ob_base_region, pyobj_record,
> +                         pylisttype_vardecl, cd);
> +
> +      if (cd.get_lhs_type ())
> +        {
> +          const svalue *ptr_sval
> +              = mgr->get_ptr_svalue (cd.get_lhs_type (), pylist_region);
> +          cd.maybe_set_lhs (ptr_sval);
> +        }
> +      return true;
> +    }
> +  };
> +
> +  if (cd.get_ctxt ())
> +    {
> +      cd.get_ctxt ()->bifurcate (make_unique<pyobj_init_fail> (cd));
> +      cd.get_ctxt ()->bifurcate (make_unique<success> (cd));
> +      cd.get_ctxt ()->terminate_path ();
> +    }
> +}
> +
> +class kf_PyLong_FromLong : public known_function
> +{
> +public:
> +  bool
> +  matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return (cd.num_args () == 1 && cd.arg_is_integral_p (0));
> +  }
> +  void impl_call_post (const call_details &cd) const final override;
> +};
> +
> +void
> +kf_PyLong_FromLong::impl_call_post (const call_details &cd) const
> +{
> +  class success : public call_info
> +  {
> +  public:
> +    success (const call_details &cd) : call_info (cd) {}
> +
> +    label_text
> +    get_desc (bool can_colorize) const final override
> +    {
> +      return make_label_text (can_colorize, "when %qE succeeds",
> +                              get_fndecl ());
> +    }
> +
> +    bool
> +    update_model (region_model *model, const exploded_edge *,
> +                  region_model_context *ctxt) const final override
> +    {
> +      const call_details cd (get_call_details (model, ctxt));
> +      region_model_manager *mgr = cd.get_manager ();
> +
> +      const svalue *pyobj_svalue
> +          = mgr->get_or_create_unknown_svalue (pyobj_record);
> +      const svalue *pylongobj_sval
> +          = mgr->get_or_create_unknown_svalue (pylongobj_record);
> +
> +      const region *pylong_region
> +          = init_pyobject_region (mgr, model, pylongobj_sval, cd);
> +
> +      // Create a region for the base PyObject within the PyLongObject.
> +      const region *ob_base_region = get_ob_base_region (
> +          mgr, model, pylong_region, pylongobj_record, pyobj_svalue, cd);
> +
> +      // Initialize ob_refcnt field to 1.
> +      init_ob_refcnt_field (mgr, model, ob_base_region, pyobj_record, cd);
> +      // Get pointer svalue for PyLong_Type then assign it to ob_type field.
> +      set_ob_type_field (mgr, model, ob_base_region, pyobj_record,
> +                         pylongtype_vardecl, cd);
> +
> +      // Set the PyLongObject value.
> +      tree ob_digit_field = get_field_by_name (pylongobj_record, "ob_digit");
> +      const region *ob_digit_region
> +          = mgr->get_field_region (pylong_region, ob_digit_field);
> +      const svalue *ob_digit_sval = cd.get_arg_svalue (0);
> +      model->set_value (ob_digit_region, ob_digit_sval, cd.get_ctxt ());
> +
> +      if (cd.get_lhs_type ())
> +        {
> +          const svalue *ptr_sval
> +              = mgr->get_ptr_svalue (cd.get_lhs_type (), pylong_region);
> +          cd.maybe_set_lhs (ptr_sval);
> +        }
> +      return true;
> +    }
> +  };
> +
> +  if (cd.get_ctxt ())
> +    {
> +      cd.get_ctxt ()->bifurcate (make_unique<pyobj_init_fail> (cd));
> +      cd.get_ctxt ()->bifurcate (make_unique<success> (cd));
> +      cd.get_ctxt ()->terminate_path ();
> +    }
> +}
> +
>  static void
>  maybe_stash_named_type (logger *logger, const translation_unit &tu,
>                          const char *name)
> @@ -179,6 +889,12 @@ init_py_structs ()
>    pylongobj_record = get_stashed_type_by_name ("PyLongObject");
>    pylongtype_vardecl = get_stashed_global_var_by_name ("PyLong_Type");
>    pylisttype_vardecl = get_stashed_global_var_by_name ("PyList_Type");
> +
> +  if (pyobj_record)
> +    {
> +      pyobj_ptr_tree = build_pointer_type (pyobj_record);
> +      pyobj_ptr_ptr = build_pointer_type (pyobj_ptr_tree);
> +    }
>  }
>
>  void
> @@ -205,6 +921,12 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
>        sorry_no_cpython_plugin ();
>        return;
>      }
> +
> +  iface->register_known_function ("PyList_Append",
> +                                  make_unique<kf_PyList_Append> ());
> +  iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
> +  iface->register_known_function ("PyLong_FromLong",
> +                                  make_unique<kf_PyLong_FromLong> ());
>  }
>  } // namespace ana
>
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> new file mode 100644
> index 00000000000..19b5c17428a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyList_New (Py_ssize_t len)
> +{
> +  PyObject *obj = PyList_New (len);
> +  if (obj)
> +    {
> +     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> +    }
> +  else
> +    __analyzer_dump_path (); /* { dg-message "path" } */
> +  return obj;
> +}
> +
> +PyObject *
> +test_PyLong_New (long n)
> +{
> +  PyObject *obj = PyLong_FromLong (n);
> +  if (obj)
> +    {
> +     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
> +    }
> +  else
> +    __analyzer_dump_path (); /* { dg-message "path" } */
> +  return obj;
> +}
> +
> +PyObject *
> +test_PyListAppend (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, item);
> +  return list; /* { dg-warning "leak of 'item'" } */
> +}
> +
> +PyObject *
> +test_PyListAppend_2 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  if (!item)
> +       return NULL;
> +
> +  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +  PyObject *list = PyList_New (n);
> +  if (!list)
> +  {
> +       Py_DECREF(item);
> +       return NULL;
> +  }
> +
> +  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +
> +  if (PyList_Append (list, item) < 0)
> +    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +  else
> +    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
> +  return list; /* { dg-warning "leak of 'item'" } */
> +}
> +
> +
> +PyObject *
> +test_PyListAppend_3 (PyObject *item, PyObject *list)
> +{
> +  PyList_Append (list, item);
> +  return list;
> +}
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> index 09c45394b1f..e1ed2d2589e 100644
> --- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
> +++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> @@ -161,7 +161,8 @@ set plugin_test_list [list \
>           taint-CVE-2011-0521-6.c \
>           taint-antipatterns-1.c } \
>      { analyzer_cpython_plugin.c \
> -         cpython-plugin-test-1.c } \
> +         cpython-plugin-test-1.c \
> +         cpython-plugin-test-2.c } \
>  ]
>
>  foreach plugin_test $plugin_test_list {
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 7004711b384..eda53ff3a09 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -12559,3 +12559,28 @@ proc check_effective_target_const_volatile_readonly_section { } {
>      }
>    return 1
>  }
> +
> +# Appends necessary Python flags to extra-tool-flags if Python.h is supported.
> +# Otherwise, modifies dg-do-what.
> +proc dg-require-python-h { args } {
> +    upvar dg-extra-tool-flags extra-tool-flags
> +
> +    verbose "ENTER dg-require-python-h" 2
> +
> +    set result [remote_exec host "python3-config --includes"]
> +    set status [lindex $result 0]
> +    if { $status == 0 } {
> +        set python_flags [lindex $result 1]
> +    } else {
> +       verbose "Python.h not supported" 2
> +       upvar dg-do-what dg-do-what
> +       set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
> +       return
> +    }
> +
> +    verbose "Python flags are: $python_flags" 2
> +
> +    verbose "Before appending, extra-tool-flags: ${extra-tool-flags}" 3
> +    eval lappend extra-tool-flags $python_flags
> +    verbose "After appending, extra-tool-flags: ${extra-tool-flags}" 3
> +}
> --
> 2.30.2
>


* Update on CPython Extension Module -fanalyzer plugin development
  2023-08-11 20:23                                       ` Eric Feng
@ 2023-08-16 19:17                                         ` Eric Feng
  2023-08-16 21:28                                           ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-16 19:17 UTC (permalink / raw)
  To: dmalcom; +Cc: gcc, Eric Feng

Hi everyone,

After pushing the code that supports various known function classes last week,
I've turned my attention back to the core reference count checking 
functionality. This functionality used to reside in region_model, which 
wasn't ideal. To address this, I've introduced a hook to register callbacks 
to pop_frame. Specifically, this allows the code that checks the reference 
count and emits diagnostics to be housed within the plugin, rather than the 
core analyzer.

As of now, the parameters of pop_frame_callback are tailored specifically to 
our needs. If the use of callbacks at the end of pop_frame becomes more 
prevalent, we can revisit the setup to potentially make it more general.
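
The register/notify hook described above can be illustrated with a minimal standalone sketch. It is not the analyzer's code: model_state, return_value, model_with_hooks and record_pop_frame are hypothetical stand-ins, std::vector replaces GCC's auto_vec, and the region_model_context parameter is dropped for brevity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the analyzer's region_model and svalue.
struct model_state { int frame_depth; };
struct return_value { int id; };

// Same shape as the patch's pop_frame_callback typedef, minus the
// region_model_context parameter.
typedef void (*pop_frame_callback) (const model_state *model,
                                    const return_value *retval);

class model_with_hooks
{
public:
  // Called once by the plugin, e.g. from its init hook.
  static void register_pop_frame_callback (pop_frame_callback cb)
  {
    s_callbacks.push_back (cb);
  }

  // The core pops the frame, then notifies every registered callback;
  // it does not need to know what the callbacks check.
  void pop_frame (const return_value *retval)
  {
    assert (m_state.frame_depth > 0);
    --m_state.frame_depth;
    for (pop_frame_callback cb : s_callbacks)
      cb (&m_state, retval);
  }

  model_state m_state {1};

private:
  // Shared by all instances, like the patch's static pop_frame_callbacks.
  static std::vector<pop_frame_callback> s_callbacks;
};

std::vector<pop_frame_callback> model_with_hooks::s_callbacks;

// Plugin side: a checker that just records what it was passed.
int g_pop_frame_calls = 0;
int g_last_retval_id = -1;

void
record_pop_frame (const model_state *, const return_value *retval)
{
  ++g_pop_frame_calls;
  g_last_retval_id = retval ? retval->id : -1;
}
```

In the plugin, the corresponding step is a single registration call at init time. Because the callbacks are plain function pointers held in a static vector, the core never needs to know anything about the plugin's types.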

Moreover, the core reference count checking logic was previously somewhat
bloated, contained in one long function. I've since refactored it,
breaking it down into several helper functions to simplify it and reduce
complexity. There are still some aspects that need refinement, especially
because the plugin has changed since I last worked on this logic. However,
I don't believe there are any significant problems.

Currently, I've started working on a custom stmt_finder, similar to
leak_stmt_finder, to address the issue of m_stmt and m_stmt_finder being NULL
at the time of region_model::pop_frame. This approach was discussed as a
viable solution in a previous email, and I'll keep everyone posted on my
progress. Afterwards, I will go back and make the refinements mentioned above.
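
As a rough illustration of the fallback idea only (this is not GCC's stmt_finder interface), the toy below models a path as a vector of hypothetical path_edge records and walks it backwards to return the most recent statement that has a known location. This is roughly in the spirit of the backwards search leak_stmt_finder uses as its fallback; toy_stmt, path_edge and find_last_located_stmt are illustrative names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a gimple statement; a location of 0 plays the
// role of UNKNOWN_LOCATION.
struct toy_stmt
{
  std::string text;
  int location;
};

// Hypothetical stand-in for an exploded edge: its destination may or may
// not correspond to a statement (e.g. a function-exit point).
struct path_edge
{
  const toy_stmt *dest_stmt;
};

// Walk the path from its end towards its start and return the most recent
// statement that has a usable location, or nullptr if there is none.
const toy_stmt *
find_last_located_stmt (const std::vector<path_edge> &path)
{
  for (auto it = path.rbegin (); it != path.rend (); ++it)
    if (it->dest_stmt && it->dest_stmt->location != 0)
      return it->dest_stmt;
  return nullptr;
}
```

A diagnostic raised at pop_frame time could then be anchored to the statement this returns, rather than to a NULL m_stmt.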

For those interested, I've attached a WIP patch that highlights the specific 
changes mentioned above.

Best,
Eric

gcc/analyzer/ChangeLog:

	* region-model.cc (region_model::pop_frame): New callback
	mechanism.
	* region-model.h (pop_frame_callback): New typedef.
	(class region_model): New functions and variables.

gcc/testsuite/ChangeLog:

	* gcc.dg/plugin/analyzer_cpython_plugin.c: New functions for
	reference count checking.

---
 gcc/analyzer/region-model.cc                  |   3 +
 gcc/analyzer/region-model.h                   |  19 ++
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 234 +++++++++++++++++-
 3 files changed, 254 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 494a9cdf149..18cea279e53 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace ana {
 
+auto_vec<pop_frame_callback> region_model::pop_frame_callbacks;
+
 /* Dump T to PP in language-independent form, for debugging/logging/dumping
    purposes.  */
 
@@ -4813,6 +4815,7 @@ region_model::pop_frame (tree result_lvalue,
     }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
+  notify_on_pop_frame (this, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 4f09f2e585a..2fe6a60f7ba 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -236,6 +236,10 @@ public:
 
 struct append_regions_cb_data;
 
+typedef void (*pop_frame_callback) (const region_model *model,
+				    const svalue *retval,
+				    region_model_context *ctxt);
+
 /* A region_model encapsulates a representation of the state of memory, with
    a tree of regions, along with their associated values.
    The representation is graph-like because values can be pointers to
@@ -505,6 +509,20 @@ class region_model
   void check_for_null_terminated_string_arg (const call_details &cd,
 					     unsigned idx);
 
+  static void
+  register_pop_frame_callback (const pop_frame_callback &callback)
+  {
+    pop_frame_callbacks.safe_push (callback);
+  }
+
+  static void
+  notify_on_pop_frame (const region_model *model, const svalue *retval,
+		       region_model_context *ctxt)
+  {
+    for (auto &callback : pop_frame_callbacks)
+      callback (model, retval, ctxt);
+  }
+
 private:
   const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
   const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
@@ -592,6 +610,7 @@ private:
 						tree callee_fndecl,
 						region_model_context *ctxt) const;
 
+  static auto_vec<pop_frame_callback> pop_frame_callbacks;
   /* Storing this here to avoid passing it around everywhere.  */
   region_model_manager *const m_mgr;
 
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7cd72e8a886..918bb5a5587 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -44,6 +44,7 @@
 #include "analyzer/region-model.h"
 #include "analyzer/call-details.h"
 #include "analyzer/call-info.h"
+#include "analyzer/exploded-graph.h"
 #include "make-unique.h"
 
 int plugin_is_GPL_compatible;
@@ -191,6 +192,234 @@ public:
   }
 };
 
+class refcnt_mismatch : public pending_diagnostic_subclass<refcnt_mismatch>
+{
+public:
+  refcnt_mismatch (const region *base_region,
+				const svalue *ob_refcnt,
+				const svalue *actual_refcnt)
+      : m_base_region (base_region), m_ob_refcnt (ob_refcnt),
+	m_actual_refcnt (actual_refcnt)
+  {
+  }
+
+  const char *
+  get_kind () const final override
+  {
+    return "refcnt_mismatch";
+  }
+
+  bool
+  operator== (const refcnt_mismatch &other) const
+  {
+    return (m_base_region == other.m_base_region
+	    && m_ob_refcnt == other.m_ob_refcnt
+	    && m_actual_refcnt == other.m_actual_refcnt);
+  }
+
+  int get_controlling_option () const final override
+  {
+    return 0;
+  }
+
+  bool
+  emit (rich_location *rich_loc, logger *) final override
+  {
+    diagnostic_metadata m;
+    bool warned;
+    warned = warning_meta (rich_loc, m, get_controlling_option (),
+			   "REF COUNT PROBLEM");
+    return warned;
+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    if (m_base_region)
+      interest->add_region_creation (m_base_region);
+  }
+
+private:
+  const region *m_base_region;
+  const svalue *m_ob_refcnt;
+  const svalue *m_actual_refcnt;
+};
+
+/* Checks if the given region is heap allocated. */
+bool
+is_heap_allocated (const region *base_reg)
+{
+  return base_reg->get_kind () == RK_HEAP_ALLOCATED;
+}
+
+/* Increments the actual reference count if the current region matches the base
+ * region. */
+void
+increment_count_if_base_matches (const region *curr_region,
+				  const region *base_reg, int &actual_refcnt)
+{
+  if (curr_region->get_base_region () == base_reg)
+    actual_refcnt++;
+}
+
+/* For PyListObjects: processes the ob_item field within the current region and
+ * increments the reference count if conditions are met. */
+void
+process_ob_item_region (const region_model *model, region_model_manager *mgr,
+			region_model_context *ctxt, const region *curr_region,
+			const svalue *pylist_type_ptr, const region *base_reg,
+			int &actual_refcnt)
+{
+  tree ob_item_field_tree = get_field_by_name (pylistobj_record, "ob_item");
+  const region *ob_item_field_reg
+      = mgr->get_field_region (curr_region, ob_item_field_tree);
+  const svalue *ob_item_ptr = model->get_store_value (ob_item_field_reg, ctxt);
+
+  if (const auto &cast_ob_item_reg = ob_item_ptr->dyn_cast_region_svalue ())
+    {
+      const region *ob_item_reg = cast_ob_item_reg->get_pointee ();
+      const svalue *allocated_bytes = model->get_dynamic_extents (ob_item_reg);
+      const region *ob_item_sized = mgr->get_sized_region (
+	  ob_item_reg, pyobj_ptr_ptr, allocated_bytes);
+      const svalue *buffer_contents_sval
+	  = model->get_store_value (ob_item_sized, ctxt);
+
+      if (const auto &buffer_contents
+	  = buffer_contents_sval->dyn_cast_compound_svalue ())
+	{
+	  for (const auto &buffer_content : buffer_contents->get_map ())
+	    {
+	      const auto &content_value = buffer_content.second;
+	      if (const auto &content_region
+		  = content_value->dyn_cast_region_svalue ())
+		if (content_region->get_pointee () == base_reg)
+		  actual_refcnt++;
+	    }
+	}
+    }
+}
+
+/* Counts the actual references from all clusters in the model's store. */
+int
+count_actual_references (const region_model *model, region_model_manager *mgr,
+			 region_model_context *ctxt, const region *base_reg,
+			 const svalue *pylist_type_ptr, tree ob_type_field)
+{
+  int actual_refcnt = 0;
+  for (const auto &other_cluster : *model->get_store ())
+    {
+      for (const auto &binding : other_cluster.second->get_map ())
+	{
+	  const auto &sval = binding.second;
+	  const auto &curr_region = sval->maybe_get_region ();
+
+	  if (!curr_region || curr_region->get_kind () != RK_HEAP_ALLOCATED)
+	    continue;
+
+	  increment_count_if_base_matches (curr_region, base_reg,
+					    actual_refcnt);
+
+	  const region *ob_type_region
+	      = mgr->get_field_region (curr_region, ob_type_field);
+	  const svalue *stored_sval
+	      = model->get_store_value (ob_type_region, ctxt);
+	  const auto &remove_cast = stored_sval->dyn_cast_unaryop_svalue ();
+
+	  if (!remove_cast)
+	    continue;
+
+	  const svalue *type = remove_cast->get_arg ();
+	  if (type == pylist_type_ptr)
+	    process_ob_item_region (model, mgr, ctxt, curr_region,
+				    pylist_type_ptr, base_reg, actual_refcnt);
+	}
+    }
+  return actual_refcnt;
+}
+
+/* Retrieves the svalue associated with the ob_refcnt field of the base region.
+ */
+const svalue *
+retrieve_ob_refcnt_sval (const region *base_reg, const region_model *model,
+			 region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+  const region *ob_refcnt_region
+      = mgr->get_field_region (base_reg, ob_refcnt_tree);
+  const svalue *ob_refcnt_sval
+      = model->get_store_value (ob_refcnt_region, ctxt);
+  ob_refcnt_sval->dump (true);
+  return ob_refcnt_sval;
+}
+
+/* Processes an individual cluster and computes the reference count. */
+void
+process_cluster (
+    const hash_map<const ana::region *,
+		   ana::binding_cluster *>::iterator::reference_pair cluster,
+    const region_model *model, const svalue *retval,
+    region_model_context *ctxt, const svalue *pylist_type_ptr,
+    tree ob_type_field)
+{
+  region_model_manager *mgr = model->get_manager ();
+  const region *base_reg = cluster.first;
+
+  int actual_refcnt = count_actual_references (model, mgr, ctxt, base_reg,
+					       pylist_type_ptr, ob_type_field);
+  inform (UNKNOWN_LOCATION, "actual ref count: %d", actual_refcnt);
+
+  const svalue *ob_refcnt_sval
+      = retrieve_ob_refcnt_sval (base_reg, model, ctxt);
+  const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+      ob_refcnt_sval->get_type (), actual_refcnt);
+
+  if (actual_refcnt_sval != ob_refcnt_sval && ctxt)
+    {
+      std::unique_ptr<pending_diagnostic> pd = make_unique<refcnt_mismatch> (
+	  base_reg, ob_refcnt_sval, actual_refcnt_sval);
+      if (pd)
+	inform (UNKNOWN_LOCATION, "DIAGNOSTIC ");
+    }
+}
+
+/* Validates the reference count of Python objects. */
+void
+check_pyobj_refcnt (const region_model *model, const svalue *retval,
+		    region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+
+  const region *pylist_type_region
+      = mgr->get_region_for_global (pylisttype_vardecl);
+  const svalue *pylist_type_ptr = mgr->get_ptr_svalue (
+      TREE_TYPE (pylisttype_vardecl), pylist_type_region);
+
+  tree ob_type_field = get_field_by_name (pyobj_record, "ob_type");
+
+  for (const auto &cluster : *model->get_store ())
+    {
+      if (!is_heap_allocated (cluster.first))
+	continue;
+
+      inform (UNKNOWN_LOCATION, "_________________");
+      const region *base_reg = cluster.first;
+      base_reg->dump (true);
+      if (const auto &retval_region_sval = retval->dyn_cast_region_svalue ())
+	{
+	  const auto &retval_reg = retval_region_sval->get_pointee ();
+	  if (retval_reg == base_reg)
+	    {
+	      inform (UNKNOWN_LOCATION, "same thing");
+	      continue;
+	    }
+	}
+      process_cluster (cluster, model, retval, ctxt, pylist_type_ptr,
+		       ob_type_field);
+    }
+}
+
+
+
 /* Some concessions were made to
 simplify the analysis process when comparing kf_PyList_Append with the
 real implementation. In particular, PyList_Append performs some
@@ -940,8 +1169,9 @@ plugin_init (struct plugin_name_args *plugin_info,
   const char *plugin_name = plugin_info->base_name;
   if (0)
     inform (input_location, "got here; %qs", plugin_name);
-  ana::register_finish_translation_unit_callback (&stash_named_types);
-  ana::register_finish_translation_unit_callback (&stash_global_vars);
+  register_finish_translation_unit_callback (&stash_named_types);
+  register_finish_translation_unit_callback (&stash_global_vars);
+  region_model::register_pop_frame_callback(check_pyobj_refcnt);
   register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT,
                      ana::cpython_analyzer_init_cb,
                      NULL); /* void *user_data */
-- 
2.30.2



* Re: Update on CPython Extension Module -fanalyzer plugin development
  2023-08-16 19:17                                         ` Update on CPython Extension Module -fanalyzer plugin development Eric Feng
@ 2023-08-16 21:28                                           ` David Malcolm
  2023-08-17  1:47                                             ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-16 21:28 UTC (permalink / raw)
  To: Eric Feng, dmalcolm; +Cc: gcc

On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote:
> Hi everyone,

[fixing typo in my email address]

Hi Eric, thanks for the update, and the WIP patch.

> 
> After pushing the code that supports various known function classes last week,
> I've turned my attention back to the core reference count checking 
> functionality. This functionality used to reside in region_model, which 
> wasn't ideal. To address this, I've introduced a hook to register callbacks 
> to pop_frame. Specifically, this allows the code that checks the reference 
> count and emits diagnostics to be housed within the plugin, rather than the 
> core analyzer.
> 
> As of now, the parameters of pop_frame_callback are tailored specifically to 
> our needs. If the use of callbacks at the end of pop_frame becomes more 
> prevalent, we can revisit the setup to potentially make it more general.
> 
> Moreover, the core reference count checking logic was previously somewhat 
> bloated, contained in one extensive function. I've since refactored it, 
> breaking it down into several helper functions to simplify and reduce
> complexity. There are still some aspects that need refinement, especially 
> since the plugin has seen changes since I last worked on this logic. However, 
> I believe that there aren't any significant problems.

Suggestion: introduce some more decls into analyzer-decls.h and
known_functions for them into the plugin so that you can run/test/debug
the helper functions independently (similar to the existing ones in
kf-analyzer.cc).

e.g.
  extern void __analyzer_cpython_dump_real_refcounts (void);
  extern void __analyzer_cpython_dump_ob_refcnt (void);

> 
> Currently, I've started working on a custom stmt_finder similar to leak_stmt_finder
> to address the issue of m_stmt and m_stmt_finder being NULL at the time of 
> region_model::pop_frame. This approach was discussed as a viable solution in 
> a previous email, and I'll keep everyone posted on my progress. Afterwards, I 
> will go back to address the refinements necessary mentioned above.

You might want to experiment with splitting out
(a) "is there a refcount problem" from
(b) "emit a refcount problem".

For example, you could hardcode (a) to true, so we always complain with
(b) on every heap-allocated object, just to debug the stmt_finder
workaround.
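
A minimal sketch of the (a)/(b) split suggested above, assuming a hypothetical refcnt_info summary per object in place of real svalues. has_refcnt_problem plays the role of (a), with a debug flag that hardcodes it to true; emit_refcnt_problem plays the role of (b) and only records a message, whereas in the plugin that step would presumably pass a refcnt_mismatch diagnostic to the context's warn ().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-object summary: the value held in ob_refcnt versus the
// number of pointers to the object actually found in the store.
struct refcnt_info
{
  std::string name;
  long ob_refcnt;
  long actual_refcnt;
};

// (a) "Is there a refcount problem?"  The debug flag hardcodes the answer
// to true so that (b) can be exercised on every object.
bool
has_refcnt_problem (const refcnt_info &info, bool debug_always_report)
{
  if (debug_always_report)
    return true;
  return info.ob_refcnt != info.actual_refcnt;
}

// (b) "Emit a refcount problem."  Here it only records a message; in the
// plugin this would be where a diagnostic is handed to the context.
void
emit_refcnt_problem (const refcnt_info &info, std::vector<std::string> &out)
{
  out.push_back (info.name + ": ob_refcnt " + std::to_string (info.ob_refcnt)
                 + " but " + std::to_string (info.actual_refcnt)
                 + " reference(s) found");
}

// Drive (a) and (b) over every object; the two halves stay independent.
std::vector<std::string>
check_all (const std::vector<refcnt_info> &objs, bool debug_always_report)
{
  std::vector<std::string> reports;
  for (const refcnt_info &info : objs)
    if (has_refcnt_problem (info, debug_always_report))
      emit_refcnt_problem (info, reports);
  return reports;
}
```

Once the reporting path works with debug_always_report set, the flag can be switched off and (a) refined on its own.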


[...snip...]

BTW, you don't need to bother to write ChangeLog entries if you're just
sending a work-in-progress for me.

> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 7cd72e8a886..918bb5a5587 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c

[...]

> +/* For PyListObjects: processes the ob_item field within the current region and
> + * increments the reference count if conditions are met. */
> +void
> +process_ob_item_region (const region_model *model, region_model_manager *mgr,
> +                       region_model_context *ctxt, const region *curr_region,
> +                       const svalue *pylist_type_ptr, const region *base_reg,
> +                       int &actual_refcnt)

You seem to be special-casing PyListObject here; why?  That seems like
it's not going to be scalable to the general case.

Am I right in thinking the intent of this code is to count the actual
number of pointers in memory that point to a particular region?

Doesn't the ob_item buffer show up in the store as another cluster? 
Can't you just look at the bindings in the clusters and tally up the
pointers for each heap_allocated_region? (accumulating a result map
from region to int of the actual reference counts).

Or am I missing something?
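
One way to picture the single-pass tally suggested above is the standalone sketch below. It models the store as a hypothetical std::map from each cluster's base region to the values bound in it (toy_region, toy_binding and toy_store are stand-ins, not analyzer types), and produces a map from each heap-allocated region to the number of pointers to it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an analyzer region; only heap-allocated
// regions are candidates for being PyObjects.
struct toy_region
{
  std::string name;
  bool heap_allocated;
};

// Hypothetical stand-in for a bound svalue: either a pointer to some
// region, or any other value (pointee == nullptr).
struct toy_binding
{
  const toy_region *pointee;
};

// The store: each cluster's base region mapped to the values bound in it.
using toy_store = std::map<const toy_region *, std::vector<toy_binding>>;

// Single pass over every binding of every cluster, counting the pointers
// to each heap-allocated region.  A list's ob_item buffer is simply
// another cluster, so it needs no type-specific handling.
std::map<const toy_region *, int>
tally_actual_refcounts (const toy_store &store)
{
  std::map<const toy_region *, int> counts;
  for (const auto &cluster : store)
    for (const toy_binding &binding : cluster.second)
      if (binding.pointee && binding.pointee->heap_allocated)
        ++counts[binding.pointee];
  return counts;
}
```

For instance, if a local binds pointers to item and list, and list's ob_item buffer binds a pointer to item, the tally gives item 2, list 1 and the buffer 1. Only the regions that are PyObjects would then be compared against their ob_refcnt values.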

What does
  model->debug ();
show in your examples?


> +{
> +  tree ob_item_field_tree = get_field_by_name (pylistobj_record, "ob_item");
> +  const region *ob_item_field_reg
> +      = mgr->get_field_region (curr_region, ob_item_field_tree);
> +  const svalue *ob_item_ptr = model->get_store_value (ob_item_field_reg, ctxt);
> +
> +  if (const auto &cast_ob_item_reg = ob_item_ptr->dyn_cast_region_svalue ())
> +    {
> +      const region *ob_item_reg = cast_ob_item_reg->get_pointee ();
> +      const svalue *allocated_bytes = model->get_dynamic_extents (ob_item_reg);
> +      const region *ob_item_sized = mgr->get_sized_region (
> +         ob_item_reg, pyobj_ptr_ptr, allocated_bytes);
> +      const svalue *buffer_contents_sval
> +         = model->get_store_value (ob_item_sized, ctxt);
> +
> +      if (const auto &buffer_contents
> +         = buffer_contents_sval->dyn_cast_compound_svalue ())
> +       {
> +         for (const auto &buffer_content : buffer_contents->get_map ())
> +           {
> +                   const auto &content_value = buffer_content.second;
> +                   if (const auto &content_region
> +                       = content_value->dyn_cast_region_svalue ())
> +                           if (content_region->get_pointee () == base_reg)
> +                                   actual_refcnt++;
> +           }
> +       }
> +    }
> +}
> +
> +/* Counts the actual references from all clusters in the model's store. */
> +int
> +count_actual_references (const region_model *model, region_model_manager *mgr,
> +                        region_model_context *ctxt, const region *base_reg,
> +                        const svalue *pylist_type_ptr, tree ob_type_field)
> +{
> +  int actual_refcnt = 0;
> +  for (const auto &other_cluster : *model->get_store ())
> +    {
> +      for (const auto &binding : other_cluster.second->get_map ())
> +       {
> +         const auto &sval = binding.second;
> +         const auto &curr_region = sval->maybe_get_region ();
> +
> +         if (!curr_region || curr_region->get_kind () != RK_HEAP_ALLOCATED)
> +           continue;
> +
> +         increment_count_if_base_matches (curr_region, base_reg,
> +                                           actual_refcnt);
> +
> +         const region *ob_type_region
> +             = mgr->get_field_region (curr_region, ob_type_field);
> +         const svalue *stored_sval
> +             = model->get_store_value (ob_type_region, ctxt);
> +         const auto &remove_cast = stored_sval->dyn_cast_unaryop_svalue ();
> +
> +         if (!remove_cast)
> +           continue;
> +
> +         const svalue *type = remove_cast->get_arg ();
> +         if (type == pylist_type_ptr)
> +           process_ob_item_region (model, mgr, ctxt, curr_region,
> +                                   pylist_type_ptr, base_reg, actual_refcnt);
> +       }
> +    }
> +  return actual_refcnt;
> +}
> 

Hope the above is constructive.
Dave



* Re: Update on CPython Extension Module -fanalyzer plugin development
  2023-08-16 21:28                                           ` David Malcolm
@ 2023-08-17  1:47                                             ` Eric Feng
  2023-08-21 14:05                                               ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-17  1:47 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

Hi Dave,

Thanks for the feedback!


On Wed, Aug 16, 2023 at 5:29 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote:
> > Hi everyone,
>
> [fixing typo in my email address]
>
> Hi Eric, thanks for the update, and the WIP patch.
>
> >
> > After pushing the code that supports various known function classes last week,
> > I've turned my attention back to the core reference count checking
> > functionality. This functionality used to reside in region_model, which
> > wasn't ideal. To address this, I've introduced a hook to register callbacks
> > to pop_frame. Specifically, this allows the code that checks the reference
> > count and emits diagnostics to be housed within the plugin, rather than the
> > core analyzer.
> >
> > As of now, the parameters of pop_frame_callback are tailored specifically to
> > our needs. If the use of callbacks at the end of pop_frame becomes more
> > prevalent, we can revisit the setup to potentially make it more general.
> >
> > Moreover, the core reference count checking logic was previously somewhat
> > bloated, contained in one extensive function. I've since refactored it,
> > breaking it down into several helper functions to simplify and reduce
> > complexity. There are still some aspects that need refinement, especially
> > since the plugin has seen changes since I last worked on this logic. However,
> > I believe that there aren't any significant problems.
>
> Suggestion: introduce some more decls into analyzer-decls.h and
> known_functions for them into the plugin so that you can run/test/debug
> the helper functions independently (similar to the existing ones in
> kf-analyzer.cc).
>
> e.g.
>   extern void __analyzer_cpython_dump_real_refcounts (void);
>   extern void __analyzer_cpython_dump_ob_refcnt (void);
>
> >
Thanks for the suggestion. This will be even more helpful now that we
have split the logic into helper functions. I will look into these
when I come back to the "is there a refcount problem" side of the
equation.
> > Currently, I've started working on a custom stmt_finder similar to leak_stmt_finder
> > to address the issue of m_stmt and m_stmt_finder being NULL at the time of
> > region_model::pop_frame. This approach was discussed as a viable solution in
> > a previous email, and I'll keep everyone posted on my progress. Afterwards, I
> > will go back to address the refinements necessary mentioned above.
>
> You might want to experiment with splitting out
> (a) "is there a refcount problem" from
> (b) "emit a refcount problem".
>
> For example, you could hardcode (a) to true, so we always complain with
> (b) on every heap-allocated object, just to debug the stmt_finder
> workaround.
>
>
> [...snip...]
>
> BTW, you don't need to bother to write ChangeLog entries if you're just
> sending a work-in-progress for me.
>
> > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > index 7cd72e8a886..918bb5a5587 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
>
> [...]
>
> > +/* For PyListObjects: processes the ob_item field within the current region and
> > + * increments the reference count if conditions are met. */
> > +void
> > +process_ob_item_region (const region_model *model, region_model_manager *mgr,
> > +                       region_model_context *ctxt, const region *curr_region,
> > +                       const svalue *pylist_type_ptr, const region *base_reg,
> > +                       int &actual_refcnt)
>
> You seem to be special-casing PyListObject here; why?  That seems like
> it's not going to be scalable to the general case.
>
> Am I right in thinking the intent of this code is to count the actual
> number of pointers in memory that point to a particular region?
>
> Doesn't the ob_item buffer show up in the store as another cluster?
> Can't you just look at the bindings in the clusters and tally up the
> pointers for each heap_allocated_region? (accumulating a result map
> from region to int of the actual reference counts).
Yes, you're correct; I hadn't thought of that. This simplifies a
lot of things. Thanks for the great suggestion!
>
> Or am I missing something?
>
> What does
>   model->debug ();
> show in your examples?
>
>
> > +{
> > +  tree ob_item_field_tree = get_field_by_name (pylistobj_record, "ob_item");
> > +  const region *ob_item_field_reg
> > +      = mgr->get_field_region (curr_region, ob_item_field_tree);
> > +  const svalue *ob_item_ptr = model->get_store_value (ob_item_field_reg, ctxt);
> > +
> > +  if (const auto &cast_ob_item_reg = ob_item_ptr->dyn_cast_region_svalue ())
> > +    {
> > +      const region *ob_item_reg = cast_ob_item_reg->get_pointee ();
> > +      const svalue *allocated_bytes = model->get_dynamic_extents (ob_item_reg);
> > +      const region *ob_item_sized = mgr->get_sized_region (
> > +         ob_item_reg, pyobj_ptr_ptr, allocated_bytes);
> > +      const svalue *buffer_contents_sval
> > +         = model->get_store_value (ob_item_sized, ctxt);
> > +
> > +      if (const auto &buffer_contents
> > +         = buffer_contents_sval->dyn_cast_compound_svalue ())
> > +       {
> > +         for (const auto &buffer_content : buffer_contents->get_map ())
> > +           {
> > +                   const auto &content_value = buffer_content.second;
> > +                   if (const auto &content_region
> > +                       = content_value->dyn_cast_region_svalue ())
> > +                           if (content_region->get_pointee () == base_reg)
> > +                                   actual_refcnt++;
> > +           }
> > +       }
> > +    }
> > +}
> > +
> > +/* Counts the actual references from all clusters in the model's store. */
> > +int
> > +count_actual_references (const region_model *model, region_model_manager *mgr,
> > +                        region_model_context *ctxt, const region *base_reg,
> > +                        const svalue *pylist_type_ptr, tree ob_type_field)
> > +{
> > +  int actual_refcnt = 0;
> > +  for (const auto &other_cluster : *model->get_store ())
> > +    {
> > +      for (const auto &binding : other_cluster.second->get_map ())
> > +       {
> > +         const auto &sval = binding.second;
> > +         const auto &curr_region = sval->maybe_get_region ();
> > +
> > +         if (!curr_region || curr_region->get_kind () != RK_HEAP_ALLOCATED)
> > +           continue;
> > +
> > +         increment_count_if_base_matches (curr_region, base_reg,
> > +                                           actual_refcnt);
> > +
> > +         const region *ob_type_region
> > +             = mgr->get_field_region (curr_region, ob_type_field);
> > +         const svalue *stored_sval
> > +             = model->get_store_value (ob_type_region, ctxt);
> > +         const auto &remove_cast = stored_sval->dyn_cast_unaryop_svalue ();
> > +
> > +         if (!remove_cast)
> > +           continue;
> > +
> > +         const svalue *type = remove_cast->get_arg ();
> > +         if (type == pylist_type_ptr)
> > +           process_ob_item_region (model, mgr, ctxt, curr_region,
> > +                                   pylist_type_ptr, base_reg, actual_refcnt);
> > +       }
> > +    }
> > +  return actual_refcnt;
> > +}
> >
>
> Hope the above is constructive.
> Dave
>

Best,
Eric

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Update on CPython Extension Module -fanalyzer plugin development
  2023-08-17  1:47                                             ` Eric Feng
@ 2023-08-21 14:05                                               ` Eric Feng
  2023-08-21 15:04                                                 ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-21 14:05 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

Hi Dave,

Just wanted to give you and everyone else a short update on how
reference count checking is going — we can now observe the refcnt
diagnostic being emitted:

rc3.c:22:10: warning: REF COUNT PROBLEM
   22 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ fails
    |......
    |   22 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

I will fix up and refactor the logic for counting the actual ref count
before coming back and refining the diagnostic to give much more
detailed information.

Best,
Eric


On Wed, Aug 16, 2023 at 9:47 PM Eric Feng <ef2648@columbia.edu> wrote:
>
> Hi Dave,
>
> Thanks for the feedback!
>
>
> On Wed, Aug 16, 2023 at 5:29 PM David Malcolm <dmalcolm@redhat.com> wrote:
> >
> > On Wed, 2023-08-16 at 15:17 -0400, Eric Feng via Gcc wrote:
> > > Hi everyone,
> >
> > [fixing typo in my email address]
> >
> > Hi Eric, thanks for the update, and the WIP patch.
> >
> > >
> > > After pushing the code that supports various known function classes last week,
> > > I've turned my attention back to the core reference count checking
> > > functionality. This functionality used to reside in region_model, which
> > > wasn't ideal. To address this, I've introduced a hook to register callbacks
> > > to pop_frame. Specifically, this allows the code that checks the reference
> > > count and emits diagnostics to be housed within the plugin, rather than the
> > > core analyzer.
> > >
> > > As of now, the parameters of pop_frame_callback are tailored specifically to
> > > our needs. If the use of callbacks at the end of pop_frame becomes more
> > > prevalent, we can revisit the setup to potentially make it more general.
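The callback hook described above follows a plain register/notify pattern. Below is a minimal standalone sketch of that pattern, assuming simplified stand-in types (`model_state`, `return_value` and `frame_popper` are hypothetical, not analyzer classes). Plugins push a function pointer once, and pop_frame invokes every registered callback with the model and the returned value.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for region_model and svalue.
struct model_state
{
  int frame_depth;
};

struct return_value
{
  int id;
};

typedef void (*pop_frame_callback) (const model_state *model,
                                    const return_value *retval);

class frame_popper
{
public:
  // Called once per plugin (e.g. from its init hook) to subscribe.
  static void
  register_pop_frame_callback (pop_frame_callback callback)
  {
    callbacks ().push_back (callback);
  }

  // Called at the very end of pop_frame: every subscriber sees the
  // post-pop model together with the function's return value.
  static void
  notify_on_pop_frame (const model_state *model, const return_value *retval)
  {
    for (pop_frame_callback callback : callbacks ())
      callback (model, retval);
  }

private:
  // Function-local static: constructed on first use.
  static std::vector<pop_frame_callback> &
  callbacks ()
  {
    static std::vector<pop_frame_callback> s_callbacks;
    return s_callbacks;
  }
};
```

The function-local static is a choice made for this sketch. A static member vector works equally well as long as registration happens after static initialization, e.g. from plugin_init.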
> > >
> > > Moreover, the core reference count checking logic was previously somewhat
> > > bloated, contained in one extensive function. I've since refactored it,
> > > breaking it down into several helper functions to simplify and reduce
> > > complexity. There are still some aspects that need refinement, especially
> > > since the plugin has seen changes since I last worked on this logic. However,
> > > I believe that there aren't any significant problems.
> >
> > Suggestion: introduce some more decls into analyzer-decls.h and
> > known_functions for them into the plugin so that you can run/test/debug
> > the helper functions independently (similar to the existing ones in kf-
> > analyzer.cc).
> >
> > e.g.
> >   extern void __analyzer_cpython_dump_real_refcounts (void);
> >   extern void __analyzer_cpython_dump_ob_refcnt (void);
> >
> > >
> Thanks for the suggestion. This will be even more helpful now that we
> have split the logic into helper functions. I will look into these
> when I come back to the "is there a refcount problem" side of the
> equation.
> > > Currently, I've started working on a custom stmt_finder similar to leak_stmt_finder
> > > to address the issue of m_stmt and m_stmt_finder being NULL at the time of
> > > region_model::pop_frame. This approach was discussed as a viable solution in
> > > a previous email, and I'll keep everyone posted on my progress. Afterwards, I
> > > will go back to address the refinements necessary mentioned above.
> >
> > You might want to experiment with splitting out
> > (a) "is there a refcount problem" from
> > (b) "emit a refcount problem".
> >
> > For example, you could hardcode (a) to true, so we always complain with
> > (b) on every heap-allocated object, just to debug the stmt_finder
> > workaround.
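The suggestion to separate detection from reporting can be sketched as follows. Everything here is hypothetical (`tracked_object`, `check_objects` and the two predicates). The point is that (a) becomes a swappable predicate, so hardcoding it to true exercises (b), and with it the stmt_finder workaround, on every heap-allocated object.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy record: a heap object's modelled ob_refcnt and the
// number of pointers to it actually found in the store.
struct tracked_object
{
  std::string name;
  int ob_refcnt;
  int actual_refcnt;
};

// (a) "is there a refcount problem?" as a separate, swappable predicate.
using problem_predicate = std::function<bool (const tracked_object &)>;

bool
refcnt_mismatch (const tracked_object &obj)
{
  return obj.ob_refcnt != obj.actual_refcnt;
}

// Debug-only variant of (a): always complain, to exercise (b).
bool
always_complain (const tracked_object &)
{
  return true;
}

// (b) "emit a refcount problem": here it just records a message.
std::vector<std::string>
check_objects (const std::vector<tracked_object> &objs,
               const problem_predicate &is_problem)
{
  std::vector<std::string> warnings;
  for (const tracked_object &obj : objs)
    if (is_problem (obj))
      warnings.push_back ("refcount problem: " + obj.name);
  return warnings;
}
```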
> >
> >
> > [...snip...]
> >
> > BTW, you don't need to bother to write ChangeLog entries if you're just
> > sending a work-in-progress for me.
> >
> > > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > > index 7cd72e8a886..918bb5a5587 100644
> > > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> >
> > [...]
> >
> > > +/* For PyListObjects: processes the ob_item field within the current region and
> > > + * increments the reference count if conditions are met. */
> > > +void
> > > +process_ob_item_region (const region_model *model, region_model_manager *mgr,
> > > +                       region_model_context *ctxt, const region *curr_region,
> > > +                       const svalue *pylist_type_ptr, const region *base_reg,
> > > +                       int &actual_refcnt)
> >
> > You seem to be special-casing PyListObject here; why?  That seems like
> > it's not going to be scalable to the general case.
> >
> > Am I right in thinking the intent of this code is to count the actual
> > number of pointers in memory that point to a particular region?
> >
> > Doesn't the ob_item buffer show up in the store as another cluster?
> > Can't you just look at the bindings in the clusters and tally up the
> > pointers for each heap_allocated_region? (accumulating a result map
> > from region to int of the actual reference counts).
> Yes, you're correct, I hadn't thought of that ... This simplifies a
> lot of things. Thanks for the great suggestion!
> >
> > Or am I missing something?
> >
> > What does
> >   model->debug ();
> > show in your examples?
> >
> >
> > > +{
> > > +  tree ob_item_field_tree = get_field_by_name (pylistobj_record, "ob_item");
> > > +  const region *ob_item_field_reg
> > > +      = mgr->get_field_region (curr_region, ob_item_field_tree);
> > > +  const svalue *ob_item_ptr = model->get_store_value (ob_item_field_reg, ctxt);
> > > +
> > > +  if (const auto &cast_ob_item_reg = ob_item_ptr->dyn_cast_region_svalue ())
> > > +    {
> > > +      const region *ob_item_reg = cast_ob_item_reg->get_pointee ();
> > > +      const svalue *allocated_bytes = model->get_dynamic_extents (ob_item_reg);
> > > +      const region *ob_item_sized = mgr->get_sized_region (
> > > +         ob_item_reg, pyobj_ptr_ptr, allocated_bytes);
> > > +      const svalue *buffer_contents_sval
> > > +         = model->get_store_value (ob_item_sized, ctxt);
> > > +
> > > +      if (const auto &buffer_contents
> > > +         = buffer_contents_sval->dyn_cast_compound_svalue ())
> > > +       {
> > > +         for (const auto &buffer_content : buffer_contents->get_map ())
> > > +           {
> > > +                   const auto &content_value = buffer_content.second;
> > > +                   if (const auto &content_region
> > > +                       = content_value->dyn_cast_region_svalue ())
> > > +                           if (content_region->get_pointee () == base_reg)
> > > +                                   actual_refcnt++;
> > > +           }
> > > +       }
> > > +    }
> > > +}
> > > +
> > > +/* Counts the actual references from all clusters in the model's store. */
> > > +int
> > > +count_actual_references (const region_model *model, region_model_manager *mgr,
> > > +                        region_model_context *ctxt, const region *base_reg,
> > > +                        const svalue *pylist_type_ptr, tree ob_type_field)
> > > +{
> > > +  int actual_refcnt = 0;
> > > +  for (const auto &other_cluster : *model->get_store ())
> > > +    {
> > > +      for (const auto &binding : other_cluster.second->get_map ())
> > > +       {
> > > +         const auto &sval = binding.second;
> > > +         const auto &curr_region = sval->maybe_get_region ();
> > > +
> > > +         if (!curr_region || curr_region->get_kind () != RK_HEAP_ALLOCATED)
> > > +           continue;
> > > +
> > > +         increment_count_if_base_matches (curr_region, base_reg,
> > > +                                           actual_refcnt);
> > > +
> > > +         const region *ob_type_region
> > > +             = mgr->get_field_region (curr_region, ob_type_field);
> > > +         const svalue *stored_sval
> > > +             = model->get_store_value (ob_type_region, ctxt);
> > > +         const auto &remove_cast = stored_sval->dyn_cast_unaryop_svalue ();
> > > +
> > > +         if (!remove_cast)
> > > +           continue;
> > > +
> > > +         const svalue *type = remove_cast->get_arg ();
> > > +         if (type == pylist_type_ptr)
> > > +           process_ob_item_region (model, mgr, ctxt, curr_region,
> > > +                                   pylist_type_ptr, base_reg, actual_refcnt);
> > > +       }
> > > +    }
> > > +  return actual_refcnt;
> > > +}
> > >
> >
> > Hope the above is constructive.
> > Dave
> >
>
> Best,
> Eric

* Re: Update on CPython Extension Module -fanalyzer plugin development
  2023-08-21 14:05                                               ` Eric Feng
@ 2023-08-21 15:04                                                 ` David Malcolm
  2023-08-23 21:15                                                   ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-21 15:04 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Mon, 2023-08-21 at 10:05 -0400, Eric Feng wrote:
> Hi Dave,
> 
> Just wanted to give you and everyone else a short update on how
> reference count checking is going — we can now observe the refcnt
> diagnostic being emitted:
> 
> rc3.c:22:10: warning: REF COUNT PROBLEM
>    22 |   return list;
>       |          ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(3);
>     |      |                    ^~~~~~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (1) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(1);
>     |      |                    ~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (2) when ‘PyList_New’ succeeds
>     |......
>     |   14 |   PyList_Append(list, item);
>     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |   |
>     |      |   (3) when ‘PyList_Append’ fails
>     |......
>     |   22 |   return list;
>     |      |          ~~~~
>     |      |          |
>     |      |          (4) here
>     |
> 
> I will fix up and refactor the logic for counting the actual ref
> count
> before coming back and refining the diagnostic to give much more
> detailed information.

Excellent!  Thanks for the update.

Dave


* Re: Update on CPython Extension Module -fanalyzer plugin development
  2023-08-21 15:04                                                 ` David Malcolm
@ 2023-08-23 21:15                                                   ` Eric Feng
  2023-08-23 23:16                                                     ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-23 21:15 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

On Mon, Aug 21, 2023 at 11:04 AM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Mon, 2023-08-21 at 10:05 -0400, Eric Feng wrote:
> > Hi Dave,
> >
> > Just wanted to give you and everyone else a short update on how
> > reference count checking is going — we can now observe the refcnt
> > diagnostic being emitted:
> >
> > rc3.c:22:10: warning: REF COUNT PROBLEM
> >    22 |   return list;
> >       |          ^~~~
> >   ‘create_py_object’: events 1-4
> >     |
> >     |    4 |   PyObject* item = PyLong_FromLong(3);
> >     |      |                    ^~~~~~~~~~~~~~~~~~
> >     |      |                    |
> >     |      |                    (1) when ‘PyLong_FromLong’ succeeds
> >     |    5 |   PyObject* list = PyList_New(1);
> >     |      |                    ~~~~~~~~~~~~~
> >     |      |                    |
> >     |      |                    (2) when ‘PyList_New’ succeeds
> >     |......
> >     |   14 |   PyList_Append(list, item);
> >     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
> >     |      |   |
> >     |      |   (3) when ‘PyList_Append’ fails
> >     |......
> >     |   22 |   return list;
> >     |      |          ~~~~
> >     |      |          |
> >     |      |          (4) here
> >     |
> >
> > I will fix up and refactor the logic for counting the actual ref
> > count
> > before coming back and refining the diagnostic to give much more
> > detailed information.
>
> Excellent!  Thanks for the update.
>
> Dave
>

Hi Dave,

I've since fixed up the logic to count the actual reference counts of
the PyObject* instances. Now, I'm contemplating the specific
diagnostics we'd want to issue and the appropriate conditions for
emitting them. With this in mind, I wanted to check in with you on the
appropriate approach:

To start, I'm adopting the same assumptions as cpychecker for
functions returning a PyObject*. That is, I'm operating under the
premise that by default such functions return a new reference upon
success rather than, for example, a borrowed reference (which we can
tackle later on). Given this, it's my understanding that the reference
count of the returned object should be 1 if the object is newly
created within the function body and incremented by 1 from what it was
previously if not newly created (e.g. passed in as an argument).
Furthermore, the reference count for any PyObject* instances created
within the function should be 0, barring situations where we're
returning a collection, like a list, that includes references to these
objects.
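The proposed v1 rule can be written down as a small function. This is a sketch under the stated cpychecker-style assumption that a returned PyObject* is a new reference. `object_summary` and `expected_refcnt` are hypothetical, and the final branch (objects neither created nor returned keep their entry count) is an extra assumption of this sketch, not part of the proposal above.

```cpp
#include <cassert>

// Hypothetical summary of one PyObject at the point the function returns.
struct object_summary
{
  bool is_return_value;      // is this the object being returned?
  bool created_in_function;  // allocated by a call made in this function?
  int refcnt_on_entry;       // ob_refcnt on entry, if it pre-existed
  int held_by_returned;      // references to it held by the returned object
};

// Expected ob_refcnt when the function returns successfully.
int
expected_refcnt (const object_summary &obj)
{
  if (obj.is_return_value)
    // New reference: 1 if created here, otherwise one more than before.
    return obj.created_in_function ? 1 : obj.refcnt_on_entry + 1;
  if (obj.created_in_function)
    // Locals must be released, except for references kept alive by a
    // returned collection (e.g. a list containing the object).
    return obj.held_by_returned;
  // Assumption of this sketch: untouched objects keep their entry count.
  return obj.refcnt_on_entry;
}
```

Under this rule, an `item` appended to a returned list is expected to end with a reference count of 1 (the list's reference), while a temporary that is never stored anywhere is expected to end at 0.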

Let me know what you think; thanks!

Best,
Eric

* Re: Update on CPython Extension Module -fanalyzer plugin development
  2023-08-23 21:15                                                   ` Eric Feng
@ 2023-08-23 23:16                                                     ` David Malcolm
  2023-08-24 14:45                                                       ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-23 23:16 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Wed, 2023-08-23 at 17:15 -0400, Eric Feng wrote:
> On Mon, Aug 21, 2023 at 11:04 AM David Malcolm <dmalcolm@redhat.com>
> wrote:
> > 
> > On Mon, 2023-08-21 at 10:05 -0400, Eric Feng wrote:
> > > Hi Dave,
> > > 
> > > Just wanted to give you and everyone else a short update on how
> > > reference count checking is going — we can now observe the refcnt
> > > diagnostic being emitted:
> > > 
> > > rc3.c:22:10: warning: REF COUNT PROBLEM
> > >    22 |   return list;
> > >       |          ^~~~
> > >   ‘create_py_object’: events 1-4
> > >     |
> > >     |    4 |   PyObject* item = PyLong_FromLong(3);
> > >     |      |                    ^~~~~~~~~~~~~~~~~~
> > >     |      |                    |
> > >     |      |                    (1) when ‘PyLong_FromLong’
> > > succeeds
> > >     |    5 |   PyObject* list = PyList_New(1);
> > >     |      |                    ~~~~~~~~~~~~~
> > >     |      |                    |
> > >     |      |                    (2) when ‘PyList_New’ succeeds
> > >     |......
> > >     |   14 |   PyList_Append(list, item);
> > >     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
> > >     |      |   |
> > >     |      |   (3) when ‘PyList_Append’ fails
> > >     |......
> > >     |   22 |   return list;
> > >     |      |          ~~~~
> > >     |      |          |
> > >     |      |          (4) here
> > >     |
> > > 
> > > I will fix up and refactor the logic for counting the actual ref
> > > count
> > > before coming back and refining the diagnostic to give much more
> > > detailed information.
> > 
> > Excellent!  Thanks for the update.
> > 
> > Dave
> > 
> 
> Hi Dave,
> 
> I've since fixed up the logic to count the actual reference counts of
> the PyObject* instances. 

Sounds promising.

> Now, I'm contemplating the specific
> diagnostics we'd want to issue and the appropriate conditions for
> emitting them. With this in mind, I wanted to check in with you on
> the
> appropriate approach:
> 
> To start, I'm adopting the same assumptions as cpychecker for
> functions returning a PyObject*. That is, I'm operating under the
> premise that by default such functions return a new reference upon
> success rather than, for example, a borrowed reference (which we can
> tackle later on). Given this, it's my understanding that the
> reference
> count of the returned object should be 1 if the object is newly
> created within the function body and incremented by 1 from what it
> was
> previously if not newly created (e.g. passed in as an argument).
> Furthermore, the reference count for any PyObject* instances created
> within the function should be 0, barring situations where we're
> returning a collection, like a list, that includes references to
> these
> objects.
> 
> Let me know what you think; thanks!

This sounds like a good approach for v1 of the implementation.

It's probably best to focus on getting a simple version of the patch
into trunk, and leave any polish of it to followups.

In terms of deciding what the reference count of a returned PyObject *
ought to be, cpychecker had logic to try to detect callbacks used by
PyMethodDef tables, so that e.g. in:

static PyMethodDef widget_methods[] = {
    {"display",
     (PyCFunction)widget_display,
     (METH_VARARGS | METH_KEYWORDS), /* ml_flags */
     NULL},

    {NULL, NULL, 0, NULL} /* terminator */
};

...we'd know that the callback function "widget_display" follows the
standard rules for a PyCFunction (e.g. returns a new reference).

But that's for later; don't bother trying to implement that until we
have the basics working.
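A rough sketch of the cpychecker-style idea is to scan a method table and record its callbacks as following the PyCFunction new-reference convention. To stay self-contained, it uses a mock struct with the same field names as CPython's PyMethodDef rather than including Python.h. `mock_method_def`, `mock_cfunction` and `collect_new_ref_callbacks` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Mock with the same field names as CPython's PyMethodDef; the real
// struct comes from Python.h and uses PyCFunction for ml_meth.
typedef void *(*mock_cfunction) (void *self, void *args);

struct mock_method_def
{
  const char *ml_name;
  mock_cfunction ml_meth;
  int ml_flags;
  const char *ml_doc;
};

// Walk a table terminated by a {NULL, NULL, 0, NULL} entry and collect
// the callbacks, which are then assumed to return a new reference.
std::vector<mock_cfunction>
collect_new_ref_callbacks (const mock_method_def *table)
{
  std::vector<mock_cfunction> result;
  for (const mock_method_def *def = table; def->ml_name != nullptr; ++def)
    if (def->ml_meth != nullptr)
      result.push_back (def->ml_meth);
  return result;
}
```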

Is it worth posting a work-in-progress patch of what you have so far? 
(you don't need to bother with a ChangeLog for that)

Dave


* Re: Update on CPython Extension Module -fanalyzer plugin development
  2023-08-23 23:16                                                     ` David Malcolm
@ 2023-08-24 14:45                                                       ` Eric Feng
  2023-08-25 12:50                                                         ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-24 14:45 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

On Wed, Aug 23, 2023 at 7:16 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Wed, 2023-08-23 at 17:15 -0400, Eric Feng wrote:
> > On Mon, Aug 21, 2023 at 11:04 AM David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > >
> > > On Mon, 2023-08-21 at 10:05 -0400, Eric Feng wrote:
> > > > Hi Dave,
> > > >
> > > > Just wanted to give you and everyone else a short update on how
> > > > reference count checking is going — we can now observe the refcnt
> > > > diagnostic being emitted:
> > > >
> > > > rc3.c:22:10: warning: REF COUNT PROBLEM
> > > >    22 |   return list;
> > > >       |          ^~~~
> > > >   ‘create_py_object’: events 1-4
> > > >     |
> > > >     |    4 |   PyObject* item = PyLong_FromLong(3);
> > > >     |      |                    ^~~~~~~~~~~~~~~~~~
> > > >     |      |                    |
> > > >     |      |                    (1) when ‘PyLong_FromLong’
> > > > succeeds
> > > >     |    5 |   PyObject* list = PyList_New(1);
> > > >     |      |                    ~~~~~~~~~~~~~
> > > >     |      |                    |
> > > >     |      |                    (2) when ‘PyList_New’ succeeds
> > > >     |......
> > > >     |   14 |   PyList_Append(list, item);
> > > >     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
> > > >     |      |   |
> > > >     |      |   (3) when ‘PyList_Append’ fails
> > > >     |......
> > > >     |   22 |   return list;
> > > >     |      |          ~~~~
> > > >     |      |          |
> > > >     |      |          (4) here
> > > >     |
> > > >
> > > > I will fix up and refactor the logic for counting the actual ref
> > > > count
> > > > before coming back and refining the diagnostic to give much more
> > > > detailed information.
> > >
> > > Excellent!  Thanks for the update.
> > >
> > > Dave
> > >
> >
> > Hi Dave,
> >
> > I've since fixed up the logic to count the actual reference counts of
> > the PyObject* instances.
>
> Sounds promising.
>
> > Now, I'm contemplating the specific
> > diagnostics we'd want to issue and the appropriate conditions for
> > emitting them. With this in mind, I wanted to check in with you on
> > the
> > appropriate approach:
> >
> > To start, I'm adopting the same assumptions as cpychecker for
> > functions returning a PyObject*. That is, I'm operating under the
> > premise that by default such functions return a new reference upon
> > success rather than, for example, a borrowed reference (which we can
> > tackle later on). Given this, it's my understanding that the
> > reference
> > count of the returned object should be 1 if the object is newly
> > created within the function body and incremented by 1 from what it
> > was
> > previously if not newly created (e.g. passed in as an argument).
> > Furthermore, the reference count for any PyObject* instances created
> > within the function should be 0, barring situations where we're
> > returning a collection, like a list, that includes references to
> > these
> > objects.
> >
> > Let me know what you think; thanks!
>
> This sounds like a good approach for v1 of the implementation.
>
> It's probably best to focus on getting a simple version of the patch
> into trunk, and leave any polish of it to followups.
>
> In terms of deciding what the reference count of a returned PyObject *
> ought to be, cpychecker had logic to try to detect callbacks used by
> PyMethodDef tables, so that e.g. in:
>
> static PyMethodDef widget_methods[] = {
>     {"display",
>      (PyCFunction)widget_display,
>      (METH_VARARGS | METH_KEYWORDS), /* ml_flags */
>      NULL},
>
>     {NULL, NULL, 0, NULL} /* terminator */
> };
>
> ...we'd know that the callback function "widget_display" follows the
> standard rules for a PyCFunction (e.g. returns a new reference).
>
> But that's for later; don't bother trying to implement that until we
> have the basics working.
I see; sounds good!
>
> Is it worth posting a work-in-progress patch of what you have so far?
> (you don't need to bother with a ChangeLog for that)
Will post a WIP soon. Thanks!
>
> Dave
>

* Update on CPython Extension Module -fanalyzer plugin development
  2023-08-24 14:45                                                       ` Eric Feng
@ 2023-08-25 12:50                                                         ` Eric Feng
  2023-08-25 19:50                                                           ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-25 12:50 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, Eric Feng

Hi Dave,

Please find an updated WIP patch on reference count checking below. Some
parts aren't properly formatted yet; I apologize for that.

Since the last WIP patch, the major updates include:
- Updated certain areas of the core analyzer to support custom stmt_finder.
- A significant revamp of the reference count checking logic.
- __analyzer_cpython_dump_refcounts: This dumps reference count related information.
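The stmt_finder change in the patch comes down to a small selection rule. This standalone sketch mirrors the logic added to impl_region_model_context::warn, using hypothetical empty stand-in types for the statement and the finder. A caller-supplied finder takes precedence over m_stmt_finder, and the diagnostic is rejected only when there is neither a statement nor any finder. Without a custom finder, that is exactly the situation at the end of pop_frame.

```cpp
#include <cassert>

// Hypothetical stand-ins for gimple and stmt_finder.
struct gimple_stmt {};
struct stmt_finder_like {};

struct diag_location_choice
{
  bool accepted;                  // false means "rejecting diagnostic: no stmt"
  const stmt_finder_like *finder; // finder that would be passed on, if any
};

diag_location_choice
choose_location (const gimple_stmt *m_stmt,
                 const stmt_finder_like *m_stmt_finder,
                 const stmt_finder_like *custom_finder)
{
  // A custom finder, when supplied, overrides the context's own finder.
  const stmt_finder_like *curr
    = custom_finder ? custom_finder : m_stmt_finder;
  if (m_stmt == nullptr && curr == nullptr)
    return {false, nullptr};
  return {true, curr};
}
```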

Here's an updated look at the current WIP diagnostic we issue:
rc3.c:25:10: warning: Expected <variable name belonging to m_base_region> to have reference count: ‘1’ but ob_refcnt field is: ‘2’
   25 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   16 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
    |......
    |   25 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

The reference count checking logic for v1 should be almost complete.
Currently, it supports the case where the returned object is newly created.
It doesn't yet support the other case (i.e., an object that was not newly
created, whose reference count should be incremented by 1 from its previous value).

This week, I've focused primarily on the reference count checking logic. I plan
to shift my focus to refining the diagnostic next. As the placeholder diagnostic
message above shows, we should at least tell the user the name of the variable
associated with the region that has an unexpected reference count (in this case,
'item'). My initial suspicion is that:

tree reg_tree = model->get_representative_tree (curr_region);

returns NULL_TREE because curr_region is heap-allocated, and thus the representative path_var is:

path_var (NULL_TREE, 0);

This means that:

refcnt_stmt_finder finder (*eg, reg_tree);

always receives NULL_TREE, so it always falls back to the
not_found case. A workaround might be necessary, but I haven't looked into this deeply yet,
so my suspicion could be off. Additionally, I think it would be helpful to show users the value of
ob_refcnt at each event. I'll keep you updated on both these
points and welcome any feedback.
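One hypothetical workaround for the missing name is a reverse lookup: find a named variable whose value points at the heap region. The toy sketch below assumes a simple name-to-region map; `var_bindings` and `name_for_heap_region` are illustrative, not analyzer API. The patch shows one caveat. notify_on_pop_frame runs after unbind_region_and_descendents has unbound the frame's locals, so such a lookup would need the state from before the frame was popped.

```cpp
#include <cassert>
#include <map>
#include <string>

using region_id = int;

// Hypothetical model: which heap region each named local points to.
using var_bindings = std::map<std::string, region_id>;

// Return the name of a variable pointing at heap_reg, or "" if none.
// std::map iterates in key order, so ties resolve alphabetically.
std::string
name_for_heap_region (const var_bindings &vars, region_id heap_reg)
{
  for (const auto &v : vars)
    if (v.second == heap_reg)
      return v.first;
  return "";
}
```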

Best,
Eric
---
 gcc/analyzer/engine.cc                        |   8 +-
 gcc/analyzer/exploded-graph.h                 |   4 +-
 gcc/analyzer/region-model.cc                  |   3 +
 gcc/analyzer/region-model.h                   |  38 +-
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-
 5 files changed, 421 insertions(+), 8 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 61685f43fba..f9e239128a0 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -115,10 +115,12 @@ impl_region_model_context (program_state *state,
 }
 
 bool
-impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
+impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d,
+				 const stmt_finder *custom_finder)
 {
   LOG_FUNC (get_logger ());
-  if (m_stmt == NULL && m_stmt_finder == NULL)
+  auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder;
+  if (m_stmt == NULL && curr_stmt_finder == NULL)
     {
       if (get_logger ())
 	get_logger ()->log ("rejecting diagnostic: no stmt");
@@ -129,7 +131,7 @@ impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
       bool terminate_path = d->terminate_path_p ();
       if (m_eg->get_diagnostic_manager ().add_diagnostic
 	  (m_enode_for_diag, m_enode_for_diag->get_supernode (),
-	   m_stmt, m_stmt_finder, std::move (d)))
+	   m_stmt, curr_stmt_finder, std::move (d)))
 	{
 	  if (m_path_ctxt
 	      && terminate_path
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 4a4ef9d12b4..633f8c263fc 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context
 			     uncertainty_t *uncertainty,
 			     logger *logger = NULL);
 
-  bool warn (std::unique_ptr<pending_diagnostic> d) final override;
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder = NULL) final override;
   void add_note (std::unique_ptr<pending_note> pn) final override;
   void on_svalue_leak (const svalue *) override;
   void on_liveness_change (const svalue_set &live_svalues,
@@ -106,6 +107,7 @@ class impl_region_model_context : public region_model_context
 			 std::unique_ptr<sm_context> *out_sm_context) override;
 
   const gimple *get_stmt () const override { return m_stmt; }
+  const exploded_graph *get_eg () const override { return m_eg; }
 
   exploded_graph *m_eg;
   log_user m_logger;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 494a9cdf149..18cea279e53 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace ana {
 
+auto_vec<pop_frame_callback> region_model::pop_frame_callbacks;
+
 /* Dump T to PP in language-independent form, for debugging/logging/dumping
    purposes.  */
 
@@ -4813,6 +4815,7 @@ region_model::pop_frame (tree result_lvalue,
     }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
+  notify_on_pop_frame (this, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 4f09f2e585a..fd99b987a69 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -236,6 +236,10 @@ public:
 
 struct append_regions_cb_data;
 
+typedef void (*pop_frame_callback) (const region_model *model,
+				    const svalue *retval,
+				    region_model_context *ctxt);
+
 /* A region_model encapsulates a representation of the state of memory, with
    a tree of regions, along with their associated values.
    The representation is graph-like because values can be pointers to
@@ -505,6 +509,20 @@ class region_model
   void check_for_null_terminated_string_arg (const call_details &cd,
 					     unsigned idx);
 
+  static void
+  register_pop_frame_callback (const pop_frame_callback &callback)
+  {
+    pop_frame_callbacks.safe_push (callback);
+  }
+
+  static void
+  notify_on_pop_frame (const region_model *model, const svalue *retval,
+		       region_model_context *ctxt)
+  {
+    for (auto &callback : pop_frame_callbacks)
+	callback (model, retval, ctxt);
+  }
+
 private:
   const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
   const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
@@ -592,6 +610,7 @@ private:
 						tree callee_fndecl,
 						region_model_context *ctxt) const;
 
+  static auto_vec<pop_frame_callback> pop_frame_callbacks;
   /* Storing this here to avoid passing it around everywhere.  */
   region_model_manager *const m_mgr;
 
@@ -620,8 +639,15 @@ class region_model_context
 {
  public:
   /* Hook for clients to store pending diagnostics.
-     Return true if the diagnostic was stored, or false if it was deleted.  */
-  virtual bool warn (std::unique_ptr<pending_diagnostic> d) = 0;
+     Return true if the diagnostic was stored, or false if it was deleted.
+     Optionally provide a custom stmt_finder.  */
+    virtual bool warn(std::unique_ptr<pending_diagnostic> d) {
+        return warn(std::move(d), nullptr);
+    }
+    
+    virtual bool warn(std::unique_ptr<pending_diagnostic> d, const stmt_finder *custom_finder) {
+        return false;
+    }
 
   /* Hook for clients to add a note to the last previously stored
      pending diagnostic.  */
@@ -724,6 +750,8 @@ class region_model_context
 
   /* Get the current statement, if any.  */
   virtual const gimple *get_stmt () const = 0;
+
+  virtual const exploded_graph *get_eg () const = 0;
 };
 
 /* A "do nothing" subclass of region_model_context.  */
@@ -778,6 +806,7 @@ public:
   }
 
   const gimple *get_stmt () const override { return NULL; }
+  const exploded_graph *get_eg () const override { return NULL; }
 };
 
 /* A subclass of region_model_context for determining if operations fail
@@ -912,6 +941,11 @@ class region_model_context_decorator : public region_model_context
     return m_inner->get_stmt ();
   }
 
+  const exploded_graph *get_eg () const override
+  {
+    return m_inner->get_eg ();
+  }
+
 protected:
   region_model_context_decorator (region_model_context *inner)
   : m_inner (inner)
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7cd72e8a886..a3274ced4a8 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -44,6 +44,7 @@
 #include "analyzer/region-model.h"
 #include "analyzer/call-details.h"
 #include "analyzer/call-info.h"
+#include "analyzer/exploded-graph.h"
 #include "make-unique.h"
 
 int plugin_is_GPL_compatible;
@@ -191,6 +192,372 @@ public:
   }
 };
 
+/* This is just a copy of leak_stmt_finder for now (subject to change if
+ * necessary)  */
+
+class refcnt_stmt_finder : public stmt_finder
+{
+public:
+  refcnt_stmt_finder (const exploded_graph &eg, tree var)
+      : m_eg (eg), m_var (var)
+  {
+  }
+
+  std::unique_ptr<stmt_finder>
+  clone () const final override
+  {
+    return make_unique<refcnt_stmt_finder> (m_eg, m_var);
+  }
+
+  const gimple *
+  find_stmt (const exploded_path &epath) final override
+  {
+    logger *const logger = m_eg.get_logger ();
+    LOG_FUNC (logger);
+
+    if (m_var && TREE_CODE (m_var) == SSA_NAME)
+      {
+	/* Locate the final write to this SSA name in the path.  */
+	const gimple *def_stmt = SSA_NAME_DEF_STMT (m_var);
+
+	int idx_of_def_stmt;
+	bool found = epath.find_stmt_backwards (def_stmt, &idx_of_def_stmt);
+	if (!found)
+	  goto not_found;
+
+	/* What was the next write to the underlying var
+	   after the SSA name was set? (if any).  */
+
+	for (unsigned idx = idx_of_def_stmt + 1; idx < epath.m_edges.length ();
+	     ++idx)
+	  {
+	    const exploded_edge *eedge = epath.m_edges[idx];
+	    if (logger)
+		    logger->log ("eedge[%i]: EN %i -> EN %i", idx,
+				 eedge->m_src->m_index,
+				 eedge->m_dest->m_index);
+	    const exploded_node *dst_node = eedge->m_dest;
+	    const program_point &dst_point = dst_node->get_point ();
+	    const gimple *stmt = dst_point.get_stmt ();
+	    if (!stmt)
+		    continue;
+	    if (const gassign *assign = dyn_cast<const gassign *> (stmt))
+		    {
+			    tree lhs = gimple_assign_lhs (assign);
+			    if (TREE_CODE (lhs) == SSA_NAME
+				&& SSA_NAME_VAR (lhs) == SSA_NAME_VAR (m_var))
+				    return assign;
+		    }
+	  }
+      }
+
+  not_found:
+
+    /* Look backwards for the first statement with a location.  */
+    int i;
+    const exploded_edge *eedge;
+    FOR_EACH_VEC_ELT_REVERSE (epath.m_edges, i, eedge)
+    {
+      if (logger)
+	logger->log ("eedge[%i]: EN %i -> EN %i", i, eedge->m_src->m_index,
+		     eedge->m_dest->m_index);
+      const exploded_node *dst_node = eedge->m_dest;
+      const program_point &dst_point = dst_node->get_point ();
+      const gimple *stmt = dst_point.get_stmt ();
+      if (stmt)
+	if (get_pure_location (stmt->location) != UNKNOWN_LOCATION)
+	  return stmt;
+    }
+
+    gcc_unreachable ();
+    return NULL;
+  }
+
+private:
+  const exploded_graph &m_eg;
+  tree m_var;
+};
+
+class refcnt_mismatch : public pending_diagnostic_subclass<refcnt_mismatch>
+{
+public:
+  refcnt_mismatch (const region *base_region,
+				const svalue *ob_refcnt,
+				const svalue *actual_refcnt,
+        tree reg_tree)
+      : m_base_region (base_region), m_ob_refcnt (ob_refcnt),
+	m_actual_refcnt (actual_refcnt), m_reg_tree(reg_tree)
+  {
+  }
+
+  const char *
+  get_kind () const final override
+  {
+    return "refcnt_mismatch";
+  }
+
+  bool
+  operator== (const refcnt_mismatch &other) const
+  {
+    return (m_base_region == other.m_base_region
+	    && m_ob_refcnt == other.m_ob_refcnt
+	    && m_actual_refcnt == other.m_actual_refcnt);
+  }
+
+  int get_controlling_option () const final override
+  {
+    return 0;
+  }
+
+  bool
+  emit (rich_location *rich_loc, logger *) final override
+  {
+    diagnostic_metadata m;
+    bool warned;
+    // just assuming constants for now
+    auto actual_refcnt
+	= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
+    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
+    warned = warning_meta (
+	rich_loc, m, get_controlling_option (),
+	"Expected <variable name belonging to m_base_region> to have "
+	"reference count: %qE but ob_refcnt field is: %qE",
+	actual_refcnt, ob_refcnt);
+
+    // location_t loc = rich_loc->get_loc ();
+    // foo (loc);
+    return warned;
+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    if (m_base_region)
+      interest->add_region_creation (m_base_region);
+  }
+
+private:
+
+  void foo(location_t loc) const 
+  {
+    inform(loc, "something is up right here");
+  }
+  const region *m_base_region;
+  const svalue *m_ob_refcnt;
+  const svalue *m_actual_refcnt;
+  tree m_reg_tree;
+};
+
+/* Retrieves the svalue associated with the ob_refcnt field of the base region.
+ */
+const svalue *
+retrieve_ob_refcnt_sval (const region *base_reg, const region_model *model,
+			 region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+  const region *ob_refcnt_region
+      = mgr->get_field_region (base_reg, ob_refcnt_tree);
+  const svalue *ob_refcnt_sval
+      = model->get_store_value (ob_refcnt_region, ctxt);
+  return ob_refcnt_sval;
+}
+
+void
+increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
+{
+  bool existed;
+  auto &refcnt = map.get_or_insert (key, &existed);
+  refcnt = existed ? refcnt + 1 : 1;
+}
+
+
+/* Recursively fills in region_to_refcnt with the references owned by
+   pyobj_ptr_sval.  */
+void
+count_expected_pyobj_references (const region_model *model,
+			   hash_map<const region *, int> &region_to_refcnt,
+			   const svalue *pyobj_ptr_sval,
+			   hash_set<const region *> &seen)
+{
+  if (!pyobj_ptr_sval)
+    return;
+
+  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
+  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
+  if (!pyobj_region_sval && !pyobj_initial_sval)
+    return;
+
+  // todo: support initial sval (e.g passed in as parameter)
+  if (pyobj_initial_sval)
+    {
+  //     increment_region_refcnt (region_to_refcnt,
+	// 		       pyobj_initial_sval->get_region ());
+      return;
+    }
+
+  const region *pyobj_region = pyobj_region_sval->get_pointee ();
+  if (!pyobj_region || seen.contains (pyobj_region))
+    return;
+
+  seen.add (pyobj_region);
+
+  if (pyobj_ptr_sval->get_type () == pyobj_ptr_tree)
+    increment_region_refcnt (region_to_refcnt, pyobj_region);
+
+  const auto *curr_store = model->get_store ();
+  const auto *retval_cluster = curr_store->get_cluster (pyobj_region);
+  if (!retval_cluster)
+    return;
+
+  const auto &retval_binding_map = retval_cluster->get_map ();
+
+  for (const auto &binding : retval_binding_map)
+    {
+      const svalue *binding_sval = binding.second;
+      const svalue *unwrapped_sval = binding_sval->unwrap_any_unmergeable ();
+      const region *pointee = unwrapped_sval->maybe_get_region ();
+
+      if (pointee && pointee->get_kind () == RK_HEAP_ALLOCATED)
+	count_expected_pyobj_references (model, region_to_refcnt, binding_sval,
+					 seen);
+    }
+}
+
+/* Compare ob_refcnt field vs the actual reference count of a region */
+void
+check_refcnt (const region_model *model, region_model_context *ctxt,
+	      const hash_map<const ana::region *,
+			     int>::iterator::reference_pair region_refcnt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  const auto &curr_region = region_refcnt.first;
+  const auto &actual_refcnt = region_refcnt.second;
+  const svalue *ob_refcnt_sval = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
+  const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+      ob_refcnt_sval->get_type (), actual_refcnt);
+
+  if (ob_refcnt_sval != actual_refcnt_sval)
+  {
+    // todo: fix this (always null)
+    tree reg_tree = model->get_representative_tree (curr_region);
+
+    const auto &eg = ctxt->get_eg ();
+    refcnt_stmt_finder finder (*eg, reg_tree);
+    auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
+					    actual_refcnt_sval, reg_tree);
+    if (pd && eg)
+    ctxt->warn (std::move (pd), &finder);
+  }
+}
+
+void
+check_refcnts (const region_model *model, const svalue *retval,
+	    region_model_context *ctxt,
+	    hash_map<const region *, int> &region_to_refcnt)
+{
+  for (const auto &region_refcnt : region_to_refcnt)
+  {
+    check_refcnt(model, ctxt, region_refcnt);
+  }
+}
+
+/* Validates the reference count of all Python objects. */
+void
+pyobj_refcnt_checker (const region_model *model, const svalue *retval,
+		    region_model_context *ctxt)
+{
+  if (!ctxt)
+  return;
+
+  auto region_to_refcnt = hash_map<const region *, int> ();
+  auto seen_regions = hash_set<const region *> ();
+
+  count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
+  check_refcnts (model, retval, ctxt, region_to_refcnt);
+}
+
+/* Counts the actual pyobject references from all clusters in the model's
+ * store. */
+void
+count_all_references (const region_model *model,
+		      hash_map<const region *, int> &region_to_refcnt)
+{
+  for (const auto &cluster : *model->get_store ())
+  {
+    auto curr_region = cluster.first;
+    if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
+    continue;
+
+    increment_region_refcnt (region_to_refcnt, curr_region);
+
+    auto binding_cluster = cluster.second;
+    for (const auto &binding : binding_cluster->get_map ())
+    {
+	  const svalue *binding_sval = binding.second;
+
+	  const svalue *unwrapped_sval
+	      = binding_sval->unwrap_any_unmergeable ();
+	  // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
+	  // continue;
+
+	  const region *pointee = unwrapped_sval->maybe_get_region ();
+	  if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
+	    continue;
+
+	  increment_region_refcnt (region_to_refcnt, pointee);
+    }
+  }
+}
+
+void
+dump_refcnt_info (const hash_map<const region *, int> &region_to_refcnt,
+		  const region_model *model, region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  pretty_printer pp;
+  pp_format_decoder (&pp) = default_tree_printer;
+  pp_show_color (&pp) = pp_show_color (global_dc->printer);
+  pp.buffer->stream = stderr;
+
+  for (const auto &region_refcnt : region_to_refcnt)
+  {
+    auto region = region_refcnt.first;
+    auto actual_refcnt = region_refcnt.second;
+    const svalue *ob_refcnt_sval
+	= retrieve_ob_refcnt_sval (region, model, ctxt);
+    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+	ob_refcnt_sval->get_type (), actual_refcnt);
+
+    region->dump_to_pp (&pp, true);
+    pp_string (&pp, " — ob_refcnt: ");
+    ob_refcnt_sval->dump_to_pp (&pp, true);
+    pp_string (&pp, " actual refcnt: ");
+    actual_refcnt_sval->dump_to_pp (&pp, true);
+    pp_newline (&pp);
+  }
+  pp_string (&pp, "~~~~~~~~\n");
+  pp_flush (&pp);
+}
+
+class kf_analyzer_cpython_dump_refcounts : public known_function
+{
+public:
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+    return cd.num_args () == 0;
+  }
+  void impl_call_pre (const call_details &cd) const final override
+  {
+    region_model_context *ctxt = cd.get_ctxt ();
+    if (!ctxt)
+      return;
+    region_model *model = cd.get_model ();
+    auto region_to_refcnt = hash_map<const region *, int> ();
+    count_all_references(model, region_to_refcnt);
+    dump_refcnt_info(region_to_refcnt, model, ctxt);
+  }
+};
+
 /* Some concessions were made to
 simplify the analysis process when comparing kf_PyList_Append with the
 real implementation. In particular, PyList_Append performs some
@@ -927,6 +1294,10 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
   iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
   iface->register_known_function ("PyLong_FromLong",
                                   make_unique<kf_PyLong_FromLong> ());
+
+  iface->register_known_function (
+      "__analyzer_cpython_dump_refcounts",
+      make_unique<kf_analyzer_cpython_dump_refcounts> ());
 }
 } // namespace ana
 
@@ -940,8 +1311,9 @@ plugin_init (struct plugin_name_args *plugin_info,
   const char *plugin_name = plugin_info->base_name;
   if (0)
     inform (input_location, "got here; %qs", plugin_name);
-  ana::register_finish_translation_unit_callback (&stash_named_types);
-  ana::register_finish_translation_unit_callback (&stash_global_vars);
+  register_finish_translation_unit_callback (&stash_named_types);
+  register_finish_translation_unit_callback (&stash_global_vars);
+  region_model::register_pop_frame_callback(pyobj_refcnt_checker);
   register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT,
                      ana::cpython_analyzer_init_cb,
                      NULL); /* void *user_data */
-- 
2.30.2



* Re: Update on CPython Extension Module -fanalyzer plugin development
  2023-08-25 12:50                                                         ` Eric Feng
@ 2023-08-25 19:50                                                           ` David Malcolm
  2023-08-29  4:31                                                             ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-25 19:50 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Fri, 2023-08-25 at 08:50 -0400, Eric Feng wrote:
> Hi Dave,
> 
> Please find an updated WIP patch on reference count checking below. Some
> parts aren't properly formatted yet; I apologize for that.
> 
> Since the last WIP patch, the major updates include:
> - Updated certain areas of the core analyzer to support custom stmt_finder.
> - A significant revamp of the reference count checking logic.
> - __analyzer_cpython_dump_refcounts: This dumps reference count related information.

Thanks for the patch.

Given the scope, this is close to being ready for trunk; various
comments inline below...


> 
> Here's an updated look at the current WIP diagnostic we issue:
> rc3.c:25:10: warning: Expected <variable name belonging to m_base_region> to have reference count: ‘1’ but ob_refcnt field is: ‘2’
>    25 |   return list;
>       |          ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(3);
>     |      |                    ^~~~~~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (1) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(1);
>     |      |                    ~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (2) when ‘PyList_New’ succeeds
>     |......
>     |   16 |   PyList_Append(list, item);
>     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |   |
>     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
>     |......
>     |   25 |   return list;
>     |      |          ~~~~
>     |      |          |
>     |      |          (4) here
>     |
> 

Looks good.

As mentioned in our chat, ideally there would be some kind of
information to the user as to what the references actually are.

Perhaps it could be a series of notes like this:

note: there is 1 pointer to "*item"
note: (1) '*item' is pointed to by '((PyListObject *)list)->ob_item[0]'

Or, to be even more fancy, we could add events to the execution path
whenever the number of pointers or ob_refcnt changes, giving something
like:

(1) when ‘PyLong_FromLong’ succeeds
(2) '(*item).ob_refcnt' is now 1
(3) when ‘PyList_New’ succeeds
(4) '*item' is now pointed to by '((PyListObject *)list)->ob_item[0]';
pointer count is 2, ob_refcnt is 1
(5) when 'create_py_object' returns, '(*item).ob_refcnt' is 1

...or somesuch (I'm brainstorming here, and I expect that the above
might not be implementable, due to local variables and SSA names
confusing things).

Plus there's the question of whether the return value "owns" a
reference. 

But all of that can be deferred to follow-up work.


> The reference count checking logic for v1 should be almost complete.
> Currently, it supports situations where the returned object is newly created.
> It doesn't yet support the other case (i.e., incremented by 1 from
> what it was previously, if not newly created). 
> 
> This week, I've focused primarily on the reference count checking logic. I plan
> to shift my focus to refining the diagnostic next. As seen in the placeholder
> diagnostic message above, I believe we should at least inform the user about the
> variable name associated with the region that has an unexpected reference count
> issue (in this case, 'item'). Initially, I suspect the issue might be that:
> 
> tree reg_tree = model->get_representative_tree (curr_region);
> 
> returns NULL since curr_region is heap allocated and thus the path_var returned would be:
> 
> path_var (NULL_TREE, 0);
> 
> Which means that:
> 
> refcnt_stmt_finder finder (*eg, reg_tree);
> 
> always receives a NULL_TREE, causing it to always default to the
> not_found case. A workaround might be necessary, but I haven't delved too deeply into this yet,
> so my suspicion could be off.
> 

You could try looking for a representative tree for the pointer, rather
than the region.

I ran into similar issues with leak detection: we report a leak when
nothing is pointing at the region, but if nothing is pointing at the
region, then there's nothing to label/describe the region with.  A
workaround I've used is to look at the old model immediately before the
state change, rather than the new one.

So for example, use the region_model * for immediately before popping
the frame (if it's available; you might need to make a copy?), and do
something like:

  tree expr = old_model->get_representative_tree (curr_region);
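A standalone sketch of why the pre-pop snapshot helps (plain C++, not the analyzer's API; `toy_model`, `bind`, `representative_name` and `pop_frame` are hypothetical stand-ins for region_model and get_representative_tree): once the frame's locals are unbound, nothing in the new state names the heap object, but a copy taken just before the pop still can.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a region_model: maps local variable names
// to the id of the heap object each one points at.
struct toy_model
{
  std::map<std::string, int> locals;

  void bind (const std::string &var, int heap_obj) { locals[var] = heap_obj; }

  // Return some variable pointing at HEAP_OBJ, if any (a rough analogue
  // of get_representative_tree).
  std::optional<std::string> representative_name (int heap_obj) const
  {
    for (const auto &[var, obj] : locals)
      if (obj == heap_obj)
        return var;
    return std::nullopt;
  }

  // Popping the frame unbinds every local.
  void pop_frame () { locals.clear (); }
};

// Describe HEAP_OBJ after popping MODEL's frame, falling back to a
// snapshot taken immediately before the pop to recover a name.
std::string describe_after_pop (toy_model &model, int heap_obj)
{
  const toy_model old_model = model;  // copy before the state change
  model.pop_frame ();
  if (auto name = model.representative_name (heap_obj))
    return *name;
  if (auto name = old_model.representative_name (heap_obj))
    return "*" + *name;
  return "<unknown>";
}
```

Copying the whole model on every pop may be costly; the sketch only shows where the name can come from.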



>  Additionally, I think it would be helpful to show users what the
> ob_refcnt looks like in each event as well.
> 

You could try make a custom_event subclass for such ob_refcnt messages,
but adding them could be fiddly.  Perhaps
diagnostic_manager::add_events_for_eedge could support some kind of
plugin hook whenever we add a statement_event, which you could use to
add your extra events whenever the ob_refcnt of the object of interest
differs between the old and the new state? (though currently we only
emit a statement_event for the first stmt in an enode, and that
function is already quite messy, alas)

But as I said before, that's definitely something to postpone to a
followup.
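A rough sketch of the ob_refcnt-diffing idea, assuming the path can be flattened into a sequence of steps that each record the tracked object's ob_refcnt (`path_step` and `build_events` are hypothetical; real code would work with exploded edges and a custom_event subclass rather than strings):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-step snapshot: a statement event plus the ob_refcnt
// of the object of interest after that statement.
struct path_step
{
  std::string stmt_desc;
  long ob_refcnt;
};

// Emit each statement event, and add an extra event after it whenever
// the tracked ob_refcnt differs from the previous step.
std::vector<std::string>
build_events (const std::vector<path_step> &path, const std::string &obj)
{
  std::vector<std::string> events;
  bool have_prev = false;
  long prev = 0;
  for (const path_step &step : path)
    {
      events.push_back (step.stmt_desc);
      if (!have_prev || step.ob_refcnt != prev)
        events.push_back ("'(" + obj + ").ob_refcnt' is now "
                          + std::to_string (step.ob_refcnt));
      prev = step.ob_refcnt;
      have_prev = true;
    }
  return events;
}
```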

[...snip...]

> 
> @@ -620,8 +639,15 @@ class region_model_context
>  {
>   public:
>    /* Hook for clients to store pending diagnostics.
> -     Return true if the diagnostic was stored, or false if it was deleted.  */
> -  virtual bool warn (std::unique_ptr<pending_diagnostic> d) = 0;
> +     Return true if the diagnostic was stored, or false if it was deleted.
> +     Optionally provide a custom stmt_finder.  */
> +    virtual bool warn(std::unique_ptr<pending_diagnostic> d) {
> +        return warn(std::move(d), nullptr);
> +    }
> +    
> +    virtual bool warn(std::unique_ptr<pending_diagnostic> d, const stmt_finder *custom_finder) {
> +        return false;
> +    }

I found the pair of vfuncs here confusing.

Please can we have a single pure virtual "warn" vfunc, potentially with
an optional arg, so:

  virtual bool warn (std::unique_ptr<pending_diagnostic> d, 
                     const stmt_finder *custom_finder = nullptr)  = 0;

and add the new argument to the various implementations that override
"warn".
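A minimal sketch of that shape, using hypothetical stand-in types rather than the real analyzer headers. One C++ subtlety to keep in mind: default arguments on virtual functions are bound from the static type of the call, so an overrider that repeats the same default (as the patch's exploded-graph.h declaration already does with `= NULL`) keeps calls through either type consistent.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the analyzer types.
struct pending_diagnostic { virtual ~pending_diagnostic () = default; };
struct stmt_finder { virtual ~stmt_finder () = default; };

class region_model_context
{
public:
  virtual ~region_model_context () = default;
  /* Single pure virtual hook; CUSTOM_FINDER is optional.  */
  virtual bool warn (std::unique_ptr<pending_diagnostic> d,
                     const stmt_finder *custom_finder = nullptr) = 0;
};

class recording_context : public region_model_context
{
public:
  /* Repeat the same default so calls through the derived type behave
     like calls through the base.  */
  bool warn (std::unique_ptr<pending_diagnostic> d,
             const stmt_finder *custom_finder = nullptr) final override
  {
    if (!d)
      return false;
    m_last_finder = custom_finder;
    ++m_num_stored;
    return true;
  }

  const stmt_finder *m_last_finder = nullptr;
  int m_num_stored = 0;
};
```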

[...snip...]

>  /* A subclass of region_model_context for determining if operations fail
> @@ -912,6 +941,11 @@ class region_model_context_decorator : public region_model_context
>      return m_inner->get_stmt ();
>    }
>  
> +  const exploded_graph *get_eg () const override
> +  {
> +    return m_inner->get_eg ();

I recently changed region_model_context_decorator so that m_inner can
be null (see 1e7b0a5d7a45dc5932992d36e1375582d61269e4), so please can
you change this to:

const exploded_graph *get_eg () const override
{
  if (m_inner)
    return m_inner->get_eg ();
  else
    return nullptr;
}

[...snip...]

> 
> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 7cd72e8a886..a3274ced4a8 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c

[...snip...]

> +class refcnt_mismatch : public pending_diagnostic_subclass<refcnt_mismatch>
> +{

[...snip...]

> +
> +  bool
> +  emit (rich_location *rich_loc, logger *) final override
> +  {
> +    diagnostic_metadata m;
> +    bool warned;
> +    // just assuming constants for now
> +    auto actual_refcnt
> +       = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> +    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> +    warned = warning_meta (
> +       rich_loc, m, get_controlling_option (),
> +       "Expected <variable name belonging to m_base_region> to have "
> +       "reference count: %qE but ob_refcnt field is: %qE",
> +       actual_refcnt, ob_refcnt);

This is OK for now.

Diagnostic messages should start with a lower-case letter.

Ideally we'd handle both symbolic and constant values.  We'd not quote
the values if they're constants, e.g.
  reference count: 2
but quote them if they're symbolic, e.g.
  reference count: 'x + 1'

However doing so tends to lead to a combinatorial explosion here in the
possible messages, so maybe this could be split up as:

warning: reference count mismatch for PyObject * 'item'
note: 2 pointers found in memory to '*item'...
note: ...but '(*item).ob_refcnt' is 1

(where here both values are concrete)

But again, there's no need to fix this in this version of the code.
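To sketch how the split keeps each message to a single value (hypothetical `refcnt_value` and `refcnt_mismatch_messages`, building plain strings instead of calling warning_meta/inform, and assuming each value is either a known constant or a printable symbolic expression):

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical refcount value: a known constant, or a symbolic
// expression already rendered as text.
struct refcnt_value
{
  std::optional<long> constant;
  std::string symbolic;  // used when CONSTANT is empty

  // Constants are printed bare; symbolic values are quoted.
  std::string to_string () const
  {
    if (constant)
      return std::to_string (*constant);
    return "'" + symbolic + "'";
  }
};

// One warning plus two notes, so each message renders only one value.
std::vector<std::string>
refcnt_mismatch_messages (const std::string &var,
                          const refcnt_value &num_pointers,
                          const refcnt_value &ob_refcnt)
{
  return {
    "warning: reference count mismatch for PyObject * '" + var + "'",
    "note: " + num_pointers.to_string () + " pointers found in memory to '*"
      + var + "'...",
    "note: ...but '(*" + var + ").ob_refcnt' is " + ob_refcnt.to_string ()
  };
}
```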

[...snip...]

> +/* Compare ob_refcnt field vs the actual reference count of a region */
> +void
> +check_refcnt (const region_model *model, region_model_context *ctxt,
> +             const hash_map<const ana::region *,
> +                            int>::iterator::reference_pair region_refcnt)

This function can be made "static" (as indeed can almost all of the
functions in the plugin, I expect, apart from "plugin_init"); these
aren't APIs being exposed to an external consumer.

> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  const auto &curr_region = region_refcnt.first;
> +  const auto &actual_refcnt = region_refcnt.second;
> +  const svalue *ob_refcnt_sval = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
> +  const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> +      ob_refcnt_sval->get_type (), actual_refcnt);
> +
> +  if (ob_refcnt_sval != actual_refcnt_sval)
> +  {
> +    // todo: fix this (always null)
> +    tree reg_tree = model->get_representative_tree (curr_region);
> +
> +    const auto &eg = ctxt->get_eg ();
> +    refcnt_stmt_finder finder (*eg, reg_tree);
> +    auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
> +                                           actual_refcnt_sval, reg_tree);
> +    if (pd && eg)
> +    ctxt->warn (std::move (pd), &finder);

Note that IIRC the region_model code gets rerun with a NULL
region_model_context during feasibility checking, so we need to either
capture the invariant that ctxt is non-null here (with a gcc_assert),
and/or bail out early somewhere in all this checking for the null ctxt
case.

> +  }
> +}
> +
> +void
> +check_refcnts (const region_model *model, const svalue *retval,
> +           region_model_context *ctxt,
> +           hash_map<const region *, int> &region_to_refcnt)
> +{
> +  for (const auto &region_refcnt : region_to_refcnt)
> +  {
> +    check_refcnt(model, ctxt, region_refcnt);
> +  }
> +}
> +
> +/* Validates the reference count of all Python objects. */
> +void
> +pyobj_refcnt_checker (const region_model *model, const svalue *retval,
> +                   region_model_context *ctxt)
> +{
> +  if (!ctxt)
> +  return;

Aha: I see you have an early bailout here for the null ctxt, good.

> +
> +  auto region_to_refcnt = hash_map<const region *, int> ();
> +  auto seen_regions = hash_set<const region *> ();
> +
> +  count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
> +  check_refcnts (model, retval, ctxt, region_to_refcnt);
> +}
> +

[...snip...]

> +
> +void
> +dump_refcnt_info (const hash_map<const region *, int> &region_to_refcnt,
> +                 const region_model *model, region_model_context *ctxt)
> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  pretty_printer pp;
> +  pp_format_decoder (&pp) = default_tree_printer;
> +  pp_show_color (&pp) = pp_show_color (global_dc->printer);
> +  pp.buffer->stream = stderr;
> +
> +  for (const auto &region_refcnt : region_to_refcnt)
> +  {

FWIW, the iteration order within a hash_map will vary from run to run
(e.g. precise pointer values are affected by address space layout
randomization), and so the ordering within the output to stderr will
vary from run to run.

It's OK for now, but it might be worth first building a vec, sorting it
in some well-defined way, and then iterating over that for the
printing; see e.g. b0702ac5588333e27d7ec43d21d704521f7a05c6.
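A standalone sketch of the sort-then-print approach, with std::unordered_map standing in for hash_map. It assumes each region carries a stable integer id that does not depend on its address; `toy_region` and `dump_refcnts_sorted` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical region with a stable, address-independent id.
struct toy_region
{
  int id;
};

// Copy the map into a vector and sort it by id, so the dump order no
// longer depends on hash order or address space layout.
std::string
dump_refcnts_sorted (const std::unordered_map<const toy_region *, int> &map)
{
  std::vector<std::pair<const toy_region *, int>> items (map.begin (),
                                                         map.end ());
  std::sort (items.begin (), items.end (),
             [] (const auto &a, const auto &b)
             { return a.first->id < b.first->id; });
  std::string out;
  for (const auto &[reg, refcnt] : items)
    out += "region " + std::to_string (reg->id) + ": actual refcnt "
           + std::to_string (refcnt) + "\n";
  return out;
}
```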

> +    auto region = region_refcnt.first;
> +    auto actual_refcnt = region_refcnt.second;
> +    const svalue *ob_refcnt_sval
> +       = retrieve_ob_refcnt_sval (region, model, ctxt);
> +    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> +       ob_refcnt_sval->get_type (), actual_refcnt);
> +
> +    region->dump_to_pp (&pp, true);
> +    pp_string (&pp, " — ob_refcnt: ");
> +    ob_refcnt_sval->dump_to_pp (&pp, true);
> +    pp_string (&pp, " actual refcnt: ");
> +    actual_refcnt_sval->dump_to_pp (&pp, true);
> +    pp_newline (&pp);
> +  }
> +  pp_string (&pp, "~~~~~~~~\n");
> +  pp_flush (&pp);
> +}
> +
> 
[...snip...]

Does the testcase pass or fail with this code?  Do you get a new
warning for it?

How about some trivial new testcases e.g.:

void test_dropping_PyLong (long val)
{
   PyObject* p = PyLong_FromLong(val);
   /* Do nothing with 'p'; should be reported as a leak.  */
}

PyObject *test_valid_PyLong (long val)
{
   PyObject* p = PyLong_FromLong(val);
   return p; // shouldn't report here
}

PyObject *test_stray_incref_PyLong (long val)
{
   PyObject* p = PyLong_FromLong(val);
   Py_INCREF (p);  // bogus
   return p; // should report that ob_refcnt is 2 when it should be 1
}

...or somesuch.
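To reason about what those testcases should report, here is a simplified model of the comparison in plain C++. The names `toy_pyobj` and `find_refcnt_mismatches` are hypothetical. It assumes the function returns a new reference, so the returned pointer counts as one reference, and it counts every incoming pointer while visiting each object's fields only once. It does not model leaks, so it says nothing about test_dropping_PyLong.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical heap PyObject: an ob_refcnt field plus pointer fields
// (e.g. a list's ob_item slots).
struct toy_pyobj
{
  long ob_refcnt;
  std::vector<toy_pyobj *> fields;
};

// Count every pointer to each object reachable from OBJ; each object's
// fields are traversed at most once.
static void
count_references (toy_pyobj *obj, std::map<toy_pyobj *, long> &counts,
                  std::set<toy_pyobj *> &seen)
{
  if (!obj)
    return;
  ++counts[obj];
  if (!seen.insert (obj).second)
    return;  // already traversed
  for (toy_pyobj *field : obj->fields)
    count_references (field, counts, seen);
}

// Return the objects whose ob_refcnt differs from the number of
// pointers found, starting from the returned pointer RETVAL.
std::vector<toy_pyobj *>
find_refcnt_mismatches (toy_pyobj *retval)
{
  std::map<toy_pyobj *, long> counts;
  std::set<toy_pyobj *> seen;
  count_references (retval, counts, seen);
  std::vector<toy_pyobj *> mismatches;
  for (const auto &[obj, actual] : counts)
    if (obj->ob_refcnt != actual)
      mismatches.push_back (obj);
  return mismatches;
}
```

In this model, test_valid_PyLong corresponds to an object with ob_refcnt 1 reached only through the return value (no mismatch), while test_stray_incref_PyLong corresponds to ob_refcnt 2 with that same single pointer (one mismatch).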

The patch is almost ready for trunk as-is, but:
- needs a ChangeLog
- please fix the dual "warn" vfuncs thing I mentioned above
- check it compiles!
- make sure the testsuite doesn't emit any new FAILs (e.g. by adding a
dg-warning with a suitable short regex), e.g.
  /* { dg-warning "reference count" } */

Hope this is constructive
Dave



* [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-25 19:50                                                           ` David Malcolm
@ 2023-08-29  4:31                                                             ` Eric Feng
  2023-08-29  4:35                                                               ` Eric Feng
                                                                                 ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Eric Feng @ 2023-08-29  4:31 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, gcc-patches, Eric Feng

Hi Dave,

Thanks for the feedback. I've addressed the changes you mentioned and
added more test cases. I've also taken this chance to split the test
files by known function subclass, as you previously suggested. Since the
patch also changes the core analyzer, I've bootstrapped and regtested it
as well. Does it look OK for trunk?

Best,
Eric

---

This patch introduces initial support for reference count checking of
PyObjects in relation to the Python/C API for the CPython plugin.
Additionally, the core analyzer underwent several modifications to
accommodate this feature. These include:

- Introducing support for callbacks at the end of
  region_model::pop_frame. This is our current point of validation for
  the reference count of PyObjects.
- Adding an optional custom stmt_finder parameter to
  region_model_context::warn. This allows a diagnostic about the
  reference count to be emitted even when the context's own stmt_finder
  is NULL, as is currently the case during region_model::pop_frame.
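The callback mechanism in the first bullet can be sketched in standalone C++ as follows. The names are hypothetical and the signature is trimmed (the patch's pop_frame_callback also receives a region_model_context *); the registry here is a std::vector behind a function-local static rather than the patch's static auto_vec member.

```cpp
#include <cassert>
#include <vector>

class toy_region_model;

// Hypothetical callback type, loosely mirroring pop_frame_callback.
typedef void (*toy_pop_frame_callback) (const toy_region_model *model,
                                        const int *retval);

class toy_region_model
{
public:
  int m_depth = 1;

  static void register_pop_frame_callback (toy_pop_frame_callback cb)
  {
    callbacks ().push_back (cb);
  }

  // Pop the current frame, then let every registered callback inspect
  // the resulting state.
  void pop_frame (const int *retval)
  {
    --m_depth;
    for (toy_pop_frame_callback cb : callbacks ())
      cb (this, retval);
  }

private:
  // A function-local static sidesteps initialization-order issues
  // between translation units.
  static std::vector<toy_pop_frame_callback> &callbacks ()
  {
    static std::vector<toy_pop_frame_callback> s_callbacks;
    return s_callbacks;
  }
};
```

Because the callbacks run after the frame has been unbound, a checker sees the post-pop state, so pointers held only by the callee's locals are no longer visible to it.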

The current diagnostic we emit relating to the reference count
appears as follows:

rc3.c:23:10: warning: expected <variable name belonging to m_base_region> to have reference count: ‘1’ but ob_refcnt field is: ‘2’
   23 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
    |......
    |   23 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |
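To see where the '1' and '2' in the message come from, here is a toy model of the reference arithmetic in the example above. It assumes, as the events suggest, that `item` is appended once and never released before the return; the full source of rc3.c is not shown, and all `mock_*` names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mock of a PyObject header; only the reference count is kept.
struct mock_object
{
  long ob_refcnt = 1;  // a new reference starts at 1
};

struct mock_list : mock_object
{
  std::vector<mock_object *> items;
};

// Models a successful PyList_Append: the list takes its own reference,
// and the caller keeps the one it already had.
void mock_list_append (mock_list &list, mock_object &item)
{
  list.items.push_back (&item);
  ++item.ob_refcnt;
}

// Models Py_DECREF without the deallocation at zero.
void mock_decref (mock_object &obj)
{
  assert (obj.ob_refcnt > 0);
  --obj.ob_refcnt;
}

// References to TARGET that survive the return: only the list's slots,
// since the local 'item' pointer disappears with the frame.
long surviving_references (const mock_list &returned, const mock_object &target)
{
  long count = 0;
  for (const mock_object *p : returned.items)
    if (p == &target)
      ++count;
  return count;
}
```

After the append, `item` has an ob_refcnt of 2 but only one surviving reference, which is the mismatch reported above. Because PyList_Append does not steal the caller's reference, the usual fix in real extension code is a Py_DECREF (item) once the append has succeeded.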

This is still a work in progress in several ways:
- The diagnostic needs to be clearer. For instance, users should see the
  variable name 'item' instead of the placeholder in the diagnostic
  above.
- Functions returning PyObject * are currently assumed to always return
  a new reference.
- Reference counts are only validated for PyObjects created within a
  function body; PyObjects passed in as parameters are not yet checked.
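The check itself can be sketched as a graph walk: starting from the returned pointer, count how many stored pointers refer to each reachable object, then compare that count with the object's ob_refcnt. This is a standalone illustration over a hypothetical `heap_obj` type, not the plugin's traversal of the store's binding clusters; the `from_caller` flag is a hypothetical stand-in for objects the patch reaches through an initial_svalue, which are skipped to match the limitation above. There are two deliberate differences. First, the sketch counts pointers of every type, whereas the patch only counts those typed as PyObject *. Second, the sketch counts every pointer edge and only avoids re-walking an object's fields. By contrast, count_expected_pyobj_references in the patch appears to return before incrementing when it reaches an already-seen region, so two pointers to the same object would be counted once.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical heap object: its claimed ob_refcnt plus the pointers it stores.
struct heap_obj
{
  long ob_refcnt = 1;
  bool from_caller = false;  // e.g. passed in as a parameter
  std::vector<const heap_obj *> fields;
};

// Add one reference for PTR, then walk PTR's fields the first time it is seen.
void count_expected (const heap_obj *ptr,
                     std::map<const heap_obj *, long> &counts,
                     std::set<const heap_obj *> &visited)
{
  if (!ptr || ptr->from_caller)
    return;  // caller-owned objects are not tracked, as in the patch
  ++counts[ptr];
  if (!visited.insert (ptr).second)
    return;  // already walked; also prevents infinite loops on cycles
  for (const heap_obj *field : ptr->fields)
    count_expected (field, counts, visited);
}

// Objects reachable from RETVAL whose ob_refcnt disagrees with the count.
std::vector<const heap_obj *> find_mismatches (const heap_obj *retval)
{
  std::map<const heap_obj *, long> counts;
  std::set<const heap_obj *> visited;
  count_expected (retval, counts, visited);

  std::vector<const heap_obj *> mismatches;
  for (const auto &entry : counts)
    {
      assert (entry.first->ob_refcnt >= 0);  // sanity check on the model
      if (entry.first->ob_refcnt != entry.second)
        mismatches.push_back (entry.first);
    }
  return mismatches;
}
```

With a list holding one pointer to an item whose ob_refcnt is 2, find_mismatches reports the item. Once the item's count is dropped to 1, nothing is reported.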

gcc/analyzer/ChangeLog:
	PR analyzer/107646
	* engine.cc (impl_region_model_context::warn): New optional parameter.
	* exploded-graph.h (class impl_region_model_context): Likewise.
	* region-model.cc (region_model::pop_frame): New callback feature for
	region_model::pop_frame.
	* region-model.h (struct append_regions_cb_data): Likewise.
	(class region_model): Likewise.
	(class region_model_context): New optional parameter.
	(class region_model_context_decorator): Likewise.

gcc/testsuite/ChangeLog:
	PR analyzer/107646
	* gcc.dg/plugin/analyzer_cpython_plugin.c: Implement reference count
	checking for PyObjects.
	* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
	* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
	added more tests).
	* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
	* gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here.
	* gcc.dg/plugin/plugin.exp: New tests.
	* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
	* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
	* gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

Signed-off-by: Eric Feng <ef2648@columbia.edu>

---
 gcc/analyzer/engine.cc                        |   8 +-
 gcc/analyzer/exploded-graph.h                 |   4 +-
 gcc/analyzer/region-model.cc                  |   3 +
 gcc/analyzer/region-model.h                   |  48 ++-
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-
 ....c => cpython-plugin-test-PyList_Append.c} |  56 +--
 .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
 .../cpython-plugin-test-PyLong_FromLong.c     |  38 ++
 ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
 .../cpython-plugin-test-refcnt-checking.c     |  78 ++++
 gcc/testsuite/gcc.dg/plugin/plugin.exp        |   5 +-
 11 files changed, 612 insertions(+), 42 deletions(-)
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => cpython-plugin-test-PyList_Append.c} (64%)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
 rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c => cpython-plugin-test-no-plugin.c} (100%)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index a1908cdb364..736a41ecdaf 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -115,10 +115,12 @@ impl_region_model_context (program_state *state,
 }
 
 bool
-impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
+impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d,
+				 const stmt_finder *custom_finder)
 {
   LOG_FUNC (get_logger ());
-  if (m_stmt == NULL && m_stmt_finder == NULL)
+  auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder;
+  if (m_stmt == NULL && curr_stmt_finder == NULL)
     {
       if (get_logger ())
 	get_logger ()->log ("rejecting diagnostic: no stmt");
@@ -129,7 +131,7 @@ impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
       bool terminate_path = d->terminate_path_p ();
       if (m_eg->get_diagnostic_manager ().add_diagnostic
 	  (m_enode_for_diag, m_enode_for_diag->get_supernode (),
-	   m_stmt, m_stmt_finder, std::move (d)))
+	   m_stmt, curr_stmt_finder, std::move (d)))
 	{
 	  if (m_path_ctxt
 	      && terminate_path
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 5a7ab645bfe..6e9a5ef58c7 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context
 			     uncertainty_t *uncertainty,
 			     logger *logger = NULL);
 
-  bool warn (std::unique_ptr<pending_diagnostic> d) final override;
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder = NULL) final override;
   void add_note (std::unique_ptr<pending_note> pn) final override;
   void add_event (std::unique_ptr<checker_event> event) final override;
   void on_svalue_leak (const svalue *) override;
@@ -107,6 +108,7 @@ class impl_region_model_context : public region_model_context
 			 std::unique_ptr<sm_context> *out_sm_context) override;
 
   const gimple *get_stmt () const override { return m_stmt; }
+  const exploded_graph *get_eg () const override { return m_eg; }
 
   exploded_graph *m_eg;
   log_user m_logger;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 4f31a6dcf0f..eb4f976b83a 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
 
 namespace ana {
 
+auto_vec<pop_frame_callback> region_model::pop_frame_callbacks;
+
 /* Dump T to PP in language-independent form, for debugging/logging/dumping
    purposes.  */
 
@@ -5422,6 +5424,7 @@ region_model::pop_frame (tree result_lvalue,
     }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
+  notify_on_pop_frame (this, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 10b2a59e787..440ea6d828d 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -236,6 +236,10 @@ public:
 
 struct append_regions_cb_data;
 
+typedef void (*pop_frame_callback) (const region_model *model,
+				    const svalue *retval,
+				    region_model_context *ctxt);
+
 /* A region_model encapsulates a representation of the state of memory, with
    a tree of regions, along with their associated values.
    The representation is graph-like because values can be pointers to
@@ -532,6 +536,20 @@ class region_model
   get_builtin_kf (const gcall *call,
 		  region_model_context *ctxt = NULL) const;
 
+  static void
+  register_pop_frame_callback (const pop_frame_callback &callback)
+  {
+    pop_frame_callbacks.safe_push (callback);
+  }
+
+  static void
+  notify_on_pop_frame (const region_model *model, const svalue *retval,
+		       region_model_context *ctxt)
+  {
+    for (auto &callback : pop_frame_callbacks)
+	callback (model, retval, ctxt);
+  }
+
 private:
   const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
   const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
@@ -621,6 +639,7 @@ private:
 						tree callee_fndecl,
 						region_model_context *ctxt) const;
 
+  static auto_vec<pop_frame_callback> pop_frame_callbacks;
   /* Storing this here to avoid passing it around everywhere.  */
   region_model_manager *const m_mgr;
 
@@ -649,8 +668,10 @@ class region_model_context
 {
  public:
   /* Hook for clients to store pending diagnostics.
-     Return true if the diagnostic was stored, or false if it was deleted.  */
-  virtual bool warn (std::unique_ptr<pending_diagnostic> d) = 0;
+     Return true if the diagnostic was stored, or false if it was deleted.
+     Optionally provide a custom stmt_finder.  */
+  virtual bool warn (std::unique_ptr<pending_diagnostic> d,
+		     const stmt_finder *custom_finder = NULL) = 0;
 
   /* Hook for clients to add a note to the last previously stored
      pending diagnostic.  */
@@ -757,6 +778,8 @@ class region_model_context
 
   /* Get the current statement, if any.  */
   virtual const gimple *get_stmt () const = 0;
+
+  virtual const exploded_graph *get_eg () const = 0;
 };
 
 /* A "do nothing" subclass of region_model_context.  */
@@ -764,7 +787,8 @@ class region_model_context
 class noop_region_model_context : public region_model_context
 {
 public:
-  bool warn (std::unique_ptr<pending_diagnostic>) override { return false; }
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder) override { return false; }
   void add_note (std::unique_ptr<pending_note>) override;
   void add_event (std::unique_ptr<checker_event>) override;
   void on_svalue_leak (const svalue *) override {}
@@ -812,6 +836,7 @@ public:
   }
 
   const gimple *get_stmt () const override { return NULL; }
+  const exploded_graph *get_eg () const override { return NULL; }
 };
 
 /* A subclass of region_model_context for determining if operations fail
@@ -840,7 +865,8 @@ private:
 class region_model_context_decorator : public region_model_context
 {
  public:
-  bool warn (std::unique_ptr<pending_diagnostic> d) override
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder)
   {
     if (m_inner)
       return m_inner->warn (std::move (d));
@@ -978,6 +1004,14 @@ class region_model_context_decorator : public region_model_context
       return nullptr;
   }
 
+  const exploded_graph *get_eg () const override
+  {
+    if (m_inner)
+	return m_inner->get_eg ();
+    else
+	return nullptr;
+  }
+
 protected:
   region_model_context_decorator (region_model_context *inner)
   : m_inner (inner)
@@ -993,7 +1027,8 @@ protected:
 class annotating_context : public region_model_context_decorator
 {
 public:
-  bool warn (std::unique_ptr<pending_diagnostic> d) override
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder) override
   {
     if (m_inner)
       if (m_inner->warn (std::move (d)))
@@ -1158,7 +1193,8 @@ using namespace ::selftest;
 class test_region_model_context : public noop_region_model_context
 {
 public:
-  bool warn (std::unique_ptr<pending_diagnostic> d) final override
+  bool warn (std::unique_ptr<pending_diagnostic> d,
+	     const stmt_finder *custom_finder) final override
   {
     m_diagnostics.safe_push (d.release ());
     return true;
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7cd72e8a886..b2caed8fc1b 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -44,6 +44,7 @@
 #include "analyzer/region-model.h"
 #include "analyzer/call-details.h"
 #include "analyzer/call-info.h"
+#include "analyzer/exploded-graph.h"
 #include "make-unique.h"
 
 int plugin_is_GPL_compatible;
@@ -191,6 +192,372 @@ public:
   }
 };
 
+/* This is just a copy of leak_stmt_finder for now (subject to change if
+ * necessary)  */
+
+class refcnt_stmt_finder : public stmt_finder
+{
+public:
+  refcnt_stmt_finder (const exploded_graph &eg, tree var)
+      : m_eg (eg), m_var (var)
+  {
+  }
+
+  std::unique_ptr<stmt_finder>
+  clone () const final override
+  {
+    return make_unique<refcnt_stmt_finder> (m_eg, m_var);
+  }
+
+  const gimple *
+  find_stmt (const exploded_path &epath) final override
+  {
+    logger *const logger = m_eg.get_logger ();
+    LOG_FUNC (logger);
+
+    if (m_var && TREE_CODE (m_var) == SSA_NAME)
+      {
+	/* Locate the final write to this SSA name in the path.  */
+	const gimple *def_stmt = SSA_NAME_DEF_STMT (m_var);
+
+	int idx_of_def_stmt;
+	bool found = epath.find_stmt_backwards (def_stmt, &idx_of_def_stmt);
+	if (!found)
+	  goto not_found;
+
+	/* What was the next write to the underlying var
+	   after the SSA name was set? (if any).  */
+
+	for (unsigned idx = idx_of_def_stmt + 1; idx < epath.m_edges.length ();
+	     ++idx)
+	  {
+	    const exploded_edge *eedge = epath.m_edges[idx];
+	    if (logger)
+		    logger->log ("eedge[%i]: EN %i -> EN %i", idx,
+				 eedge->m_src->m_index,
+				 eedge->m_dest->m_index);
+	    const exploded_node *dst_node = eedge->m_dest;
+	    const program_point &dst_point = dst_node->get_point ();
+	    const gimple *stmt = dst_point.get_stmt ();
+	    if (!stmt)
+		    continue;
+	    if (const gassign *assign = dyn_cast<const gassign *> (stmt))
+		    {
+			    tree lhs = gimple_assign_lhs (assign);
+			    if (TREE_CODE (lhs) == SSA_NAME
+				&& SSA_NAME_VAR (lhs) == SSA_NAME_VAR (m_var))
+				    return assign;
+		    }
+	  }
+      }
+
+  not_found:
+
+    /* Look backwards for the first statement with a location.  */
+    int i;
+    const exploded_edge *eedge;
+    FOR_EACH_VEC_ELT_REVERSE (epath.m_edges, i, eedge)
+    {
+      if (logger)
+	logger->log ("eedge[%i]: EN %i -> EN %i", i, eedge->m_src->m_index,
+		     eedge->m_dest->m_index);
+      const exploded_node *dst_node = eedge->m_dest;
+      const program_point &dst_point = dst_node->get_point ();
+      const gimple *stmt = dst_point.get_stmt ();
+      if (stmt)
+	if (get_pure_location (stmt->location) != UNKNOWN_LOCATION)
+	  return stmt;
+    }
+
+    gcc_unreachable ();
+    return NULL;
+  }
+
+private:
+  const exploded_graph &m_eg;
+  tree m_var;
+};
+
+class refcnt_mismatch : public pending_diagnostic_subclass<refcnt_mismatch>
+{
+public:
+  refcnt_mismatch (const region *base_region,
+				const svalue *ob_refcnt,
+				const svalue *actual_refcnt,
+        tree reg_tree)
+      : m_base_region (base_region), m_ob_refcnt (ob_refcnt),
+	m_actual_refcnt (actual_refcnt), m_reg_tree(reg_tree)
+  {
+  }
+
+  const char *
+  get_kind () const final override
+  {
+    return "refcnt_mismatch";
+  }
+
+  bool
+  operator== (const refcnt_mismatch &other) const
+  {
+    return (m_base_region == other.m_base_region
+	    && m_ob_refcnt == other.m_ob_refcnt
+	    && m_actual_refcnt == other.m_actual_refcnt);
+  }
+
+  int get_controlling_option () const final override
+  {
+    return 0;
+  }
+
+  bool
+  emit (rich_location *rich_loc, logger *) final override
+  {
+    diagnostic_metadata m;
+    bool warned;
+    // just assuming constants for now
+    auto actual_refcnt
+	= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
+    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
+    warned = warning_meta (
+	rich_loc, m, get_controlling_option (),
+	"expected <variable name belonging to m_base_region> to have "
+	"reference count: %qE but ob_refcnt field is: %qE",
+	actual_refcnt, ob_refcnt);
+
+    // location_t loc = rich_loc->get_loc ();
+    // foo (loc);
+    return warned;
+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    if (m_base_region)
+      interest->add_region_creation (m_base_region);
+  }
+
+private:
+
+  void foo(location_t loc) const 
+  {
+    inform(loc, "something is up right here");
+  }
+  const region *m_base_region;
+  const svalue *m_ob_refcnt;
+  const svalue *m_actual_refcnt;
+  tree m_reg_tree;
+};
+
+/* Retrieves the svalue associated with the ob_refcnt field of the base region.
+ */
+static const svalue *
+retrieve_ob_refcnt_sval (const region *base_reg, const region_model *model,
+			 region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
+  const region *ob_refcnt_region
+      = mgr->get_field_region (base_reg, ob_refcnt_tree);
+  const svalue *ob_refcnt_sval
+      = model->get_store_value (ob_refcnt_region, ctxt);
+  return ob_refcnt_sval;
+}
+
+static void
+increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
+{
+  bool existed;
+  auto &refcnt = map.get_or_insert (key, &existed);
+  refcnt = existed ? refcnt + 1 : 1;
+}
+
+
+/* Recursively fills in region_to_refcnt with the references owned by
+   pyobj_ptr_sval.  */
+static void
+count_expected_pyobj_references (const region_model *model,
+			   hash_map<const region *, int> &region_to_refcnt,
+			   const svalue *pyobj_ptr_sval,
+			   hash_set<const region *> &seen)
+{
+  if (!pyobj_ptr_sval)
+    return;
+
+  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
+  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
+  if (!pyobj_region_sval && !pyobj_initial_sval)
+    return;
+
+  // todo: support initial sval (e.g passed in as parameter)
+  if (pyobj_initial_sval)
+    {
+  //     increment_region_refcnt (region_to_refcnt,
+	// 		       pyobj_initial_sval->get_region ());
+      return;
+    }
+
+  const region *pyobj_region = pyobj_region_sval->get_pointee ();
+  if (!pyobj_region || seen.contains (pyobj_region))
+    return;
+
+  seen.add (pyobj_region);
+
+  if (pyobj_ptr_sval->get_type () == pyobj_ptr_tree)
+    increment_region_refcnt (region_to_refcnt, pyobj_region);
+
+  const auto *curr_store = model->get_store ();
+  const auto *retval_cluster = curr_store->get_cluster (pyobj_region);
+  if (!retval_cluster)
+    return;
+
+  const auto &retval_binding_map = retval_cluster->get_map ();
+
+  for (const auto &binding : retval_binding_map)
+    {
+      const svalue *binding_sval = binding.second;
+      const svalue *unwrapped_sval = binding_sval->unwrap_any_unmergeable ();
+      const region *pointee = unwrapped_sval->maybe_get_region ();
+
+      if (pointee && pointee->get_kind () == RK_HEAP_ALLOCATED)
+	count_expected_pyobj_references (model, region_to_refcnt, binding_sval,
+					 seen);
+    }
+}
+
+/* Compare ob_refcnt field vs the actual reference count of a region */
+static void
+check_refcnt (const region_model *model, region_model_context *ctxt,
+	      const hash_map<const ana::region *,
+			     int>::iterator::reference_pair region_refcnt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  const auto &curr_region = region_refcnt.first;
+  const auto &actual_refcnt = region_refcnt.second;
+  const svalue *ob_refcnt_sval = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
+  const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+      ob_refcnt_sval->get_type (), actual_refcnt);
+
+  if (ob_refcnt_sval != actual_refcnt_sval)
+  {
+    // todo: fix this
+    tree reg_tree = model->get_representative_tree (curr_region);
+
+    const auto &eg = ctxt->get_eg ();
+    refcnt_stmt_finder finder (*eg, reg_tree);
+    auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
+					    actual_refcnt_sval, reg_tree);
+    if (pd && eg)
+    ctxt->warn (std::move (pd), &finder);
+  }
+}
+
+static void
+check_refcnts (const region_model *model, const svalue *retval,
+	    region_model_context *ctxt,
+	    hash_map<const region *, int> &region_to_refcnt)
+{
+  for (const auto &region_refcnt : region_to_refcnt)
+  {
+    check_refcnt(model, ctxt, region_refcnt);
+  }
+}
+
+/* Validates the reference count of all Python objects. */
+void
+pyobj_refcnt_checker (const region_model *model, const svalue *retval,
+		    region_model_context *ctxt)
+{
+  if (!ctxt)
+  return;
+
+  auto region_to_refcnt = hash_map<const region *, int> ();
+  auto seen_regions = hash_set<const region *> ();
+
+  count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
+  check_refcnts (model, retval, ctxt, region_to_refcnt);
+}
+
+/* Counts the actual pyobject references from all clusters in the model's
+ * store. */
+static void
+count_all_references (const region_model *model,
+		      hash_map<const region *, int> &region_to_refcnt)
+{
+  for (const auto &cluster : *model->get_store ())
+  {
+    auto curr_region = cluster.first;
+    if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
+    continue;
+
+    increment_region_refcnt (region_to_refcnt, curr_region);
+
+    auto binding_cluster = cluster.second;
+    for (const auto &binding : binding_cluster->get_map ())
+    {
+	  const svalue *binding_sval = binding.second;
+
+	  const svalue *unwrapped_sval
+	      = binding_sval->unwrap_any_unmergeable ();
+	  // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
+	  // continue;
+
+	  const region *pointee = unwrapped_sval->maybe_get_region ();
+	  if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
+	    continue;
+
+	  increment_region_refcnt (region_to_refcnt, pointee);
+    }
+  }
+}
+
+static void
+dump_refcnt_info (const hash_map<const region *, int> &region_to_refcnt,
+		  const region_model *model, region_model_context *ctxt)
+{
+  region_model_manager *mgr = model->get_manager ();
+  pretty_printer pp;
+  pp_format_decoder (&pp) = default_tree_printer;
+  pp_show_color (&pp) = pp_show_color (global_dc->printer);
+  pp.buffer->stream = stderr;
+
+  for (const auto &region_refcnt : region_to_refcnt)
+  {
+    auto region = region_refcnt.first;
+    auto actual_refcnt = region_refcnt.second;
+    const svalue *ob_refcnt_sval
+	= retrieve_ob_refcnt_sval (region, model, ctxt);
+    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
+	ob_refcnt_sval->get_type (), actual_refcnt);
+
+    region->dump_to_pp (&pp, true);
+    pp_string (&pp, " — ob_refcnt: ");
+    ob_refcnt_sval->dump_to_pp (&pp, true);
+    pp_string (&pp, " actual refcnt: ");
+    actual_refcnt_sval->dump_to_pp (&pp, true);
+    pp_newline (&pp);
+  }
+  pp_string (&pp, "~~~~~~~~\n");
+  pp_flush (&pp);
+}
+
+class kf_analyzer_cpython_dump_refcounts : public known_function
+{
+public:
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+    return cd.num_args () == 0;
+  }
+  void impl_call_pre (const call_details &cd) const final override
+  {
+    region_model_context *ctxt = cd.get_ctxt ();
+    if (!ctxt)
+      return;
+    region_model *model = cd.get_model ();
+    auto region_to_refcnt = hash_map<const region *, int> ();
+    count_all_references(model, region_to_refcnt);
+    dump_refcnt_info(region_to_refcnt, model, ctxt);
+  }
+};
+
 /* Some concessions were made to
 simplify the analysis process when comparing kf_PyList_Append with the
 real implementation. In particular, PyList_Append performs some
@@ -927,6 +1294,10 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
   iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
   iface->register_known_function ("PyLong_FromLong",
                                   make_unique<kf_PyLong_FromLong> ());
+
+  iface->register_known_function (
+      "__analyzer_cpython_dump_refcounts",
+      make_unique<kf_analyzer_cpython_dump_refcounts> ());
 }
 } // namespace ana
 
@@ -940,8 +1311,9 @@ plugin_init (struct plugin_name_args *plugin_info,
   const char *plugin_name = plugin_info->base_name;
   if (0)
     inform (input_location, "got here; %qs", plugin_name);
-  ana::register_finish_translation_unit_callback (&stash_named_types);
-  ana::register_finish_translation_unit_callback (&stash_global_vars);
+  register_finish_translation_unit_callback (&stash_named_types);
+  register_finish_translation_unit_callback (&stash_global_vars);
+  region_model::register_pop_frame_callback(pyobj_refcnt_checker);
   register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT,
                      ana::cpython_analyzer_init_cb,
                      NULL); /* void *user_data */
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
similarity index 64%
rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
index 19b5c17428a..9912f9105d4 100644
--- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
@@ -8,34 +8,6 @@
 #include <Python.h>
 #include "../analyzer/analyzer-decls.h"
 
-PyObject *
-test_PyList_New (Py_ssize_t len)
-{
-  PyObject *obj = PyList_New (len);
-  if (obj)
-    {
-     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
-     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
-    }
-  else
-    __analyzer_dump_path (); /* { dg-message "path" } */
-  return obj;
-}
-
-PyObject *
-test_PyLong_New (long n)
-{
-  PyObject *obj = PyLong_FromLong (n);
-  if (obj)
-    {
-     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
-     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
-    }
-  else
-    __analyzer_dump_path (); /* { dg-message "path" } */
-  return obj;
-}
-
 PyObject *
 test_PyListAppend (long n)
 {
@@ -43,6 +15,7 @@ test_PyListAppend (long n)
   PyObject *list = PyList_New (0);
   PyList_Append(list, item);
   return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
 }
 
 PyObject *
@@ -67,6 +40,7 @@ test_PyListAppend_2 (long n)
   else
     __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
   return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
 }
 
 
@@ -75,4 +49,30 @@ test_PyListAppend_3 (PyObject *item, PyObject *list)
 {
   PyList_Append (list, item);
   return list;
+}
+
+PyObject *
+test_PyListAppend_4 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_5 ()
+{
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, NULL);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_6 ()
+{
+  PyObject *item = NULL;
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
 }
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
new file mode 100644
index 00000000000..492d4f7d58d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyList_New (Py_ssize_t len)
+{
+  PyObject *obj = PyList_New (len);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+void
+test_PyList_New_2 ()
+{
+  PyObject *obj = PyList_New (0);
+} /* { dg-warning "leak of 'obj'" } */
+
+PyObject *test_stray_incref_PyList ()
+{
+  PyObject *p = PyList_New (2);
+  if (p)
+    Py_INCREF (p);
+  return p;
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
new file mode 100644
index 00000000000..97b29849302
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyLong_New (long n)
+{
+  PyObject *obj = PyLong_FromLong (n);
+  if (obj)
+    {
+     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
+    }
+  else
+    __analyzer_dump_path (); /* { dg-message "path" } */
+  return obj;
+}
+
+void
+test_PyLong_New_2 (long n)
+{
+  PyObject *obj = PyLong_FromLong (n);
+} /* { dg-warning "leak of 'obj'" } */
+
+PyObject *test_stray_incref_PyLong (long val)
+{
+  PyObject *p = PyLong_FromLong (val);
+  if (p)
+    Py_INCREF (p);
+  return p;
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
similarity index 100%
rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c
rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
new file mode 100644
index 00000000000..9912f9105d4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_PyListAppend (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, item);
+  return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
+
+PyObject *
+test_PyListAppend_2 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  if (!item)
+	return NULL;
+
+  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  PyObject *list = PyList_New (n);
+  if (!list)
+  {
+	Py_DECREF(item);
+	return NULL;
+  }
+
+  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+
+  if (PyList_Append (list, item) < 0)
+    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
+  else
+    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
+  return list; /* { dg-warning "leak of 'item'" } */
+  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
+}
+
+
+PyObject *
+test_PyListAppend_3 (PyObject *item, PyObject *list)
+{
+  PyList_Append (list, item);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_4 (long n)
+{
+  PyObject *item = PyLong_FromLong (n);
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_5 ()
+{
+  PyObject *list = PyList_New (0);
+  PyList_Append(list, NULL);
+  return list;
+}
+
+PyObject *
+test_PyListAppend_6 ()
+{
+  PyObject *item = NULL;
+  PyObject *list = NULL;
+  PyList_Append(list, item);
+  return list;
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index e1ed2d2589e..cbef6da8d86 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -161,8 +161,9 @@ set plugin_test_list [list \
 	  taint-CVE-2011-0521-6.c \
 	  taint-antipatterns-1.c } \
     { analyzer_cpython_plugin.c \
-	  cpython-plugin-test-1.c \
-	  cpython-plugin-test-2.c } \
+	  cpython-plugin-test-PyList_Append.c \
+	  cpython-plugin-test-PyList_New.c \
+	  cpython-plugin-test-PyLong_FromLong.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
-- 
2.30.2



* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29  4:31                                                             ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
@ 2023-08-29  4:35                                                               ` Eric Feng
  2023-08-29 17:28                                                                 ` Eric Feng
  2023-08-29 21:08                                                               ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] David Malcolm
  2023-09-01  2:49                                                               ` Hans-Peter Nilsson
  2 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-29  4:35 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, gcc-patches

On Tue, Aug 29, 2023 at 12:32 AM Eric Feng <ef2648@columbia.edu> wrote:
>
> Hi Dave,
>
> Thanks for the feedback. I've addressed the changes you mentioned and
> added more test cases. I've also taken this chance to split the test
> files by known function subclass, as you previously suggested. Since
> the patch also changes the core analyzer, I've bootstrapped and
> regtested it as well. Does it look OK for trunk?
Apologies, I forgot to mention that the bootstrap and regtest were done on
aarch64-unknown-linux-gnu.
>
> Best,
> Eric
>
> ---
>
> This patch adds initial support to the CPython plugin for checking the
> reference counts of PyObjects manipulated through the Python/C API. It
> also makes the following changes to the core analyzer to support this
> feature:
>
> - Support for callbacks at the end of region_model::pop_frame. This is
>   currently the point at which the reference counts of PyObjects are
>   validated.
> - A new optional custom stmt_finder parameter to
>   region_model_context::warn. This allows a reference count diagnostic
>   to be emitted even when the context's own stmt and stmt_finder are
>   NULL, as is currently the case during region_model::pop_frame.
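The two core-analyzer changes above can be sketched in plain C. This is a hypothetical model, not the analyzer's code: `model_t`, `ctx_t` and `stmt_finder_t` stand in for region_model, region_model_context and stmt_finder, a fixed-size array replaces the static auto_vec of callbacks, the callbacks' retval argument is omitted, and `ctx_warn` only mirrors the accept/reject decision made in impl_region_model_context::warn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for region_model, stmt_finder and
   region_model_context; the real analyzer types are far richer.  */
typedef struct { int depth; } model_t;
typedef struct { const char *name; } stmt_finder_t;
typedef struct
{
  const void *stmt;                  /* NULL at the end of pop_frame.  */
  const stmt_finder_t *stmt_finder;  /* Also NULL at that point.  */
  int n_diagnostics;                 /* Diagnostics accepted so far.  */
} ctx_t;

typedef void (*pop_frame_callback) (const model_t *model, ctx_t *ctxt);

/* Fixed-size table standing in for the static auto_vec of callbacks.  */
#define MAX_POP_FRAME_CALLBACKS 8
static pop_frame_callback pop_frame_callbacks[MAX_POP_FRAME_CALLBACKS];
static size_t n_pop_frame_callbacks;

/* Remember CB so that it runs every time a frame is popped.  */
static void
register_pop_frame_callback (pop_frame_callback cb)
{
  assert (n_pop_frame_callbacks < MAX_POP_FRAME_CALLBACKS);
  pop_frame_callbacks[n_pop_frame_callbacks++] = cb;
}

/* Invoke every registered callback, in registration order.  */
static void
notify_on_pop_frame (const model_t *model, ctx_t *ctxt)
{
  for (size_t i = 0; i < n_pop_frame_callbacks; i++)
    pop_frame_callbacks[i] (model, ctxt);
}

/* Accept a diagnostic if there is a statement or some finder to locate
   one, preferring CUSTOM_FINDER over the context's own finder.  */
static int
ctx_warn (ctx_t *ctxt, const stmt_finder_t *custom_finder)
{
  const stmt_finder_t *finder
    = custom_finder ? custom_finder : ctxt->stmt_finder;
  if (ctxt->stmt == NULL && finder == NULL)
    return 0;  /* Corresponds to "rejecting diagnostic: no stmt".  */
  ctxt->n_diagnostics++;
  return 1;
}

/* Pop the innermost frame, then let plugins inspect the result.  */
static void
pop_frame (model_t *model, ctx_t *ctxt)
{
  model->depth--;  /* The real code unbinds the frame's regions here.  */
  notify_on_pop_frame (model, ctxt);
}

/* A plugin-style checker that supplies its own finder.  */
static void
refcnt_checker (const model_t *model, ctx_t *ctxt)
{
  static const stmt_finder_t refcnt_finder = { "refcnt_stmt_finder" };
  (void) model;
  ctx_warn (ctxt, &refcnt_finder);
}
```

A checker registered once and run from pop_frame gets its diagnostic accepted through its own finder, even though the context has neither a statement nor a finder; the same call with a NULL custom finder is rejected, which is the failure described earlier in the thread.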
>
> The current diagnostic we emit relating to the reference count
> appears as follows:
>
> rc3.c:23:10: warning: expected <variable name belonging to m_base_region> to have reference count: ‘1’ but ob_refcnt field is: ‘2’
>    23 |   return list;
>       |          ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(3);
>     |      |                    ^~~~~~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (1) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(1);
>     |      |                    ~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (2) when ‘PyList_New’ succeeds
>     |......
>     |   14 |   PyList_Append(list, item);
>     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |   |
>     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
>     |......
>     |   23 |   return list;
>     |      |          ~~~~
>     |      |          |
>     |      |          (4) here
>     |
>
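To make the mismatch above concrete, here is a toy model in C of the same scenario; it is a sketch under stated assumptions, not the plugin's code. `toy_obj`, `toy_new`, `toy_list_append`, `count_expected` and `refcnt_ok` are hypothetical names: each object carries an ob_refcnt field plus outgoing pointers, the expected count is found by walking from the returned value, and each object's ob_refcnt is compared against it. Unlike count_expected_pyobj_references in the patch, which returns early for any region already in `seen` and so counts each region at most once, this sketch counts every incoming pointer and only stops recursing on revisits.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object: an ob_refcnt field plus up to MAX_REFS outgoing
   references (e.g. a list's item slots).  Not CPython's layout.  */
#define MAX_REFS 2
typedef struct toy_obj
{
  int ob_refcnt;
  struct toy_obj *refs[MAX_REFS];
  size_t n_refs;
  int expected;  /* Scratch: references found by the traversal.  */
  int seen;      /* Scratch: already recursed into.  */
} toy_obj;

/* Model PyLong_FromLong or PyList_New succeeding: a new reference.  */
static void
toy_new (toy_obj *o)
{
  o->ob_refcnt = 1;
  o->n_refs = 0;
  o->expected = 0;
  o->seen = 0;
}

/* Model PyList_Append: the list stores ITEM and takes a new reference.  */
static void
toy_list_append (toy_obj *list, toy_obj *item)
{
  assert (list->n_refs < MAX_REFS);
  list->refs[list->n_refs++] = item;
  item->ob_refcnt++;
}

/* Count one reference for the pointer P itself, then recurse into the
   object's outgoing references, visiting each object at most once.  */
static void
count_expected (toy_obj *p)
{
  if (!p)
    return;
  p->expected++;
  if (p->seen)
    return;
  p->seen = 1;
  for (size_t i = 0; i < p->n_refs; i++)
    count_expected (p->refs[i]);
}

/* Return 1 if O's ob_refcnt matches the references actually found.  */
static int
refcnt_ok (const toy_obj *o)
{
  return o->ob_refcnt == o->expected;
}
```

Creating item and list, appending item to list and walking from list gives list an expected count of 1 (matching its ob_refcnt) and item an expected count of 1 against an ob_refcnt of 2, the same numbers as in the warning. Decrementing item's ob_refcnt, the analogue of calling Py_DECREF (item) before `return list;`, makes the counts agree.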
> This is still a work in progress in several ways:
> - The diagnostic needs to be clearer. For instance, users should see
>   the variable name 'item' instead of the placeholder in the
>   diagnostic above.
> - Functions returning PyObject * are currently assumed to always
>   return a new reference.
> - Reference counts are only validated for PyObjects created within a
>   function body; PyObjects passed in as parameters are not checked by
>   this patch.
>
> gcc/analyzer/ChangeLog:
>   PR analyzer/107646
>         * engine.cc (impl_region_model_context::warn): New optional parameter.
>         * exploded-graph.h (class impl_region_model_context): Likewise.
>         * region-model.cc (region_model::pop_frame): New callback feature for
>         region_model::pop_frame.
>         * region-model.h (struct append_regions_cb_data): Likewise.
>         (class region_model): Likewise.
>         (class region_model_context): New optional parameter.
>         (class region_model_context_decorator): Likewise.
>
> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
>         * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>         checking for PyObjects.
>         * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
>         * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>         added more tests).
>         * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
>         * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here.
>         * gcc.dg/plugin/plugin.exp: New tests.
>         * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
>         * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
>         * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.
>
> Signed-off-by: Eric Feng <ef2648@columbia.edu>
>
> ---
>  gcc/analyzer/engine.cc                        |   8 +-
>  gcc/analyzer/exploded-graph.h                 |   4 +-
>  gcc/analyzer/region-model.cc                  |   3 +
>  gcc/analyzer/region-model.h                   |  48 ++-
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-
>  ....c => cpython-plugin-test-PyList_Append.c} |  56 +--
>  .../plugin/cpython-plugin-test-PyList_New.c   |  38 ++
>  .../cpython-plugin-test-PyLong_FromLong.c     |  38 ++
>  ...st-1.c => cpython-plugin-test-no-plugin.c} |   0
>  .../cpython-plugin-test-refcnt-checking.c     |  78 ++++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp        |   5 +-
>  11 files changed, 612 insertions(+), 42 deletions(-)
>  rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-2.c => cpython-plugin-test-PyList_Append.c} (64%)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
>  rename gcc/testsuite/gcc.dg/plugin/{cpython-plugin-test-1.c => cpython-plugin-test-no-plugin.c} (100%)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
>
> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
> index a1908cdb364..736a41ecdaf 100644
> --- a/gcc/analyzer/engine.cc
> +++ b/gcc/analyzer/engine.cc
> @@ -115,10 +115,12 @@ impl_region_model_context (program_state *state,
>  }
>
>  bool
> -impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
> +impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d,
> +                                const stmt_finder *custom_finder)
>  {
>    LOG_FUNC (get_logger ());
> -  if (m_stmt == NULL && m_stmt_finder == NULL)
> +  auto curr_stmt_finder = custom_finder ? custom_finder : m_stmt_finder;
> +  if (m_stmt == NULL && curr_stmt_finder == NULL)
>      {
>        if (get_logger ())
>         get_logger ()->log ("rejecting diagnostic: no stmt");
> @@ -129,7 +131,7 @@ impl_region_model_context::warn (std::unique_ptr<pending_diagnostic> d)
>        bool terminate_path = d->terminate_path_p ();
>        if (m_eg->get_diagnostic_manager ().add_diagnostic
>           (m_enode_for_diag, m_enode_for_diag->get_supernode (),
> -          m_stmt, m_stmt_finder, std::move (d)))
> +          m_stmt, curr_stmt_finder, std::move (d)))
>         {
>           if (m_path_ctxt
>               && terminate_path
> diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
> index 5a7ab645bfe..6e9a5ef58c7 100644
> --- a/gcc/analyzer/exploded-graph.h
> +++ b/gcc/analyzer/exploded-graph.h
> @@ -56,7 +56,8 @@ class impl_region_model_context : public region_model_context
>                              uncertainty_t *uncertainty,
>                              logger *logger = NULL);
>
> -  bool warn (std::unique_ptr<pending_diagnostic> d) final override;
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder = NULL) final override;
>    void add_note (std::unique_ptr<pending_note> pn) final override;
>    void add_event (std::unique_ptr<checker_event> event) final override;
>    void on_svalue_leak (const svalue *) override;
> @@ -107,6 +108,7 @@ class impl_region_model_context : public region_model_context
>                          std::unique_ptr<sm_context> *out_sm_context) override;
>
>    const gimple *get_stmt () const override { return m_stmt; }
> +  const exploded_graph *get_eg () const override { return m_eg; }
>
>    exploded_graph *m_eg;
>    log_user m_logger;
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index 4f31a6dcf0f..eb4f976b83a 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
>
>  namespace ana {
>
> +auto_vec<pop_frame_callback> region_model::pop_frame_callbacks;
> +
>  /* Dump T to PP in language-independent form, for debugging/logging/dumping
>     purposes.  */
>
> @@ -5422,6 +5424,7 @@ region_model::pop_frame (tree result_lvalue,
>      }
>
>    unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
> +  notify_on_pop_frame (this, retval, ctxt);
>  }
>
>  /* Get the number of frames in this region_model's stack.  */
> diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
> index 10b2a59e787..440ea6d828d 100644
> --- a/gcc/analyzer/region-model.h
> +++ b/gcc/analyzer/region-model.h
> @@ -236,6 +236,10 @@ public:
>
>  struct append_regions_cb_data;
>
> +typedef void (*pop_frame_callback) (const region_model *model,
> +                                   const svalue *retval,
> +                                   region_model_context *ctxt);
> +
>  /* A region_model encapsulates a representation of the state of memory, with
>     a tree of regions, along with their associated values.
>     The representation is graph-like because values can be pointers to
> @@ -532,6 +536,20 @@ class region_model
>    get_builtin_kf (const gcall *call,
>                   region_model_context *ctxt = NULL) const;
>
> +  static void
> +  register_pop_frame_callback (const pop_frame_callback &callback)
> +  {
> +    pop_frame_callbacks.safe_push (callback);
> +  }
> +
> +  static void
> +  notify_on_pop_frame (const region_model *model, const svalue *retval,
> +                      region_model_context *ctxt)
> +  {
> +    for (auto &callback : pop_frame_callbacks)
> +       callback (model, retval, ctxt);
> +  }
> +
>  private:
>    const region *get_lvalue_1 (path_var pv, region_model_context *ctxt) const;
>    const svalue *get_rvalue_1 (path_var pv, region_model_context *ctxt) const;
> @@ -621,6 +639,7 @@ private:
>                                                 tree callee_fndecl,
>                                                 region_model_context *ctxt) const;
>
> +  static auto_vec<pop_frame_callback> pop_frame_callbacks;
>    /* Storing this here to avoid passing it around everywhere.  */
>    region_model_manager *const m_mgr;
>
> @@ -649,8 +668,10 @@ class region_model_context
>  {
>   public:
>    /* Hook for clients to store pending diagnostics.
> -     Return true if the diagnostic was stored, or false if it was deleted.  */
> -  virtual bool warn (std::unique_ptr<pending_diagnostic> d) = 0;
> +     Return true if the diagnostic was stored, or false if it was deleted.
> +     Optionally provide a custom stmt_finder.  */
> +  virtual bool warn (std::unique_ptr<pending_diagnostic> d,
> +                    const stmt_finder *custom_finder = NULL) = 0;
>
>    /* Hook for clients to add a note to the last previously stored
>       pending diagnostic.  */
> @@ -757,6 +778,8 @@ class region_model_context
>
>    /* Get the current statement, if any.  */
>    virtual const gimple *get_stmt () const = 0;
> +
> +  virtual const exploded_graph *get_eg () const = 0;
>  };
>
>  /* A "do nothing" subclass of region_model_context.  */
> @@ -764,7 +787,8 @@ class region_model_context
>  class noop_region_model_context : public region_model_context
>  {
>  public:
> -  bool warn (std::unique_ptr<pending_diagnostic>) override { return false; }
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder) override { return false; }
>    void add_note (std::unique_ptr<pending_note>) override;
>    void add_event (std::unique_ptr<checker_event>) override;
>    void on_svalue_leak (const svalue *) override {}
> @@ -812,6 +836,7 @@ public:
>    }
>
>    const gimple *get_stmt () const override { return NULL; }
> +  const exploded_graph *get_eg () const override { return NULL; }
>  };
>
>  /* A subclass of region_model_context for determining if operations fail
> @@ -840,7 +865,8 @@ private:
>  class region_model_context_decorator : public region_model_context
>  {
>   public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder) override
>    {
>      if (m_inner)
>        return m_inner->warn (std::move (d));
> @@ -978,6 +1004,14 @@ class region_model_context_decorator : public region_model_context
>        return nullptr;
>    }
>
> +  const exploded_graph *get_eg () const override
> +  {
> +    if (m_inner)
> +       return m_inner->get_eg ();
> +    else
> +       return nullptr;
> +  }
> +
>  protected:
>    region_model_context_decorator (region_model_context *inner)
>    : m_inner (inner)
> @@ -993,7 +1027,8 @@ protected:
>  class annotating_context : public region_model_context_decorator
>  {
>  public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder) override
>    {
>      if (m_inner)
>        if (m_inner->warn (std::move (d)))
> @@ -1158,7 +1193,8 @@ using namespace ::selftest;
>  class test_region_model_context : public noop_region_model_context
>  {
>  public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) final override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +            const stmt_finder *custom_finder) final override
>    {
>      m_diagnostics.safe_push (d.release ());
>      return true;
> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 7cd72e8a886..b2caed8fc1b 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> @@ -44,6 +44,7 @@
>  #include "analyzer/region-model.h"
>  #include "analyzer/call-details.h"
>  #include "analyzer/call-info.h"
> +#include "analyzer/exploded-graph.h"
>  #include "make-unique.h"
>
>  int plugin_is_GPL_compatible;
> @@ -191,6 +192,372 @@ public:
>    }
>  };
>
> +/* This is just a copy of leak_stmt_finder for now (subject to change if
> + * necessary)  */
> +
> +class refcnt_stmt_finder : public stmt_finder
> +{
> +public:
> +  refcnt_stmt_finder (const exploded_graph &eg, tree var)
> +      : m_eg (eg), m_var (var)
> +  {
> +  }
> +
> +  std::unique_ptr<stmt_finder>
> +  clone () const final override
> +  {
> +    return make_unique<refcnt_stmt_finder> (m_eg, m_var);
> +  }
> +
> +  const gimple *
> +  find_stmt (const exploded_path &epath) final override
> +  {
> +    logger *const logger = m_eg.get_logger ();
> +    LOG_FUNC (logger);
> +
> +    if (m_var && TREE_CODE (m_var) == SSA_NAME)
> +      {
> +       /* Locate the final write to this SSA name in the path.  */
> +       const gimple *def_stmt = SSA_NAME_DEF_STMT (m_var);
> +
> +       int idx_of_def_stmt;
> +       bool found = epath.find_stmt_backwards (def_stmt, &idx_of_def_stmt);
> +       if (!found)
> +         goto not_found;
> +
> +       /* What was the next write to the underlying var
> +          after the SSA name was set? (if any).  */
> +
> +       for (unsigned idx = idx_of_def_stmt + 1; idx < epath.m_edges.length ();
> +            ++idx)
> +         {
> +           const exploded_edge *eedge = epath.m_edges[idx];
> +           if (logger)
> +                   logger->log ("eedge[%i]: EN %i -> EN %i", idx,
> +                                eedge->m_src->m_index,
> +                                eedge->m_dest->m_index);
> +           const exploded_node *dst_node = eedge->m_dest;
> +           const program_point &dst_point = dst_node->get_point ();
> +           const gimple *stmt = dst_point.get_stmt ();
> +           if (!stmt)
> +                   continue;
> +           if (const gassign *assign = dyn_cast<const gassign *> (stmt))
> +                   {
> +                           tree lhs = gimple_assign_lhs (assign);
> +                           if (TREE_CODE (lhs) == SSA_NAME
> +                               && SSA_NAME_VAR (lhs) == SSA_NAME_VAR (m_var))
> +                                   return assign;
> +                   }
> +         }
> +      }
> +
> +  not_found:
> +
> +    /* Look backwards for the first statement with a location.  */
> +    int i;
> +    const exploded_edge *eedge;
> +    FOR_EACH_VEC_ELT_REVERSE (epath.m_edges, i, eedge)
> +    {
> +      if (logger)
> +       logger->log ("eedge[%i]: EN %i -> EN %i", i, eedge->m_src->m_index,
> +                    eedge->m_dest->m_index);
> +      const exploded_node *dst_node = eedge->m_dest;
> +      const program_point &dst_point = dst_node->get_point ();
> +      const gimple *stmt = dst_point.get_stmt ();
> +      if (stmt)
> +       if (get_pure_location (stmt->location) != UNKNOWN_LOCATION)
> +         return stmt;
> +    }
> +
> +    gcc_unreachable ();
> +    return NULL;
> +  }
> +
> +private:
> +  const exploded_graph &m_eg;
> +  tree m_var;
> +};
> +
> +class refcnt_mismatch : public pending_diagnostic_subclass<refcnt_mismatch>
> +{
> +public:
> +  refcnt_mismatch (const region *base_region,
> +                   const svalue *ob_refcnt,
> +                   const svalue *actual_refcnt,
> +                   tree reg_tree)
> +      : m_base_region (base_region), m_ob_refcnt (ob_refcnt),
> +       m_actual_refcnt (actual_refcnt), m_reg_tree (reg_tree)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "refcnt_mismatch";
> +  }
> +
> +  bool
> +  operator== (const refcnt_mismatch &other) const
> +  {
> +    return (m_base_region == other.m_base_region
> +           && m_ob_refcnt == other.m_ob_refcnt
> +           && m_actual_refcnt == other.m_actual_refcnt);
> +  }
> +
> +  int get_controlling_option () const final override
> +  {
> +    return 0;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc, logger *) final override
> +  {
> +    diagnostic_metadata m;
> +    bool warned;
> +    // just assuming constants for now
> +    auto actual_refcnt
> +       = m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> +    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> +    warned = warning_meta (
> +       rich_loc, m, get_controlling_option (),
> +       "expected <variable name belonging to m_base_region> to have "
> +       "reference count: %qE but ob_refcnt field is: %qE",
> +       actual_refcnt, ob_refcnt);
> +
> +    // location_t loc = rich_loc->get_loc ();
> +    // foo (loc);
> +    return warned;
> +  }
> +
> +  void mark_interesting_stuff (interesting_t *interest) final override
> +  {
> +    if (m_base_region)
> +      interest->add_region_creation (m_base_region);
> +  }
> +
> +private:
> +
> +  void foo(location_t loc) const
> +  {
> +    inform(loc, "something is up right here");
> +  }
> +  const region *m_base_region;
> +  const svalue *m_ob_refcnt;
> +  const svalue *m_actual_refcnt;
> +  tree m_reg_tree;
> +};
> +
> +/* Retrieves the svalue associated with the ob_refcnt field of the base region.
> + */
> +static const svalue *
> +retrieve_ob_refcnt_sval (const region *base_reg, const region_model *model,
> +                        region_model_context *ctxt)
> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  tree ob_refcnt_tree = get_field_by_name (pyobj_record, "ob_refcnt");
> +  const region *ob_refcnt_region
> +      = mgr->get_field_region (base_reg, ob_refcnt_tree);
> +  const svalue *ob_refcnt_sval
> +      = model->get_store_value (ob_refcnt_region, ctxt);
> +  return ob_refcnt_sval;
> +}
> +
> +static void
> +increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
> +{
> +  bool existed;
> +  auto &refcnt = map.get_or_insert (key, &existed);
> +  refcnt = existed ? refcnt + 1 : 1;
> +}
> +
> +
> +/* Recursively fills in region_to_refcnt with the references owned by
> +   pyobj_ptr_sval.  */
> +static void
> +count_expected_pyobj_references (const region_model *model,
> +                          hash_map<const region *, int> &region_to_refcnt,
> +                          const svalue *pyobj_ptr_sval,
> +                          hash_set<const region *> &seen)
> +{
> +  if (!pyobj_ptr_sval)
> +    return;
> +
> +  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
> +  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
> +  if (!pyobj_region_sval && !pyobj_initial_sval)
> +    return;
> +
> +  // todo: support initial sval (e.g. passed in as parameter)
> +  if (pyobj_initial_sval)
> +    {
> +      // increment_region_refcnt (region_to_refcnt,
> +      //                          pyobj_initial_sval->get_region ());
> +      return;
> +    }
> +
> +  const region *pyobj_region = pyobj_region_sval->get_pointee ();
> +  if (!pyobj_region || seen.contains (pyobj_region))
> +    return;
> +
> +  seen.add (pyobj_region);
> +
> +  if (pyobj_ptr_sval->get_type () == pyobj_ptr_tree)
> +    increment_region_refcnt (region_to_refcnt, pyobj_region);
> +
> +  const auto *curr_store = model->get_store ();
> +  const auto *retval_cluster = curr_store->get_cluster (pyobj_region);
> +  if (!retval_cluster)
> +    return;
> +
> +  const auto &retval_binding_map = retval_cluster->get_map ();
> +
> +  for (const auto &binding : retval_binding_map)
> +    {
> +      const svalue *binding_sval = binding.second;
> +      const svalue *unwrapped_sval = binding_sval->unwrap_any_unmergeable ();
> +      const region *pointee = unwrapped_sval->maybe_get_region ();
> +
> +      if (pointee && pointee->get_kind () == RK_HEAP_ALLOCATED)
> +       count_expected_pyobj_references (model, region_to_refcnt, binding_sval,
> +                                        seen);
> +    }
> +}
> +
> +/* Compare ob_refcnt field vs the actual reference count of a region */
> +static void
> +check_refcnt (const region_model *model, region_model_context *ctxt,
> +             const hash_map<const ana::region *,
> +                            int>::iterator::reference_pair region_refcnt)
> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  const auto &curr_region = region_refcnt.first;
> +  const auto &actual_refcnt = region_refcnt.second;
> +  const svalue *ob_refcnt_sval = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
> +  const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> +      ob_refcnt_sval->get_type (), actual_refcnt);
> +
> +  if (ob_refcnt_sval != actual_refcnt_sval)
> +  {
> +    // todo: fix this
> +    tree reg_tree = model->get_representative_tree (curr_region);
> +    const exploded_graph *eg = ctxt->get_eg ();
> +    if (!eg)
> +      return;
> +    refcnt_stmt_finder finder (*eg, reg_tree);
> +    auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
> +                                           actual_refcnt_sval, reg_tree);
> +    ctxt->warn (std::move (pd), &finder);
> +  }
> +}
> +
> +static void
> +check_refcnts (const region_model *model, const svalue *retval,
> +           region_model_context *ctxt,
> +           hash_map<const region *, int> &region_to_refcnt)
> +{
> +  for (const auto &region_refcnt : region_to_refcnt)
> +    {
> +      check_refcnt (model, ctxt, region_refcnt);
> +    }
> +}
> +
> +/* Validate the reference counts of PyObjects reachable from RETVAL.  */
> +void
> +pyobj_refcnt_checker (const region_model *model, const svalue *retval,
> +                   region_model_context *ctxt)
> +{
> +  if (!ctxt)
> +    return;
> +
> +  auto region_to_refcnt = hash_map<const region *, int> ();
> +  auto seen_regions = hash_set<const region *> ();
> +
> +  count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
> +  check_refcnts (model, retval, ctxt, region_to_refcnt);
> +}
> +
> +/* Counts the actual pyobject references from all clusters in the model's
> + * store. */
> +static void
> +count_all_references (const region_model *model,
> +                     hash_map<const region *, int> &region_to_refcnt)
> +{
> +  for (const auto &cluster : *model->get_store ())
> +  {
> +    auto curr_region = cluster.first;
> +    if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
> +      continue;
> +
> +    increment_region_refcnt (region_to_refcnt, curr_region);
> +
> +    auto binding_cluster = cluster.second;
> +    for (const auto &binding : binding_cluster->get_map ())
> +    {
> +         const svalue *binding_sval = binding.second;
> +
> +         const svalue *unwrapped_sval
> +             = binding_sval->unwrap_any_unmergeable ();
> +         // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> +         // continue;
> +
> +         const region *pointee = unwrapped_sval->maybe_get_region ();
> +         if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
> +           continue;
> +
> +         increment_region_refcnt (region_to_refcnt, pointee);
> +    }
> +  }
> +}
> +
> +static void
> +dump_refcnt_info (const hash_map<const region *, int> &region_to_refcnt,
> +                 const region_model *model, region_model_context *ctxt)
> +{
> +  region_model_manager *mgr = model->get_manager ();
> +  pretty_printer pp;
> +  pp_format_decoder (&pp) = default_tree_printer;
> +  pp_show_color (&pp) = pp_show_color (global_dc->printer);
> +  pp.buffer->stream = stderr;
> +
> +  for (const auto &region_refcnt : region_to_refcnt)
> +  {
> +    auto region = region_refcnt.first;
> +    auto actual_refcnt = region_refcnt.second;
> +    const svalue *ob_refcnt_sval
> +       = retrieve_ob_refcnt_sval (region, model, ctxt);
> +    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> +       ob_refcnt_sval->get_type (), actual_refcnt);
> +
> +    region->dump_to_pp (&pp, true);
> +    pp_string (&pp, " — ob_refcnt: ");
> +    ob_refcnt_sval->dump_to_pp (&pp, true);
> +    pp_string (&pp, " actual refcnt: ");
> +    actual_refcnt_sval->dump_to_pp (&pp, true);
> +    pp_newline (&pp);
> +  }
> +  pp_string (&pp, "~~~~~~~~\n");
> +  pp_flush (&pp);
> +}
> +
> +class kf_analyzer_cpython_dump_refcounts : public known_function
> +{
> +public:
> +  bool matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return cd.num_args () == 0;
> +  }
> +  void impl_call_pre (const call_details &cd) const final override
> +  {
> +    region_model_context *ctxt = cd.get_ctxt ();
> +    if (!ctxt)
> +      return;
> +    region_model *model = cd.get_model ();
> +    auto region_to_refcnt = hash_map<const region *, int> ();
> +    count_all_references (model, region_to_refcnt);
> +    dump_refcnt_info (region_to_refcnt, model, ctxt);
> +  }
> +};
> +
>  /* Some concessions were made to
>  simplify the analysis process when comparing kf_PyList_Append with the
>  real implementation. In particular, PyList_Append performs some
> @@ -927,6 +1294,10 @@ cpython_analyzer_init_cb (void *gcc_data, void * /*user_data */)
>    iface->register_known_function ("PyList_New", make_unique<kf_PyList_New> ());
>    iface->register_known_function ("PyLong_FromLong",
>                                    make_unique<kf_PyLong_FromLong> ());
> +
> +  iface->register_known_function (
> +      "__analyzer_cpython_dump_refcounts",
> +      make_unique<kf_analyzer_cpython_dump_refcounts> ());
>  }
>  } // namespace ana
>
> @@ -940,8 +1311,9 @@ plugin_init (struct plugin_name_args *plugin_info,
>    const char *plugin_name = plugin_info->base_name;
>    if (0)
>      inform (input_location, "got here; %qs", plugin_name);
> -  ana::register_finish_translation_unit_callback (&stash_named_types);
> -  ana::register_finish_translation_unit_callback (&stash_global_vars);
> +  register_finish_translation_unit_callback (&stash_named_types);
> +  register_finish_translation_unit_callback (&stash_global_vars);
> +  region_model::register_pop_frame_callback (pyobj_refcnt_checker);
>    register_callback (plugin_info->base_name, PLUGIN_ANALYZER_INIT,
>                       ana::cpython_analyzer_init_cb,
>                       NULL); /* void *user_data */
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> similarity index 64%
> rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> index 19b5c17428a..9912f9105d4 100644
> --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> @@ -8,34 +8,6 @@
>  #include <Python.h>
>  #include "../analyzer/analyzer-decls.h"
>
> -PyObject *
> -test_PyList_New (Py_ssize_t len)
> -{
> -  PyObject *obj = PyList_New (len);
> -  if (obj)
> -    {
> -     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> -     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> -    }
> -  else
> -    __analyzer_dump_path (); /* { dg-message "path" } */
> -  return obj;
> -}
> -
> -PyObject *
> -test_PyLong_New (long n)
> -{
> -  PyObject *obj = PyLong_FromLong (n);
> -  if (obj)
> -    {
> -     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> -     __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
> -    }
> -  else
> -    __analyzer_dump_path (); /* { dg-message "path" } */
> -  return obj;
> -}
> -
>  PyObject *
>  test_PyListAppend (long n)
>  {
> @@ -43,6 +15,7 @@ test_PyListAppend (long n)
>    PyObject *list = PyList_New (0);
>    PyList_Append(list, item);
>    return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
>  }
>
>  PyObject *
> @@ -67,6 +40,7 @@ test_PyListAppend_2 (long n)
>    else
>      __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
>    return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
>  }
>
>
> @@ -75,4 +49,30 @@ test_PyListAppend_3 (PyObject *item, PyObject *list)
>  {
>    PyList_Append (list, item);
>    return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_4 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_5 ()
> +{
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, NULL);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_6 ()
> +{
> +  PyObject *item = NULL;
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
>  }
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
> new file mode 100644
> index 00000000000..492d4f7d58d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_New.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyList_New (Py_ssize_t len)
> +{
> +  PyObject *obj = PyList_New (len);
> +  if (obj)
> +    {
> +     __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +     __analyzer_eval (PyList_CheckExact (obj)); /* { dg-warning "TRUE" } */
> +    }
> +  else
> +    __analyzer_dump_path (); /* { dg-message "path" } */
> +  return obj;
> +}
> +
> +void
> +test_PyList_New_2 ()
> +{
> +  PyObject *obj = PyList_New (0);
> +} /* { dg-warning "leak of 'obj'" } */
> +
> +PyObject *test_stray_incref_PyList ()
> +{
> +  PyObject *p = PyList_New (2);
> +  if (p)
> +    Py_INCREF (p);
> +  return p;
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
> new file mode 100644
> index 00000000000..97b29849302
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyLong_New (long n)
> +{
> +  PyObject *obj = PyLong_FromLong (n);
> +  if (obj)
> +    {
> +      __analyzer_eval (obj->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +      __analyzer_eval (PyLong_CheckExact (obj)); /* { dg-warning "TRUE" } */
> +    }
> +  else
> +    __analyzer_dump_path (); /* { dg-message "path" } */
> +  return obj;
> +}
> +
> +void
> +test_PyLong_New_2 (long n)
> +{
> +  PyObject *obj = PyLong_FromLong (n);
> +} /* { dg-warning "leak of 'obj'" } */
> +
> +PyObject *test_stray_incref_PyLong (long val)
> +{
> +  PyObject *p = PyLong_FromLong (val);
> +  if (p)
> +    Py_INCREF (p);
> +  return p;
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
> similarity index 100%
> rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c
> rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
> new file mode 100644
> index 00000000000..9912f9105d4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_PyListAppend (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, item);
> +  return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> +
> +PyObject *
> +test_PyListAppend_2 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  if (!item)
> +    return NULL;
> +
> +  __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +  PyObject *list = PyList_New (n);
> +  if (!list)
> +    {
> +      Py_DECREF (item);
> +      return NULL;
> +    }
> +
> +  __analyzer_eval (list->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +
> +  if (PyList_Append (list, item) < 0)
> +    __analyzer_eval (item->ob_refcnt == 1); /* { dg-warning "TRUE" } */
> +  else
> +    __analyzer_eval (item->ob_refcnt == 2); /* { dg-warning "TRUE" } */
> +  return list; /* { dg-warning "leak of 'item'" } */
> +  /* { dg-warning "reference count" "" { target *-*-* } .-1 } */
> +}
> +
> +
> +PyObject *
> +test_PyListAppend_3 (PyObject *item, PyObject *list)
> +{
> +  PyList_Append (list, item);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_4 (long n)
> +{
> +  PyObject *item = PyLong_FromLong (n);
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_5 ()
> +{
> +  PyObject *list = PyList_New (0);
> +  PyList_Append(list, NULL);
> +  return list;
> +}
> +
> +PyObject *
> +test_PyListAppend_6 ()
> +{
> +  PyObject *item = NULL;
> +  PyObject *list = NULL;
> +  PyList_Append(list, item);
> +  return list;
> +}
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> index e1ed2d2589e..cbef6da8d86 100644
> --- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
> +++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> @@ -161,8 +161,9 @@ set plugin_test_list [list \
>           taint-CVE-2011-0521-6.c \
>           taint-antipatterns-1.c } \
>      { analyzer_cpython_plugin.c \
> -         cpython-plugin-test-1.c \
> -         cpython-plugin-test-2.c } \
> +         cpython-plugin-test-PyList_Append.c \
> +         cpython-plugin-test-PyList_New.c \
> +         cpython-plugin-test-PyLong_FromLong.c } \
>  ]
>
>  foreach plugin_test $plugin_test_list {
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29  4:35                                                               ` Eric Feng
@ 2023-08-29 17:28                                                                 ` Eric Feng
  2023-08-29 21:14                                                                   ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-29 17:28 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, gcc-patches, Eric Feng

Additionally, by using the old model and the pointer per your suggestion,
we are able to find the representative tree and emit a more accurate diagnostic!

rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but ob_refcnt field is: ‘2’
   23 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
    |......
    |   23 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |
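To see why the checker expects 1 here: a successful PyList_Append bumps item's ob_refcnt to 2 (the same thing the PyList_Append tests assert), but once the function returns, presumably the only reference still counted is the one held in list's item buffer. The toy C sketch below illustrates that comparison. struct toy_obj, toy_count_references and toy_refcnt_mismatch are hypothetical names, and the flat array of "live" pointers is a simplification standing in for the plugin's traversal of the analyzer's store, not its actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object: only the reference count field matters.  */
struct toy_obj
{
  long ob_refcnt;
};

/* Count how many of the N_SLOTS pointers in SLOTS refer to OBJ.  SLOTS
   stands in for every pointer still reachable once the function has
   returned (for example a list's item buffer and the return value).  */
static long
toy_count_references (const struct toy_obj *const *slots, size_t n_slots,
                      const struct toy_obj *obj)
{
  assert (slots != NULL || n_slots == 0);
  long count = 0;
  for (size_t i = 0; i < n_slots; i++)
    if (slots[i] == obj)
      count++;
  return count;
}

/* Nonzero if OBJ's ob_refcnt differs from the references actually found.  */
static int
toy_refcnt_mismatch (const struct toy_obj *const *slots, size_t n_slots,
                     const struct toy_obj *obj)
{
  return toy_count_references (slots, n_slots, obj) != obj->ob_refcnt;
}
```

With item at ob_refcnt 2 but reachable only from the list's buffer, and list at ob_refcnt 1 reachable only as the return value, only item is flagged, which is the shape of the rc3.c warning above.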

If a representative tree is not found, I decided we should just bail out
of emitting a diagnostic for now, to avoid confusing the user about what
the problem is.
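The ordering in the patch is what makes the name lookup work: pop_frame now copies the model into pre_popped_model before unbind_region_and_descendents discards the frame's locals, and passes that copy to every pop-frame callback as prev_model. The toy C sketch below mirrors that ordering and the bail-out when no name is found. struct toy_model, toy_pop_frame, toy_refcnt_name_cb and the other names are hypothetical simplifications, not the analyzer's region_model API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the analyzer's region_model: it records only
   which local variable names point at which objects.  */
#define TOY_MAX_LOCALS 8

struct toy_model
{
  const char *names[TOY_MAX_LOCALS];
  const void *targets[TOY_MAX_LOCALS];
  int n_locals;
};

/* Callback shape modelled on the patched pop_frame_callback: it gets both
   the post-pop model and a copy taken before the frame was popped.  Here it
   returns the name to use in a diagnostic, or NULL to bail out.  */
typedef const char *(*toy_pop_frame_cb) (const struct toy_model *model,
                                         const struct toy_model *prev_model,
                                         const void *obj);

/* Return the name of a local that points at OBJ, or NULL if there is none
   (loosely analogous to get_representative_tree finding no tree).  */
static const char *
toy_representative_name (const struct toy_model *model, const void *obj)
{
  assert (model->n_locals >= 0 && model->n_locals <= TOY_MAX_LOCALS);
  for (int i = 0; i < model->n_locals; i++)
    if (model->targets[i] == obj)
      return model->names[i];
  return NULL;
}

/* Look the name up in PREV_MODEL: MODEL has already lost the locals.  */
static const char *
toy_refcnt_name_cb (const struct toy_model *model,
                    const struct toy_model *prev_model, const void *obj)
{
  (void) model;
  return toy_representative_name (prev_model, obj);
}

/* Snapshot the model, drop the frame's locals, then notify the callback,
   in the same order as the patched region_model::pop_frame.  */
static const char *
toy_pop_frame (struct toy_model *model, toy_pop_frame_cb cb, const void *obj)
{
  const struct toy_model pre_popped = *model;
  model->n_locals = 0;
  return cb (model, &pre_popped, obj);
}
```

Looking the object up in the post-pop model would always yield NULL, since the locals are gone; an object that no named local pointed at yields NULL even from the snapshot, and the callback then reports nothing, matching the bail-out described above.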

I've attached the patch for this (on top of the previous one) below. If
it also looks good, I can merge it with the last patch and push it in at
the same time.

Best,
Eric

---
 gcc/analyzer/region-model.cc                  |  3 +-
 gcc/analyzer/region-model.h                   |  7 ++--
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 35 +++++++++++--------
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index eb4f976b83a..c1d266d351b 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5391,6 +5391,7 @@ region_model::pop_frame (tree result_lvalue,
 {
   gcc_assert (m_current_frame);
 
+  const region_model pre_popped_model = *this;
   const frame_region *frame_reg = m_current_frame;
 
   /* Notify state machines.  */
@@ -5424,7 +5425,7 @@ region_model::pop_frame (tree result_lvalue,
     }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
-  notify_on_pop_frame (this, retval, ctxt);
+  notify_on_pop_frame (this, &pre_popped_model, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 440ea6d828d..b89c6f6c649 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -237,6 +237,7 @@ public:
 struct append_regions_cb_data;
 
 typedef void (*pop_frame_callback) (const region_model *model,
+				    const region_model *prev_model,
 				    const svalue *retval,
 				    region_model_context *ctxt);
 
@@ -543,11 +544,13 @@ class region_model
   }
 
   static void
-  notify_on_pop_frame (const region_model *model, const svalue *retval,
+  notify_on_pop_frame (const region_model *model,
+		       const region_model *prev_model,
+		       const svalue *retval,
 		       region_model_context *ctxt)
   {
     for (auto &callback : pop_frame_callbacks)
-	callback (model, retval, ctxt);
+	callback (model, prev_model, retval, ctxt);
   }
 
 private:
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index b2caed8fc1b..6f0a355fe30 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -318,11 +318,10 @@ public:
     auto actual_refcnt
 	= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
     auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-    warned = warning_meta (
-	rich_loc, m, get_controlling_option (),
-	"expected <variable name belonging to m_base_region> to have "
-	"reference count: %qE but ob_refcnt field is: %qE",
-	actual_refcnt, ob_refcnt);
+    warned = warning_meta (rich_loc, m, get_controlling_option (),
+			   "expected %qE to have "
+			   "reference count: %qE but ob_refcnt field is: %qE",
+			   m_reg_tree, actual_refcnt, ob_refcnt);
 
     // location_t loc = rich_loc->get_loc ();
     // foo (loc);
@@ -425,7 +424,8 @@ count_expected_pyobj_references (const region_model *model,
 
 /* Compare ob_refcnt field vs the actual reference count of a region */
 static void
-check_refcnt (const region_model *model, region_model_context *ctxt,
+check_refcnt (const region_model *model, const region_model *old_model,
+	      region_model_context *ctxt,
 	      const hash_map<const ana::region *,
 			     int>::iterator::reference_pair region_refcnt)
 {
@@ -438,8 +438,11 @@ check_refcnt (const region_model *model, region_model_context *ctxt,
 
   if (ob_refcnt_sval != actual_refcnt_sval)
   {
-    // todo: fix this
-    tree reg_tree = model->get_representative_tree (curr_region);
+    const svalue *curr_reg_sval
+	= mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region);
+    tree reg_tree = old_model->get_representative_tree (curr_reg_sval);
+    if (!reg_tree)
+      return;
 
     const auto &eg = ctxt->get_eg ();
     refcnt_stmt_finder finder (*eg, reg_tree);
@@ -451,20 +454,22 @@ check_refcnt (const region_model *model, region_model_context *ctxt,
 }
 
 static void
-check_refcnts (const region_model *model, const svalue *retval,
-	    region_model_context *ctxt,
-	    hash_map<const region *, int> &region_to_refcnt)
+check_refcnts (const region_model *model, const region_model *old_model,
+	       const svalue *retval, region_model_context *ctxt,
+	       hash_map<const region *, int> &region_to_refcnt)
 {
   for (const auto &region_refcnt : region_to_refcnt)
   {
-    check_refcnt(model, ctxt, region_refcnt);
+    check_refcnt (model, old_model, ctxt, region_refcnt);
   }
 }
 
 /* Validates the reference count of all Python objects. */
 void
-pyobj_refcnt_checker (const region_model *model, const svalue *retval,
-		    region_model_context *ctxt)
+pyobj_refcnt_checker (const region_model *model,
+		      const region_model *old_model,
+		      const svalue *retval,
+		      region_model_context *ctxt)
 {
   if (!ctxt)
   return;
@@ -473,7 +478,7 @@ pyobj_refcnt_checker (const region_model *model, const svalue *retval,
   auto seen_regions = hash_set<const region *> ();
 
   count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
-  check_refcnts (model, retval, ctxt, region_to_refcnt);
+  check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
 }
 
 /* Counts the actual pyobject references from all clusters in the model's
-- 
2.30.2



* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29 17:28                                                                 ` Eric Feng
@ 2023-08-29 21:14                                                                   ` David Malcolm
  2023-08-30 22:15                                                                     ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-29 21:14 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc, gcc-patches

On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> Additionally, by using the old model and the pointer per your suggestion,
> we are able to find the representative tree and emit a more accurate
> diagnostic!
> 
> rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but ob_refcnt field is: ‘2’
>    23 |   return list;
>       |          ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(3);
>     |      |                    ^~~~~~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (1) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(1);
>     |      |                    ~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (2) when ‘PyList_New’ succeeds
>     |......
>     |   14 |   PyList_Append(list, item);
>     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |   |
>     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
>     |......
>     |   23 |   return list;
>     |      |          ~~~~
>     |      |          |
>     |      |          (4) here
>     |

Excellent, that's a big improvement.

> 
> If a representative tree is not found, I decided we should just bail out
> of emitting a diagnostic for now, to avoid confusing the user on what
> the problem is.

Fair enough.

> 
> I've attached the patch for this (on top of the previous one) below. If
> it also looks good, I can merge it with the last patch and push it in at
> the same time.

I don't mind either way, but please can you update the tests so that we
have some automated test coverage that the correct name is being
printed in the warning.

Thanks
Dave



* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29 21:14                                                                   ` David Malcolm
@ 2023-08-30 22:15                                                                     ` Eric Feng
  2023-08-31 17:01                                                                       ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-30 22:15 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc, gcc-patches

On Tue, Aug 29, 2023 at 5:14 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > Additionally, by using the old model and the pointer per your suggestion,
> > we are able to find the representative tree and emit a more accurate
> > diagnostic!
> >
> > rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but ob_refcnt field is: ‘2’
> >    23 |   return list;
> >       |          ^~~~
> >   ‘create_py_object’: events 1-4
> >     |
> >     |    4 |   PyObject* item = PyLong_FromLong(3);
> >     |      |                    ^~~~~~~~~~~~~~~~~~
> >     |      |                    |
> >     |      |                    (1) when ‘PyLong_FromLong’ succeeds
> >     |    5 |   PyObject* list = PyList_New(1);
> >     |      |                    ~~~~~~~~~~~~~
> >     |      |                    |
> >     |      |                    (2) when ‘PyList_New’ succeeds
> >     |......
> >     |   14 |   PyList_Append(list, item);
> >     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
> >     |      |   |
> >     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
> >     |......
> >     |   23 |   return list;
> >     |      |          ~~~~
> >     |      |          |
> >     |      |          (4) here
> >     |
>
> Excellent, that's a big improvement.
>
> >
> > If a representative tree is not found, I decided we should just bail out
> > of emitting a diagnostic for now, to avoid confusing the user on what
> > the problem is.
>
> Fair enough.
>
> >
> > I've attached the patch for this (on top of the previous one) below. If
> > it also looks good, I can merge it with the last patch and push it in at
> > the same time.
>
> I don't mind either way, but please can you update the tests so that we
> have some automated test coverage that the correct name is being
> printed in the warning.
>
> Thanks
> Dave
>

Sorry — forgot to hit 'reply all' in the previous e-mail. Resending to
preserve our chain on the list:

---

Thanks; pushed to trunk with nits fixed:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.

Incidentally, I updated my formatting settings in VSCode, which I've
previously mentioned in passing. In case anyone is interested:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
AlignAfterOpenBracket: Align,
AllowAllParametersOfDeclarationOnNextLine: true }",

This fixes some issues with the indent width and also ensures function
parameters of appropriate length are aligned properly and on a new
line each (like the rest of the analyzer code).


* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-30 22:15                                                                     ` Eric Feng
@ 2023-08-31 17:01                                                                       ` David Malcolm
  2023-08-31 19:09                                                                         ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-31 17:01 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc, gcc-patches

On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
> On Tue, Aug 29, 2023 at 5:14 PM David Malcolm <dmalcolm@redhat.com> wrote:
> > 
> > On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > > Additionally, by using the old model and the pointer per your suggestion,
> > > we are able to find the representative tree and emit a more accurate
> > > diagnostic!
> > > 
> > > rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but ob_refcnt field is: ‘2’
> > >    23 |   return list;
> > >       |          ^~~~
> > >   ‘create_py_object’: events 1-4
> > >     |
> > >     |    4 |   PyObject* item = PyLong_FromLong(3);
> > >     |      |                    ^~~~~~~~~~~~~~~~~~
> > >     |      |                    |
> > >     |      |                    (1) when ‘PyLong_FromLong’ succeeds
> > >     |    5 |   PyObject* list = PyList_New(1);
> > >     |      |                    ~~~~~~~~~~~~~
> > >     |      |                    |
> > >     |      |                    (2) when ‘PyList_New’ succeeds
> > >     |......
> > >     |   14 |   PyList_Append(list, item);
> > >     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
> > >     |      |   |
> > >     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > >     |......
> > >     |   23 |   return list;
> > >     |      |          ~~~~
> > >     |      |          |
> > >     |      |          (4) here
> > >     |
> > 
> > Excellent, that's a big improvement.
> > 
> > > 
> > > If a representative tree is not found, I decided we should just bail out
> > > of emitting a diagnostic for now, to avoid confusing the user on what
> > > the problem is.
> > 
> > Fair enough.
> > 
> > > 
> > > I've attached the patch for this (on top of the previous one) below. If
> > > it also looks good, I can merge it with the last patch and push it in at
> > > the same time.
> > 
> > I don't mind either way, but please can you update the tests so that we
> > have some automated test coverage that the correct name is being
> > printed in the warning.
> > 
> > Thanks
> > Dave
> > 
> 
> Sorry — forgot to hit 'reply all' in the previous e-mail. Resending to
> preserve our chain on the list:
> 
> ---
> 
> Thanks; pushed to trunk with nits fixed:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.

Thanks; looks good.

Do you want to add this to the GCC 14 part of the "History" section on
the wiki page:
  https://gcc.gnu.org/wiki/StaticAnalyzer
or should I?

> 
> Incidentally, I updated my formatting settings in VSCode, which I've
> previously mentioned in passing. In case anyone is interested:
> 
> "C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
> TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
> AlignAfterOpenBracket: Align,
> AllowAllParametersOfDeclarationOnNextLine: true }",
> 
> This fixes some issues with the indent width and also ensures function
> parameters of appropriate length are aligned properly and on a new
> line each (like the rest of the analyzer code).

Thanks
Dave




* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-31 17:01                                                                       ` David Malcolm
@ 2023-08-31 19:09                                                                         ` Eric Feng
  2023-08-31 20:19                                                                           ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-08-31 19:09 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

On Thu, Aug 31, 2023 at 1:01 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
> > On Tue, Aug 29, 2023 at 5:14 PM David Malcolm <dmalcolm@redhat.com> wrote:
> > >
> > > On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > > > Additionally, by using the old model and the pointer per your suggestion,
> > > > we are able to find the representative tree and emit a more accurate
> > > > diagnostic!
> > > >
> > > > rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but ob_refcnt field is: ‘2’
> > > >    23 |   return list;
> > > >       |          ^~~~
> > > >   ‘create_py_object’: events 1-4
> > > >     |
> > > >     |    4 |   PyObject* item = PyLong_FromLong(3);
> > > >     |      |                    ^~~~~~~~~~~~~~~~~~
> > > >     |      |                    |
> > > >     |      |                    (1) when ‘PyLong_FromLong’ succeeds
> > > >     |    5 |   PyObject* list = PyList_New(1);
> > > >     |      |                    ~~~~~~~~~~~~~
> > > >     |      |                    |
> > > >     |      |                    (2) when ‘PyList_New’ succeeds
> > > >     |......
> > > >     |   14 |   PyList_Append(list, item);
> > > >     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
> > > >     |      |   |
> > > >     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > > >     |......
> > > >     |   23 |   return list;
> > > >     |      |          ~~~~
> > > >     |      |          |
> > > >     |      |          (4) here
> > > >     |
> > >
> > > Excellent, that's a big improvement.
> > >
> > > >
> > > > If a representative tree is not found, I decided we should just bail out
> > > > of emitting a diagnostic for now, to avoid confusing the user on what
> > > > the problem is.
> > >
> > > Fair enough.
> > >
> > > >
> > > > I've attached the patch for this (on top of the previous one) below. If
> > > > it also looks good, I can merge it with the last patch and push it in at
> > > > the same time.
> > >
> > > I don't mind either way, but please can you update the tests so that we
> > > have some automated test coverage that the correct name is being
> > > printed in the warning.
> > >
> > > Thanks
> > > Dave
> > >
> >
> > Sorry — forgot to hit 'reply all' in the previous e-mail. Resending to
> > preserve our chain on the list:
> >
> > ---
> >
> > Thanks; pushed to trunk with nits fixed:
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.
>
> Thanks; looks good.
>
> Do you want to add this to the GCC 14 part of the "History" section on
> the wiki page:
>   https://gcc.gnu.org/wiki/StaticAnalyzer
> or should I?
Happy to add it myself, but I'm not finding an option to edit the page
(created an account under efric@gcc.gnu.org). Do I need to be added to
the EditorGroup (https://gcc.gnu.org/wiki/EditorGroup) to do so?

>
> >
> > Incidentally, I updated my formatting settings in VSCode, which I've
> > previously mentioned in passing. In case anyone is interested:
> >
> > "C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
> > TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
> > AlignAfterOpenBracket: Align,
> > AllowAllParametersOfDeclarationOnNextLine: true }",
> >
> > This fixes some issues with the indent width and also ensures function
> > parameters of appropriate length are aligned properly and on a new
> > line each (like the rest of the analyzer code).
>
> Thanks
> Dave
>
>


* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-31 19:09                                                                         ` Eric Feng
@ 2023-08-31 20:19                                                                           ` David Malcolm
  2023-09-01  1:25                                                                             ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-08-31 20:19 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Thu, 2023-08-31 at 15:09 -0400, Eric Feng wrote:
> On Thu, Aug 31, 2023 at 1:01 PM David Malcolm <dmalcolm@redhat.com> wrote:
> > 
> > On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:

[...]

> > > 
> > > Thanks; pushed to trunk with nits fixed:
> > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.
> > 
> > Thanks; looks good.
> > 
> > Do you want to add this to the GCC 14 part of the "History" section on
> > the wiki page:
> >   https://gcc.gnu.org/wiki/StaticAnalyzer
> > or should I?
> Happy to add it myself, but I'm not finding an option to edit the page
> (created an account under efric@gcc.gnu.org). Do I need to be added to
> the EditorGroup (https://gcc.gnu.org/wiki/EditorGroup) to do so?

I can do this.  What's your account's WikiName?

Thanks
Dave



* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-31 20:19                                                                           ` David Malcolm
@ 2023-09-01  1:25                                                                             ` Eric Feng
  2023-09-01 11:57                                                                               ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-09-01  1:25 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

On Thu, Aug 31, 2023 at 4:19 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Thu, 2023-08-31 at 15:09 -0400, Eric Feng wrote:
> > On Thu, Aug 31, 2023 at 1:01 PM David Malcolm <dmalcolm@redhat.com> wrote:
> > >
> > > On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
>
> [...]
>
> > > >
> > > > Thanks; pushed to trunk with nits fixed:
> > > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.
> > >
> > > Thanks; looks good.
> > >
> > > Do you want to add this to the GCC 14 part of the "History" section on
> > > the wiki page:
> > >   https://gcc.gnu.org/wiki/StaticAnalyzer
> > > or should I?
> > Happy to add it myself, but I'm not finding an option to edit the page
> > (created an account under efric@gcc.gnu.org). Do I need to be added to
> > the EditorGroup (https://gcc.gnu.org/wiki/EditorGroup) to do so?
>
> I can do this.  What's your account's WikiName ?
Thank you — it is EricFeng.
>
> Thanks
> Dave
>


* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-09-01  1:25                                                                             ` Eric Feng
@ 2023-09-01 11:57                                                                               ` David Malcolm
  2023-09-05  2:13                                                                                 ` [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646] Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-09-01 11:57 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc

On Thu, 2023-08-31 at 21:25 -0400, Eric Feng wrote:
> On Thu, Aug 31, 2023 at 4:19 PM David Malcolm <dmalcolm@redhat.com> wrote:
> > 
> > On Thu, 2023-08-31 at 15:09 -0400, Eric Feng wrote:
> > > On Thu, Aug 31, 2023 at 1:01 PM David Malcolm <dmalcolm@redhat.com> wrote:
> > > > 
> > > > On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
> > 
> > [...]
> > 
> > > > > 
> > > > > Thanks; pushed to trunk with nits fixed:
> > > > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.
> > > > 
> > > > Thanks; looks good.
> > > > 
> > > > Do you want to add this to the GCC 14 part of the "History" section on
> > > > the wiki page:
> > > >   https://gcc.gnu.org/wiki/StaticAnalyzer
> > > > or should I?
> > > Happy to add it myself, but I'm not finding an option to edit the page
> > > (created an account under efric@gcc.gnu.org). Do I need to be added to
> > > the EditorGroup (https://gcc.gnu.org/wiki/EditorGroup) to do so?
> > 
> > I can do this.  What's your account's WikiName ?
> Thank you — it is EricFeng.
> 

I've added EricFeng to that page, so hopefully you should be able to
make wiki edits now.

Dave



* [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]
  2023-09-01 11:57                                                                               ` David Malcolm
@ 2023-09-05  2:13                                                                                 ` Eric Feng
  2023-09-07 17:28                                                                                   ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-09-05  2:13 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, gcc-patches, Eric Feng

Hi Dave,

Recently I've been working on symbolic value support for the reference
count checker. I've attached a patch for it below; let me know if it looks
OK for trunk. Thanks!

Best,
Eric

---

This patch enhances the reference count checker in the CPython plugin by
adding support for symbolic values. Previously, we could only check the
reference count of PyObject* objects created within the scope of the
function; we can now emit diagnostics for reference count mismatches on
objects that were, for example, passed in as function parameters.

rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt field is N + ‘2’
    6 |   return obj;
      |          ^~~
  ‘create_py_object2’: event 1
    |
    |    6 |   return obj;
    |      |          ^~~
    |      |          |
    |      |          (1) here
    |
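The N + 1 / N + 2 form in the warning above stands for an unknown starting count, as with an object passed in as a parameter. One way to think about comparing such counts: if both sides share the same unknown base N, the base cancels and only the constant offsets matter. The toy C sketch below illustrates that idea under that assumption; struct sym_refcnt and sym_refcnt_compare are hypothetical and do not reflect how the analyzer represents symbolic values (it uses svalues).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation of a count of the form BASE + OFFSET, where
   BASE names an unknown starting value (e.g. the ob_refcnt an object
   passed in as a parameter had on entry).  */
struct sym_refcnt
{
  int base;
  long offset;
};

/* Compare EXPECTED with ACTUAL.  If both share the same unknown base, the
   base cancels out: store whether the offsets differ in *MISMATCH and
   return true.  Otherwise return false: this sketch cannot decide.  */
static bool
sym_refcnt_compare (struct sym_refcnt expected, struct sym_refcnt actual,
                    bool *mismatch)
{
  assert (mismatch != NULL);
  if (expected.base != actual.base)
    return false;
  *mismatch = expected.offset != actual.offset;
  return true;
}
```

With expected = N + 1 and actual = N + 2 this reports a mismatch; counts built on different unknown bases are left undecided rather than reported, which errs on the side of fewer false positives.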


gcc/testsuite/ChangeLog:
	PR analyzer/107646
	* gcc.dg/plugin/analyzer_cpython_plugin.c: Support reference count checking
	of symbolic values.
	* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: New test.
	* gcc.dg/plugin/plugin.exp: Add new test.
	* gcc.dg/plugin/cpython-plugin-test-refcnt.c: New test.

Signed-off-by: Eric Feng <ef2648@columbia.edu>

---
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 133 +++++++++++-------
 .../cpython-plugin-test-PyList_Append.c       |  21 ++-
 .../plugin/cpython-plugin-test-refcnt.c       |  18 +++
 gcc/testsuite/gcc.dg/plugin/plugin.exp        |   1 +
 4 files changed, 118 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c

diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index bf1982e79c3..d7ecd7fce09 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -314,17 +314,20 @@ public:
   {
     diagnostic_metadata m;
     bool warned;
-    // just assuming constants for now
-    auto actual_refcnt
-	= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-    warned = warning_meta (rich_loc, m, get_controlling_option (),
-			   "expected %qE to have "
-			   "reference count: %qE but ob_refcnt field is: %qE",
-			   m_reg_tree, actual_refcnt, ob_refcnt);
-
-    // location_t loc = rich_loc->get_loc ();
-    // foo (loc);
+
+    const auto *actual_refcnt_constant
+	= m_actual_refcnt->dyn_cast_constant_svalue ();
+    const auto *ob_refcnt_constant = m_ob_refcnt->dyn_cast_constant_svalue ();
+    if (!actual_refcnt_constant || !ob_refcnt_constant)
+      return false;
+
+    auto actual_refcnt = actual_refcnt_constant->get_constant ();
+    auto ob_refcnt = ob_refcnt_constant->get_constant ();
+    warned = warning_meta (
+	rich_loc, m, get_controlling_option (),
+	"expected %qE to have "
+	"reference count: N + %qE but ob_refcnt field is N + %qE",
+	m_reg_tree, actual_refcnt, ob_refcnt);
     return warned;
   }
 
@@ -336,10 +339,6 @@ public:
 
 private:
 
-  void foo(location_t loc) const 
-  {
-    inform(loc, "something is up right here");
-  }
   const region *m_base_region;
   const svalue *m_ob_refcnt;
   const svalue *m_actual_refcnt;
@@ -369,6 +368,19 @@ increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
   refcnt = existed ? refcnt + 1 : 1;
 }
 
+static const region *
+get_region_from_svalue (const svalue *sval, region_model_manager *mgr)
+{
+  const auto *region_sval = sval->dyn_cast_region_svalue ();
+  if (region_sval)
+    return region_sval->get_pointee ();
+
+  const auto *initial_sval = sval->dyn_cast_initial_svalue ();
+  if (initial_sval)
+    return mgr->get_symbolic_region (initial_sval);
+
+  return nullptr;
+}
 
 /* Recursively fills in region_to_refcnt with the references owned by
    pyobj_ptr_sval.  */
@@ -381,20 +393,9 @@ count_pyobj_references (const region_model *model,
   if (!pyobj_ptr_sval)
     return;
 
-  const auto *pyobj_region_sval = pyobj_ptr_sval->dyn_cast_region_svalue ();
-  const auto *pyobj_initial_sval = pyobj_ptr_sval->dyn_cast_initial_svalue ();
-  if (!pyobj_region_sval && !pyobj_initial_sval)
-    return;
-
-  // todo: support initial sval (e.g passed in as parameter)
-  if (pyobj_initial_sval)
-    {
-      //     increment_region_refcnt (region_to_refcnt,
-      // 		       pyobj_initial_sval->get_region ());
-      return;
-    }
+  region_model_manager *mgr = model->get_manager ();
 
-  const region *pyobj_region = pyobj_region_sval->get_pointee ();
+  const region *pyobj_region = get_region_from_svalue (pyobj_ptr_sval, mgr);
   if (!pyobj_region || seen.contains (pyobj_region))
     return;
 
@@ -409,49 +410,75 @@ count_pyobj_references (const region_model *model,
     return;
 
   const auto &retval_binding_map = retval_cluster->get_map ();
-
   for (const auto &binding : retval_binding_map)
     {
-      const svalue *binding_sval = binding.second;
-      const svalue *unwrapped_sval = binding_sval->unwrap_any_unmergeable ();
-      const region *pointee = unwrapped_sval->maybe_get_region ();
-
-      if (pointee && pointee->get_kind () == RK_HEAP_ALLOCATED)
+      const svalue *binding_sval = binding.second->unwrap_any_unmergeable ();
+      if (get_region_from_svalue (binding_sval, mgr))
 	count_pyobj_references (model, region_to_refcnt, binding_sval, seen);
     }
 }
 
+static void
+unwrap_any_ob_refcnt_sval (const svalue *&ob_refcnt_sval)
+{
+  if (ob_refcnt_sval->get_kind () != SK_CONSTANT)
+    {
+      auto unwrap_cast = ob_refcnt_sval->maybe_undo_cast ();
+      if (!unwrap_cast)
+	unwrap_cast = ob_refcnt_sval;
+
+      if (unwrap_cast->get_kind () == SK_BINOP)
+	ob_refcnt_sval = unwrap_cast->dyn_cast_binop_svalue ()->get_arg1 ();
+    }
+}
+
+static void
+handle_refcnt_mismatch (const region_model *old_model,
+			const ana::region *curr_region,
+			const svalue *ob_refcnt_sval,
+			const svalue *actual_refcnt_sval,
+			region_model_context *ctxt)
+{
+  region_model_manager *mgr = old_model->get_manager ();
+  const svalue *curr_reg_sval
+      = mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region);
+  tree reg_tree = old_model->get_representative_tree (curr_reg_sval);
+
+  if (!reg_tree)
+    return;
+
+  const auto &eg = ctxt->get_eg ();
+  refcnt_stmt_finder finder (*eg, reg_tree);
+  auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
+					  actual_refcnt_sval, reg_tree);
+  if (pd && eg)
+    ctxt->warn (std::move (pd), &finder);
+}
+
 /* Compare ob_refcnt field vs the actual reference count of a region */
 static void
 check_refcnt (const region_model *model,
 	      const region_model *old_model,
 	      region_model_context *ctxt,
 	      const hash_map<const ana::region *,
-			     int>::iterator::reference_pair region_refcnt)
+			     int>::iterator::reference_pair &region_refcnt)
 {
   region_model_manager *mgr = model->get_manager ();
   const auto &curr_region = region_refcnt.first;
   const auto &actual_refcnt = region_refcnt.second;
+
   const svalue *ob_refcnt_sval
       = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
+  if (!ob_refcnt_sval)
+    return;
+
+  unwrap_any_ob_refcnt_sval (ob_refcnt_sval);
+
   const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
       ob_refcnt_sval->get_type (), actual_refcnt);
-
   if (ob_refcnt_sval != actual_refcnt_sval)
-    {
-      const svalue *curr_reg_sval
-	  = mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region);
-      tree reg_tree = old_model->get_representative_tree (curr_reg_sval);
-      if (!reg_tree)
-	return;
-
-      const auto &eg = ctxt->get_eg ();
-      refcnt_stmt_finder finder (*eg, reg_tree);
-      auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
-					      actual_refcnt_sval, reg_tree);
-      if (pd && eg)
-	ctxt->warn (std::move (pd), &finder);
-    }
+    handle_refcnt_mismatch (old_model, curr_region, ob_refcnt_sval,
+			    actual_refcnt_sval, ctxt);
 }
 
 static void
@@ -493,8 +520,6 @@ count_all_references (const region_model *model,
   for (const auto &cluster : *model->get_store ())
     {
       auto curr_region = cluster.first;
-      if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
-	continue;
 
       increment_region_refcnt (region_to_refcnt, curr_region);
 
@@ -505,8 +530,8 @@ count_all_references (const region_model *model,
 
 	  const svalue *unwrapped_sval
 	      = binding_sval->unwrap_any_unmergeable ();
-	  // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
-	  // continue;
+	  if (unwrapped_sval->get_type () != pyobj_ptr_tree)
+	  continue;
 
 	  const region *pointee = unwrapped_sval->maybe_get_region ();
 	  if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
index e1efd9efda5..46daf2f8975 100644
--- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
@@ -75,4 +75,23 @@ test_PyListAppend_6 ()
   PyObject *list = NULL;
   PyList_Append(list, item);
   return list;
-}
\ No newline at end of file
+}
+
+PyObject *
+test_PyListAppend_7 (PyObject *item)
+{
+  PyObject *list = PyList_New (0);
+  Py_INCREF(item);
+  PyList_Append(list, item);
+  return list;
+  /* { dg-warning "expected 'item' to have reference count" "" { target *-*-* } .-1 } */
+}
+
+PyObject *
+test_PyListAppend_8 (PyObject *item, PyObject *list)
+{
+  Py_INCREF(item);
+  Py_INCREF(item);
+  PyList_Append(list, item);
+  return list;
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
new file mode 100644
index 00000000000..a7f39509d6d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target analyzer } */
+/* { dg-options "-fanalyzer" } */
+/* { dg-require-python-h "" } */
+
+
+#define PY_SSIZE_T_CLEAN
+#include <Python.h>
+#include "../analyzer/analyzer-decls.h"
+
+PyObject *
+test_refcnt_1 (PyObject *obj)
+{
+  Py_INCREF(obj);
+  Py_INCREF(obj);
+  return obj;
+  /* { dg-warning "expected 'obj' to have reference count" "" { target *-*-* } .-1 } */
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index ed72912309c..87862b4ca00 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -162,6 +162,7 @@ set plugin_test_list [list \
 	  taint-antipatterns-1.c } \
     { analyzer_cpython_plugin.c \
 	  cpython-plugin-test-no-Python-h.c \
+	  cpython-plugin-test-refcnt.c \
 	  cpython-plugin-test-PyList_Append.c \
 	  cpython-plugin-test-PyList_New.c \
 	  cpython-plugin-test-PyLong_FromLong.c } \
-- 
2.30.2



* Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]
  2023-09-05  2:13 ` [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646] Eric Feng
@ 2023-09-07 17:28   ` David Malcolm
  2023-09-11  2:12     ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-09-07 17:28 UTC
  To: Eric Feng; +Cc: gcc, gcc-patches

On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote:

> Hi Dave,

Hi Eric, thanks for the patch.

> 
> Recently I've been working on symbolic value support for the reference
> count checker. I've attached a patch for it below; let me know if it looks
> OK for trunk. Thanks!
> 
> Best,
> Eric
> 
> ---
> 
> This patch enhances the reference count checker in the CPython plugin by
> adding support for symbolic values. Whereas previously we could only
> check the reference count of PyObject* objects created within the
> function, we can now emit diagnostics on reference count mismatches
> for objects that were, for example, passed in as function parameters.
> 
> rc6.c:6:10: warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt field is N + ‘2’
>     6 |   return obj;
>       |          ^~~

[...snip...]

>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> 
> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index bf1982e79c3..d7ecd7fce09 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> @@ -314,17 +314,20 @@ public:
>    {
>      diagnostic_metadata m;
>      bool warned;
> -    // just assuming constants for now
> -    auto actual_refcnt
> -	= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> -    auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
> -    warned = warning_meta (rich_loc, m, get_controlling_option (),
> -			   "expected %qE to have "
> -			   "reference count: %qE but ob_refcnt field is: %qE",
> -			   m_reg_tree, actual_refcnt, ob_refcnt);
> -
> -    // location_t loc = rich_loc->get_loc ();
> -    // foo (loc);
> +
> +    const auto *actual_refcnt_constant
> +	= m_actual_refcnt->dyn_cast_constant_svalue ();
> +    const auto *ob_refcnt_constant = m_ob_refcnt->dyn_cast_constant_svalue ();
> +    if (!actual_refcnt_constant || !ob_refcnt_constant)
> +      return false;
> +
> +    auto actual_refcnt = actual_refcnt_constant->get_constant ();
> +    auto ob_refcnt = ob_refcnt_constant->get_constant ();
> +    warned = warning_meta (
> +	rich_loc, m, get_controlling_option (),
> +	"expected %qE to have "
> +	"reference count: N + %qE but ob_refcnt field is N + %qE",
> +	m_reg_tree, actual_refcnt, ob_refcnt);
>      return warned;

I know you're emulating the old behavior I implemented way back in
cpychecker, but I don't like that behavior :(

Specifically, although the patch improves the behavior for symbolic
values, it regresses the precision of wording for the concrete values
case.  If we have e.g. a concrete ob_refcnt of 2, whereas we only have
1 pointer, then it's more readable to say:

  warning: expected ‘obj’ to have reference count: ‘1’ but ob_refcnt field is ‘2’

than:

  warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt field is N + ‘2’

...and we shouldn't quote concrete numbers; the message should be:

  warning: expected ‘obj’ to have reference count of 1 but ob_refcnt field is 2

or better:

  warning: ‘*obj’ is pointed to by 1 pointer but 'ob_refcnt' field is 2


Can you move the unwrapping of the svalue from the tests below into the
emit vfunc?  That way the m_actual_refcnt doesn't have to be a
constant_svalue; you could have logic in the emit vfunc to print
readable messages based on what kind of svalue it is.

Rather than 'N', it might be better to say 'initial'; how about:

  warning: ‘*obj’ is pointed to by 0 additional pointers but 'ob_refcnt' field has increased by 1
  warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field has increased by 2
  warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field is unchanged
  warning: ‘*obj’ is pointed to by 2 additional pointers but 'ob_refcnt' field has decreased by 1
  warning: ‘*obj’ is pointed to by 1 fewer pointers but 'ob_refcnt' field is unchanged

and similar?

Maybe have a flag that tracks whether we're talking about a concrete
value that's absolute versus a concrete value that's relative to the
initial value?
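
One possible shape for the wording suggested above, as a sketch only: the function name, the `relative` flag and the plain ASCII quotes are assumptions, and a real emit vfunc would go through warning_meta with %qE-style format codes rather than building a std::string. `pointers` and `ob_refcnt` are absolute counts or deltas from the initial value, depending on the flag.

```cpp
#include <cassert>
#include <string>

// "1 pointer", "2 pointers", "0 additional pointers", ...
static std::string
count_noun (long n, const std::string &noun)
{
  return std::to_string (n) + " " + noun + (n == 1 ? "" : "s");
}

// Hypothetical helper.  When RELATIVE is false, POINTERS and OB_REFCNT are
// absolute counts; when true, both are deltas from the initial value.
std::string
describe_refcnt_mismatch (const std::string &obj, long pointers,
			  long ob_refcnt, bool relative)
{
  std::string msg = "'*" + obj + "' is pointed to by ";
  if (!relative)
    return msg + count_noun (pointers, "pointer")
	   + " but 'ob_refcnt' field is " + std::to_string (ob_refcnt);

  if (pointers >= 0)
    msg += count_noun (pointers, "additional pointer");
  else
    msg += count_noun (-pointers, "fewer pointer");

  msg += " but 'ob_refcnt' field ";
  if (ob_refcnt > 0)
    msg += "has increased by " + std::to_string (ob_refcnt);
  else if (ob_refcnt < 0)
    msg += "has decreased by " + std::to_string (-ob_refcnt);
  else
    msg += "is unchanged";
  return msg;
}
```

For example, describe_refcnt_mismatch ("obj", 1, 2, true) yields "'*obj' is pointed to by 1 additional pointer but 'ob_refcnt' field has increased by 2". Unlike the last line of the list above, the sketch says "1 fewer pointer" in the singular.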


[...snip...]


> @@ -369,6 +368,19 @@ increment_region_refcnt (hash_map<const region *, int> &map, const region *key)
>    refcnt = existed ? refcnt + 1 : 1;
>  }
>  
> +static const region *
> +get_region_from_svalue (const svalue *sval, region_model_manager *mgr)
> +{
> +  const auto *region_sval = sval->dyn_cast_region_svalue ();
> +  if (region_sval)
> +    return region_sval->get_pointee ();
> +
> +  const auto *initial_sval = sval->dyn_cast_initial_svalue ();
> +  if (initial_sval)
> +    return mgr->get_symbolic_region (initial_sval);
> +
> +  return nullptr;
> +}

This is dereferencing a pointer, right?

Can the caller use region_model::deref_rvalue instead?


[...snip...]

> +static void
> +unwrap_any_ob_refcnt_sval (const svalue *&ob_refcnt_sval)
> +{
> +  if (ob_refcnt_sval->get_kind () != SK_CONSTANT)
> +    {
> +      auto unwrap_cast = ob_refcnt_sval->maybe_undo_cast ();
> +      if (!unwrap_cast)
> +	unwrap_cast = ob_refcnt_sval;
> +
> +      if (unwrap_cast->get_kind () == SK_BINOP)
> +	ob_refcnt_sval = unwrap_cast->dyn_cast_binop_svalue ()->get_arg1 ();

This would be better spelled as:

         if (const binop_svalue *binop_sval = unwrap_cast->dyn_cast_binop_svalue ())
	    ob_refcnt_sval = binop_sval->get_arg1 ();
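
The suggested spelling folds the kind check and the downcast into one step and scopes the result to the branch that uses it. A self-contained sketch of the pattern; the classes are simplified, hypothetical stand-ins for the analyzer's svalue hierarchy, keeping only a virtual dyn_cast_* hook that returns null by default.

```cpp
#include <cassert>

// Simplified, hypothetical stand-ins for the analyzer's svalue classes.
struct binop_svalue;

struct svalue
{
  virtual ~svalue () = default;
  // Subclasses override their own dyn_cast_* hook; the base returns null.
  virtual const binop_svalue *dyn_cast_binop_svalue () const { return nullptr; }
};

struct constant_svalue : svalue
{
  explicit constant_svalue (long v) : value (v) {}
  long value;
};

struct binop_svalue : svalue
{
  binop_svalue (const svalue *a0, const svalue *a1) : arg0 (a0), arg1 (a1) {}
  const binop_svalue *dyn_cast_binop_svalue () const override { return this; }
  const svalue *get_arg1 () const { return arg1; }
  const svalue *arg0;
  const svalue *arg1;
};

// Declaring the downcast result in the if-condition: no separate
// get_kind () == SK_BINOP test, and binop_sval is only in scope where
// it is known to be non-null.
static void
unwrap_binop (const svalue *&sval)
{
  if (const binop_svalue *binop_sval = sval->dyn_cast_binop_svalue ())
    sval = binop_sval->get_arg1 ();
}
```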

[...snip...]

>  /* Compare ob_refcnt field vs the actual reference count of a region */
>  static void
>  check_refcnt (const region_model *model,
>  	      const region_model *old_model,
>  	      region_model_context *ctxt,
>  	      const hash_map<const ana::region *,
> -			     int>::iterator::reference_pair region_refcnt)
> +			     int>::iterator::reference_pair &region_refcnt)

Could really use a typedef for
  const hash_map<const ana::region *, int>
to simplify this code.
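
A sketch of the typedef idea, assuming a hypothetical alias name; std::unordered_map stands in for GCC's hash_map so the example compiles on its own, and `region` is an empty placeholder type. With the alias, element types such as the pair passed to check_refcnt can be spelled through a single name.

```cpp
#include <cassert>
#include <unordered_map>

struct region {};  // placeholder for ana::region

// Hypothetical alias; the plugin would alias hash_map<const region *, int>.
typedef std::unordered_map<const region *, int> region_refcnt_map_t;

// Same effect as the patch's "refcnt = existed ? refcnt + 1 : 1":
// operator[] value-initializes a missing entry to 0 before the increment.
static void
increment_region_refcnt (region_refcnt_map_t &map, const region *key)
{
  ++map[key];
}

// The element type is now spelled through the alias.
static int
get_actual_refcnt (const region_refcnt_map_t::value_type &region_refcnt)
{
  return region_refcnt.second;
}
```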

>  {
>    region_model_manager *mgr = model->get_manager ();
>    const auto &curr_region = region_refcnt.first;
>    const auto &actual_refcnt = region_refcnt.second;
> +
>    const svalue *ob_refcnt_sval
>        = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
> +  if (!ob_refcnt_sval)
> +    return;
> +
> +  unwrap_any_ob_refcnt_sval (ob_refcnt_sval);

As noted above, can the diagnostic store the pre-unwrapped
ob_refcnt_sval?  Might mean you have to do the unwrapping both here,
and later when displaying the diagnostic.  Or (probably best) track
both the original and unwrapped ob_refcnt_sval, and store both in the
pending_diagnostic.

> +
>    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
>        ob_refcnt_sval->get_type (), actual_refcnt);
> -
>    if (ob_refcnt_sval != actual_refcnt_sval)
> -    {
> -      const svalue *curr_reg_sval
> -	  = mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region);
> -      tree reg_tree = old_model->get_representative_tree (curr_reg_sval);
> -      if (!reg_tree)
> -	return;
> -
> -      const auto &eg = ctxt->get_eg ();
> -      refcnt_stmt_finder finder (*eg, reg_tree);
> -      auto pd = make_unique<refcnt_mismatch> (curr_region, ob_refcnt_sval,
> -					      actual_refcnt_sval, reg_tree);
> -      if (pd && eg)
> -	ctxt->warn (std::move (pd), &finder);
> -    }
> +    handle_refcnt_mismatch (old_model, curr_region, ob_refcnt_sval,
> +			    actual_refcnt_sval, ctxt);
>  }
>  
>  static void
> @@ -493,8 +520,6 @@ count_all_references (const region_model *model,
>    for (const auto &cluster : *model->get_store ())
>      {
>        auto curr_region = cluster.first;
> -      if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
> -	continue;
>  
>        increment_region_refcnt (region_to_refcnt, curr_region);
>  
> @@ -505,8 +530,8 @@ count_all_references (const region_model *model,
>  
>  	  const svalue *unwrapped_sval
>  	      = binding_sval->unwrap_any_unmergeable ();
> -	  // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> -	  // continue;
> +	  if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> +	  continue;

We'll probably want a smarter test for this, that the type "inherits"
C-style from PyObject (e.g. PyLongObject).
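
In CPython, C-style "inheritance" means a derived struct's first member is its base: PyListObject, for instance, starts with PyObject_VAR_HEAD (a PyVarObject), whose first member is a PyObject. A smarter test could follow that first-member chain. The sketch below models struct types with a hypothetical type_desc record rather than GCC trees; in the plugin, the walk would presumably be over the pointee record type's first field.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified description of a C struct type: its name and
// the type of its first member (nullptr if there is none or it is not a
// struct).
struct type_desc
{
  std::string name;
  const type_desc *first_field_type;
};

// True if T is PyObject, or if its first member (recursively) is.
static bool
inherits_from_pyobject (const type_desc *t)
{
  for (; t; t = t->first_field_type)
    if (t->name == "PyObject")
      return true;
  return false;
}
```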


>  
>  	  const region *pointee = unwrapped_sval->maybe_get_region ();
>  	  if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> index e1efd9efda5..46daf2f8975 100644
> --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> @@ -75,4 +75,23 @@ test_PyListAppend_6 ()
>    PyObject *list = NULL;
>    PyList_Append(list, item);
>    return list;
> -}
> \ No newline at end of file
> +}
> +
> +PyObject *
> +test_PyListAppend_7 (PyObject *item)
> +{
> +  PyObject *list = PyList_New (0);
> +  Py_INCREF(item);
> +  PyList_Append(list, item);
> +  return list;
> +  /* { dg-warning "expected 'item' to have reference count" "" { target *-*-* } .-1 } */

It would be good if the regexps in these dg-warning directives contained
the actual and expected counts; I find I can't easily tell what the
intended output is meant to be.


> +}
> +
> +PyObject *
> +test_PyListAppend_8 (PyObject *item, PyObject *list)
> +{
> +  Py_INCREF(item);
> +  Py_INCREF(item);
> +  PyList_Append(list, item);
> +  return list;
> +}

Should we complain here about item->ob_refcnt being too high?

> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> new file mode 100644
> index 00000000000..a7f39509d6d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target analyzer } */
> +/* { dg-options "-fanalyzer" } */
> +/* { dg-require-python-h "" } */
> +
> +
> +#define PY_SSIZE_T_CLEAN
> +#include <Python.h>
> +#include "../analyzer/analyzer-decls.h"
> +
> +PyObject *
> +test_refcnt_1 (PyObject *obj)
> +{
> +  Py_INCREF(obj);
> +  Py_INCREF(obj);
> +  return obj;
> +  /* { dg-warning "expected 'obj' to have reference count" "" { target *-*-* } .-1 } */

Likewise, it would be better if the dg-warning directive expressed the
expected "actual vs expected" values.

[...snip...]

Thanks again for the patch, hope this is constructive

Dave



* Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]
  2023-09-07 17:28 ` David Malcolm
@ 2023-09-11  2:12   ` Eric Feng
  2023-09-11 19:00     ` David Malcolm
  0 siblings, 1 reply; 50+ messages in thread
From: Eric Feng @ 2023-09-11  2:12 UTC
  To: David Malcolm; +Cc: gcc, gcc-patches


On Thu, Sep 7, 2023 at 1:28 PM David Malcolm <dmalcolm@redhat.com> wrote:

> [...snip...]
>
> I know you're emulating the old behavior I implemented way back in
> cpychecker, but I don't like that behavior :(
>
> Specifically, although the patch improves the behavior for symbolic
> values, it regresses the precision of wording for the concrete values
> case.  If we have e.g. a concrete ob_refcnt of 2, whereas we only have
> 1 pointer, then it's more readable to say:
>
>   warning: expected ‘obj’ to have reference count: ‘1’ but ob_refcnt field is ‘2’
>
> than:
>
>   warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt field is N + ‘2’
>
> ...and we shouldn't quote concrete numbers; the message should be:
>
>   warning: expected ‘obj’ to have reference count of 1 but ob_refcnt field is 2
>
> or better:
>
>   warning: ‘*obj’ is pointed to by 1 pointer but 'ob_refcnt' field is 2
>
>
> Can you move the unwrapping of the svalue from the tests below into the
> emit vfunc?  That way the m_actual_refcnt doesn't have to be a
> constant_svalue; you could have logic in the emit vfunc to print
> readable messages based on what kind of svalue it is.
>
> Rather than 'N', it might be better to say 'initial'; how about:
>
>   warning: ‘*obj’ is pointed to by 0 additional pointers but 'ob_refcnt' field has increased by 1
>   warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field has increased by 2
>   warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field is unchanged
>   warning: ‘*obj’ is pointed to by 2 additional pointers but 'ob_refcnt' field has decreased by 1
>   warning: ‘*obj’ is pointed to by 1 fewer pointers but 'ob_refcnt' field is unchanged
>
> and similar?
>

That makes sense to me as well (indeed, I was just emulating the old
behavior)! I will experiment and keep you posted on a revised patch with
this in mind. This is a minor detail, but can we emit ‘*obj’ as bolded
text in the diagnostic message? Currently, I can emit it (including the
asterisk) like so: '*%E'. But unlike %qE, this doesn't bold the text
inside the single quotes. Is this possible?

>
> Maybe have a flag that tracks whether we're talking about a concrete
> value that's absolute versus a concrete value that's relative to the
> initial value?
>
>
> [...snip...]
>
>
> > @@ -369,6 +368,19 @@ increment_region_refcnt (hash_map<const region *,
> int> &map, const region *key)
> >    refcnt = existed ? refcnt + 1 : 1;
> >  }
> >
> > +static const region *
> > +get_region_from_svalue (const svalue *sval, region_model_manager *mgr)
> > +{
> > +  const auto *region_sval = sval->dyn_cast_region_svalue ();
> > +  if (region_sval)
> > +    return region_sval->get_pointee ();
> > +
> > +  const auto *initial_sval = sval->dyn_cast_initial_svalue ();
> > +  if (initial_sval)
> > +    return mgr->get_symbolic_region (initial_sval);
> > +
> > +  return nullptr;
> > +}
>
> This is dereferencing a pointer, right?
>
> Can the caller use region_model::deref_rvalue instead?
>
>
> [...snip...]
>
> > +static void
> > +unwrap_any_ob_refcnt_sval (const svalue *&ob_refcnt_sval)
> > +{
> > +  if (ob_refcnt_sval->get_kind () != SK_CONSTANT)
> > +    {
> > +      auto unwrap_cast = ob_refcnt_sval->maybe_undo_cast ();
> > +      if (!unwrap_cast)
> > +     unwrap_cast = ob_refcnt_sval;
> > +
> > +      if (unwrap_cast->get_kind () == SK_BINOP)
> > +     ob_refcnt_sval = unwrap_cast->dyn_cast_binop_svalue ()->get_arg1
> ();
>
> This would be better spelled as:
>
>          if (const binop_svalue *binop_sval =
> unwrap_cast->dyn_cast_binop_svalue ())
>             ob_refcnt_sval = binop_sval->get_arg1 ();
>
> [...snip...]
>
> >  /* Compare ob_refcnt field vs the actual reference count of a region */
> >  static void
> >  check_refcnt (const region_model *model,
> >             const region_model *old_model,
> >             region_model_context *ctxt,
> >             const hash_map<const ana::region *,
> > -                          int>::iterator::reference_pair region_refcnt)
> > +                          int>::iterator::reference_pair &region_refcnt)
>
> Could really use a typedef for
>   const hash_map<const ana::region *, int>
> to simplify this code.
>
> >  {
> >    region_model_manager *mgr = model->get_manager ();
> >    const auto &curr_region = region_refcnt.first;
> >    const auto &actual_refcnt = region_refcnt.second;
> > +
> >    const svalue *ob_refcnt_sval
> >        = retrieve_ob_refcnt_sval (curr_region, model, ctxt);
> > +  if (!ob_refcnt_sval)
> > +    return;
> > +
> > +  unwrap_any_ob_refcnt_sval (ob_refcnt_sval);
>
> As noted above, can the diagnostic store the pre-unwrapped
> ob_refcnt_sval?  Might mean you have to do the unwrapping both here,
> and later when displaying the diagnostic.  Or (probably best) track
> both the original and unwrapped ob_refcnt_sval, and store both in the
> pending_diagnostic.
>
> > +
> >    const svalue *actual_refcnt_sval = mgr->get_or_create_int_cst (
> >        ob_refcnt_sval->get_type (), actual_refcnt);
> > -
> >    if (ob_refcnt_sval != actual_refcnt_sval)
> > -    {
> > -      const svalue *curr_reg_sval
> > -       = mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region);
> > -      tree reg_tree = old_model->get_representative_tree
> (curr_reg_sval);
> > -      if (!reg_tree)
> > -     return;
> > -
> > -      const auto &eg = ctxt->get_eg ();
> > -      refcnt_stmt_finder finder (*eg, reg_tree);
> > -      auto pd = make_unique<refcnt_mismatch> (curr_region,
> ob_refcnt_sval,
> > -                                           actual_refcnt_sval,
> reg_tree);
> > -      if (pd && eg)
> > -     ctxt->warn (std::move (pd), &finder);
> > -    }
> > +    handle_refcnt_mismatch (old_model, curr_region, ob_refcnt_sval,
> > +                         actual_refcnt_sval, ctxt);
> >  }
> >
> >  static void
> > @@ -493,8 +520,6 @@ count_all_references (const region_model *model,
> >    for (const auto &cluster : *model->get_store ())
> >      {
> >        auto curr_region = cluster.first;
> > -      if (curr_region->get_kind () != RK_HEAP_ALLOCATED)
> > -     continue;
> >
> >        increment_region_refcnt (region_to_refcnt, curr_region);
> >
> > @@ -505,8 +530,8 @@ count_all_references (const region_model *model,
> >
> >         const svalue *unwrapped_sval
> >             = binding_sval->unwrap_any_unmergeable ();
> > -       // if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> > -       // continue;
> > +       if (unwrapped_sval->get_type () != pyobj_ptr_tree)
> > +       continue;
>
> We'll probably want a smarter test for this, that the type "inherits"
> C-style from PyObject (e.g. PyLongObject).
>
>
> >
> >         const region *pointee = unwrapped_sval->maybe_get_region ();
> >         if (!pointee || pointee->get_kind () != RK_HEAP_ALLOCATED)
> > diff --git
> a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> > index e1efd9efda5..46daf2f8975 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-PyList_Append.c
> > @@ -75,4 +75,23 @@ test_PyListAppend_6 ()
> >    PyObject *list = NULL;
> >    PyList_Append(list, item);
> >    return list;
> > -}
> > \ No newline at end of file
> > +}
> > +
> > +PyObject *
> > +test_PyListAppend_7 (PyObject *item)
> > +{
> > +  PyObject *list = PyList_New (0);
> > +  Py_INCREF(item);
> > +  PyList_Append(list, item);
> > +  return list;
> > +  /* { dg-warning "expected 'item' to have reference count" "" { target
> *-*-* } .-1 } */
>
> It would be good if these dg-warning directives regexp contained the
> actual and expected counts; I find I can't easily tell what the
> intended output is meant to be.
>
>
> > +}
> > +
> > +PyObject *
> > +test_PyListAppend_8 (PyObject *item, PyObject *list)
> > +{
> > +  Py_INCREF(item);
> > +  Py_INCREF(item);
> > +  PyList_Append(list, item);
> > +  return list;
> > +}
>
> Should we complain here about item->ob_refcnt being too high?
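The arithmetic behind this question can be sketched as follows. The sketch assumes that `PyList_Append` increments the appended item's reference count itself rather than stealing a reference, as in CPython. It also assumes that at the return, the only new pointer to `*item` is the list's slot. `refcnt_delta` and `refcnt_excess` are hypothetical names and are not part of the plugin.

```cpp
#include <cassert>

// Hypothetical summary of one object's reference-count bookkeeping
// between function entry and the return statement.
struct refcnt_delta
{
  int ob_refcnt_delta;  // net change applied to the ob_refcnt field
  int pointer_delta;    // net change in the number of pointers to it
};

// > 0: ob_refcnt grew faster than the pointers to the object (a leak).
// < 0: ob_refcnt fell behind them (a potential use-after-free).
// == 0: consistent.
constexpr int
refcnt_excess (refcnt_delta d)
{
  return d.ob_refcnt_delta - d.pointer_delta;
}

// test_PyListAppend_7: one Py_INCREF plus PyList_Append's own increment,
// but only one new pointer (the list's slot).
constexpr refcnt_delta test_7 = {1 + 1, 1};

// test_PyListAppend_8: two Py_INCREFs plus PyList_Append's increment,
// again with only one new pointer.
constexpr refcnt_delta test_8 = {2 + 1, 1};

static_assert (refcnt_excess (test_7) == 1, "test 7 leaks one reference");
static_assert (refcnt_excess (test_8) == 2, "test 8 leaks two references");
```

Under those assumptions, test 8 has an excess of 2 where test 7 has an excess of 1. If test 7 is expected to warn, test 8 arguably should warn as well.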
>
> > diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> > new file mode 100644
> > index 00000000000..a7f39509d6d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-refcnt.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target analyzer } */
> > +/* { dg-options "-fanalyzer" } */
> > +/* { dg-require-python-h "" } */
> > +
> > +
> > +#define PY_SSIZE_T_CLEAN
> > +#include <Python.h>
> > +#include "../analyzer/analyzer-decls.h"
> > +
> > +PyObject *
> > +test_refcnt_1 (PyObject *obj)
> > +{
> > +  Py_INCREF(obj);
> > +  Py_INCREF(obj);
> > +  return obj;
> > +  /* { dg-warning "expected 'obj' to have reference count" "" { target *-*-* } .-1 } */
>
> Likewise, it would be better for the dg-warning directive to express
> the "actual vs expected" values.
>
> [...snip...]
>
> Thanks again for the patch, hope this is constructive
>
> Dave
>
>


* Re: [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646]
  2023-09-11  2:12                                                                                     ` Eric Feng
@ 2023-09-11 19:00                                                                                       ` David Malcolm
  0 siblings, 0 replies; 50+ messages in thread
From: David Malcolm @ 2023-09-11 19:00 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc, gcc-patches

On Sun, 2023-09-10 at 22:12 -0400, Eric Feng wrote:
> On Thu, Sep 7, 2023 at 1:28 PM David Malcolm <dmalcolm@redhat.com>
> wrote:
> 
> > On Mon, 2023-09-04 at 22:13 -0400, Eric Feng wrote:
> > 

[...snip...]

> > 
> > 
> > I know you're emulating the old behavior I implemented way back in
> > cpychecker, but I don't like that behavior :(
> > 
> > Specifically, although the patch improves the behavior for symbolic
> > values, it regresses the precision of wording for the concrete values
> > case.  If we have e.g. a concrete ob_refcnt of 2, whereas we only have
> > 1 pointer, then it's more readable to say:
> > 
> >   warning: expected ‘obj’ to have reference count: ‘1’ but ob_refcnt field is ‘2’
> > 
> > than:
> > 
> >   warning: expected ‘obj’ to have reference count: N + ‘1’ but ob_refcnt field is N + ‘2’
> > 
> > ...and we shouldn't quote concrete numbers; the message should be:
> > 
> >   warning: expected ‘obj’ to have reference count of 1 but ob_refcnt field is 2
> 
> 
> > or better:
> > 
> >   warning: ‘*obj’ is pointed to by 1 pointer but 'ob_refcnt' field is 2
> > 
> > 
> > Can you move the unwrapping of the svalue from the tests below into the
> > emit vfunc?  That way the m_actual_refcnt doesn't have to be a
> > constant_svalue; you could have logic in the emit vfunc to print
> > readable messages based on what kind of svalue it is.
> > 
> > Rather than 'N', it might be better to say 'initial'; how about:
> > 
> >   warning: ‘*obj’ is pointed to by 0 additional pointers but 'ob_refcnt' field has increased by 1
> >   warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field has increased by 2
> >   warning: ‘*obj’ is pointed to by 1 additional pointer but 'ob_refcnt' field is unchanged
> >   warning: ‘*obj’ is pointed to by 2 additional pointers but 'ob_refcnt' field has decreased by 1
> >   warning: ‘*obj’ is pointed to by 1 fewer pointers but 'ob_refcnt' field is unchanged
> > 
> > and similar?
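Here is a sketch of how an emit vfunc could choose between these wordings. The `describe_*` helpers are hypothetical and are not the plugin's code. They model the two svalue kinds as two entry points. A concrete value is printed as an absolute count. A symbolic value of the form "initial + delta" is printed relative to its initial value, without quoting plain numbers. The sketch uses ASCII quotes, whereas GCC's `%q` would add its own. It also uses singular "pointer" for a count of exactly 1 throughout.

```cpp
#include <cassert>
#include <string>

// "pointer" for a count of exactly 1, "pointers" otherwise.
static std::string
pointers_word (long n)
{
  return n == 1 ? "pointer" : "pointers";
}

// Concrete case: both the pointer count and ob_refcnt are known constants,
// so report them as absolute values.
std::string
describe_concrete (const std::string &obj, long num_pointers, long ob_refcnt)
{
  return "'*" + obj + "' is pointed to by " + std::to_string (num_pointers)
         + " " + pointers_word (num_pointers)
         + " but 'ob_refcnt' field is " + std::to_string (ob_refcnt);
}

// Symbolic case: both values have the form "initial + delta", so report
// only the deltas relative to the (unknown) initial values.
std::string
describe_relative (const std::string &obj, long pointer_delta,
                   long refcnt_delta)
{
  const long n = pointer_delta < 0 ? -pointer_delta : pointer_delta;
  std::string msg = "'*" + obj + "' is pointed to by " + std::to_string (n)
                    + (pointer_delta < 0 ? " fewer " : " additional ")
                    + pointers_word (n) + " but 'ob_refcnt' field ";
  if (refcnt_delta > 0)
    msg += "has increased by " + std::to_string (refcnt_delta);
  else if (refcnt_delta < 0)
    msg += "has decreased by " + std::to_string (-refcnt_delta);
  else
    msg += "is unchanged";
  return msg;
}
```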
> > 
> 
> That makes sense to me as well (indeed I was just emulating the old
> behavior)! Will experiment and keep you posted on a revised patch
> with this in mind.  This is somewhat of a minor detail, but can we
> emit ‘*obj’ as bolded text in the diagnostic message? Currently, I
> can emit this (including the asterisk) like so: '*%E'. But unlike
> using %qE, it doesn't bold the body of the single quotations. Is
> this possible?

Yes.

You could use %< and %> to get the colorized (and localized) quotes
(see pretty-print.cc), but better would probably be to pass a tree for
the *obj, rather than obj.  You can make this by building a MEM_REF
tree node wrapping the pointer (you can see an example of this in the
RK_SYMBOLIC case of region_model::get_representative_path_var_1).

Dave



* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29  4:31                                                             ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
  2023-08-29  4:35                                                               ` Eric Feng
@ 2023-08-29 21:08                                                               ` David Malcolm
  2023-09-01  2:49                                                               ` Hans-Peter Nilsson
  2 siblings, 0 replies; 50+ messages in thread
From: David Malcolm @ 2023-08-29 21:08 UTC (permalink / raw)
  To: Eric Feng; +Cc: gcc, gcc-patches

On Tue, 2023-08-29 at 00:31 -0400, Eric Feng wrote:
> Hi Dave,

Hi Eric.

Thanks for the updated patch.

A few nits below; this is OK for trunk with them fixed...

[...snip...]

> 
> gcc/analyzer/ChangeLog:
>   PR analyzer/107646
> 	* engine.cc (impl_region_model_context::warn): New optional parameter.
> 	* exploded-graph.h (class impl_region_model_context): Likewise.
> 	* region-model.cc (region_model::pop_frame): New callback feature for
>   * region_model::pop_frame.
> 	* region-model.h (struct append_regions_cb_data): Likewise.
> 	(class region_model): Likewise.
> 	(class region_model_context): New optional parameter.
> 	(class region_model_context_decorator): Likewise.
> 
> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
> 	* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   * checking for PyObjects.
> 	* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> 	* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   * added more tests).
> 	* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> 	* gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   * more tests).
> 	* gcc.dg/plugin/plugin.exp: New tests.
> 	* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> 	* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
> 	* gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

The ChangeLog formatting here seems wrong; lines starting with a '*'
should refer to a filename.  Continuation lines begin with just a tab
character.
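To make the rule concrete, here is a small hypothetical checker. It is not GCC's real ChangeLog verifier, which lives under contrib/gcc-changelog/ and enforces many more rules. It encodes only the two points above. Every body line must start with a single tab. A line whose text begins with "* " must name a file, which the checker approximates as a name followed by ':' or " (".

```cpp
#include <cassert>
#include <string>

// Hypothetical check of one ChangeLog body line; the "dir/ChangeLog:"
// header line is assumed to have been skipped already.
bool
changelog_line_ok (const std::string &line)
{
  if (line.empty ())
    return true;  // blank lines separate entries
  // Every other line must start with exactly one tab.
  if (line[0] != '\t')
    return false;
  if (line.size () > 1 && (line[1] == '\t' || line[1] == ' '))
    return false;
  const std::string text = line.substr (1);
  // PR lines and continuation lines need nothing more.
  if (text.compare (0, 2, "* ") != 0)
    return true;
  // A "* " line must begin with a file name, ended by ':' or " (".
  const std::string rest = text.substr (2);
  const std::string::size_type end = rest.find_first_of (": (");
  if (end == 0 || end == std::string::npos)
    return false;
  return rest[end] == ':' || rest.compare (end, 2, " (") == 0;
}
```

Under this check, lines such as "  * checking for PyObjects." from the patch fail on the leading spaces. The same text written as a tab-indented continuation line passes.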

[...snip...]

> diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
> index 10b2a59e787..440ea6d828d 100644
> --- a/gcc/analyzer/region-model.h
> +++ b/gcc/analyzer/region-model.h

[...snip...]

> @@ -840,7 +865,8 @@ private:
>  class region_model_context_decorator : public region_model_context
>  {
>   public:
> -  bool warn (std::unique_ptr<pending_diagnostic> d) override
> +  bool warn (std::unique_ptr<pending_diagnostic> d,
> +	     const stmt_finder *custom_finder)
>    {
>      if (m_inner)
>        return m_inner->warn (std::move (d));

This should presumably pass the custom_finder on to the 2nd argument of
m_inner->warn, rather than have the inner call to warn implicitly use
the NULL default arg.
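The fix amounts to forwarding the new argument through the decorator. The sketch uses simplified, hypothetical stand-ins for `pending_diagnostic`, `stmt_finder` and the context classes; the real ones have many more members. It also keeps `override` on the decorator's `warn`, which the quoted hunk drops. With `override`, the compiler rejects the method if its signature drifts from the base class's two-parameter `warn`.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Simplified, hypothetical stand-ins for the analyzer's classes.
struct pending_diagnostic
{
  virtual ~pending_diagnostic () = default;
};
struct stmt_finder {};

class region_model_context
{
public:
  virtual ~region_model_context () = default;
  virtual bool warn (std::unique_ptr<pending_diagnostic> d,
                     const stmt_finder *custom_finder = nullptr) = 0;
};

// Inner context that records which finder reached it.
class recording_context : public region_model_context
{
public:
  const stmt_finder *last_finder = nullptr;

  bool warn (std::unique_ptr<pending_diagnostic>,
             const stmt_finder *custom_finder = nullptr) override
  {
    last_finder = custom_finder;
    return true;
  }
};

class region_model_context_decorator : public region_model_context
{
public:
  explicit region_model_context_decorator (region_model_context *inner)
    : m_inner (inner) {}

  // "override" makes the compiler reject this if it stops matching the
  // base class's two-parameter warn.
  bool warn (std::unique_ptr<pending_diagnostic> d,
             const stmt_finder *custom_finder = nullptr) override
  {
    // Forward custom_finder explicitly, rather than letting the inner
    // call fall back to its nullptr default.
    if (m_inner)
      return m_inner->warn (std::move (d), custom_finder);
    return false;
  }

private:
  region_model_context *m_inner;
};
```

Keeping the same default argument on the base and the override matters. C++ takes default arguments from the static type at the call site, not from the dynamic type.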

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c b/gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c
> similarity index 100%
> rename from gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c
> rename to gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-no-plugin.c

Looks like
  "-no-Python-h.c"
would be a better suffix than
  "-no-plugin.c"
as it's the include that's missing, not the plugin.

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> index e1ed2d2589e..cbef6da8d86 100644
> --- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
> +++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
> @@ -161,8 +161,9 @@ set plugin_test_list [list \
>  	  taint-CVE-2011-0521-6.c \
>  	  taint-antipatterns-1.c } \
>      { analyzer_cpython_plugin.c \
> -	  cpython-plugin-test-1.c \
> -	  cpython-plugin-test-2.c } \
> +	  cpython-plugin-test-PyList_Append.c \
> +	  cpython-plugin-test-PyList_New.c \
> +	  cpython-plugin-test-PyLong_FromLong.c } \

Looks like this is missing:
  cpython-plugin-test-no-plugin.c
and
  cpython-plugin-test-refcnt-checking.c
(though as noted above, "cpython-plugin-test-no-Python-h.c" would be a
better name for the former)

so it wasn't actually compiling these tests.

Be sure to double-check that these tests pass when updating.

[...snip...]

OK for trunk with the above nits fixed.

Dave



* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-08-29  4:31                                                             ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
  2023-08-29  4:35                                                               ` Eric Feng
  2023-08-29 21:08                                                               ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] David Malcolm
@ 2023-09-01  2:49                                                               ` Hans-Peter Nilsson
  2023-09-01 14:51                                                                 ` David Malcolm
  2 siblings, 1 reply; 50+ messages in thread
From: Hans-Peter Nilsson @ 2023-09-01  2:49 UTC (permalink / raw)
  To: Eric Feng; +Cc: dmalcolm, gcc, gcc-patches, ef2648

(Looks like this was committed as r14-3580-g597b9ec69bca8a)

> Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org, Eric Feng <ef2648@columbia.edu>
> From: Eric Feng via Gcc <gcc@gcc.gnu.org>

> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
> 	* gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   * checking for PyObjects.
> 	* gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> 	* gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   * added more tests).
> 	* gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> 	* gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   * more tests).
> 	* gcc.dg/plugin/plugin.exp: New tests.
> 	* gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> 	* gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
> 	* gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

It seems this was more or less a rewrite, but that said,
it's generally preferable to always *add* tests, never *modify* them.

>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-

^^^ Ouch!  Was it not within reason to keep that test as it
was, and just add another test?

Anyway, the test after rewrite fails, and for some targets
like cris-elf and apparently m68k-linux, yields an error.
I see a PR was already opened.

Also, mostly for future reference, several files in the
patch miss a final newline, as seen by a "\ No newline at
end of file"-marker.

I think I found the problem; a mismatch between default C++
language standard between host-gcc and target-gcc.

(It's actually *not* as simple as "auto var = typeofvar<bar>()"
not being recognized in C++11 --or else there'd be an error
for the hash_set declaration too, which I just changed for
consistency-- but it's close enough for me.)

With this, retesting plugin.exp for cris-elf works.

Ok to commit?

-- >8 --
From: Hans-Peter Nilsson <hp@axis.com>
Date: Fri, 1 Sep 2023 04:36:03 +0200
Subject: [PATCH] testsuite: Fix analyzer_cpython_plugin.c declarations, PR testsuite/111264

Also, add missing newline at end of file.

	PR testsuite/111264
	* gcc.dg/plugin/analyzer_cpython_plugin.c: Make declarations
	C++11-compatible.
---
 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7af520436549..bf1982e79c37 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -477,8 +477,8 @@ pyobj_refcnt_checker (const region_model *model,
   if (!ctxt)
     return;
 
-  auto region_to_refcnt = hash_map<const region *, int> ();
-  auto seen_regions = hash_set<const region *> ();
+  hash_map<const region *, int> region_to_refcnt;
+  hash_set<const region *> seen_regions;
 
   count_pyobj_references (model, region_to_refcnt, retval, seen_regions);
   check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
@@ -561,7 +561,7 @@ public:
     if (!ctxt)
       return;
     region_model *model = cd.get_model ();
-    auto region_to_refcnt = hash_map<const region *, int> ();
+    hash_map<const region *, int> region_to_refcnt;
     count_all_references(model, region_to_refcnt);
     dump_refcnt_info(region_to_refcnt, model, ctxt);
   }
@@ -1330,4 +1330,4 @@ plugin_init (struct plugin_name_args *plugin_info,
   sorry_no_analyzer ();
 #endif
   return 0;
-}
\ No newline at end of file
+}
-- 
2.30.2

brgds, H-P


* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-09-01  2:49                                                               ` Hans-Peter Nilsson
@ 2023-09-01 14:51                                                                 ` David Malcolm
  2023-09-01 21:07                                                                   ` Eric Feng
  0 siblings, 1 reply; 50+ messages in thread
From: David Malcolm @ 2023-09-01 14:51 UTC (permalink / raw)
  To: Hans-Peter Nilsson, Eric Feng; +Cc: gcc, gcc-patches

On Fri, 2023-09-01 at 04:49 +0200, Hans-Peter Nilsson wrote:
> (Looks like this was committed as r14-3580-g597b9ec69bca8a)
> 
> > Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org, Eric Feng <ef2648@columbia.edu>
> > From: Eric Feng via Gcc <gcc@gcc.gnu.org>
> 
> > gcc/testsuite/ChangeLog:
> >   PR analyzer/107646
> >         * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
> >   * checking for PyObjects.
> >         * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> >         * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
> >   * added more tests).
> >         * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> >         * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
> >   * more tests).
> >         * gcc.dg/plugin/plugin.exp: New tests.
> >         * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> >         * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
> >         * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.
> 
> It seems this was more or less a rewrite, but that said,
> it's generally preferable to always *add* tests, never *modify* them.
> 
> >  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-
> 
> ^^^ Ouch!  Was it not within reason to keep that test as it
> was, and just add another test?
> 
> Anyway, the test after rewrite fails, and for some targets
> like cris-elf and apparently m68k-linux, yields an error.
> I see a PR was already opened.
> 
> Also, mostly for future reference, several files in the
> patch miss a final newline, as seen by a "\ No newline at
> end of file"-marker.
> 
> I think I found the problem; a mismatch between default C++
> language standard between host-gcc and target-gcc.
> 
> (It's actually *not* as simple as "auto var = typeofvar<bar>()"
> not being recognized in C++11 --or else there'd be an error
> for the hash_set declaration too, which I just changed for
> consistency-- but it's close enough for me.)
> 
> With this, retesting plugin.exp for cris-elf works.
> 
> Ok to commit?

Sorry about the failing tests.

Thanks for the patch; please go ahead and commit.

Dave

> 
> -- >8 --
> From: Hans-Peter Nilsson <hp@axis.com>
> Date: Fri, 1 Sep 2023 04:36:03 +0200
> Subject: [PATCH] testsuite: Fix analyzer_cpython_plugin.c declarations, PR testsuite/111264
> 
> Also, add missing newline at end of file.
> 
>         PR testsuite/111264
>         * gcc.dg/plugin/analyzer_cpython_plugin.c: Make declarations
>         C++11-compatible.
> ---
>  gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 7af520436549..bf1982e79c37 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> @@ -477,8 +477,8 @@ pyobj_refcnt_checker (const region_model *model,
>    if (!ctxt)
>      return;
>  
> -  auto region_to_refcnt = hash_map<const region *, int> ();
> -  auto seen_regions = hash_set<const region *> ();
> +  hash_map<const region *, int> region_to_refcnt;
> +  hash_set<const region *> seen_regions;
>  
>    count_pyobj_references (model, region_to_refcnt, retval, seen_regions);
>    check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
> @@ -561,7 +561,7 @@ public:
>      if (!ctxt)
>        return;
>      region_model *model = cd.get_model ();
> -    auto region_to_refcnt = hash_map<const region *, int> ();
> +    hash_map<const region *, int> region_to_refcnt;
>      count_all_references(model, region_to_refcnt);
>      dump_refcnt_info(region_to_refcnt, model, ctxt);
>    }
> @@ -1330,4 +1330,4 @@ plugin_init (struct plugin_name_args *plugin_info,
>    sorry_no_analyzer ();
>  #endif
>    return 0;
> -}
> \ No newline at end of file
> +}



* Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]
  2023-09-01 14:51                                                                 ` David Malcolm
@ 2023-09-01 21:07                                                                   ` Eric Feng
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Feng @ 2023-09-01 21:07 UTC (permalink / raw)
  To: David Malcolm; +Cc: Hans-Peter Nilsson, gcc, gcc-patches

Thank you for the patch!

On Fri, Sep 1, 2023 at 10:51 AM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Fri, 2023-09-01 at 04:49 +0200, Hans-Peter Nilsson wrote:
> > (Looks like this was committed as r14-3580-g597b9ec69bca8a)
> >
> > > Cc: gcc@gcc.gnu.org, gcc-patches@gcc.gnu.org, Eric Feng <ef2648@columbia.edu>
> > > From: Eric Feng via Gcc <gcc@gcc.gnu.org>
> >
> > > gcc/testsuite/ChangeLog:
> > >   PR analyzer/107646
> > >         * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
> > >   * checking for PyObjects.
> > >         * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
> > >         * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
> > >   * added more tests).
> > >         * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
> > >         * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
> > >   * more tests).
> > >         * gcc.dg/plugin/plugin.exp: New tests.
> > >         * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
> > >         * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
> > >         * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.
> >
> > It seems this was more or less a rewrite, but that said,
> > it's generally preferable to always *add* tests, never *modify* them.
> >
> > >  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +++++++++++++++++-
> >
> > ^^^ Ouch!  Was it not within reason to keep that test as it
> > was, and just add another test?
Thanks for the feedback. To clarify, 'analyzer_cpython_plugin.c' is
not a test itself but rather a plugin that currently lives within the
testsuite. The core of the test cases was also not modified; rather, I
renamed certain files containing them for clarity (unless this is
what you meant in terms of modification, in which case noted) and
added to them. However, I understand the preference and will keep that
in mind.
> >
> > Anyway, the test after rewrite fails, and for some targets
> > like cris-elf and apparently m68k-linux, yields an error.
> > I see a PR was already opened.
> >
> > Also, mostly for future reference, several files in the
> > patch miss a final newline, as seen by a "\ No newline at
> > end of file"-marker.
Noted.
> >
> > I think I found the problem; a mismatch between default C++
> > language standard between host-gcc and target-gcc.
> >
> > (It's actually *not* as simple as "auto var = typeofvar<bar>()"
> > not being recognized in C++11 --or else there'd be an error
> > for the hash_set declaration too, which I just changed for
> > consistency-- but it's close enough for me.)
> >
> > With this, retesting plugin.exp for cris-elf works.
Sounds good, thanks again! I was also curious about why hash_map had
an issue here with that syntax whilst hash_set did not, so I tried to
investigate a bit further. I believe the issue was due to the compiler
having trouble disambiguating between the hash_map constructors in
C++11.

From the error message we received:

test/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c:480:58: error: no matching function for call to 'hash_map<const ana::region*, int>::hash_map(hash_map<const ana::region*, int>)'
   auto region_to_refcnt = hash_map<const region *, int> ();

I think the compiler is interpreting the call here as an attempt to
create a new hash_map object with its copy constructor in which we
"forgot" to provide the object to be copied, rather than as our
intended use of the default constructor.

Looking at hash_map.h and hash_set.h seems to support this hypothesis,
as hash_map has two constructors, one of which resembles a copy
constructor with additional arguments:
https://github.com/gcc-mirror/gcc/blob/master/gcc/hash-map.h#L147.
Perhaps the default arguments here further complicated the ambiguity
as to which constructor to use in the presence of the empty
parentheses.

On the other hand, hash_set has only the default constructor with
default parameters, and thus there is no ambiguity:
https://github.com/gcc-mirror/gcc/blob/master/gcc/hash-set.h#L40.

I assume this ambiguity was cleared up in later versions of the
standard, and thus we observed no problems in C++17. However, I am
certainly still relatively new to C++, so please anyone feel free to correct my
reasoning and chime in!
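A possibly simpler explanation, offered as a hedged sketch, assumes hash_map's second constructor (the one taking `const hash_map &` plus defaulted extra arguments) is declared `explicit`. That should be confirmed in gcc/hash-map.h. A constructor whose first parameter is `const T &` and whose other parameters all have defaults is still a copy constructor. `auto x = T ();` is copy-initialization, which cannot use an explicit constructor before C++17. Declaring a copy constructor also suppresses the implicit move constructor. That leaves no viable constructor, which matches the "no matching function for call to hash_map::hash_map(hash_map)" error. C++17's guaranteed copy elision removes the need for that constructor, which would explain why C++17 hosts were unaffected. `toy_map` is a hypothetical stand-in, not GCC's hash_map.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a container whose "copy with options"
// constructor is explicit, as assumed above for hash_map.
class toy_map
{
public:
  explicit toy_map (std::size_t n = 13) : m_size (n) {}

  // First parameter "const toy_map &", the rest defaulted: this is a copy
  // constructor.  Being explicit, it is unusable for copy-initialization,
  // and declaring it also suppresses the implicit move constructor.
  explicit toy_map (const toy_map &other, bool /* ggc */ = false)
    : m_size (other.m_size) {}

  std::size_t size () const { return m_size; }

private:
  std::size_t m_size;
};

std::size_t
make_maps ()
{
  // Direct default-initialization: fine in C++11 and later (the same
  // form as the committed fix).
  toy_map a;

  // Ill-formed in C++11/14 (no viable non-explicit copy/move
  // constructor), but valid in C++17, where the prvalue initializes b
  // directly:
  //   auto b = toy_map ();

  // Direct-initialization may use the explicit copy constructor.
  toy_map c (a);
  return a.size () + c.size ();
}
```

In this model, `hash_set` would not be affected, because its implicitly declared copy constructor is not explicit.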
> >
> > Ok to commit?
>
> Sorry about the failing tests.
>
> Thanks for the patch; please go ahead and commit.
>
> Dave
>
> >
> > -- >8 --
> > From: Hans-Peter Nilsson <hp@axis.com>
> > Date: Fri, 1 Sep 2023 04:36:03 +0200
> > Subject: [PATCH] testsuite: Fix analyzer_cpython_plugin.c declarations, PR testsuite/111264
> >
> > Also, add missing newline at end of file.
> >
> >         PR testsuite/111264
> >         * gcc.dg/plugin/analyzer_cpython_plugin.c: Make declarations
> >         C++11-compatible.
> > ---
> >  gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > index 7af520436549..bf1982e79c37 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> > @@ -477,8 +477,8 @@ pyobj_refcnt_checker (const region_model *model,
> >    if (!ctxt)
> >      return;
> >
> > -  auto region_to_refcnt = hash_map<const region *, int> ();
> > -  auto seen_regions = hash_set<const region *> ();
> > +  hash_map<const region *, int> region_to_refcnt;
> > +  hash_set<const region *> seen_regions;
> >
> >    count_pyobj_references (model, region_to_refcnt, retval, seen_regions);
> >    check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
> > @@ -561,7 +561,7 @@ public:
> >      if (!ctxt)
> >        return;
> >      region_model *model = cd.get_model ();
> > -    auto region_to_refcnt = hash_map<const region *, int> ();
> > +    hash_map<const region *, int> region_to_refcnt;
> >      count_all_references(model, region_to_refcnt);
> >      dump_refcnt_info(region_to_refcnt, model, ctxt);
> >    }
> > @@ -1330,4 +1330,4 @@ plugin_init (struct plugin_name_args *plugin_info,
> >    sorry_no_analyzer ();
> >  #endif
> >    return 0;
> > -}
> > \ No newline at end of file
> > +}
>

Best,
Eric


end of thread, other threads:[~2023-09-11 19:00 UTC | newest]

Thread overview: 50+ messages
2023-07-25  4:49 Update and Questions on CPython Extension Module -fanalyzer plugin development Eric Feng
2023-07-25 14:41 ` David Malcolm
2023-07-27 22:13   ` Eric Feng
2023-07-27 22:35     ` David Malcolm
2023-07-30 17:52       ` Eric Feng
2023-07-30 23:44         ` David Malcolm
2023-08-01 13:57           ` Eric Feng
2023-08-01 17:06             ` David Malcolm
2023-08-04 15:02               ` Eric Feng
2023-08-04 15:39                 ` David Malcolm
2023-08-04 20:48                   ` Eric Feng
2023-08-04 22:42                     ` David Malcolm
2023-08-04 22:46                       ` David Malcolm
2023-08-07 18:31                         ` Eric Feng
2023-08-07 23:16                           ` David Malcolm
2023-08-08 16:51                             ` [PATCH] WIP for dg-require-python-h [PR107646] Eric Feng
2023-08-08 18:08                               ` David Malcolm
2023-08-08 18:51                               ` David Malcolm
2023-08-09 19:22                                 ` [PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646] Eric Feng
2023-08-09 21:36                                   ` David Malcolm
2023-08-11 17:47                                     ` [COMMITTED] " Eric Feng
2023-08-11 20:23                                       ` Eric Feng
2023-08-16 19:17                                         ` Update on CPython Extension Module -fanalyzer plugin development Eric Feng
2023-08-16 21:28                                           ` David Malcolm
2023-08-17  1:47                                             ` Eric Feng
2023-08-21 14:05                                               ` Eric Feng
2023-08-21 15:04                                                 ` David Malcolm
2023-08-23 21:15                                                   ` Eric Feng
2023-08-23 23:16                                                     ` David Malcolm
2023-08-24 14:45                                                       ` Eric Feng
2023-08-25 12:50                                                         ` Eric Feng
2023-08-25 19:50                                                           ` David Malcolm
2023-08-29  4:31                                                             ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
2023-08-29  4:35                                                               ` Eric Feng
2023-08-29 17:28                                                                 ` Eric Feng
2023-08-29 21:14                                                                   ` David Malcolm
2023-08-30 22:15                                                                     ` Eric Feng
2023-08-31 17:01                                                                       ` David Malcolm
2023-08-31 19:09                                                                         ` Eric Feng
2023-08-31 20:19                                                                           ` David Malcolm
2023-09-01  1:25                                                                             ` Eric Feng
2023-09-01 11:57                                                                               ` David Malcolm
2023-09-05  2:13                                                                                 ` [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646] Eric Feng
2023-09-07 17:28                                                                                   ` David Malcolm
2023-09-11  2:12                                                                                     ` Eric Feng
2023-09-11 19:00                                                                                       ` David Malcolm
2023-08-29 21:08                                                               ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] David Malcolm
2023-09-01  2:49                                                               ` Hans-Peter Nilsson
2023-09-01 14:51                                                                 ` David Malcolm
2023-09-01 21:07                                                                   ` Eric Feng
