public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Eric Feng <ef2648@columbia.edu>
Cc: gcc@gcc.gnu.org
Subject: Re: [PATCH] WIP for dg-require-python-h [PR107646]
Date: Tue, 08 Aug 2023 14:51:05 -0400	[thread overview]
Message-ID: <132e5dd3440e22003db92244e9d17916fe33f0c7.camel@redhat.com> (raw)
In-Reply-To: <20230808165159.80250-1-ef2648@columbia.edu>

On Tue, 2023-08-08 at 12:51 -0400, Eric Feng wrote:
> Unfortunately, there doesn’t seem to be any ERRORs in the .log nor
> any of the debug print statements which I’ve scattered within proc
> dg-require-python-h when run. I’ve attached the WIP below; thank you!
> Please note that in this version of the patch, I’ve removed the other
> (non Python) test cases in plugin.exp for convenience. 
> 
> Aside from issues with dg-require-python-h, everything works as
> expected (when using /* { dg-options "-fanalyzer -
> I/usr/include/python3.9" }. The patch includes support for
> PyList_New, PyLong_FromLong, PyList_Append and also the optional
> parameters for get_or_create_region_for_heap_alloc as we previously
> discussed. I will submit the version of the patch sans dg-require-
> python-h to gcc-patches for review as soon as I confirm regtests pass
> as expected; perhaps we can first push these changes to trunk and
> later push a separate patch for dg-require-python-h. 
> > 

[...snip...]

Various comments on the WIP patch inline below...


> ---
> This patch adds known function subclasses for the following Python/C
> API: PyList_New, PyLong_FromLong, PyList_Append. It also adds new
> optional parameters to
> region_model::get_or_create_region_for_heap_alloc
> so that the newly allocated region may transition from the start
> state
> to the assumed non null state on the malloc state machine immediately
> if
> desired.
> 
> The main warnings we gain in this patch with respect to the known
> function subclasses
> mentioned are leak related. For example:
> 
> rc3.c: In function ‘create_py_object’:
> │
> rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-
> leak]
> │
>    21 |   return list;
>       │
>       |          ^~~~
> │
>   ‘create_py_object’: events 1-4
> │
>     |
> │
>     |    4 |   PyObject* item = PyLong_FromLong(10);
> │
>     |      |                    ^~~~~~~~~~~~~~~~~~~
> │
>     |      |                    |
> │
>     |      |                    (1) allocated here
> │
>     |      |                    (2) when ‘PyLong_FromLong’ succeeds
> │
>     |    5 |   PyObject* list = PyList_New(2);
> │
>     |      |                    ~~~~~~~~~~~~~
> │
>     |      |                    |
> │
>     |      |                    (3) when ‘PyList_New’ fails
> │
>     |......
> │
>     |   21 |   return list;
> │
>     |      |          ~~~~
> │
>     |      |          |
> │
>     |      |          (4) ‘item’ leaks here; was allocated at (1)
> │
> 
> Some concessions were made to
> simplify the analysis process when comparing kf_PyList_Append with
> the
> real implementation. In particular, PyList_Append performs some
> optimization internally to try and avoid calls to realloc if
> possible. For simplicity, we assume that realloc is called every
> time.
> Also, we grow the size by just 1 (to ensure enough space for adding a
> new element) rather than abide by the heuristics that the actual
> implementation
> follows.
> 
> gcc/analyzer/ChangeLog:
>   PR analyzer/107646
>         * region-model.cc
> (region_model::get_or_create_region_for_heap_alloc):
>   New optional parameters.

>         * region-model.h (class region_model): Likewise.
>         * sm-malloc.cc (on_realloc_with_move): New function.
>         (region_model::move_ptr_sval_non_null): New function.
> 
> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
>         * gcc.dg/plugin/analyzer_cpython_plugin.c: New features for
> plugin.

I don't want a lot of detail, but please have the ChangeLog entry at
least list the new functions that are being simulated, since that
willlikely the most pertinent information when we go back look at the
logs

>         * gcc.dg/plugin/plugin.exp: New test.
>         * gcc.dg/plugin/cpython-plugin-test-2.c: New test.
> 
> Signed-off-by: Eric Feng <ef2648@columbia.edu>
> ---
>  gcc/analyzer/region-model.cc                  |  15 +-
>  gcc/analyzer/region-model.h                   |  10 +-
>  gcc/analyzer/sm-malloc.cc                     |  40 +
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 710
> ++++++++++++++++++
>  .../gcc.dg/plugin/cpython-plugin-test-2.c     |  78 ++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp        | 107 +--
>  gcc/testsuite/lib/target-supports.exp         |  25 +
>  7 files changed, 876 insertions(+), 109 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
> 2.c
> 
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-
> model.cc
> index e92b3f7b074..c53446b2afc 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -5131,7 +5131,9 @@ region_model::check_dynamic_size_for_floats
> (const svalue *size_in_bytes,
>  
>  const region *
>  region_model::get_or_create_region_for_heap_alloc (const svalue
> *size_in_bytes,
> -                                                 
> region_model_context *ctxt)
> +       region_model_context *ctxt,
> +       bool register_alloc,
> +       const call_details *cd)

When adding new params/functionality to an existing function, please
update the leading comment.


>  {
>    /* Determine which regions are referenced in this region_model, so
> that
>       we can reuse an existing heap_allocated_region if it's not in
> use on
> @@ -5153,6 +5155,17 @@
> region_model::get_or_create_region_for_heap_alloc (const svalue
> *size_in_bytes,
>    if (size_in_bytes)
>      if (compat_types_p (size_in_bytes->get_type (), size_type_node))
>        set_dynamic_extents (reg, size_in_bytes, ctxt);
> +
> +       if (register_alloc && cd)
> +               {
> +                       const svalue *ptr_sval = nullptr;
> +                       if (cd->get_lhs_type ())
> +       ptr_sval = m_mgr->get_ptr_svalue (cd->get_lhs_type (), reg);
> +                       else
> +       ptr_sval = m_mgr->get_ptr_svalue (NULL_TREE, reg);
> +                       move_ptr_sval_non_null (ctxt, ptr_sval);
> +               }
> +
>    return reg;
>  }
>  
> diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-
> model.h
> index 0cf38714c96..84c964fadc9 100644
> --- a/gcc/analyzer/region-model.h
> +++ b/gcc/analyzer/region-model.h
> @@ -387,9 +387,9 @@ class region_model
>                        region_model_context *ctxt,
>                        rejected_constraint **out);
>  
> -  const region *
> -  get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
> -                                      region_model_context *ctxt);
> +  const region *get_or_create_region_for_heap_alloc (
> +      const svalue *size_in_bytes, region_model_context *ctxt,
> +      bool register_alloc = false, const call_details *cd =
> nullptr);

I don't like "register_alloc" as a param name as the word "register" is
overused in the context of a compiler.  Maybe "update_state_machine"?

>    const region *create_region_for_alloca (const svalue
> *size_in_bytes,
>                                           region_model_context
> *ctxt);
>    void get_referenced_base_regions (auto_bitmap &out_ids) const;
> @@ -476,6 +476,10 @@ class region_model
>                              const svalue *old_ptr_sval,
>                              const svalue *new_ptr_sval);
>  
> +  /* Implemented in sm-malloc.cc.  */
> +  void move_ptr_sval_non_null (region_model_context *ctxt,
> +       const svalue *new_ptr_sval);

"move" makes me think of C++11; how about
"transition_ptr_sval_to_non_null"?

> +
>    /* Implemented in sm-taint.cc.  */
>    void mark_as_tainted (const svalue *sval,
>                         region_model_context *ctxt);

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> index 9ecc42d4465..4d985620c01 100644
> --- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> +++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c

[...snip...]

The known_function implementations get very long, alas.  Some of that
may be inevitable, but there's a lot of repetition within them, which
ought to be split out into subroutines.

Some examples:
- grepping for PLUS_EXPR, there seem to be two places that spell out
how to incref an object; probably there should be a static helper
function for doing this.

- grepping for "ob_refcnt", there are two places which simulate a new
object having its reference count initialized to 1.  Again, this ought
to be a call to a static helper subroutine.

- grepping for "ob_type", there are two places that simulate
initializing the ob_type of a new object to point at a type object. 
Likewise, this ought to be a call to a static helper subroutine.

Hopefully doing so will help tame the complexity of these functions,
and as we add new ones, they'll probably be able to reuse some of these
subroutines.

[...snip...]

> +class kf_PyLong_FromLong : public known_function
> +{
> +public:
> +  bool
> +  matches_call_types_p (const call_details &cd) const final override
> +  {
> +    return (cd.num_args () == 1 && arg_is_long_p (cd, 0));
> +  }

I think it's probably enough here to just check that the arg's type is
integral, rather than a specific type of integer.

[...snip...]

> 
> @@ -205,6 +909,12 @@ cpython_analyzer_init_cb (void *gcc_data, void *
> /*user_data */)
>        sorry_no_cpython_plugin ();
>        return;
>      }
> +
> +  iface->register_known_function ("PyLong_FromLong",
> +                                  make_unique<kf_PyLong_FromLong>
> ());
> +  iface->register_known_function ("PyList_New",
> make_unique<kf_PyList_New> ());
> +  iface->register_known_function ("PyList_Append",
> +                                  make_unique<kf_PyList_Append> ());

Hopefully there will eventually be a lot of Py* known_functions, so
shall we keep the functions alphabetized?  (both in terms of
registration, and in terms of declarations/definitions).  

[...snip...]

Hope this is constructive
Dave

  parent reply	other threads:[~2023-08-08 18:51 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-25  4:49 Update and Questions on CPython Extension Module -fanalyzer plugin development Eric Feng
2023-07-25 14:41 ` David Malcolm
2023-07-27 22:13   ` Eric Feng
2023-07-27 22:35     ` David Malcolm
2023-07-30 17:52       ` Eric Feng
2023-07-30 23:44         ` David Malcolm
2023-08-01 13:57           ` Eric Feng
2023-08-01 17:06             ` David Malcolm
2023-08-04 15:02               ` Eric Feng
2023-08-04 15:39                 ` David Malcolm
2023-08-04 20:48                   ` Eric Feng
2023-08-04 22:42                     ` David Malcolm
2023-08-04 22:46                       ` David Malcolm
2023-08-07 18:31                         ` Eric Feng
2023-08-07 23:16                           ` David Malcolm
2023-08-08 16:51                             ` [PATCH] WIP for dg-require-python-h [PR107646] Eric Feng
2023-08-08 18:08                               ` David Malcolm
2023-08-08 18:51                               ` David Malcolm [this message]
2023-08-09 19:22                                 ` [PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646] Eric Feng
2023-08-09 21:36                                   ` David Malcolm
2023-08-11 17:47                                     ` [COMMITTED] " Eric Feng
2023-08-11 20:23                                       ` Eric Feng
2023-08-16 19:17                                         ` Update on CPython Extension Module -fanalyzer plugin development Eric Feng
2023-08-16 21:28                                           ` David Malcolm
2023-08-17  1:47                                             ` Eric Feng
2023-08-21 14:05                                               ` Eric Feng
2023-08-21 15:04                                                 ` David Malcolm
2023-08-23 21:15                                                   ` Eric Feng
2023-08-23 23:16                                                     ` David Malcolm
2023-08-24 14:45                                                       ` Eric Feng
2023-08-25 12:50                                                         ` Eric Feng
2023-08-25 19:50                                                           ` David Malcolm
2023-08-29  4:31                                                             ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] Eric Feng
2023-08-29  4:35                                                               ` Eric Feng
2023-08-29 17:28                                                                 ` Eric Feng
2023-08-29 21:14                                                                   ` David Malcolm
2023-08-30 22:15                                                                     ` Eric Feng
2023-08-31 17:01                                                                       ` David Malcolm
2023-08-31 19:09                                                                         ` Eric Feng
2023-08-31 20:19                                                                           ` David Malcolm
2023-09-01  1:25                                                                             ` Eric Feng
2023-09-01 11:57                                                                               ` David Malcolm
2023-09-05  2:13                                                                                 ` [PATCH] analyzer: implement symbolic value support for CPython plugin's refcnt checker [PR107646] Eric Feng
2023-09-07 17:28                                                                                   ` David Malcolm
2023-09-11  2:12                                                                                     ` Eric Feng
2023-09-11 19:00                                                                                       ` David Malcolm
2023-08-29 21:08                                                               ` [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646] David Malcolm
2023-09-01  2:49                                                               ` Hans-Peter Nilsson
2023-09-01 14:51                                                                 ` David Malcolm
2023-09-01 21:07                                                                   ` Eric Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=132e5dd3440e22003db92244e9d17916fe33f0c7.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=ef2648@columbia.edu \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).