Re: [PATCH] Adding three new function attributes for static analysis of file descriptors


From: David Malcolm <dmalcolm@redhat.com>
To: mirimnan017@gmail.com, gcc-patches@gcc.gnu.org
Cc: Immad Mir <mirimmad@outlook.com>
Subject: Re: [PATCH] Adding three new function attributes for static analysis of file descriptors
Date: Fri, 15 Jul 2022 12:57:54 -0400
Message-ID: <77393168bbd011cdc3a708d96f97f8f421155a3d.camel@redhat.com>
In-Reply-To: <MWHPR1801MB1919D1A08ACF815E5C32ADD7C68B9@MWHPR1801MB1919.namprd18.prod.outlook.com>

On Fri, 2022-07-15 at 21:08 +0530, Immad Mir wrote:


Thanks for the patch.

Various review comments:

The patch is missing a ChangeLog.

> ---
>  gcc/analyzer/sm-fd.cc                | 257 ++++++++++++++++++++++++---
>  gcc/c-family/c-attribs.cc            | 115 ++++++++++++
>  gcc/doc/extend.texi                  |  19 ++
>  gcc/testsuite/gcc.dg/analyzer/fd-5.c |  53 ++++++
>  gcc/testsuite/gcc.dg/analyzer/fd-6.c |  14 ++
>  5 files changed, 431 insertions(+), 27 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/fd-6.c
> 
> diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> index 8e4300b06e2..20018dfd12b 100644
> --- a/gcc/analyzer/sm-fd.cc
> +++ b/gcc/analyzer/sm-fd.cc

[...snip...]

> +enum fd_access_direction
> +{
> +  DIR_READ_WRITE,
> +  DIR_READ,
> +  DIR_WRITE
> +};

Don't we already have DIR_READ and DIR_WRITE in enum access_direction
in analyzer.h?  If so, I wonder why the naming collision isn't an
error.

The new enum refers to a *set* of valid access directions, so maybe
rename the new enum to "enum access_directions" (note the plural), and
rename the elements from DIR_ to DIRS_.  Please carefully check all
usage of DIR_* in sm-fd.cc.

Maybe instead we should convert the values in enum access_direction
from indexes into bitfield masks.
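
As a rough illustration of the bitfield-mask idea, here is a minimal,
self-contained sketch.  The enum below is a hypothetical stand-in (the
DIRS_ names and the allows_p helper are assumptions for illustration,
not the contents of analyzer.h):

```cpp
#include <cassert>

// Hypothetical stand-in: each direction is one bit, so a *set* of
// directions is simply the bitwise OR of its members.
enum access_directions
{
  DIRS_READ = 1 << 0,
  DIRS_WRITE = 1 << 1,
  DIRS_READ_WRITE = DIRS_READ | DIRS_WRITE
};

// Return true if every direction in WANTED is also present in ALLOWED.
inline bool
allows_p (access_directions allowed, access_directions wanted)
{
  const int allowed_bits = allowed;
  const int wanted_bits = wanted;
  // Only the two known bits may be set.
  assert ((wanted_bits & ~static_cast<int> (DIRS_READ_WRITE)) == 0);
  return (allowed_bits & wanted_bits) == wanted_bits;
}
```

With this shape, "read-only" and "read/write" are values of the same
type, and a single mask test replaces a switch over three enumerators.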

[...snip...]

> @@ -317,19 +335,27 @@ public:
>    bool
>    emit (rich_location *rich_loc) final override
>    {
> +    bool warned;
>      switch (m_fd_dir)
>        {
>        case DIR_READ:
> -        return warning_at (rich_loc, get_controlling_option (),
> +        warned =  warning_at (rich_loc, get_controlling_option (),
>                             "%qE on %<read-only%> file descriptor %qE",
>                             m_callee_fndecl, m_arg);
> +        break;
>        case DIR_WRITE:
> -        return warning_at (rich_loc, get_controlling_option (),
> +        warned = warning_at (rich_loc, get_controlling_option (),
>                             "%qE on %<write-only%> file descriptor %qE",
>                             m_callee_fndecl, m_arg);
> +        break;

...i.e. which DIR_READ/DIR_WRITE values are these cases using?

>        default:
>          gcc_unreachable ();
>        }
> +      if (warned && m_attr)
> +      {
> +        m_sm.inform_filedescriptor_attribute (m_callee_fndecl, m_arg_idx, m_fd_dir);
> +      }

Redundant braces here ^^^

> +      return warned;
>    }
>  
>    bool
> @@ -359,8 +385,10 @@ public:
>    }
>  
>  private:
> -  enum access_direction m_fd_dir;
> +  enum fd_access_direction m_fd_dir;
>    const tree m_callee_fndecl;
> +  bool m_attr;
> +  int m_arg_idx;
>  };

Most (all?) of the concrete subclasses seem to be adding the two fields
m_attr and m_arg_idx, so can you add them to class fd_diagnostic, or
introduce a common subclass, rather than repeating them in each
subclass?

I suspect the code would be simpler if inform_filedescriptor_attribute
were a method of fd_diagnostic, rather than of the state_machine: you
could move the (&& m_attr) part of the conditional into there, rather
than having every emit vfunc do the same test, and having to pass the
same fields to it each time (except perhaps the access dir?).
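
To make that suggestion concrete, here is a simplified sketch of the
shape I have in mind.  Everything in it is a hypothetical stand-in: the
"_sketch" classes, the string-based output and the message wording are
assumptions for illustration, not the real analyzer classes:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for fd_diagnostic: the attribute-related fields
// live here once, instead of being repeated in each subclass.
class fd_diagnostic_sketch
{
public:
  fd_diagnostic_sketch (const std::string &callee, bool from_attr, int arg_idx)
  : m_callee (callee), m_attr (from_attr), m_arg_idx (arg_idx)
  {
    // An attribute-based diagnostic must name a real argument.
    assert (!from_attr || arg_idx >= 0);
  }
  virtual ~fd_diagnostic_sketch () {}

  virtual bool emit (std::vector<std::string> &out) const = 0;

protected:
  // The "&& m_attr" test is done here, so emit implementations can
  // call this unconditionally after a successful warning.
  void inform_filedescriptor_attribute (std::vector<std::string> &out) const
  {
    if (!m_attr)
      return;
    out.push_back ("note: argument " + std::to_string (m_arg_idx + 1)
                   + " of '" + m_callee + "' must be an open file descriptor");
  }

  std::string m_callee;

private:
  bool m_attr;
  int m_arg_idx;
};

class fd_use_after_close_sketch : public fd_diagnostic_sketch
{
public:
  using fd_diagnostic_sketch::fd_diagnostic_sketch;

  bool emit (std::vector<std::string> &out) const final override
  {
    // Stand-in for the warning_at call in the real code.
    out.push_back ("warning: '" + m_callee + "' on closed file descriptor");
    const bool warned = true;
    if (warned)
      inform_filedescriptor_attribute (out);
    return warned;
  }
};
```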

[...snip...]

> @@ -466,25 +510,29 @@ public:
>    describe_final_event (const evdesc::final_event &ev) final override
>    {
>      if (m_first_close_event.known_p ())
> -      return ev.formatted_print (
> -          "%qE on closed file descriptor %qE; %qs was at %@", m_callee_fndecl,
> -          m_arg, "close", &m_first_close_event);
> -    else
> -      return ev.formatted_print ("%qE on closed file descriptor %qE",
> -                                 m_callee_fndecl, m_arg);
> +        return ev.formatted_print (
> +            "%qE on closed file descriptor %qE; %qs was at %@", m_callee_fndecl,
> +            m_arg, "close", &m_first_close_event);
> +      else
> +        return ev.formatted_print ("%qE on closed file descriptor %qE",
> +                                  m_callee_fndecl, m_arg);

What changed here?  Is this just whitespace changes?

[...snip...]

>  fd_state_machine::fd_state_machine (logger *logger)
>      : state_machine ("file-descriptor", logger),
>        m_constant_fd (add_state ("fd-constant")),
> @@ -647,11 +733,126 @@ fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
>              on_read (sm_ctxt, node, stmt, call, callee_fndecl);
>              return true;
>            } // "read"
> +
> +          
> +          {
> +            // Handle __attribute__((fd_arg))
> +            bitmap argmap = get_fd_attrs ("fd_arg", callee_fndecl);
> +            if (argmap)
> +              check_for_fd_attrs (sm_ctxt, node, stmt, call, callee_fndecl, argmap, DIR_READ_WRITE);
> +    
> +            // Handle __attribute__((fd_arg_read))
> +            bitmap read_argmap = get_fd_attrs ("fd_arg_read", callee_fndecl);
> +            if(read_argmap)
> +              check_for_fd_attrs (sm_ctxt, node, stmt, call, callee_fndecl, read_argmap, DIR_READ);
> +            
> +            // Handle __attribute__((fd_arg_write))
> +            bitmap write_argmap = get_fd_attrs ("fd_arg_write", callee_fndecl);
> +            if (write_argmap)
> +              check_for_fd_attrs (sm_ctxt, node, stmt, call, callee_fndecl, write_argmap, DIR_WRITE);

I think there are three memory leaks here: any time the bitmap is
non-null you need a
  BITMAP_FREE (foo_argmap);
on it.

Rather than repeating that logic three times, it's probably much
cleaner to combine get_fd_attrs and check_for_fd_attrs into a single
function that:
  - builds the bitmap
  - bails out early for the common case where there's no attribute
  - uses the bitmap
  - frees it, or, better, uses auto_bitmap rather than bitmap so that
    the cleanup happens implicitly

Then the above logic could read:

  handle_any_fd_attrs (callee_fndecl, sm_ctxt, node, stmt, call,
                       "fd_arg", DIR_READ_WRITE);
  handle_any_fd_attrs (callee_fndecl, sm_ctxt, node, stmt, call,
                       "fd_arg_read", DIR_READ);
  handle_any_fd_attrs (callee_fndecl, sm_ctxt, node, stmt, call,
                       "fd_arg_write", DIR_WRITE);

or similar.
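
The steps above can be sketched outside of GCC as follows.  This is
only an illustration under stated assumptions: attr_table is a
hypothetical stand-in for the callee's attribute list, std::vector<bool>
plays the role of auto_bitmap (it is released automatically on every
return path), and the return value merely counts the arguments that the
real code would go on to check:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a function's attributes: attribute name
// mapped to the 1-based argument numbers it was given.
typedef std::map<std::string, std::vector<unsigned> > attr_table;

// Build the argument map for ATTR_NAME, bail out early if the attribute
// is absent, then use the map.  Returns how many arguments were flagged.
inline int
handle_any_fd_attrs (const attr_table &callee_attrs,
                     const std::string &attr_name,
                     unsigned num_args)
{
  attr_table::const_iterator it = callee_attrs.find (attr_name);
  if (it == callee_attrs.end ())
    return 0; // Common case: no attribute, nothing allocated.

  // Owned by this scope; no explicit free is needed.
  std::vector<bool> argmap (num_args, false);
  for (unsigned argno : it->second)
    {
      assert (argno >= 1 && argno <= num_args);
      argmap[argno - 1] = true;
    }

  int num_checked = 0;
  for (unsigned i = 0; i < num_args; ++i)
    if (argmap[i])
      ++num_checked; // The real code would check argument I's state here.
  return num_checked;
}
```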


> +            
> +            return true;
> +          }
> +          
>        }
>  
>    return false;
>  }

[...snip...]

> +bitmap
> +fd_state_machine::get_fd_attrs (const char *attr_name, tree callee_fndecl) const
> +{
> +  bitmap argmap = NULL;
> +  tree attrs = TYPE_ATTRIBUTES (TREE_TYPE (callee_fndecl));
> +  attrs = lookup_attribute (attr_name, attrs);
> +  if (!attrs)
> +    return argmap;
> +
> +  if (!TREE_VALUE (attrs))
> +    return argmap;
> +
> +  argmap = BITMAP_ALLOC (NULL);
> +
> +  for (tree idx = TREE_VALUE (attrs); idx; idx = TREE_CHAIN (idx))
> +    {
> +      unsigned int val = TREE_INT_CST_LOW (TREE_VALUE (idx)) - 1;

We need to be sure that the attribute-validation logic in c-attribs.cc
rejects argument numbers that aren't >= 1.  I think the above will
crash if the user supplies an argument number that's 0, negative, or
not an integer.  I *think* positional_argument checks for all this and
will reject such attributes, but let's have coverage for each of these
in the test suite.
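
To show why an argument number of 0 matters, here is a small
standalone sketch of the arithmetic on the quoted line.  The function
names are hypothetical, and the 64-bit unsigned parameter is an
assumption standing in for the value the real code reads from the tree:

```cpp
#include <cassert>
#include <climits>

// Mirrors "unsigned int val = <argno> - 1" with no validation: an
// argument number of 0 wraps around instead of going negative.
inline unsigned int
unchecked_index (unsigned long long argno)
{
  return static_cast<unsigned int> (argno - 1);
}

// What we want the attribute handler to guarantee before the analyzer
// ever sees the value: 1 <= ARGNO <= NARGS.
inline unsigned int
checked_index (unsigned long long argno, unsigned int nargs)
{
  assert (argno >= 1 && argno <= nargs);
  return static_cast<unsigned int> (argno - 1);
}
```

Here unchecked_index (0) yields UINT_MAX, which is the index that would
then be handed to bitmap_set_bit; hence the request for test coverage.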

> +      bitmap_set_bit (argmap, val);
> +    }
> +  return argmap;
> +}
> +
> +
>  void
>  fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node,
>                             const gimple *stmt, const gcall *call) const
> @@ -729,7 +930,7 @@ void
>  fd_state_machine::check_for_open_fd (
>      sm_context *sm_ctxt, const supernode *node, const gimple *stmt,
>      const gcall *call, const tree callee_fndecl,
> -    enum access_direction callee_fndecl_dir) const
> +    enum fd_access_direction callee_fndecl_dir) const
>  {
>    tree arg = gimple_call_arg (call, 0);
>    tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> @@ -738,7 +939,7 @@ fd_state_machine::check_for_open_fd (
>    if (is_closed_fd_p (state))
>      {
>        sm_ctxt->warn (node, stmt, arg,
> -                     new fd_use_after_close (*this, diag_arg, callee_fndecl));
> +                     new fd_use_after_close (*this, diag_arg, callee_fndecl, false, -1));

I'd prefer overloaded constructors for these fd_diagnostic subclasses,
one for the hardcoded case, another for the attribute case, so that the
-1 for the "this is meaningless" attribute argno is only in the
fd_diagnostic constructor.
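
Something along these lines, as a sketch only: the "_sketch" class
names, the std::string callee and the accessors are hypothetical
simplifications, not the real constructor signatures:

```cpp
#include <cassert>
#include <string>

class fd_diagnostic_sketch
{
public:
  // Hardcoded case (e.g. "read", "write"): no attribute involved.
  // This is the only place where the meaningless -1 appears.
  explicit fd_diagnostic_sketch (const std::string &callee)
  : fd_diagnostic_sketch (callee, false, -1)
  {
  }

  // Attribute case: the diagnostic came from an attribute on argument
  // ATTR_ARG_IDX (0-based) of CALLEE.
  fd_diagnostic_sketch (const std::string &callee, int attr_arg_idx)
  : fd_diagnostic_sketch (callee, true, attr_arg_idx)
  {
  }

  bool from_attribute_p () const { return m_attr; }
  int get_arg_idx () const { return m_arg_idx; }

private:
  fd_diagnostic_sketch (const std::string &callee, bool from_attr, int arg_idx)
  : m_callee (callee), m_attr (from_attr), m_arg_idx (arg_idx)
  {
    assert (from_attr ? arg_idx >= 0 : arg_idx == -1);
  }

  std::string m_callee;
  bool m_attr;
  int m_arg_idx;
};

// Subclasses just reuse the two public overloads.
class fd_use_after_close_sketch : public fd_diagnostic_sketch
{
public:
  using fd_diagnostic_sketch::fd_diagnostic_sketch;
};
```

Call sites for the hardcoded case then pass only the callee, and never
mention false or -1.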

[...snip...]

> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc

[...snip...]

> +/* Handle the "fd_arg" attribute */
> +
> +static tree
> +handle_fd_arg_attribute (tree *node, tree name, tree args,
> +                              int ARG_UNUSED (flags), bool *no_add_attrs)
> +{

Am I right in thinking that all three of the handle_fd_arg*_attribute
callback functions are identical?  If so, let's just have a single
callback shared by all three entries in the array.
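
By way of illustration, a table in which three entries share one
callback could look like the sketch below.  The struct, the handler
signature and the lookup helper are hypothetical simplifications and do
not match the real attribute table in c-attribs.cc:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified handler signature: accept the attribute only
// if ARGNO names an existing parameter (1-based).
typedef bool (*attr_handler) (int argno, int num_params);

static bool
handle_fd_arg_attribute (int argno, int num_params)
{
  return argno >= 1 && argno <= num_params;
}

struct attr_spec_sketch
{
  const char *name;
  attr_handler handler;
};

// One callback, three table entries.
static const attr_spec_sketch fd_attr_table[] = {
  { "fd_arg", handle_fd_arg_attribute },
  { "fd_arg_read", handle_fd_arg_attribute },
  { "fd_arg_write", handle_fd_arg_attribute },
};

// Return the handler registered for NAME, or nullptr if there is none.
static attr_handler
lookup_handler (const char *name)
{
  assert (name);
  for (const attr_spec_sketch &spec : fd_attr_table)
    if (std::strcmp (spec.name, name) == 0)
      return spec.handler;
  return nullptr;
}
```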

> +  tree type = *node;
> +  if (!args)
> +    {
> +      if (!prototype_p (type))
> +        {
> +          error ("%qE attribute without arguments on a non-prototype", name);
> +          *no_add_attrs = true;
> +        }
> +      return NULL_TREE;
> +    }
> +
> +  for (int i = 1; args; ++i)
> +    {
> +      tree pos = TREE_VALUE (args);
> +      tree next = TREE_CHAIN (args);
> +      if (tree val = positional_argument (type, name, pos, INTEGER_TYPE,
> +                                          next || i > 1 ? i : 0))
> +        TREE_VALUE (args) = val;
> +      else
> +        {
> +          *no_add_attrs = true;
> +          break;
> +        }
> +      args = next;
> +    }

Looks like you've copied and pasted the above loop from
handle_alloc_size_attribute, which can take multiple args.

The new attributes only take a single argument, so I think
handle_alloc_align_attribute is a better model.


> +    return NULL_TREE;
> +}

[...snip...]

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index dfbe33ac652..eba86e9f7ef 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3007,6 +3007,25 @@ produced by @command{gold}.
>  For other linkers that cannot generate resolution file,
>  explicit @code{externally_visible} attributes are still necessary.
>  
> +@item fd_arg (@Var{N}, @Var{...})

The @var directives should have lower-case "v".

You should add a @cindex line below the @item for each of these; see
the existing function attributes for how to do that.

> +
> +The @Var{fd_arg} attribute may be applied to a function that takes an open file

@Var should be @code here.

> +descriptor at referenced argument @Var{N}. It indicates that the passed file
> +descriptor must not have been closed.

Please spell out which analyzer warnings are affected by this
attribute.

> +
> +@item fd_arg_read (@Var{N}, @Var{...})

Aha: you're supporting one or more arguments, rather than just one?

I don't think we need this; almost every API I can think of takes a
single FD (or is a special case, like dup2).

So for simplicity, let's just support a single argument per attribute;
if there's a rare case where an API takes two fd_arg_read, the user can
simply supply the attribute twice.
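
For example, a function reading from two descriptors could be declared
as in the sketch below.  The FD_ARG_READ macro and first_bytes_match
are hypothetical names for illustration; the macro expands to nothing
on compilers that do not report support for the attribute, and whether
a given compiler accepts the attribute twice is an assumption here:

```cpp
#include <cassert>
#include <unistd.h>

// Use the attribute only where the compiler says it exists.
#if defined (__has_attribute)
# if __has_attribute (fd_arg_read)
#  define FD_ARG_READ(N) __attribute__ ((fd_arg_read (N)))
# endif
#endif
#ifndef FD_ARG_READ
# define FD_ARG_READ(N)
#endif

// One attribute per file-descriptor argument, supplied twice.
bool first_bytes_match (int fd_a, int fd_b)
  FD_ARG_READ (1) FD_ARG_READ (2);

// Read one byte from each descriptor and compare them.
bool
first_bytes_match (int fd_a, int fd_b)
{
  assert (fd_a >= 0 && fd_b >= 0);
  char a, b;
  if (read (fd_a, &a, 1) != 1 || read (fd_b, &b, 1) != 1)
    return false;
  return a == b;
}
```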

> +
> +The @Var{fd_arg_read} attribute may be applied to function that takes an open file
> +descriptor at referenced argument @Var{N} that it might read from. It 
> +indicates that the passed file descriptor must not have been closed or
> +not opened as write-only.

It's probably simplest to document this as:  "This attribute is
identical to @code{fd_arg}, but with the additional requirement
that..." or somesuch.

> +
> +@item fd_arg_write (@Var{N}, @Var{...})
> +
> +The @Var{fd_arg_write} attribute may be applied to a function that takes an open
> +file descriptor at referenced argument @Var{N} that it might write to. It indicates
> +that the passed file descriptor must not have been closed or not opened as read-only.

Likewise.

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-6.c b/gcc/testsuite/gcc.dg/analyzer/fd-6.c
> new file mode 100644
> index 00000000000..7efb9e69b58
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/fd-6.c
> @@ -0,0 +1,14 @@
> +
> +int not_a_fn __attribute__ ((fd_arg(1))); /* { dg-warning "'fd_arg' attribute only applies to function types" } */
> +
> +void f (char *p) __attribute__ ((fd_arg(1))); /* { dg-warning "'fd_arg' attribute argument value '1' refers to parameter type 'char \\\*'" } */
> +
> +
> +int not_a_fn_b __attribute__ ((fd_arg_read(1))); /* { dg-warning "'fd_arg_read' attribute only applies to function types" } */
> +
> +void g (char *p) __attribute__ ((fd_arg_read(1))); /* { dg-warning "'fd_arg_read' attribute argument value '1' refers to parameter type 'char \\\*'" } */
> +
> +
> +int not_a_fn_c __attribute__ ((fd_arg_write(1))); /* { dg-warning "'fd_arg_write' attribute only applies to function types" } */
> +
> +void f (char *p) __attribute__ ((fd_arg_write(1))); /* { dg-warning "'fd_arg_write' attribute argument value '1' refers to parameter type 'char \\\*'" } */
> \ No newline at end of file

The attribute-parsing code isn't analyzer-specific, so move this test
file to c-c++-common/attr-fd.c.

Hope the above makes sense; thanks again for the patch.

Dave
