Re: [PATCH] analyzer PR 106003

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

From: David Malcolm <dmalcolm@redhat.com>
To: Mir Immad <mirimnan017@gmail.com>, gcc@gcc.gnu.org
Subject: Re: [PATCH] analyzer PR 106003
Date: Sun, 26 Jun 2022 11:53:45 -0400	[thread overview]
Message-ID: <2ec314d7adfbc74e6c6fb1c7daa07bf54685d46b.camel@redhat.com> (raw)
In-Reply-To: <CAE1-7oxh0_O1wfRaR3DDJDYUo_Q9yomHZqYvAzT7aLO0DBk89Q@mail.gmail.com>

Thanks for the updated patch.

Various comments inline below.

Sorry if this seems nitpicky in places.

I think the patch is nearly ready; please write a ChangeLog entry for
the next one (none of my proposed changes are going to affect what the
ChangeLog will look like, apart from new test case files perhaps, given
that much of it is going to be just:

	* sm-fd.cc: New file.

[...snip...]

> diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> new file mode 100644
> index 00000000000..83eb0df724d
> --- /dev/null
> +++ b/gcc/analyzer/sm-fd.cc

[...snip...]

> +class fd_leak : public fd_diagnostic
> +{
> +public:
> +  fd_leak (const fd_state_machine &sm, tree arg) : fd_diagnostic (sm, arg)
> {}
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "fd_leak";
> +  }
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_fd_leak;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    /*CWE-775: Missing Release of File Descriptor or Handle after Effective
> +     * Lifetime
> +     */

Formatting nit: we don't put these "*" on contination lines in
multiline comments; this should read:

> +    /* CWE-775: Missing Release of File Descriptor or Handle after
Effective
> +       Lifetime.  */


[...snip...]

> +class fd_access_mode_mismatch : public fd_diagnostic
> +{
> +public:
> +  /* mode specfies whether read on write was attempted or vice versa.

Typo: "specfies" -> "specifies"

[...snip...]

> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +
> +    switch (m_mode)
> +      {

I like to put in a "default: gcc_unreachable ();" in switch statements,
especially one that isn't covering all of the values in the enum, but
this got me thinking: this class has "enum access_mode m_mode;", but
that enum is describing a *set* of ways in which the FD can be validly
accessed, whereas here we're trying to describe a particular way in
which the FD is being invalidly accessed.

analyzer.h has this enum:

/* An enum for describing the direction of an access to memory.  */

enum access_direction
{
  DIR_READ,
  DIR_WRITE
};

which I think would be a much better fit here for
fd_access_mode_mismatch, so please change fd_access_mode_mismatch's:
  enum access_mode m_mode;
to
  enum access_direction m_access_dir;
and update this switch accordingly:

> +      case READ_ONLY:
> +        return warning_at (rich_loc, get_controlling_option (),
> +                           "%qE on %<read-only%> file descriptor %qE",
> +                           m_callee_fndecl, m_arg);
> +      case WRITE_ONLY:
> +        return warning_at (rich_loc, get_controlling_option (),
> +                           "%qE on %<write-only%> file descriptor %qE",
> +                           m_callee_fndecl, m_arg);
> +      }

[...snip...]

> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    return fd_diagnostic::describe_state_change (change);
> +  }

This vfunc is redundant; it can be deleted so we simply inherit the
fd_diagnostic implementation directly.

> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    switch (m_mode)
> +      {
> +      case READ_ONLY:
> +        return ev.formatted_print ("%qE on %<read-only%> file descriptor
> %qE",
> +                                   m_callee_fndecl, m_arg);
> +      case WRITE_ONLY:
> +        return ev.formatted_print ("%qE on %<write-only%> file descriptor
> %qE",
> +                                   m_callee_fndecl, m_arg);
> +      }
> +  }

Similar comments here about the switch (m_mode) as per the emit vfunc.


> +
> +private:
> +  enum access_mode m_mode;

...and this field.

> +  const tree m_callee_fndecl;
> +};


[...snip...]

> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    if (m_sm.is_unchecked (change.m_new_state))
> +      {
> +        return label_text::borrow ("opened here");
> +      }
> +
> +    if (change.m_new_state == m_sm.m_closed)
> +      {
> +        return change.formatted_print ("closed here");
> +      }

Formatting nit: we don't add the { } when they're redundant, like in
these "if" stmts (I dislike this rule in our conventions, but it's
probably best to be consistent).

[...snip...]


> +bool
> +fd_state_machine::check_for_closed_fd (state_t state) const
> +{
> +  return (state == m_closed);
> +}
> +
> +bool
> +fd_state_machine::check_for_invalid_fd (state_t state) const
> +{
> +  return (state == m_invalid);
> +}
> +
> +bool
> +fd_state_machine::check_for_constant_fd (state_t state) const
> +{
> +  return (state == m_constant_fd);


You're using the prefix "check_for_" for two different kinds of
functions:
(a) the functions above are simple predicate functions that merely
determine "is this state_t a <foo>" for some foo, and have no side-
effects

(b) the "check_for" functions later on *do* things: they issue warnings
and make state transitions

For clarity, please rename the type (a) "check_for_foo" functions to
"is_foo_p" (we use the "_p" suffix to indicate a simple predicate
without side-effects), and keep the "check_for_" prefix for the type
(b) functions that actually make changes.

Hence please rename the above to:

  fd_state_machine::is_closed_fd_p
  fd_state_machine::is_invalid_fd_p
  fd_state_machine::is_constant_fd_p

so that the "check_for_" prefix for the functions lower down imply that
warnings could be issued and state transitions could occur.

> +void
> +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node,
> +                           const gimple *stmt, const gcall *call) const
> +{
> +  tree lhs = gimple_call_lhs (call);
> +  if (lhs)
> +    {
> +      tree arg = gimple_call_arg (call, 1);
> +      if (TREE_CODE (arg) == INTEGER_CST)
> +        {
> +          int flag = TREE_INT_CST_LOW (arg);
> +          enum access_mode mode = get_access_mode_from_flag (flag);
> +
> +          switch (mode)
> +            {
> +            case READ_ONLY:
> +              sm_ctxt->on_transition (node, stmt, lhs, m_start,
> +                                      m_unchecked_read_only);
> +              break;
> +            case WRITE_ONLY:
> +              sm_ctxt->on_transition (node, stmt, lhs, m_start,
> +                                      m_unchecked_write_only);
> +              break;
> +            default:
> +              sm_ctxt->on_transition (node, stmt, lhs, m_start,
> +                                      m_unchecked_read_write);
> +            }
> +        }

If the user writes code where the flag is determined with logic, then
"arg" won't be an INTEGER_CST and we won't transition the lhs; consider
e.g.:

  int flags = O_RDONLY;
  if (some_condition ())
    flags |= O_NOATIME;
  int fd = open (path, flags)

I don't think we need to handle this yet, but let's add test coverage
for it, at least.


> +    }
> +  else
> +    {
> +      /* FIXME: add leak reporting */
> +    }
> +}
> +

Please add a testcase for this, with an "xfail" in the dg-warning
directive.


[...snip...]


> +void
> +fd_state_machine::on_close (sm_context *sm_ctxt, const supernode *node,
> +                            const gimple *stmt, const gcall *call,
> +                            const tree callee_fndecl) const
> +{
> +  tree arg = gimple_call_arg (call, 0);
> +  state_t state = sm_ctxt->get_state (stmt, arg);
> +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +
> +  sm_ctxt->on_transition (node, stmt, arg, m_start, m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_write,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_only,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_write_only,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_write, m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_only, m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_valid_write_only, m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_constant_fd, m_closed);
> +
> +  if (check_for_closed_fd (state))
> +    {
> +
> +      sm_ctxt->warn (node, stmt, arg, new double_close (*this, diag_arg));
> +      sm_ctxt->set_next_state (stmt, arg, m_stop);
> +    }
> +}
> +void
> +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node,
> +                           const gimple *stmt, const gcall *call,
> +                           const tree callee_fndecl) const
> +{
> +  check_for_open_fd (sm_ctxt, node, stmt, call, callee_fndecl, true);
> +}
> +void
> +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode *node,
> +                            const gimple *stmt, const gcall *call,
> +                            const tree callee_fndecl) const
> +{
> +  check_for_open_fd (sm_ctxt, node, stmt, call, callee_fndecl, false);
> +}
> +
> +void
> +fd_state_machine::check_for_open_fd (sm_context *sm_ctxt, const supernode
> *node,
> +                                     const gimple *stmt, const gcall *call,
> +                                     const tree callee_fndecl,
> +                                     bool toggle) const

"bool" arguments to functions are often a "code smell".

This will be much clearer if you change "bool toggle" to "enum
access_direction access_dir"...

> +{
> +  tree arg = gimple_call_arg (call, 0);
> +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +  state_t state = sm_ctxt->get_state (stmt, arg);
> +  enum access_mode mode = get_access_mode_from_state (state);

...and rename "mode" to "fd_mode", since that way it's clear to the
reader that we're comparing how the FD is being accessed with how the
FD expects to be accessed.

> +
> +  if (check_for_closed_fd (state))
> +    {
> +      sm_ctxt->warn (node, stmt, arg,
> +                     new fd_use_after_close (*this, diag_arg,
> callee_fndecl));
> +    }
> +
> +  else
> +    {
> +
> +      if (!is_valid (state))
> +        {
> +          if (!check_for_constant_fd (state))
> +            sm_ctxt->warn (
> +                node, stmt, arg,
> +                new unchecked_use_of_fd (*this, diag_arg, callee_fndecl));
> +        }
> +      if (toggle)
> +        /* This condition determines whether we are considering "read" or
> +         "write" function */

and so (hopefully) the above comment becomes redundant.

> +        {
> +          if (mode == WRITE_ONLY)
> +            {
> +              tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +              sm_ctxt->warn (node, stmt, arg,
> +                             new fd_access_mode_mismatch (*this, diag_arg,
> mode,
> +                                                          callee_fndecl));
> +            }
> +        }
> +      else
> +        {
> +          if (mode == READ_ONLY)
> +            {
> +              tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +              sm_ctxt->warn (node, stmt, arg,
> +                             new fd_access_mode_mismatch (*this, diag_arg,
> mode,
> +                                                          callee_fndecl));
> +            }
> +        }
> +    }
> +}

[...snip...]

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 174bc09e5cf..e5522b95ad2 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi

[...snip...]

> +@item -Wno-analyzer-fd-access-mode-mismatch
> +@opindex Wanalyzer-fd-access-mode-mismatch
> +@opindex Wno-analyzer-fd-access-mode-mismatch
> +This warning requires @option{-fanalyzer}, which enables it; use
> +@option{-Wno-analyzer-fd-access-mode-mismatch}
> +to disable it.
> +
> +This diagnostic warns for paths through code in which a
> +@code{read} on write-only file descriptor is attempted, or vice versa

Grammar nit: "on write-only" -> "on a write-only"

> +@item -Wno-analyzer-fd-double-close
> +@opindex Wanalyzer-fd-double-close
> +@opindex Wno-analyzer-fd-double-close
> +This warning requires @option{-fanalyzer}, which enables it; use
> +@option{-Wno-analyzer-fd-double-close}
> +to disable it.
> +
> +This diagnostic warns for paths through code in which a
> +file descriptor can closed more than once.

Grammar nit: "can closed" -> "can be closed".

> +@item -Wno-analyzer-fd-leak
> +@opindex Wanalyzer-fd-leak
> +@opindex Wno-analyzer-fd-leak
> +This warning requires @option{-fanalyzer}, which enables it; use
> +@option{-Wno-analyzer-fd-leak}
> +to disable it.
> +
> +This diagnostic warns for paths through code in which a
> +file descriptor is leaked.

More precise as: "an open file descriptor".

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-3.c
> b/gcc/testsuite/gcc.dg/analyzer/fd-3.c
> new file mode 100644
> index 00000000000..d8c2ff26098
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/fd-3.c
> @@ -0,0 +1,43 @@
> +int open(const char *, int mode);
> +void close(int fd);
> +int write (int fd, void *buf, int nbytes);
> +int read (int fd, void *buf, int nbytes);
> +
> +#define O_RDONLY 0
> +#define O_WRONLY 1
> +#define O_RDWR 2
> +#define STDIN 0

[...snip...]

> +void
> +test_4 (void *buf)
> +{
> +    int fd = STDIN;
> +    write (fd, buf, 1);
> +    close(fd);
> +}

Probably worth adding a comment to this test case, something along the
lines of:

FD 0 is stdin at the entry to "main" and thus read-only, but we have no
guarantees here that it hasn't been closed and then reopened for
writing, so we can't issue a warning.

or similar.

[...snip...]

Thanks again for the patch; hope the above makes sense

Dave

next prev parent reply	other threads:[~2022-06-26 15:53 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-25 16:00 Mir Immad
2022-06-26 15:53 ` David Malcolm [this message]
2022-06-27 17:07   ` Mir Immad
  -- strict thread matches above, loose matches on Subject: below --
2022-06-27 17:08 Mir Immad
2022-06-27 20:55 ` David Malcolm
2022-06-23 18:30 Mir Immad
2022-06-23 23:20 ` David Malcolm
2022-07-14 15:40   ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2ec314d7adfbc74e6c6fb1c7daa07bf54685d46b.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    --cc=mirimnan017@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).