public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Mir Immad <mirimnan017@gmail.com>, gcc@gcc.gnu.org
Subject: Re: [PATCH] analyzer PR 106003
Date: Mon, 27 Jun 2022 16:55:33 -0400	[thread overview]
Message-ID: <9236f59ebb2710a9209c11f6e4ce58ef0588b844.camel@redhat.com> (raw)
In-Reply-To: <CAE1-7ox0BAG-0ytqgPLw9o1-dcdKY6smc6HxELE4R=1OB1r3KA@mail.gmail.com>

Thanks for the updated patch.

Various notes below; mostly nits, but I realized there's a logic error
in fd_state_machine::on_condition that I hadn't spotted before...

[...snip...]

> diff --git a/gcc/analyzer/ChangeLog b/gcc/analyzer/ChangeLog
> index 53b3ffb487b..d30e94f2f62 100644
> --- a/gcc/analyzer/ChangeLog
> +++ b/gcc/analyzer/ChangeLog
> @@ -1,3 +1,6 @@
> +2022-06-27 Immad Mir <mir@sourceware.org>
> + * sm-fd.cc: New file.
> +

The new ChangeLog entries should be part of the commit message of the
patch, not expressed part of the patch itself.  Some notes on this can
be seen at:

https://gcc.gnu.org/codingconventions.html#ChangeLogs


[...snip...]

> +/* An enum for describing the direction of an access to a file descriptor.
>  */
> +
> +enum access_direction
> +{
> +  DIR_READ,
> +  DIR_WRITE
> +};

We already have a declaration of "enum access_direction" in
analyzer/analyzer.h, so let's just use that, rather than declaring it a
second time.

[...snip...]

> +class fd_access_mode_mismatch : public fd_diagnostic
> +{
> +public:
> +  fd_access_mode_mismatch (const fd_state_machine &sm, tree arg,
> +                           enum access_direction access_dir,
> +                           const tree callee_fndecl)
> +      : m_access_dir (access_dir), m_callee_fndecl (callee_fndecl),
> +        fd_diagnostic (sm, arg)

The ordering in this initializer list looks wrong: I'd expect to see
the base class initializer (fd_diagnostics) *before* the fields.

I think we have a warning about this.

> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "fd_access_mode_mismatch";
> +  }
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_fd_access_mode_mismatch;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +
> +    switch (m_access_dir)
> +      {
> +      case DIR_READ:
> +        return warning_at (rich_loc, get_controlling_option (),
> +                           "%qE on %<read-only%> file descriptor %qE",
> +                           m_callee_fndecl, m_arg);
> +      case DIR_WRITE:
> +        return warning_at (rich_loc, get_controlling_option (),
> +                           "%qE on %<write-only%> file descriptor %qE",
> +                           m_callee_fndecl, m_arg);
> +      default:
> +        gcc_unreachable ();
> +      }

I got a bit confused here between the direction in which the FD expects
accesses (e.g. due to O_RDONLY), as opposed to the direction of the
operation being attempted on said FD (e.g. due to a "write" call).

I realize that "access" is ambiguous in that it could refer to either
meaning; "m_access_dir" here means the the former.

How about the naming convention of "fd_dir" (and thus "m_fd_dir" here)
where we're talking about the former, and "FOO_dir" for some FOO (e.g.
"callee_fndecl_dir") when we're talking about the latter?  Using that
consistently throughout could clarify things.

[...snip...]

> +enum access_direction
> +fd_state_machine::get_access_dir_from_state (state_t state) const
> +{
> +  if (state == m_unchecked_read_only || state == m_valid_read_only)
> +    return DIR_READ;
> +  else if (state == m_unchecked_write_only || state == m_valid_write_only)
> +    return DIR_WRITE;
> +}

Control will "fall off the end" of this function if "state" doesn't
match one of the four states you test against.  Hopefully we emit a
warning for this (which should make the bootstrap build fail).

This only gets used in one place, in
fd_state_machine::check_for_open_fd, and the value is only needed once
we know we have a valid_fd, so I think it's best to simply put the
above logic there, within the switch statement...

[...snip...]

> +void
> +fd_state_machine::check_for_open_fd (sm_context *sm_ctxt, const supernode
> *node,
> +                                     const gimple *stmt, const gcall *call,
> +                                     const tree callee_fndecl,
> +                                     enum access_direction access_fn) const

Please clarify this for me by renaming "access_fn" to
"callee_fndecl_dir" or "attempted_dir" or similar.

> +{
> +  tree arg = gimple_call_arg (call, 0);
> +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +  state_t state = sm_ctxt->get_state (stmt, arg);
> +  enum access_direction access_dir = get_access_dir_from_state (state);

....so drop this call...

> +
> +  if (is_closed_fd_p (state))
> +    {
> +      sm_ctxt->warn (node, stmt, arg,
> +                     new fd_use_after_close (*this, diag_arg,
> callee_fndecl));
> +    }
> +
> +  else
> +    {
> +
> +      if (!is_valid_fd_p (state))
> +        {
> +          if (!is_constant_fd_p (state))
> +            sm_ctxt->warn (
> +                node, stmt, arg,
> +                new unchecked_use_of_fd (*this, diag_arg, callee_fndecl));
> +        }
> +      switch (access_fn)
> +        {
> +        case DIR_READ:

...in favor of simply checking for the write-only states here:
(perhaps with a new "is_readonly_fd_p")

> +          if (access_dir == DIR_WRITE)
> +            {
> +              tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +              sm_ctxt->warn (node, stmt, arg,
> +                             new fd_access_mode_mismatch (
> +                                 *this, diag_arg, access_dir,
> callee_fndecl));
> +            }
> +
> +          break;
> +        case DIR_WRITE:
> +
> +          if (access_dir == DIR_READ)
> +            {

...and for the read-only states here:
(perhaps with a new "is_writeonly_fd_p")

> +              tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +              sm_ctxt->warn (node, stmt, arg,
> +                             new fd_access_mode_mismatch (
> +                                 *this, diag_arg, access_dir,
> callee_fndecl));
> +            }
> +          break;
> +        }
> +    }
> +}
> +
> +void
> +fd_state_machine::on_condition (sm_context *sm_ctxt, const supernode *node,
> +                                const gimple *stmt, const svalue *lhs,
> +                                enum tree_code op, const svalue *rhs) const
> +{
> +
> +  if (!rhs->all_zeroes_p ())
> +    return;
> +
> +  if (op == GE_EXPR)
> +    {
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> +                              m_valid_read_write);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> +                              m_valid_read_only);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> +                              m_valid_write_only);
> +    }
> +  else if (op == LT_EXPR)
> +    {
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> +                              m_invalid);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> +                              m_invalid);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> +                              m_invalid);
> +    }

Here we're detecting when the result of "open" is tested for < 0 vs
>=0.

I did some refresher reading on file descriptors over the weekend
(specifically W. Richard Stevens's classic "Advanced Programming in the
UNIX Environment") and in his examples he always checks the result of
"open" for != -1, rather than < 0.

The man page for "open" specifies:
https://www.man7.org/linux/man-pages/man2/open.2.html#RETURN_VALUE
"On success, open(), openat(), and creat() return the new file
descriptor (a nonnegative integer).  On error, -1 is returned and errno
is set to indicate the error."

Similarly,
  https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
has: "Upon successful completion, these functions shall open the file
and return a non-negative integer representing the file descriptor.
Otherwise, these functions shall return -1 and set errno to indicate
the error. If -1 is returned, no files shall be created or modified."


We should *not* warn about code that does a test for != -1, since
that's what the standard says to test for.

That said, I think that it would be overzealous to complain about code
that uses < 0 as a test on the result of "open".

So please add DejaGnu test coverage for such "!= -1" code, and ensure
that fd_state_machine::on_condition handles EQ_EXPR/NE_EXPR against -1
(and continues to handle the "< 0" test).

[...snip...]

> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 604f4e3b0a8..37173529b2d 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-06-27 Immad Mir <mir@sourceware.org>
> + * gcc.dg/analyzer/fd-1.c: New test.
> + * gcc.dg/analyzer/fd-2.c: New test.
> + * gcc.dg/analyzer/fd-3.c: New test.
> + * gcc.dg/analyzer/fd-4.c: New test.

Same notes as above about how we manage ChangeLog entries.

[...snip...]

I think that with the above nits fixed, the next version of this is
likely going to be ready for trunk - assuming that you bootstrap it and
verify that the testsuite doesn't regress.

Hope the above makes sense; thanks again for the patch.
Dave



  reply	other threads:[~2022-06-27 20:55 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-27 17:08 Mir Immad
2022-06-27 20:55 ` David Malcolm [this message]
  -- strict thread matches above, loose matches on Subject: below --
2022-06-25 16:00 Mir Immad
2022-06-26 15:53 ` David Malcolm
2022-06-27 17:07   ` Mir Immad
2022-06-23 18:30 Mir Immad
2022-06-23 23:20 ` David Malcolm
2022-07-14 15:40   ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9236f59ebb2710a9209c11f6e4ce58ef0588b844.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    --cc=mirimnan017@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).