public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Mir Immad <mirimnan017@gmail.com>, gcc@gcc.gnu.org
Subject: Re: [PATCH] static analysis support for posix file desccriptor APIs
Date: Tue, 21 Jun 2022 14:34:36 -0400	[thread overview]
Message-ID: <bef3049406cdac2fdb5edf915bfc5636c6be0a8e.camel@redhat.com> (raw)
In-Reply-To: <CAE1-7oz4Oi+p3+iKf+H0+OGEL+vfAHvT17fYM-0Vz-r0-f5R5Q@mail.gmail.com>

Hi Immad, thanks for this patch.

Overall, looks like you're making good progress.

Various notes and nitpicks inline below, throughout...

On Tue, 2022-06-21 at 22:00 +0530, Mir Immad wrote:

[...snip...]

> 
> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 23dfc797cea..d99acfbb069 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -54,6 +54,10 @@ The minimum number of supernodes within a function
> for
> the analyzer to consider
>  Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump)
> Init(200) Param
>  The maximum depth of exploded nodes that should appear in a dot dump
> before switching to a less verbose format.

I'm not yet sure if this is a good idea, but I wonder if all of these
warnings ought to have a "-Wanalyzer-fd-" prefix?  "file-descriptor" is
rather long, and the analyzer's warnings are already tending to be on
the long side, and having a consistent prefix might make it easier for
users to grok them.

(though this feels like a "what color do we paint the bikeshed?" issue;
see e.g. https://bikeshed.org/ )

> 
> +Wanalyzer-double-close
> +Common Var(warn_analyzer_double_close) Init(1) Warning
> +Warn about code paths in which a file descriptor can be closed more
> than
> once.

...so this could be Wanalyzer-fd-double-close

> +
>  Wanalyzer-double-fclose
>  Common Var(warn_analyzer_double_fclose) Init(1) Warning
>  Warn about code paths in which a stdio FILE can be closed more than
> once.
> @@ -66,6 +70,10 @@ Wanalyzer-exposure-through-output-file
>  Common Var(warn_analyzer_exposure_through_output_file) Init(1) Warning
>  Warn about code paths in which sensitive data is written to a file.
> 
> +Wanalyzer-file-descriptor-leak
> +Common Var(warn_analyzer_file_descriptor_leak) Init(1) Warning
> +Warn about code paths in which a file descriptor is not closed.

...and Wanalyzer-fd-leak

> +
>  Wanalyzer-file-leak
>  Common Var(warn_analyzer_file_leak) Init(1) Warning
>  Warn about code paths in which a stdio FILE is not closed.
> @@ -82,6 +90,14 @@ Wanalyzer-mismatching-deallocation
>  Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
>  Warn about code paths in which the wrong deallocation function is
> called.
> 
> +Wanalyzer-mismatching-operation-on-file-descriptor
> +Common Var(warn_analyzer_mismatching_operation_on_file_descriptor)
> Init(1)
> Warning
> +Warn about the code paths in which read on write-only file descriptor
> or
> write on read-only file descriptor is called.

Maybe:
  Wanalyzer-fd-access-mode-mismatch
?

Lose the "the" in "the code paths", so maybe: "Warn about code paths in
which a write is attempted on a read-only file descriptor, or vice-
versa."


> +
> +Wanalyzer-possible-invalid-file-descriptor
> +Common Var(warn_analyzer_possible_invalid_file_descriptor) Init(1)
> Warning
> +warn about code paths in which a possibly invalid file descriptor is
> passed to a must-be-a-valid file descriptor function argument.
> +

Wanalyzer-fd-use-without-check ?

FWIW I believe that this isn't strictly speaking a security issue, that
if code blithely assumes the fd is open and then tries to read/write
it, the access will fail with errno set to EBADF.  But it seems worth
warning for; code that doesn't bother implementing error-checking seems
suspect to me.



> 
> +Wanalyzer-read-write-on-closed-file-descriptor

Maybe: Wanalyzer-fd-use-after-close ?

> +Common Var(warn_analyzer_read_write_on_closed_file_descriptor) Init(1)
> Warning
> +Warn about code paths in which a read on write in performed on a
> closed
> file descriptor.

Typo "read on write"; maybe replace with "Warn about code paths in
which a file descriptor is used after being closed." ?



[...snip...]

> +
> diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> new file mode 100644
> index 00000000000..23e79e3e16a
> --- /dev/null
> +++ b/gcc/analyzer/sm-fd.cc
> @@ -0,0 +1,697 @@
> +/* FIXME: add copyright header. */

Obviously the FIXME needs fixing :)

> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tree.h"
> +#include "function.h"
> +#include "basic-block.h"
> +#include "gimple.h"
> +#include "options.h"
> +#include "diagnostic-path.h"
> +#include "diagnostic-metadata.h"
> +#include "function.h"
> +#include "json.h"
> +#include "analyzer/analyzer.h"
> +#include "diagnostic-event-id.h"
> +#include "analyzer/analyzer-logging.h"
> +#include "analyzer/sm.h"
> +#include "analyzer/pending-diagnostic.h"
> +#include "analyzer/function-set.h"
> +#include "analyzer/analyzer-selftests.h"
> +#include "tristate.h"
> +#include "selftest.h"
> +#include "analyzer/call-string.h"
> +#include "analyzer/program-point.h"
> +#include "analyzer/store.h"
> +#include "analyzer/region-model.h"
> +
> +#include <unistd.h>

Presumably you're including <unistd.h> here for the definitions of 
O_RDONLY and O_WRONLY.

As we discussed off-list, I think we want to avoid the use of the
host's O_RDONLY and O_WRONLY in this file, as they might not have the
same values as on the target, and I'm not sure that the host is even
guaranteed to have a <unistd.h>.

I think we want our own enum representing the three access modes:
READ_WRITE, READ_ONLY, WRITE_ONLY, and a subroutine for converting
target flags to access mode.

So ultimately that's something we want to fix, though exactly how, I'm
not quite sure; we presumably want to look up the target's definitions
of those macros - but I don't think the analyzer has access to the
cpp_reader instance from the frontend.



> +#if ENABLE_ANALYZER
> +
> +namespace ana
> +{
> +
> +namespace
> +{
> +class fd_state_machine : public state_machine
> +{
> +public:
> +  fd_state_machine (logger *logger);
> +
> +  bool
> +  inherited_state_p () const final override
> +  {
> +    return false;
> +  }
> +
> +  state_machine::state_t
> +  get_default_state (const svalue *sval) const final override
> +  {
> +    if (tree cst = sval->maybe_get_constant ())
> +      {
> +        int val = TREE_INT_CST_LOW (cst);
> +        if (val < 0)
> +          {
> +            return m_null;

Can we rename the invalid state to "m_invalid"?  I find m_null
confusing.

> +          }
> +      }
> +    return m_start;
> +  }
> +
> +  bool on_stmt (sm_context *sm_ctxt, const supernode *node,
> +                const gimple *stmt) const final override;
> +
> +  void on_condition (sm_context *sm_ctxt, const supernode *node,
> +                     const gimple *stmt, const svalue *lhs, const
> tree_code op,
> +                     const svalue *rhs) const final override;
> +
> +  bool can_purge_p (state_t s) const final override;
> +  pending_diagnostic *on_leak (tree var) const final override;
> +
> +  bool is_unchecked (state_t s) const;
> +  bool is_valid (state_t s) const;
> +
> +  /* State for a file descriptor that hasn't been checked for
> validity. */
> +  state_t m_unchecked_read_write;
> +
> +  /* State for a file descriptor opened with O_RDONLY flag. */
> +  state_t m_unchecked_read_only;
> +
> +  /* State for a file descriptor opneded with O_WRONLY flag. */
> +  state_t m_unchecked_write_only;

Typo: opneded -> opened.

Maybe group these three states in the source like this:

  /* States representing a file descriptor that hasn't yet been
     checked for validity after opening, for three different
     access modes.  */
  state_t m_unchecked_read_write;
  state_t m_unchecked_read_only;
  state_t m_unchecked_write_only;

> +
> +  /* State for a file descriptor that is known to be invalid. */
> +  state_t m_null;

Let's rename this to m_invalid;
The condition is that it's < 0, right?  If so, please update the
comment to clarify that.

> +
> +  /* State for a file descriptor that was opened in read-write mode
> and is
> known
> +   * to be valid. */
> +  state_t m_valid_read_write;
> +
> +  /* State for a file descriptor that was opened as read-only and is
> known
> to be
> +   * valid*/
> +  state_t m_valid_read_only;
> +
> +  /* State for a file descriptor that was opened as write-only and is
> known to
> +   * be valid*/
> +  state_t m_valid_write_only;

Maybe group these three states in a similar way to above.


> +
> +  /* State for a file descriptor that has been closed.*/
> +  state_t m_closed;
> +
> +  /* State for a file descriptor that we do not want to track anymore
> . */
> +  state_t m_stop;
> +
> +private:
> +  void on_open (sm_context *sm_ctxt, const supernode *node, const
> gimple
> *stmt,
> +                const gcall *call) const;
> +  void on_close (sm_context *sm_ctxt, const supernode *node, const
> gimple
> *stmt,
> +                 const gcall *call) const;
> +  void on_read (sm_context *sm_ctxt, const supernode *node, const
> gimple
> *stmt,
> +                const gcall *call) const;
> +  void on_write (sm_context *sm_ctxt, const supernode *node, const
> gimple
> *stmt,
> +                 const gcall *call) const;
> +};
> +
> +/* Base diagnostic class relative to fd_state_machine. */
> +class fd_diagnostic : public pending_diagnostic
> +{
> +public:
> +  fd_diagnostic (const fd_state_machine &sm, tree arg) : m_sm (sm),
> m_arg
> (arg)
> +  {
> +  }
> +
> +  bool
> +  subclass_equal_p (const pending_diagnostic &base_other) const
> override
> +  {
> +    return same_tree_p (m_arg, ((const fd_diagnostic
> &)base_other).m_arg);
> +  }
> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    if (change.m_old_state == m_sm.get_start_state ()
> +        && m_sm.is_unchecked (change.m_new_state))
> +      {
> +        if (change.m_new_state == m_sm.m_unchecked_read_write)
> +          return change.formatted_print ("opened here as %<read-
> write%>");
> +
> +        if (change.m_new_state == m_sm.m_unchecked_read_only)
> +          return change.formatted_print ("opened here as %<read-
> only%>");
> +
> +        if (change.m_new_state == m_sm.m_unchecked_write_only)
> +          return change.formatted_print ("opened here as %<write-
> only%>");
> +      }
> +
> +    if (m_sm.is_unchecked (change.m_old_state)
> +        && m_sm.is_valid (change.m_new_state))
> +      if (change.m_expr)
> +        return change.formatted_print (
> +            "assuming %qE is a valid file descriptor", change.m_expr);
> +      else
> +        return change.formatted_print ("assuming a valid file
> descriptor");

Maybe reword to "when %qE is a valid file descriptor (>= 0)" ?


> +
> +    if (m_sm.is_unchecked (change.m_old_state)
> +        && change.m_new_state == m_sm.m_null)
> +      if (change.m_expr)
> +        return change.formatted_print (
> +            "assuming %qE is an invalid file descriptor",
> change.m_expr);
> +      else
> +        return change.formatted_print ("assuming an invalid file
> descriptor");

Maybe reword to "when %qE is not a valid file descriptor (< 0)" ?


> +
> +    return label_text ();
> +  }
> +
> +protected:
> +  const fd_state_machine &m_sm;
> +  tree m_arg;
> +};
> +
> +class fd_leak : public fd_diagnostic
> +{
> +public:
> +  fd_leak (const fd_state_machine &sm, tree arg) : fd_diagnostic (sm,
> arg)
> {}
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "fd_leak";
> +  }
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_file_descriptor_leak;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    /*CWE-775: Missing Release of File Descriptor or Handle after
> Effective
> +     * Lifetime
> +     */
> +    diagnostic_metadata m;
> +    m.add_cwe (775);
> +    if (m_arg)
> +      return warning_meta (rich_loc, m, get_controlling_option (),
> +                           "leak of file descriptor %qE", m_arg);
> +    else
> +      return warning_meta (rich_loc, m, get_controlling_option (),
> +                           "leak of file descriptor");
> +  }
> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) final
> override
> +  {
> +    if (m_sm.is_unchecked (change.m_new_state))
> +      {
> +        m_open_event = change.m_event_id;
> +        return label_text::borrow ("opened here");
> +      }
> +
> +    return fd_diagnostic::describe_state_change (change);
> +  }
> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    if (m_open_event.known_p ())
> +      {
> +        if (ev.m_expr)
> +          return ev.formatted_print ("%qE leaks here; was opened at
> %@",
> +                                     ev.m_expr, &m_open_event);
> +        else
> +          return ev.formatted_print ("leaks here; was opened at %@",
> +                                     &m_open_event);
> +      }
> +    else
> +      {
> +        if (ev.m_expr)
> +          return ev.formatted_print ("%qE leaks here", ev.m_expr);
> +        else
> +          return ev.formatted_print ("leaks here");
> +      }
> +  }
> +
> +private:
> +  diagnostic_event_id_t m_open_event;
> +};
> +
> +class read_write_diag_fd : public fd_diagnostic

Maybe rename to fd_access_mode_mismatch ?


> +{
> +public:
> +  /* mode specfies whether read on write was attempted or vice versa.
> +   */

Make "mode" be an enum rather than an int.

> +  read_write_diag_fd (const fd_state_machine &sm, tree arg, int mode)
> +      : m_mode (mode), fd_diagnostic (sm, arg)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "read_write_diag_fd";
> +  }
> +
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_mismatching_operation_on_file_descriptor;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    if (m_mode)
> +      {
> +        return warning_at (rich_loc, get_controlling_option (),
> +                           "%<write%> on %<read-only%> file descriptor
> %qE",
> +                           m_arg);
> +      }
> +    else
> +      {
> +        return warning_at (rich_loc, get_controlling_option (),
> +                           "%<read%> on %<write-only%> file descriptor
> %qE",
> +                           m_arg);

Rather than hardcoding "%<write%>" and "%<read%>", let's generalize
this so that the diagnostic has a "tree fndecl" argument and m_fndecl
field, which was the function that was called that attempted to used
the fd in the wrong way.  You can then write %qE to print it in quotes.

> +      }
> +  }
> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    return fd_diagnostic::describe_state_change (change);
> +  }

Looks like this override is redundant (albeit harmless): it can
implicitly inherit the fd_diagnostic implementation.


> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    if (m_mode)
> +      return ev.formatted_print ("%qs on %<read-only%> file descriptor
> %qE",
> +                                 "write", m_arg);
> +    else
> +      return ev.formatted_print ("%qs on %<write-only%> file
> descriptor
> %qE",
> +                                 "read", m_arg);
> +  }

and generalize this also to work on m_fndecl.

> +
> +private:
> +  int m_mode;

As noted above, please make this an enum.


> +};
> +
> +class double_close : public fd_diagnostic
> +{
> +public:
> +  double_close (const fd_state_machine &sm, tree arg) : fd_diagnostic
> (sm,
> arg)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "double_close";
> +  }
> +
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_double_close;
> +  }
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    diagnostic_metadata m;
> +    // CWE-1341: Multiple Releases of Same Resource or Handle
> +    m.add_cwe (1341);
> +    return warning_meta (rich_loc, m, get_controlling_option (),
> +                         "double %<close%> of file descriptor %qE",
> m_arg);
> +  }
> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    if (m_sm.is_unchecked (change.m_new_state))
> +      return label_text::borrow ("opened here");
> +
> +    if (change.m_new_state == m_sm.m_closed)
> +      {
> +        m_first_close_event = change.m_event_id;
> +        return change.formatted_print ("first %qs here", "close");
> +      }
> +    return fd_diagnostic::describe_state_change (change);
> +  }
> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    if (m_first_close_event.known_p ())
> +      return ev.formatted_print ("second %qs here; first %qs was at
> %@",
> +                                 "close", "close",
> &m_first_close_event);
> +    return ev.formatted_print ("second %qs here", "close");
> +  }
> +
> +private:
> +  diagnostic_event_id_t m_first_close_event;
> +};
> +
> +class read_write_on_closed_fd : public fd_diagnostic

Maybe rename to "use_after_close_of_fd" or somesuch.

> +{
> +public:
> +  read_write_on_closed_fd (const fd_state_machine &sm, tree arg, int
> mode)
> +      : fd_diagnostic (sm, arg), m_mode (mode)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    if (m_mode)
> +      return "write_on_closed_fd";
> +    else
> +      return "read_on_closed_fd";
> +  }
> +
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_read_write_on_closed_file_descriptor;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    if (m_mode)
> +      return warning_at (rich_loc, get_controlling_option (),
> +                         "%<write%> on closed file descriptor %qE",
> m_arg);
> +    else
> +      return warning_at (rich_loc, get_controlling_option (),
> +                         "%<read%> on closed file descriptor %qE",
> m_arg);
> +  }

As above, maybe generalize this by adding a "tree m_fndecl;" field to
the class, giving what function was called on the the closed fd.  I
think you might be able to eliminate m_mode from this subclass by doing
that instead.

> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    if (m_sm.is_unchecked (change.m_new_state))
> +      {
> +        return label_text::borrow ("opened here");
> +      }
> +
> +    if (change.m_new_state == m_sm.m_closed)
> +      {
> +        return change.formatted_print ("closed here");
> +      }
> +
> +    return fd_diagnostic::describe_state_change (change);
> +  }
> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    if (m_mode)
> +      return ev.formatted_print ("%<write%> on closed file descriptor
> %qE
> here",
> +                                 m_arg);
> +    else
> +      return ev.formatted_print ("%<read%> on closed file descriptor
> %qE
> here",
> +                                 m_arg);
> +  }

Likewise here.

> +
> +private:
> +  int m_mode;
> +};
> +
> +class possible_invalid_fd_diag : public fd_diagnostic

Rename to unchecked_use_of_fd ?

> +{
> +public:
> +  possible_invalid_fd_diag (const fd_state_machine &sm, tree arg, int
> mode)
> +      : fd_diagnostic (sm, arg), m_mode (mode)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "possible_invalid_fd_diag";
> +  }
> +
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_possible_invalid_file_descriptor;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    if (m_mode)
> +      return warning_at (rich_loc, get_controlling_option (),
> +                         "%<write%> on possibly invalid file
> descriptor
> %qE",
> +                         m_arg);
> +    else
> +      return warning_at (rich_loc, get_controlling_option (),
> +                         "%<read%> on possibly invalid file descriptor
> %qE",
> +                         m_arg);
> +  }

As above, generalize this with a "tree m_fndecl;" capturing what fndecl
was called on the fd without checking, and with that, I think you can
eliminate the m_mode field.


> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    if (m_sm.is_unchecked (change.m_new_state))
> +      {
> +        m_first_open_event = change.m_event_id;
> +        return label_text::borrow ("opened here");
> +      }
> +
> +    return fd_diagnostic::describe_state_change (change);
> +  }
> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    return ev.formatted_print ("%qE could be invalid: unchecked value
> from
> %@",
> +                               m_arg, &m_first_open_event);
> +  }
> +
> +private:
> +  diagnostic_event_id_t m_first_open_event;
> +  int m_mode;
> +};
> +
> +fd_state_machine::fd_state_machine (logger *logger)
> +    : state_machine ("file-descriptor", logger),
> +      m_unchecked_read_write (add_state ("fd-unchecked-read-write")),
> +      m_unchecked_read_only (add_state ("fd-unchecked-read-only")),
> +      m_unchecked_write_only (add_state ("fd-unchecked-write-only")),
> +      m_null (add_state ("fd-null")), m_closed (add_state ("fd-
> closed")),
> +      m_valid_read_write (add_state ("fd-valid-read-write")),
> +      m_valid_read_only (add_state ("fd-valid-read-only")),
> +      m_valid_write_only (add_state ("fd-valid-write-only")),
> +      m_stop (add_state ("fd-stop"))
> +{
> +}
> +
> +bool
> +fd_state_machine::is_unchecked (state_t s) const
> +{
> +  return s == m_unchecked_read_write || s == m_unchecked_read_only
> +         || s == m_unchecked_write_only;

With multiline conditionals, once they occupy multiple lines, it's more
readable to split them like this:

  return (s == m_unchecked_read_write
          || s == m_unchecked_read_only
          || s == m_unchecked_write_only);


> +}
> +
> +bool
> +fd_state_machine::is_valid (state_t s) const
> +{
> +  return s == m_valid_read_write || s == m_valid_read_only
> +         || s == m_valid_write_only;
> +}

Similar here.

> +
> +bool
> +fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
> +                           const gimple *stmt) const
> +{
> +  if (const gcall *call = dyn_cast<const gcall *> (stmt))
> +    if (tree callee_fndecl = sm_ctxt->get_fndecl_for_call (call))
> +      {
> +        if (is_named_call_p (callee_fndecl, "open", call, 2))
> +          {
> +            on_open (sm_ctxt, node, stmt, call);
> +            return true;
> +          } //  "open"
> +
> +        if (is_named_call_p (callee_fndecl, "close", call, 1))
> +          {
> +            on_close (sm_ctxt, node, stmt, call);
> +            return true;
> +          } //  "close"
> +
> +        if (is_named_call_p (callee_fndecl, "write", call, 3))
> +          {
> +            on_write (sm_ctxt, node, stmt, call);
> +            return true;
> +          } // "write"
> +
> +        if (is_named_call_p (callee_fndecl, "read", call, 3))
> +          {
> +            on_read (sm_ctxt, node, stmt, call);
> +            return true;
> +          } // "read"
> +

You'll probably want to pass callee_fndecl to some of these "on_FNNAME"
subroutines, so that it's available for the new m_fndecl fields of the
warnings I suggested above.


> +        return true;
> +      }
> +
> +  return false;
> +}
> +
> +void
> +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node,
> +                           const gimple *stmt, const gcall *call)
> const
> +{
> +  tree lhs = gimple_call_lhs (call);
> +  if (lhs)
> +    {
> +      tree arg = gimple_call_arg (call, 1);
> +      if (TREE_CODE (arg) == INTEGER_CST)
> +        {
> +          int flag = TREE_INT_CST_LOW (arg);
> +          /* FIXME: this code assumes the access modes on the host and
> +          target are the same, which in practice might not be the
> case. */
> +          if (flag == O_RDONLY)
> +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> +                                    m_unchecked_read_only);
> +          else if (flag == O_WRONLY)
> +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> +                                    m_unchecked_write_only);
> +          else
> +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> +                                    m_unchecked_read_write);

Let's introduce an enum for the access mode, and a subroutine to get
the enum value from "flag", so that we can isolate this "FIXME" in that
subroutine.

> +        }
> +    }
> +  else
> +    {
> +      /* FIXME: add leak reporting */

Do you have a testcase that exhibits this behavior?


> +    }
> +}
> +
> +void
> +fd_state_machine::on_close (sm_context *sm_ctxt, const supernode
> *node,
> +                            const gimple *stmt, const gcall *call)
> const
> +{
> +  tree arg = gimple_call_arg (call, 0);
> +
> +  sm_ctxt->on_transition (node, stmt, arg, m_start, m_closed);
> +
> +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_write,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_only,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_write_only,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_write,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_only,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_valid_write_only,
> m_closed);


> +  sm_ctxt->on_transition (node, stmt, arg, m_null, m_closed);

Perhaps we should warn for operations on known invalid FDs? (and thus
complain about close on a known-invalid FD).

I believe that the OS will handle such code gracefully by setting errno
to EBADF.


> +
> +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> +    {
> +      tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +      sm_ctxt->warn (node, stmt, arg, new double_close (*this,
> diag_arg));
> +      sm_ctxt->set_next_state (stmt, arg, m_stop);
> +    }
> +}
> +void
> +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node,
> +                           const gimple *stmt, const gcall *call)
> const
> +{
> +  tree arg = gimple_call_arg (call, 0);
> +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +
> +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> +    {
> +      sm_ctxt->warn (node, stmt, arg, new read_write_on_closed_fd
> (*this,
> diag_arg, 0));
> +    }

Maybe consolidate on_read and on_write by adding a subroutine that 
checks for m_closed, and for checking access mode (maybe a
"check_for_open_fd" that takes an access mode enum value amongst other
params.  If you pass around caller_fndecl, I think much of this code
can be shared that way between on_read and on_write (which will help
simplify things if we want to support further functions).

> +
> +  if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> +    {
> +      sm_ctxt->warn (node, stmt, arg,
> +                     new possible_invalid_fd_diag (*this, diag_arg,
> 0));
> +    }
> +
> +  if (sm_ctxt->get_state (stmt, arg) == m_unchecked_write_only
> +      || sm_ctxt->get_state (stmt, arg) == m_valid_write_only)
> +    {
> +      sm_ctxt->warn (node, stmt, arg,
> +                     new read_write_diag_fd (*this, diag_arg, 0));
> +    }
> +}
> +void
> +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode
> *node,
> +                            const gimple *stmt, const gcall *call)
> const
> +{
> +  tree arg = gimple_call_arg (call, 0);
> +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +
> +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> +    {
> +      sm_ctxt->warn (node, stmt, arg, new read_write_diag_fd (*this,
> diag_arg, 1));
> +    }
> +  if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> +    {
> +      sm_ctxt->warn (node, stmt, arg,
> +                     new possible_invalid_fd_diag (*this, diag_arg,
> 1));
> +    }
> +
> +  if (sm_ctxt->get_state (stmt, arg) == m_unchecked_read_only
> +      || sm_ctxt->get_state (stmt, arg) == m_valid_read_only)
> +    {
> +      tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +      sm_ctxt->warn (node, stmt, arg,
> +                     new read_write_diag_fd (*this, diag_arg, 1));
> +    }
> +}
> +
> +void
> +fd_state_machine::on_condition (sm_context *sm_ctxt, const supernode
> *node,
> +                                const gimple *stmt, const svalue *lhs,
> +                                enum tree_code op, const svalue *rhs)
> const
> +{
> +  if (!rhs->all_zeroes_p ())
> +    return;
> +
> +  if (op == GE_EXPR)
> +    {
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> +                              m_valid_read_write);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> +                              m_valid_read_only);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> +                              m_valid_write_only);
> +    }
> +  else if (op == LT_EXPR)
> +    {
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> m_null);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> m_null);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> m_null);
> +    }
> +}
> +
> +bool
> +fd_state_machine::can_purge_p (state_t s) const
> +{
> +  if (is_unchecked (s) || is_valid (s))
> +    return false;
> +  else
> +    return true;
> +}
> +
> +pending_diagnostic *
> +fd_state_machine::on_leak (tree var) const
> +{
> +  return new fd_leak (*this, var);
> +}
> +} // namespace
> +
> +state_machine *
> +make_fd_state_machine (logger *logger)
> +{
> +  return new fd_state_machine (logger);
> +}
> +} // namespace ana
> +
> +#endif // ENABLE_ANALYZER
> \ No newline at end of file

[...snip...]

> 
> diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> new file mode 100644
> index 00000000000..985e9ac75de
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> @@ -0,0 +1,26 @@
> +int open(const char *, int mode);
> +#define O_RDONLY 0
> +#define O_WRONLY 1
> +#define O_RDWR 2

As noted above, we want to somehow use the values for the O_ macros
from the test code, rather than the <unistd.h> with which sm-fd.cc was
built.

> +
> +void
> +test_1 (const char *path)
> +{
> +  int fd = open (path, O_RDONLY); /* { dg-message "\\(1\\) opened
> here" }
> */
> +  return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-
> 775\\\]"
> "warning" } */
> + /* { dg-message "\\(2\\) 'fd' leaks here; was opened at \\(1\\)"
> "event"
> { target *-*-* } .-1 } */
> +}
> +
> +void
> +test_2 (const char *path)
> +{
> +  int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here"
> } */
> +  if (fd >= 0) /* { dg-message "\\(2\\) assuming 'fd' is a valid file
> descriptor" "event1" } */
> +  /* { dg-message "\\(3\\) following 'true' branch \\(when 'fd >=
> 0'\\)..." "event2" { target *-*-* } .-1 } */
> +  {
> +    return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-
> 775\\\]"
> "warning" } */
> +    /* { dg-message "\\(4\\) ...to here" "event1" { target *-*-* } .-1
> } */
> +    /* { dg-message "\\(5\\) 'fd' leaks here; was opened at \\(1\\)"
> "event2" { target *-*-* } .-2 } */
> +  }
> +}

Looks good, but in future don't feel the need to express all of the
events: I think the significant ones we should test for here are (1),
(2), (5), whereas specifying directives for (3) and (4) are probably
overkill.  prune.exp will prune away stray lines beginning with "note:
" so we don't need to write directives for all events, just the ones we
care about.


[...snip...]

Lots of good test coverage here.  With a state-machine based
diagnostic, it's probably best to try to have coverage of all possible
calls in all possible states - that way we can think systematically
about what we should be warning for, and what we shouldn't be warning
for.   Note that once you've got test coverage of the basic events
(like you have), you don't have to have dg-message directives for all
of the events in all of the rest of the tests (that would be excessive,
and too much work to create and maintain).

A few other nits:
- the Subject line for the commit should have an  "analyzer: " prefix
and reference the bugzilla problem report (PR 106003)
- for better or worse, the project requires ChangeLog entries.  You can
use the script ./contrib/mklog.py to help generate this.  Some more
information is here:
  https://www.gnu.org/prep/standards/html_node/Change-Logs.html
In GCC our ChangeLog entries are part of the git commit message, and we
have a serverside script to update the ChangeLog files in the source
tree (to avoid constantly having merge conflicts).

I think we'll have at least one more iteration of this patch to review,
and before I can approve you to push a later version of the patch to
trunk, you'll need to "fully" test it, by doing a bootstrap, and
running the full testsuite on the result.  Do you need some help on how
to do that?  Do you have access to the GCC compile farm yet?

I'll see if I can figure out how to access the preprocessor from the
analyzer.

Thanks again for the patch; hope the above makes sense and is
constructive

Dave





  parent reply	other threads:[~2022-06-21 18:34 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-21 16:30 Mir Immad
2022-06-21 16:30 ` Mir Immad
2022-06-21 18:34 ` David Malcolm [this message]
2022-06-21 18:50   ` Joseph Myers
2022-06-21 20:31     ` David Malcolm
2022-06-21 21:18       ` Joseph Myers
2022-06-23 18:28   ` Mir Immad
2022-06-24 18:05     ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bef3049406cdac2fdb5edf915bfc5636c6be0a8e.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    --cc=mirimnan017@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).