[PATCH] static analysis support for posix file desccriptor APIs

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] static analysis support for posix file desccriptor APIs
@ 2022-06-21 16:30 Mir Immad
  2022-06-21 16:30 ` Mir Immad
  2022-06-21 18:34 ` David Malcolm
  0 siblings, 2 replies; 8+ messages in thread
From: Mir Immad @ 2022-06-21 16:30 UTC (permalink / raw)
  To: gcc, David Malcolm

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index b6dcc45a58a..04631f737ea 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1269,6 +1269,7 @@ ANALYZER_OBJS = \
  analyzer/region-model-reachability.o \
  analyzer/sm.o \
  analyzer/sm-file.o \
+ analyzer/sm-fd.o \
  analyzer/sm-malloc.o \
  analyzer/sm-pattern-test.o \
  analyzer/sm-sensitive.o \
diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 23dfc797cea..d99acfbb069 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -54,6 +54,10 @@ The minimum number of supernodes within a function for
the analyzer to consider
 Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump)
Init(200) Param
 The maximum depth of exploded nodes that should appear in a dot dump
before switching to a less verbose format.

+Wanalyzer-double-close
+Common Var(warn_analyzer_double_close) Init(1) Warning
+Warn about code paths in which a file descriptor can be closed more than
once.
+
 Wanalyzer-double-fclose
 Common Var(warn_analyzer_double_fclose) Init(1) Warning
 Warn about code paths in which a stdio FILE can be closed more than once.
@@ -66,6 +70,10 @@ Wanalyzer-exposure-through-output-file
 Common Var(warn_analyzer_exposure_through_output_file) Init(1) Warning
 Warn about code paths in which sensitive data is written to a file.

+Wanalyzer-file-descriptor-leak
+Common Var(warn_analyzer_file_descriptor_leak) Init(1) Warning
+Warn about code paths in which a file descriptor is not closed.
+
 Wanalyzer-file-leak
 Common Var(warn_analyzer_file_leak) Init(1) Warning
 Warn about code paths in which a stdio FILE is not closed.
@@ -82,6 +90,14 @@ Wanalyzer-mismatching-deallocation
 Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
 Warn about code paths in which the wrong deallocation function is called.

+Wanalyzer-mismatching-operation-on-file-descriptor
+Common Var(warn_analyzer_mismatching_operation_on_file_descriptor) Init(1)
Warning
+Warn about the code paths in which read on write-only file descriptor or
write on read-only file descriptor is called.
+
+Wanalyzer-possible-invalid-file-descriptor
+Common Var(warn_analyzer_possible_invalid_file_descriptor) Init(1) Warning
+warn about code paths in which a possibly invalid file descriptor is
passed to a must-be-a-valid file descriptor function argument.
+
 Wanalyzer-possible-null-argument
 Common Var(warn_analyzer_possible_null_argument) Init(1) Warning
 Warn about code paths in which a possibly-NULL value is passed to a
must-not-be-NULL function argument.
@@ -90,6 +106,10 @@ Wanalyzer-possible-null-dereference
 Common Var(warn_analyzer_possible_null_dereference) Init(1) Warning
 Warn about code paths in which a possibly-NULL pointer is dereferenced.

+Wanalyzer-read-write-on-closed-file-descriptor
+Common Var(warn_analyzer_read_write_on_closed_file_descriptor) Init(1)
Warning
+Warn about code paths in which a read on write in performed on a closed
file descriptor.
+
 Wanalyzer-unsafe-call-within-signal-handler
 Common Var(warn_analyzer_unsafe_call_within_signal_handler) Init(1) Warning
 Warn about code paths in which an async-signal-unsafe function is called
from a signal handler.
diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
new file mode 100644
index 00000000000..23e79e3e16a
--- /dev/null
+++ b/gcc/analyzer/sm-fd.cc
@@ -0,0 +1,697 @@
+/* FIXME: add copyright header. */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "options.h"
+#include "diagnostic-path.h"
+#include "diagnostic-metadata.h"
+#include "function.h"
+#include "json.h"
+#include "analyzer/analyzer.h"
+#include "diagnostic-event-id.h"
+#include "analyzer/analyzer-logging.h"
+#include "analyzer/sm.h"
+#include "analyzer/pending-diagnostic.h"
+#include "analyzer/function-set.h"
+#include "analyzer/analyzer-selftests.h"
+#include "tristate.h"
+#include "selftest.h"
+#include "analyzer/call-string.h"
+#include "analyzer/program-point.h"
+#include "analyzer/store.h"
+#include "analyzer/region-model.h"
+
+#include <unistd.h>
+#if ENABLE_ANALYZER
+
+namespace ana
+{
+
+namespace
+{
+class fd_state_machine : public state_machine
+{
+public:
+  fd_state_machine (logger *logger);
+
+  bool
+  inherited_state_p () const final override
+  {
+    return false;
+  }
+
+  state_machine::state_t
+  get_default_state (const svalue *sval) const final override
+  {
+    if (tree cst = sval->maybe_get_constant ())
+      {
+        int val = TREE_INT_CST_LOW (cst);
+        if (val < 0)
+          {
+            return m_null;
+          }
+      }
+    return m_start;
+  }
+
+  bool on_stmt (sm_context *sm_ctxt, const supernode *node,
+                const gimple *stmt) const final override;
+
+  void on_condition (sm_context *sm_ctxt, const supernode *node,
+                     const gimple *stmt, const svalue *lhs, const
tree_code op,
+                     const svalue *rhs) const final override;
+
+  bool can_purge_p (state_t s) const final override;
+  pending_diagnostic *on_leak (tree var) const final override;
+
+  bool is_unchecked (state_t s) const;
+  bool is_valid (state_t s) const;
+
+  /* State for a file descriptor that hasn't been checked for validity. */
+  state_t m_unchecked_read_write;
+
+  /* State for a file descriptor opened with O_RDONLY flag. */
+  state_t m_unchecked_read_only;
+
+  /* State for a file descriptor opneded with O_WRONLY flag. */
+  state_t m_unchecked_write_only;
+
+  /* State for a file descriptor that is known to be invalid. */
+  state_t m_null;
+
+  /* State for a file descriptor that was opened in read-write mode and is
known
+   * to be valid. */
+  state_t m_valid_read_write;
+
+  /* State for a file descriptor that was opened as read-only and is known
to be
+   * valid*/
+  state_t m_valid_read_only;
+
+  /* State for a file descriptor that was opened as write-only and is
known to
+   * be valid*/
+  state_t m_valid_write_only;
+
+  /* State for a file descriptor that has been closed.*/
+  state_t m_closed;
+
+  /* State for a file descriptor that we do not want to track anymore . */
+  state_t m_stop;
+
+private:
+  void on_open (sm_context *sm_ctxt, const supernode *node, const gimple
*stmt,
+                const gcall *call) const;
+  void on_close (sm_context *sm_ctxt, const supernode *node, const gimple
*stmt,
+                 const gcall *call) const;
+  void on_read (sm_context *sm_ctxt, const supernode *node, const gimple
*stmt,
+                const gcall *call) const;
+  void on_write (sm_context *sm_ctxt, const supernode *node, const gimple
*stmt,
+                 const gcall *call) const;
+};
+
+/* Base diagnostic class relative to fd_state_machine. */
+class fd_diagnostic : public pending_diagnostic
+{
+public:
+  fd_diagnostic (const fd_state_machine &sm, tree arg) : m_sm (sm), m_arg
(arg)
+  {
+  }
+
+  bool
+  subclass_equal_p (const pending_diagnostic &base_other) const override
+  {
+    return same_tree_p (m_arg, ((const fd_diagnostic &)base_other).m_arg);
+  }
+
+  label_text
+  describe_state_change (const evdesc::state_change &change) override
+  {
+    if (change.m_old_state == m_sm.get_start_state ()
+        && m_sm.is_unchecked (change.m_new_state))
+      {
+        if (change.m_new_state == m_sm.m_unchecked_read_write)
+          return change.formatted_print ("opened here as %<read-write%>");
+
+        if (change.m_new_state == m_sm.m_unchecked_read_only)
+          return change.formatted_print ("opened here as %<read-only%>");
+
+        if (change.m_new_state == m_sm.m_unchecked_write_only)
+          return change.formatted_print ("opened here as %<write-only%>");
+      }
+
+    if (m_sm.is_unchecked (change.m_old_state)
+        && m_sm.is_valid (change.m_new_state))
+      if (change.m_expr)
+        return change.formatted_print (
+            "assuming %qE is a valid file descriptor", change.m_expr);
+      else
+        return change.formatted_print ("assuming a valid file descriptor");
+
+    if (m_sm.is_unchecked (change.m_old_state)
+        && change.m_new_state == m_sm.m_null)
+      if (change.m_expr)
+        return change.formatted_print (
+            "assuming %qE is an invalid file descriptor", change.m_expr);
+      else
+        return change.formatted_print ("assuming an invalid file
descriptor");
+
+    return label_text ();
+  }
+
+protected:
+  const fd_state_machine &m_sm;
+  tree m_arg;
+};
+
+class fd_leak : public fd_diagnostic
+{
+public:
+  fd_leak (const fd_state_machine &sm, tree arg) : fd_diagnostic (sm, arg)
{}
+
+  const char *
+  get_kind () const final override
+  {
+    return "fd_leak";
+  }
+
+  int
+  get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_file_descriptor_leak;
+  }
+
+  bool
+  emit (rich_location *rich_loc) final override
+  {
+    /*CWE-775: Missing Release of File Descriptor or Handle after Effective
+     * Lifetime
+     */
+    diagnostic_metadata m;
+    m.add_cwe (775);
+    if (m_arg)
+      return warning_meta (rich_loc, m, get_controlling_option (),
+                           "leak of file descriptor %qE", m_arg);
+    else
+      return warning_meta (rich_loc, m, get_controlling_option (),
+                           "leak of file descriptor");
+  }
+
+  label_text
+  describe_state_change (const evdesc::state_change &change) final override
+  {
+    if (m_sm.is_unchecked (change.m_new_state))
+      {
+        m_open_event = change.m_event_id;
+        return label_text::borrow ("opened here");
+      }
+
+    return fd_diagnostic::describe_state_change (change);
+  }
+
+  label_text
+  describe_final_event (const evdesc::final_event &ev) final override
+  {
+    if (m_open_event.known_p ())
+      {
+        if (ev.m_expr)
+          return ev.formatted_print ("%qE leaks here; was opened at %@",
+                                     ev.m_expr, &m_open_event);
+        else
+          return ev.formatted_print ("leaks here; was opened at %@",
+                                     &m_open_event);
+      }
+    else
+      {
+        if (ev.m_expr)
+          return ev.formatted_print ("%qE leaks here", ev.m_expr);
+        else
+          return ev.formatted_print ("leaks here");
+      }
+  }
+
+private:
+  diagnostic_event_id_t m_open_event;
+};
+
+class read_write_diag_fd : public fd_diagnostic
+{
+public:
+  /* mode specfies whether read on write was attempted or vice versa.
+   */
+  read_write_diag_fd (const fd_state_machine &sm, tree arg, int mode)
+      : m_mode (mode), fd_diagnostic (sm, arg)
+  {
+  }
+
+  const char *
+  get_kind () const final override
+  {
+    return "read_write_diag_fd";
+  }
+
+
+  int
+  get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_mismatching_operation_on_file_descriptor;
+  }
+
+  bool
+  emit (rich_location *rich_loc) final override
+  {
+    if (m_mode)
+      {
+        return warning_at (rich_loc, get_controlling_option (),
+                           "%<write%> on %<read-only%> file descriptor
%qE",
+                           m_arg);
+      }
+    else
+      {
+        return warning_at (rich_loc, get_controlling_option (),
+                           "%<read%> on %<write-only%> file descriptor
%qE",
+                           m_arg);
+      }
+  }
+
+  label_text
+  describe_state_change (const evdesc::state_change &change) override
+  {
+    return fd_diagnostic::describe_state_change (change);
+  }
+
+  label_text
+  describe_final_event (const evdesc::final_event &ev) final override
+  {
+    if (m_mode)
+      return ev.formatted_print ("%qs on %<read-only%> file descriptor
%qE",
+                                 "write", m_arg);
+    else
+      return ev.formatted_print ("%qs on %<write-only%> file descriptor
%qE",
+                                 "read", m_arg);
+  }
+
+private:
+  int m_mode;
+};
+
+class double_close : public fd_diagnostic
+{
+public:
+  double_close (const fd_state_machine &sm, tree arg) : fd_diagnostic (sm,
arg)
+  {
+  }
+
+  const char *
+  get_kind () const final override
+  {
+    return "double_close";
+  }
+
+
+  int
+  get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_double_close;
+  }
+  bool
+  emit (rich_location *rich_loc) final override
+  {
+    diagnostic_metadata m;
+    // CWE-1341: Multiple Releases of Same Resource or Handle
+    m.add_cwe (1341);
+    return warning_meta (rich_loc, m, get_controlling_option (),
+                         "double %<close%> of file descriptor %qE", m_arg);
+  }
+
+  label_text
+  describe_state_change (const evdesc::state_change &change) override
+  {
+    if (m_sm.is_unchecked (change.m_new_state))
+      return label_text::borrow ("opened here");
+
+    if (change.m_new_state == m_sm.m_closed)
+      {
+        m_first_close_event = change.m_event_id;
+        return change.formatted_print ("first %qs here", "close");
+      }
+    return fd_diagnostic::describe_state_change (change);
+  }
+
+  label_text
+  describe_final_event (const evdesc::final_event &ev) final override
+  {
+    if (m_first_close_event.known_p ())
+      return ev.formatted_print ("second %qs here; first %qs was at %@",
+                                 "close", "close", &m_first_close_event);
+    return ev.formatted_print ("second %qs here", "close");
+  }
+
+private:
+  diagnostic_event_id_t m_first_close_event;
+};
+
+class read_write_on_closed_fd : public fd_diagnostic
+{
+public:
+  read_write_on_closed_fd (const fd_state_machine &sm, tree arg, int mode)
+      : fd_diagnostic (sm, arg), m_mode (mode)
+  {
+  }
+
+  const char *
+  get_kind () const final override
+  {
+    if (m_mode)
+      return "write_on_closed_fd";
+    else
+      return "read_on_closed_fd";
+  }
+
+
+  int
+  get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_read_write_on_closed_file_descriptor;
+  }
+
+  bool
+  emit (rich_location *rich_loc) final override
+  {
+    if (m_mode)
+      return warning_at (rich_loc, get_controlling_option (),
+                         "%<write%> on closed file descriptor %qE", m_arg);
+    else
+      return warning_at (rich_loc, get_controlling_option (),
+                         "%<read%> on closed file descriptor %qE", m_arg);
+  }
+
+  label_text
+  describe_state_change (const evdesc::state_change &change) override
+  {
+    if (m_sm.is_unchecked (change.m_new_state))
+      {
+        return label_text::borrow ("opened here");
+      }
+
+    if (change.m_new_state == m_sm.m_closed)
+      {
+        return change.formatted_print ("closed here");
+      }
+
+    return fd_diagnostic::describe_state_change (change);
+  }
+
+  label_text
+  describe_final_event (const evdesc::final_event &ev) final override
+  {
+    if (m_mode)
+      return ev.formatted_print ("%<write%> on closed file descriptor %qE
here",
+                                 m_arg);
+    else
+      return ev.formatted_print ("%<read%> on closed file descriptor %qE
here",
+                                 m_arg);
+  }
+
+private:
+  int m_mode;
+};
+
+class possible_invalid_fd_diag : public fd_diagnostic
+{
+public:
+  possible_invalid_fd_diag (const fd_state_machine &sm, tree arg, int mode)
+      : fd_diagnostic (sm, arg), m_mode (mode)
+  {
+  }
+
+  const char *
+  get_kind () const final override
+  {
+    return "possible_invalid_fd_diag";
+  }
+
+
+  int
+  get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_possible_invalid_file_descriptor;
+  }
+
+  bool
+  emit (rich_location *rich_loc) final override
+  {
+    if (m_mode)
+      return warning_at (rich_loc, get_controlling_option (),
+                         "%<write%> on possibly invalid file descriptor
%qE",
+                         m_arg);
+    else
+      return warning_at (rich_loc, get_controlling_option (),
+                         "%<read%> on possibly invalid file descriptor
%qE",
+                         m_arg);
+  }
+
+  label_text
+  describe_state_change (const evdesc::state_change &change) override
+  {
+    if (m_sm.is_unchecked (change.m_new_state))
+      {
+        m_first_open_event = change.m_event_id;
+        return label_text::borrow ("opened here");
+      }
+
+    return fd_diagnostic::describe_state_change (change);
+  }
+
+  label_text
+  describe_final_event (const evdesc::final_event &ev) final override
+  {
+    return ev.formatted_print ("%qE could be invalid: unchecked value from
%@",
+                               m_arg, &m_first_open_event);
+  }
+
+private:
+  diagnostic_event_id_t m_first_open_event;
+  int m_mode;
+};
+
+fd_state_machine::fd_state_machine (logger *logger)
+    : state_machine ("file-descriptor", logger),
+      m_unchecked_read_write (add_state ("fd-unchecked-read-write")),
+      m_unchecked_read_only (add_state ("fd-unchecked-read-only")),
+      m_unchecked_write_only (add_state ("fd-unchecked-write-only")),
+      m_null (add_state ("fd-null")), m_closed (add_state ("fd-closed")),
+      m_valid_read_write (add_state ("fd-valid-read-write")),
+      m_valid_read_only (add_state ("fd-valid-read-only")),
+      m_valid_write_only (add_state ("fd-valid-write-only")),
+      m_stop (add_state ("fd-stop"))
+{
+}
+
+bool
+fd_state_machine::is_unchecked (state_t s) const
+{
+  return s == m_unchecked_read_write || s == m_unchecked_read_only
+         || s == m_unchecked_write_only;
+}
+
+bool
+fd_state_machine::is_valid (state_t s) const
+{
+  return s == m_valid_read_write || s == m_valid_read_only
+         || s == m_valid_write_only;
+}
+
+bool
+fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
+                           const gimple *stmt) const
+{
+  if (const gcall *call = dyn_cast<const gcall *> (stmt))
+    if (tree callee_fndecl = sm_ctxt->get_fndecl_for_call (call))
+      {
+        if (is_named_call_p (callee_fndecl, "open", call, 2))
+          {
+            on_open (sm_ctxt, node, stmt, call);
+            return true;
+          } //  "open"
+
+        if (is_named_call_p (callee_fndecl, "close", call, 1))
+          {
+            on_close (sm_ctxt, node, stmt, call);
+            return true;
+          } //  "close"
+
+        if (is_named_call_p (callee_fndecl, "write", call, 3))
+          {
+            on_write (sm_ctxt, node, stmt, call);
+            return true;
+          } // "write"
+
+        if (is_named_call_p (callee_fndecl, "read", call, 3))
+          {
+            on_read (sm_ctxt, node, stmt, call);
+            return true;
+          } // "read"
+
+        return true;
+      }
+
+  return false;
+}
+
+void
+fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node,
+                           const gimple *stmt, const gcall *call) const
+{
+  tree lhs = gimple_call_lhs (call);
+  if (lhs)
+    {
+      tree arg = gimple_call_arg (call, 1);
+      if (TREE_CODE (arg) == INTEGER_CST)
+        {
+          int flag = TREE_INT_CST_LOW (arg);
+          /* FIXME: this code assumes the access modes on the host and
+          target are the same, which in practice might not be the case. */
+          if (flag == O_RDONLY)
+            sm_ctxt->on_transition (node, stmt, lhs, m_start,
+                                    m_unchecked_read_only);
+          else if (flag == O_WRONLY)
+            sm_ctxt->on_transition (node, stmt, lhs, m_start,
+                                    m_unchecked_write_only);
+          else
+            sm_ctxt->on_transition (node, stmt, lhs, m_start,
+                                    m_unchecked_read_write);
+        }
+    }
+  else
+    {
+      /* FIXME: add leak reporting */
+    }
+}
+
+void
+fd_state_machine::on_close (sm_context *sm_ctxt, const supernode *node,
+                            const gimple *stmt, const gcall *call) const
+{
+  tree arg = gimple_call_arg (call, 0);
+
+  sm_ctxt->on_transition (node, stmt, arg, m_start, m_closed);
+
+  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_write,
m_closed);
+  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_only,
m_closed);
+  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_write_only,
m_closed);
+  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_write, m_closed);
+  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_only, m_closed);
+  sm_ctxt->on_transition (node, stmt, arg, m_valid_write_only, m_closed);
+  sm_ctxt->on_transition (node, stmt, arg, m_null, m_closed);
+
+  if (sm_ctxt->get_state (stmt, arg) == m_closed)
+    {
+      tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
+      sm_ctxt->warn (node, stmt, arg, new double_close (*this, diag_arg));
+      sm_ctxt->set_next_state (stmt, arg, m_stop);
+    }
+}
+void
+fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node,
+                           const gimple *stmt, const gcall *call) const
+{
+  tree arg = gimple_call_arg (call, 0);
+  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
+
+  if (sm_ctxt->get_state (stmt, arg) == m_closed)
+    {
+      sm_ctxt->warn (node, stmt, arg, new read_write_on_closed_fd (*this,
diag_arg, 0));
+    }
+
+  if (!is_valid (sm_ctxt->get_state (stmt, arg)))
+    {
+      sm_ctxt->warn (node, stmt, arg,
+                     new possible_invalid_fd_diag (*this, diag_arg, 0));
+    }
+
+  if (sm_ctxt->get_state (stmt, arg) == m_unchecked_write_only
+      || sm_ctxt->get_state (stmt, arg) == m_valid_write_only)
+    {
+      sm_ctxt->warn (node, stmt, arg,
+                     new read_write_diag_fd (*this, diag_arg, 0));
+    }
+}
+void
+fd_state_machine::on_write (sm_context *sm_ctxt, const supernode *node,
+                            const gimple *stmt, const gcall *call) const
+{
+  tree arg = gimple_call_arg (call, 0);
+  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
+
+  if (sm_ctxt->get_state (stmt, arg) == m_closed)
+    {
+      sm_ctxt->warn (node, stmt, arg, new read_write_diag_fd (*this,
diag_arg, 1));
+    }
+  if (!is_valid (sm_ctxt->get_state (stmt, arg)))
+    {
+      sm_ctxt->warn (node, stmt, arg,
+                     new possible_invalid_fd_diag (*this, diag_arg, 1));
+    }
+
+  if (sm_ctxt->get_state (stmt, arg) == m_unchecked_read_only
+      || sm_ctxt->get_state (stmt, arg) == m_valid_read_only)
+    {
+      tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
+      sm_ctxt->warn (node, stmt, arg,
+                     new read_write_diag_fd (*this, diag_arg, 1));
+    }
+}
+
+void
+fd_state_machine::on_condition (sm_context *sm_ctxt, const supernode *node,
+                                const gimple *stmt, const svalue *lhs,
+                                enum tree_code op, const svalue *rhs) const
+{
+  if (!rhs->all_zeroes_p ())
+    return;
+
+  if (op == GE_EXPR)
+    {
+      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
+                              m_valid_read_write);
+      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
+                              m_valid_read_only);
+      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
+                              m_valid_write_only);
+    }
+  else if (op == LT_EXPR)
+    {
+      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
m_null);
+      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
m_null);
+      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
m_null);
+    }
+}
+
+bool
+fd_state_machine::can_purge_p (state_t s) const
+{
+  if (is_unchecked (s) || is_valid (s))
+    return false;
+  else
+    return true;
+}
+
+pending_diagnostic *
+fd_state_machine::on_leak (tree var) const
+{
+  return new fd_leak (*this, var);
+}
+} // namespace
+
+state_machine *
+make_fd_state_machine (logger *logger)
+{
+  return new fd_state_machine (logger);
+}
+} // namespace ana
+
+#endif // ENABLE_ANALYZER
\ No newline at end of file
diff --git a/gcc/analyzer/sm.cc b/gcc/analyzer/sm.cc
index 622cb0b7ab3..24c20b894cd 100644
--- a/gcc/analyzer/sm.cc
+++ b/gcc/analyzer/sm.cc
@@ -167,6 +167,7 @@ make_checkers (auto_delete_vec <state_machine> &out,
logger *logger)
 {
   out.safe_push (make_malloc_state_machine (logger));
   out.safe_push (make_fileptr_state_machine (logger));
+  out.safe_push (make_fd_state_machine (logger));
   /* The "taint" checker must be explicitly enabled (as it currently
      leads to state explosions that stop the other checkers working).  */
   if (flag_analyzer_checker)
diff --git a/gcc/analyzer/sm.h b/gcc/analyzer/sm.h
index 4cc54531c56..e80ef1fac37 100644
--- a/gcc/analyzer/sm.h
+++ b/gcc/analyzer/sm.h
@@ -301,6 +301,7 @@ extern state_machine *make_sensitive_state_machine
(logger *logger);
 extern state_machine *make_signal_state_machine (logger *logger);
 extern state_machine *make_pattern_test_state_machine (logger *logger);
 extern state_machine *make_va_list_state_machine (logger *logger);
+extern state_machine *make_fd_state_machine (logger *logger);

 } // namespace ana

diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-1.c
b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
new file mode 100644
index 00000000000..985e9ac75de
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
@@ -0,0 +1,26 @@
+int open(const char *, int mode);
+#define O_RDONLY 0
+#define O_WRONLY 1
+#define O_RDWR 2
+
+void
+test_1 (const char *path)
+{
+  int fd = open (path, O_RDONLY); /* { dg-message "\\(1\\) opened here" }
*/
+  return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-775\\\]"
"warning" } */
+ /* { dg-message "\\(2\\) 'fd' leaks here; was opened at \\(1\\)" "event"
{ target *-*-* } .-1 } */
+}
+
+void
+test_2 (const char *path)
+{
+  int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here" } */
+  if (fd >= 0) /* { dg-message "\\(2\\) assuming 'fd' is a valid file
descriptor" "event1" } */
+  /* { dg-message "\\(3\\) following 'true' branch \\(when 'fd >=
0'\\)..." "event2" { target *-*-* } .-1 } */
+  {
+    return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-775\\\]"
"warning" } */
+    /* { dg-message "\\(4\\) ...to here" "event1" { target *-*-* } .-1 } */
+    /* { dg-message "\\(5\\) 'fd' leaks here; was opened at \\(1\\)"
"event2" { target *-*-* } .-2 } */
+  }
+}
+
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-2.c
b/gcc/testsuite/gcc.dg/analyzer/fd-2.c
new file mode 100644
index 00000000000..4ff4b46b42b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-2.c
@@ -0,0 +1,27 @@
+int open(const char *, int mode);
+void close(int fd);
+#define O_RDONLY 0
+#define O_WRONLY 1
+#define O_RDWR 2
+
+void
+test_1 (const char *path)
+{
+    int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here" }
*/
+    close (fd); /* { dg-message "\\(2\\) first 'close' here" "event1" } */
+    close (fd); /* { dg-warning "double 'close' of file descriptor 'fd'
\\\[CWE-1341\\\]" "warning" } */
+    /* { dg-message "\\(3\\) second 'close' here; first 'close' was at
\\(2\\)" "event2" { target *-*-* } .-1 } */
+}
+
+void
+test_2 (const char *path)
+{
+    int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here" }
*/
+    if (fd < 0) /* { dg-message "\\(2\\) assuming 'fd' is a valid file
descriptor" "event1" } */
+    /* { dg-message "\\(3\\) following 'false' branch \\(when 'fd >=
0'\\)..." "event2" { target *-*-* } .-1 } */
+        return;
+    close (fd); /* { dg-message "\\(4\\) ...to here" "event1" } */
+    /* { dg-message "\\(5\\) first 'close' here" "event2" { target *-*-* }
.-1 } */
+    close (fd); /* { dg-warning "double 'close' of file descriptor 'fd'
\\\[CWE-1341\\\]" "warning" } */
+    /* {dg-message "\\(6\\) seconf 'close' here; first was at \\(5\\)" ""
{ target *-*-* } .-1 } */
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-3.c
b/gcc/testsuite/gcc.dg/analyzer/fd-3.c
new file mode 100644
index 00000000000..d32e3fe4923
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-3.c
@@ -0,0 +1,27 @@
+int open(const char *, int mode);
+void close(int fd);
+int write (int fd, void *buf, int nbytes);
+int read (int fd, void *buf, int nbytes);
+
+#define O_RDONLY 0
+#define O_WRONLY 1
+#define O_RDWR 2
+
+
+void
+test_1 (const char *path, void *buf)
+{
+    int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here" }
*/
+    write (fd, buf, 1); /* { dg-message "\\(2\\) 'fd' could be invalid:
unchecked value from \\(1\\)" } */
+    /* { dg-warning "'write' on possibly invalid file descriptor 'fd'"
"warning" { target *-*-* } .-1 } */
+    close(fd);
+}
+
+void
+test_2 (const char *path, void *buf)
+{
+    int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here" }
*/
+    read (fd, buf, 1); /* { dg-message "\\(2\\) 'fd' could be invalid:
unchecked value from \\(1\\)" } */
+    /* { dg-warning "'read' on possibly invalid file descriptor 'fd'"
"warning" { target *-*-* } .-1 } */
+    close (fd);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-4.c
b/gcc/testsuite/gcc.dg/analyzer/fd-4.c
new file mode 100644
index 00000000000..a802c4b6d42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-4.c
@@ -0,0 +1,37 @@
+int open(const char *, int mode);
+void close(int fd);
+int write (int fd, void *buf, int nbytes);
+int read (int fd, void *buf, int nbytes);
+
+#define O_RDONLY 0
+#define O_WRONLY 1
+#define O_RDWR 2
+
+
+void
+test_1 (const char *path, void *buf)
+{
+    int fd = open(path, O_RDONLY); /* { dg-message "opened here as
'read-only'" } */
+    if (fd >= 0) /* { dg-message "assuming 'fd' is a valid file
descriptor" "event1" } */
+    /* { dg-message "following 'true' branch \\(when 'fd >= 0'\\)..."
"event2" { target *-*-* } .-1 } */
+    {
+        write (fd, buf, 1); /* { dg-warning "'write' on 'read-only' file
descriptor 'fd'" "warning" } */
+        /* { dg-message "\\(4\\) ...to here" "event1" { target *-*-* } .-1
} */
+        /* { dg-message "\\(5\\) 'write' on 'read-only' file descriptor
'fd'" "event2" { target *-*-* } .-2 } */
+        close (fd);
+    }
+}
+
+void
+test_2 (const char *path, void *buf)
+{
+    int fd = open(path, O_WRONLY); /* { dg-message "opened here as
'write-only'" } */
+    if (fd >= 0) /* { dg-message "assuming 'fd' is a valid file
descriptor" "event1" } */
+    /* { dg-message "following 'true' branch \\(when 'fd >= 0'\\)..."
"event2" { target *-*-* } .-1 } */
+    {
+        read (fd, buf, 1); /* { dg-warning "'read' on 'write-only' file
descriptor 'fd'" "warning" } */
+        /* { dg-message "\\(4\\) ...to here" "event1" { target *-*-* } .-1
} */
+        /* { dg-message "\\(5\\) 'read' on 'write-only' file descriptor
'fd'" "event2" { target *-*-* } .-2 } */
+        close (fd);
+    }
+}
\ No newline at end of file

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] static analysis support for posix file desccriptor APIs
  2022-06-21 16:30 [PATCH] static analysis support for posix file desccriptor APIs Mir Immad
@ 2022-06-21 16:30 ` Mir Immad
  2022-06-21 18:34 ` David Malcolm
  1 sibling, 0 replies; 8+ messages in thread
From: Mir Immad @ 2022-06-21 16:30 UTC (permalink / raw)
  To: gcc, David Malcolm

 This is a patch for extending static analysis support for posix file
descriptor APIs which is a part of my GSoC project. I've written a few
testcases, which are all passing. There are a few TODOs like adding the
copyright header and adding docs to gcc/doc/invoke.texi.

I'm looking for suggestions for whether the names of warnings I have chosen
are appropriate, and configuring my vsocde editor to follow the GNU coding
conventions.

- Immad

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] static analysis support for posix file desccriptor APIs
  2022-06-21 16:30 [PATCH] static analysis support for posix file desccriptor APIs Mir Immad
  2022-06-21 16:30 ` Mir Immad
@ 2022-06-21 18:34 ` David Malcolm
  2022-06-21 18:50   ` Joseph Myers
  2022-06-23 18:28   ` Mir Immad
  1 sibling, 2 replies; 8+ messages in thread
From: David Malcolm @ 2022-06-21 18:34 UTC (permalink / raw)
  To: Mir Immad, gcc

Hi Immad, thanks for this patch.

Overall, looks like you're making good progress.

Various notes and nitpicks inline below, throughout...

On Tue, 2022-06-21 at 22:00 +0530, Mir Immad wrote:

[...snip...]

> 
> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 23dfc797cea..d99acfbb069 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -54,6 +54,10 @@ The minimum number of supernodes within a function
> for
> the analyzer to consider
>  Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump)
> Init(200) Param
>  The maximum depth of exploded nodes that should appear in a dot dump
> before switching to a less verbose format.

I'm not yet sure if this is a good idea, but I wonder if all of these
warnings ought to have a "-Wanalyzer-fd-" prefix?  "file-descriptor" is
rather long, and the analyzer's warnings are already tending to be on
the long side, and having a consistent prefix might make it easier for
users to grok them.

(though this feels like a "what color do we paint the bikeshed?" issue;
see e.g. https://bikeshed.org/ )

> 
> +Wanalyzer-double-close
> +Common Var(warn_analyzer_double_close) Init(1) Warning
> +Warn about code paths in which a file descriptor can be closed more
> than
> once.

...so this could be Wanalyzer-fd-double-close

> +
>  Wanalyzer-double-fclose
>  Common Var(warn_analyzer_double_fclose) Init(1) Warning
>  Warn about code paths in which a stdio FILE can be closed more than
> once.
> @@ -66,6 +70,10 @@ Wanalyzer-exposure-through-output-file
>  Common Var(warn_analyzer_exposure_through_output_file) Init(1) Warning
>  Warn about code paths in which sensitive data is written to a file.
> 
> +Wanalyzer-file-descriptor-leak
> +Common Var(warn_analyzer_file_descriptor_leak) Init(1) Warning
> +Warn about code paths in which a file descriptor is not closed.

...and Wanalyzer-fd-leak

> +
>  Wanalyzer-file-leak
>  Common Var(warn_analyzer_file_leak) Init(1) Warning
>  Warn about code paths in which a stdio FILE is not closed.
> @@ -82,6 +90,14 @@ Wanalyzer-mismatching-deallocation
>  Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
>  Warn about code paths in which the wrong deallocation function is
> called.
> 
> +Wanalyzer-mismatching-operation-on-file-descriptor
> +Common Var(warn_analyzer_mismatching_operation_on_file_descriptor)
> Init(1)
> Warning
> +Warn about the code paths in which read on write-only file descriptor
> or
> write on read-only file descriptor is called.

Maybe:
  Wanalyzer-fd-access-mode-mismatch
?

Lose the "the" in "the code paths", so maybe: "Warn about code paths in
which a write is attempted on a read-only file descriptor, or vice-
versa."


> +
> +Wanalyzer-possible-invalid-file-descriptor
> +Common Var(warn_analyzer_possible_invalid_file_descriptor) Init(1)
> Warning
> +warn about code paths in which a possibly invalid file descriptor is
> passed to a must-be-a-valid file descriptor function argument.
> +

Wanalyzer-fd-use-without-check ?

FWIW I believe that this isn't strictly speaking a security issue, that
if code blithely assumes the fd is open and then tries to read/write
it, the access will fail with errno set to EBADF.  But it seems worth
warning for; code that doesn't bother implementing error-checking seems
suspect to me.



> 
> +Wanalyzer-read-write-on-closed-file-descriptor

Maybe: Wanalyzer-fd-use-after-close ?

> +Common Var(warn_analyzer_read_write_on_closed_file_descriptor) Init(1)
> Warning
> +Warn about code paths in which a read on write in performed on a
> closed
> file descriptor.

Typo "read on write"; maybe replace with "Warn about code paths in
which a file descriptor is used after being closed." ?



[...snip...]

> +
> diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> new file mode 100644
> index 00000000000..23e79e3e16a
> --- /dev/null
> +++ b/gcc/analyzer/sm-fd.cc
> @@ -0,0 +1,697 @@
> +/* FIXME: add copyright header. */

Obviously the FIXME needs fixing :)

> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tree.h"
> +#include "function.h"
> +#include "basic-block.h"
> +#include "gimple.h"
> +#include "options.h"
> +#include "diagnostic-path.h"
> +#include "diagnostic-metadata.h"
> +#include "function.h"
> +#include "json.h"
> +#include "analyzer/analyzer.h"
> +#include "diagnostic-event-id.h"
> +#include "analyzer/analyzer-logging.h"
> +#include "analyzer/sm.h"
> +#include "analyzer/pending-diagnostic.h"
> +#include "analyzer/function-set.h"
> +#include "analyzer/analyzer-selftests.h"
> +#include "tristate.h"
> +#include "selftest.h"
> +#include "analyzer/call-string.h"
> +#include "analyzer/program-point.h"
> +#include "analyzer/store.h"
> +#include "analyzer/region-model.h"
> +
> +#include <unistd.h>

Presumably you're including <unistd.h> here for the definitions of 
O_RDONLY and O_WRONLY.

As we discussed off-list, I think we want to avoid the use of the
host's O_RDONLY and O_WRONLY in this file, as they might not have the
same values as on the target, and I'm not sure that the host is even
guaranteed to have a <unistd.h>.

I think we want our own enum representing the three access modes:
READ_WRITE, READ_ONLY, WRITE_ONLY, and a subroutine for converting
target flags to access mode.

So ultimately that's something we want to fix, though exactly how, I'm
not quite sure; we presumably want to look up the target's definitions
of those macros - but I don't think the analyzer has access to the
cpp_reader instance from the frontend.



> +#if ENABLE_ANALYZER
> +
> +namespace ana
> +{
> +
> +namespace
> +{
> +class fd_state_machine : public state_machine
> +{
> +public:
> +  fd_state_machine (logger *logger);
> +
> +  bool
> +  inherited_state_p () const final override
> +  {
> +    return false;
> +  }
> +
> +  state_machine::state_t
> +  get_default_state (const svalue *sval) const final override
> +  {
> +    if (tree cst = sval->maybe_get_constant ())
> +      {
> +        int val = TREE_INT_CST_LOW (cst);
> +        if (val < 0)
> +          {
> +            return m_null;

Can we rename the invalid state to "m_invalid"?  I find m_null
confusing.

> +          }
> +      }
> +    return m_start;
> +  }
> +
> +  bool on_stmt (sm_context *sm_ctxt, const supernode *node,
> +                const gimple *stmt) const final override;
> +
> +  void on_condition (sm_context *sm_ctxt, const supernode *node,
> +                     const gimple *stmt, const svalue *lhs, const
> tree_code op,
> +                     const svalue *rhs) const final override;
> +
> +  bool can_purge_p (state_t s) const final override;
> +  pending_diagnostic *on_leak (tree var) const final override;
> +
> +  bool is_unchecked (state_t s) const;
> +  bool is_valid (state_t s) const;
> +
> +  /* State for a file descriptor that hasn't been checked for
> validity. */
> +  state_t m_unchecked_read_write;
> +
> +  /* State for a file descriptor opened with O_RDONLY flag. */
> +  state_t m_unchecked_read_only;
> +
> +  /* State for a file descriptor opneded with O_WRONLY flag. */
> +  state_t m_unchecked_write_only;

Typo: opneded -> opened.

Maybe group these three states in the source like this:

  /* States representing a file descriptor that hasn't yet been
     checked for validity after opening, for three different
     access modes.  */
  state_t m_unchecked_read_write;
  state_t m_unchecked_read_only;
  state_t m_unchecked_write_only;

> +
> +  /* State for a file descriptor that is known to be invalid. */
> +  state_t m_null;

Let's rename this to m_invalid;
The condition is that it's < 0, right?  If so, please update the
comment to clarify that.

> +
> +  /* State for a file descriptor that was opened in read-write mode
> and is
> known
> +   * to be valid. */
> +  state_t m_valid_read_write;
> +
> +  /* State for a file descriptor that was opened as read-only and is
> known
> to be
> +   * valid*/
> +  state_t m_valid_read_only;
> +
> +  /* State for a file descriptor that was opened as write-only and is
> known to
> +   * be valid*/
> +  state_t m_valid_write_only;

Maybe group these three states in a similar way to above.


> +
> +  /* State for a file descriptor that has been closed.*/
> +  state_t m_closed;
> +
> +  /* State for a file descriptor that we do not want to track anymore
> . */
> +  state_t m_stop;
> +
> +private:
> +  void on_open (sm_context *sm_ctxt, const supernode *node, const
> gimple
> *stmt,
> +                const gcall *call) const;
> +  void on_close (sm_context *sm_ctxt, const supernode *node, const
> gimple
> *stmt,
> +                 const gcall *call) const;
> +  void on_read (sm_context *sm_ctxt, const supernode *node, const
> gimple
> *stmt,
> +                const gcall *call) const;
> +  void on_write (sm_context *sm_ctxt, const supernode *node, const
> gimple
> *stmt,
> +                 const gcall *call) const;
> +};
> +
> +/* Base diagnostic class relative to fd_state_machine. */
> +class fd_diagnostic : public pending_diagnostic
> +{
> +public:
> +  fd_diagnostic (const fd_state_machine &sm, tree arg) : m_sm (sm),
> m_arg
> (arg)
> +  {
> +  }
> +
> +  bool
> +  subclass_equal_p (const pending_diagnostic &base_other) const
> override
> +  {
> +    return same_tree_p (m_arg, ((const fd_diagnostic
> &)base_other).m_arg);
> +  }
> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    if (change.m_old_state == m_sm.get_start_state ()
> +        && m_sm.is_unchecked (change.m_new_state))
> +      {
> +        if (change.m_new_state == m_sm.m_unchecked_read_write)
> +          return change.formatted_print ("opened here as %<read-
> write%>");
> +
> +        if (change.m_new_state == m_sm.m_unchecked_read_only)
> +          return change.formatted_print ("opened here as %<read-
> only%>");
> +
> +        if (change.m_new_state == m_sm.m_unchecked_write_only)
> +          return change.formatted_print ("opened here as %<write-
> only%>");
> +      }
> +
> +    if (m_sm.is_unchecked (change.m_old_state)
> +        && m_sm.is_valid (change.m_new_state))
> +      if (change.m_expr)
> +        return change.formatted_print (
> +            "assuming %qE is a valid file descriptor", change.m_expr);
> +      else
> +        return change.formatted_print ("assuming a valid file
> descriptor");

Maybe reword to "when %qE is a valid file descriptor (>= 0)" ?


> +
> +    if (m_sm.is_unchecked (change.m_old_state)
> +        && change.m_new_state == m_sm.m_null)
> +      if (change.m_expr)
> +        return change.formatted_print (
> +            "assuming %qE is an invalid file descriptor",
> change.m_expr);
> +      else
> +        return change.formatted_print ("assuming an invalid file
> descriptor");

Maybe reword to "when %qE is not a valid file descriptor (< 0)" ?


> +
> +    return label_text ();
> +  }
> +
> +protected:
> +  const fd_state_machine &m_sm;
> +  tree m_arg;
> +};
> +
> +class fd_leak : public fd_diagnostic
> +{
> +public:
> +  fd_leak (const fd_state_machine &sm, tree arg) : fd_diagnostic (sm,
> arg)
> {}
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "fd_leak";
> +  }
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_file_descriptor_leak;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    /*CWE-775: Missing Release of File Descriptor or Handle after
> Effective
> +     * Lifetime
> +     */
> +    diagnostic_metadata m;
> +    m.add_cwe (775);
> +    if (m_arg)
> +      return warning_meta (rich_loc, m, get_controlling_option (),
> +                           "leak of file descriptor %qE", m_arg);
> +    else
> +      return warning_meta (rich_loc, m, get_controlling_option (),
> +                           "leak of file descriptor");
> +  }
> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) final
> override
> +  {
> +    if (m_sm.is_unchecked (change.m_new_state))
> +      {
> +        m_open_event = change.m_event_id;
> +        return label_text::borrow ("opened here");
> +      }
> +
> +    return fd_diagnostic::describe_state_change (change);
> +  }
> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    if (m_open_event.known_p ())
> +      {
> +        if (ev.m_expr)
> +          return ev.formatted_print ("%qE leaks here; was opened at
> %@",
> +                                     ev.m_expr, &m_open_event);
> +        else
> +          return ev.formatted_print ("leaks here; was opened at %@",
> +                                     &m_open_event);
> +      }
> +    else
> +      {
> +        if (ev.m_expr)
> +          return ev.formatted_print ("%qE leaks here", ev.m_expr);
> +        else
> +          return ev.formatted_print ("leaks here");
> +      }
> +  }
> +
> +private:
> +  diagnostic_event_id_t m_open_event;
> +};
> +
> +class read_write_diag_fd : public fd_diagnostic

Maybe rename to fd_access_mode_mismatch ?


> +{
> +public:
> +  /* mode specfies whether read on write was attempted or vice versa.
> +   */

Make "mode" be an enum rather than an int.

> +  read_write_diag_fd (const fd_state_machine &sm, tree arg, int mode)
> +      : m_mode (mode), fd_diagnostic (sm, arg)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "read_write_diag_fd";
> +  }
> +
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_mismatching_operation_on_file_descriptor;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    if (m_mode)
> +      {
> +        return warning_at (rich_loc, get_controlling_option (),
> +                           "%<write%> on %<read-only%> file descriptor
> %qE",
> +                           m_arg);
> +      }
> +    else
> +      {
> +        return warning_at (rich_loc, get_controlling_option (),
> +                           "%<read%> on %<write-only%> file descriptor
> %qE",
> +                           m_arg);

Rather than hardcoding "%<write%>" and "%<read%>", let's generalize
this so that the diagnostic has a "tree fndecl" argument and m_fndecl
field, which was the function that was called that attempted to used
the fd in the wrong way.  You can then write %qE to print it in quotes.

> +      }
> +  }
> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    return fd_diagnostic::describe_state_change (change);
> +  }

Looks like this override is redundant (albeit harmless): it can
implicitly inherit the fd_diagnostic implementation.


> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    if (m_mode)
> +      return ev.formatted_print ("%qs on %<read-only%> file descriptor
> %qE",
> +                                 "write", m_arg);
> +    else
> +      return ev.formatted_print ("%qs on %<write-only%> file
> descriptor
> %qE",
> +                                 "read", m_arg);
> +  }

and generalize this also to work on m_fndecl.

> +
> +private:
> +  int m_mode;

As noted above, please make this an enum.


> +};
> +
> +class double_close : public fd_diagnostic
> +{
> +public:
> +  double_close (const fd_state_machine &sm, tree arg) : fd_diagnostic
> (sm,
> arg)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "double_close";
> +  }
> +
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_double_close;
> +  }
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    diagnostic_metadata m;
> +    // CWE-1341: Multiple Releases of Same Resource or Handle
> +    m.add_cwe (1341);
> +    return warning_meta (rich_loc, m, get_controlling_option (),
> +                         "double %<close%> of file descriptor %qE",
> m_arg);
> +  }
> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    if (m_sm.is_unchecked (change.m_new_state))
> +      return label_text::borrow ("opened here");
> +
> +    if (change.m_new_state == m_sm.m_closed)
> +      {
> +        m_first_close_event = change.m_event_id;
> +        return change.formatted_print ("first %qs here", "close");
> +      }
> +    return fd_diagnostic::describe_state_change (change);
> +  }
> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    if (m_first_close_event.known_p ())
> +      return ev.formatted_print ("second %qs here; first %qs was at
> %@",
> +                                 "close", "close",
> &m_first_close_event);
> +    return ev.formatted_print ("second %qs here", "close");
> +  }
> +
> +private:
> +  diagnostic_event_id_t m_first_close_event;
> +};
> +
> +class read_write_on_closed_fd : public fd_diagnostic

Maybe rename to "use_after_close_of_fd" or somesuch.

> +{
> +public:
> +  read_write_on_closed_fd (const fd_state_machine &sm, tree arg, int
> mode)
> +      : fd_diagnostic (sm, arg), m_mode (mode)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    if (m_mode)
> +      return "write_on_closed_fd";
> +    else
> +      return "read_on_closed_fd";
> +  }
> +
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_read_write_on_closed_file_descriptor;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    if (m_mode)
> +      return warning_at (rich_loc, get_controlling_option (),
> +                         "%<write%> on closed file descriptor %qE",
> m_arg);
> +    else
> +      return warning_at (rich_loc, get_controlling_option (),
> +                         "%<read%> on closed file descriptor %qE",
> m_arg);
> +  }

As above, maybe generalize this by adding a "tree m_fndecl;" field to
the class, giving what function was called on the the closed fd.  I
think you might be able to eliminate m_mode from this subclass by doing
that instead.

> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    if (m_sm.is_unchecked (change.m_new_state))
> +      {
> +        return label_text::borrow ("opened here");
> +      }
> +
> +    if (change.m_new_state == m_sm.m_closed)
> +      {
> +        return change.formatted_print ("closed here");
> +      }
> +
> +    return fd_diagnostic::describe_state_change (change);
> +  }
> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    if (m_mode)
> +      return ev.formatted_print ("%<write%> on closed file descriptor
> %qE
> here",
> +                                 m_arg);
> +    else
> +      return ev.formatted_print ("%<read%> on closed file descriptor
> %qE
> here",
> +                                 m_arg);
> +  }

Likewise here.

> +
> +private:
> +  int m_mode;
> +};
> +
> +class possible_invalid_fd_diag : public fd_diagnostic

Rename to unchecked_use_of_fd ?

> +{
> +public:
> +  possible_invalid_fd_diag (const fd_state_machine &sm, tree arg, int
> mode)
> +      : fd_diagnostic (sm, arg), m_mode (mode)
> +  {
> +  }
> +
> +  const char *
> +  get_kind () const final override
> +  {
> +    return "possible_invalid_fd_diag";
> +  }
> +
> +
> +  int
> +  get_controlling_option () const final override
> +  {
> +    return OPT_Wanalyzer_possible_invalid_file_descriptor;
> +  }
> +
> +  bool
> +  emit (rich_location *rich_loc) final override
> +  {
> +    if (m_mode)
> +      return warning_at (rich_loc, get_controlling_option (),
> +                         "%<write%> on possibly invalid file
> descriptor
> %qE",
> +                         m_arg);
> +    else
> +      return warning_at (rich_loc, get_controlling_option (),
> +                         "%<read%> on possibly invalid file descriptor
> %qE",
> +                         m_arg);
> +  }

As above, generalize this with a "tree m_fndecl;" capturing what fndecl
was called on the fd without checking, and with that, I think you can
eliminate the m_mode field.


> +
> +  label_text
> +  describe_state_change (const evdesc::state_change &change) override
> +  {
> +    if (m_sm.is_unchecked (change.m_new_state))
> +      {
> +        m_first_open_event = change.m_event_id;
> +        return label_text::borrow ("opened here");
> +      }
> +
> +    return fd_diagnostic::describe_state_change (change);
> +  }
> +
> +  label_text
> +  describe_final_event (const evdesc::final_event &ev) final override
> +  {
> +    return ev.formatted_print ("%qE could be invalid: unchecked value
> from
> %@",
> +                               m_arg, &m_first_open_event);
> +  }
> +
> +private:
> +  diagnostic_event_id_t m_first_open_event;
> +  int m_mode;
> +};
> +
> +fd_state_machine::fd_state_machine (logger *logger)
> +    : state_machine ("file-descriptor", logger),
> +      m_unchecked_read_write (add_state ("fd-unchecked-read-write")),
> +      m_unchecked_read_only (add_state ("fd-unchecked-read-only")),
> +      m_unchecked_write_only (add_state ("fd-unchecked-write-only")),
> +      m_null (add_state ("fd-null")), m_closed (add_state ("fd-
> closed")),
> +      m_valid_read_write (add_state ("fd-valid-read-write")),
> +      m_valid_read_only (add_state ("fd-valid-read-only")),
> +      m_valid_write_only (add_state ("fd-valid-write-only")),
> +      m_stop (add_state ("fd-stop"))
> +{
> +}
> +
> +bool
> +fd_state_machine::is_unchecked (state_t s) const
> +{
> +  return s == m_unchecked_read_write || s == m_unchecked_read_only
> +         || s == m_unchecked_write_only;

With multiline conditionals, once they occupy multiple lines, it's more
readable to split them like this:

  return (s == m_unchecked_read_write
          || s == m_unchecked_read_only
          || s == m_unchecked_write_only);


> +}
> +
> +bool
> +fd_state_machine::is_valid (state_t s) const
> +{
> +  return s == m_valid_read_write || s == m_valid_read_only
> +         || s == m_valid_write_only;
> +}

Similar here.

> +
> +bool
> +fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
> +                           const gimple *stmt) const
> +{
> +  if (const gcall *call = dyn_cast<const gcall *> (stmt))
> +    if (tree callee_fndecl = sm_ctxt->get_fndecl_for_call (call))
> +      {
> +        if (is_named_call_p (callee_fndecl, "open", call, 2))
> +          {
> +            on_open (sm_ctxt, node, stmt, call);
> +            return true;
> +          } //  "open"
> +
> +        if (is_named_call_p (callee_fndecl, "close", call, 1))
> +          {
> +            on_close (sm_ctxt, node, stmt, call);
> +            return true;
> +          } //  "close"
> +
> +        if (is_named_call_p (callee_fndecl, "write", call, 3))
> +          {
> +            on_write (sm_ctxt, node, stmt, call);
> +            return true;
> +          } // "write"
> +
> +        if (is_named_call_p (callee_fndecl, "read", call, 3))
> +          {
> +            on_read (sm_ctxt, node, stmt, call);
> +            return true;
> +          } // "read"
> +

You'll probably want to pass callee_fndecl to some of these "on_FNNAME"
subroutines, so that it's available for the new m_fndecl fields of the
warnings I suggested above.


> +        return true;
> +      }
> +
> +  return false;
> +}
> +
> +void
> +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node,
> +                           const gimple *stmt, const gcall *call)
> const
> +{
> +  tree lhs = gimple_call_lhs (call);
> +  if (lhs)
> +    {
> +      tree arg = gimple_call_arg (call, 1);
> +      if (TREE_CODE (arg) == INTEGER_CST)
> +        {
> +          int flag = TREE_INT_CST_LOW (arg);
> +          /* FIXME: this code assumes the access modes on the host and
> +          target are the same, which in practice might not be the
> case. */
> +          if (flag == O_RDONLY)
> +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> +                                    m_unchecked_read_only);
> +          else if (flag == O_WRONLY)
> +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> +                                    m_unchecked_write_only);
> +          else
> +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> +                                    m_unchecked_read_write);

Let's introduce an enum for the access mode, and a subroutine to get
the enum value from "flag", so that we can isolate this "FIXME" in that
subroutine.

> +        }
> +    }
> +  else
> +    {
> +      /* FIXME: add leak reporting */

Do you have a testcase that exhibits this behavior?


> +    }
> +}
> +
> +void
> +fd_state_machine::on_close (sm_context *sm_ctxt, const supernode
> *node,
> +                            const gimple *stmt, const gcall *call)
> const
> +{
> +  tree arg = gimple_call_arg (call, 0);
> +
> +  sm_ctxt->on_transition (node, stmt, arg, m_start, m_closed);
> +
> +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_write,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_only,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_write_only,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_write,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_only,
> m_closed);
> +  sm_ctxt->on_transition (node, stmt, arg, m_valid_write_only,
> m_closed);


> +  sm_ctxt->on_transition (node, stmt, arg, m_null, m_closed);

Perhaps we should warn for operations on known invalid FDs? (and thus
complain about close on a known-invalid FD).

I believe that the OS will handle such code gracefully by setting errno
to EBADF.


> +
> +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> +    {
> +      tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +      sm_ctxt->warn (node, stmt, arg, new double_close (*this,
> diag_arg));
> +      sm_ctxt->set_next_state (stmt, arg, m_stop);
> +    }
> +}
> +void
> +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node,
> +                           const gimple *stmt, const gcall *call)
> const
> +{
> +  tree arg = gimple_call_arg (call, 0);
> +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +
> +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> +    {
> +      sm_ctxt->warn (node, stmt, arg, new read_write_on_closed_fd
> (*this,
> diag_arg, 0));
> +    }

Maybe consolidate on_read and on_write by adding a subroutine that 
checks for m_closed, and for checking access mode (maybe a
"check_for_open_fd" that takes an access mode enum value amongst other
params.  If you pass around caller_fndecl, I think much of this code
can be shared that way between on_read and on_write (which will help
simplify things if we want to support further functions).

> +
> +  if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> +    {
> +      sm_ctxt->warn (node, stmt, arg,
> +                     new possible_invalid_fd_diag (*this, diag_arg,
> 0));
> +    }
> +
> +  if (sm_ctxt->get_state (stmt, arg) == m_unchecked_write_only
> +      || sm_ctxt->get_state (stmt, arg) == m_valid_write_only)
> +    {
> +      sm_ctxt->warn (node, stmt, arg,
> +                     new read_write_diag_fd (*this, diag_arg, 0));
> +    }
> +}
> +void
> +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode
> *node,
> +                            const gimple *stmt, const gcall *call)
> const
> +{
> +  tree arg = gimple_call_arg (call, 0);
> +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +
> +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> +    {
> +      sm_ctxt->warn (node, stmt, arg, new read_write_diag_fd (*this,
> diag_arg, 1));
> +    }
> +  if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> +    {
> +      sm_ctxt->warn (node, stmt, arg,
> +                     new possible_invalid_fd_diag (*this, diag_arg,
> 1));
> +    }
> +
> +  if (sm_ctxt->get_state (stmt, arg) == m_unchecked_read_only
> +      || sm_ctxt->get_state (stmt, arg) == m_valid_read_only)
> +    {
> +      tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> +      sm_ctxt->warn (node, stmt, arg,
> +                     new read_write_diag_fd (*this, diag_arg, 1));
> +    }
> +}
> +
> +void
> +fd_state_machine::on_condition (sm_context *sm_ctxt, const supernode
> *node,
> +                                const gimple *stmt, const svalue *lhs,
> +                                enum tree_code op, const svalue *rhs)
> const
> +{
> +  if (!rhs->all_zeroes_p ())
> +    return;
> +
> +  if (op == GE_EXPR)
> +    {
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> +                              m_valid_read_write);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> +                              m_valid_read_only);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> +                              m_valid_write_only);
> +    }
> +  else if (op == LT_EXPR)
> +    {
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> m_null);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> m_null);
> +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> m_null);
> +    }
> +}
> +
> +bool
> +fd_state_machine::can_purge_p (state_t s) const
> +{
> +  if (is_unchecked (s) || is_valid (s))
> +    return false;
> +  else
> +    return true;
> +}
> +
> +pending_diagnostic *
> +fd_state_machine::on_leak (tree var) const
> +{
> +  return new fd_leak (*this, var);
> +}
> +} // namespace
> +
> +state_machine *
> +make_fd_state_machine (logger *logger)
> +{
> +  return new fd_state_machine (logger);
> +}
> +} // namespace ana
> +
> +#endif // ENABLE_ANALYZER
> \ No newline at end of file

[...snip...]

> 
> diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> new file mode 100644
> index 00000000000..985e9ac75de
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> @@ -0,0 +1,26 @@
> +int open(const char *, int mode);
> +#define O_RDONLY 0
> +#define O_WRONLY 1
> +#define O_RDWR 2

As noted above, we want to somehow use the values for the O_ macros
from the test code, rather than the <unistd.h> with which sm-fd.cc was
built.

> +
> +void
> +test_1 (const char *path)
> +{
> +  int fd = open (path, O_RDONLY); /* { dg-message "\\(1\\) opened
> here" }
> */
> +  return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-
> 775\\\]"
> "warning" } */
> + /* { dg-message "\\(2\\) 'fd' leaks here; was opened at \\(1\\)"
> "event"
> { target *-*-* } .-1 } */
> +}
> +
> +void
> +test_2 (const char *path)
> +{
> +  int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here"
> } */
> +  if (fd >= 0) /* { dg-message "\\(2\\) assuming 'fd' is a valid file
> descriptor" "event1" } */
> +  /* { dg-message "\\(3\\) following 'true' branch \\(when 'fd >=
> 0'\\)..." "event2" { target *-*-* } .-1 } */
> +  {
> +    return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-
> 775\\\]"
> "warning" } */
> +    /* { dg-message "\\(4\\) ...to here" "event1" { target *-*-* } .-1
> } */
> +    /* { dg-message "\\(5\\) 'fd' leaks here; was opened at \\(1\\)"
> "event2" { target *-*-* } .-2 } */
> +  }
> +}

Looks good, but in future don't feel the need to express all of the
events: I think the significant ones we should test for here are (1),
(2), (5), whereas specifying directives for (3) and (4) are probably
overkill.  prune.exp will prune away stray lines beginning with "note:
" so we don't need to write directives for all events, just the ones we
care about.


[...snip...]

Lots of good test coverage here.  With a state-machine based
diagnostic, it's probably best to try to have coverage of all possible
calls in all possible states - that way we can think systematically
about what we should be warning for, and what we shouldn't be warning
for.   Note that once you've got test coverage of the basic events
(like you have), you don't have to have dg-message directives for all
of the events in all of the rest of the tests (that would be excessive,
and too much work to create and maintain).

A few other nits:
- the Subject line for the commit should have an  "analyzer: " prefix
and reference the bugzilla problem report (PR 106003)
- for better or worse, the project requires ChangeLog entries.  You can
use the script ./contrib/mklog.py to help generate this.  Some more
information is here:
  https://www.gnu.org/prep/standards/html_node/Change-Logs.html
In GCC our ChangeLog entries are part of the git commit message, and we
have a serverside script to update the ChangeLog files in the source
tree (to avoid constantly having merge conflicts).

I think we'll have at least one more iteration of this patch to review,
and before I can approve you to push a later version of the patch to
trunk, you'll need to "fully" test it, by doing a bootstrap, and
running the full testsuite on the result.  Do you need some help on how
to do that?  Do you have access to the GCC compile farm yet?

I'll see if I can figure out how to access the preprocessor from the
analyzer.

Thanks again for the patch; hope the above makes sense and is
constructive

Dave





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] static analysis support for posix file desccriptor APIs
  2022-06-21 18:34 ` David Malcolm
@ 2022-06-21 18:50   ` Joseph Myers
  2022-06-21 20:31     ` David Malcolm
  2022-06-23 18:28   ` Mir Immad
  1 sibling, 1 reply; 8+ messages in thread
From: Joseph Myers @ 2022-06-21 18:50 UTC (permalink / raw)
  To: David Malcolm; +Cc: Mir Immad, gcc

On Tue, 21 Jun 2022, David Malcolm via Gcc wrote:

> So ultimately that's something we want to fix, though exactly how, I'm
> not quite sure; we presumably want to look up the target's definitions
> of those macros - but I don't think the analyzer has access to the
> cpp_reader instance from the frontend.

Normally that would use a target hook to specify the values of those 
macros.  The default would be the traditional Unix values of 0, 1, 2 for 
O_RDONLY, O_WRONLY, O_RDWR, while Hurd would need its own definition of 
the hook to use values 1, 2, 3 (I don't know if there are any non-Hurd 
systems not using the traditional values).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] static analysis support for posix file desccriptor APIs
  2022-06-21 18:50   ` Joseph Myers
@ 2022-06-21 20:31     ` David Malcolm
  2022-06-21 21:18       ` Joseph Myers
  0 siblings, 1 reply; 8+ messages in thread
From: David Malcolm @ 2022-06-21 20:31 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Mir Immad, gcc

On Tue, 2022-06-21 at 18:50 +0000, Joseph Myers wrote:
> On Tue, 21 Jun 2022, David Malcolm via Gcc wrote:
> 
> > So ultimately that's something we want to fix, though exactly how,
> > I'm
> > not quite sure; we presumably want to look up the target's
> > definitions
> > of those macros - but I don't think the analyzer has access to the
> > cpp_reader instance from the frontend.
> 
> Normally that would use a target hook to specify the values of those 
> macros.  The default would be the traditional Unix values of 0, 1, 2
> for 
> O_RDONLY, O_WRONLY, O_RDWR, while Hurd would need its own definition of
> the hook to use values 1, 2, 3 (I don't know if there are any non-Hurd 
> systems not using the traditional values).


I found that it's at least theoretically possible to access the
preprocessor from within the analyzer; e.g. given:

#define O_RDONLY 42

and then breaking in the debugger in ana::run_checkers:

(gdb) p cpp_lookup (parse_in, (unsigned char *)"O_RDONLY",
(size_t)strlen ("O_RDONLY"))
$1 = (cpp_hashnode *) 0x7fffea794248
(gdb) p *$1
$2 = {ident = {str = 0x7fffea7877d0 "O_RDONLY", len = 8, hash_value =
3761648590}, is_directive = 0, directive_index = 0, rid_code = 0, flags
= 0, type = NT_USER_MACRO, deferred = 0, value = {answers =
0x7fffea7a5480, macro = 0x7fffea7a5480, builtin = 3933885568, arg_index
= 21632}}
(gdb) p $1->value.macro.exp.tokens.type
$6 = CPP_NUMBER
(gdb) p $1->value.macro.exp.tokens.val.str
$7 = {len = 2, text = 0x3ec182b "42"}

so given the C/C++ FEs it might be possible for the analyzer to try to
lookup the value of O_RDONLY.

But I'm wary of this; e.g. the LTO case, and for non-trivial macro
definitions.

Joseph: is the target hook the way to go with this?  Would it look
something like:

DEFHOOK (fd_access_mode, "FIXME", int (int))

taking the build configuration's O_ access mode, and returning the
target configurations's access mode (so e.g fd_access_mode (O_RDONLY)
would return the target's int) ?

Thanks
Dave 



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] static analysis support for posix file desccriptor APIs
  2022-06-21 20:31     ` David Malcolm
@ 2022-06-21 21:18       ` Joseph Myers
  0 siblings, 0 replies; 8+ messages in thread
From: Joseph Myers @ 2022-06-21 21:18 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

On Tue, 21 Jun 2022, David Malcolm via Gcc wrote:

> Joseph: is the target hook the way to go with this?  Would it look
> something like:
> 
> DEFHOOK (fd_access_mode, "FIXME", int (int))
> 
> taking the build configuration's O_ access mode, and returning the
> target configurations's access mode (so e.g fd_access_mode (O_RDONLY)
> would return the target's int) ?

I think it would probably take the GCC-specific READ_WRITE, READ_ONLY, 
WRITE_ONLY you suggested, rather than the host's O_*.  (Consider that it 
might make sense in future to have checks for other O_* flags, some of 
which might not be supported on the host.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] static analysis support for posix file desccriptor APIs
  2022-06-21 18:34 ` David Malcolm
  2022-06-21 18:50   ` Joseph Myers
@ 2022-06-23 18:28   ` Mir Immad
  2022-06-24 18:05     ` David Malcolm
  1 sibling, 1 reply; 8+ messages in thread
From: Mir Immad @ 2022-06-23 18:28 UTC (permalink / raw)
  To: David Malcolm, gcc

 Hi Dave,
Thanks for the suggestions,

I changed most of the things that you suggested, however reporting for
warnings like close of known invalid fd was problematic:

consider the following code:

if (fd >= 0)
{ write (fd,...); }
close(fd);

As I was checking the exploded graph for this; the "close(fd)" stmt when
visited by the FALSE edge of if stmt (fd < 0) finds fd to be in m_invalid
state; hence warns about "close on known invalid fd" which I believe is not
true as fd at that point is not *known* to be invalid. I spent quite some
time on this and decided not to add this diagnosis for now.

Also, when close transitions m_invalid to m_close; this leads to double
close even when the second close is outside the ( < 0 ) condition which
again does not seem right.
if (fd < 0)
close(fd):
close(fd); // double close here.

> Maybe consolidate on_read and on_write by adding a subroutine that
> checks for m_closed, and for checking access mode (maybe a
> "check_for_open_fd" that takes an access mode enum value amongst other
> params.  If you pass around caller_fndecl, I think much of this code
> can be shared that way between on_read and on_write (which will help
> simplify things if we want to support further functions)

I hope I got this right.


> > +        }
> > +    }
> > +  else
> > +    {
>  >+      /* FIXME: add leak reporting */

> Do you have a testcase that exhibits this behavior?

I was thinking of the following case:
void test()
{
 open(..);
}
Here the resources are leaked because there is no way to free them.

In "read" and "write" funcs, I'm warning for unchecked_use_of_fd and
access_mode_mismatch only when we know fd is not in closed state.
Otherwise, such code leads to lot of irrelevant warnings, example:

void test1(const char *path, void *buf) {
  int fd = open(path, O_RDONLY);
  if (fd >= 0)
  {
  close(fd);
  read(fd, buf, 1); // read on closed fd + read on possibly invalid fd
  write(fd, buf, 1); // write on closed fd + write on possibly invlid fd
  }
}


Adding docs for new options still remains pending. I added some new tests;
and all tests are passing. The stuff about O_* macros is left as-is.

 I'm sending a patch in another email.

Thanks a lot.

On Wed, Jun 22, 2022 at 12:04 AM David Malcolm <dmalcolm@redhat.com> wrote:

> Hi Immad, thanks for this patch.
>
> Overall, looks like you're making good progress.
>
> Various notes and nitpicks inline below, throughout...
>
> On Tue, 2022-06-21 at 22:00 +0530, Mir Immad wrote:
>
> [...snip...]
>
> >
> > diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> > index 23dfc797cea..d99acfbb069 100644
> > --- a/gcc/analyzer/analyzer.opt
> > +++ b/gcc/analyzer/analyzer.opt
> > @@ -54,6 +54,10 @@ The minimum number of supernodes within a function
> > for
> > the analyzer to consider
> >  Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump)
> > Init(200) Param
> >  The maximum depth of exploded nodes that should appear in a dot dump
> > before switching to a less verbose format.
>
> I'm not yet sure if this is a good idea, but I wonder if all of these
> warnings ought to have a "-Wanalyzer-fd-" prefix?  "file-descriptor" is
> rather long, and the analyzer's warnings are already tending to be on
> the long side, and having a consistent prefix might make it easier for
> users to grok them.
>
> (though this feels like a "what color do we paint the bikeshed?" issue;
> see e.g. https://bikeshed.org/ )
>
> >
> > +Wanalyzer-double-close
> > +Common Var(warn_analyzer_double_close) Init(1) Warning
> > +Warn about code paths in which a file descriptor can be closed more
> > than
> > once.
>
> ...so this could be Wanalyzer-fd-double-close
>
> > +
> >  Wanalyzer-double-fclose
> >  Common Var(warn_analyzer_double_fclose) Init(1) Warning
> >  Warn about code paths in which a stdio FILE can be closed more than
> > once.
> > @@ -66,6 +70,10 @@ Wanalyzer-exposure-through-output-file
> >  Common Var(warn_analyzer_exposure_through_output_file) Init(1) Warning
> >  Warn about code paths in which sensitive data is written to a file.
> >
> > +Wanalyzer-file-descriptor-leak
> > +Common Var(warn_analyzer_file_descriptor_leak) Init(1) Warning
> > +Warn about code paths in which a file descriptor is not closed.
>
> ...and Wanalyzer-fd-leak
>
> > +
> >  Wanalyzer-file-leak
> >  Common Var(warn_analyzer_file_leak) Init(1) Warning
> >  Warn about code paths in which a stdio FILE is not closed.
> > @@ -82,6 +90,14 @@ Wanalyzer-mismatching-deallocation
> >  Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
> >  Warn about code paths in which the wrong deallocation function is
> > called.
> >
> > +Wanalyzer-mismatching-operation-on-file-descriptor
> > +Common Var(warn_analyzer_mismatching_operation_on_file_descriptor)
> > Init(1)
> > Warning
> > +Warn about the code paths in which read on write-only file descriptor
> > or
> > write on read-only file descriptor is called.
>
> Maybe:
>   Wanalyzer-fd-access-mode-mismatch
> ?
>
> Lose the "the" in "the code paths", so maybe: "Warn about code paths in
> which a write is attempted on a read-only file descriptor, or vice-
> versa."
>
>
> > +
> > +Wanalyzer-possible-invalid-file-descriptor
> > +Common Var(warn_analyzer_possible_invalid_file_descriptor) Init(1)
> > Warning
> > +warn about code paths in which a possibly invalid file descriptor is
> > passed to a must-be-a-valid file descriptor function argument.
> > +
>
> Wanalyzer-fd-use-without-check ?
>
> FWIW I believe that this isn't strictly speaking a security issue, that
> if code blithely assumes the fd is open and then tries to read/write
> it, the access will fail with errno set to EBADF.  But it seems worth
> warning for; code that doesn't bother implementing error-checking seems
> suspect to me.
>
>
>
> >
> > +Wanalyzer-read-write-on-closed-file-descriptor
>
> Maybe: Wanalyzer-fd-use-after-close ?
>
> > +Common Var(warn_analyzer_read_write_on_closed_file_descriptor) Init(1)
> > Warning
> > +Warn about code paths in which a read on write in performed on a
> > closed
> > file descriptor.
>
> Typo "read on write"; maybe replace with "Warn about code paths in
> which a file descriptor is used after being closed." ?
>
>
>
> [...snip...]
>
> > +
> > diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> > new file mode 100644
> > index 00000000000..23e79e3e16a
> > --- /dev/null
> > +++ b/gcc/analyzer/sm-fd.cc
> > @@ -0,0 +1,697 @@
> > +/* FIXME: add copyright header. */
>
> Obviously the FIXME needs fixing :)
>
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "tree.h"
> > +#include "function.h"
> > +#include "basic-block.h"
> > +#include "gimple.h"
> > +#include "options.h"
> > +#include "diagnostic-path.h"
> > +#include "diagnostic-metadata.h"
> > +#include "function.h"
> > +#include "json.h"
> > +#include "analyzer/analyzer.h"
> > +#include "diagnostic-event-id.h"
> > +#include "analyzer/analyzer-logging.h"
> > +#include "analyzer/sm.h"
> > +#include "analyzer/pending-diagnostic.h"
> > +#include "analyzer/function-set.h"
> > +#include "analyzer/analyzer-selftests.h"
> > +#include "tristate.h"
> > +#include "selftest.h"
> > +#include "analyzer/call-string.h"
> > +#include "analyzer/program-point.h"
> > +#include "analyzer/store.h"
> > +#include "analyzer/region-model.h"
> > +
> > +#include <unistd.h>
>
> Presumably you're including <unistd.h> here for the definitions of
> O_RDONLY and O_WRONLY.
>
> As we discussed off-list, I think we want to avoid the use of the
> host's O_RDONLY and O_WRONLY in this file, as they might not have the
> same values as on the target, and I'm not sure that the host is even
> guaranteed to have a <unistd.h>.
>
> I think we want our own enum representing the three access modes:
> READ_WRITE, READ_ONLY, WRITE_ONLY, and a subroutine for converting
> target flags to access mode.
>
> So ultimately that's something we want to fix, though exactly how, I'm
> not quite sure; we presumably want to look up the target's definitions
> of those macros - but I don't think the analyzer has access to the
> cpp_reader instance from the frontend.
>
>
>
> > +#if ENABLE_ANALYZER
> > +
> > +namespace ana
> > +{
> > +
> > +namespace
> > +{
> > +class fd_state_machine : public state_machine
> > +{
> > +public:
> > +  fd_state_machine (logger *logger);
> > +
> > +  bool
> > +  inherited_state_p () const final override
> > +  {
> > +    return false;
> > +  }
> > +
> > +  state_machine::state_t
> > +  get_default_state (const svalue *sval) const final override
> > +  {
> > +    if (tree cst = sval->maybe_get_constant ())
> > +      {
> > +        int val = TREE_INT_CST_LOW (cst);
> > +        if (val < 0)
> > +          {
> > +            return m_null;
>
> Can we rename the invalid state to "m_invalid"?  I find m_null
> confusing.
>
> > +          }
> > +      }
> > +    return m_start;
> > +  }
> > +
> > +  bool on_stmt (sm_context *sm_ctxt, const supernode *node,
> > +                const gimple *stmt) const final override;
> > +
> > +  void on_condition (sm_context *sm_ctxt, const supernode *node,
> > +                     const gimple *stmt, const svalue *lhs, const
> > tree_code op,
> > +                     const svalue *rhs) const final override;
> > +
> > +  bool can_purge_p (state_t s) const final override;
> > +  pending_diagnostic *on_leak (tree var) const final override;
> > +
> > +  bool is_unchecked (state_t s) const;
> > +  bool is_valid (state_t s) const;
> > +
> > +  /* State for a file descriptor that hasn't been checked for
> > validity. */
> > +  state_t m_unchecked_read_write;
> > +
> > +  /* State for a file descriptor opened with O_RDONLY flag. */
> > +  state_t m_unchecked_read_only;
> > +
> > +  /* State for a file descriptor opneded with O_WRONLY flag. */
> > +  state_t m_unchecked_write_only;
>
> Typo: opneded -> opened.
>
> Maybe group these three states in the source like this:
>
>   /* States representing a file descriptor that hasn't yet been
>      checked for validity after opening, for three different
>      access modes.  */
>   state_t m_unchecked_read_write;
>   state_t m_unchecked_read_only;
>   state_t m_unchecked_write_only;
>
> > +
> > +  /* State for a file descriptor that is known to be invalid. */
> > +  state_t m_null;
>
> Let's rename this to m_invalid;
> The condition is that it's < 0, right?  If so, please update the
> comment to clarify that.
>
> > +
> > +  /* State for a file descriptor that was opened in read-write mode
> > and is
> > known
> > +   * to be valid. */
> > +  state_t m_valid_read_write;
> > +
> > +  /* State for a file descriptor that was opened as read-only and is
> > known
> > to be
> > +   * valid*/
> > +  state_t m_valid_read_only;
> > +
> > +  /* State for a file descriptor that was opened as write-only and is
> > known to
> > +   * be valid*/
> > +  state_t m_valid_write_only;
>
> Maybe group these three states in a similar way to above.
>
>
> > +
> > +  /* State for a file descriptor that has been closed.*/
> > +  state_t m_closed;
> > +
> > +  /* State for a file descriptor that we do not want to track anymore
> > . */
> > +  state_t m_stop;
> > +
> > +private:
> > +  void on_open (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > +                const gcall *call) const;
> > +  void on_close (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > +                 const gcall *call) const;
> > +  void on_read (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > +                const gcall *call) const;
> > +  void on_write (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > +                 const gcall *call) const;
> > +};
> > +
> > +/* Base diagnostic class relative to fd_state_machine. */
> > +class fd_diagnostic : public pending_diagnostic
> > +{
> > +public:
> > +  fd_diagnostic (const fd_state_machine &sm, tree arg) : m_sm (sm),
> > m_arg
> > (arg)
> > +  {
> > +  }
> > +
> > +  bool
> > +  subclass_equal_p (const pending_diagnostic &base_other) const
> > override
> > +  {
> > +    return same_tree_p (m_arg, ((const fd_diagnostic
> > &)base_other).m_arg);
> > +  }
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) override
> > +  {
> > +    if (change.m_old_state == m_sm.get_start_state ()
> > +        && m_sm.is_unchecked (change.m_new_state))
> > +      {
> > +        if (change.m_new_state == m_sm.m_unchecked_read_write)
> > +          return change.formatted_print ("opened here as %<read-
> > write%>");
> > +
> > +        if (change.m_new_state == m_sm.m_unchecked_read_only)
> > +          return change.formatted_print ("opened here as %<read-
> > only%>");
> > +
> > +        if (change.m_new_state == m_sm.m_unchecked_write_only)
> > +          return change.formatted_print ("opened here as %<write-
> > only%>");
> > +      }
> > +
> > +    if (m_sm.is_unchecked (change.m_old_state)
> > +        && m_sm.is_valid (change.m_new_state))
> > +      if (change.m_expr)
> > +        return change.formatted_print (
> > +            "assuming %qE is a valid file descriptor", change.m_expr);
> > +      else
> > +        return change.formatted_print ("assuming a valid file
> > descriptor");
>
> Maybe reword to "when %qE is a valid file descriptor (>= 0)" ?
>
>
> > +
> > +    if (m_sm.is_unchecked (change.m_old_state)
> > +        && change.m_new_state == m_sm.m_null)
> > +      if (change.m_expr)
> > +        return change.formatted_print (
> > +            "assuming %qE is an invalid file descriptor",
> > change.m_expr);
> > +      else
> > +        return change.formatted_print ("assuming an invalid file
> > descriptor");
>
> Maybe reword to "when %qE is not a valid file descriptor (< 0)" ?
>
>
> > +
> > +    return label_text ();
> > +  }
> > +
> > +protected:
> > +  const fd_state_machine &m_sm;
> > +  tree m_arg;
> > +};
> > +
> > +class fd_leak : public fd_diagnostic
> > +{
> > +public:
> > +  fd_leak (const fd_state_machine &sm, tree arg) : fd_diagnostic (sm,
> > arg)
> > {}
> > +
> > +  const char *
> > +  get_kind () const final override
> > +  {
> > +    return "fd_leak";
> > +  }
> > +
> > +  int
> > +  get_controlling_option () const final override
> > +  {
> > +    return OPT_Wanalyzer_file_descriptor_leak;
> > +  }
> > +
> > +  bool
> > +  emit (rich_location *rich_loc) final override
> > +  {
> > +    /*CWE-775: Missing Release of File Descriptor or Handle after
> > Effective
> > +     * Lifetime
> > +     */
> > +    diagnostic_metadata m;
> > +    m.add_cwe (775);
> > +    if (m_arg)
> > +      return warning_meta (rich_loc, m, get_controlling_option (),
> > +                           "leak of file descriptor %qE", m_arg);
> > +    else
> > +      return warning_meta (rich_loc, m, get_controlling_option (),
> > +                           "leak of file descriptor");
> > +  }
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) final
> > override
> > +  {
> > +    if (m_sm.is_unchecked (change.m_new_state))
> > +      {
> > +        m_open_event = change.m_event_id;
> > +        return label_text::borrow ("opened here");
> > +      }
> > +
> > +    return fd_diagnostic::describe_state_change (change);
> > +  }
> > +
> > +  label_text
> > +  describe_final_event (const evdesc::final_event &ev) final override
> > +  {
> > +    if (m_open_event.known_p ())
> > +      {
> > +        if (ev.m_expr)
> > +          return ev.formatted_print ("%qE leaks here; was opened at
> > %@",
> > +                                     ev.m_expr, &m_open_event);
> > +        else
> > +          return ev.formatted_print ("leaks here; was opened at %@",
> > +                                     &m_open_event);
> > +      }
> > +    else
> > +      {
> > +        if (ev.m_expr)
> > +          return ev.formatted_print ("%qE leaks here", ev.m_expr);
> > +        else
> > +          return ev.formatted_print ("leaks here");
> > +      }
> > +  }
> > +
> > +private:
> > +  diagnostic_event_id_t m_open_event;
> > +};
> > +
> > +class read_write_diag_fd : public fd_diagnostic
>
> Maybe rename to fd_access_mode_mismatch ?
>
>
> > +{
> > +public:
> > +  /* mode specfies whether read on write was attempted or vice versa.
> > +   */
>
> Make "mode" be an enum rather than an int.
>
> > +  read_write_diag_fd (const fd_state_machine &sm, tree arg, int mode)
> > +      : m_mode (mode), fd_diagnostic (sm, arg)
> > +  {
> > +  }
> > +
> > +  const char *
> > +  get_kind () const final override
> > +  {
> > +    return "read_write_diag_fd";
> > +  }
> > +
> > +
> > +  int
> > +  get_controlling_option () const final override
> > +  {
> > +    return OPT_Wanalyzer_mismatching_operation_on_file_descriptor;
> > +  }
> > +
> > +  bool
> > +  emit (rich_location *rich_loc) final override
> > +  {
> > +    if (m_mode)
> > +      {
> > +        return warning_at (rich_loc, get_controlling_option (),
> > +                           "%<write%> on %<read-only%> file descriptor
> > %qE",
> > +                           m_arg);
> > +      }
> > +    else
> > +      {
> > +        return warning_at (rich_loc, get_controlling_option (),
> > +                           "%<read%> on %<write-only%> file descriptor
> > %qE",
> > +                           m_arg);
>
> Rather than hardcoding "%<write%>" and "%<read%>", let's generalize
> this so that the diagnostic has a "tree fndecl" argument and m_fndecl
> field, which was the function that was called that attempted to used
> the fd in the wrong way.  You can then write %qE to print it in quotes.
>
> > +      }
> > +  }
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) override
> > +  {
> > +    return fd_diagnostic::describe_state_change (change);
> > +  }
>
> Looks like this override is redundant (albeit harmless): it can
> implicitly inherit the fd_diagnostic implementation.
>
>
> > +
> > +  label_text
> > +  describe_final_event (const evdesc::final_event &ev) final override
> > +  {
> > +    if (m_mode)
> > +      return ev.formatted_print ("%qs on %<read-only%> file descriptor
> > %qE",
> > +                                 "write", m_arg);
> > +    else
> > +      return ev.formatted_print ("%qs on %<write-only%> file
> > descriptor
> > %qE",
> > +                                 "read", m_arg);
> > +  }
>
> and generalize this also to work on m_fndecl.
>
> > +
> > +private:
> > +  int m_mode;
>
> As noted above, please make this an enum.
>
>
> > +};
> > +
> > +class double_close : public fd_diagnostic
> > +{
> > +public:
> > +  double_close (const fd_state_machine &sm, tree arg) : fd_diagnostic
> > (sm,
> > arg)
> > +  {
> > +  }
> > +
> > +  const char *
> > +  get_kind () const final override
> > +  {
> > +    return "double_close";
> > +  }
> > +
> > +
> > +  int
> > +  get_controlling_option () const final override
> > +  {
> > +    return OPT_Wanalyzer_double_close;
> > +  }
> > +  bool
> > +  emit (rich_location *rich_loc) final override
> > +  {
> > +    diagnostic_metadata m;
> > +    // CWE-1341: Multiple Releases of Same Resource or Handle
> > +    m.add_cwe (1341);
> > +    return warning_meta (rich_loc, m, get_controlling_option (),
> > +                         "double %<close%> of file descriptor %qE",
> > m_arg);
> > +  }
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) override
> > +  {
> > +    if (m_sm.is_unchecked (change.m_new_state))
> > +      return label_text::borrow ("opened here");
> > +
> > +    if (change.m_new_state == m_sm.m_closed)
> > +      {
> > +        m_first_close_event = change.m_event_id;
> > +        return change.formatted_print ("first %qs here", "close");
> > +      }
> > +    return fd_diagnostic::describe_state_change (change);
> > +  }
> > +
> > +  label_text
> > +  describe_final_event (const evdesc::final_event &ev) final override
> > +  {
> > +    if (m_first_close_event.known_p ())
> > +      return ev.formatted_print ("second %qs here; first %qs was at
> > %@",
> > +                                 "close", "close",
> > &m_first_close_event);
> > +    return ev.formatted_print ("second %qs here", "close");
> > +  }
> > +
> > +private:
> > +  diagnostic_event_id_t m_first_close_event;
> > +};
> > +
> > +class read_write_on_closed_fd : public fd_diagnostic
>
> Maybe rename to "use_after_close_of_fd" or somesuch.
>
> > +{
> > +public:
> > +  read_write_on_closed_fd (const fd_state_machine &sm, tree arg, int
> > mode)
> > +      : fd_diagnostic (sm, arg), m_mode (mode)
> > +  {
> > +  }
> > +
> > +  const char *
> > +  get_kind () const final override
> > +  {
> > +    if (m_mode)
> > +      return "write_on_closed_fd";
> > +    else
> > +      return "read_on_closed_fd";
> > +  }
> > +
> > +
> > +  int
> > +  get_controlling_option () const final override
> > +  {
> > +    return OPT_Wanalyzer_read_write_on_closed_file_descriptor;
> > +  }
> > +
> > +  bool
> > +  emit (rich_location *rich_loc) final override
> > +  {
> > +    if (m_mode)
> > +      return warning_at (rich_loc, get_controlling_option (),
> > +                         "%<write%> on closed file descriptor %qE",
> > m_arg);
> > +    else
> > +      return warning_at (rich_loc, get_controlling_option (),
> > +                         "%<read%> on closed file descriptor %qE",
> > m_arg);
> > +  }
>
> As above, maybe generalize this by adding a "tree m_fndecl;" field to
> the class, giving what function was called on the the closed fd.  I
> think you might be able to eliminate m_mode from this subclass by doing
> that instead.
>
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) override
> > +  {
> > +    if (m_sm.is_unchecked (change.m_new_state))
> > +      {
> > +        return label_text::borrow ("opened here");
> > +      }
> > +
> > +    if (change.m_new_state == m_sm.m_closed)
> > +      {
> > +        return change.formatted_print ("closed here");
> > +      }
> > +
> > +    return fd_diagnostic::describe_state_change (change);
> > +  }
> > +
> > +  label_text
> > +  describe_final_event (const evdesc::final_event &ev) final override
> > +  {
> > +    if (m_mode)
> > +      return ev.formatted_print ("%<write%> on closed file descriptor
> > %qE
> > here",
> > +                                 m_arg);
> > +    else
> > +      return ev.formatted_print ("%<read%> on closed file descriptor
> > %qE
> > here",
> > +                                 m_arg);
> > +  }
>
> Likewise here.
>
> > +
> > +private:
> > +  int m_mode;
> > +};
> > +
> > +class possible_invalid_fd_diag : public fd_diagnostic
>
> Rename to unchecked_use_of_fd ?
>
> > +{
> > +public:
> > +  possible_invalid_fd_diag (const fd_state_machine &sm, tree arg, int
> > mode)
> > +      : fd_diagnostic (sm, arg), m_mode (mode)
> > +  {
> > +  }
> > +
> > +  const char *
> > +  get_kind () const final override
> > +  {
> > +    return "possible_invalid_fd_diag";
> > +  }
> > +
> > +
> > +  int
> > +  get_controlling_option () const final override
> > +  {
> > +    return OPT_Wanalyzer_possible_invalid_file_descriptor;
> > +  }
> > +
> > +  bool
> > +  emit (rich_location *rich_loc) final override
> > +  {
> > +    if (m_mode)
> > +      return warning_at (rich_loc, get_controlling_option (),
> > +                         "%<write%> on possibly invalid file
> > descriptor
> > %qE",
> > +                         m_arg);
> > +    else
> > +      return warning_at (rich_loc, get_controlling_option (),
> > +                         "%<read%> on possibly invalid file descriptor
> > %qE",
> > +                         m_arg);
> > +  }
>
> As above, generalize this with a "tree m_fndecl;" capturing what fndecl
> was called on the fd without checking, and with that, I think you can
> eliminate the m_mode field.
>
>
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) override
> > +  {
> > +    if (m_sm.is_unchecked (change.m_new_state))
> > +      {
> > +        m_first_open_event = change.m_event_id;
> > +        return label_text::borrow ("opened here");
> > +      }
> > +
> > +    return fd_diagnostic::describe_state_change (change);
> > +  }
> > +
> > +  label_text
> > +  describe_final_event (const evdesc::final_event &ev) final override
> > +  {
> > +    return ev.formatted_print ("%qE could be invalid: unchecked value
> > from
> > %@",
> > +                               m_arg, &m_first_open_event);
> > +  }
> > +
> > +private:
> > +  diagnostic_event_id_t m_first_open_event;
> > +  int m_mode;
> > +};
> > +
> > +fd_state_machine::fd_state_machine (logger *logger)
> > +    : state_machine ("file-descriptor", logger),
> > +      m_unchecked_read_write (add_state ("fd-unchecked-read-write")),
> > +      m_unchecked_read_only (add_state ("fd-unchecked-read-only")),
> > +      m_unchecked_write_only (add_state ("fd-unchecked-write-only")),
> > +      m_null (add_state ("fd-null")), m_closed (add_state ("fd-
> > closed")),
> > +      m_valid_read_write (add_state ("fd-valid-read-write")),
> > +      m_valid_read_only (add_state ("fd-valid-read-only")),
> > +      m_valid_write_only (add_state ("fd-valid-write-only")),
> > +      m_stop (add_state ("fd-stop"))
> > +{
> > +}
> > +
> > +bool
> > +fd_state_machine::is_unchecked (state_t s) const
> > +{
> > +  return s == m_unchecked_read_write || s == m_unchecked_read_only
> > +         || s == m_unchecked_write_only;
>
> With multiline conditionals, once they occupy multiple lines, it's more
> readable to split them like this:
>
>   return (s == m_unchecked_read_write
>           || s == m_unchecked_read_only
>           || s == m_unchecked_write_only);
>
>
> > +}
> > +
> > +bool
> > +fd_state_machine::is_valid (state_t s) const
> > +{
> > +  return s == m_valid_read_write || s == m_valid_read_only
> > +         || s == m_valid_write_only;
> > +}
>
> Similar here.
>
> > +
> > +bool
> > +fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
> > +                           const gimple *stmt) const
> > +{
> > +  if (const gcall *call = dyn_cast<const gcall *> (stmt))
> > +    if (tree callee_fndecl = sm_ctxt->get_fndecl_for_call (call))
> > +      {
> > +        if (is_named_call_p (callee_fndecl, "open", call, 2))
> > +          {
> > +            on_open (sm_ctxt, node, stmt, call);
> > +            return true;
> > +          } //  "open"
> > +
> > +        if (is_named_call_p (callee_fndecl, "close", call, 1))
> > +          {
> > +            on_close (sm_ctxt, node, stmt, call);
> > +            return true;
> > +          } //  "close"
> > +
> > +        if (is_named_call_p (callee_fndecl, "write", call, 3))
> > +          {
> > +            on_write (sm_ctxt, node, stmt, call);
> > +            return true;
> > +          } // "write"
> > +
> > +        if (is_named_call_p (callee_fndecl, "read", call, 3))
> > +          {
> > +            on_read (sm_ctxt, node, stmt, call);
> > +            return true;
> > +          } // "read"
> > +
>
> You'll probably want to pass callee_fndecl to some of these "on_FNNAME"
> subroutines, so that it's available for the new m_fndecl fields of the
> warnings I suggested above.
>
>
> > +        return true;
> > +      }
> > +
> > +  return false;
> > +}
> > +
> > +void
> > +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node,
> > +                           const gimple *stmt, const gcall *call)
> > const
> > +{
> > +  tree lhs = gimple_call_lhs (call);
> > +  if (lhs)
> > +    {
> > +      tree arg = gimple_call_arg (call, 1);
> > +      if (TREE_CODE (arg) == INTEGER_CST)
> > +        {
> > +          int flag = TREE_INT_CST_LOW (arg);
> > +          /* FIXME: this code assumes the access modes on the host and
> > +          target are the same, which in practice might not be the
> > case. */
> > +          if (flag == O_RDONLY)
> > +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> > +                                    m_unchecked_read_only);
> > +          else if (flag == O_WRONLY)
> > +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> > +                                    m_unchecked_write_only);
> > +          else
> > +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> > +                                    m_unchecked_read_write);
>
> Let's introduce an enum for the access mode, and a subroutine to get
> the enum value from "flag", so that we can isolate this "FIXME" in that
> subroutine.
>
> > +        }
> > +    }
> > +  else
> > +    {
> > +      /* FIXME: add leak reporting */
>
> Do you have a testcase that exhibits this behavior?
>
>
> > +    }
> > +}
> > +
> > +void
> > +fd_state_machine::on_close (sm_context *sm_ctxt, const supernode
> > *node,
> > +                            const gimple *stmt, const gcall *call)
> > const
> > +{
> > +  tree arg = gimple_call_arg (call, 0);
> > +
> > +  sm_ctxt->on_transition (node, stmt, arg, m_start, m_closed);
> > +
> > +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_write,
> > m_closed);
> > +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_only,
> > m_closed);
> > +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_write_only,
> > m_closed);
> > +  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_write,
> > m_closed);
> > +  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_only,
> > m_closed);
> > +  sm_ctxt->on_transition (node, stmt, arg, m_valid_write_only,
> > m_closed);
>
>
> > +  sm_ctxt->on_transition (node, stmt, arg, m_null, m_closed);
>
> Perhaps we should warn for operations on known invalid FDs? (and thus
> complain about close on a known-invalid FD).
>
> I believe that the OS will handle such code gracefully by setting errno
> to EBADF.
>
>
> > +
> > +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> > +    {
> > +      tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > +      sm_ctxt->warn (node, stmt, arg, new double_close (*this,
> > diag_arg));
> > +      sm_ctxt->set_next_state (stmt, arg, m_stop);
> > +    }
> > +}
> > +void
> > +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node,
> > +                           const gimple *stmt, const gcall *call)
> > const
> > +{
> > +  tree arg = gimple_call_arg (call, 0);
> > +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > +
> > +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> > +    {
> > +      sm_ctxt->warn (node, stmt, arg, new read_write_on_closed_fd
> > (*this,
> > diag_arg, 0));
> > +    }
>
> Maybe consolidate on_read and on_write by adding a subroutine that
> checks for m_closed, and for checking access mode (maybe a
> "check_for_open_fd" that takes an access mode enum value amongst other
> params.  If you pass around caller_fndecl, I think much of this code
> can be shared that way between on_read and on_write (which will help
> simplify things if we want to support further functions).
>
> > +
> > +  if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> > +    {
> > +      sm_ctxt->warn (node, stmt, arg,
> > +                     new possible_invalid_fd_diag (*this, diag_arg,
> > 0));
> > +    }
> > +
> > +  if (sm_ctxt->get_state (stmt, arg) == m_unchecked_write_only
> > +      || sm_ctxt->get_state (stmt, arg) == m_valid_write_only)
> > +    {
> > +      sm_ctxt->warn (node, stmt, arg,
> > +                     new read_write_diag_fd (*this, diag_arg, 0));
> > +    }
> > +}
> > +void
> > +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode
> > *node,
> > +                            const gimple *stmt, const gcall *call)
> > const
> > +{
> > +  tree arg = gimple_call_arg (call, 0);
> > +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > +
> > +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> > +    {
> > +      sm_ctxt->warn (node, stmt, arg, new read_write_diag_fd (*this,
> > diag_arg, 1));
> > +    }
> > +  if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> > +    {
> > +      sm_ctxt->warn (node, stmt, arg,
> > +                     new possible_invalid_fd_diag (*this, diag_arg,
> > 1));
> > +    }
> > +
> > +  if (sm_ctxt->get_state (stmt, arg) == m_unchecked_read_only
> > +      || sm_ctxt->get_state (stmt, arg) == m_valid_read_only)
> > +    {
> > +      tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > +      sm_ctxt->warn (node, stmt, arg,
> > +                     new read_write_diag_fd (*this, diag_arg, 1));
> > +    }
> > +}
> > +
> > +void
> > +fd_state_machine::on_condition (sm_context *sm_ctxt, const supernode
> > *node,
> > +                                const gimple *stmt, const svalue *lhs,
> > +                                enum tree_code op, const svalue *rhs)
> > const
> > +{
> > +  if (!rhs->all_zeroes_p ())
> > +    return;
> > +
> > +  if (op == GE_EXPR)
> > +    {
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> > +                              m_valid_read_write);
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> > +                              m_valid_read_only);
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> > +                              m_valid_write_only);
> > +    }
> > +  else if (op == LT_EXPR)
> > +    {
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> > m_null);
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> > m_null);
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> > m_null);
> > +    }
> > +}
> > +
> > +bool
> > +fd_state_machine::can_purge_p (state_t s) const
> > +{
> > +  if (is_unchecked (s) || is_valid (s))
> > +    return false;
> > +  else
> > +    return true;
> > +}
> > +
> > +pending_diagnostic *
> > +fd_state_machine::on_leak (tree var) const
> > +{
> > +  return new fd_leak (*this, var);
> > +}
> > +} // namespace
> > +
> > +state_machine *
> > +make_fd_state_machine (logger *logger)
> > +{
> > +  return new fd_state_machine (logger);
> > +}
> > +} // namespace ana
> > +
> > +#endif // ENABLE_ANALYZER
> > \ No newline at end of file
>
> [...snip...]
>
> >
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> > b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> > new file mode 100644
> > index 00000000000..985e9ac75de
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> > @@ -0,0 +1,26 @@
> > +int open(const char *, int mode);
> > +#define O_RDONLY 0
> > +#define O_WRONLY 1
> > +#define O_RDWR 2
>
> As noted above, we want to somehow use the values for the O_ macros
> from the test code, rather than the <unistd.h> with which sm-fd.cc was
> built.
>
> > +
> > +void
> > +test_1 (const char *path)
> > +{
> > +  int fd = open (path, O_RDONLY); /* { dg-message "\\(1\\) opened
> > here" }
> > */
> > +  return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-
> > 775\\\]"
> > "warning" } */
> > + /* { dg-message "\\(2\\) 'fd' leaks here; was opened at \\(1\\)"
> > "event"
> > { target *-*-* } .-1 } */
> > +}
> > +
> > +void
> > +test_2 (const char *path)
> > +{
> > +  int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here"
> > } */
> > +  if (fd >= 0) /* { dg-message "\\(2\\) assuming 'fd' is a valid file
> > descriptor" "event1" } */
> > +  /* { dg-message "\\(3\\) following 'true' branch \\(when 'fd >=
> > 0'\\)..." "event2" { target *-*-* } .-1 } */
> > +  {
> > +    return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-
> > 775\\\]"
> > "warning" } */
> > +    /* { dg-message "\\(4\\) ...to here" "event1" { target *-*-* } .-1
> > } */
> > +    /* { dg-message "\\(5\\) 'fd' leaks here; was opened at \\(1\\)"
> > "event2" { target *-*-* } .-2 } */
> > +  }
> > +}
>
> Looks good, but in future don't feel the need to express all of the
> events: I think the significant ones we should test for here are (1),
> (2), (5), whereas specifying directives for (3) and (4) are probably
> overkill.  prune.exp will prune away stray lines beginning with "note:
> " so we don't need to write directives for all events, just the ones we
> care about.
>
>
> [...snip...]
>
> Lots of good test coverage here.  With a state-machine based
> diagnostic, it's probably best to try to have coverage of all possible
> calls in all possible states - that way we can think systematically
> about what we should be warning for, and what we shouldn't be warning
> for.   Note that once you've got test coverage of the basic events
> (like you have), you don't have to have dg-message directives for all
> of the events in all of the rest of the tests (that would be excessive,
> and too much work to create and maintain).
>
> A few other nits:
> - the Subject line for the commit should have an  "analyzer: " prefix
> and reference the bugzilla problem report (PR 106003)
> - for better or worse, the project requires ChangeLog entries.  You can
> use the script ./contrib/mklog.py to help generate this.  Some more
> information is here:
>   https://www.gnu.org/prep/standards/html_node/Change-Logs.html
> In GCC our ChangeLog entries are part of the git commit message, and we
> have a serverside script to update the ChangeLog files in the source
> tree (to avoid constantly having merge conflicts).
>
> I think we'll have at least one more iteration of this patch to review,
> and before I can approve you to push a later version of the patch to
> trunk, you'll need to "fully" test it, by doing a bootstrap, and
> running the full testsuite on the result.  Do you need some help on how
> to do that?  Do you have access to the GCC compile farm yet?
>
> I'll see if I can figure out how to access the preprocessor from the
> analyzer.
>
> Thanks again for the patch; hope the above makes sense and is
> constructive
>
> Dave
>
>
>
>
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] static analysis support for posix file desccriptor APIs
  2022-06-23 18:28   ` Mir Immad
@ 2022-06-24 18:05     ` David Malcolm
  0 siblings, 0 replies; 8+ messages in thread
From: David Malcolm @ 2022-06-24 18:05 UTC (permalink / raw)
  To: Mir Immad, gcc

On Thu, 2022-06-23 at 23:58 +0530, Mir Immad wrote:
>  Hi Dave,
> Thanks for the suggestions,
> 
> I changed most of the things that you suggested, however reporting
> for
> warnings like close of known invalid fd was problematic:
> 
> consider the following code:
> 
> if (fd >= 0)
> { write (fd,...); }
> close(fd);
> 
> As I was checking the exploded graph for this; the "close(fd)" stmt
> when
> visited by the FALSE edge of if stmt (fd < 0) finds fd to be in
> m_invalid
> state; hence warns about "close on known invalid fd" which I believe
> is not
> true as fd at that point is not *known* to be invalid. I spent quite
> some
> time on this and decided not to add this diagnosis for now.

That choice seems reasonable to me.

> 
> Also, when close transitions m_invalid to m_close; this leads to
> double
> close even when the second close is outside the ( < 0 ) condition
> which
> again does not seem right.
> if (fd < 0)
> close(fd):
> close(fd); // double close here.

"close" on an invalid FD doesn't give you a closed FD, you still have
an invalid FD.

By analogy with sm-malloc.cc: free (NULL) doesn't make NULL a freed
pointer, it's still NULL, and thus just a no-op.

(although in this case:  close(invalid)  will also set errno to EBADF,
so it's not strictly a no-op).


> 
> > Maybe consolidate on_read and on_write by adding a subroutine that
> > checks for m_closed, and for checking access mode (maybe a
> > "check_for_open_fd" that takes an access mode enum value amongst
> > other
> > params.  If you pass around caller_fndecl, I think much of this
> > code
> > can be shared that way between on_read and on_write (which will
> > help
> > simplify things if we want to support further functions)
> 
> I hope I got this right.
> 
> 
> > > +        }
> > > +    }
> > > +  else
> > > +    {
> >  >+      /* FIXME: add leak reporting */
> 
> > Do you have a testcase that exhibits this behavior?
> 
> I was thinking of the following case:
> void test()
> {
>  open(..);
> }
> Here the resources are leaked because there is no way to free them.

Please add a test case for this, marking it with xfail if need be.

But hopefully it will be easy to implement the FIXME here, with an:
   sm->on_warn (new fd_leak ());
or somesuch (not sure if there's a "tree var" you can give it for this
case though, might have to use NULL).

> 
> In "read" and "write" funcs, I'm warning for unchecked_use_of_fd and
> access_mode_mismatch only when we know fd is not in closed state.
> Otherwise, such code leads to lot of irrelevant warnings, example:
> 
> void test1(const char *path, void *buf) {
>   int fd = open(path, O_RDONLY);
>   if (fd >= 0)
>   {
>   close(fd);
>   read(fd, buf, 1); // read on closed fd + read on possibly invalid
> fd
>   write(fd, buf, 1); // write on closed fd + write on possibly invlid
> fd
>   }
> }

Should we transition the FD to the "stop" state after the first
warning?  That way we ought to avoid a bunch of followup warnings for
that FD.

> 
> 
> Adding docs for new options still remains pending. I added some new
> tests;
> and all tests are passing. The stuff about O_* macros is left as-is.
> 
>  I'm sending a patch in another email.
> 
> Thanks a lot.

Thanks
Dave



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-06-24 18:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-21 16:30 [PATCH] static analysis support for posix file desccriptor APIs Mir Immad
2022-06-21 16:30 ` Mir Immad
2022-06-21 18:34 ` David Malcolm
2022-06-21 18:50   ` Joseph Myers
2022-06-21 20:31     ` David Malcolm
2022-06-21 21:18       ` Joseph Myers
2022-06-23 18:28   ` Mir Immad
2022-06-24 18:05     ` David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).