From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <mirimnan017@gmail.com>
Received: from mail-pj1-x102f.google.com (mail-pj1-x102f.google.com
 [IPv6:2607:f8b0:4864:20::102f])
 by sourceware.org (Postfix) with ESMTPS id DA0A6385AE70
 for <gcc@gcc.gnu.org>; Thu, 23 Jun 2022 18:29:05 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DA0A6385AE70
Received: by mail-pj1-x102f.google.com with SMTP id cv13so393260pjb.4
 for <gcc@gcc.gnu.org>; Thu, 23 Jun 2022 11:29:05 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to;
 bh=rdsL9ZdySsKGZiNc86QQ8gDzHITp8UDZCjVixYQkPTM=;
 b=gQNpAsd48iGs3D2ZLSglF55LIbzDFvGf8eRAJ18lVrcyxqbnc+dztGrO5x6kUDXl7d
 fKESXwQax8c16BsXCYvSwLHjmDlgHceJWR3eV0cxAfVuj3gj7OnEdRptNFWe3J5fMDtz
 OpJ2jwHtB++Gyam/jVdGO4/uOcvWH9flOGfsRKxIz5Lwe5OPcGc3pe5n0bB3vnhuA5T2
 H7wV3alPWQkLdLCyD3yQj0b/P3QNWfduApSeEu9tNJYyZ5yYkB2NEW0wCS+KqvJC8HgA
 noQ33RemMYQrB0x+FZPiPzFXv3B/btiSK8V0ALPaaseWjVzSOLWTXFtrzaGTehRzpHad
 3oRg==
X-Gm-Message-State: AJIora/+emRcuvxR7Q+rMFjSw2w9LhTQffP9bgg05UsCZlv3KEXA7dLS
 xhfWEOszOwvQzmxzzcbqFZJIL+orQoI2dwj/qC+8AfAG8jBd8g==
X-Google-Smtp-Source: AGRyM1trVTaHJKZ+rafbOFmEZo2spxeq/L044pe09ZNuIbJGQiHs8pLHlDqk8HIiG4A3y1t+bQ0mEC6uQPwOLDQsufg=
X-Received: by 2002:a17:90b:3b82:b0:1ec:ac0d:5d3b with SMTP id
 pc2-20020a17090b3b8200b001ecac0d5d3bmr5412623pjb.158.1656008944660; Thu, 23
 Jun 2022 11:29:04 -0700 (PDT)
MIME-Version: 1.0
References: <CAE1-7oz4Oi+p3+iKf+H0+OGEL+vfAHvT17fYM-0Vz-r0-f5R5Q@mail.gmail.com>
 <bef3049406cdac2fdb5edf915bfc5636c6be0a8e.camel@redhat.com>
In-Reply-To: <bef3049406cdac2fdb5edf915bfc5636c6be0a8e.camel@redhat.com>
From: Mir Immad <mirimnan017@gmail.com>
Date: Thu, 23 Jun 2022 23:58:53 +0530
Message-ID: <CAE1-7oxk=t7AzdBLB8+u+eU+5TDHYFbZ1-wtwdQJSRkC8iu9ug@mail.gmail.com>
Subject: Re: [PATCH] static analysis support for posix file desccriptor APIs
To: David Malcolm <dmalcolm@redhat.com>, gcc@gcc.gnu.org
X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_ENVFROM_END_DIGIT,
 FREEMAIL_FROM, GIT_PATCH_0, HTML_MESSAGE, KAM_SHORT, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 23 Jun 2022 18:29:14 -0000

 Hi Dave,
Thanks for the suggestions,

I changed most of the things that you suggested, however reporting for
warnings like close of known invalid fd was problematic:

consider the following code:

if (fd >= 0)
{ write (fd,...); }
close(fd);

As I was checking the exploded graph for this; the "close(fd)" stmt when
visited by the FALSE edge of if stmt (fd < 0) finds fd to be in m_invalid
state; hence warns about "close on known invalid fd" which I believe is not
true as fd at that point is not *known* to be invalid. I spent quite some
time on this and decided not to add this diagnosis for now.

Also, when close transitions m_invalid to m_close; this leads to double
close even when the second close is outside the ( < 0 ) condition which
again does not seem right.
if (fd < 0)
close(fd):
close(fd); // double close here.

> Maybe consolidate on_read and on_write by adding a subroutine that
> checks for m_closed, and for checking access mode (maybe a
> "check_for_open_fd" that takes an access mode enum value amongst other
> params.  If you pass around caller_fndecl, I think much of this code
> can be shared that way between on_read and on_write (which will help
> simplify things if we want to support further functions)

I hope I got this right.


> > +        }
> > +    }
> > +  else
> > +    {
>  >+      /* FIXME: add leak reporting */

> Do you have a testcase that exhibits this behavior?

I was thinking of the following case:
void test()
{
 open(..);
}
Here the resources are leaked because there is no way to free them.

In "read" and "write" funcs, I'm warning for unchecked_use_of_fd and
access_mode_mismatch only when we know fd is not in closed state.
Otherwise, such code leads to lot of irrelevant warnings, example:

void test1(const char *path, void *buf) {
  int fd = open(path, O_RDONLY);
  if (fd >= 0)
  {
  close(fd);
  read(fd, buf, 1); // read on closed fd + read on possibly invalid fd
  write(fd, buf, 1); // write on closed fd + write on possibly invlid fd
  }
}


Adding docs for new options still remains pending. I added some new tests;
and all tests are passing. The stuff about O_* macros is left as-is.

 I'm sending a patch in another email.

Thanks a lot.

On Wed, Jun 22, 2022 at 12:04 AM David Malcolm <dmalcolm@redhat.com> wrote:

> Hi Immad, thanks for this patch.
>
> Overall, looks like you're making good progress.
>
> Various notes and nitpicks inline below, throughout...
>
> On Tue, 2022-06-21 at 22:00 +0530, Mir Immad wrote:
>
> [...snip...]
>
> >
> > diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> > index 23dfc797cea..d99acfbb069 100644
> > --- a/gcc/analyzer/analyzer.opt
> > +++ b/gcc/analyzer/analyzer.opt
> > @@ -54,6 +54,10 @@ The minimum number of supernodes within a function
> > for
> > the analyzer to consider
> >  Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump)
> > Init(200) Param
> >  The maximum depth of exploded nodes that should appear in a dot dump
> > before switching to a less verbose format.
>
> I'm not yet sure if this is a good idea, but I wonder if all of these
> warnings ought to have a "-Wanalyzer-fd-" prefix?  "file-descriptor" is
> rather long, and the analyzer's warnings are already tending to be on
> the long side, and having a consistent prefix might make it easier for
> users to grok them.
>
> (though this feels like a "what color do we paint the bikeshed?" issue;
> see e.g. https://bikeshed.org/ )
>
> >
> > +Wanalyzer-double-close
> > +Common Var(warn_analyzer_double_close) Init(1) Warning
> > +Warn about code paths in which a file descriptor can be closed more
> > than
> > once.
>
> ...so this could be Wanalyzer-fd-double-close
>
> > +
> >  Wanalyzer-double-fclose
> >  Common Var(warn_analyzer_double_fclose) Init(1) Warning
> >  Warn about code paths in which a stdio FILE can be closed more than
> > once.
> > @@ -66,6 +70,10 @@ Wanalyzer-exposure-through-output-file
> >  Common Var(warn_analyzer_exposure_through_output_file) Init(1) Warning
> >  Warn about code paths in which sensitive data is written to a file.
> >
> > +Wanalyzer-file-descriptor-leak
> > +Common Var(warn_analyzer_file_descriptor_leak) Init(1) Warning
> > +Warn about code paths in which a file descriptor is not closed.
>
> ...and Wanalyzer-fd-leak
>
> > +
> >  Wanalyzer-file-leak
> >  Common Var(warn_analyzer_file_leak) Init(1) Warning
> >  Warn about code paths in which a stdio FILE is not closed.
> > @@ -82,6 +90,14 @@ Wanalyzer-mismatching-deallocation
> >  Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
> >  Warn about code paths in which the wrong deallocation function is
> > called.
> >
> > +Wanalyzer-mismatching-operation-on-file-descriptor
> > +Common Var(warn_analyzer_mismatching_operation_on_file_descriptor)
> > Init(1)
> > Warning
> > +Warn about the code paths in which read on write-only file descriptor
> > or
> > write on read-only file descriptor is called.
>
> Maybe:
>   Wanalyzer-fd-access-mode-mismatch
> ?
>
> Lose the "the" in "the code paths", so maybe: "Warn about code paths in
> which a write is attempted on a read-only file descriptor, or vice-
> versa."
>
>
> > +
> > +Wanalyzer-possible-invalid-file-descriptor
> > +Common Var(warn_analyzer_possible_invalid_file_descriptor) Init(1)
> > Warning
> > +warn about code paths in which a possibly invalid file descriptor is
> > passed to a must-be-a-valid file descriptor function argument.
> > +
>
> Wanalyzer-fd-use-without-check ?
>
> FWIW I believe that this isn't strictly speaking a security issue, that
> if code blithely assumes the fd is open and then tries to read/write
> it, the access will fail with errno set to EBADF.  But it seems worth
> warning for; code that doesn't bother implementing error-checking seems
> suspect to me.
>
>
>
> >
> > +Wanalyzer-read-write-on-closed-file-descriptor
>
> Maybe: Wanalyzer-fd-use-after-close ?
>
> > +Common Var(warn_analyzer_read_write_on_closed_file_descriptor) Init(1)
> > Warning
> > +Warn about code paths in which a read on write in performed on a
> > closed
> > file descriptor.
>
> Typo "read on write"; maybe replace with "Warn about code paths in
> which a file descriptor is used after being closed." ?
>
>
>
> [...snip...]
>
> > +
> > diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> > new file mode 100644
> > index 00000000000..23e79e3e16a
> > --- /dev/null
> > +++ b/gcc/analyzer/sm-fd.cc
> > @@ -0,0 +1,697 @@
> > +/* FIXME: add copyright header. */
>
> Obviously the FIXME needs fixing :)
>
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "tree.h"
> > +#include "function.h"
> > +#include "basic-block.h"
> > +#include "gimple.h"
> > +#include "options.h"
> > +#include "diagnostic-path.h"
> > +#include "diagnostic-metadata.h"
> > +#include "function.h"
> > +#include "json.h"
> > +#include "analyzer/analyzer.h"
> > +#include "diagnostic-event-id.h"
> > +#include "analyzer/analyzer-logging.h"
> > +#include "analyzer/sm.h"
> > +#include "analyzer/pending-diagnostic.h"
> > +#include "analyzer/function-set.h"
> > +#include "analyzer/analyzer-selftests.h"
> > +#include "tristate.h"
> > +#include "selftest.h"
> > +#include "analyzer/call-string.h"
> > +#include "analyzer/program-point.h"
> > +#include "analyzer/store.h"
> > +#include "analyzer/region-model.h"
> > +
> > +#include <unistd.h>
>
> Presumably you're including <unistd.h> here for the definitions of
> O_RDONLY and O_WRONLY.
>
> As we discussed off-list, I think we want to avoid the use of the
> host's O_RDONLY and O_WRONLY in this file, as they might not have the
> same values as on the target, and I'm not sure that the host is even
> guaranteed to have a <unistd.h>.
>
> I think we want our own enum representing the three access modes:
> READ_WRITE, READ_ONLY, WRITE_ONLY, and a subroutine for converting
> target flags to access mode.
>
> So ultimately that's something we want to fix, though exactly how, I'm
> not quite sure; we presumably want to look up the target's definitions
> of those macros - but I don't think the analyzer has access to the
> cpp_reader instance from the frontend.
>
>
>
> > +#if ENABLE_ANALYZER
> > +
> > +namespace ana
> > +{
> > +
> > +namespace
> > +{
> > +class fd_state_machine : public state_machine
> > +{
> > +public:
> > +  fd_state_machine (logger *logger);
> > +
> > +  bool
> > +  inherited_state_p () const final override
> > +  {
> > +    return false;
> > +  }
> > +
> > +  state_machine::state_t
> > +  get_default_state (const svalue *sval) const final override
> > +  {
> > +    if (tree cst = sval->maybe_get_constant ())
> > +      {
> > +        int val = TREE_INT_CST_LOW (cst);
> > +        if (val < 0)
> > +          {
> > +            return m_null;
>
> Can we rename the invalid state to "m_invalid"?  I find m_null
> confusing.
>
> > +          }
> > +      }
> > +    return m_start;
> > +  }
> > +
> > +  bool on_stmt (sm_context *sm_ctxt, const supernode *node,
> > +                const gimple *stmt) const final override;
> > +
> > +  void on_condition (sm_context *sm_ctxt, const supernode *node,
> > +                     const gimple *stmt, const svalue *lhs, const
> > tree_code op,
> > +                     const svalue *rhs) const final override;
> > +
> > +  bool can_purge_p (state_t s) const final override;
> > +  pending_diagnostic *on_leak (tree var) const final override;
> > +
> > +  bool is_unchecked (state_t s) const;
> > +  bool is_valid (state_t s) const;
> > +
> > +  /* State for a file descriptor that hasn't been checked for
> > validity. */
> > +  state_t m_unchecked_read_write;
> > +
> > +  /* State for a file descriptor opened with O_RDONLY flag. */
> > +  state_t m_unchecked_read_only;
> > +
> > +  /* State for a file descriptor opneded with O_WRONLY flag. */
> > +  state_t m_unchecked_write_only;
>
> Typo: opneded -> opened.
>
> Maybe group these three states in the source like this:
>
>   /* States representing a file descriptor that hasn't yet been
>      checked for validity after opening, for three different
>      access modes.  */
>   state_t m_unchecked_read_write;
>   state_t m_unchecked_read_only;
>   state_t m_unchecked_write_only;
>
> > +
> > +  /* State for a file descriptor that is known to be invalid. */
> > +  state_t m_null;
>
> Let's rename this to m_invalid;
> The condition is that it's < 0, right?  If so, please update the
> comment to clarify that.
>
> > +
> > +  /* State for a file descriptor that was opened in read-write mode
> > and is
> > known
> > +   * to be valid. */
> > +  state_t m_valid_read_write;
> > +
> > +  /* State for a file descriptor that was opened as read-only and is
> > known
> > to be
> > +   * valid*/
> > +  state_t m_valid_read_only;
> > +
> > +  /* State for a file descriptor that was opened as write-only and is
> > known to
> > +   * be valid*/
> > +  state_t m_valid_write_only;
>
> Maybe group these three states in a similar way to above.
>
>
> > +
> > +  /* State for a file descriptor that has been closed.*/
> > +  state_t m_closed;
> > +
> > +  /* State for a file descriptor that we do not want to track anymore
> > . */
> > +  state_t m_stop;
> > +
> > +private:
> > +  void on_open (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > +                const gcall *call) const;
> > +  void on_close (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > +                 const gcall *call) const;
> > +  void on_read (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > +                const gcall *call) const;
> > +  void on_write (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > +                 const gcall *call) const;
> > +};
> > +
> > +/* Base diagnostic class relative to fd_state_machine. */
> > +class fd_diagnostic : public pending_diagnostic
> > +{
> > +public:
> > +  fd_diagnostic (const fd_state_machine &sm, tree arg) : m_sm (sm),
> > m_arg
> > (arg)
> > +  {
> > +  }
> > +
> > +  bool
> > +  subclass_equal_p (const pending_diagnostic &base_other) const
> > override
> > +  {
> > +    return same_tree_p (m_arg, ((const fd_diagnostic
> > &)base_other).m_arg);
> > +  }
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) override
> > +  {
> > +    if (change.m_old_state == m_sm.get_start_state ()
> > +        && m_sm.is_unchecked (change.m_new_state))
> > +      {
> > +        if (change.m_new_state == m_sm.m_unchecked_read_write)
> > +          return change.formatted_print ("opened here as %<read-
> > write%>");
> > +
> > +        if (change.m_new_state == m_sm.m_unchecked_read_only)
> > +          return change.formatted_print ("opened here as %<read-
> > only%>");
> > +
> > +        if (change.m_new_state == m_sm.m_unchecked_write_only)
> > +          return change.formatted_print ("opened here as %<write-
> > only%>");
> > +      }
> > +
> > +    if (m_sm.is_unchecked (change.m_old_state)
> > +        && m_sm.is_valid (change.m_new_state))
> > +      if (change.m_expr)
> > +        return change.formatted_print (
> > +            "assuming %qE is a valid file descriptor", change.m_expr);
> > +      else
> > +        return change.formatted_print ("assuming a valid file
> > descriptor");
>
> Maybe reword to "when %qE is a valid file descriptor (>= 0)" ?
>
>
> > +
> > +    if (m_sm.is_unchecked (change.m_old_state)
> > +        && change.m_new_state == m_sm.m_null)
> > +      if (change.m_expr)
> > +        return change.formatted_print (
> > +            "assuming %qE is an invalid file descriptor",
> > change.m_expr);
> > +      else
> > +        return change.formatted_print ("assuming an invalid file
> > descriptor");
>
> Maybe reword to "when %qE is not a valid file descriptor (< 0)" ?
>
>
> > +
> > +    return label_text ();
> > +  }
> > +
> > +protected:
> > +  const fd_state_machine &m_sm;
> > +  tree m_arg;
> > +};
> > +
> > +class fd_leak : public fd_diagnostic
> > +{
> > +public:
> > +  fd_leak (const fd_state_machine &sm, tree arg) : fd_diagnostic (sm,
> > arg)
> > {}
> > +
> > +  const char *
> > +  get_kind () const final override
> > +  {
> > +    return "fd_leak";
> > +  }
> > +
> > +  int
> > +  get_controlling_option () const final override
> > +  {
> > +    return OPT_Wanalyzer_file_descriptor_leak;
> > +  }
> > +
> > +  bool
> > +  emit (rich_location *rich_loc) final override
> > +  {
> > +    /*CWE-775: Missing Release of File Descriptor or Handle after
> > Effective
> > +     * Lifetime
> > +     */
> > +    diagnostic_metadata m;
> > +    m.add_cwe (775);
> > +    if (m_arg)
> > +      return warning_meta (rich_loc, m, get_controlling_option (),
> > +                           "leak of file descriptor %qE", m_arg);
> > +    else
> > +      return warning_meta (rich_loc, m, get_controlling_option (),
> > +                           "leak of file descriptor");
> > +  }
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) final
> > override
> > +  {
> > +    if (m_sm.is_unchecked (change.m_new_state))
> > +      {
> > +        m_open_event = change.m_event_id;
> > +        return label_text::borrow ("opened here");
> > +      }
> > +
> > +    return fd_diagnostic::describe_state_change (change);
> > +  }
> > +
> > +  label_text
> > +  describe_final_event (const evdesc::final_event &ev) final override
> > +  {
> > +    if (m_open_event.known_p ())
> > +      {
> > +        if (ev.m_expr)
> > +          return ev.formatted_print ("%qE leaks here; was opened at
> > %@",
> > +                                     ev.m_expr, &m_open_event);
> > +        else
> > +          return ev.formatted_print ("leaks here; was opened at %@",
> > +                                     &m_open_event);
> > +      }
> > +    else
> > +      {
> > +        if (ev.m_expr)
> > +          return ev.formatted_print ("%qE leaks here", ev.m_expr);
> > +        else
> > +          return ev.formatted_print ("leaks here");
> > +      }
> > +  }
> > +
> > +private:
> > +  diagnostic_event_id_t m_open_event;
> > +};
> > +
> > +class read_write_diag_fd : public fd_diagnostic
>
> Maybe rename to fd_access_mode_mismatch ?
>
>
> > +{
> > +public:
> > +  /* mode specfies whether read on write was attempted or vice versa.
> > +   */
>
> Make "mode" be an enum rather than an int.
>
> > +  read_write_diag_fd (const fd_state_machine &sm, tree arg, int mode)
> > +      : m_mode (mode), fd_diagnostic (sm, arg)
> > +  {
> > +  }
> > +
> > +  const char *
> > +  get_kind () const final override
> > +  {
> > +    return "read_write_diag_fd";
> > +  }
> > +
> > +
> > +  int
> > +  get_controlling_option () const final override
> > +  {
> > +    return OPT_Wanalyzer_mismatching_operation_on_file_descriptor;
> > +  }
> > +
> > +  bool
> > +  emit (rich_location *rich_loc) final override
> > +  {
> > +    if (m_mode)
> > +      {
> > +        return warning_at (rich_loc, get_controlling_option (),
> > +                           "%<write%> on %<read-only%> file descriptor
> > %qE",
> > +                           m_arg);
> > +      }
> > +    else
> > +      {
> > +        return warning_at (rich_loc, get_controlling_option (),
> > +                           "%<read%> on %<write-only%> file descriptor
> > %qE",
> > +                           m_arg);
>
> Rather than hardcoding "%<write%>" and "%<read%>", let's generalize
> this so that the diagnostic has a "tree fndecl" argument and m_fndecl
> field, which was the function that was called that attempted to used
> the fd in the wrong way.  You can then write %qE to print it in quotes.
>
> > +      }
> > +  }
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) override
> > +  {
> > +    return fd_diagnostic::describe_state_change (change);
> > +  }
>
> Looks like this override is redundant (albeit harmless): it can
> implicitly inherit the fd_diagnostic implementation.
>
>
> > +
> > +  label_text
> > +  describe_final_event (const evdesc::final_event &ev) final override
> > +  {
> > +    if (m_mode)
> > +      return ev.formatted_print ("%qs on %<read-only%> file descriptor
> > %qE",
> > +                                 "write", m_arg);
> > +    else
> > +      return ev.formatted_print ("%qs on %<write-only%> file
> > descriptor
> > %qE",
> > +                                 "read", m_arg);
> > +  }
>
> and generalize this also to work on m_fndecl.
>
> > +
> > +private:
> > +  int m_mode;
>
> As noted above, please make this an enum.
>
>
> > +};
> > +
> > +class double_close : public fd_diagnostic
> > +{
> > +public:
> > +  double_close (const fd_state_machine &sm, tree arg) : fd_diagnostic
> > (sm,
> > arg)
> > +  {
> > +  }
> > +
> > +  const char *
> > +  get_kind () const final override
> > +  {
> > +    return "double_close";
> > +  }
> > +
> > +
> > +  int
> > +  get_controlling_option () const final override
> > +  {
> > +    return OPT_Wanalyzer_double_close;
> > +  }
> > +  bool
> > +  emit (rich_location *rich_loc) final override
> > +  {
> > +    diagnostic_metadata m;
> > +    // CWE-1341: Multiple Releases of Same Resource or Handle
> > +    m.add_cwe (1341);
> > +    return warning_meta (rich_loc, m, get_controlling_option (),
> > +                         "double %<close%> of file descriptor %qE",
> > m_arg);
> > +  }
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) override
> > +  {
> > +    if (m_sm.is_unchecked (change.m_new_state))
> > +      return label_text::borrow ("opened here");
> > +
> > +    if (change.m_new_state == m_sm.m_closed)
> > +      {
> > +        m_first_close_event = change.m_event_id;
> > +        return change.formatted_print ("first %qs here", "close");
> > +      }
> > +    return fd_diagnostic::describe_state_change (change);
> > +  }
> > +
> > +  label_text
> > +  describe_final_event (const evdesc::final_event &ev) final override
> > +  {
> > +    if (m_first_close_event.known_p ())
> > +      return ev.formatted_print ("second %qs here; first %qs was at
> > %@",
> > +                                 "close", "close",
> > &m_first_close_event);
> > +    return ev.formatted_print ("second %qs here", "close");
> > +  }
> > +
> > +private:
> > +  diagnostic_event_id_t m_first_close_event;
> > +};
> > +
> > +class read_write_on_closed_fd : public fd_diagnostic
>
> Maybe rename to "use_after_close_of_fd" or somesuch.
>
> > +{
> > +public:
> > +  read_write_on_closed_fd (const fd_state_machine &sm, tree arg, int
> > mode)
> > +      : fd_diagnostic (sm, arg), m_mode (mode)
> > +  {
> > +  }
> > +
> > +  const char *
> > +  get_kind () const final override
> > +  {
> > +    if (m_mode)
> > +      return "write_on_closed_fd";
> > +    else
> > +      return "read_on_closed_fd";
> > +  }
> > +
> > +
> > +  int
> > +  get_controlling_option () const final override
> > +  {
> > +    return OPT_Wanalyzer_read_write_on_closed_file_descriptor;
> > +  }
> > +
> > +  bool
> > +  emit (rich_location *rich_loc) final override
> > +  {
> > +    if (m_mode)
> > +      return warning_at (rich_loc, get_controlling_option (),
> > +                         "%<write%> on closed file descriptor %qE",
> > m_arg);
> > +    else
> > +      return warning_at (rich_loc, get_controlling_option (),
> > +                         "%<read%> on closed file descriptor %qE",
> > m_arg);
> > +  }
>
> As above, maybe generalize this by adding a "tree m_fndecl;" field to
> the class, giving what function was called on the the closed fd.  I
> think you might be able to eliminate m_mode from this subclass by doing
> that instead.
>
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) override
> > +  {
> > +    if (m_sm.is_unchecked (change.m_new_state))
> > +      {
> > +        return label_text::borrow ("opened here");
> > +      }
> > +
> > +    if (change.m_new_state == m_sm.m_closed)
> > +      {
> > +        return change.formatted_print ("closed here");
> > +      }
> > +
> > +    return fd_diagnostic::describe_state_change (change);
> > +  }
> > +
> > +  label_text
> > +  describe_final_event (const evdesc::final_event &ev) final override
> > +  {
> > +    if (m_mode)
> > +      return ev.formatted_print ("%<write%> on closed file descriptor
> > %qE
> > here",
> > +                                 m_arg);
> > +    else
> > +      return ev.formatted_print ("%<read%> on closed file descriptor
> > %qE
> > here",
> > +                                 m_arg);
> > +  }
>
> Likewise here.
>
> > +
> > +private:
> > +  int m_mode;
> > +};
> > +
> > +class possible_invalid_fd_diag : public fd_diagnostic
>
> Rename to unchecked_use_of_fd ?
>
> > +{
> > +public:
> > +  possible_invalid_fd_diag (const fd_state_machine &sm, tree arg, int
> > mode)
> > +      : fd_diagnostic (sm, arg), m_mode (mode)
> > +  {
> > +  }
> > +
> > +  const char *
> > +  get_kind () const final override
> > +  {
> > +    return "possible_invalid_fd_diag";
> > +  }
> > +
> > +
> > +  int
> > +  get_controlling_option () const final override
> > +  {
> > +    return OPT_Wanalyzer_possible_invalid_file_descriptor;
> > +  }
> > +
> > +  bool
> > +  emit (rich_location *rich_loc) final override
> > +  {
> > +    if (m_mode)
> > +      return warning_at (rich_loc, get_controlling_option (),
> > +                         "%<write%> on possibly invalid file
> > descriptor
> > %qE",
> > +                         m_arg);
> > +    else
> > +      return warning_at (rich_loc, get_controlling_option (),
> > +                         "%<read%> on possibly invalid file descriptor
> > %qE",
> > +                         m_arg);
> > +  }
>
> As above, generalize this with a "tree m_fndecl;" capturing what fndecl
> was called on the fd without checking, and with that, I think you can
> eliminate the m_mode field.
>
>
> > +
> > +  label_text
> > +  describe_state_change (const evdesc::state_change &change) override
> > +  {
> > +    if (m_sm.is_unchecked (change.m_new_state))
> > +      {
> > +        m_first_open_event = change.m_event_id;
> > +        return label_text::borrow ("opened here");
> > +      }
> > +
> > +    return fd_diagnostic::describe_state_change (change);
> > +  }
> > +
> > +  label_text
> > +  describe_final_event (const evdesc::final_event &ev) final override
> > +  {
> > +    return ev.formatted_print ("%qE could be invalid: unchecked value
> > from
> > %@",
> > +                               m_arg, &m_first_open_event);
> > +  }
> > +
> > +private:
> > +  diagnostic_event_id_t m_first_open_event;
> > +  int m_mode;
> > +};
> > +
> > +fd_state_machine::fd_state_machine (logger *logger)
> > +    : state_machine ("file-descriptor", logger),
> > +      m_unchecked_read_write (add_state ("fd-unchecked-read-write")),
> > +      m_unchecked_read_only (add_state ("fd-unchecked-read-only")),
> > +      m_unchecked_write_only (add_state ("fd-unchecked-write-only")),
> > +      m_null (add_state ("fd-null")), m_closed (add_state ("fd-
> > closed")),
> > +      m_valid_read_write (add_state ("fd-valid-read-write")),
> > +      m_valid_read_only (add_state ("fd-valid-read-only")),
> > +      m_valid_write_only (add_state ("fd-valid-write-only")),
> > +      m_stop (add_state ("fd-stop"))
> > +{
> > +}
> > +
> > +bool
> > +fd_state_machine::is_unchecked (state_t s) const
> > +{
> > +  return s == m_unchecked_read_write || s == m_unchecked_read_only
> > +         || s == m_unchecked_write_only;
>
> With multiline conditionals, once they occupy multiple lines, it's more
> readable to split them like this:
>
>   return (s == m_unchecked_read_write
>           || s == m_unchecked_read_only
>           || s == m_unchecked_write_only);
>
>
> > +}
> > +
> > +bool
> > +fd_state_machine::is_valid (state_t s) const
> > +{
> > +  return s == m_valid_read_write || s == m_valid_read_only
> > +         || s == m_valid_write_only;
> > +}
>
> Similar here.
>
> > +
> > +bool
> > +fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
> > +                           const gimple *stmt) const
> > +{
> > +  if (const gcall *call = dyn_cast<const gcall *> (stmt))
> > +    if (tree callee_fndecl = sm_ctxt->get_fndecl_for_call (call))
> > +      {
> > +        if (is_named_call_p (callee_fndecl, "open", call, 2))
> > +          {
> > +            on_open (sm_ctxt, node, stmt, call);
> > +            return true;
> > +          } //  "open"
> > +
> > +        if (is_named_call_p (callee_fndecl, "close", call, 1))
> > +          {
> > +            on_close (sm_ctxt, node, stmt, call);
> > +            return true;
> > +          } //  "close"
> > +
> > +        if (is_named_call_p (callee_fndecl, "write", call, 3))
> > +          {
> > +            on_write (sm_ctxt, node, stmt, call);
> > +            return true;
> > +          } // "write"
> > +
> > +        if (is_named_call_p (callee_fndecl, "read", call, 3))
> > +          {
> > +            on_read (sm_ctxt, node, stmt, call);
> > +            return true;
> > +          } // "read"
> > +
>
> You'll probably want to pass callee_fndecl to some of these "on_FNNAME"
> subroutines, so that it's available for the new m_fndecl fields of the
> warnings I suggested above.
>
>
> > +        return true;
> > +      }
> > +
> > +  return false;
> > +}
> > +
> > +void
> > +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node,
> > +                           const gimple *stmt, const gcall *call)
> > const
> > +{
> > +  tree lhs = gimple_call_lhs (call);
> > +  if (lhs)
> > +    {
> > +      tree arg = gimple_call_arg (call, 1);
> > +      if (TREE_CODE (arg) == INTEGER_CST)
> > +        {
> > +          int flag = TREE_INT_CST_LOW (arg);
> > +          /* FIXME: this code assumes the access modes on the host and
> > +          target are the same, which in practice might not be the
> > case. */
> > +          if (flag == O_RDONLY)
> > +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> > +                                    m_unchecked_read_only);
> > +          else if (flag == O_WRONLY)
> > +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> > +                                    m_unchecked_write_only);
> > +          else
> > +            sm_ctxt->on_transition (node, stmt, lhs, m_start,
> > +                                    m_unchecked_read_write);
>
> Let's introduce an enum for the access mode, and a subroutine to get
> the enum value from "flag", so that we can isolate this "FIXME" in that
> subroutine.
>
> > +        }
> > +    }
> > +  else
> > +    {
> > +      /* FIXME: add leak reporting */
>
> Do you have a testcase that exhibits this behavior?
>
>
> > +    }
> > +}
> > +
> > +void
> > +fd_state_machine::on_close (sm_context *sm_ctxt, const supernode
> > *node,
> > +                            const gimple *stmt, const gcall *call)
> > const
> > +{
> > +  tree arg = gimple_call_arg (call, 0);
> > +
> > +  sm_ctxt->on_transition (node, stmt, arg, m_start, m_closed);
> > +
> > +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_write,
> > m_closed);
> > +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_only,
> > m_closed);
> > +  sm_ctxt->on_transition (node, stmt, arg, m_unchecked_write_only,
> > m_closed);
> > +  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_write,
> > m_closed);
> > +  sm_ctxt->on_transition (node, stmt, arg, m_valid_read_only,
> > m_closed);
> > +  sm_ctxt->on_transition (node, stmt, arg, m_valid_write_only,
> > m_closed);
>
>
> > +  sm_ctxt->on_transition (node, stmt, arg, m_null, m_closed);
>
> Perhaps we should warn for operations on known invalid FDs? (and thus
> complain about close on a known-invalid FD).
>
> I believe that the OS will handle such code gracefully by setting errno
> to EBADF.
>
>
> > +
> > +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> > +    {
> > +      tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > +      sm_ctxt->warn (node, stmt, arg, new double_close (*this,
> > diag_arg));
> > +      sm_ctxt->set_next_state (stmt, arg, m_stop);
> > +    }
> > +}
> > +void
> > +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node,
> > +                           const gimple *stmt, const gcall *call)
> > const
> > +{
> > +  tree arg = gimple_call_arg (call, 0);
> > +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > +
> > +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> > +    {
> > +      sm_ctxt->warn (node, stmt, arg, new read_write_on_closed_fd
> > (*this,
> > diag_arg, 0));
> > +    }
>
> Maybe consolidate on_read and on_write by adding a subroutine that
> checks for m_closed, and for checking access mode (maybe a
> "check_for_open_fd" that takes an access mode enum value amongst other
> params.  If you pass around caller_fndecl, I think much of this code
> can be shared that way between on_read and on_write (which will help
> simplify things if we want to support further functions).
>
> > +
> > +  if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> > +    {
> > +      sm_ctxt->warn (node, stmt, arg,
> > +                     new possible_invalid_fd_diag (*this, diag_arg,
> > 0));
> > +    }
> > +
> > +  if (sm_ctxt->get_state (stmt, arg) == m_unchecked_write_only
> > +      || sm_ctxt->get_state (stmt, arg) == m_valid_write_only)
> > +    {
> > +      sm_ctxt->warn (node, stmt, arg,
> > +                     new read_write_diag_fd (*this, diag_arg, 0));
> > +    }
> > +}
> > +void
> > +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode
> > *node,
> > +                            const gimple *stmt, const gcall *call)
> > const
> > +{
> > +  tree arg = gimple_call_arg (call, 0);
> > +  tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > +
> > +  if (sm_ctxt->get_state (stmt, arg) == m_closed)
> > +    {
> > +      sm_ctxt->warn (node, stmt, arg, new read_write_diag_fd (*this,
> > diag_arg, 1));
> > +    }
> > +  if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> > +    {
> > +      sm_ctxt->warn (node, stmt, arg,
> > +                     new possible_invalid_fd_diag (*this, diag_arg,
> > 1));
> > +    }
> > +
> > +  if (sm_ctxt->get_state (stmt, arg) == m_unchecked_read_only
> > +      || sm_ctxt->get_state (stmt, arg) == m_valid_read_only)
> > +    {
> > +      tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > +      sm_ctxt->warn (node, stmt, arg,
> > +                     new read_write_diag_fd (*this, diag_arg, 1));
> > +    }
> > +}
> > +
> > +void
> > +fd_state_machine::on_condition (sm_context *sm_ctxt, const supernode
> > *node,
> > +                                const gimple *stmt, const svalue *lhs,
> > +                                enum tree_code op, const svalue *rhs)
> > const
> > +{
> > +  if (!rhs->all_zeroes_p ())
> > +    return;
> > +
> > +  if (op == GE_EXPR)
> > +    {
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> > +                              m_valid_read_write);
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> > +                              m_valid_read_only);
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> > +                              m_valid_write_only);
> > +    }
> > +  else if (op == LT_EXPR)
> > +    {
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> > m_null);
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> > m_null);
> > +      sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> > m_null);
> > +    }
> > +}
> > +
> > +bool
> > +fd_state_machine::can_purge_p (state_t s) const
> > +{
> > +  if (is_unchecked (s) || is_valid (s))
> > +    return false;
> > +  else
> > +    return true;
> > +}
> > +
> > +pending_diagnostic *
> > +fd_state_machine::on_leak (tree var) const
> > +{
> > +  return new fd_leak (*this, var);
> > +}
> > +} // namespace
> > +
> > +state_machine *
> > +make_fd_state_machine (logger *logger)
> > +{
> > +  return new fd_state_machine (logger);
> > +}
> > +} // namespace ana
> > +
> > +#endif // ENABLE_ANALYZER
> > \ No newline at end of file
>
> [...snip...]
>
> >
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> > b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> > new file mode 100644
> > index 00000000000..985e9ac75de
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> > @@ -0,0 +1,26 @@
> > +int open(const char *, int mode);
> > +#define O_RDONLY 0
> > +#define O_WRONLY 1
> > +#define O_RDWR 2
>
> As noted above, we want to somehow use the values for the O_ macros
> from the test code, rather than the <unistd.h> with which sm-fd.cc was
> built.
>
> > +
> > +void
> > +test_1 (const char *path)
> > +{
> > +  int fd = open (path, O_RDONLY); /* { dg-message "\\(1\\) opened
> > here" }
> > */
> > +  return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-
> > 775\\\]"
> > "warning" } */
> > + /* { dg-message "\\(2\\) 'fd' leaks here; was opened at \\(1\\)"
> > "event"
> > { target *-*-* } .-1 } */
> > +}
> > +
> > +void
> > +test_2 (const char *path)
> > +{
> > +  int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here"
> > } */
> > +  if (fd >= 0) /* { dg-message "\\(2\\) assuming 'fd' is a valid file
> > descriptor" "event1" } */
> > +  /* { dg-message "\\(3\\) following 'true' branch \\(when 'fd >=
> > 0'\\)..." "event2" { target *-*-* } .-1 } */
> > +  {
> > +    return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-
> > 775\\\]"
> > "warning" } */
> > +    /* { dg-message "\\(4\\) ...to here" "event1" { target *-*-* } .-1
> > } */
> > +    /* { dg-message "\\(5\\) 'fd' leaks here; was opened at \\(1\\)"
> > "event2" { target *-*-* } .-2 } */
> > +  }
> > +}
>
> Looks good, but in future don't feel the need to express all of the
> events: I think the significant ones we should test for here are (1),
> (2), (5), whereas specifying directives for (3) and (4) are probably
> overkill.  prune.exp will prune away stray lines beginning with "note:
> " so we don't need to write directives for all events, just the ones we
> care about.
>
>
> [...snip...]
>
> Lots of good test coverage here.  With a state-machine based
> diagnostic, it's probably best to try to have coverage of all possible
> calls in all possible states - that way we can think systematically
> about what we should be warning for, and what we shouldn't be warning
> for.   Note that once you've got test coverage of the basic events
> (like you have), you don't have to have dg-message directives for all
> of the events in all of the rest of the tests (that would be excessive,
> and too much work to create and maintain).
>
> A few other nits:
> - the Subject line for the commit should have an  "analyzer: " prefix
> and reference the bugzilla problem report (PR 106003)
> - for better or worse, the project requires ChangeLog entries.  You can
> use the script ./contrib/mklog.py to help generate this.  Some more
> information is here:
>   https://www.gnu.org/prep/standards/html_node/Change-Logs.html
> In GCC our ChangeLog entries are part of the git commit message, and we
> have a serverside script to update the ChangeLog files in the source
> tree (to avoid constantly having merge conflicts).
>
> I think we'll have at least one more iteration of this patch to review,
> and before I can approve you to push a later version of the patch to
> trunk, you'll need to "fully" test it, by doing a bootstrap, and
> running the full testsuite on the result.  Do you need some help on how
> to do that?  Do you have access to the GCC compile farm yet?
>
> I'll see if I can figure out how to access the preprocessor from the
> analyzer.
>
> Thanks again for the patch; hope the above makes sense and is
> constructive
>
> Dave
>
>
>
>
>