From: Mir Immad <mirimnan017@gmail.com>
To: David Malcolm <dmalcolm@redhat.com>, gcc@gcc.gnu.org
Subject: Re: [PATCH] static analysis support for posix file desccriptor APIs
Date: Thu, 23 Jun 2022 23:58:53 +0530 [thread overview]
Message-ID: <CAE1-7oxk=t7AzdBLB8+u+eU+5TDHYFbZ1-wtwdQJSRkC8iu9ug@mail.gmail.com> (raw)
In-Reply-To: <bef3049406cdac2fdb5edf915bfc5636c6be0a8e.camel@redhat.com>
Hi Dave,
Thanks for the suggestions,
I changed most of the things that you suggested, however reporting for
warnings like close of known invalid fd was problematic:
consider the following code:
if (fd >= 0)
{ write (fd,...); }
close(fd);
As I was checking the exploded graph for this; the "close(fd)" stmt when
visited by the FALSE edge of if stmt (fd < 0) finds fd to be in m_invalid
state; hence warns about "close on known invalid fd" which I believe is not
true as fd at that point is not *known* to be invalid. I spent quite some
time on this and decided not to add this diagnosis for now.
Also, when close transitions m_invalid to m_close; this leads to double
close even when the second close is outside the ( < 0 ) condition which
again does not seem right.
if (fd < 0)
close(fd):
close(fd); // double close here.
> Maybe consolidate on_read and on_write by adding a subroutine that
> checks for m_closed, and for checking access mode (maybe a
> "check_for_open_fd" that takes an access mode enum value amongst other
> params. If you pass around caller_fndecl, I think much of this code
> can be shared that way between on_read and on_write (which will help
> simplify things if we want to support further functions)
I hope I got this right.
> > + }
> > + }
> > + else
> > + {
> >+ /* FIXME: add leak reporting */
> Do you have a testcase that exhibits this behavior?
I was thinking of the following case:
void test()
{
open(..);
}
Here the resources are leaked because there is no way to free them.
In "read" and "write" funcs, I'm warning for unchecked_use_of_fd and
access_mode_mismatch only when we know fd is not in closed state.
Otherwise, such code leads to lot of irrelevant warnings, example:
void test1(const char *path, void *buf) {
int fd = open(path, O_RDONLY);
if (fd >= 0)
{
close(fd);
read(fd, buf, 1); // read on closed fd + read on possibly invalid fd
write(fd, buf, 1); // write on closed fd + write on possibly invlid fd
}
}
Adding docs for new options still remains pending. I added some new tests;
and all tests are passing. The stuff about O_* macros is left as-is.
I'm sending a patch in another email.
Thanks a lot.
On Wed, Jun 22, 2022 at 12:04 AM David Malcolm <dmalcolm@redhat.com> wrote:
> Hi Immad, thanks for this patch.
>
> Overall, looks like you're making good progress.
>
> Various notes and nitpicks inline below, throughout...
>
> On Tue, 2022-06-21 at 22:00 +0530, Mir Immad wrote:
>
> [...snip...]
>
> >
> > diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> > index 23dfc797cea..d99acfbb069 100644
> > --- a/gcc/analyzer/analyzer.opt
> > +++ b/gcc/analyzer/analyzer.opt
> > @@ -54,6 +54,10 @@ The minimum number of supernodes within a function
> > for
> > the analyzer to consider
> > Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump)
> > Init(200) Param
> > The maximum depth of exploded nodes that should appear in a dot dump
> > before switching to a less verbose format.
>
> I'm not yet sure if this is a good idea, but I wonder if all of these
> warnings ought to have a "-Wanalyzer-fd-" prefix? "file-descriptor" is
> rather long, and the analyzer's warnings are already tending to be on
> the long side, and having a consistent prefix might make it easier for
> users to grok them.
>
> (though this feels like a "what color do we paint the bikeshed?" issue;
> see e.g. https://bikeshed.org/ )
>
> >
> > +Wanalyzer-double-close
> > +Common Var(warn_analyzer_double_close) Init(1) Warning
> > +Warn about code paths in which a file descriptor can be closed more
> > than
> > once.
>
> ...so this could be Wanalyzer-fd-double-close
>
> > +
> > Wanalyzer-double-fclose
> > Common Var(warn_analyzer_double_fclose) Init(1) Warning
> > Warn about code paths in which a stdio FILE can be closed more than
> > once.
> > @@ -66,6 +70,10 @@ Wanalyzer-exposure-through-output-file
> > Common Var(warn_analyzer_exposure_through_output_file) Init(1) Warning
> > Warn about code paths in which sensitive data is written to a file.
> >
> > +Wanalyzer-file-descriptor-leak
> > +Common Var(warn_analyzer_file_descriptor_leak) Init(1) Warning
> > +Warn about code paths in which a file descriptor is not closed.
>
> ...and Wanalyzer-fd-leak
>
> > +
> > Wanalyzer-file-leak
> > Common Var(warn_analyzer_file_leak) Init(1) Warning
> > Warn about code paths in which a stdio FILE is not closed.
> > @@ -82,6 +90,14 @@ Wanalyzer-mismatching-deallocation
> > Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
> > Warn about code paths in which the wrong deallocation function is
> > called.
> >
> > +Wanalyzer-mismatching-operation-on-file-descriptor
> > +Common Var(warn_analyzer_mismatching_operation_on_file_descriptor)
> > Init(1)
> > Warning
> > +Warn about the code paths in which read on write-only file descriptor
> > or
> > write on read-only file descriptor is called.
>
> Maybe:
> Wanalyzer-fd-access-mode-mismatch
> ?
>
> Lose the "the" in "the code paths", so maybe: "Warn about code paths in
> which a write is attempted on a read-only file descriptor, or vice-
> versa."
>
>
> > +
> > +Wanalyzer-possible-invalid-file-descriptor
> > +Common Var(warn_analyzer_possible_invalid_file_descriptor) Init(1)
> > Warning
> > +warn about code paths in which a possibly invalid file descriptor is
> > passed to a must-be-a-valid file descriptor function argument.
> > +
>
> Wanalyzer-fd-use-without-check ?
>
> FWIW I believe that this isn't strictly speaking a security issue, that
> if code blithely assumes the fd is open and then tries to read/write
> it, the access will fail with errno set to EBADF. But it seems worth
> warning for; code that doesn't bother implementing error-checking seems
> suspect to me.
>
>
>
> >
> > +Wanalyzer-read-write-on-closed-file-descriptor
>
> Maybe: Wanalyzer-fd-use-after-close ?
>
> > +Common Var(warn_analyzer_read_write_on_closed_file_descriptor) Init(1)
> > Warning
> > +Warn about code paths in which a read on write in performed on a
> > closed
> > file descriptor.
>
> Typo "read on write"; maybe replace with "Warn about code paths in
> which a file descriptor is used after being closed." ?
>
>
>
> [...snip...]
>
> > +
> > diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
> > new file mode 100644
> > index 00000000000..23e79e3e16a
> > --- /dev/null
> > +++ b/gcc/analyzer/sm-fd.cc
> > @@ -0,0 +1,697 @@
> > +/* FIXME: add copyright header. */
>
> Obviously the FIXME needs fixing :)
>
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "tree.h"
> > +#include "function.h"
> > +#include "basic-block.h"
> > +#include "gimple.h"
> > +#include "options.h"
> > +#include "diagnostic-path.h"
> > +#include "diagnostic-metadata.h"
> > +#include "function.h"
> > +#include "json.h"
> > +#include "analyzer/analyzer.h"
> > +#include "diagnostic-event-id.h"
> > +#include "analyzer/analyzer-logging.h"
> > +#include "analyzer/sm.h"
> > +#include "analyzer/pending-diagnostic.h"
> > +#include "analyzer/function-set.h"
> > +#include "analyzer/analyzer-selftests.h"
> > +#include "tristate.h"
> > +#include "selftest.h"
> > +#include "analyzer/call-string.h"
> > +#include "analyzer/program-point.h"
> > +#include "analyzer/store.h"
> > +#include "analyzer/region-model.h"
> > +
> > +#include <unistd.h>
>
> Presumably you're including <unistd.h> here for the definitions of
> O_RDONLY and O_WRONLY.
>
> As we discussed off-list, I think we want to avoid the use of the
> host's O_RDONLY and O_WRONLY in this file, as they might not have the
> same values as on the target, and I'm not sure that the host is even
> guaranteed to have a <unistd.h>.
>
> I think we want our own enum representing the three access modes:
> READ_WRITE, READ_ONLY, WRITE_ONLY, and a subroutine for converting
> target flags to access mode.
>
> So ultimately that's something we want to fix, though exactly how, I'm
> not quite sure; we presumably want to look up the target's definitions
> of those macros - but I don't think the analyzer has access to the
> cpp_reader instance from the frontend.
>
>
>
> > +#if ENABLE_ANALYZER
> > +
> > +namespace ana
> > +{
> > +
> > +namespace
> > +{
> > +class fd_state_machine : public state_machine
> > +{
> > +public:
> > + fd_state_machine (logger *logger);
> > +
> > + bool
> > + inherited_state_p () const final override
> > + {
> > + return false;
> > + }
> > +
> > + state_machine::state_t
> > + get_default_state (const svalue *sval) const final override
> > + {
> > + if (tree cst = sval->maybe_get_constant ())
> > + {
> > + int val = TREE_INT_CST_LOW (cst);
> > + if (val < 0)
> > + {
> > + return m_null;
>
> Can we rename the invalid state to "m_invalid"? I find m_null
> confusing.
>
> > + }
> > + }
> > + return m_start;
> > + }
> > +
> > + bool on_stmt (sm_context *sm_ctxt, const supernode *node,
> > + const gimple *stmt) const final override;
> > +
> > + void on_condition (sm_context *sm_ctxt, const supernode *node,
> > + const gimple *stmt, const svalue *lhs, const
> > tree_code op,
> > + const svalue *rhs) const final override;
> > +
> > + bool can_purge_p (state_t s) const final override;
> > + pending_diagnostic *on_leak (tree var) const final override;
> > +
> > + bool is_unchecked (state_t s) const;
> > + bool is_valid (state_t s) const;
> > +
> > + /* State for a file descriptor that hasn't been checked for
> > validity. */
> > + state_t m_unchecked_read_write;
> > +
> > + /* State for a file descriptor opened with O_RDONLY flag. */
> > + state_t m_unchecked_read_only;
> > +
> > + /* State for a file descriptor opneded with O_WRONLY flag. */
> > + state_t m_unchecked_write_only;
>
> Typo: opneded -> opened.
>
> Maybe group these three states in the source like this:
>
> /* States representing a file descriptor that hasn't yet been
> checked for validity after opening, for three different
> access modes. */
> state_t m_unchecked_read_write;
> state_t m_unchecked_read_only;
> state_t m_unchecked_write_only;
>
> > +
> > + /* State for a file descriptor that is known to be invalid. */
> > + state_t m_null;
>
> Let's rename this to m_invalid;
> The condition is that it's < 0, right? If so, please update the
> comment to clarify that.
>
> > +
> > + /* State for a file descriptor that was opened in read-write mode
> > and is
> > known
> > + * to be valid. */
> > + state_t m_valid_read_write;
> > +
> > + /* State for a file descriptor that was opened as read-only and is
> > known
> > to be
> > + * valid*/
> > + state_t m_valid_read_only;
> > +
> > + /* State for a file descriptor that was opened as write-only and is
> > known to
> > + * be valid*/
> > + state_t m_valid_write_only;
>
> Maybe group these three states in a similar way to above.
>
>
> > +
> > + /* State for a file descriptor that has been closed.*/
> > + state_t m_closed;
> > +
> > + /* State for a file descriptor that we do not want to track anymore
> > . */
> > + state_t m_stop;
> > +
> > +private:
> > + void on_open (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > + const gcall *call) const;
> > + void on_close (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > + const gcall *call) const;
> > + void on_read (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > + const gcall *call) const;
> > + void on_write (sm_context *sm_ctxt, const supernode *node, const
> > gimple
> > *stmt,
> > + const gcall *call) const;
> > +};
> > +
> > +/* Base diagnostic class relative to fd_state_machine. */
> > +class fd_diagnostic : public pending_diagnostic
> > +{
> > +public:
> > + fd_diagnostic (const fd_state_machine &sm, tree arg) : m_sm (sm),
> > m_arg
> > (arg)
> > + {
> > + }
> > +
> > + bool
> > + subclass_equal_p (const pending_diagnostic &base_other) const
> > override
> > + {
> > + return same_tree_p (m_arg, ((const fd_diagnostic
> > &)base_other).m_arg);
> > + }
> > +
> > + label_text
> > + describe_state_change (const evdesc::state_change &change) override
> > + {
> > + if (change.m_old_state == m_sm.get_start_state ()
> > + && m_sm.is_unchecked (change.m_new_state))
> > + {
> > + if (change.m_new_state == m_sm.m_unchecked_read_write)
> > + return change.formatted_print ("opened here as %<read-
> > write%>");
> > +
> > + if (change.m_new_state == m_sm.m_unchecked_read_only)
> > + return change.formatted_print ("opened here as %<read-
> > only%>");
> > +
> > + if (change.m_new_state == m_sm.m_unchecked_write_only)
> > + return change.formatted_print ("opened here as %<write-
> > only%>");
> > + }
> > +
> > + if (m_sm.is_unchecked (change.m_old_state)
> > + && m_sm.is_valid (change.m_new_state))
> > + if (change.m_expr)
> > + return change.formatted_print (
> > + "assuming %qE is a valid file descriptor", change.m_expr);
> > + else
> > + return change.formatted_print ("assuming a valid file
> > descriptor");
>
> Maybe reword to "when %qE is a valid file descriptor (>= 0)" ?
>
>
> > +
> > + if (m_sm.is_unchecked (change.m_old_state)
> > + && change.m_new_state == m_sm.m_null)
> > + if (change.m_expr)
> > + return change.formatted_print (
> > + "assuming %qE is an invalid file descriptor",
> > change.m_expr);
> > + else
> > + return change.formatted_print ("assuming an invalid file
> > descriptor");
>
> Maybe reword to "when %qE is not a valid file descriptor (< 0)" ?
>
>
> > +
> > + return label_text ();
> > + }
> > +
> > +protected:
> > + const fd_state_machine &m_sm;
> > + tree m_arg;
> > +};
> > +
> > +class fd_leak : public fd_diagnostic
> > +{
> > +public:
> > + fd_leak (const fd_state_machine &sm, tree arg) : fd_diagnostic (sm,
> > arg)
> > {}
> > +
> > + const char *
> > + get_kind () const final override
> > + {
> > + return "fd_leak";
> > + }
> > +
> > + int
> > + get_controlling_option () const final override
> > + {
> > + return OPT_Wanalyzer_file_descriptor_leak;
> > + }
> > +
> > + bool
> > + emit (rich_location *rich_loc) final override
> > + {
> > + /*CWE-775: Missing Release of File Descriptor or Handle after
> > Effective
> > + * Lifetime
> > + */
> > + diagnostic_metadata m;
> > + m.add_cwe (775);
> > + if (m_arg)
> > + return warning_meta (rich_loc, m, get_controlling_option (),
> > + "leak of file descriptor %qE", m_arg);
> > + else
> > + return warning_meta (rich_loc, m, get_controlling_option (),
> > + "leak of file descriptor");
> > + }
> > +
> > + label_text
> > + describe_state_change (const evdesc::state_change &change) final
> > override
> > + {
> > + if (m_sm.is_unchecked (change.m_new_state))
> > + {
> > + m_open_event = change.m_event_id;
> > + return label_text::borrow ("opened here");
> > + }
> > +
> > + return fd_diagnostic::describe_state_change (change);
> > + }
> > +
> > + label_text
> > + describe_final_event (const evdesc::final_event &ev) final override
> > + {
> > + if (m_open_event.known_p ())
> > + {
> > + if (ev.m_expr)
> > + return ev.formatted_print ("%qE leaks here; was opened at
> > %@",
> > + ev.m_expr, &m_open_event);
> > + else
> > + return ev.formatted_print ("leaks here; was opened at %@",
> > + &m_open_event);
> > + }
> > + else
> > + {
> > + if (ev.m_expr)
> > + return ev.formatted_print ("%qE leaks here", ev.m_expr);
> > + else
> > + return ev.formatted_print ("leaks here");
> > + }
> > + }
> > +
> > +private:
> > + diagnostic_event_id_t m_open_event;
> > +};
> > +
> > +class read_write_diag_fd : public fd_diagnostic
>
> Maybe rename to fd_access_mode_mismatch ?
>
>
> > +{
> > +public:
> > + /* mode specfies whether read on write was attempted or vice versa.
> > + */
>
> Make "mode" be an enum rather than an int.
>
> > + read_write_diag_fd (const fd_state_machine &sm, tree arg, int mode)
> > + : m_mode (mode), fd_diagnostic (sm, arg)
> > + {
> > + }
> > +
> > + const char *
> > + get_kind () const final override
> > + {
> > + return "read_write_diag_fd";
> > + }
> > +
> > +
> > + int
> > + get_controlling_option () const final override
> > + {
> > + return OPT_Wanalyzer_mismatching_operation_on_file_descriptor;
> > + }
> > +
> > + bool
> > + emit (rich_location *rich_loc) final override
> > + {
> > + if (m_mode)
> > + {
> > + return warning_at (rich_loc, get_controlling_option (),
> > + "%<write%> on %<read-only%> file descriptor
> > %qE",
> > + m_arg);
> > + }
> > + else
> > + {
> > + return warning_at (rich_loc, get_controlling_option (),
> > + "%<read%> on %<write-only%> file descriptor
> > %qE",
> > + m_arg);
>
> Rather than hardcoding "%<write%>" and "%<read%>", let's generalize
> this so that the diagnostic has a "tree fndecl" argument and m_fndecl
> field, which was the function that was called that attempted to used
> the fd in the wrong way. You can then write %qE to print it in quotes.
>
> > + }
> > + }
> > +
> > + label_text
> > + describe_state_change (const evdesc::state_change &change) override
> > + {
> > + return fd_diagnostic::describe_state_change (change);
> > + }
>
> Looks like this override is redundant (albeit harmless): it can
> implicitly inherit the fd_diagnostic implementation.
>
>
> > +
> > + label_text
> > + describe_final_event (const evdesc::final_event &ev) final override
> > + {
> > + if (m_mode)
> > + return ev.formatted_print ("%qs on %<read-only%> file descriptor
> > %qE",
> > + "write", m_arg);
> > + else
> > + return ev.formatted_print ("%qs on %<write-only%> file
> > descriptor
> > %qE",
> > + "read", m_arg);
> > + }
>
> and generalize this also to work on m_fndecl.
>
> > +
> > +private:
> > + int m_mode;
>
> As noted above, please make this an enum.
>
>
> > +};
> > +
> > +class double_close : public fd_diagnostic
> > +{
> > +public:
> > + double_close (const fd_state_machine &sm, tree arg) : fd_diagnostic
> > (sm,
> > arg)
> > + {
> > + }
> > +
> > + const char *
> > + get_kind () const final override
> > + {
> > + return "double_close";
> > + }
> > +
> > +
> > + int
> > + get_controlling_option () const final override
> > + {
> > + return OPT_Wanalyzer_double_close;
> > + }
> > + bool
> > + emit (rich_location *rich_loc) final override
> > + {
> > + diagnostic_metadata m;
> > + // CWE-1341: Multiple Releases of Same Resource or Handle
> > + m.add_cwe (1341);
> > + return warning_meta (rich_loc, m, get_controlling_option (),
> > + "double %<close%> of file descriptor %qE",
> > m_arg);
> > + }
> > +
> > + label_text
> > + describe_state_change (const evdesc::state_change &change) override
> > + {
> > + if (m_sm.is_unchecked (change.m_new_state))
> > + return label_text::borrow ("opened here");
> > +
> > + if (change.m_new_state == m_sm.m_closed)
> > + {
> > + m_first_close_event = change.m_event_id;
> > + return change.formatted_print ("first %qs here", "close");
> > + }
> > + return fd_diagnostic::describe_state_change (change);
> > + }
> > +
> > + label_text
> > + describe_final_event (const evdesc::final_event &ev) final override
> > + {
> > + if (m_first_close_event.known_p ())
> > + return ev.formatted_print ("second %qs here; first %qs was at
> > %@",
> > + "close", "close",
> > &m_first_close_event);
> > + return ev.formatted_print ("second %qs here", "close");
> > + }
> > +
> > +private:
> > + diagnostic_event_id_t m_first_close_event;
> > +};
> > +
> > +class read_write_on_closed_fd : public fd_diagnostic
>
> Maybe rename to "use_after_close_of_fd" or somesuch.
>
> > +{
> > +public:
> > + read_write_on_closed_fd (const fd_state_machine &sm, tree arg, int
> > mode)
> > + : fd_diagnostic (sm, arg), m_mode (mode)
> > + {
> > + }
> > +
> > + const char *
> > + get_kind () const final override
> > + {
> > + if (m_mode)
> > + return "write_on_closed_fd";
> > + else
> > + return "read_on_closed_fd";
> > + }
> > +
> > +
> > + int
> > + get_controlling_option () const final override
> > + {
> > + return OPT_Wanalyzer_read_write_on_closed_file_descriptor;
> > + }
> > +
> > + bool
> > + emit (rich_location *rich_loc) final override
> > + {
> > + if (m_mode)
> > + return warning_at (rich_loc, get_controlling_option (),
> > + "%<write%> on closed file descriptor %qE",
> > m_arg);
> > + else
> > + return warning_at (rich_loc, get_controlling_option (),
> > + "%<read%> on closed file descriptor %qE",
> > m_arg);
> > + }
>
> As above, maybe generalize this by adding a "tree m_fndecl;" field to
> the class, giving what function was called on the the closed fd. I
> think you might be able to eliminate m_mode from this subclass by doing
> that instead.
>
> > +
> > + label_text
> > + describe_state_change (const evdesc::state_change &change) override
> > + {
> > + if (m_sm.is_unchecked (change.m_new_state))
> > + {
> > + return label_text::borrow ("opened here");
> > + }
> > +
> > + if (change.m_new_state == m_sm.m_closed)
> > + {
> > + return change.formatted_print ("closed here");
> > + }
> > +
> > + return fd_diagnostic::describe_state_change (change);
> > + }
> > +
> > + label_text
> > + describe_final_event (const evdesc::final_event &ev) final override
> > + {
> > + if (m_mode)
> > + return ev.formatted_print ("%<write%> on closed file descriptor
> > %qE
> > here",
> > + m_arg);
> > + else
> > + return ev.formatted_print ("%<read%> on closed file descriptor
> > %qE
> > here",
> > + m_arg);
> > + }
>
> Likewise here.
>
> > +
> > +private:
> > + int m_mode;
> > +};
> > +
> > +class possible_invalid_fd_diag : public fd_diagnostic
>
> Rename to unchecked_use_of_fd ?
>
> > +{
> > +public:
> > + possible_invalid_fd_diag (const fd_state_machine &sm, tree arg, int
> > mode)
> > + : fd_diagnostic (sm, arg), m_mode (mode)
> > + {
> > + }
> > +
> > + const char *
> > + get_kind () const final override
> > + {
> > + return "possible_invalid_fd_diag";
> > + }
> > +
> > +
> > + int
> > + get_controlling_option () const final override
> > + {
> > + return OPT_Wanalyzer_possible_invalid_file_descriptor;
> > + }
> > +
> > + bool
> > + emit (rich_location *rich_loc) final override
> > + {
> > + if (m_mode)
> > + return warning_at (rich_loc, get_controlling_option (),
> > + "%<write%> on possibly invalid file
> > descriptor
> > %qE",
> > + m_arg);
> > + else
> > + return warning_at (rich_loc, get_controlling_option (),
> > + "%<read%> on possibly invalid file descriptor
> > %qE",
> > + m_arg);
> > + }
>
> As above, generalize this with a "tree m_fndecl;" capturing what fndecl
> was called on the fd without checking, and with that, I think you can
> eliminate the m_mode field.
>
>
> > +
> > + label_text
> > + describe_state_change (const evdesc::state_change &change) override
> > + {
> > + if (m_sm.is_unchecked (change.m_new_state))
> > + {
> > + m_first_open_event = change.m_event_id;
> > + return label_text::borrow ("opened here");
> > + }
> > +
> > + return fd_diagnostic::describe_state_change (change);
> > + }
> > +
> > + label_text
> > + describe_final_event (const evdesc::final_event &ev) final override
> > + {
> > + return ev.formatted_print ("%qE could be invalid: unchecked value
> > from
> > %@",
> > + m_arg, &m_first_open_event);
> > + }
> > +
> > +private:
> > + diagnostic_event_id_t m_first_open_event;
> > + int m_mode;
> > +};
> > +
> > +fd_state_machine::fd_state_machine (logger *logger)
> > + : state_machine ("file-descriptor", logger),
> > + m_unchecked_read_write (add_state ("fd-unchecked-read-write")),
> > + m_unchecked_read_only (add_state ("fd-unchecked-read-only")),
> > + m_unchecked_write_only (add_state ("fd-unchecked-write-only")),
> > + m_null (add_state ("fd-null")), m_closed (add_state ("fd-
> > closed")),
> > + m_valid_read_write (add_state ("fd-valid-read-write")),
> > + m_valid_read_only (add_state ("fd-valid-read-only")),
> > + m_valid_write_only (add_state ("fd-valid-write-only")),
> > + m_stop (add_state ("fd-stop"))
> > +{
> > +}
> > +
> > +bool
> > +fd_state_machine::is_unchecked (state_t s) const
> > +{
> > + return s == m_unchecked_read_write || s == m_unchecked_read_only
> > + || s == m_unchecked_write_only;
>
> With multiline conditionals, once they occupy multiple lines, it's more
> readable to split them like this:
>
> return (s == m_unchecked_read_write
> || s == m_unchecked_read_only
> || s == m_unchecked_write_only);
>
>
> > +}
> > +
> > +bool
> > +fd_state_machine::is_valid (state_t s) const
> > +{
> > + return s == m_valid_read_write || s == m_valid_read_only
> > + || s == m_valid_write_only;
> > +}
>
> Similar here.
>
> > +
> > +bool
> > +fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node,
> > + const gimple *stmt) const
> > +{
> > + if (const gcall *call = dyn_cast<const gcall *> (stmt))
> > + if (tree callee_fndecl = sm_ctxt->get_fndecl_for_call (call))
> > + {
> > + if (is_named_call_p (callee_fndecl, "open", call, 2))
> > + {
> > + on_open (sm_ctxt, node, stmt, call);
> > + return true;
> > + } // "open"
> > +
> > + if (is_named_call_p (callee_fndecl, "close", call, 1))
> > + {
> > + on_close (sm_ctxt, node, stmt, call);
> > + return true;
> > + } // "close"
> > +
> > + if (is_named_call_p (callee_fndecl, "write", call, 3))
> > + {
> > + on_write (sm_ctxt, node, stmt, call);
> > + return true;
> > + } // "write"
> > +
> > + if (is_named_call_p (callee_fndecl, "read", call, 3))
> > + {
> > + on_read (sm_ctxt, node, stmt, call);
> > + return true;
> > + } // "read"
> > +
>
> You'll probably want to pass callee_fndecl to some of these "on_FNNAME"
> subroutines, so that it's available for the new m_fndecl fields of the
> warnings I suggested above.
>
>
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +void
> > +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node,
> > + const gimple *stmt, const gcall *call)
> > const
> > +{
> > + tree lhs = gimple_call_lhs (call);
> > + if (lhs)
> > + {
> > + tree arg = gimple_call_arg (call, 1);
> > + if (TREE_CODE (arg) == INTEGER_CST)
> > + {
> > + int flag = TREE_INT_CST_LOW (arg);
> > + /* FIXME: this code assumes the access modes on the host and
> > + target are the same, which in practice might not be the
> > case. */
> > + if (flag == O_RDONLY)
> > + sm_ctxt->on_transition (node, stmt, lhs, m_start,
> > + m_unchecked_read_only);
> > + else if (flag == O_WRONLY)
> > + sm_ctxt->on_transition (node, stmt, lhs, m_start,
> > + m_unchecked_write_only);
> > + else
> > + sm_ctxt->on_transition (node, stmt, lhs, m_start,
> > + m_unchecked_read_write);
>
> Let's introduce an enum for the access mode, and a subroutine to get
> the enum value from "flag", so that we can isolate this "FIXME" in that
> subroutine.
>
> > + }
> > + }
> > + else
> > + {
> > + /* FIXME: add leak reporting */
>
> Do you have a testcase that exhibits this behavior?
>
>
> > + }
> > +}
> > +
> > +void
> > +fd_state_machine::on_close (sm_context *sm_ctxt, const supernode
> > *node,
> > + const gimple *stmt, const gcall *call)
> > const
> > +{
> > + tree arg = gimple_call_arg (call, 0);
> > +
> > + sm_ctxt->on_transition (node, stmt, arg, m_start, m_closed);
> > +
> > + sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_write,
> > m_closed);
> > + sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_only,
> > m_closed);
> > + sm_ctxt->on_transition (node, stmt, arg, m_unchecked_write_only,
> > m_closed);
> > + sm_ctxt->on_transition (node, stmt, arg, m_valid_read_write,
> > m_closed);
> > + sm_ctxt->on_transition (node, stmt, arg, m_valid_read_only,
> > m_closed);
> > + sm_ctxt->on_transition (node, stmt, arg, m_valid_write_only,
> > m_closed);
>
>
> > + sm_ctxt->on_transition (node, stmt, arg, m_null, m_closed);
>
> Perhaps we should warn for operations on known invalid FDs? (and thus
> complain about close on a known-invalid FD).
>
> I believe that the OS will handle such code gracefully by setting errno
> to EBADF.
>
>
> > +
> > + if (sm_ctxt->get_state (stmt, arg) == m_closed)
> > + {
> > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > + sm_ctxt->warn (node, stmt, arg, new double_close (*this,
> > diag_arg));
> > + sm_ctxt->set_next_state (stmt, arg, m_stop);
> > + }
> > +}
> > +void
> > +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node,
> > + const gimple *stmt, const gcall *call)
> > const
> > +{
> > + tree arg = gimple_call_arg (call, 0);
> > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > +
> > + if (sm_ctxt->get_state (stmt, arg) == m_closed)
> > + {
> > + sm_ctxt->warn (node, stmt, arg, new read_write_on_closed_fd
> > (*this,
> > diag_arg, 0));
> > + }
>
> Maybe consolidate on_read and on_write by adding a subroutine that
> checks for m_closed, and for checking access mode (maybe a
> "check_for_open_fd" that takes an access mode enum value amongst other
> params. If you pass around caller_fndecl, I think much of this code
> can be shared that way between on_read and on_write (which will help
> simplify things if we want to support further functions).
>
> > +
> > + if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> > + {
> > + sm_ctxt->warn (node, stmt, arg,
> > + new possible_invalid_fd_diag (*this, diag_arg,
> > 0));
> > + }
> > +
> > + if (sm_ctxt->get_state (stmt, arg) == m_unchecked_write_only
> > + || sm_ctxt->get_state (stmt, arg) == m_valid_write_only)
> > + {
> > + sm_ctxt->warn (node, stmt, arg,
> > + new read_write_diag_fd (*this, diag_arg, 0));
> > + }
> > +}
> > +void
> > +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode
> > *node,
> > + const gimple *stmt, const gcall *call)
> > const
> > +{
> > + tree arg = gimple_call_arg (call, 0);
> > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > +
> > + if (sm_ctxt->get_state (stmt, arg) == m_closed)
> > + {
> > + sm_ctxt->warn (node, stmt, arg, new read_write_diag_fd (*this,
> > diag_arg, 1));
> > + }
> > + if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> > + {
> > + sm_ctxt->warn (node, stmt, arg,
> > + new possible_invalid_fd_diag (*this, diag_arg,
> > 1));
> > + }
> > +
> > + if (sm_ctxt->get_state (stmt, arg) == m_unchecked_read_only
> > + || sm_ctxt->get_state (stmt, arg) == m_valid_read_only)
> > + {
> > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> > + sm_ctxt->warn (node, stmt, arg,
> > + new read_write_diag_fd (*this, diag_arg, 1));
> > + }
> > +}
> > +
> > +void
> > +fd_state_machine::on_condition (sm_context *sm_ctxt, const supernode
> > *node,
> > + const gimple *stmt, const svalue *lhs,
> > + enum tree_code op, const svalue *rhs)
> > const
> > +{
> > + if (!rhs->all_zeroes_p ())
> > + return;
> > +
> > + if (op == GE_EXPR)
> > + {
> > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> > + m_valid_read_write);
> > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> > + m_valid_read_only);
> > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> > + m_valid_write_only);
> > + }
> > + else if (op == LT_EXPR)
> > + {
> > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write,
> > m_null);
> > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only,
> > m_null);
> > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only,
> > m_null);
> > + }
> > +}
> > +
> > +bool
> > +fd_state_machine::can_purge_p (state_t s) const
> > +{
> > + if (is_unchecked (s) || is_valid (s))
> > + return false;
> > + else
> > + return true;
> > +}
> > +
> > +pending_diagnostic *
> > +fd_state_machine::on_leak (tree var) const
> > +{
> > + return new fd_leak (*this, var);
> > +}
> > +} // namespace
> > +
> > +state_machine *
> > +make_fd_state_machine (logger *logger)
> > +{
> > + return new fd_state_machine (logger);
> > +}
> > +} // namespace ana
> > +
> > +#endif // ENABLE_ANALYZER
> > \ No newline at end of file
>
> [...snip...]
>
> >
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> > b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> > new file mode 100644
> > index 00000000000..985e9ac75de
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/fd-1.c
> > @@ -0,0 +1,26 @@
> > +int open(const char *, int mode);
> > +#define O_RDONLY 0
> > +#define O_WRONLY 1
> > +#define O_RDWR 2
>
> As noted above, we want to somehow use the values for the O_ macros
> from the test code, rather than the <unistd.h> with which sm-fd.cc was
> built.
>
> > +
> > +void
> > +test_1 (const char *path)
> > +{
> > + int fd = open (path, O_RDONLY); /* { dg-message "\\(1\\) opened
> > here" }
> > */
> > + return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-
> > 775\\\]"
> > "warning" } */
> > + /* { dg-message "\\(2\\) 'fd' leaks here; was opened at \\(1\\)"
> > "event"
> > { target *-*-* } .-1 } */
> > +}
> > +
> > +void
> > +test_2 (const char *path)
> > +{
> > + int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here"
> > } */
> > + if (fd >= 0) /* { dg-message "\\(2\\) assuming 'fd' is a valid file
> > descriptor" "event1" } */
> > + /* { dg-message "\\(3\\) following 'true' branch \\(when 'fd >=
> > 0'\\)..." "event2" { target *-*-* } .-1 } */
> > + {
> > + return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE-
> > 775\\\]"
> > "warning" } */
> > + /* { dg-message "\\(4\\) ...to here" "event1" { target *-*-* } .-1
> > } */
> > + /* { dg-message "\\(5\\) 'fd' leaks here; was opened at \\(1\\)"
> > "event2" { target *-*-* } .-2 } */
> > + }
> > +}
>
> Looks good, but in future don't feel the need to express all of the
> events: I think the significant ones we should test for here are (1),
> (2), (5), whereas specifying directives for (3) and (4) are probably
> overkill. prune.exp will prune away stray lines beginning with "note:
> " so we don't need to write directives for all events, just the ones we
> care about.
>
>
> [...snip...]
>
> Lots of good test coverage here. With a state-machine based
> diagnostic, it's probably best to try to have coverage of all possible
> calls in all possible states - that way we can think systematically
> about what we should be warning for, and what we shouldn't be warning
> for. Note that once you've got test coverage of the basic events
> (like you have), you don't have to have dg-message directives for all
> of the events in all of the rest of the tests (that would be excessive,
> and too much work to create and maintain).
>
> A few other nits:
> - the Subject line for the commit should have an "analyzer: " prefix
> and reference the bugzilla problem report (PR 106003)
> - for better or worse, the project requires ChangeLog entries. You can
> use the script ./contrib/mklog.py to help generate this. Some more
> information is here:
> https://www.gnu.org/prep/standards/html_node/Change-Logs.html
> In GCC our ChangeLog entries are part of the git commit message, and we
> have a serverside script to update the ChangeLog files in the source
> tree (to avoid constantly having merge conflicts).
>
> I think we'll have at least one more iteration of this patch to review,
> and before I can approve you to push a later version of the patch to
> trunk, you'll need to "fully" test it, by doing a bootstrap, and
> running the full testsuite on the result. Do you need some help on how
> to do that? Do you have access to the GCC compile farm yet?
>
> I'll see if I can figure out how to access the preprocessor from the
> analyzer.
>
> Thanks again for the patch; hope the above makes sense and is
> constructive
>
> Dave
>
>
>
>
>
next prev parent reply other threads:[~2022-06-23 18:29 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-21 16:30 Mir Immad
2022-06-21 16:30 ` Mir Immad
2022-06-21 18:34 ` David Malcolm
2022-06-21 18:50 ` Joseph Myers
2022-06-21 20:31 ` David Malcolm
2022-06-21 21:18 ` Joseph Myers
2022-06-23 18:28 ` Mir Immad [this message]
2022-06-24 18:05 ` David Malcolm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAE1-7oxk=t7AzdBLB8+u+eU+5TDHYFbZ1-wtwdQJSRkC8iu9ug@mail.gmail.com' \
--to=mirimnan017@gmail.com \
--cc=dmalcolm@redhat.com \
--cc=gcc@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).