From: David Malcolm <dmalcolm@redhat.com>
To: Mir Immad <mirimnan017@gmail.com>, gcc@gcc.gnu.org
Subject: Re: [PATCH] analyzer PR 106003
Date: Thu, 23 Jun 2022 19:20:51 -0400 [thread overview]
Message-ID: <1da7fe0090c220a7c049b3527f2735ddc60b5659.camel@redhat.com> (raw)
In-Reply-To: <CAE1-7ox+fZV3oKNuq5sCMvLK4Y1Hyo7pOwmV8kTckbrEYPYEcQ@mail.gmail.com>
On Fri, 2022-06-24 at 00:00 +0530, Mir Immad wrote:
Thanks for the updated patch.
This is close to being ready.
One big issue is the target hook idea for isolating the target's
definition of the O_* flags as per Joseph's suggestion here:
https://gcc.gnu.org/pipermail/gcc/2022-June/238961.html
Given Joseph's comment here:
https://gcc.gnu.org/pipermail/gcc/2022-June/238954.html
I'm inclined to think that the target hook stuff could be done as a
followup.
Indentation has gone a bit wrong (presumably due to issue with the
mailer); the main thing to watch out for is to use tabs for 8 space
indentations within the file. Enabling "visual whitespace" (or
whatever it's called) in your editor can help with this.
Every patch that adds any options (e.g. to gcc/analyzer/analyzer.opt)
needs to document those options, in gcc/doc/invoke.texi. Grep for one
of the existing analyzer options in invoke.texi to see what you need to
add for the new options.
You'll need a ChangeLog entry before this can be pushed. For better or
worse, the project requires ChangeLog entries. You can
use the script ./contrib/mklog.py to help generate this. Some more
information is here:
https://www.gnu.org/prep/standards/html_node/Change-Logs.html
In GCC our ChangeLog entries are part of the git commit message, and we
have a serverside script to update the ChangeLog files in the source
tree (to avoid constantly having merge conflicts).
The Subject line for the commit should have an "analyzer: " prefix
and reference the bugzilla problem report (PR 106003)
Before you can push the patch, you'll have to test it with a full
bootstrap and regression test run. Let me know if you need help with
that.
Various other comments inline below...
[...snip...]
> diff --git gcc/analyzer/analyzer.opt gcc/analyzer/analyzer.opt
> index 23dfc797cea..e2a629bb42c 100644
> --- gcc/analyzer/analyzer.opt
> +++ gcc/analyzer/analyzer.opt
[...snip...]
> +Wanalyzer-fd-use-without-check
> +Common Var(warn_analyzer_fd_use_without_check) Init(1) Warning
> +warn about code paths in which a possibly invalid file descriptor is
> passed to a must-be-a-valid file descriptor function argument.
These should be capitalized as English sentences, and the "must-be-a-
valid" wording doesn't feel quite right to me.
Reword this to something like:
"Warn about code paths in which a file descriptor is used without being
checked for validity."
or somesuch (all on one line).
[...snip...]
> diff --git gcc/analyzer/sm-fd.cc gcc/analyzer/sm-fd.cc
> new file mode 100644
> index 00000000000..048014d7a26
> --- /dev/null
> +++ gcc/analyzer/sm-fd.cc
> @@ -0,0 +1,770 @@
> +/* A state machine for detecting misuses of POSIX file descriptor APIs.
> + Copyright (C) 2019-2022 Free Software Foundation, Inc.
> + Contributed by Immad Mir <mirimmad17@gmail.com>.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful, but
> +WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tree.h"
> +#include "function.h"
> +#include "basic-block.h"
> +#include "gimple.h"
> +#include "options.h"
> +#include "diagnostic-path.h"
> +#include "diagnostic-metadata.h"
> +#include "function.h"
> +#include "json.h"
> +#include "analyzer/analyzer.h"
> +#include "diagnostic-event-id.h"
> +#include "analyzer/analyzer-logging.h"
> +#include "analyzer/sm.h"
> +#include "analyzer/pending-diagnostic.h"
> +#include "analyzer/function-set.h"
> +#include "analyzer/analyzer-selftests.h"
> +#include "tristate.h"
> +#include "selftest.h"
> +#include "analyzer/call-string.h"
> +#include "analyzer/program-point.h"
> +#include "analyzer/store.h"
> +#include "analyzer/region-model.h"
> +
> +#include <unistd.h>
We'll want to drop the #include <unistd.h> once we have a target hook.
[...snip...]
> +
> + state_machine::state_t
> + get_default_state (const svalue *sval) const final override
> + {
> + if (tree cst = sval->maybe_get_constant ())
> + {
> + int val = TREE_INT_CST_LOW (cst);
> + if (val < 0)
> + {
> + return m_invalid;
> + }
> + }
> + return m_start;
> + }
Please add coverage to the testsuite for the case of accessing negative
and non-negative constant integer values: what happens when we hit the
above cases?
In particular, IIRC 0, 1, and 2 are stdin, stdout, and stderr
respectively, and are already open at the start of "main" (though not
necessarily at the start of any given *analysis path*, so we can't
assume that) - so there may be code out there that has hardcoded
accesses using these constant integer values.
[...snip...]
> +
> + /* State for reperesenting a file descriptor that is known to be valid (>= 0),
> + for three different access */
Add " modes." to the end of the comment.
[...snip...]
> +/* Base diagnostic class relative to fd_state_machine. */
> +class fd_diagnostic : public pending_diagnostic
> +{
> +public:
> + fd_diagnostic (const fd_state_machine &sm, tree arg) : m_sm (sm), m_arg
> (arg)
> + {
> + }
> +
> + bool
> + subclass_equal_p (const pending_diagnostic &base_other) const override
> + {
> + return same_tree_p (m_arg, ((const fd_diagnostic &)base_other).m_arg);
> + }
> +
> + label_text
> + describe_state_change (const evdesc::state_change &change) override
> + {
> + if (change.m_old_state == m_sm.get_start_state ()
> + && m_sm.is_unchecked (change.m_new_state))
> + {
> + if (change.m_new_state == m_sm.m_unchecked_read_write)
> + return change.formatted_print ("opened here as %<read-write%>");
> +
> + if (change.m_new_state == m_sm.m_unchecked_read_only)
> + return change.formatted_print ("opened here as %<read-only%>");
> +
> + if (change.m_new_state == m_sm.m_unchecked_write_only)
> + return change.formatted_print ("opened here as %<write-only%>");
"read-write" etc in these messages are part of the text of the
messages, rather than quoting a source-language construct, so drop the
%< and %> here.
> + }
> +
> + if (change.m_new_state == m_sm.m_closed)
> + return change.formatted_print ("closed here");
> +
> + if (m_sm.is_unchecked (change.m_old_state)
> + && m_sm.is_valid (change.m_new_state))
> + if (change.m_expr)
> + return change.formatted_print (
> + "assuming %qE is a valid file descriptor (>= 0)",
> change.m_expr);
> + else
> + return change.formatted_print ("assuming a valid file descriptor");
> +
> + if (m_sm.is_unchecked (change.m_old_state)
> + && change.m_new_state == m_sm.m_invalid)
> + if (change.m_expr)
> + return change.formatted_print (
> + "assuming %qE is an invalid file descriptor (< 0)",
> change.m_expr);
> + else
> + return change.formatted_print ("assuming an invalid file
> descriptor");
> +
> + return label_text ();
> + }
> +
> +protected:
> + const fd_state_machine &m_sm;
> + tree m_arg;
> +};
> +
> +class fd_access_mode_mismatch : public fd_diagnostic
> +{
> +public:
> + /* mode specfies whether read on write was attempted or vice versa.
> + */
> + fd_access_mode_mismatch (const fd_state_machine &sm, tree arg,
> + enum access_mode mode, const tree callee_fndecl)
> + : m_mode (mode), m_callee_fndecl (callee_fndecl), fd_diagnostic (sm,
> arg)
> + {
> + }
Every subclass of fd_diagnostic that adds fields ought to have its own
override of:
subclass_equal_p (const pending_diagnostic &base_other) const override
which chains up to the parent class implementation, and also compares
the new fields.
> +
> + const char *
> + get_kind () const final override
> + {
> + return "fd_access_mode_mismatch";
> + }
> +
> + int
> + get_controlling_option () const final override
> + {
> + return OPT_Wanalyzer_fd_access_mode_mismatch;
> + }
> +
> + bool
> + emit (rich_location *rich_loc) final override
> + {
> + return warning_at (rich_loc, get_controlling_option (),
> + "%qE on %qs file descriptor %qE", m_callee_fndecl,
> + m_sm.get_string_for_access_mode (m_mode), m_arg);
It's bad for localization to be constructing strings from fragments
like this.
It's a pain, but please eliminate get_string_for_access_mode in favor
of switch statements wherever you'd use it, like here.
> + }
> +
> + label_text
> + describe_state_change (const evdesc::state_change &change) override
> + {
> + return fd_diagnostic::describe_state_change (change);
> + }
> +
> + label_text
> + describe_final_event (const evdesc::final_event &ev) final override
> + {
> + return ev.formatted_print ("%qE on %qs file descriptor %qE",
> + m_callee_fndecl,
> + m_sm.get_string_for_access_mode (m_mode), m_arg);
As above, please eliminate get_string_for_access_mode and convert to a
switch.
> + }
> +
> +private:
> + enum access_mode m_mode;
> + const tree m_callee_fndecl;
> +};
> +
[...snip...]
> +
> +class fd_use_after_close : public fd_diagnostic
> +{
> +public:
> + fd_use_after_close (const fd_state_machine &sm, tree arg,
> + const tree callee_fndecl)
> + : m_callee_fndecl (callee_fndecl), fd_diagnostic (sm, arg)
> + {
> + }
This subclass has its own fields, and so needs to implement:
subclass_equal_p (const pending_diagnostic &base_other) const override
[...snip...]
> +class unchecked_use_of_fd : public fd_diagnostic
> +{
> +public:
> + unchecked_use_of_fd (const fd_state_machine &sm, tree arg,
> + const tree callee_fndecl)
> + : m_callee_fndecl (callee_fndecl), fd_diagnostic (sm, arg)
> + {
> + }
> +
> + const char *
> + get_kind () const final override
> + {
> + return "unchecked_use_of_fd";
> + }
> +
> + int
> + get_controlling_option () const final override
> + {
> + return OPT_Wanalyzer_fd_use_without_check;
> + }
> +
> + bool
> + emit (rich_location *rich_loc) final override
> + {
> +
> + return warning_at (rich_loc, get_controlling_option (),
> + "%qE on possibly invalid file descriptor %qE",
> + m_callee_fndecl, m_arg);
How about:
"%qE on file descriptor %qE without checking %<open%> succeeded"
> + }
> +
> + label_text
> + describe_state_change (const evdesc::state_change &change) override
> + {
> + if (m_sm.is_unchecked (change.m_new_state))
> + {
> + m_first_open_event = change.m_event_id;
> + return label_text::borrow ("opened here");
> + }
> +
> + return fd_diagnostic::describe_state_change (change);
> + }
> +
> + label_text
> + describe_final_event (const evdesc::final_event &ev) final override
> + {
> + return ev.formatted_print ("%qE could be invalid: unchecked value from
> %@",
> + m_arg, &m_first_open_event);
> + }
> +
> +private:
> + diagnostic_event_id_t m_first_open_event;
> + const tree m_callee_fndecl;
> +};
> +
[...snip...]
> +bool
> +fd_state_machine::is_unchecked (state_t s) const
> +{
> + return s == m_unchecked_read_write
> + || s == m_unchecked_read_only
> + || s == m_unchecked_write_only;
> +}
Thanks for splitting these onto multiple lines, but please wrap them
with parentheses, so they look like:
return (s == m_unchecked_read_write
|| s == m_unchecked_read_only
|| s == m_unchecked_write_only);
> +
> +bool
> +fd_state_machine::is_valid (state_t s) const
> +{
> + return s == m_valid_read_write
> + || s == m_valid_read_only
> + || s == m_valid_write_only;
> +}
Likewise here.
> +
> +const char *
> +fd_state_machine::get_string_for_access_mode (enum access_mode mode) const
> +{
> + if (mode == READ_ONLY)
> + return "read-only";
> + else if (mode == WRITE_ONLY)
> + return "write-only";
> + else
> + return "read-write";
> +}
As noted above it's bad for localization to be constructing strings
from fragments like this, so please eliminate the above function, in
favor of switch statements wherever you'd use it.
> +
> +enum access_mode
> +fd_state_machine::get_access_mode_from_flag (int flag) const
> +{
> + /* FIXME: this code assumes the access modes on the host and
> + target are the same, which in practice might not be the case. */
Thanks for moving this into a subroutine.
Joseph says we should introduce a target hook for this:
https://gcc.gnu.org/pipermail/gcc/2022-June/238961.html
You can see an example of adding a target hook in commit
8cdcea51c0fd753e6a652c9b236e91b3a6e0911c.
As noted above, I think we can leave adding the target hook until a
followup patch, if this is only going to be an issue with cross-
compilation between Hurd and non-Hurd systems.
> + if (flag == O_RDONLY)
> + {
> + return READ_ONLY;
> + }
> + else if (flag == O_WRONLY)
> + {
> + return WRITE_ONLY;
> + }
> + return READ_WRITE;
> +}
> +
> +enum access_mode
> +fd_state_machine::get_access_mode_from_state (state_t state) const
> +{
> + if (state == m_unchecked_read_only || state == m_valid_read_only)
> + return READ_ONLY;
> + else if (state == m_unchecked_write_only || state == m_valid_write_only)
> + return WRITE_ONLY;
> + else if (state == m_unchecked_read_write || state == m_valid_write_only)
I think there's a copy-and-paste error in the above line: ^^^^^^^^^^^^^^^^^^
> + return READ_WRITE;
> +}
[...snip...]
> +void
> +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node,
> + const gimple *stmt, const gcall *call) const
> +{
> + tree lhs = gimple_call_lhs (call);
> + if (lhs)
> + {
> + tree arg = gimple_call_arg (call, 1);
> + if (TREE_CODE (arg) == INTEGER_CST)
> + {
> + int flag = TREE_INT_CST_LOW (arg);
> + enum access_mode mode = get_access_mode_from_flag (flag);
> + if (mode == READ_ONLY)
> + sm_ctxt->on_transition (node, stmt, lhs, m_start,
> + m_unchecked_read_only);
> + else if (mode == WRITE_ONLY)
> + sm_ctxt->on_transition (node, stmt, lhs, m_start,
> + m_unchecked_write_only);
> + else
> + sm_ctxt->on_transition (node, stmt, lhs, m_start,
> + m_unchecked_read_write);
I think this can be a switch (mode).
> + }
> + }
> + else
> + {
> + /* FIXME: add leak reporting */
> + }
> +}
[...snip...]
> +void
> +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node,
> + const gimple *stmt, const gcall *call,
> + const tree callee_fndecl) const
> +{
> + tree arg = gimple_call_arg (call, 0);
> + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> + state_t state = sm_ctxt->get_state (stmt, arg);
> + enum access_mode mode = get_access_mode_from_state (state);
> +
> + if (check_for_closed_fd (state))
> + {
> + sm_ctxt->warn (node, stmt, arg,
> + new fd_use_after_close (*this, diag_arg, callee_fndecl));
> + }
> +
> + if (!check_for_closed_fd (state))
> + {
Why not just an "else" here?
> +
> + if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> + {
> + sm_ctxt->warn (
> + node, stmt, arg,
> + new unchecked_use_of_fd (*this, diag_arg, callee_fndecl));
> + }
> +
> + if (mode == WRITE_ONLY)
> + {
> + sm_ctxt->warn (node, stmt, arg,
> + new fd_access_mode_mismatch (*this, diag_arg, mode,
> + callee_fndecl));
> + }
> + }
> +}
> +void
> +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode *node,
> + const gimple *stmt, const gcall *call,
> + const tree callee_fndecl) const
> +{
> + tree arg = gimple_call_arg (call, 0);
> + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> + state_t state = sm_ctxt->get_state (stmt, arg);
> + enum access_mode mode = get_access_mode_from_state (state);
> +
> + if (check_for_closed_fd (state))
> + {
> + sm_ctxt->warn (node, stmt, arg,
> + new fd_use_after_close (*this, diag_arg,
> callee_fndecl));
> + }
> +
> + if (!check_for_closed_fd (state))
> + {
Again, I think this can be another "else" here.
> +
> + if (!is_valid (sm_ctxt->get_state (stmt, arg)))
> + {
> + sm_ctxt->warn (
> + node, stmt, arg,
> + new unchecked_use_of_fd (*this, diag_arg, callee_fndecl));
> + }
...but I think it's slightly cleaner to move all of this repeated logic
into a "check_for_open_fd" subroutine, that checks both for
fd_use_after_close and for unchecked_use_of_fd.
> +
> + if (mode == READ_ONLY)
> + {
> + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg);
> + sm_ctxt->warn (node, stmt, arg,
> + new fd_access_mode_mismatch (*this, diag_arg,
> mode,
> + callee_fndecl));
> + }
> + }
> +}
> +
[...snip...]
> diff --git gcc/testsuite/gcc.dg/analyzer/fd-1.c
> gcc/testsuite/gcc.dg/analyzer/fd-1.c
> new file mode 100644
> index 00000000000..985e9ac75de
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/analyzer/fd-1.c
> @@ -0,0 +1,26 @@
> +int open(const char *, int mode);
> +#define O_RDONLY 0
> +#define O_WRONLY 1
> +#define O_RDWR 2
If we're going with a target hook, then we should use the target's
<unistd.h> to get at the O_ macros. But as noted above, that can be
deferred to a follow-up patch.
[...snip...]
Hope this makes sense and is constructive; thanks again for the updated
patch
Dave
next prev parent reply other threads:[~2022-06-23 23:20 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-23 18:30 Mir Immad
2022-06-23 23:20 ` David Malcolm [this message]
2022-07-14 15:40 ` David Malcolm
2022-06-25 16:00 Mir Immad
2022-06-26 15:53 ` David Malcolm
2022-06-27 17:07 ` Mir Immad
2022-06-27 17:08 Mir Immad
2022-06-27 20:55 ` David Malcolm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1da7fe0090c220a7c049b3527f2735ddc60b5659.camel@redhat.com \
--to=dmalcolm@redhat.com \
--cc=gcc@gcc.gnu.org \
--cc=mirimnan017@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).