From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 25C2C3857423 for ; Thu, 23 Jun 2022 23:20:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 25C2C3857423 Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-610-OObosWnRP7eECaiqqt3YfQ-1; Thu, 23 Jun 2022 19:20:55 -0400 X-MC-Unique: OObosWnRP7eECaiqqt3YfQ-1 Received: by mail-qv1-f71.google.com with SMTP id j6-20020a05621419c600b004704e6665caso1017518qvc.6 for ; Thu, 23 Jun 2022 16:20:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:subject:from:to:date:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=HGf+HrxSrghGz11f8yC0m04H5lNYBpahEoktYiZfGyQ=; b=iNlNJxYFXwTGX/VIlE5GYgLZDcow0pPJppVskKhfXDxdhSsbtv8PG5NKaKNHeRoT/d DOm4xywogRx8HloqfpYnIfrOCYbVLf94UHBcBD4xaZkgtG5Ae3nyQc0DkliSbYs57Q4/ TIOhuQqCj1piexPoGnXW5B9jaWMxS1Av24N81nuMcnxHkk9vseuRLCbJkybd63ywfz9W ODV4FRpLOYIxr2I2An+yeOoI9YbdxXJWfaISP7HJ/oHeuDcXUd8cH6Kk64K+wvi85PYi Sywlg+n9b2ifFbS4/uMqCl6k7UNUnIOS+QvTfmAjVAPgKq2bwM3uyQZnDqiC7g7AEerz 0+jg== X-Gm-Message-State: AJIora9CJjqMqlp4oSRydnVDnH2qQYHS5mKXEaikyEHfVunIChzRLVcF PZQA08bvXvJZD7Gz6bSukQa5WYTDDRIBX4JUYhKFCVyxaaLxkqz4Qs4WUWW62G9priMzAnh65Y+ Whyhnyyk= X-Received: by 2002:a05:620a:2b46:b0:6ae:e9d8:507d with SMTP id dp6-20020a05620a2b4600b006aee9d8507dmr6384195qkb.83.1656026454868; Thu, 23 Jun 2022 16:20:54 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vCdqVNuuRnH8T3yprUm2qtDd3rQnfN4ioe0lx1zUgPBDXTDny5qHG10RcEOAjaL4DEVcjUFA== X-Received: by 2002:a05:620a:2b46:b0:6ae:e9d8:507d with SMTP id dp6-20020a05620a2b4600b006aee9d8507dmr6384165qkb.83.1656026454384; Thu, 23 Jun 2022 16:20:54 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.ma.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id j13-20020a05620a288d00b006a3325fd985sm725532qkp.13.2022.06.23.16.20.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 Jun 2022 16:20:53 -0700 (PDT) Message-ID: <1da7fe0090c220a7c049b3527f2735ddc60b5659.camel@redhat.com> Subject: Re: [PATCH] analyzer PR 106003 From: David Malcolm To: Mir Immad , gcc@gcc.gnu.org Date: Thu, 23 Jun 2022 19:20:51 -0400 In-Reply-To: References: User-Agent: Evolution 3.38.4 (3.38.4-1.fc33) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2022 23:21:00 -0000 On Fri, 2022-06-24 at 00:00 +0530, Mir Immad wrote: Thanks for the updated patch. This is close to being ready. One big issue is the target hook idea for isolating the target's definition of the O_* flags as per Joseph's suggestion here: https://gcc.gnu.org/pipermail/gcc/2022-June/238961.html Given Joseph's comment here: https://gcc.gnu.org/pipermail/gcc/2022-June/238954.html I'm inclined to think that the target hook stuff could be done as a followup. Indentation has gone a bit wrong (presumably due to issue with the mailer); the main thing to watch out for is to use tabs for 8 space indentations within the file. Enabling "visual whitespace" (or whatever it's called) in your editor can help with this. Every patch that adds any options (e.g. to gcc/analyzer/analyzer.opt) needs to document those options, in gcc/doc/invoke.texi. Grep for one of the existing analyzer options in invoke.texi to see what you need to add for the new options. You'll need a ChangeLog entry before this can be pushed. For better or worse, the project requires ChangeLog entries. You can use the script ./contrib/mklog.py to help generate this. Some more information is here: https://www.gnu.org/prep/standards/html_node/Change-Logs.html In GCC our ChangeLog entries are part of the git commit message, and we have a serverside script to update the ChangeLog files in the source tree (to avoid constantly having merge conflicts). The Subject line for the commit should have an "analyzer: " prefix and reference the bugzilla problem report (PR 106003) Before you can push the patch, you'll have to test it with a full bootstrap and regression test run. Let me know if you need help with that. Various other comments inline below... [...snip...] > diff --git gcc/analyzer/analyzer.opt gcc/analyzer/analyzer.opt > index 23dfc797cea..e2a629bb42c 100644 > --- gcc/analyzer/analyzer.opt > +++ gcc/analyzer/analyzer.opt [...snip...] > +Wanalyzer-fd-use-without-check > +Common Var(warn_analyzer_fd_use_without_check) Init(1) Warning > +warn about code paths in which a possibly invalid file descriptor is > passed to a must-be-a-valid file descriptor function argument. These should be capitalized as English sentences, and the "must-be-a- valid" wording doesn't feel quite right to me. Reword this to something like: "Warn about code paths in which a file descriptor is used without being checked for validity." or somesuch (all on one line). [...snip...] > diff --git gcc/analyzer/sm-fd.cc gcc/analyzer/sm-fd.cc > new file mode 100644 > index 00000000000..048014d7a26 > --- /dev/null > +++ gcc/analyzer/sm-fd.cc > @@ -0,0 +1,770 @@ > +/* A state machine for detecting misuses of POSIX file descriptor APIs. > + Copyright (C) 2019-2022 Free Software Foundation, Inc. > + Contributed by Immad Mir . > + > +This file is part of GCC. > + > +GCC is free software; you can redistribute it and/or modify it > +under the terms of the GNU General Public License as published by > +the Free Software Foundation; either version 3, or (at your option) > +any later version. > + > +GCC is distributed in the hope that it will be useful, but > +WITHOUT ANY WARRANTY; without even the implied warranty of > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +General Public License for more details. > + > +You should have received a copy of the GNU General Public License > +along with GCC; see the file COPYING3. If not see > +. */ > + > +#include "config.h" > +#include "system.h" > +#include "coretypes.h" > +#include "tree.h" > +#include "function.h" > +#include "basic-block.h" > +#include "gimple.h" > +#include "options.h" > +#include "diagnostic-path.h" > +#include "diagnostic-metadata.h" > +#include "function.h" > +#include "json.h" > +#include "analyzer/analyzer.h" > +#include "diagnostic-event-id.h" > +#include "analyzer/analyzer-logging.h" > +#include "analyzer/sm.h" > +#include "analyzer/pending-diagnostic.h" > +#include "analyzer/function-set.h" > +#include "analyzer/analyzer-selftests.h" > +#include "tristate.h" > +#include "selftest.h" > +#include "analyzer/call-string.h" > +#include "analyzer/program-point.h" > +#include "analyzer/store.h" > +#include "analyzer/region-model.h" > + > +#include We'll want to drop the #include once we have a target hook. [...snip...] > + > + state_machine::state_t > + get_default_state (const svalue *sval) const final override > + { > + if (tree cst = sval->maybe_get_constant ()) > + { > + int val = TREE_INT_CST_LOW (cst); > + if (val < 0) > + { > + return m_invalid; > + } > + } > + return m_start; > + } Please add coverage to the testsuite for the case of accessing negative and non-negative constant integer values: what happens when we hit the above cases? In particular, IIRC 0, 1, and 2 are stdin, stdout, and stderr respectively, and are already open at the start of "main" (though not necessarily at the start of any given *analysis path*, so we can't assume that) - so there may be code out there that has hardcoded accesses using these constant integer values. [...snip...] > + > + /* State for reperesenting a file descriptor that is known to be valid (>= 0), > + for three different access */ Add " modes." to the end of the comment. [...snip...] > +/* Base diagnostic class relative to fd_state_machine. */ > +class fd_diagnostic : public pending_diagnostic > +{ > +public: > + fd_diagnostic (const fd_state_machine &sm, tree arg) : m_sm (sm), m_arg > (arg) > + { > + } > + > + bool > + subclass_equal_p (const pending_diagnostic &base_other) const override > + { > + return same_tree_p (m_arg, ((const fd_diagnostic &)base_other).m_arg); > + } > + > + label_text > + describe_state_change (const evdesc::state_change &change) override > + { > + if (change.m_old_state == m_sm.get_start_state () > + && m_sm.is_unchecked (change.m_new_state)) > + { > + if (change.m_new_state == m_sm.m_unchecked_read_write) > + return change.formatted_print ("opened here as %"); > + > + if (change.m_new_state == m_sm.m_unchecked_read_only) > + return change.formatted_print ("opened here as %"); > + > + if (change.m_new_state == m_sm.m_unchecked_write_only) > + return change.formatted_print ("opened here as %"); "read-write" etc in these messages are part of the text of the messages, rather than quoting a source-language construct, so drop the %< and %> here. > + } > + > + if (change.m_new_state == m_sm.m_closed) > + return change.formatted_print ("closed here"); > + > + if (m_sm.is_unchecked (change.m_old_state) > + && m_sm.is_valid (change.m_new_state)) > + if (change.m_expr) > + return change.formatted_print ( > + "assuming %qE is a valid file descriptor (>= 0)", > change.m_expr); > + else > + return change.formatted_print ("assuming a valid file descriptor"); > + > + if (m_sm.is_unchecked (change.m_old_state) > + && change.m_new_state == m_sm.m_invalid) > + if (change.m_expr) > + return change.formatted_print ( > + "assuming %qE is an invalid file descriptor (< 0)", > change.m_expr); > + else > + return change.formatted_print ("assuming an invalid file > descriptor"); > + > + return label_text (); > + } > + > +protected: > + const fd_state_machine &m_sm; > + tree m_arg; > +}; > + > +class fd_access_mode_mismatch : public fd_diagnostic > +{ > +public: > + /* mode specfies whether read on write was attempted or vice versa. > + */ > + fd_access_mode_mismatch (const fd_state_machine &sm, tree arg, > + enum access_mode mode, const tree callee_fndecl) > + : m_mode (mode), m_callee_fndecl (callee_fndecl), fd_diagnostic (sm, > arg) > + { > + } Every subclass of fd_diagnostic that adds fields ought to have its own override of: subclass_equal_p (const pending_diagnostic &base_other) const override which chains up to the parent class implementation, and also compares the new fields. > + > + const char * > + get_kind () const final override > + { > + return "fd_access_mode_mismatch"; > + } > + > + int > + get_controlling_option () const final override > + { > + return OPT_Wanalyzer_fd_access_mode_mismatch; > + } > + > + bool > + emit (rich_location *rich_loc) final override > + { > + return warning_at (rich_loc, get_controlling_option (), > + "%qE on %qs file descriptor %qE", m_callee_fndecl, > + m_sm.get_string_for_access_mode (m_mode), m_arg); It's bad for localization to be constructing strings from fragments like this. It's a pain, but please eliminate get_string_for_access_mode in favor of switch statements wherever you'd use it, like here. > + } > + > + label_text > + describe_state_change (const evdesc::state_change &change) override > + { > + return fd_diagnostic::describe_state_change (change); > + } > + > + label_text > + describe_final_event (const evdesc::final_event &ev) final override > + { > + return ev.formatted_print ("%qE on %qs file descriptor %qE", > + m_callee_fndecl, > + m_sm.get_string_for_access_mode (m_mode), m_arg); As above, please eliminate get_string_for_access_mode and convert to a switch. > + } > + > +private: > + enum access_mode m_mode; > + const tree m_callee_fndecl; > +}; > + [...snip...] > + > +class fd_use_after_close : public fd_diagnostic > +{ > +public: > + fd_use_after_close (const fd_state_machine &sm, tree arg, > + const tree callee_fndecl) > + : m_callee_fndecl (callee_fndecl), fd_diagnostic (sm, arg) > + { > + } This subclass has its own fields, and so needs to implement: subclass_equal_p (const pending_diagnostic &base_other) const override [...snip...] > +class unchecked_use_of_fd : public fd_diagnostic > +{ > +public: > + unchecked_use_of_fd (const fd_state_machine &sm, tree arg, > + const tree callee_fndecl) > + : m_callee_fndecl (callee_fndecl), fd_diagnostic (sm, arg) > + { > + } > + > + const char * > + get_kind () const final override > + { > + return "unchecked_use_of_fd"; > + } > + > + int > + get_controlling_option () const final override > + { > + return OPT_Wanalyzer_fd_use_without_check; > + } > + > + bool > + emit (rich_location *rich_loc) final override > + { > + > + return warning_at (rich_loc, get_controlling_option (), > + "%qE on possibly invalid file descriptor %qE", > + m_callee_fndecl, m_arg); How about: "%qE on file descriptor %qE without checking % succeeded" > + } > + > + label_text > + describe_state_change (const evdesc::state_change &change) override > + { > + if (m_sm.is_unchecked (change.m_new_state)) > + { > + m_first_open_event = change.m_event_id; > + return label_text::borrow ("opened here"); > + } > + > + return fd_diagnostic::describe_state_change (change); > + } > + > + label_text > + describe_final_event (const evdesc::final_event &ev) final override > + { > + return ev.formatted_print ("%qE could be invalid: unchecked value from > %@", > + m_arg, &m_first_open_event); > + } > + > +private: > + diagnostic_event_id_t m_first_open_event; > + const tree m_callee_fndecl; > +}; > + [...snip...] > +bool > +fd_state_machine::is_unchecked (state_t s) const > +{ > + return s == m_unchecked_read_write > + || s == m_unchecked_read_only > + || s == m_unchecked_write_only; > +} Thanks for splitting these onto multiple lines, but please wrap them with parentheses, so they look like: return (s == m_unchecked_read_write || s == m_unchecked_read_only || s == m_unchecked_write_only); > + > +bool > +fd_state_machine::is_valid (state_t s) const > +{ > + return s == m_valid_read_write > + || s == m_valid_read_only > + || s == m_valid_write_only; > +} Likewise here. > + > +const char * > +fd_state_machine::get_string_for_access_mode (enum access_mode mode) const > +{ > + if (mode == READ_ONLY) > + return "read-only"; > + else if (mode == WRITE_ONLY) > + return "write-only"; > + else > + return "read-write"; > +} As noted above it's bad for localization to be constructing strings from fragments like this, so please eliminate the above function, in favor of switch statements wherever you'd use it. > + > +enum access_mode > +fd_state_machine::get_access_mode_from_flag (int flag) const > +{ > + /* FIXME: this code assumes the access modes on the host and > + target are the same, which in practice might not be the case. */ Thanks for moving this into a subroutine. Joseph says we should introduce a target hook for this: https://gcc.gnu.org/pipermail/gcc/2022-June/238961.html You can see an example of adding a target hook in commit 8cdcea51c0fd753e6a652c9b236e91b3a6e0911c. As noted above, I think we can leave adding the target hook until a followup patch, if this is only going to be an issue with cross- compilation between Hurd and non-Hurd systems. > + if (flag == O_RDONLY) > + { > + return READ_ONLY; > + } > + else if (flag == O_WRONLY) > + { > + return WRITE_ONLY; > + } > + return READ_WRITE; > +} > + > +enum access_mode > +fd_state_machine::get_access_mode_from_state (state_t state) const > +{ > + if (state == m_unchecked_read_only || state == m_valid_read_only) > + return READ_ONLY; > + else if (state == m_unchecked_write_only || state == m_valid_write_only) > + return WRITE_ONLY; > + else if (state == m_unchecked_read_write || state == m_valid_write_only) I think there's a copy-and-paste error in the above line: ^^^^^^^^^^^^^^^^^^ > + return READ_WRITE; > +} [...snip...] > +void > +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const gcall *call) const > +{ > + tree lhs = gimple_call_lhs (call); > + if (lhs) > + { > + tree arg = gimple_call_arg (call, 1); > + if (TREE_CODE (arg) == INTEGER_CST) > + { > + int flag = TREE_INT_CST_LOW (arg); > + enum access_mode mode = get_access_mode_from_flag (flag); > + if (mode == READ_ONLY) > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > + m_unchecked_read_only); > + else if (mode == WRITE_ONLY) > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > + m_unchecked_write_only); > + else > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > + m_unchecked_read_write); I think this can be a switch (mode). > + } > + } > + else > + { > + /* FIXME: add leak reporting */ > + } > +} [...snip...] > +void > +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const gcall *call, > + const tree callee_fndecl) const > +{ > + tree arg = gimple_call_arg (call, 0); > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + state_t state = sm_ctxt->get_state (stmt, arg); > + enum access_mode mode = get_access_mode_from_state (state); > + > + if (check_for_closed_fd (state)) > + { > + sm_ctxt->warn (node, stmt, arg, > + new fd_use_after_close (*this, diag_arg, callee_fndecl)); > + } > + > + if (!check_for_closed_fd (state)) > + { Why not just an "else" here? > + > + if (!is_valid (sm_ctxt->get_state (stmt, arg))) > + { > + sm_ctxt->warn ( > + node, stmt, arg, > + new unchecked_use_of_fd (*this, diag_arg, callee_fndecl)); > + } > + > + if (mode == WRITE_ONLY) > + { > + sm_ctxt->warn (node, stmt, arg, > + new fd_access_mode_mismatch (*this, diag_arg, mode, > + callee_fndecl)); > + } > + } > +} > +void > +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const gcall *call, > + const tree callee_fndecl) const > +{ > + tree arg = gimple_call_arg (call, 0); > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + state_t state = sm_ctxt->get_state (stmt, arg); > + enum access_mode mode = get_access_mode_from_state (state); > + > + if (check_for_closed_fd (state)) > + { > + sm_ctxt->warn (node, stmt, arg, > + new fd_use_after_close (*this, diag_arg, > callee_fndecl)); > + } > + > + if (!check_for_closed_fd (state)) > + { Again, I think this can be another "else" here. > + > + if (!is_valid (sm_ctxt->get_state (stmt, arg))) > + { > + sm_ctxt->warn ( > + node, stmt, arg, > + new unchecked_use_of_fd (*this, diag_arg, callee_fndecl)); > + } ...but I think it's slightly cleaner to move all of this repeated logic into a "check_for_open_fd" subroutine, that checks both for fd_use_after_close and for unchecked_use_of_fd. > + > + if (mode == READ_ONLY) > + { > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + sm_ctxt->warn (node, stmt, arg, > + new fd_access_mode_mismatch (*this, diag_arg, > mode, > + callee_fndecl)); > + } > + } > +} > + [...snip...] > diff --git gcc/testsuite/gcc.dg/analyzer/fd-1.c > gcc/testsuite/gcc.dg/analyzer/fd-1.c > new file mode 100644 > index 00000000000..985e9ac75de > --- /dev/null > +++ gcc/testsuite/gcc.dg/analyzer/fd-1.c > @@ -0,0 +1,26 @@ > +int open(const char *, int mode); > +#define O_RDONLY 0 > +#define O_WRONLY 1 > +#define O_RDWR 2 If we're going with a target hook, then we should use the target's to get at the O_ macros. But as noted above, that can be deferred to a follow-up patch. [...snip...] Hope this makes sense and is constructive; thanks again for the updated patch Dave