From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 8843B38582AB for ; Sun, 26 Jun 2022 15:53:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8843B38582AB Received: from mail-qk1-f197.google.com (mail-qk1-f197.google.com [209.85.222.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-354-j5HMbCxKPFey9aQCOixshw-1; Sun, 26 Jun 2022 11:53:49 -0400 X-MC-Unique: j5HMbCxKPFey9aQCOixshw-1 Received: by mail-qk1-f197.google.com with SMTP id i10-20020a05620a404a00b006a7609f54c6so7948336qko.7 for ; Sun, 26 Jun 2022 08:53:48 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:subject:from:to:date:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=dQCc6QncJqTicG4RRaZNuxoKguaPcRycxLGOKYJzdgA=; b=0g6Fq9BeoJNp3zekAqwERLWpLD5UHnXrtiSmBIi5Hq6i+HF1WhrlPLc7+ON3GdZ+AX hW+pUxkpG+HTp+P6T30WKlypO9J4j0VOZPdby8YKICm7YjAN0TR64hJpKjfsu25ErSij 1FX/+LAO4aRYYXGuxdFe0eKcVgyOSTqf5PRC9d1drkXpX6fgomNF4Xooi2XBkkpQxLXE FIBioQ+etomfmiBjYY7LRDWBkraVtq1YYKhYsK9yDq2Ht8dLQRn/cUuYtUj9mwKxLMfM klo1GojzpJtR5rYnGULaaX4TRkJWixJ78DIGSI2FG3QXaVwt3IxlcqfBVwbdm7IN5zhN jZ3g== X-Gm-Message-State: AJIora/VKoRgHvIikGmwDfzLSOkHIfZl0lIPl/u1unbQmBflvUmEdMLr ZxtNqo2RM2U4flEEyKXFS4+GJ/fwqfF4Da0o8ZKkBhlXM2suFrD6KTrsuKSpO7OBCn/UYd/lRx2 tmq+ygCo= X-Received: by 2002:a05:620a:2848:b0:6ad:db0a:f227 with SMTP id h8-20020a05620a284800b006addb0af227mr5522252qkp.44.1656258827947; Sun, 26 Jun 2022 08:53:47 -0700 (PDT) X-Google-Smtp-Source: AGRyM1suJLriIzlAD7Rl/tIEUqwGvc6Pz52bndq9Bqmq39KOSzRAxsCX/X9kWS/Tkrw3/aHzBjerLg== X-Received: by 2002:a05:620a:2848:b0:6ad:db0a:f227 with SMTP id h8-20020a05620a284800b006addb0af227mr5522240qkp.44.1656258827521; Sun, 26 Jun 2022 08:53:47 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.ma.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id x13-20020a05620a258d00b006a69f6793c5sm6847797qko.14.2022.06.26.08.53.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 26 Jun 2022 08:53:46 -0700 (PDT) Message-ID: <2ec314d7adfbc74e6c6fb1c7daa07bf54685d46b.camel@redhat.com> Subject: Re: [PATCH] analyzer PR 106003 From: David Malcolm To: Mir Immad , gcc@gcc.gnu.org Date: Sun, 26 Jun 2022 11:53:45 -0400 In-Reply-To: References: User-Agent: Evolution 3.38.4 (3.38.4-1.fc33) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 26 Jun 2022 15:53:52 -0000 Thanks for the updated patch. Various comments inline below. Sorry if this seems nitpicky in places. I think the patch is nearly ready; please write a ChangeLog entry for the next one (none of my proposed changes are going to affect what the ChangeLog will look like, apart from new test case files perhaps, given that much of it is going to be just: * sm-fd.cc: New file. [...snip...] > diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc > new file mode 100644 > index 00000000000..83eb0df724d > --- /dev/null > +++ b/gcc/analyzer/sm-fd.cc [...snip...] > +class fd_leak : public fd_diagnostic > +{ > +public: > + fd_leak (const fd_state_machine &sm, tree arg) : fd_diagnostic (sm, arg) > {} > + > + const char * > + get_kind () const final override > + { > + return "fd_leak"; > + } > + > + int > + get_controlling_option () const final override > + { > + return OPT_Wanalyzer_fd_leak; > + } > + > + bool > + emit (rich_location *rich_loc) final override > + { > + /*CWE-775: Missing Release of File Descriptor or Handle after Effective > + * Lifetime > + */ Formatting nit: we don't put these "*" on contination lines in multiline comments; this should read: > + /* CWE-775: Missing Release of File Descriptor or Handle after Effective > + Lifetime. */ [...snip...] > +class fd_access_mode_mismatch : public fd_diagnostic > +{ > +public: > + /* mode specfies whether read on write was attempted or vice versa. Typo: "specfies" -> "specifies" [...snip...] > + bool > + emit (rich_location *rich_loc) final override > + { > + > + switch (m_mode) > + { I like to put in a "default: gcc_unreachable ();" in switch statements, especially one that isn't covering all of the values in the enum, but this got me thinking: this class has "enum access_mode m_mode;", but that enum is describing a *set* of ways in which the FD can be validly accessed, whereas here we're trying to describe a particular way in which the FD is being invalidly accessed. analyzer.h has this enum: /* An enum for describing the direction of an access to memory. */ enum access_direction { DIR_READ, DIR_WRITE }; which I think would be a much better fit here for fd_access_mode_mismatch, so please change fd_access_mode_mismatch's: enum access_mode m_mode; to enum access_direction m_access_dir; and update this switch accordingly: > + case READ_ONLY: > + return warning_at (rich_loc, get_controlling_option (), > + "%qE on % file descriptor %qE", > + m_callee_fndecl, m_arg); > + case WRITE_ONLY: > + return warning_at (rich_loc, get_controlling_option (), > + "%qE on % file descriptor %qE", > + m_callee_fndecl, m_arg); > + } [...snip...] > + > + label_text > + describe_state_change (const evdesc::state_change &change) override > + { > + return fd_diagnostic::describe_state_change (change); > + } This vfunc is redundant; it can be deleted so we simply inherit the fd_diagnostic implementation directly. > + > + label_text > + describe_final_event (const evdesc::final_event &ev) final override > + { > + switch (m_mode) > + { > + case READ_ONLY: > + return ev.formatted_print ("%qE on % file descriptor > %qE", > + m_callee_fndecl, m_arg); > + case WRITE_ONLY: > + return ev.formatted_print ("%qE on % file descriptor > %qE", > + m_callee_fndecl, m_arg); > + } > + } Similar comments here about the switch (m_mode) as per the emit vfunc. > + > +private: > + enum access_mode m_mode; ...and this field. > + const tree m_callee_fndecl; > +}; [...snip...] > + label_text > + describe_state_change (const evdesc::state_change &change) override > + { > + if (m_sm.is_unchecked (change.m_new_state)) > + { > + return label_text::borrow ("opened here"); > + } > + > + if (change.m_new_state == m_sm.m_closed) > + { > + return change.formatted_print ("closed here"); > + } Formatting nit: we don't add the { } when they're redundant, like in these "if" stmts (I dislike this rule in our conventions, but it's probably best to be consistent). [...snip...] > +bool > +fd_state_machine::check_for_closed_fd (state_t state) const > +{ > + return (state == m_closed); > +} > + > +bool > +fd_state_machine::check_for_invalid_fd (state_t state) const > +{ > + return (state == m_invalid); > +} > + > +bool > +fd_state_machine::check_for_constant_fd (state_t state) const > +{ > + return (state == m_constant_fd); You're using the prefix "check_for_" for two different kinds of functions: (a) the functions above are simple predicate functions that merely determine "is this state_t a " for some foo, and have no side- effects (b) the "check_for" functions later on *do* things: they issue warnings and make state transitions For clarity, please rename the type (a) "check_for_foo" functions to "is_foo_p" (we use the "_p" suffix to indicate a simple predicate without side-effects), and keep the "check_for_" prefix for the type (b) functions that actually make changes. Hence please rename the above to: fd_state_machine::is_closed_fd_p fd_state_machine::is_invalid_fd_p fd_state_machine::is_constant_fd_p so that the "check_for_" prefix for the functions lower down imply that warnings could be issued and state transitions could occur. > +void > +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const gcall *call) const > +{ > + tree lhs = gimple_call_lhs (call); > + if (lhs) > + { > + tree arg = gimple_call_arg (call, 1); > + if (TREE_CODE (arg) == INTEGER_CST) > + { > + int flag = TREE_INT_CST_LOW (arg); > + enum access_mode mode = get_access_mode_from_flag (flag); > + > + switch (mode) > + { > + case READ_ONLY: > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > + m_unchecked_read_only); > + break; > + case WRITE_ONLY: > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > + m_unchecked_write_only); > + break; > + default: > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > + m_unchecked_read_write); > + } > + } If the user writes code where the flag is determined with logic, then "arg" won't be an INTEGER_CST and we won't transition the lhs; consider e.g.: int flags = O_RDONLY; if (some_condition ()) flags |= O_NOATIME; int fd = open (path, flags) I don't think we need to handle this yet, but let's add test coverage for it, at least. > + } > + else > + { > + /* FIXME: add leak reporting */ > + } > +} > + Please add a testcase for this, with an "xfail" in the dg-warning directive. [...snip...] > +void > +fd_state_machine::on_close (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const gcall *call, > + const tree callee_fndecl) const > +{ > + tree arg = gimple_call_arg (call, 0); > + state_t state = sm_ctxt->get_state (stmt, arg); > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + > + sm_ctxt->on_transition (node, stmt, arg, m_start, m_closed); > + sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_write, > m_closed); > + sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_only, > m_closed); > + sm_ctxt->on_transition (node, stmt, arg, m_unchecked_write_only, > m_closed); > + sm_ctxt->on_transition (node, stmt, arg, m_valid_read_write, m_closed); > + sm_ctxt->on_transition (node, stmt, arg, m_valid_read_only, m_closed); > + sm_ctxt->on_transition (node, stmt, arg, m_valid_write_only, m_closed); > + sm_ctxt->on_transition (node, stmt, arg, m_constant_fd, m_closed); > + > + if (check_for_closed_fd (state)) > + { > + > + sm_ctxt->warn (node, stmt, arg, new double_close (*this, diag_arg)); > + sm_ctxt->set_next_state (stmt, arg, m_stop); > + } > +} > +void > +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const gcall *call, > + const tree callee_fndecl) const > +{ > + check_for_open_fd (sm_ctxt, node, stmt, call, callee_fndecl, true); > +} > +void > +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const gcall *call, > + const tree callee_fndecl) const > +{ > + check_for_open_fd (sm_ctxt, node, stmt, call, callee_fndecl, false); > +} > + > +void > +fd_state_machine::check_for_open_fd (sm_context *sm_ctxt, const supernode > *node, > + const gimple *stmt, const gcall *call, > + const tree callee_fndecl, > + bool toggle) const "bool" arguments to functions are often a "code smell". This will be much clearer if you change "bool toggle" to "enum access_direction access_dir"... > +{ > + tree arg = gimple_call_arg (call, 0); > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + state_t state = sm_ctxt->get_state (stmt, arg); > + enum access_mode mode = get_access_mode_from_state (state); ...and rename "mode" to "fd_mode", since that way it's clear to the reader that we're comparing how the FD is being accessed with how the FD expects to be accessed. > + > + if (check_for_closed_fd (state)) > + { > + sm_ctxt->warn (node, stmt, arg, > + new fd_use_after_close (*this, diag_arg, > callee_fndecl)); > + } > + > + else > + { > + > + if (!is_valid (state)) > + { > + if (!check_for_constant_fd (state)) > + sm_ctxt->warn ( > + node, stmt, arg, > + new unchecked_use_of_fd (*this, diag_arg, callee_fndecl)); > + } > + if (toggle) > + /* This condition determines whether we are considering "read" or > + "write" function */ and so (hopefully) the above comment becomes redundant. > + { > + if (mode == WRITE_ONLY) > + { > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + sm_ctxt->warn (node, stmt, arg, > + new fd_access_mode_mismatch (*this, diag_arg, > mode, > + callee_fndecl)); > + } > + } > + else > + { > + if (mode == READ_ONLY) > + { > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + sm_ctxt->warn (node, stmt, arg, > + new fd_access_mode_mismatch (*this, diag_arg, > mode, > + callee_fndecl)); > + } > + } > + } > +} [...snip...] > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 174bc09e5cf..e5522b95ad2 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi [...snip...] > +@item -Wno-analyzer-fd-access-mode-mismatch > +@opindex Wanalyzer-fd-access-mode-mismatch > +@opindex Wno-analyzer-fd-access-mode-mismatch > +This warning requires @option{-fanalyzer}, which enables it; use > +@option{-Wno-analyzer-fd-access-mode-mismatch} > +to disable it. > + > +This diagnostic warns for paths through code in which a > +@code{read} on write-only file descriptor is attempted, or vice versa Grammar nit: "on write-only" -> "on a write-only" > +@item -Wno-analyzer-fd-double-close > +@opindex Wanalyzer-fd-double-close > +@opindex Wno-analyzer-fd-double-close > +This warning requires @option{-fanalyzer}, which enables it; use > +@option{-Wno-analyzer-fd-double-close} > +to disable it. > + > +This diagnostic warns for paths through code in which a > +file descriptor can closed more than once. Grammar nit: "can closed" -> "can be closed". > +@item -Wno-analyzer-fd-leak > +@opindex Wanalyzer-fd-leak > +@opindex Wno-analyzer-fd-leak > +This warning requires @option{-fanalyzer}, which enables it; use > +@option{-Wno-analyzer-fd-leak} > +to disable it. > + > +This diagnostic warns for paths through code in which a > +file descriptor is leaked. More precise as: "an open file descriptor". [...snip...] > diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-3.c > b/gcc/testsuite/gcc.dg/analyzer/fd-3.c > new file mode 100644 > index 00000000000..d8c2ff26098 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/analyzer/fd-3.c > @@ -0,0 +1,43 @@ > +int open(const char *, int mode); > +void close(int fd); > +int write (int fd, void *buf, int nbytes); > +int read (int fd, void *buf, int nbytes); > + > +#define O_RDONLY 0 > +#define O_WRONLY 1 > +#define O_RDWR 2 > +#define STDIN 0 [...snip...] > +void > +test_4 (void *buf) > +{ > + int fd = STDIN; > + write (fd, buf, 1); > + close(fd); > +} Probably worth adding a comment to this test case, something along the lines of: FD 0 is stdin at the entry to "main" and thus read-only, but we have no guarantees here that it hasn't been closed and then reopened for writing, so we can't issue a warning. or similar. [...snip...] Thanks again for the patch; hope the above makes sense Dave