From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102f.google.com (mail-pj1-x102f.google.com [IPv6:2607:f8b0:4864:20::102f]) by sourceware.org (Postfix) with ESMTPS id DA0A6385AE70 for ; Thu, 23 Jun 2022 18:29:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DA0A6385AE70 Received: by mail-pj1-x102f.google.com with SMTP id cv13so393260pjb.4 for ; Thu, 23 Jun 2022 11:29:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=rdsL9ZdySsKGZiNc86QQ8gDzHITp8UDZCjVixYQkPTM=; b=gQNpAsd48iGs3D2ZLSglF55LIbzDFvGf8eRAJ18lVrcyxqbnc+dztGrO5x6kUDXl7d fKESXwQax8c16BsXCYvSwLHjmDlgHceJWR3eV0cxAfVuj3gj7OnEdRptNFWe3J5fMDtz OpJ2jwHtB++Gyam/jVdGO4/uOcvWH9flOGfsRKxIz5Lwe5OPcGc3pe5n0bB3vnhuA5T2 H7wV3alPWQkLdLCyD3yQj0b/P3QNWfduApSeEu9tNJYyZ5yYkB2NEW0wCS+KqvJC8HgA noQ33RemMYQrB0x+FZPiPzFXv3B/btiSK8V0ALPaaseWjVzSOLWTXFtrzaGTehRzpHad 3oRg== X-Gm-Message-State: AJIora/+emRcuvxR7Q+rMFjSw2w9LhTQffP9bgg05UsCZlv3KEXA7dLS xhfWEOszOwvQzmxzzcbqFZJIL+orQoI2dwj/qC+8AfAG8jBd8g== X-Google-Smtp-Source: AGRyM1trVTaHJKZ+rafbOFmEZo2spxeq/L044pe09ZNuIbJGQiHs8pLHlDqk8HIiG4A3y1t+bQ0mEC6uQPwOLDQsufg= X-Received: by 2002:a17:90b:3b82:b0:1ec:ac0d:5d3b with SMTP id pc2-20020a17090b3b8200b001ecac0d5d3bmr5412623pjb.158.1656008944660; Thu, 23 Jun 2022 11:29:04 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Mir Immad Date: Thu, 23 Jun 2022 23:58:53 +0530 Message-ID: Subject: Re: [PATCH] static analysis support for posix file desccriptor APIs To: David Malcolm , gcc@gcc.gnu.org X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM, GIT_PATCH_0, HTML_MESSAGE, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jun 2022 18:29:14 -0000 Hi Dave, Thanks for the suggestions, I changed most of the things that you suggested, however reporting for warnings like close of known invalid fd was problematic: consider the following code: if (fd >= 0) { write (fd,...); } close(fd); As I was checking the exploded graph for this; the "close(fd)" stmt when visited by the FALSE edge of if stmt (fd < 0) finds fd to be in m_invalid state; hence warns about "close on known invalid fd" which I believe is not true as fd at that point is not *known* to be invalid. I spent quite some time on this and decided not to add this diagnosis for now. Also, when close transitions m_invalid to m_close; this leads to double close even when the second close is outside the ( < 0 ) condition which again does not seem right. if (fd < 0) close(fd): close(fd); // double close here. > Maybe consolidate on_read and on_write by adding a subroutine that > checks for m_closed, and for checking access mode (maybe a > "check_for_open_fd" that takes an access mode enum value amongst other > params. If you pass around caller_fndecl, I think much of this code > can be shared that way between on_read and on_write (which will help > simplify things if we want to support further functions) I hope I got this right. > > + } > > + } > > + else > > + { > >+ /* FIXME: add leak reporting */ > Do you have a testcase that exhibits this behavior? I was thinking of the following case: void test() { open(..); } Here the resources are leaked because there is no way to free them. In "read" and "write" funcs, I'm warning for unchecked_use_of_fd and access_mode_mismatch only when we know fd is not in closed state. Otherwise, such code leads to lot of irrelevant warnings, example: void test1(const char *path, void *buf) { int fd = open(path, O_RDONLY); if (fd >= 0) { close(fd); read(fd, buf, 1); // read on closed fd + read on possibly invalid fd write(fd, buf, 1); // write on closed fd + write on possibly invlid fd } } Adding docs for new options still remains pending. I added some new tests; and all tests are passing. The stuff about O_* macros is left as-is. I'm sending a patch in another email. Thanks a lot. On Wed, Jun 22, 2022 at 12:04 AM David Malcolm wrote: > Hi Immad, thanks for this patch. > > Overall, looks like you're making good progress. > > Various notes and nitpicks inline below, throughout... > > On Tue, 2022-06-21 at 22:00 +0530, Mir Immad wrote: > > [...snip...] > > > > > diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt > > index 23dfc797cea..d99acfbb069 100644 > > --- a/gcc/analyzer/analyzer.opt > > +++ b/gcc/analyzer/analyzer.opt > > @@ -54,6 +54,10 @@ The minimum number of supernodes within a function > > for > > the analyzer to consider > > Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump) > > Init(200) Param > > The maximum depth of exploded nodes that should appear in a dot dump > > before switching to a less verbose format. > > I'm not yet sure if this is a good idea, but I wonder if all of these > warnings ought to have a "-Wanalyzer-fd-" prefix? "file-descriptor" is > rather long, and the analyzer's warnings are already tending to be on > the long side, and having a consistent prefix might make it easier for > users to grok them. > > (though this feels like a "what color do we paint the bikeshed?" issue; > see e.g. https://bikeshed.org/ ) > > > > > +Wanalyzer-double-close > > +Common Var(warn_analyzer_double_close) Init(1) Warning > > +Warn about code paths in which a file descriptor can be closed more > > than > > once. > > ...so this could be Wanalyzer-fd-double-close > > > + > > Wanalyzer-double-fclose > > Common Var(warn_analyzer_double_fclose) Init(1) Warning > > Warn about code paths in which a stdio FILE can be closed more than > > once. > > @@ -66,6 +70,10 @@ Wanalyzer-exposure-through-output-file > > Common Var(warn_analyzer_exposure_through_output_file) Init(1) Warning > > Warn about code paths in which sensitive data is written to a file. > > > > +Wanalyzer-file-descriptor-leak > > +Common Var(warn_analyzer_file_descriptor_leak) Init(1) Warning > > +Warn about code paths in which a file descriptor is not closed. > > ...and Wanalyzer-fd-leak > > > + > > Wanalyzer-file-leak > > Common Var(warn_analyzer_file_leak) Init(1) Warning > > Warn about code paths in which a stdio FILE is not closed. > > @@ -82,6 +90,14 @@ Wanalyzer-mismatching-deallocation > > Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning > > Warn about code paths in which the wrong deallocation function is > > called. > > > > +Wanalyzer-mismatching-operation-on-file-descriptor > > +Common Var(warn_analyzer_mismatching_operation_on_file_descriptor) > > Init(1) > > Warning > > +Warn about the code paths in which read on write-only file descriptor > > or > > write on read-only file descriptor is called. > > Maybe: > Wanalyzer-fd-access-mode-mismatch > ? > > Lose the "the" in "the code paths", so maybe: "Warn about code paths in > which a write is attempted on a read-only file descriptor, or vice- > versa." > > > > + > > +Wanalyzer-possible-invalid-file-descriptor > > +Common Var(warn_analyzer_possible_invalid_file_descriptor) Init(1) > > Warning > > +warn about code paths in which a possibly invalid file descriptor is > > passed to a must-be-a-valid file descriptor function argument. > > + > > Wanalyzer-fd-use-without-check ? > > FWIW I believe that this isn't strictly speaking a security issue, that > if code blithely assumes the fd is open and then tries to read/write > it, the access will fail with errno set to EBADF. But it seems worth > warning for; code that doesn't bother implementing error-checking seems > suspect to me. > > > > > > > +Wanalyzer-read-write-on-closed-file-descriptor > > Maybe: Wanalyzer-fd-use-after-close ? > > > +Common Var(warn_analyzer_read_write_on_closed_file_descriptor) Init(1) > > Warning > > +Warn about code paths in which a read on write in performed on a > > closed > > file descriptor. > > Typo "read on write"; maybe replace with "Warn about code paths in > which a file descriptor is used after being closed." ? > > > > [...snip...] > > > + > > diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc > > new file mode 100644 > > index 00000000000..23e79e3e16a > > --- /dev/null > > +++ b/gcc/analyzer/sm-fd.cc > > @@ -0,0 +1,697 @@ > > +/* FIXME: add copyright header. */ > > Obviously the FIXME needs fixing :) > > > + > > +#include "config.h" > > +#include "system.h" > > +#include "coretypes.h" > > +#include "tree.h" > > +#include "function.h" > > +#include "basic-block.h" > > +#include "gimple.h" > > +#include "options.h" > > +#include "diagnostic-path.h" > > +#include "diagnostic-metadata.h" > > +#include "function.h" > > +#include "json.h" > > +#include "analyzer/analyzer.h" > > +#include "diagnostic-event-id.h" > > +#include "analyzer/analyzer-logging.h" > > +#include "analyzer/sm.h" > > +#include "analyzer/pending-diagnostic.h" > > +#include "analyzer/function-set.h" > > +#include "analyzer/analyzer-selftests.h" > > +#include "tristate.h" > > +#include "selftest.h" > > +#include "analyzer/call-string.h" > > +#include "analyzer/program-point.h" > > +#include "analyzer/store.h" > > +#include "analyzer/region-model.h" > > + > > +#include > > Presumably you're including here for the definitions of > O_RDONLY and O_WRONLY. > > As we discussed off-list, I think we want to avoid the use of the > host's O_RDONLY and O_WRONLY in this file, as they might not have the > same values as on the target, and I'm not sure that the host is even > guaranteed to have a . > > I think we want our own enum representing the three access modes: > READ_WRITE, READ_ONLY, WRITE_ONLY, and a subroutine for converting > target flags to access mode. > > So ultimately that's something we want to fix, though exactly how, I'm > not quite sure; we presumably want to look up the target's definitions > of those macros - but I don't think the analyzer has access to the > cpp_reader instance from the frontend. > > > > > +#if ENABLE_ANALYZER > > + > > +namespace ana > > +{ > > + > > +namespace > > +{ > > +class fd_state_machine : public state_machine > > +{ > > +public: > > + fd_state_machine (logger *logger); > > + > > + bool > > + inherited_state_p () const final override > > + { > > + return false; > > + } > > + > > + state_machine::state_t > > + get_default_state (const svalue *sval) const final override > > + { > > + if (tree cst = sval->maybe_get_constant ()) > > + { > > + int val = TREE_INT_CST_LOW (cst); > > + if (val < 0) > > + { > > + return m_null; > > Can we rename the invalid state to "m_invalid"? I find m_null > confusing. > > > + } > > + } > > + return m_start; > > + } > > + > > + bool on_stmt (sm_context *sm_ctxt, const supernode *node, > > + const gimple *stmt) const final override; > > + > > + void on_condition (sm_context *sm_ctxt, const supernode *node, > > + const gimple *stmt, const svalue *lhs, const > > tree_code op, > > + const svalue *rhs) const final override; > > + > > + bool can_purge_p (state_t s) const final override; > > + pending_diagnostic *on_leak (tree var) const final override; > > + > > + bool is_unchecked (state_t s) const; > > + bool is_valid (state_t s) const; > > + > > + /* State for a file descriptor that hasn't been checked for > > validity. */ > > + state_t m_unchecked_read_write; > > + > > + /* State for a file descriptor opened with O_RDONLY flag. */ > > + state_t m_unchecked_read_only; > > + > > + /* State for a file descriptor opneded with O_WRONLY flag. */ > > + state_t m_unchecked_write_only; > > Typo: opneded -> opened. > > Maybe group these three states in the source like this: > > /* States representing a file descriptor that hasn't yet been > checked for validity after opening, for three different > access modes. */ > state_t m_unchecked_read_write; > state_t m_unchecked_read_only; > state_t m_unchecked_write_only; > > > + > > + /* State for a file descriptor that is known to be invalid. */ > > + state_t m_null; > > Let's rename this to m_invalid; > The condition is that it's < 0, right? If so, please update the > comment to clarify that. > > > + > > + /* State for a file descriptor that was opened in read-write mode > > and is > > known > > + * to be valid. */ > > + state_t m_valid_read_write; > > + > > + /* State for a file descriptor that was opened as read-only and is > > known > > to be > > + * valid*/ > > + state_t m_valid_read_only; > > + > > + /* State for a file descriptor that was opened as write-only and is > > known to > > + * be valid*/ > > + state_t m_valid_write_only; > > Maybe group these three states in a similar way to above. > > > > + > > + /* State for a file descriptor that has been closed.*/ > > + state_t m_closed; > > + > > + /* State for a file descriptor that we do not want to track anymore > > . */ > > + state_t m_stop; > > + > > +private: > > + void on_open (sm_context *sm_ctxt, const supernode *node, const > > gimple > > *stmt, > > + const gcall *call) const; > > + void on_close (sm_context *sm_ctxt, const supernode *node, const > > gimple > > *stmt, > > + const gcall *call) const; > > + void on_read (sm_context *sm_ctxt, const supernode *node, const > > gimple > > *stmt, > > + const gcall *call) const; > > + void on_write (sm_context *sm_ctxt, const supernode *node, const > > gimple > > *stmt, > > + const gcall *call) const; > > +}; > > + > > +/* Base diagnostic class relative to fd_state_machine. */ > > +class fd_diagnostic : public pending_diagnostic > > +{ > > +public: > > + fd_diagnostic (const fd_state_machine &sm, tree arg) : m_sm (sm), > > m_arg > > (arg) > > + { > > + } > > + > > + bool > > + subclass_equal_p (const pending_diagnostic &base_other) const > > override > > + { > > + return same_tree_p (m_arg, ((const fd_diagnostic > > &)base_other).m_arg); > > + } > > + > > + label_text > > + describe_state_change (const evdesc::state_change &change) override > > + { > > + if (change.m_old_state == m_sm.get_start_state () > > + && m_sm.is_unchecked (change.m_new_state)) > > + { > > + if (change.m_new_state == m_sm.m_unchecked_read_write) > > + return change.formatted_print ("opened here as % > write%>"); > > + > > + if (change.m_new_state == m_sm.m_unchecked_read_only) > > + return change.formatted_print ("opened here as % > only%>"); > > + > > + if (change.m_new_state == m_sm.m_unchecked_write_only) > > + return change.formatted_print ("opened here as % > only%>"); > > + } > > + > > + if (m_sm.is_unchecked (change.m_old_state) > > + && m_sm.is_valid (change.m_new_state)) > > + if (change.m_expr) > > + return change.formatted_print ( > > + "assuming %qE is a valid file descriptor", change.m_expr); > > + else > > + return change.formatted_print ("assuming a valid file > > descriptor"); > > Maybe reword to "when %qE is a valid file descriptor (>= 0)" ? > > > > + > > + if (m_sm.is_unchecked (change.m_old_state) > > + && change.m_new_state == m_sm.m_null) > > + if (change.m_expr) > > + return change.formatted_print ( > > + "assuming %qE is an invalid file descriptor", > > change.m_expr); > > + else > > + return change.formatted_print ("assuming an invalid file > > descriptor"); > > Maybe reword to "when %qE is not a valid file descriptor (< 0)" ? > > > > + > > + return label_text (); > > + } > > + > > +protected: > > + const fd_state_machine &m_sm; > > + tree m_arg; > > +}; > > + > > +class fd_leak : public fd_diagnostic > > +{ > > +public: > > + fd_leak (const fd_state_machine &sm, tree arg) : fd_diagnostic (sm, > > arg) > > {} > > + > > + const char * > > + get_kind () const final override > > + { > > + return "fd_leak"; > > + } > > + > > + int > > + get_controlling_option () const final override > > + { > > + return OPT_Wanalyzer_file_descriptor_leak; > > + } > > + > > + bool > > + emit (rich_location *rich_loc) final override > > + { > > + /*CWE-775: Missing Release of File Descriptor or Handle after > > Effective > > + * Lifetime > > + */ > > + diagnostic_metadata m; > > + m.add_cwe (775); > > + if (m_arg) > > + return warning_meta (rich_loc, m, get_controlling_option (), > > + "leak of file descriptor %qE", m_arg); > > + else > > + return warning_meta (rich_loc, m, get_controlling_option (), > > + "leak of file descriptor"); > > + } > > + > > + label_text > > + describe_state_change (const evdesc::state_change &change) final > > override > > + { > > + if (m_sm.is_unchecked (change.m_new_state)) > > + { > > + m_open_event = change.m_event_id; > > + return label_text::borrow ("opened here"); > > + } > > + > > + return fd_diagnostic::describe_state_change (change); > > + } > > + > > + label_text > > + describe_final_event (const evdesc::final_event &ev) final override > > + { > > + if (m_open_event.known_p ()) > > + { > > + if (ev.m_expr) > > + return ev.formatted_print ("%qE leaks here; was opened at > > %@", > > + ev.m_expr, &m_open_event); > > + else > > + return ev.formatted_print ("leaks here; was opened at %@", > > + &m_open_event); > > + } > > + else > > + { > > + if (ev.m_expr) > > + return ev.formatted_print ("%qE leaks here", ev.m_expr); > > + else > > + return ev.formatted_print ("leaks here"); > > + } > > + } > > + > > +private: > > + diagnostic_event_id_t m_open_event; > > +}; > > + > > +class read_write_diag_fd : public fd_diagnostic > > Maybe rename to fd_access_mode_mismatch ? > > > > +{ > > +public: > > + /* mode specfies whether read on write was attempted or vice versa. > > + */ > > Make "mode" be an enum rather than an int. > > > + read_write_diag_fd (const fd_state_machine &sm, tree arg, int mode) > > + : m_mode (mode), fd_diagnostic (sm, arg) > > + { > > + } > > + > > + const char * > > + get_kind () const final override > > + { > > + return "read_write_diag_fd"; > > + } > > + > > + > > + int > > + get_controlling_option () const final override > > + { > > + return OPT_Wanalyzer_mismatching_operation_on_file_descriptor; > > + } > > + > > + bool > > + emit (rich_location *rich_loc) final override > > + { > > + if (m_mode) > > + { > > + return warning_at (rich_loc, get_controlling_option (), > > + "% on % file descriptor > > %qE", > > + m_arg); > > + } > > + else > > + { > > + return warning_at (rich_loc, get_controlling_option (), > > + "% on % file descriptor > > %qE", > > + m_arg); > > Rather than hardcoding "%" and "%", let's generalize > this so that the diagnostic has a "tree fndecl" argument and m_fndecl > field, which was the function that was called that attempted to used > the fd in the wrong way. You can then write %qE to print it in quotes. > > > + } > > + } > > + > > + label_text > > + describe_state_change (const evdesc::state_change &change) override > > + { > > + return fd_diagnostic::describe_state_change (change); > > + } > > Looks like this override is redundant (albeit harmless): it can > implicitly inherit the fd_diagnostic implementation. > > > > + > > + label_text > > + describe_final_event (const evdesc::final_event &ev) final override > > + { > > + if (m_mode) > > + return ev.formatted_print ("%qs on % file descriptor > > %qE", > > + "write", m_arg); > > + else > > + return ev.formatted_print ("%qs on % file > > descriptor > > %qE", > > + "read", m_arg); > > + } > > and generalize this also to work on m_fndecl. > > > + > > +private: > > + int m_mode; > > As noted above, please make this an enum. > > > > +}; > > + > > +class double_close : public fd_diagnostic > > +{ > > +public: > > + double_close (const fd_state_machine &sm, tree arg) : fd_diagnostic > > (sm, > > arg) > > + { > > + } > > + > > + const char * > > + get_kind () const final override > > + { > > + return "double_close"; > > + } > > + > > + > > + int > > + get_controlling_option () const final override > > + { > > + return OPT_Wanalyzer_double_close; > > + } > > + bool > > + emit (rich_location *rich_loc) final override > > + { > > + diagnostic_metadata m; > > + // CWE-1341: Multiple Releases of Same Resource or Handle > > + m.add_cwe (1341); > > + return warning_meta (rich_loc, m, get_controlling_option (), > > + "double % of file descriptor %qE", > > m_arg); > > + } > > + > > + label_text > > + describe_state_change (const evdesc::state_change &change) override > > + { > > + if (m_sm.is_unchecked (change.m_new_state)) > > + return label_text::borrow ("opened here"); > > + > > + if (change.m_new_state == m_sm.m_closed) > > + { > > + m_first_close_event = change.m_event_id; > > + return change.formatted_print ("first %qs here", "close"); > > + } > > + return fd_diagnostic::describe_state_change (change); > > + } > > + > > + label_text > > + describe_final_event (const evdesc::final_event &ev) final override > > + { > > + if (m_first_close_event.known_p ()) > > + return ev.formatted_print ("second %qs here; first %qs was at > > %@", > > + "close", "close", > > &m_first_close_event); > > + return ev.formatted_print ("second %qs here", "close"); > > + } > > + > > +private: > > + diagnostic_event_id_t m_first_close_event; > > +}; > > + > > +class read_write_on_closed_fd : public fd_diagnostic > > Maybe rename to "use_after_close_of_fd" or somesuch. > > > +{ > > +public: > > + read_write_on_closed_fd (const fd_state_machine &sm, tree arg, int > > mode) > > + : fd_diagnostic (sm, arg), m_mode (mode) > > + { > > + } > > + > > + const char * > > + get_kind () const final override > > + { > > + if (m_mode) > > + return "write_on_closed_fd"; > > + else > > + return "read_on_closed_fd"; > > + } > > + > > + > > + int > > + get_controlling_option () const final override > > + { > > + return OPT_Wanalyzer_read_write_on_closed_file_descriptor; > > + } > > + > > + bool > > + emit (rich_location *rich_loc) final override > > + { > > + if (m_mode) > > + return warning_at (rich_loc, get_controlling_option (), > > + "% on closed file descriptor %qE", > > m_arg); > > + else > > + return warning_at (rich_loc, get_controlling_option (), > > + "% on closed file descriptor %qE", > > m_arg); > > + } > > As above, maybe generalize this by adding a "tree m_fndecl;" field to > the class, giving what function was called on the the closed fd. I > think you might be able to eliminate m_mode from this subclass by doing > that instead. > > > + > > + label_text > > + describe_state_change (const evdesc::state_change &change) override > > + { > > + if (m_sm.is_unchecked (change.m_new_state)) > > + { > > + return label_text::borrow ("opened here"); > > + } > > + > > + if (change.m_new_state == m_sm.m_closed) > > + { > > + return change.formatted_print ("closed here"); > > + } > > + > > + return fd_diagnostic::describe_state_change (change); > > + } > > + > > + label_text > > + describe_final_event (const evdesc::final_event &ev) final override > > + { > > + if (m_mode) > > + return ev.formatted_print ("% on closed file descriptor > > %qE > > here", > > + m_arg); > > + else > > + return ev.formatted_print ("% on closed file descriptor > > %qE > > here", > > + m_arg); > > + } > > Likewise here. > > > + > > +private: > > + int m_mode; > > +}; > > + > > +class possible_invalid_fd_diag : public fd_diagnostic > > Rename to unchecked_use_of_fd ? > > > +{ > > +public: > > + possible_invalid_fd_diag (const fd_state_machine &sm, tree arg, int > > mode) > > + : fd_diagnostic (sm, arg), m_mode (mode) > > + { > > + } > > + > > + const char * > > + get_kind () const final override > > + { > > + return "possible_invalid_fd_diag"; > > + } > > + > > + > > + int > > + get_controlling_option () const final override > > + { > > + return OPT_Wanalyzer_possible_invalid_file_descriptor; > > + } > > + > > + bool > > + emit (rich_location *rich_loc) final override > > + { > > + if (m_mode) > > + return warning_at (rich_loc, get_controlling_option (), > > + "% on possibly invalid file > > descriptor > > %qE", > > + m_arg); > > + else > > + return warning_at (rich_loc, get_controlling_option (), > > + "% on possibly invalid file descriptor > > %qE", > > + m_arg); > > + } > > As above, generalize this with a "tree m_fndecl;" capturing what fndecl > was called on the fd without checking, and with that, I think you can > eliminate the m_mode field. > > > > + > > + label_text > > + describe_state_change (const evdesc::state_change &change) override > > + { > > + if (m_sm.is_unchecked (change.m_new_state)) > > + { > > + m_first_open_event = change.m_event_id; > > + return label_text::borrow ("opened here"); > > + } > > + > > + return fd_diagnostic::describe_state_change (change); > > + } > > + > > + label_text > > + describe_final_event (const evdesc::final_event &ev) final override > > + { > > + return ev.formatted_print ("%qE could be invalid: unchecked value > > from > > %@", > > + m_arg, &m_first_open_event); > > + } > > + > > +private: > > + diagnostic_event_id_t m_first_open_event; > > + int m_mode; > > +}; > > + > > +fd_state_machine::fd_state_machine (logger *logger) > > + : state_machine ("file-descriptor", logger), > > + m_unchecked_read_write (add_state ("fd-unchecked-read-write")), > > + m_unchecked_read_only (add_state ("fd-unchecked-read-only")), > > + m_unchecked_write_only (add_state ("fd-unchecked-write-only")), > > + m_null (add_state ("fd-null")), m_closed (add_state ("fd- > > closed")), > > + m_valid_read_write (add_state ("fd-valid-read-write")), > > + m_valid_read_only (add_state ("fd-valid-read-only")), > > + m_valid_write_only (add_state ("fd-valid-write-only")), > > + m_stop (add_state ("fd-stop")) > > +{ > > +} > > + > > +bool > > +fd_state_machine::is_unchecked (state_t s) const > > +{ > > + return s == m_unchecked_read_write || s == m_unchecked_read_only > > + || s == m_unchecked_write_only; > > With multiline conditionals, once they occupy multiple lines, it's more > readable to split them like this: > > return (s == m_unchecked_read_write > || s == m_unchecked_read_only > || s == m_unchecked_write_only); > > > > +} > > + > > +bool > > +fd_state_machine::is_valid (state_t s) const > > +{ > > + return s == m_valid_read_write || s == m_valid_read_only > > + || s == m_valid_write_only; > > +} > > Similar here. > > > + > > +bool > > +fd_state_machine::on_stmt (sm_context *sm_ctxt, const supernode *node, > > + const gimple *stmt) const > > +{ > > + if (const gcall *call = dyn_cast (stmt)) > > + if (tree callee_fndecl = sm_ctxt->get_fndecl_for_call (call)) > > + { > > + if (is_named_call_p (callee_fndecl, "open", call, 2)) > > + { > > + on_open (sm_ctxt, node, stmt, call); > > + return true; > > + } // "open" > > + > > + if (is_named_call_p (callee_fndecl, "close", call, 1)) > > + { > > + on_close (sm_ctxt, node, stmt, call); > > + return true; > > + } // "close" > > + > > + if (is_named_call_p (callee_fndecl, "write", call, 3)) > > + { > > + on_write (sm_ctxt, node, stmt, call); > > + return true; > > + } // "write" > > + > > + if (is_named_call_p (callee_fndecl, "read", call, 3)) > > + { > > + on_read (sm_ctxt, node, stmt, call); > > + return true; > > + } // "read" > > + > > You'll probably want to pass callee_fndecl to some of these "on_FNNAME" > subroutines, so that it's available for the new m_fndecl fields of the > warnings I suggested above. > > > > + return true; > > + } > > + > > + return false; > > +} > > + > > +void > > +fd_state_machine::on_open (sm_context *sm_ctxt, const supernode *node, > > + const gimple *stmt, const gcall *call) > > const > > +{ > > + tree lhs = gimple_call_lhs (call); > > + if (lhs) > > + { > > + tree arg = gimple_call_arg (call, 1); > > + if (TREE_CODE (arg) == INTEGER_CST) > > + { > > + int flag = TREE_INT_CST_LOW (arg); > > + /* FIXME: this code assumes the access modes on the host and > > + target are the same, which in practice might not be the > > case. */ > > + if (flag == O_RDONLY) > > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > > + m_unchecked_read_only); > > + else if (flag == O_WRONLY) > > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > > + m_unchecked_write_only); > > + else > > + sm_ctxt->on_transition (node, stmt, lhs, m_start, > > + m_unchecked_read_write); > > Let's introduce an enum for the access mode, and a subroutine to get > the enum value from "flag", so that we can isolate this "FIXME" in that > subroutine. > > > + } > > + } > > + else > > + { > > + /* FIXME: add leak reporting */ > > Do you have a testcase that exhibits this behavior? > > > > + } > > +} > > + > > +void > > +fd_state_machine::on_close (sm_context *sm_ctxt, const supernode > > *node, > > + const gimple *stmt, const gcall *call) > > const > > +{ > > + tree arg = gimple_call_arg (call, 0); > > + > > + sm_ctxt->on_transition (node, stmt, arg, m_start, m_closed); > > + > > + sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_write, > > m_closed); > > + sm_ctxt->on_transition (node, stmt, arg, m_unchecked_read_only, > > m_closed); > > + sm_ctxt->on_transition (node, stmt, arg, m_unchecked_write_only, > > m_closed); > > + sm_ctxt->on_transition (node, stmt, arg, m_valid_read_write, > > m_closed); > > + sm_ctxt->on_transition (node, stmt, arg, m_valid_read_only, > > m_closed); > > + sm_ctxt->on_transition (node, stmt, arg, m_valid_write_only, > > m_closed); > > > > + sm_ctxt->on_transition (node, stmt, arg, m_null, m_closed); > > Perhaps we should warn for operations on known invalid FDs? (and thus > complain about close on a known-invalid FD). > > I believe that the OS will handle such code gracefully by setting errno > to EBADF. > > > > + > > + if (sm_ctxt->get_state (stmt, arg) == m_closed) > > + { > > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > > + sm_ctxt->warn (node, stmt, arg, new double_close (*this, > > diag_arg)); > > + sm_ctxt->set_next_state (stmt, arg, m_stop); > > + } > > +} > > +void > > +fd_state_machine::on_read (sm_context *sm_ctxt, const supernode *node, > > + const gimple *stmt, const gcall *call) > > const > > +{ > > + tree arg = gimple_call_arg (call, 0); > > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > > + > > + if (sm_ctxt->get_state (stmt, arg) == m_closed) > > + { > > + sm_ctxt->warn (node, stmt, arg, new read_write_on_closed_fd > > (*this, > > diag_arg, 0)); > > + } > > Maybe consolidate on_read and on_write by adding a subroutine that > checks for m_closed, and for checking access mode (maybe a > "check_for_open_fd" that takes an access mode enum value amongst other > params. If you pass around caller_fndecl, I think much of this code > can be shared that way between on_read and on_write (which will help > simplify things if we want to support further functions). > > > + > > + if (!is_valid (sm_ctxt->get_state (stmt, arg))) > > + { > > + sm_ctxt->warn (node, stmt, arg, > > + new possible_invalid_fd_diag (*this, diag_arg, > > 0)); > > + } > > + > > + if (sm_ctxt->get_state (stmt, arg) == m_unchecked_write_only > > + || sm_ctxt->get_state (stmt, arg) == m_valid_write_only) > > + { > > + sm_ctxt->warn (node, stmt, arg, > > + new read_write_diag_fd (*this, diag_arg, 0)); > > + } > > +} > > +void > > +fd_state_machine::on_write (sm_context *sm_ctxt, const supernode > > *node, > > + const gimple *stmt, const gcall *call) > > const > > +{ > > + tree arg = gimple_call_arg (call, 0); > > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > > + > > + if (sm_ctxt->get_state (stmt, arg) == m_closed) > > + { > > + sm_ctxt->warn (node, stmt, arg, new read_write_diag_fd (*this, > > diag_arg, 1)); > > + } > > + if (!is_valid (sm_ctxt->get_state (stmt, arg))) > > + { > > + sm_ctxt->warn (node, stmt, arg, > > + new possible_invalid_fd_diag (*this, diag_arg, > > 1)); > > + } > > + > > + if (sm_ctxt->get_state (stmt, arg) == m_unchecked_read_only > > + || sm_ctxt->get_state (stmt, arg) == m_valid_read_only) > > + { > > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > > + sm_ctxt->warn (node, stmt, arg, > > + new read_write_diag_fd (*this, diag_arg, 1)); > > + } > > +} > > + > > +void > > +fd_state_machine::on_condition (sm_context *sm_ctxt, const supernode > > *node, > > + const gimple *stmt, const svalue *lhs, > > + enum tree_code op, const svalue *rhs) > > const > > +{ > > + if (!rhs->all_zeroes_p ()) > > + return; > > + > > + if (op == GE_EXPR) > > + { > > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write, > > + m_valid_read_write); > > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only, > > + m_valid_read_only); > > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only, > > + m_valid_write_only); > > + } > > + else if (op == LT_EXPR) > > + { > > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write, > > m_null); > > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only, > > m_null); > > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only, > > m_null); > > + } > > +} > > + > > +bool > > +fd_state_machine::can_purge_p (state_t s) const > > +{ > > + if (is_unchecked (s) || is_valid (s)) > > + return false; > > + else > > + return true; > > +} > > + > > +pending_diagnostic * > > +fd_state_machine::on_leak (tree var) const > > +{ > > + return new fd_leak (*this, var); > > +} > > +} // namespace > > + > > +state_machine * > > +make_fd_state_machine (logger *logger) > > +{ > > + return new fd_state_machine (logger); > > +} > > +} // namespace ana > > + > > +#endif // ENABLE_ANALYZER > > \ No newline at end of file > > [...snip...] > > > > > diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-1.c > > b/gcc/testsuite/gcc.dg/analyzer/fd-1.c > > new file mode 100644 > > index 00000000000..985e9ac75de > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/analyzer/fd-1.c > > @@ -0,0 +1,26 @@ > > +int open(const char *, int mode); > > +#define O_RDONLY 0 > > +#define O_WRONLY 1 > > +#define O_RDWR 2 > > As noted above, we want to somehow use the values for the O_ macros > from the test code, rather than the with which sm-fd.cc was > built. > > > + > > +void > > +test_1 (const char *path) > > +{ > > + int fd = open (path, O_RDONLY); /* { dg-message "\\(1\\) opened > > here" } > > */ > > + return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE- > > 775\\\]" > > "warning" } */ > > + /* { dg-message "\\(2\\) 'fd' leaks here; was opened at \\(1\\)" > > "event" > > { target *-*-* } .-1 } */ > > +} > > + > > +void > > +test_2 (const char *path) > > +{ > > + int fd = open (path, O_RDWR); /* { dg-message "\\(1\\) opened here" > > } */ > > + if (fd >= 0) /* { dg-message "\\(2\\) assuming 'fd' is a valid file > > descriptor" "event1" } */ > > + /* { dg-message "\\(3\\) following 'true' branch \\(when 'fd >= > > 0'\\)..." "event2" { target *-*-* } .-1 } */ > > + { > > + return; /* { dg-warning "leak of file descriptor 'fd' \\\[CWE- > > 775\\\]" > > "warning" } */ > > + /* { dg-message "\\(4\\) ...to here" "event1" { target *-*-* } .-1 > > } */ > > + /* { dg-message "\\(5\\) 'fd' leaks here; was opened at \\(1\\)" > > "event2" { target *-*-* } .-2 } */ > > + } > > +} > > Looks good, but in future don't feel the need to express all of the > events: I think the significant ones we should test for here are (1), > (2), (5), whereas specifying directives for (3) and (4) are probably > overkill. prune.exp will prune away stray lines beginning with "note: > " so we don't need to write directives for all events, just the ones we > care about. > > > [...snip...] > > Lots of good test coverage here. With a state-machine based > diagnostic, it's probably best to try to have coverage of all possible > calls in all possible states - that way we can think systematically > about what we should be warning for, and what we shouldn't be warning > for. Note that once you've got test coverage of the basic events > (like you have), you don't have to have dg-message directives for all > of the events in all of the rest of the tests (that would be excessive, > and too much work to create and maintain). > > A few other nits: > - the Subject line for the commit should have an "analyzer: " prefix > and reference the bugzilla problem report (PR 106003) > - for better or worse, the project requires ChangeLog entries. You can > use the script ./contrib/mklog.py to help generate this. Some more > information is here: > https://www.gnu.org/prep/standards/html_node/Change-Logs.html > In GCC our ChangeLog entries are part of the git commit message, and we > have a serverside script to update the ChangeLog files in the source > tree (to avoid constantly having merge conflicts). > > I think we'll have at least one more iteration of this patch to review, > and before I can approve you to push a later version of the patch to > trunk, you'll need to "fully" test it, by doing a bootstrap, and > running the full testsuite on the result. Do you need some help on how > to do that? Do you have access to the GCC compile farm yet? > > I'll see if I can figure out how to access the preprocessor from the > analyzer. > > Thanks again for the patch; hope the above makes sense and is > constructive > > Dave > > > > >