From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 28DC63858D32 for ; Mon, 27 Jun 2022 20:55:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 28DC63858D32 Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-106-FGjOSpJ7MjWtyIY9q7jkxw-1; Mon, 27 Jun 2022 16:55:36 -0400 X-MC-Unique: FGjOSpJ7MjWtyIY9q7jkxw-1 Received: by mail-qk1-f198.google.com with SMTP id 194-20020a3704cb000000b006af069d1e0aso9999816qke.15 for ; Mon, 27 Jun 2022 13:55:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:subject:from:to:date:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=rjlPfw3eGQM0VYEWBvszyqbotze73ZDEAEN7+viFR3w=; b=cdnrfpai1/CfDf85ToWKhOR5O8xZMXXZ1mS+ovZNYfz+o9m07NWhlKNx7Mn8SsIvkF z7jLDecFObZPvMR87gmFZoxS8qequvnPHi1YtjFoZZRELBLK1HhFSe0ra7kgW1JI2NF2 Ym04r0OO4re8kL6F7wO05xjAihwCvAypktybkaeSrChaZpNJ7NtMPQPRxVd83hLQ/pO1 U3EcAlqIJoOZrUKJ186JhxddWt29rLP9gU5CUKtC/gqIupBpUtI0Tod8YZS1GFUYxXej 97K3ndHuW7NRIPFfKnQ6uJmpyxz3rynIORoO7WZoaq0QpY3rbDXUZZcy6V78PalR9zmq buAw== X-Gm-Message-State: AJIora/4KNDiC/FubiZzgsB6ejUHcGp7Y0vCMHuL2HhxzusKxUzBvMEj ppV8eY4bh+0iHJPFw/w18g4DmTJmsOlngkNn28kpxsLo7GqJ2vREBBJV5lfsH4t0+3mhnoAMvnU ghznTQtM= X-Received: by 2002:a05:622a:1704:b0:319:7bd9:dbaf with SMTP id h4-20020a05622a170400b003197bd9dbafmr6950043qtk.659.1656363335538; Mon, 27 Jun 2022 13:55:35 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vvJ9mCUENHZH9jA8iyyLQAOsXTQApVXTRFi2LLlSaG30GnLlU7foQNo32SSUYqKocDzn6Rmg== X-Received: by 2002:a05:622a:1704:b0:319:7bd9:dbaf with SMTP id h4-20020a05622a170400b003197bd9dbafmr6950025qtk.659.1656363335082; Mon, 27 Jun 2022 13:55:35 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.nh.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id 20-20020ac85654000000b00304ec60f711sm7549086qtt.39.2022.06.27.13.55.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 27 Jun 2022 13:55:34 -0700 (PDT) Message-ID: <9236f59ebb2710a9209c11f6e4ce58ef0588b844.camel@redhat.com> Subject: Re: [PATCH] analyzer PR 106003 From: David Malcolm To: Mir Immad , gcc@gcc.gnu.org Date: Mon, 27 Jun 2022 16:55:33 -0400 In-Reply-To: References: User-Agent: Evolution 3.38.4 (3.38.4-1.fc33) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jun 2022 20:55:40 -0000 Thanks for the updated patch. Various notes below; mostly nits, but I realized there's a logic error in fd_state_machine::on_condition that I hadn't spotted before... [...snip...] > diff --git a/gcc/analyzer/ChangeLog b/gcc/analyzer/ChangeLog > index 53b3ffb487b..d30e94f2f62 100644 > --- a/gcc/analyzer/ChangeLog > +++ b/gcc/analyzer/ChangeLog > @@ -1,3 +1,6 @@ > +2022-06-27 Immad Mir > + * sm-fd.cc: New file. > + The new ChangeLog entries should be part of the commit message of the patch, not expressed part of the patch itself. Some notes on this can be seen at: https://gcc.gnu.org/codingconventions.html#ChangeLogs [...snip...] > +/* An enum for describing the direction of an access to a file descriptor. > */ > + > +enum access_direction > +{ > + DIR_READ, > + DIR_WRITE > +}; We already have a declaration of "enum access_direction" in analyzer/analyzer.h, so let's just use that, rather than declaring it a second time. [...snip...] > +class fd_access_mode_mismatch : public fd_diagnostic > +{ > +public: > + fd_access_mode_mismatch (const fd_state_machine &sm, tree arg, > + enum access_direction access_dir, > + const tree callee_fndecl) > + : m_access_dir (access_dir), m_callee_fndecl (callee_fndecl), > + fd_diagnostic (sm, arg) The ordering in this initializer list looks wrong: I'd expect to see the base class initializer (fd_diagnostics) *before* the fields. I think we have a warning about this. > + { > + } > + > + const char * > + get_kind () const final override > + { > + return "fd_access_mode_mismatch"; > + } > + > + int > + get_controlling_option () const final override > + { > + return OPT_Wanalyzer_fd_access_mode_mismatch; > + } > + > + bool > + emit (rich_location *rich_loc) final override > + { > + > + switch (m_access_dir) > + { > + case DIR_READ: > + return warning_at (rich_loc, get_controlling_option (), > + "%qE on % file descriptor %qE", > + m_callee_fndecl, m_arg); > + case DIR_WRITE: > + return warning_at (rich_loc, get_controlling_option (), > + "%qE on % file descriptor %qE", > + m_callee_fndecl, m_arg); > + default: > + gcc_unreachable (); > + } I got a bit confused here between the direction in which the FD expects accesses (e.g. due to O_RDONLY), as opposed to the direction of the operation being attempted on said FD (e.g. due to a "write" call). I realize that "access" is ambiguous in that it could refer to either meaning; "m_access_dir" here means the the former. How about the naming convention of "fd_dir" (and thus "m_fd_dir" here) where we're talking about the former, and "FOO_dir" for some FOO (e.g. "callee_fndecl_dir") when we're talking about the latter? Using that consistently throughout could clarify things. [...snip...] > +enum access_direction > +fd_state_machine::get_access_dir_from_state (state_t state) const > +{ > + if (state == m_unchecked_read_only || state == m_valid_read_only) > + return DIR_READ; > + else if (state == m_unchecked_write_only || state == m_valid_write_only) > + return DIR_WRITE; > +} Control will "fall off the end" of this function if "state" doesn't match one of the four states you test against. Hopefully we emit a warning for this (which should make the bootstrap build fail). This only gets used in one place, in fd_state_machine::check_for_open_fd, and the value is only needed once we know we have a valid_fd, so I think it's best to simply put the above logic there, within the switch statement... [...snip...] > +void > +fd_state_machine::check_for_open_fd (sm_context *sm_ctxt, const supernode > *node, > + const gimple *stmt, const gcall *call, > + const tree callee_fndecl, > + enum access_direction access_fn) const Please clarify this for me by renaming "access_fn" to "callee_fndecl_dir" or "attempted_dir" or similar. > +{ > + tree arg = gimple_call_arg (call, 0); > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + state_t state = sm_ctxt->get_state (stmt, arg); > + enum access_direction access_dir = get_access_dir_from_state (state); ....so drop this call... > + > + if (is_closed_fd_p (state)) > + { > + sm_ctxt->warn (node, stmt, arg, > + new fd_use_after_close (*this, diag_arg, > callee_fndecl)); > + } > + > + else > + { > + > + if (!is_valid_fd_p (state)) > + { > + if (!is_constant_fd_p (state)) > + sm_ctxt->warn ( > + node, stmt, arg, > + new unchecked_use_of_fd (*this, diag_arg, callee_fndecl)); > + } > + switch (access_fn) > + { > + case DIR_READ: ...in favor of simply checking for the write-only states here: (perhaps with a new "is_readonly_fd_p") > + if (access_dir == DIR_WRITE) > + { > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + sm_ctxt->warn (node, stmt, arg, > + new fd_access_mode_mismatch ( > + *this, diag_arg, access_dir, > callee_fndecl)); > + } > + > + break; > + case DIR_WRITE: > + > + if (access_dir == DIR_READ) > + { ...and for the read-only states here: (perhaps with a new "is_writeonly_fd_p") > + tree diag_arg = sm_ctxt->get_diagnostic_tree (arg); > + sm_ctxt->warn (node, stmt, arg, > + new fd_access_mode_mismatch ( > + *this, diag_arg, access_dir, > callee_fndecl)); > + } > + break; > + } > + } > +} > + > +void > +fd_state_machine::on_condition (sm_context *sm_ctxt, const supernode *node, > + const gimple *stmt, const svalue *lhs, > + enum tree_code op, const svalue *rhs) const > +{ > + > + if (!rhs->all_zeroes_p ()) > + return; > + > + if (op == GE_EXPR) > + { > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write, > + m_valid_read_write); > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only, > + m_valid_read_only); > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only, > + m_valid_write_only); > + } > + else if (op == LT_EXPR) > + { > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_write, > + m_invalid); > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_read_only, > + m_invalid); > + sm_ctxt->on_transition (node, stmt, lhs, m_unchecked_write_only, > + m_invalid); > + } Here we're detecting when the result of "open" is tested for < 0 vs >=0. I did some refresher reading on file descriptors over the weekend (specifically W. Richard Stevens's classic "Advanced Programming in the UNIX Environment") and in his examples he always checks the result of "open" for != -1, rather than < 0. The man page for "open" specifies: https://www.man7.org/linux/man-pages/man2/open.2.html#RETURN_VALUE "On success, open(), openat(), and creat() return the new file descriptor (a nonnegative integer). On error, -1 is returned and errno is set to indicate the error." Similarly, https://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html has: "Upon successful completion, these functions shall open the file and return a non-negative integer representing the file descriptor. Otherwise, these functions shall return -1 and set errno to indicate the error. If -1 is returned, no files shall be created or modified." We should *not* warn about code that does a test for != -1, since that's what the standard says to test for. That said, I think that it would be overzealous to complain about code that uses < 0 as a test on the result of "open". So please add DejaGnu test coverage for such "!= -1" code, and ensure that fd_state_machine::on_condition handles EQ_EXPR/NE_EXPR against -1 (and continues to handle the "< 0" test). [...snip...] > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog > index 604f4e3b0a8..37173529b2d 100644 > --- a/gcc/testsuite/ChangeLog > +++ b/gcc/testsuite/ChangeLog > @@ -1,3 +1,9 @@ > +2022-06-27 Immad Mir > + * gcc.dg/analyzer/fd-1.c: New test. > + * gcc.dg/analyzer/fd-2.c: New test. > + * gcc.dg/analyzer/fd-3.c: New test. > + * gcc.dg/analyzer/fd-4.c: New test. Same notes as above about how we manage ChangeLog entries. [...snip...] I think that with the above nits fixed, the next version of this is likely going to be ready for trunk - assuming that you bootstrap it and verify that the testsuite doesn't regress. Hope the above makes sense; thanks again for the patch. Dave