public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Mir Immad <mirimnan017@gmail.com>
Cc: gcc@gcc.gnu.org
Subject: Re: GSoC: Working on the static analyzer
Date: Sun, 13 Feb 2022 18:05:31 -0500	[thread overview]
Message-ID: <f2d2fa02c8818d1ccd6ac3677d365c91030dbed3.camel@redhat.com> (raw)
In-Reply-To: <d50d916217015f69a5b5561543ef7f3abc33594c.camel@redhat.com>

On Sun, 2022-02-13 at 17:57 -0500, David Malcolm wrote:
> On Sun, 2022-02-13 at 21:16 +0530, Mir Immad wrote:
> > Hi,
> > 
> > I wanted some clarification on bifurcating the exploded graph at
> > call
> > to
> > open().
> > Should the analyzer warn for code like this "when open fails" (like
> > strchr
> > does when  'strchr' returns NULL)
> > 
> > int fd = open("NOFILE", O_RDONLY);
> > write(fd, "a", 1);
> > 
> > because of the bad file descriptor.
> > unless it is written like this:
> > if (!errno)
> >   write(fd, "a", 1);
> 
> "open" returns a non-negative integer on success, and returns -1 and
> sets errno on an error.  As I understand it, errno is never cleared

"modified" would be a better word than "cleared" here, sorry

Dave

> unless an error occurs, so errno could already be set due to a prior
> failure.  Hence I believe the correct way to write such code is to
> check fd for being -1, rather than checking errno.  
> 
> It's probably best to bifurcate the state at the open call, into
> "success" and "failure" outcomes, so that we can track that the
> caller
> "owns" the filedescriptor on success, and thus we can complain about
> fd
> leaks on paths that discard the value without closing it.
> 
> We can then warn if we see -1 passed as the fd value into "write",
> since that implies that the open call failed.  If we bifurcate the
> state at the open call, then for:
> 
> int fd = open("NOFILE", O_RDONLY);
> write(fd, "a", 1);
> 
> we'd have the execution paths:
> 
> (a) "when 'open' succeeds" (setting fd to a conjured_svalue >=0, and
> setting sm-state on that value)
> 
> (b) "when 'open' fails" (setting fd to -1, and errno to a
> conjured_svalue known to be nonzero)
> 
> and at the call to "write" on path (b) we see the -1 passed to it,
> and
> can complain accordingly. 
> 
> I'm not sure if we also want to track that the file was O_RDONLY
> (which
> suggests that the "write" should fail), but that would be a nice
> bonus.
> 
> Hope this is helpful
> Dave
> 
> 
> 
> > 
> > Thank you.
> > 
> > On Tue, Feb 1, 2022 at 8:28 PM Mir Immad <mirimnan017@gmail.com>
> > wrote:
> > 
> > > I worked around the leak detection and also handled the case
> > > where
> > > the fd
> > > is not saved. I wonder why sm-file hasn't implemented it yet. I'm
> > > attaching
> > > a text file with analyzer warnings. I'm still on gcc-11.2.0, will
> > > move to
> > > v12 next thing.
> > > 
> > > > I wonder if it's worth checking for attempts to write to a fd
> > > > that
> > > > was
> > > > opened with O_RDONLY, or the converse?
> > > 
> > > Yes, it does not make sense to warn for write on read-only closed
> > > fd
> > > or
> > > converse. But, it does make sense to warn for write operation on
> > > valid
> > > read-only fd. For this, we might have to analyze calls to the
> > > open()
> > > carefully -- get the literal value of the second argument to the
> > > call
> > > and
> > > do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY
> > > are
> > > in it.
> > > We might have to maintain separate vectors for these and check
> > > for
> > > them in
> > > calls to write and read -- warn if write is being called on read-
> > > only
> > > fd
> > > and converse. How do we equate them? Does gimple store unique ids
> > > for
> > > each
> > > tree? or something like SAME_TREE? Just thinking out loud.
> > > 
> > > > how much state does it make sense to track for a fd?
> > > Sorry, I didnt get it. Do you mean how may states I'm using?
> > > unchecked,
> > > closed, null and leak_by_not_saved (for the call-tree that does
> > > not
> > > have a
> > > lhs).
> > > 
> > > > Also, at some point, we're going to have to handle "errno" -
> > > > but
> > > > given
> > > > that might be somewhat fiddly it's OK to defer that until
> > > > you're
> > > > more
> > > > familiar with the code
> > > 
> > > I'm looking forward to figure it out.
> > > 
> > > Thank you.
> > > 
> > > 
> > > On Sat, Jan 29, 2022 at 11:09 PM David Malcolm <
> > > dmalcolm@redhat.com>
> > > wrote:
> > > 
> > > > On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
> > > > > Thank you for the detailed information.
> > > > > 
> > > > > I've been looking into the integer posix  file descriptor
> > > > > APIs
> > > > > and I
> > > > > decided to write proof-of-concept  checker for them. (not
> > > > > caring
> > > > > about
> > > > > errno). The checker tracks the fd returned by open(), warns
> > > > > if
> > > > > dup()
> > > > > is
> > > > > called with closed fd otherwise tracks the fd returned by
> > > > > dup(),
> > > > > it
> > > > > also
> > > > > warns if read() and write() functions were called on closed
> > > > > fd.
> > > > > I'm
> > > > > attaching a text file that lists some c sources and warnings
> > > > > by
> > > > > the
> > > > > static
> > > > > analyzer. I've used the diagnostic meta-data from sm-file. Is
> > > > > this
> > > > > something that could also be added to the analyzer?
> > > > 
> > > > This looks great, and very promising as both new functionality
> > > > for
> > > > GCC
> > > > 13, and as a GSoC 2022 project.
> > > > 
> > > > BTW, it looks like you're working with GCC 11, but the analyzer
> > > > has
> > > > changed quite a bit on trunk for GCC 12, so it's worth trying
> > > > to
> > > > track
> > > > trunk.
> > > > 
> > > > I wonder if it's worth checking for attempts to write to a fd
> > > > that
> > > > was
> > > > opened with O_RDONLY, or the converse?  (I'm not sure, just
> > > > thinking
> > > > aloud - how much state does it make sense to track for a fd?).
> > > > 
> > > > Also, at some point, we're going to have to handle "errno" -
> > > > but
> > > > given
> > > > that might be somewhat fiddly it's OK to defer that until
> > > > you're
> > > > more
> > > > familiar with the code.
> > > > 
> > > > > 
> > > > > About the fd leak, that's the next thing I'll try to get
> > > > > working.
> > > > > Since
> > > > > you've mentioned that it could be a GSoC project, this is
> > > > > what
> > > > > I'm
> > > > > going to
> > > > > focus on.
> > > > 
> > > > Excellent.
> > > > 
> > > > Let me know (via this mailing list) if you have any questions.
> > > > 
> > > > Thanks
> > > > Dave
> > > > 
> > > > > 
> > > > > Regards.
> > > > > 
> > > > > 
> > > > > 
> > > > > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <
> > > > > dmalcolm@redhat.com>
> > > > > wrote:
> > > > > 
> > > > > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > > > > > > Hi, sir.
> > > > > > > 
> > > > > > > I've been trying to understand the static analyzer's
> > > > > > > code. I
> > > > > > > spent most
> > > > > > > of
> > > > > > > my time learning the state machine's API. I learned how
> > > > > > > state
> > > > > > > machine's
> > > > > > > on_stmt is supposed to "recognize" specific functions and
> > > > > > > how
> > > > > > > on_transition
> > > > > > > takes a specific tree from one state to another, and how
> > > > > > > the
> > > > > > > captured
> > > > > > > states are used by pending_diagnostics to report the
> > > > > > > errors.
> > > > > > > Furthermore, I
> > > > > > > was able to create a dummy checker that mimicked the
> > > > > > > behaviour of
> > > > > > > sm-
> > > > > > > file's
> > > > > > > double_fclose and compile GCC with these changes. Is this
> > > > > > > the
> > > > > > > right way
> > > > > > > of
> > > > > > > learning?
> > > > > > 
> > > > > > This sounds great.
> > > > > > 
> > > > > > > 
> > > > > > > As you've mentioned on the projects page that you would
> > > > > > > like
> > > > > > > to
> > > > > > > add
> > > > > > > more
> > > > > > > support for some POSIX APIs. Can you please write (or
> > > > > > > refer
> > > > > > > me to
> > > > > > > a) a
> > > > > > > simple C program that uses such an API (and also what the
> > > > > > > analyzer
> > > > > > > should
> > > > > > > have done) so that I can attempt to add such a checker to
> > > > > > > the
> > > > > > > analyzer.
> > > > > > 
> > > > > > A couple of project ideas:
> > > > > > 
> > > > > > (i) treat data coming from a network connection as tainted,
> > > > > > by
> > > > > > somehow
> > > > > > teaching the analyzer about networking APIs.  Ideally: look
> > > > > > at
> > > > > > some
> > > > > > subset of historical CVEs involving network-facing attacks
> > > > > > on
> > > > > > user-
> > > > > > space daemons, and find a way to detect them in the
> > > > > > analyzer
> > > > > > (need
> > > > > > to
> > > > > > find a way to mark the incoming data as tainted, so that
> > > > > > the
> > > > > > analyer
> > > > > > "know" about the trust boundary - that the incoming data
> > > > > > needs
> > > > > > to
> > > > > > be
> > > > > > sanitized and treated with extra caution; see
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> > > > > > for my attempts to do this for the Linux kernel).
> > > > > > 
> > > > > > Obviously this is potentially a huge project, so maybe just
> > > > > > picking
> > > > > > a
> > > > > > tiny subset and getting that working as a proof-of-concept
> > > > > > would be
> > > > > > a
> > > > > > good GSoC project.  Maybe find an old CVE that someone has
> > > > > > written
> > > > > > a
> > > > > > good write-up for, and think about "how could GCC/-
> > > > > > fanalyzer
> > > > > > have
> > > > > > spotted it?"
> > > > > > 
> > > > > > (ii) add leak-detection for POSIX file descriptors: i.e.
> > > > > > the
> > > > > > integer
> > > > > > values returned by "open", "dup", etc.  It would be good to
> > > > > > have a
> > > > > > check that the user's code doesn't leak these values e.g.
> > > > > > on
> > > > > > error-
> > > > > > handling paths, by failing to close a file-descriptor (and
> > > > > > not
> > > > > > storing
> > > > > > it anywhere).  I think that much of this could be done by
> > > > > > analogy
> > > > > > with
> > > > > > the sm-file.cc code.
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Also, I didn't realize the complexity of adding SARIF
> > > > > > > when I
> > > > > > > mentioned
> > > > > > > it.
> > > > > > > I'd rather work on adding more checkers.
> > > > > > 
> > > > > > Fair enough.
> > > > > > 
> > > > > > Hope this above is constructive.
> > > > > > 
> > > > > > Dave
> > > > > > 
> > > > > > > 
> > > > > > > Regards.
> > > > > > > 
> > > > > > > Mir Immad
> > > > > > > 
> > > > > > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <
> > > > > > > mirimnan017@gmail.com>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > Hi Sir,
> > > > > > > > 
> > > > > > > > I've been trying to understand the static analyzer's
> > > > > > > > code.
> > > > > > > > I
> > > > > > > > spent
> > > > > > > > most of
> > > > > > > > my time learning the state machine's API. I learned how
> > > > > > > > state
> > > > > > > > machine's
> > > > > > > > on_stmt is supposed to "recognize" specific functions
> > > > > > > > and
> > > > > > > > takes
> > > > > > > > a
> > > > > > > > specific
> > > > > > > > tree from one state to another, and how the captured
> > > > > > > > states
> > > > > > > > are
> > > > > > > > used
> > > > > > > > by
> > > > > > > > pending_diagnostics to report the errors. Furthermore,
> > > > > > > > I
> > > > > > > > was
> > > > > > > > able to
> > > > > > > > create
> > > > > > > > a dummy checker that mimicked the behaviour of sm-
> > > > > > > > file's
> > > > > > > > double_fclose and
> > > > > > > > compile GCC with these changes. Is this the right way
> > > > > > > > of
> > > > > > > > learning?
> > > > > > > > 
> > > > > > > > As you've mentioned on the projects page that you would
> > > > > > > > like to
> > > > > > > > add
> > > > > > > > more
> > > > > > > > support for some POSIX APIs. Can you please write (or
> > > > > > > > refer
> > > > > > > > me
> > > > > > > > to a)
> > > > > > > > a
> > > > > > > > simple C program that uses such an API (and also what
> > > > > > > > the
> > > > > > > > analyzer
> > > > > > > > should
> > > > > > > > have done) so that I can attempt to add such a checker
> > > > > > > > to
> > > > > > > > the
> > > > > > > > analyzer.
> > > > > > > > 
> > > > > > > > Also, I didn't realize the complexity of adding SARIF
> > > > > > > > when
> > > > > > > > I
> > > > > > > > mentioned it.
> > > > > > > > I'd rather work on adding more checkers.
> > > > > > > > 
> > > > > > > > Regards.
> > > > > > > > Mir Immad
> > > > > > > > 
> > > > > > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <
> > > > > > > > dmalcolm@redhat.com>
> > > > > > > > wrote:
> > > > > > > > 
> > > > > > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > > > > > > HI David,
> > > > > > > > > > I've been tinkering with the static analyzer for
> > > > > > > > > > the
> > > > > > > > > > last
> > > > > > > > > > few
> > > > > > > > > > days. I
> > > > > > > > > > find
> > > > > > > > > > the project of adding SARIF output to the analyzer
> > > > > > > > > > intresting.
> > > > > > > > > > I'm
> > > > > > > > > > writing
> > > > > > > > > > this to let you know that I'm trying to learn the
> > > > > > > > > > codebase.
> > > > > > > > > > Thank you.
> > > > > > > > > 
> > > > > > > > > Excellent.
> > > > > > > > > 
> > > > > > > > > BTW, I think adding SARIF output would involve
> > > > > > > > > working
> > > > > > > > > more
> > > > > > > > > with
> > > > > > > > > GCC's
> > > > > > > > > diagnostics subsystem than with the static analyzer,
> > > > > > > > > since
> > > > > > > > > (in
> > > > > > > > > theory)
> > > > > > > > > all of the static analyzer's output is passing
> > > > > > > > > through
> > > > > > > > > the
> > > > > > > > > diagnostics
> > > > > > > > > subsystem - though the static analyzer is probably
> > > > > > > > > the
> > > > > > > > > only
> > > > > > > > > GCC
> > > > > > > > > component generating diagnostic paths.
> > > > > > > > > 
> > > > > > > > > I'm happy to mentor such a project as I maintain both
> > > > > > > > > subsystems
> > > > > > > > > and
> > > > > > > > > SARIF output would benefit both - but it would be
> > > > > > > > > rather
> > > > > > > > > tangential
> > > > > > > > > to
> > > > > > > > > the analyzer - so if you had specifically wanted to
> > > > > > > > > be
> > > > > > > > > working on
> > > > > > > > > the
> > > > > > > > > guts of the analyzer itself, you may want to pick a
> > > > > > > > > different
> > > > > > > > > subproject.
> > > > > > > > > 
> > > > > > > > > The SARIF standard is rather long and complicated,
> > > > > > > > > and we
> > > > > > > > > would
> > > > > > > > > want to
> > > > > > > > > be compatible with clang's implementation.
> > > > > > > > > 
> > > > > > > > > It would be very cool if gcc could also accept SARIF
> > > > > > > > > files as
> > > > > > > > > an
> > > > > > > > > *input* format, and emit them as diagnostics; that
> > > > > > > > > might
> > > > > > > > > help
> > > > > > > > > with
> > > > > > > > > debugging SARIF output.   (I have a old patch for
> > > > > > > > > adding
> > > > > > > > > JSON
> > > > > > > > > parsing
> > > > > > > > > support to GCC that could be used as a starting point
> > > > > > > > > for
> > > > > > > > > this).
> > > > > > > > > 
> > > > > > > > > Hope the above makes sense
> > > > > > > > > Dave
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
> > > > > > > > > > dmalcolm@redhat.com>
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via
> > > > > > > > > > > Gcc
> > > > > > > > > > > wrote:
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > 
> > > > > > > > > > > Hi, and welcome.
> > > > > > > > > > > 
> > > > > > > > > > > > I intend to work on the static analyzer. Are
> > > > > > > > > > > > these
> > > > > > > > > > > > documents
> > > > > > > > > > > > enough to
> > > > > > > > > > > > get
> > > > > > > > > > > > started:
> > > > > > > > > > > > https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > 
> > > > > > 
> > > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > > > > > > > 
> > > > > > > > > > > Yes.
> > > > > > > > > > > 
> > > > > > > > > > > There are also some high-level notes here:
> > > > > > > > > > >   
> > > > > > > > > > > https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > > > > > > > 
> > > > > > > > > > > Also, given that the analyzer is part of GCC, the
> > > > > > > > > > > more
> > > > > > > > > > > general
> > > > > > > > > > > introductions to hacking on GCC will be useful.
> > > > > > > > > > > 
> > > > > > > > > > > I recommend creating a trivial C source file with
> > > > > > > > > > > a
> > > > > > > > > > > bug
> > > > > > > > > > > in it
> > > > > > > > > > > (e.g.
> > > > > > > > > > > a
> > > > > > > > > > > 3-line function with a use-after-free), and
> > > > > > > > > > > stepping
> > > > > > > > > > > through
> > > > > > > > > > > the
> > > > > > > > > > > analyzer to get a sense of how it works.
> > > > > > > > > > > 
> > > > > > > > > > > Hope this is helpful; don't hesitate to ask
> > > > > > > > > > > questions.
> > > > > > > > > > > Dave
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > 
> > > > 
> > > > 
> 



  reply	other threads:[~2022-02-13 23:05 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-11  5:33 Mir Immad
2022-01-11 13:39 ` David Malcolm
2022-01-14 16:45   ` Mir Immad
2022-01-17  0:11     ` David Malcolm
     [not found]       ` <CAE1-7ozOGbp-sypHeWxJpoLWzNbqYtWrrsnhLK9RdhYrLe6z1w@mail.gmail.com>
2022-01-23 20:11         ` Mir Immad
2022-01-24 14:19           ` Ankur Saini
2022-01-26 14:31             ` David Malcolm
2022-01-26 14:26           ` David Malcolm
2022-01-29 14:52             ` Mir Immad
2022-01-29 17:39               ` David Malcolm
2022-02-01 14:58                 ` Mir Immad
2022-02-13 15:46                   ` Mir Immad
2022-02-13 22:57                     ` David Malcolm
2022-02-13 23:05                       ` David Malcolm [this message]
2022-04-04 16:16                         ` Mir Immad
2022-04-08 18:19                           ` David Malcolm
     [not found] <mailman.6588.1644793069.2100866.gcc@gcc.gnu.org>
2022-02-14 12:59 ` Basile Starynkevitch
2022-02-14 13:12   ` Basile Starynkevitch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f2d2fa02c8818d1ccd6ac3677d365c91030dbed3.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    --cc=mirimnan017@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).