public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Mir Immad <mirimnan017@gmail.com>
To: David Malcolm <dmalcolm@redhat.com>
Cc: gcc@gcc.gnu.org
Subject: Re: GSoC: Working on the static analyzer
Date: Sun, 13 Feb 2022 21:16:19 +0530	[thread overview]
Message-ID: <CAE1-7oxonAHtgCcYa-mqZZLHW1oCDXejrT=_mjbXQtQqiKZqWg@mail.gmail.com> (raw)
In-Reply-To: <CAE1-7owjFc_P-UC0GN0O15n1pTHqzaN0B3E8bV5hJ5pq5VyFDQ@mail.gmail.com>

Hi,

I wanted some clarification on bifurcating the exploded graph at call to
open().
Should the analyzer warn for code like this "when open fails" (like strchr
does when  'strchr' returns NULL)

int fd = open("NOFILE", O_RDONLY);
write(fd, "a", 1);

because of the bad file descriptor.
unless it is written like this:
if (!errno)
  write(fd, "a", 1);

Thank you.

On Tue, Feb 1, 2022 at 8:28 PM Mir Immad <mirimnan017@gmail.com> wrote:

> I worked around the leak detection and also handled the case where the fd
> is not saved. I wonder why sm-file hasn't implemented it yet. I'm attaching
> a text file with analyzer warnings. I'm still on gcc-11.2.0, will move to
> v12 next thing.
>
> > I wonder if it's worth checking for attempts to write to a fd that was
> > opened with O_RDONLY, or the converse?
>
> Yes, it does not make sense to warn for write on read-only closed fd or
> converse. But, it does make sense to warn for write operation on valid
> read-only fd. For this, we might have to analyze calls to the open()
> carefully -- get the literal value of the second argument to the call and
> do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY are in it.
> We might have to maintain separate vectors for these and check for them in
> calls to write and read -- warn if write is being called on read-only fd
> and converse. How do we equate them? Does gimple store unique ids for each
> tree? or something like SAME_TREE? Just thinking out loud.
>
> > how much state does it make sense to track for a fd?
> Sorry, I didnt get it. Do you mean how may states I'm using? unchecked,
> closed, null and leak_by_not_saved (for the call-tree that does not have a
> lhs).
>
> > Also, at some point, we're going to have to handle "errno" - but given
> > that might be somewhat fiddly it's OK to defer that until you're more
> > familiar with the code
>
> I'm looking forward to figure it out.
>
> Thank you.
>
>
> On Sat, Jan 29, 2022 at 11:09 PM David Malcolm <dmalcolm@redhat.com>
> wrote:
>
>> On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
>> > Thank you for the detailed information.
>> >
>> > I've been looking into the integer posix  file descriptor APIs and I
>> > decided to write proof-of-concept  checker for them. (not caring
>> > about
>> > errno). The checker tracks the fd returned by open(), warns if dup()
>> > is
>> > called with closed fd otherwise tracks the fd returned by dup(), it
>> > also
>> > warns if read() and write() functions were called on closed fd. I'm
>> > attaching a text file that lists some c sources and warnings by the
>> > static
>> > analyzer. I've used the diagnostic meta-data from sm-file. Is this
>> > something that could also be added to the analyzer?
>>
>> This looks great, and very promising as both new functionality for GCC
>> 13, and as a GSoC 2022 project.
>>
>> BTW, it looks like you're working with GCC 11, but the analyzer has
>> changed quite a bit on trunk for GCC 12, so it's worth trying to track
>> trunk.
>>
>> I wonder if it's worth checking for attempts to write to a fd that was
>> opened with O_RDONLY, or the converse?  (I'm not sure, just thinking
>> aloud - how much state does it make sense to track for a fd?).
>>
>> Also, at some point, we're going to have to handle "errno" - but given
>> that might be somewhat fiddly it's OK to defer that until you're more
>> familiar with the code.
>>
>> >
>> > About the fd leak, that's the next thing I'll try to get working.
>> > Since
>> > you've mentioned that it could be a GSoC project, this is what I'm
>> > going to
>> > focus on.
>>
>> Excellent.
>>
>> Let me know (via this mailing list) if you have any questions.
>>
>> Thanks
>> Dave
>>
>> >
>> > Regards.
>> >
>> >
>> >
>> > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <dmalcolm@redhat.com>
>> > wrote:
>> >
>> > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
>> > > > Hi, sir.
>> > > >
>> > > > I've been trying to understand the static analyzer's code. I
>> > > > spent most
>> > > > of
>> > > > my time learning the state machine's API. I learned how state
>> > > > machine's
>> > > > on_stmt is supposed to "recognize" specific functions and how
>> > > > on_transition
>> > > > takes a specific tree from one state to another, and how the
>> > > > captured
>> > > > states are used by pending_diagnostics to report the errors.
>> > > > Furthermore, I
>> > > > was able to create a dummy checker that mimicked the behaviour of
>> > > > sm-
>> > > > file's
>> > > > double_fclose and compile GCC with these changes. Is this the
>> > > > right way
>> > > > of
>> > > > learning?
>> > >
>> > > This sounds great.
>> > >
>> > > >
>> > > > As you've mentioned on the projects page that you would like to
>> > > > add
>> > > > more
>> > > > support for some POSIX APIs. Can you please write (or refer me to
>> > > > a) a
>> > > > simple C program that uses such an API (and also what the
>> > > > analyzer
>> > > > should
>> > > > have done) so that I can attempt to add such a checker to the
>> > > > analyzer.
>> > >
>> > > A couple of project ideas:
>> > >
>> > > (i) treat data coming from a network connection as tainted, by
>> > > somehow
>> > > teaching the analyzer about networking APIs.  Ideally: look at some
>> > > subset of historical CVEs involving network-facing attacks on user-
>> > > space daemons, and find a way to detect them in the analyzer (need
>> > > to
>> > > find a way to mark the incoming data as tainted, so that the
>> > > analyer
>> > > "know" about the trust boundary - that the incoming data needs to
>> > > be
>> > > sanitized and treated with extra caution; see
>> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
>> > > for my attempts to do this for the Linux kernel).
>> > >
>> > > Obviously this is potentially a huge project, so maybe just picking
>> > > a
>> > > tiny subset and getting that working as a proof-of-concept would be
>> > > a
>> > > good GSoC project.  Maybe find an old CVE that someone has written
>> > > a
>> > > good write-up for, and think about "how could GCC/-fanalyzer have
>> > > spotted it?"
>> > >
>> > > (ii) add leak-detection for POSIX file descriptors: i.e. the
>> > > integer
>> > > values returned by "open", "dup", etc.  It would be good to have a
>> > > check that the user's code doesn't leak these values e.g. on error-
>> > > handling paths, by failing to close a file-descriptor (and not
>> > > storing
>> > > it anywhere).  I think that much of this could be done by analogy
>> > > with
>> > > the sm-file.cc code.
>> > >
>> > >
>> > > >
>> > > > Also, I didn't realize the complexity of adding SARIF when I
>> > > > mentioned
>> > > > it.
>> > > > I'd rather work on adding more checkers.
>> > >
>> > > Fair enough.
>> > >
>> > > Hope this above is constructive.
>> > >
>> > > Dave
>> > >
>> > > >
>> > > > Regards.
>> > > >
>> > > > Mir Immad
>> > > >
>> > > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan017@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi Sir,
>> > > > >
>> > > > > I've been trying to understand the static analyzer's code. I
>> > > > > spent
>> > > > > most of
>> > > > > my time learning the state machine's API. I learned how state
>> > > > > machine's
>> > > > > on_stmt is supposed to "recognize" specific functions and takes
>> > > > > a
>> > > > > specific
>> > > > > tree from one state to another, and how the captured states are
>> > > > > used
>> > > > > by
>> > > > > pending_diagnostics to report the errors. Furthermore, I was
>> > > > > able to
>> > > > > create
>> > > > > a dummy checker that mimicked the behaviour of sm-file's
>> > > > > double_fclose and
>> > > > > compile GCC with these changes. Is this the right way of
>> > > > > learning?
>> > > > >
>> > > > > As you've mentioned on the projects page that you would like to
>> > > > > add
>> > > > > more
>> > > > > support for some POSIX APIs. Can you please write (or refer me
>> > > > > to a)
>> > > > > a
>> > > > > simple C program that uses such an API (and also what the
>> > > > > analyzer
>> > > > > should
>> > > > > have done) so that I can attempt to add such a checker to the
>> > > > > analyzer.
>> > > > >
>> > > > > Also, I didn't realize the complexity of adding SARIF when I
>> > > > > mentioned it.
>> > > > > I'd rather work on adding more checkers.
>> > > > >
>> > > > > Regards.
>> > > > > Mir Immad
>> > > > >
>> > > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <
>> > > > > dmalcolm@redhat.com>
>> > > > > wrote:
>> > > > >
>> > > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
>> > > > > > > HI David,
>> > > > > > > I've been tinkering with the static analyzer for the last
>> > > > > > > few
>> > > > > > > days. I
>> > > > > > > find
>> > > > > > > the project of adding SARIF output to the analyzer
>> > > > > > > intresting.
>> > > > > > > I'm
>> > > > > > > writing
>> > > > > > > this to let you know that I'm trying to learn the codebase.
>> > > > > > > Thank you.
>> > > > > >
>> > > > > > Excellent.
>> > > > > >
>> > > > > > BTW, I think adding SARIF output would involve working more
>> > > > > > with
>> > > > > > GCC's
>> > > > > > diagnostics subsystem than with the static analyzer, since
>> > > > > > (in
>> > > > > > theory)
>> > > > > > all of the static analyzer's output is passing through the
>> > > > > > diagnostics
>> > > > > > subsystem - though the static analyzer is probably the only
>> > > > > > GCC
>> > > > > > component generating diagnostic paths.
>> > > > > >
>> > > > > > I'm happy to mentor such a project as I maintain both
>> > > > > > subsystems
>> > > > > > and
>> > > > > > SARIF output would benefit both - but it would be rather
>> > > > > > tangential
>> > > > > > to
>> > > > > > the analyzer - so if you had specifically wanted to be
>> > > > > > working on
>> > > > > > the
>> > > > > > guts of the analyzer itself, you may want to pick a different
>> > > > > > subproject.
>> > > > > >
>> > > > > > The SARIF standard is rather long and complicated, and we
>> > > > > > would
>> > > > > > want to
>> > > > > > be compatible with clang's implementation.
>> > > > > >
>> > > > > > It would be very cool if gcc could also accept SARIF files as
>> > > > > > an
>> > > > > > *input* format, and emit them as diagnostics; that might help
>> > > > > > with
>> > > > > > debugging SARIF output.   (I have a old patch for adding JSON
>> > > > > > parsing
>> > > > > > support to GCC that could be used as a starting point for
>> > > > > > this).
>> > > > > >
>> > > > > > Hope the above makes sense
>> > > > > > Dave
>> > > > > >
>> > > > > > >
>> > > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
>> > > > > > > dmalcolm@redhat.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc
>> > > > > > > > wrote:
>> > > > > > > > > Hi everyone,
>> > > > > > > >
>> > > > > > > > Hi, and welcome.
>> > > > > > > >
>> > > > > > > > > I intend to work on the static analyzer. Are these
>> > > > > > > > > documents
>> > > > > > > > > enough to
>> > > > > > > > > get
>> > > > > > > > > started: https://gcc.gnu.org/onlinedocs/gccint and
>> > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > > >
>> > >
>> > >
>> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
>> > > > > > > >
>> > > > > > > > Yes.
>> > > > > > > >
>> > > > > > > > There are also some high-level notes here:
>> > > > > > > >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
>> > > > > > > >
>> > > > > > > > Also, given that the analyzer is part of GCC, the more
>> > > > > > > > general
>> > > > > > > > introductions to hacking on GCC will be useful.
>> > > > > > > >
>> > > > > > > > I recommend creating a trivial C source file with a bug
>> > > > > > > > in it
>> > > > > > > > (e.g.
>> > > > > > > > a
>> > > > > > > > 3-line function with a use-after-free), and stepping
>> > > > > > > > through
>> > > > > > > > the
>> > > > > > > > analyzer to get a sense of how it works.
>> > > > > > > >
>> > > > > > > > Hope this is helpful; don't hesitate to ask questions.
>> > > > > > > > Dave
>> > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > >
>> > >
>> > >
>>
>>
>>

  reply	other threads:[~2022-02-13 15:46 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-11  5:33 Mir Immad
2022-01-11 13:39 ` David Malcolm
2022-01-14 16:45   ` Mir Immad
2022-01-17  0:11     ` David Malcolm
     [not found]       ` <CAE1-7ozOGbp-sypHeWxJpoLWzNbqYtWrrsnhLK9RdhYrLe6z1w@mail.gmail.com>
2022-01-23 20:11         ` Mir Immad
2022-01-24 14:19           ` Ankur Saini
2022-01-26 14:31             ` David Malcolm
2022-01-26 14:26           ` David Malcolm
2022-01-29 14:52             ` Mir Immad
2022-01-29 17:39               ` David Malcolm
2022-02-01 14:58                 ` Mir Immad
2022-02-13 15:46                   ` Mir Immad [this message]
2022-02-13 22:57                     ` David Malcolm
2022-02-13 23:05                       ` David Malcolm
2022-04-04 16:16                         ` Mir Immad
2022-04-08 18:19                           ` David Malcolm
     [not found] <mailman.6588.1644793069.2100866.gcc@gcc.gnu.org>
2022-02-14 12:59 ` Basile Starynkevitch
2022-02-14 13:12   ` Basile Starynkevitch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAE1-7oxonAHtgCcYa-mqZZLHW1oCDXejrT=_mjbXQtQqiKZqWg@mail.gmail.com' \
    --to=mirimnan017@gmail.com \
    --cc=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).