public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: Mir Immad <mirimnan017@gmail.com>
To: David Malcolm <dmalcolm@redhat.com>
Cc: gcc@gcc.gnu.org
Subject: Re: GSoC: Working on the static analyzer
Date: Tue, 1 Feb 2022 20:28:13 +0530	[thread overview]
Message-ID: <CAE1-7owjFc_P-UC0GN0O15n1pTHqzaN0B3E8bV5hJ5pq5VyFDQ@mail.gmail.com> (raw)
In-Reply-To: <2cc1022a7d7bc2b00ceac2175ada29554d961bff.camel@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 11357 bytes --]

I worked around the leak detection and also handled the case where the fd
is not saved. I wonder why sm-file hasn't implemented it yet. I'm attaching
a text file with analyzer warnings. I'm still on gcc-11.2.0, will move to
v12 next thing.

> I wonder if it's worth checking for attempts to write to a fd that was
> opened with O_RDONLY, or the converse?

Yes, it does not make sense to warn for write on read-only closed fd or
converse. But, it does make sense to warn for write operation on valid
read-only fd. For this, we might have to analyze calls to the open()
carefully -- get the literal value of the second argument to the call and
do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY are in it.
We might have to maintain separate vectors for these and check for them in
calls to write and read -- warn if write is being called on read-only fd
and converse. How do we equate them? Does gimple store unique ids for each
tree? or something like SAME_TREE? Just thinking out loud.

> how much state does it make sense to track for a fd?
Sorry, I didnt get it. Do you mean how may states I'm using? unchecked,
closed, null and leak_by_not_saved (for the call-tree that does not have a
lhs).

> Also, at some point, we're going to have to handle "errno" - but given
> that might be somewhat fiddly it's OK to defer that until you're more
> familiar with the code

I'm looking forward to figure it out.

Thank you.


On Sat, Jan 29, 2022 at 11:09 PM David Malcolm <dmalcolm@redhat.com> wrote:

> On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
> > Thank you for the detailed information.
> >
> > I've been looking into the integer posix  file descriptor APIs and I
> > decided to write proof-of-concept  checker for them. (not caring
> > about
> > errno). The checker tracks the fd returned by open(), warns if dup()
> > is
> > called with closed fd otherwise tracks the fd returned by dup(), it
> > also
> > warns if read() and write() functions were called on closed fd. I'm
> > attaching a text file that lists some c sources and warnings by the
> > static
> > analyzer. I've used the diagnostic meta-data from sm-file. Is this
> > something that could also be added to the analyzer?
>
> This looks great, and very promising as both new functionality for GCC
> 13, and as a GSoC 2022 project.
>
> BTW, it looks like you're working with GCC 11, but the analyzer has
> changed quite a bit on trunk for GCC 12, so it's worth trying to track
> trunk.
>
> I wonder if it's worth checking for attempts to write to a fd that was
> opened with O_RDONLY, or the converse?  (I'm not sure, just thinking
> aloud - how much state does it make sense to track for a fd?).
>
> Also, at some point, we're going to have to handle "errno" - but given
> that might be somewhat fiddly it's OK to defer that until you're more
> familiar with the code.
>
> >
> > About the fd leak, that's the next thing I'll try to get working.
> > Since
> > you've mentioned that it could be a GSoC project, this is what I'm
> > going to
> > focus on.
>
> Excellent.
>
> Let me know (via this mailing list) if you have any questions.
>
> Thanks
> Dave
>
> >
> > Regards.
> >
> >
> >
> > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <dmalcolm@redhat.com>
> > wrote:
> >
> > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > > > Hi, sir.
> > > >
> > > > I've been trying to understand the static analyzer's code. I
> > > > spent most
> > > > of
> > > > my time learning the state machine's API. I learned how state
> > > > machine's
> > > > on_stmt is supposed to "recognize" specific functions and how
> > > > on_transition
> > > > takes a specific tree from one state to another, and how the
> > > > captured
> > > > states are used by pending_diagnostics to report the errors.
> > > > Furthermore, I
> > > > was able to create a dummy checker that mimicked the behaviour of
> > > > sm-
> > > > file's
> > > > double_fclose and compile GCC with these changes. Is this the
> > > > right way
> > > > of
> > > > learning?
> > >
> > > This sounds great.
> > >
> > > >
> > > > As you've mentioned on the projects page that you would like to
> > > > add
> > > > more
> > > > support for some POSIX APIs. Can you please write (or refer me to
> > > > a) a
> > > > simple C program that uses such an API (and also what the
> > > > analyzer
> > > > should
> > > > have done) so that I can attempt to add such a checker to the
> > > > analyzer.
> > >
> > > A couple of project ideas:
> > >
> > > (i) treat data coming from a network connection as tainted, by
> > > somehow
> > > teaching the analyzer about networking APIs.  Ideally: look at some
> > > subset of historical CVEs involving network-facing attacks on user-
> > > space daemons, and find a way to detect them in the analyzer (need
> > > to
> > > find a way to mark the incoming data as tainted, so that the
> > > analyer
> > > "know" about the trust boundary - that the incoming data needs to
> > > be
> > > sanitized and treated with extra caution; see
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> > > for my attempts to do this for the Linux kernel).
> > >
> > > Obviously this is potentially a huge project, so maybe just picking
> > > a
> > > tiny subset and getting that working as a proof-of-concept would be
> > > a
> > > good GSoC project.  Maybe find an old CVE that someone has written
> > > a
> > > good write-up for, and think about "how could GCC/-fanalyzer have
> > > spotted it?"
> > >
> > > (ii) add leak-detection for POSIX file descriptors: i.e. the
> > > integer
> > > values returned by "open", "dup", etc.  It would be good to have a
> > > check that the user's code doesn't leak these values e.g. on error-
> > > handling paths, by failing to close a file-descriptor (and not
> > > storing
> > > it anywhere).  I think that much of this could be done by analogy
> > > with
> > > the sm-file.cc code.
> > >
> > >
> > > >
> > > > Also, I didn't realize the complexity of adding SARIF when I
> > > > mentioned
> > > > it.
> > > > I'd rather work on adding more checkers.
> > >
> > > Fair enough.
> > >
> > > Hope this above is constructive.
> > >
> > > Dave
> > >
> > > >
> > > > Regards.
> > > >
> > > > Mir Immad
> > > >
> > > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan017@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Sir,
> > > > >
> > > > > I've been trying to understand the static analyzer's code. I
> > > > > spent
> > > > > most of
> > > > > my time learning the state machine's API. I learned how state
> > > > > machine's
> > > > > on_stmt is supposed to "recognize" specific functions and takes
> > > > > a
> > > > > specific
> > > > > tree from one state to another, and how the captured states are
> > > > > used
> > > > > by
> > > > > pending_diagnostics to report the errors. Furthermore, I was
> > > > > able to
> > > > > create
> > > > > a dummy checker that mimicked the behaviour of sm-file's
> > > > > double_fclose and
> > > > > compile GCC with these changes. Is this the right way of
> > > > > learning?
> > > > >
> > > > > As you've mentioned on the projects page that you would like to
> > > > > add
> > > > > more
> > > > > support for some POSIX APIs. Can you please write (or refer me
> > > > > to a)
> > > > > a
> > > > > simple C program that uses such an API (and also what the
> > > > > analyzer
> > > > > should
> > > > > have done) so that I can attempt to add such a checker to the
> > > > > analyzer.
> > > > >
> > > > > Also, I didn't realize the complexity of adding SARIF when I
> > > > > mentioned it.
> > > > > I'd rather work on adding more checkers.
> > > > >
> > > > > Regards.
> > > > > Mir Immad
> > > > >
> > > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <
> > > > > dmalcolm@redhat.com>
> > > > > wrote:
> > > > >
> > > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > > > HI David,
> > > > > > > I've been tinkering with the static analyzer for the last
> > > > > > > few
> > > > > > > days. I
> > > > > > > find
> > > > > > > the project of adding SARIF output to the analyzer
> > > > > > > intresting.
> > > > > > > I'm
> > > > > > > writing
> > > > > > > this to let you know that I'm trying to learn the codebase.
> > > > > > > Thank you.
> > > > > >
> > > > > > Excellent.
> > > > > >
> > > > > > BTW, I think adding SARIF output would involve working more
> > > > > > with
> > > > > > GCC's
> > > > > > diagnostics subsystem than with the static analyzer, since
> > > > > > (in
> > > > > > theory)
> > > > > > all of the static analyzer's output is passing through the
> > > > > > diagnostics
> > > > > > subsystem - though the static analyzer is probably the only
> > > > > > GCC
> > > > > > component generating diagnostic paths.
> > > > > >
> > > > > > I'm happy to mentor such a project as I maintain both
> > > > > > subsystems
> > > > > > and
> > > > > > SARIF output would benefit both - but it would be rather
> > > > > > tangential
> > > > > > to
> > > > > > the analyzer - so if you had specifically wanted to be
> > > > > > working on
> > > > > > the
> > > > > > guts of the analyzer itself, you may want to pick a different
> > > > > > subproject.
> > > > > >
> > > > > > The SARIF standard is rather long and complicated, and we
> > > > > > would
> > > > > > want to
> > > > > > be compatible with clang's implementation.
> > > > > >
> > > > > > It would be very cool if gcc could also accept SARIF files as
> > > > > > an
> > > > > > *input* format, and emit them as diagnostics; that might help
> > > > > > with
> > > > > > debugging SARIF output.   (I have a old patch for adding JSON
> > > > > > parsing
> > > > > > support to GCC that could be used as a starting point for
> > > > > > this).
> > > > > >
> > > > > > Hope the above makes sense
> > > > > > Dave
> > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
> > > > > > > dmalcolm@redhat.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc
> > > > > > > > wrote:
> > > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > Hi, and welcome.
> > > > > > > >
> > > > > > > > > I intend to work on the static analyzer. Are these
> > > > > > > > > documents
> > > > > > > > > enough to
> > > > > > > > > get
> > > > > > > > > started: https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > >
> > >
> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > There are also some high-level notes here:
> > > > > > > >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > > > >
> > > > > > > > Also, given that the analyzer is part of GCC, the more
> > > > > > > > general
> > > > > > > > introductions to hacking on GCC will be useful.
> > > > > > > >
> > > > > > > > I recommend creating a trivial C source file with a bug
> > > > > > > > in it
> > > > > > > > (e.g.
> > > > > > > > a
> > > > > > > > 3-line function with a use-after-free), and stepping
> > > > > > > > through
> > > > > > > > the
> > > > > > > > analyzer to get a sense of how it works.
> > > > > > > >
> > > > > > > > Hope this is helpful; don't hesitate to ask questions.
> > > > > > > > Dave
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > >
> > >
> > >
>
>
>

[-- Attachment #2: gcc-leak-output.txt --]
[-- Type: text/plain, Size: 1048 bytes --]

$ cat leak.c

#include <fcntl.h>

void test ()
{
   int fd = open("test", O_RDONLY);
}

void main(){}

$ gcc-11.2.0 leak.c -fanalyzer

leak.c: In function ‘test’:
leak.c:6:1: warning: leak of FILE ‘fd’ [CWE-775] [-Wanalyzer-file-leak]
    6 | }
      | ^
  ‘test’: events 1-2
    |
    |    5 |    int fd = open("test", O_RDONLY);
    |      |             ^~~~~~~~~~~~~~~~~~~~~~
    |      |             |
    |      |             (1) opened here
    |    6 | }
    |      | ~            
    |      | |
    |      | (2) leaks here
    |

$ cat leak-by-not-saving.c

#include <fcntl.h>

void test ()
{
   open("test", O_RDONLY);
}

void main(){}

$ gcc leak-by-not-saving.c -fanalyzer

leak.c: In function ‘test’:
leak.c:5:4: warning: leak by not saving fd [CWE-775] [-Wanalyzer-file-leak]
    5 |    open("test", O_RDONLY);
      |    ^~~~~~~~~~~~~~~~~~~~~~
  ‘test’: event 1
    |
    |    5 |    open("test", O_RDONLY);
    |      |    ^~~~~~~~~~~~~~~~~~~~~~
    |      |    |
    |      |    (1) leak by not saving fd
    |

  reply	other threads:[~2022-02-01 14:58 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-11  5:33 Mir Immad
2022-01-11 13:39 ` David Malcolm
2022-01-14 16:45   ` Mir Immad
2022-01-17  0:11     ` David Malcolm
     [not found]       ` <CAE1-7ozOGbp-sypHeWxJpoLWzNbqYtWrrsnhLK9RdhYrLe6z1w@mail.gmail.com>
2022-01-23 20:11         ` Mir Immad
2022-01-24 14:19           ` Ankur Saini
2022-01-26 14:31             ` David Malcolm
2022-01-26 14:26           ` David Malcolm
2022-01-29 14:52             ` Mir Immad
2022-01-29 17:39               ` David Malcolm
2022-02-01 14:58                 ` Mir Immad [this message]
2022-02-13 15:46                   ` Mir Immad
2022-02-13 22:57                     ` David Malcolm
2022-02-13 23:05                       ` David Malcolm
2022-04-04 16:16                         ` Mir Immad
2022-04-08 18:19                           ` David Malcolm
     [not found] <mailman.6588.1644793069.2100866.gcc@gcc.gnu.org>
2022-02-14 12:59 ` Basile Starynkevitch
2022-02-14 13:12   ` Basile Starynkevitch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAE1-7owjFc_P-UC0GN0O15n1pTHqzaN0B3E8bV5hJ5pq5VyFDQ@mail.gmail.com \
    --to=mirimnan017@gmail.com \
    --cc=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).