public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Mir Immad <mirimnan017@gmail.com>
Cc: gcc@gcc.gnu.org
Subject: Re: GSoC: Working on the static analyzer
Date: Wed, 26 Jan 2022 09:26:07 -0500	[thread overview]
Message-ID: <d08c57511dea46832da8ad30c30dc410342b24f3.camel@redhat.com> (raw)
In-Reply-To: <CAE1-7oz+8mHfRhFkwsKO=y9_Z5AqYsFSKxG+1fJc6LyN4-cR5g@mail.gmail.com>

On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> Hi, sir.
> 
> I've been trying to understand the static analyzer's code. I spent most
> of
> my time learning the state machine's API. I learned how state machine's
> on_stmt is supposed to "recognize" specific functions and how
> on_transition
> takes a specific tree from one state to another, and how the captured
> states are used by pending_diagnostics to report the errors.
> Furthermore, I
> was able to create a dummy checker that mimicked the behaviour of sm-
> file's
> double_fclose and compile GCC with these changes. Is this the right way
> of
> learning?

This sounds great.

> 
> As you've mentioned on the projects page that you would like to add
> more
> support for some POSIX APIs. Can you please write (or refer me to a) a
> simple C program that uses such an API (and also what the analyzer
> should
> have done) so that I can attempt to add such a checker to the analyzer.

A couple of project ideas:

(i) treat data coming from a network connection as tainted, by somehow
teaching the analyzer about networking APIs.  Ideally: look at some
subset of historical CVEs involving network-facing attacks on user-
space daemons, and find a way to detect them in the analyzer (need to
find a way to mark the incoming data as tainted, so that the analyer
"know" about the trust boundary - that the incoming data needs to be
sanitized and treated with extra caution; see
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
for my attempts to do this for the Linux kernel).

Obviously this is potentially a huge project, so maybe just picking a
tiny subset and getting that working as a proof-of-concept would be a
good GSoC project.  Maybe find an old CVE that someone has written a
good write-up for, and think about "how could GCC/-fanalyzer have
spotted it?"

(ii) add leak-detection for POSIX file descriptors: i.e. the integer
values returned by "open", "dup", etc.  It would be good to have a
check that the user's code doesn't leak these values e.g. on error-
handling paths, by failing to close a file-descriptor (and not storing
it anywhere).  I think that much of this could be done by analogy with
the sm-file.cc code.


> 
> Also, I didn't realize the complexity of adding SARIF when I mentioned
> it.
> I'd rather work on adding more checkers.

Fair enough.

Hope this above is constructive.

Dave

> 
> Regards.
> 
> Mir Immad
> 
> On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan017@gmail.com> wrote:
> 
> > Hi Sir,
> > 
> > I've been trying to understand the static analyzer's code. I spent
> > most of
> > my time learning the state machine's API. I learned how state
> > machine's
> > on_stmt is supposed to "recognize" specific functions and takes a 
> > specific
> > tree from one state to another, and how the captured states are used
> > by
> > pending_diagnostics to report the errors. Furthermore, I was able to
> > create
> > a dummy checker that mimicked the behaviour of sm-file's
> > double_fclose and
> > compile GCC with these changes. Is this the right way of learning?
> > 
> > As you've mentioned on the projects page that you would like to add
> > more
> > support for some POSIX APIs. Can you please write (or refer me to a)
> > a
> > simple C program that uses such an API (and also what the analyzer
> > should
> > have done) so that I can attempt to add such a checker to the
> > analyzer.
> > 
> > Also, I didn't realize the complexity of adding SARIF when I
> > mentioned it.
> > I'd rather work on adding more checkers.
> > 
> > Regards.
> > Mir Immad
> > 
> > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > HI David,
> > > > I've been tinkering with the static analyzer for the last few
> > > > days. I
> > > > find
> > > > the project of adding SARIF output to the analyzer intresting.
> > > > I'm
> > > > writing
> > > > this to let you know that I'm trying to learn the codebase.
> > > > Thank you.
> > > 
> > > Excellent.
> > > 
> > > BTW, I think adding SARIF output would involve working more with
> > > GCC's
> > > diagnostics subsystem than with the static analyzer, since (in
> > > theory)
> > > all of the static analyzer's output is passing through the
> > > diagnostics
> > > subsystem - though the static analyzer is probably the only GCC
> > > component generating diagnostic paths.
> > > 
> > > I'm happy to mentor such a project as I maintain both subsystems
> > > and
> > > SARIF output would benefit both - but it would be rather tangential
> > > to
> > > the analyzer - so if you had specifically wanted to be working on
> > > the
> > > guts of the analyzer itself, you may want to pick a different
> > > subproject.
> > > 
> > > The SARIF standard is rather long and complicated, and we would
> > > want to
> > > be compatible with clang's implementation.
> > > 
> > > It would be very cool if gcc could also accept SARIF files as an
> > > *input* format, and emit them as diagnostics; that might help with
> > > debugging SARIF output.   (I have a old patch for adding JSON
> > > parsing
> > > support to GCC that could be used as a starting point for this).
> > > 
> > > Hope the above makes sense
> > > Dave
> > > 
> > > > 
> > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <  
> > > > dmalcolm@redhat.com>
> > > > wrote:
> > > > 
> > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> > > > > > Hi everyone,
> > > > > 
> > > > > Hi, and welcome.
> > > > > 
> > > > > > I intend to work on the static analyzer. Are these documents
> > > > > > enough to
> > > > > > get
> > > > > > started: https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > 
> > > > > 
> > >   
> > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > 
> > > > > Yes.
> > > > > 
> > > > > There are also some high-level notes here:
> > > > >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > 
> > > > > Also, given that the analyzer is part of GCC, the more general
> > > > > introductions to hacking on GCC will be useful.
> > > > > 
> > > > > I recommend creating a trivial C source file with a bug in it
> > > > > (e.g.
> > > > > a
> > > > > 3-line function with a use-after-free), and stepping through
> > > > > the
> > > > > analyzer to get a sense of how it works.
> > > > > 
> > > > > Hope this is helpful; don't hesitate to ask questions.
> > > > > Dave
> > > > > 
> > > > > 
> > > 
> > > 
> > > 



  parent reply	other threads:[~2022-01-26 14:26 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-11  5:33 Mir Immad
2022-01-11 13:39 ` David Malcolm
2022-01-14 16:45   ` Mir Immad
2022-01-17  0:11     ` David Malcolm
     [not found]       ` <CAE1-7ozOGbp-sypHeWxJpoLWzNbqYtWrrsnhLK9RdhYrLe6z1w@mail.gmail.com>
2022-01-23 20:11         ` Mir Immad
2022-01-24 14:19           ` Ankur Saini
2022-01-26 14:31             ` David Malcolm
2022-01-26 14:26           ` David Malcolm [this message]
2022-01-29 14:52             ` Mir Immad
2022-01-29 17:39               ` David Malcolm
2022-02-01 14:58                 ` Mir Immad
2022-02-13 15:46                   ` Mir Immad
2022-02-13 22:57                     ` David Malcolm
2022-02-13 23:05                       ` David Malcolm
2022-04-04 16:16                         ` Mir Immad
2022-04-08 18:19                           ` David Malcolm
     [not found] <mailman.6588.1644793069.2100866.gcc@gcc.gnu.org>
2022-02-14 12:59 ` Basile Starynkevitch
2022-02-14 13:12   ` Basile Starynkevitch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d08c57511dea46832da8ad30c30dc410342b24f3.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc@gcc.gnu.org \
    --cc=mirimnan017@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).