public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* GSoC: Working on the static analyzer
@ 2022-01-11  5:33 Mir Immad
  2022-01-11 13:39 ` David Malcolm
  0 siblings, 1 reply; 18+ messages in thread
From: Mir Immad @ 2022-01-11  5:33 UTC (permalink / raw)
  To: gcc

Hi everyone,

I intend to work on the static analyzer. Are these documents enough to get
started: https://gcc.gnu.org/onlinedocs/gccint and
https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
.

Thank you.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-01-11  5:33 GSoC: Working on the static analyzer Mir Immad
@ 2022-01-11 13:39 ` David Malcolm
  2022-01-14 16:45   ` Mir Immad
  0 siblings, 1 reply; 18+ messages in thread
From: David Malcolm @ 2022-01-11 13:39 UTC (permalink / raw)
  To: Mir Immad, gcc

On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> Hi everyone,

Hi, and welcome.

> I intend to work on the static analyzer. Are these documents enough to
> get
> started: https://gcc.gnu.org/onlinedocs/gccint and
> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals

Yes.

There are also some high-level notes here:
  https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer

Also, given that the analyzer is part of GCC, the more general
introductions to hacking on GCC will be useful.

I recommend creating a trivial C source file with a bug in it (e.g. a
3-line function with a use-after-free), and stepping through the
analyzer to get a sense of how it works.

Hope this is helpful; don't hesitate to ask questions.
Dave


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-01-11 13:39 ` David Malcolm
@ 2022-01-14 16:45   ` Mir Immad
  2022-01-17  0:11     ` David Malcolm
  0 siblings, 1 reply; 18+ messages in thread
From: Mir Immad @ 2022-01-14 16:45 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

HI David,
I've been tinkering with the static analyzer for the last few days. I find
the project of adding SARIF output to the analyzer intresting. I'm writing
this to let you know that I'm trying to learn the codebase.
Thank you.

On Tue, Jan 11, 2022, 7:09 PM David Malcolm <dmalcolm@redhat.com> wrote:

> On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> > Hi everyone,
>
> Hi, and welcome.
>
> > I intend to work on the static analyzer. Are these documents enough to
> > get
> > started: https://gcc.gnu.org/onlinedocs/gccint and
> >
> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
>
> Yes.
>
> There are also some high-level notes here:
>   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
>
> Also, given that the analyzer is part of GCC, the more general
> introductions to hacking on GCC will be useful.
>
> I recommend creating a trivial C source file with a bug in it (e.g. a
> 3-line function with a use-after-free), and stepping through the
> analyzer to get a sense of how it works.
>
> Hope this is helpful; don't hesitate to ask questions.
> Dave
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-01-14 16:45   ` Mir Immad
@ 2022-01-17  0:11     ` David Malcolm
       [not found]       ` <CAE1-7ozOGbp-sypHeWxJpoLWzNbqYtWrrsnhLK9RdhYrLe6z1w@mail.gmail.com>
  0 siblings, 1 reply; 18+ messages in thread
From: David Malcolm @ 2022-01-17  0:11 UTC (permalink / raw)
  To: Mir Immad; +Cc: gcc

On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> HI David,
> I've been tinkering with the static analyzer for the last few days. I
> find
> the project of adding SARIF output to the analyzer intresting. I'm
> writing
> this to let you know that I'm trying to learn the codebase.
> Thank you.

Excellent.

BTW, I think adding SARIF output would involve working more with GCC's
diagnostics subsystem than with the static analyzer, since (in theory)
all of the static analyzer's output is passing through the diagnostics
subsystem - though the static analyzer is probably the only GCC
component generating diagnostic paths.

I'm happy to mentor such a project as I maintain both subsystems and
SARIF output would benefit both - but it would be rather tangential to
the analyzer - so if you had specifically wanted to be working on the
guts of the analyzer itself, you may want to pick a different
subproject.

The SARIF standard is rather long and complicated, and we would want to
be compatible with clang's implementation.

It would be very cool if gcc could also accept SARIF files as an
*input* format, and emit them as diagnostics; that might help with
debugging SARIF output.   (I have a old patch for adding JSON parsing 
support to GCC that could be used as a starting point for this).

Hope the above makes sense
Dave

> 
> On Tue, Jan 11, 2022, 7:09 PM David Malcolm <dmalcolm@redhat.com>
> wrote:
> 
> > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> > > Hi everyone,
> > 
> > Hi, and welcome.
> > 
> > > I intend to work on the static analyzer. Are these documents
> > > enough to
> > > get
> > > started: https://gcc.gnu.org/onlinedocs/gccint and
> > > 
> > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > 
> > Yes.
> > 
> > There are also some high-level notes here:
> >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > 
> > Also, given that the analyzer is part of GCC, the more general
> > introductions to hacking on GCC will be useful.
> > 
> > I recommend creating a trivial C source file with a bug in it (e.g.
> > a
> > 3-line function with a use-after-free), and stepping through the
> > analyzer to get a sense of how it works.
> > 
> > Hope this is helpful; don't hesitate to ask questions.
> > Dave
> > 
> > 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
       [not found]       ` <CAE1-7ozOGbp-sypHeWxJpoLWzNbqYtWrrsnhLK9RdhYrLe6z1w@mail.gmail.com>
@ 2022-01-23 20:11         ` Mir Immad
  2022-01-24 14:19           ` Ankur Saini
  2022-01-26 14:26           ` David Malcolm
  0 siblings, 2 replies; 18+ messages in thread
From: Mir Immad @ 2022-01-23 20:11 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

Hi, sir.

I've been trying to understand the static analyzer's code. I spent most of
my time learning the state machine's API. I learned how state machine's
on_stmt is supposed to "recognize" specific functions and how on_transition
takes a specific tree from one state to another, and how the captured
states are used by pending_diagnostics to report the errors. Furthermore, I
was able to create a dummy checker that mimicked the behaviour of sm-file's
double_fclose and compile GCC with these changes. Is this the right way of
learning?

As you've mentioned on the projects page that you would like to add more
support for some POSIX APIs. Can you please write (or refer me to a) a
simple C program that uses such an API (and also what the analyzer should
have done) so that I can attempt to add such a checker to the analyzer.

Also, I didn't realize the complexity of adding SARIF when I mentioned it.
I'd rather work on adding more checkers.

Regards.

Mir Immad

On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan017@gmail.com> wrote:

> Hi Sir,
>
> I've been trying to understand the static analyzer's code. I spent most of
> my time learning the state machine's API. I learned how state machine's
> on_stmt is supposed to "recognize" specific functions and takes a  specific
> tree from one state to another, and how the captured states are used by
> pending_diagnostics to report the errors. Furthermore, I was able to create
> a dummy checker that mimicked the behaviour of sm-file's double_fclose and
> compile GCC with these changes. Is this the right way of learning?
>
> As you've mentioned on the projects page that you would like to add more
> support for some POSIX APIs. Can you please write (or refer me to a) a
> simple C program that uses such an API (and also what the analyzer should
> have done) so that I can attempt to add such a checker to the analyzer.
>
> Also, I didn't realize the complexity of adding SARIF when I mentioned it.
> I'd rather work on adding more checkers.
>
> Regards.
> Mir Immad
>
> On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <dmalcolm@redhat.com> wrote:
>
>> On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
>> > HI David,
>> > I've been tinkering with the static analyzer for the last few days. I
>> > find
>> > the project of adding SARIF output to the analyzer intresting. I'm
>> > writing
>> > this to let you know that I'm trying to learn the codebase.
>> > Thank you.
>>
>> Excellent.
>>
>> BTW, I think adding SARIF output would involve working more with GCC's
>> diagnostics subsystem than with the static analyzer, since (in theory)
>> all of the static analyzer's output is passing through the diagnostics
>> subsystem - though the static analyzer is probably the only GCC
>> component generating diagnostic paths.
>>
>> I'm happy to mentor such a project as I maintain both subsystems and
>> SARIF output would benefit both - but it would be rather tangential to
>> the analyzer - so if you had specifically wanted to be working on the
>> guts of the analyzer itself, you may want to pick a different
>> subproject.
>>
>> The SARIF standard is rather long and complicated, and we would want to
>> be compatible with clang's implementation.
>>
>> It would be very cool if gcc could also accept SARIF files as an
>> *input* format, and emit them as diagnostics; that might help with
>> debugging SARIF output.   (I have a old patch for adding JSON parsing
>> support to GCC that could be used as a starting point for this).
>>
>> Hope the above makes sense
>> Dave
>>
>> >
>> > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <dmalcolm@redhat.com>
>> > wrote:
>> >
>> > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
>> > > > Hi everyone,
>> > >
>> > > Hi, and welcome.
>> > >
>> > > > I intend to work on the static analyzer. Are these documents
>> > > > enough to
>> > > > get
>> > > > started: https://gcc.gnu.org/onlinedocs/gccint and
>> > > >
>> > >
>> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
>> > >
>> > > Yes.
>> > >
>> > > There are also some high-level notes here:
>> > >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
>> > >
>> > > Also, given that the analyzer is part of GCC, the more general
>> > > introductions to hacking on GCC will be useful.
>> > >
>> > > I recommend creating a trivial C source file with a bug in it (e.g.
>> > > a
>> > > 3-line function with a use-after-free), and stepping through the
>> > > analyzer to get a sense of how it works.
>> > >
>> > > Hope this is helpful; don't hesitate to ask questions.
>> > > Dave
>> > >
>> > >
>>
>>
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-01-23 20:11         ` Mir Immad
@ 2022-01-24 14:19           ` Ankur Saini
  2022-01-26 14:31             ` David Malcolm
  2022-01-26 14:26           ` David Malcolm
  1 sibling, 1 reply; 18+ messages in thread
From: Ankur Saini @ 2022-01-24 14:19 UTC (permalink / raw)
  To: Mir Immad; +Cc: David Malcolm, gcc

The following can be a possible example of a case where the analyzer fails
to understand POSIX file-descriptor API.

- - -
#include <stdio.h>
#include <fcntl.h>

void test()
{
    int fd;
    fd = open("foo.txt", O_RDONLY | O_CREAT);
}

void test_2()
{
    FILE *f;
    f = fopen("demo.c", "r");
}

godbolt link: https://godbolt.org/z/vbTq6fTnd
- - -

You can see that unlike the "File *” pointer ( f ), analyzer is not
tracking integer file descriptor ( fd ) which is also leaking at the end of
function "test ()” and should ideally be reported with CWE-775
( https://cwe.mitre.org/data/definitions/775.html )

If you look at the exploded graph of the given program, the analyzer is not
able to identify the call to `open ()` and treating it as a "call to
unknown function”.

- Ankur

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-01-23 20:11         ` Mir Immad
  2022-01-24 14:19           ` Ankur Saini
@ 2022-01-26 14:26           ` David Malcolm
  2022-01-29 14:52             ` Mir Immad
  1 sibling, 1 reply; 18+ messages in thread
From: David Malcolm @ 2022-01-26 14:26 UTC (permalink / raw)
  To: Mir Immad; +Cc: gcc

On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> Hi, sir.
> 
> I've been trying to understand the static analyzer's code. I spent most
> of
> my time learning the state machine's API. I learned how state machine's
> on_stmt is supposed to "recognize" specific functions and how
> on_transition
> takes a specific tree from one state to another, and how the captured
> states are used by pending_diagnostics to report the errors.
> Furthermore, I
> was able to create a dummy checker that mimicked the behaviour of sm-
> file's
> double_fclose and compile GCC with these changes. Is this the right way
> of
> learning?

This sounds great.

> 
> As you've mentioned on the projects page that you would like to add
> more
> support for some POSIX APIs. Can you please write (or refer me to a) a
> simple C program that uses such an API (and also what the analyzer
> should
> have done) so that I can attempt to add such a checker to the analyzer.

A couple of project ideas:

(i) treat data coming from a network connection as tainted, by somehow
teaching the analyzer about networking APIs.  Ideally: look at some
subset of historical CVEs involving network-facing attacks on user-
space daemons, and find a way to detect them in the analyzer (need to
find a way to mark the incoming data as tainted, so that the analyer
"know" about the trust boundary - that the incoming data needs to be
sanitized and treated with extra caution; see
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
for my attempts to do this for the Linux kernel).

Obviously this is potentially a huge project, so maybe just picking a
tiny subset and getting that working as a proof-of-concept would be a
good GSoC project.  Maybe find an old CVE that someone has written a
good write-up for, and think about "how could GCC/-fanalyzer have
spotted it?"

(ii) add leak-detection for POSIX file descriptors: i.e. the integer
values returned by "open", "dup", etc.  It would be good to have a
check that the user's code doesn't leak these values e.g. on error-
handling paths, by failing to close a file-descriptor (and not storing
it anywhere).  I think that much of this could be done by analogy with
the sm-file.cc code.


> 
> Also, I didn't realize the complexity of adding SARIF when I mentioned
> it.
> I'd rather work on adding more checkers.

Fair enough.

Hope this above is constructive.

Dave

> 
> Regards.
> 
> Mir Immad
> 
> On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan017@gmail.com> wrote:
> 
> > Hi Sir,
> > 
> > I've been trying to understand the static analyzer's code. I spent
> > most of
> > my time learning the state machine's API. I learned how state
> > machine's
> > on_stmt is supposed to "recognize" specific functions and takes a 
> > specific
> > tree from one state to another, and how the captured states are used
> > by
> > pending_diagnostics to report the errors. Furthermore, I was able to
> > create
> > a dummy checker that mimicked the behaviour of sm-file's
> > double_fclose and
> > compile GCC with these changes. Is this the right way of learning?
> > 
> > As you've mentioned on the projects page that you would like to add
> > more
> > support for some POSIX APIs. Can you please write (or refer me to a)
> > a
> > simple C program that uses such an API (and also what the analyzer
> > should
> > have done) so that I can attempt to add such a checker to the
> > analyzer.
> > 
> > Also, I didn't realize the complexity of adding SARIF when I
> > mentioned it.
> > I'd rather work on adding more checkers.
> > 
> > Regards.
> > Mir Immad
> > 
> > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > HI David,
> > > > I've been tinkering with the static analyzer for the last few
> > > > days. I
> > > > find
> > > > the project of adding SARIF output to the analyzer intresting.
> > > > I'm
> > > > writing
> > > > this to let you know that I'm trying to learn the codebase.
> > > > Thank you.
> > > 
> > > Excellent.
> > > 
> > > BTW, I think adding SARIF output would involve working more with
> > > GCC's
> > > diagnostics subsystem than with the static analyzer, since (in
> > > theory)
> > > all of the static analyzer's output is passing through the
> > > diagnostics
> > > subsystem - though the static analyzer is probably the only GCC
> > > component generating diagnostic paths.
> > > 
> > > I'm happy to mentor such a project as I maintain both subsystems
> > > and
> > > SARIF output would benefit both - but it would be rather tangential
> > > to
> > > the analyzer - so if you had specifically wanted to be working on
> > > the
> > > guts of the analyzer itself, you may want to pick a different
> > > subproject.
> > > 
> > > The SARIF standard is rather long and complicated, and we would
> > > want to
> > > be compatible with clang's implementation.
> > > 
> > > It would be very cool if gcc could also accept SARIF files as an
> > > *input* format, and emit them as diagnostics; that might help with
> > > debugging SARIF output.   (I have a old patch for adding JSON
> > > parsing
> > > support to GCC that could be used as a starting point for this).
> > > 
> > > Hope the above makes sense
> > > Dave
> > > 
> > > > 
> > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <  
> > > > dmalcolm@redhat.com>
> > > > wrote:
> > > > 
> > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> > > > > > Hi everyone,
> > > > > 
> > > > > Hi, and welcome.
> > > > > 
> > > > > > I intend to work on the static analyzer. Are these documents
> > > > > > enough to
> > > > > > get
> > > > > > started: https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > 
> > > > > 
> > >   
> > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > 
> > > > > Yes.
> > > > > 
> > > > > There are also some high-level notes here:
> > > > >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > 
> > > > > Also, given that the analyzer is part of GCC, the more general
> > > > > introductions to hacking on GCC will be useful.
> > > > > 
> > > > > I recommend creating a trivial C source file with a bug in it
> > > > > (e.g.
> > > > > a
> > > > > 3-line function with a use-after-free), and stepping through
> > > > > the
> > > > > analyzer to get a sense of how it works.
> > > > > 
> > > > > Hope this is helpful; don't hesitate to ask questions.
> > > > > Dave
> > > > > 
> > > > > 
> > > 
> > > 
> > > 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-01-24 14:19           ` Ankur Saini
@ 2022-01-26 14:31             ` David Malcolm
  0 siblings, 0 replies; 18+ messages in thread
From: David Malcolm @ 2022-01-26 14:31 UTC (permalink / raw)
  To: Ankur Saini, Mir Immad; +Cc: gcc

On Mon, 2022-01-24 at 19:49 +0530, Ankur Saini wrote:
> The following can be a possible example of a case where the analyzer
> fails
> to understand POSIX file-descriptor API.
> 
> - - -
> #include <stdio.h>
> #include <fcntl.h>
> 
> void test()
> {
>     int fd;
>     fd = open("foo.txt", O_RDONLY | O_CREAT);
> }
> 
> void test_2()
> {
>     FILE *f;
>     f = fopen("demo.c", "r");
> }
> 
> godbolt link: https://godbolt.org/z/vbTq6fTnd
> - - -
> 
> You can see that unlike the "File *” pointer ( f ), analyzer is not
> tracking integer file descriptor ( fd ) which is also leaking at the
> end of
> function "test ()” and should ideally be reported with CWE-775
> ( https://cwe.mitre.org/data/definitions/775.html )
> 
> If you look at the exploded graph of the given program, the analyzer
> is not
> able to identify the call to `open ()` and treating it as a "call to
> unknown function”.

Thanks, that's a good explanation.

The analyzer could handle the "open" call by bifurcating the state into
"succeeded" and "failed" cases; see region_model::impl_call_strchr for
an example of this.  We don't yet have a way for the analyzer to know
about functions that set errno, but the "failed" case ought to do so.

Dave


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-01-26 14:26           ` David Malcolm
@ 2022-01-29 14:52             ` Mir Immad
  2022-01-29 17:39               ` David Malcolm
  0 siblings, 1 reply; 18+ messages in thread
From: Mir Immad @ 2022-01-29 14:52 UTC (permalink / raw)
  To: David Malcolm, gcc

[-- Attachment #1: Type: text/plain, Size: 7675 bytes --]

Thank you for the detailed information.

I've been looking into the integer posix  file descriptor APIs and I
decided to write proof-of-concept  checker for them. (not caring about
errno). The checker tracks the fd returned by open(), warns if dup() is
called with closed fd otherwise tracks the fd returned by dup(), it also
warns if read() and write() functions were called on closed fd. I'm
attaching a text file that lists some c sources and warnings by the static
analyzer. I've used the diagnostic meta-data from sm-file. Is this
something that could also be added to the analyzer?

About the fd leak, that's the next thing I'll try to get working. Since
you've mentioned that it could be a GSoC project, this is what I'm going to
focus on.

Regards.



On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <dmalcolm@redhat.com> wrote:

> On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > Hi, sir.
> >
> > I've been trying to understand the static analyzer's code. I spent most
> > of
> > my time learning the state machine's API. I learned how state machine's
> > on_stmt is supposed to "recognize" specific functions and how
> > on_transition
> > takes a specific tree from one state to another, and how the captured
> > states are used by pending_diagnostics to report the errors.
> > Furthermore, I
> > was able to create a dummy checker that mimicked the behaviour of sm-
> > file's
> > double_fclose and compile GCC with these changes. Is this the right way
> > of
> > learning?
>
> This sounds great.
>
> >
> > As you've mentioned on the projects page that you would like to add
> > more
> > support for some POSIX APIs. Can you please write (or refer me to a) a
> > simple C program that uses such an API (and also what the analyzer
> > should
> > have done) so that I can attempt to add such a checker to the analyzer.
>
> A couple of project ideas:
>
> (i) treat data coming from a network connection as tainted, by somehow
> teaching the analyzer about networking APIs.  Ideally: look at some
> subset of historical CVEs involving network-facing attacks on user-
> space daemons, and find a way to detect them in the analyzer (need to
> find a way to mark the incoming data as tainted, so that the analyer
> "know" about the trust boundary - that the incoming data needs to be
> sanitized and treated with extra caution; see
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> for my attempts to do this for the Linux kernel).
>
> Obviously this is potentially a huge project, so maybe just picking a
> tiny subset and getting that working as a proof-of-concept would be a
> good GSoC project.  Maybe find an old CVE that someone has written a
> good write-up for, and think about "how could GCC/-fanalyzer have
> spotted it?"
>
> (ii) add leak-detection for POSIX file descriptors: i.e. the integer
> values returned by "open", "dup", etc.  It would be good to have a
> check that the user's code doesn't leak these values e.g. on error-
> handling paths, by failing to close a file-descriptor (and not storing
> it anywhere).  I think that much of this could be done by analogy with
> the sm-file.cc code.
>
>
> >
> > Also, I didn't realize the complexity of adding SARIF when I mentioned
> > it.
> > I'd rather work on adding more checkers.
>
> Fair enough.
>
> Hope this above is constructive.
>
> Dave
>
> >
> > Regards.
> >
> > Mir Immad
> >
> > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan017@gmail.com> wrote:
> >
> > > Hi Sir,
> > >
> > > I've been trying to understand the static analyzer's code. I spent
> > > most of
> > > my time learning the state machine's API. I learned how state
> > > machine's
> > > on_stmt is supposed to "recognize" specific functions and takes a
> > > specific
> > > tree from one state to another, and how the captured states are used
> > > by
> > > pending_diagnostics to report the errors. Furthermore, I was able to
> > > create
> > > a dummy checker that mimicked the behaviour of sm-file's
> > > double_fclose and
> > > compile GCC with these changes. Is this the right way of learning?
> > >
> > > As you've mentioned on the projects page that you would like to add
> > > more
> > > support for some POSIX APIs. Can you please write (or refer me to a)
> > > a
> > > simple C program that uses such an API (and also what the analyzer
> > > should
> > > have done) so that I can attempt to add such a checker to the
> > > analyzer.
> > >
> > > Also, I didn't realize the complexity of adding SARIF when I
> > > mentioned it.
> > > I'd rather work on adding more checkers.
> > >
> > > Regards.
> > > Mir Immad
> > >
> > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <dmalcolm@redhat.com>
> > > wrote:
> > >
> > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > HI David,
> > > > > I've been tinkering with the static analyzer for the last few
> > > > > days. I
> > > > > find
> > > > > the project of adding SARIF output to the analyzer intresting.
> > > > > I'm
> > > > > writing
> > > > > this to let you know that I'm trying to learn the codebase.
> > > > > Thank you.
> > > >
> > > > Excellent.
> > > >
> > > > BTW, I think adding SARIF output would involve working more with
> > > > GCC's
> > > > diagnostics subsystem than with the static analyzer, since (in
> > > > theory)
> > > > all of the static analyzer's output is passing through the
> > > > diagnostics
> > > > subsystem - though the static analyzer is probably the only GCC
> > > > component generating diagnostic paths.
> > > >
> > > > I'm happy to mentor such a project as I maintain both subsystems
> > > > and
> > > > SARIF output would benefit both - but it would be rather tangential
> > > > to
> > > > the analyzer - so if you had specifically wanted to be working on
> > > > the
> > > > guts of the analyzer itself, you may want to pick a different
> > > > subproject.
> > > >
> > > > The SARIF standard is rather long and complicated, and we would
> > > > want to
> > > > be compatible with clang's implementation.
> > > >
> > > > It would be very cool if gcc could also accept SARIF files as an
> > > > *input* format, and emit them as diagnostics; that might help with
> > > > debugging SARIF output.   (I have a old patch for adding JSON
> > > > parsing
> > > > support to GCC that could be used as a starting point for this).
> > > >
> > > > Hope the above makes sense
> > > > Dave
> > > >
> > > > >
> > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
> > > > > dmalcolm@redhat.com>
> > > > > wrote:
> > > > >
> > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> > > > > > > Hi everyone,
> > > > > >
> > > > > > Hi, and welcome.
> > > > > >
> > > > > > > I intend to work on the static analyzer. Are these documents
> > > > > > > enough to
> > > > > > > get
> > > > > > > started: https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > >
> > > > > >
> > > >
> > > >
> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > There are also some high-level notes here:
> > > > > >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > >
> > > > > > Also, given that the analyzer is part of GCC, the more general
> > > > > > introductions to hacking on GCC will be useful.
> > > > > >
> > > > > > I recommend creating a trivial C source file with a bug in it
> > > > > > (e.g.
> > > > > > a
> > > > > > 3-line function with a use-after-free), and stepping through
> > > > > > the
> > > > > > analyzer to get a sense of how it works.
> > > > > >
> > > > > > Hope this is helpful; don't hesitate to ask questions.
> > > > > > Dave
> > > > > >
> > > > > >
> > > >
> > > >
> > > >
>
>
>

[-- Attachment #2: gcc-output.txt --]
[-- Type: text/plain, Size: 3649 bytes --]

$ cat double-close.c

#include <fcntl.h>
#include <unistd.h>
void test()
{
  int fd = open("test.txt", O_RDONLY);
  close(fd);
  close(fd);
}

int main() {}

$ gcc-11.2.0 double-close.c -fanalyzer

double-close.c: In function ‘test’:
double-close.c:7:3: warning: double close on fd [-Wanalyzer-double-fclose]
    7 |   close(fd);
      |   ^~~~~~~~~
  ‘test’: events 1-3
    |
    |    5 |   int fd = open("test.txt", O_RDONLY);
    |      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |            |
    |      |            (1) opened here
    |    6 |   close(fd);
    |      |   ~~~~~~~~~ 
    |      |   |
    |      |   (2) first ‘close’ here
    |    7 |   close(fd);
    |      |   ~~~~~~~~~ 
    |      |   |
    |      |   (3) second ‘close’ was here; first ‘close’ was at (2)
    |

$ cat dup.c

#include <fcntl.h>
#include <unistd.h>

void test() {
	int fd = open("test.txt", O_RDONLY);
	close(fd);
	int fd2 = dup(fd);
}
int main(){}

$ gcc-11.2.0 dup.c -fanalyzer

dup.c: In function ‘test’:
dup.c:7:19: warning: dup on closed fd [-Wanalyzer-double-fclose]
    7 |         int fd2 = dup(fd);
      |                   ^~~~~~~
  ‘test’: events 1-3
    |
    |    5 |         int fd = open("test.txt", O_RDONLY);
    |      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                  |
    |      |                  (1) opened here
    |    6 |         close(fd);
    |      |         ~~~~~~~~~ 
    |      |         |
    |      |         (2) first ‘close’ here
    |    7 |         int fd2 = dup(fd);
    |      |                   ~~~~~~~
    |      |                   |
    |      |                   (3) dup on closed file-descriptor ‘fd’ was called here which was closed at (2)
    |

$ cat read-write.c

#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

void test ()
{
	int fd = open ("text", O_RDONLY);
	close(fd);
	write(fd, "hello", 5); //write in closed fd
	char * s = (char *) malloc(2);
	read(fd, s, 2); //read on closed fd
	free(s);
}

void main(){}

$ gcc-11.2.0 read-write.c -fanalyzer

read-write.c: In function ‘test’:
read-write.c:9:9: warning: write on closed fd [-Wanalyzer-double-fclose]
    9 |         write(fd, "hello", 5); //write in closed fd
      |         ^~~~~~~~~~~~~~~~~~~~~
  ‘test’: events 1-3
    |
    |    7 |         int fd = open ("text", O_RDONLY);
    |      |                  ^~~~~~~~~~~~~~~~~~~~~~~
    |      |                  |
    |      |                  (1) opened here
    |    8 |         close(fd);
    |      |         ~~~~~~~~~ 
    |      |         |
    |      |         (2) first ‘close’ here
    |    9 |         write(fd, "hello", 5); //write in closed fd
    |      |         ~~~~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (3) write on closed file-descriptor ‘fd’ was called here which was closed at (2)
    |
read-write.c:11:9: warning: read on closed fd [-Wanalyzer-double-fclose]
   11 |         read(fd, s, 2); //read on closed fd
      |         ^~~~~~~~~~~~~~
  ‘test’: events 1-3
    |
    |    7 |         int fd = open ("text", O_RDONLY);
    |      |                  ^~~~~~~~~~~~~~~~~~~~~~~
    |      |                  |
    |      |                  (1) opened here
    |    8 |         close(fd);
    |      |         ~~~~~~~~~ 
    |      |         |
    |      |         (2) first ‘close’ here
    |......
    |   11 |         read(fd, s, 2); //read on closed fd
    |      |         ~~~~~~~~~~~~~~
    |      |         |
    |      |         (3) read on closed file-descriptor ‘fd’ was called here which was closed at (2)
    |

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-01-29 14:52             ` Mir Immad
@ 2022-01-29 17:39               ` David Malcolm
  2022-02-01 14:58                 ` Mir Immad
  0 siblings, 1 reply; 18+ messages in thread
From: David Malcolm @ 2022-01-29 17:39 UTC (permalink / raw)
  To: Mir Immad, gcc

On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
> Thank you for the detailed information.
> 
> I've been looking into the integer posix  file descriptor APIs and I
> decided to write proof-of-concept  checker for them. (not caring
> about
> errno). The checker tracks the fd returned by open(), warns if dup()
> is
> called with closed fd otherwise tracks the fd returned by dup(), it
> also
> warns if read() and write() functions were called on closed fd. I'm
> attaching a text file that lists some c sources and warnings by the
> static
> analyzer. I've used the diagnostic meta-data from sm-file. Is this
> something that could also be added to the analyzer?

This looks great, and very promising as both new functionality for GCC
13, and as a GSoC 2022 project.

BTW, it looks like you're working with GCC 11, but the analyzer has
changed quite a bit on trunk for GCC 12, so it's worth trying to track
trunk.

I wonder if it's worth checking for attempts to write to a fd that was
opened with O_RDONLY, or the converse?  (I'm not sure, just thinking
aloud - how much state does it make sense to track for a fd?).

Also, at some point, we're going to have to handle "errno" - but given
that might be somewhat fiddly it's OK to defer that until you're more
familiar with the code.

> 
> About the fd leak, that's the next thing I'll try to get working.
> Since
> you've mentioned that it could be a GSoC project, this is what I'm
> going to
> focus on.

Excellent.

Let me know (via this mailing list) if you have any questions.

Thanks
Dave

> 
> Regards.
> 
> 
> 
> On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <dmalcolm@redhat.com>
> wrote:
> 
> > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > > Hi, sir.
> > > 
> > > I've been trying to understand the static analyzer's code. I
> > > spent most
> > > of
> > > my time learning the state machine's API. I learned how state
> > > machine's
> > > on_stmt is supposed to "recognize" specific functions and how
> > > on_transition
> > > takes a specific tree from one state to another, and how the
> > > captured
> > > states are used by pending_diagnostics to report the errors.
> > > Furthermore, I
> > > was able to create a dummy checker that mimicked the behaviour of
> > > sm-
> > > file's
> > > double_fclose and compile GCC with these changes. Is this the
> > > right way
> > > of
> > > learning?
> > 
> > This sounds great.
> > 
> > > 
> > > As you've mentioned on the projects page that you would like to
> > > add
> > > more
> > > support for some POSIX APIs. Can you please write (or refer me to
> > > a) a
> > > simple C program that uses such an API (and also what the
> > > analyzer
> > > should
> > > have done) so that I can attempt to add such a checker to the
> > > analyzer.
> > 
> > A couple of project ideas:
> > 
> > (i) treat data coming from a network connection as tainted, by
> > somehow
> > teaching the analyzer about networking APIs.  Ideally: look at some
> > subset of historical CVEs involving network-facing attacks on user-
> > space daemons, and find a way to detect them in the analyzer (need
> > to
> > find a way to mark the incoming data as tainted, so that the
> > analyer
> > "know" about the trust boundary - that the incoming data needs to
> > be
> > sanitized and treated with extra caution; see
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> > for my attempts to do this for the Linux kernel).
> > 
> > Obviously this is potentially a huge project, so maybe just picking
> > a
> > tiny subset and getting that working as a proof-of-concept would be
> > a
> > good GSoC project.  Maybe find an old CVE that someone has written
> > a
> > good write-up for, and think about "how could GCC/-fanalyzer have
> > spotted it?"
> > 
> > (ii) add leak-detection for POSIX file descriptors: i.e. the
> > integer
> > values returned by "open", "dup", etc.  It would be good to have a
> > check that the user's code doesn't leak these values e.g. on error-
> > handling paths, by failing to close a file-descriptor (and not
> > storing
> > it anywhere).  I think that much of this could be done by analogy
> > with
> > the sm-file.cc code.
> > 
> > 
> > > 
> > > Also, I didn't realize the complexity of adding SARIF when I
> > > mentioned
> > > it.
> > > I'd rather work on adding more checkers.
> > 
> > Fair enough.
> > 
> > Hope this above is constructive.
> > 
> > Dave
> > 
> > > 
> > > Regards.
> > > 
> > > Mir Immad
> > > 
> > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan017@gmail.com>
> > > wrote:
> > > 
> > > > Hi Sir,
> > > > 
> > > > I've been trying to understand the static analyzer's code. I
> > > > spent
> > > > most of
> > > > my time learning the state machine's API. I learned how state
> > > > machine's
> > > > on_stmt is supposed to "recognize" specific functions and takes
> > > > a
> > > > specific
> > > > tree from one state to another, and how the captured states are
> > > > used
> > > > by
> > > > pending_diagnostics to report the errors. Furthermore, I was
> > > > able to
> > > > create
> > > > a dummy checker that mimicked the behaviour of sm-file's
> > > > double_fclose and
> > > > compile GCC with these changes. Is this the right way of
> > > > learning?
> > > > 
> > > > As you've mentioned on the projects page that you would like to
> > > > add
> > > > more
> > > > support for some POSIX APIs. Can you please write (or refer me
> > > > to a)
> > > > a
> > > > simple C program that uses such an API (and also what the
> > > > analyzer
> > > > should
> > > > have done) so that I can attempt to add such a checker to the
> > > > analyzer.
> > > > 
> > > > Also, I didn't realize the complexity of adding SARIF when I
> > > > mentioned it.
> > > > I'd rather work on adding more checkers.
> > > > 
> > > > Regards.
> > > > Mir Immad
> > > > 
> > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm < 
> > > > dmalcolm@redhat.com>
> > > > wrote:
> > > > 
> > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > > HI David,
> > > > > > I've been tinkering with the static analyzer for the last
> > > > > > few
> > > > > > days. I
> > > > > > find
> > > > > > the project of adding SARIF output to the analyzer
> > > > > > intresting.
> > > > > > I'm
> > > > > > writing
> > > > > > this to let you know that I'm trying to learn the codebase.
> > > > > > Thank you.
> > > > > 
> > > > > Excellent.
> > > > > 
> > > > > BTW, I think adding SARIF output would involve working more
> > > > > with
> > > > > GCC's
> > > > > diagnostics subsystem than with the static analyzer, since
> > > > > (in
> > > > > theory)
> > > > > all of the static analyzer's output is passing through the
> > > > > diagnostics
> > > > > subsystem - though the static analyzer is probably the only
> > > > > GCC
> > > > > component generating diagnostic paths.
> > > > > 
> > > > > I'm happy to mentor such a project as I maintain both
> > > > > subsystems
> > > > > and
> > > > > SARIF output would benefit both - but it would be rather
> > > > > tangential
> > > > > to
> > > > > the analyzer - so if you had specifically wanted to be
> > > > > working on
> > > > > the
> > > > > guts of the analyzer itself, you may want to pick a different
> > > > > subproject.
> > > > > 
> > > > > The SARIF standard is rather long and complicated, and we
> > > > > would
> > > > > want to
> > > > > be compatible with clang's implementation.
> > > > > 
> > > > > It would be very cool if gcc could also accept SARIF files as
> > > > > an
> > > > > *input* format, and emit them as diagnostics; that might help
> > > > > with
> > > > > debugging SARIF output.   (I have a old patch for adding JSON
> > > > > parsing
> > > > > support to GCC that could be used as a starting point for
> > > > > this).
> > > > > 
> > > > > Hope the above makes sense
> > > > > Dave
> > > > > 
> > > > > > 
> > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
> > > > > > dmalcolm@redhat.com>
> > > > > > wrote:
> > > > > > 
> > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc
> > > > > > > wrote:
> > > > > > > > Hi everyone,
> > > > > > > 
> > > > > > > Hi, and welcome.
> > > > > > > 
> > > > > > > > I intend to work on the static analyzer. Are these
> > > > > > > > documents
> > > > > > > > enough to
> > > > > > > > get
> > > > > > > > started: https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > > > 
> > > > > > > 
> > > > > 
> > > > > 
> >  
> > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > > > 
> > > > > > > Yes.
> > > > > > > 
> > > > > > > There are also some high-level notes here:
> > > > > > >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > > > 
> > > > > > > Also, given that the analyzer is part of GCC, the more
> > > > > > > general
> > > > > > > introductions to hacking on GCC will be useful.
> > > > > > > 
> > > > > > > I recommend creating a trivial C source file with a bug
> > > > > > > in it
> > > > > > > (e.g.
> > > > > > > a
> > > > > > > 3-line function with a use-after-free), and stepping
> > > > > > > through
> > > > > > > the
> > > > > > > analyzer to get a sense of how it works.
> > > > > > > 
> > > > > > > Hope this is helpful; don't hesitate to ask questions.
> > > > > > > Dave
> > > > > > > 
> > > > > > > 
> > > > > 
> > > > > 
> > > > > 
> > 
> > 
> > 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-01-29 17:39               ` David Malcolm
@ 2022-02-01 14:58                 ` Mir Immad
  2022-02-13 15:46                   ` Mir Immad
  0 siblings, 1 reply; 18+ messages in thread
From: Mir Immad @ 2022-02-01 14:58 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 11357 bytes --]

I worked around the leak detection and also handled the case where the fd
is not saved. I wonder why sm-file hasn't implemented it yet. I'm attaching
a text file with analyzer warnings. I'm still on gcc-11.2.0, will move to
v12 next thing.

> I wonder if it's worth checking for attempts to write to a fd that was
> opened with O_RDONLY, or the converse?

Yes, it does not make sense to warn for write on read-only closed fd or
converse. But, it does make sense to warn for write operation on valid
read-only fd. For this, we might have to analyze calls to the open()
carefully -- get the literal value of the second argument to the call and
do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY are in it.
We might have to maintain separate vectors for these and check for them in
calls to write and read -- warn if write is being called on read-only fd
and converse. How do we equate them? Does gimple store unique ids for each
tree? or something like SAME_TREE? Just thinking out loud.

> how much state does it make sense to track for a fd?
Sorry, I didnt get it. Do you mean how may states I'm using? unchecked,
closed, null and leak_by_not_saved (for the call-tree that does not have a
lhs).

> Also, at some point, we're going to have to handle "errno" - but given
> that might be somewhat fiddly it's OK to defer that until you're more
> familiar with the code

I'm looking forward to figure it out.

Thank you.


On Sat, Jan 29, 2022 at 11:09 PM David Malcolm <dmalcolm@redhat.com> wrote:

> On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
> > Thank you for the detailed information.
> >
> > I've been looking into the integer posix  file descriptor APIs and I
> > decided to write proof-of-concept  checker for them. (not caring
> > about
> > errno). The checker tracks the fd returned by open(), warns if dup()
> > is
> > called with closed fd otherwise tracks the fd returned by dup(), it
> > also
> > warns if read() and write() functions were called on closed fd. I'm
> > attaching a text file that lists some c sources and warnings by the
> > static
> > analyzer. I've used the diagnostic meta-data from sm-file. Is this
> > something that could also be added to the analyzer?
>
> This looks great, and very promising as both new functionality for GCC
> 13, and as a GSoC 2022 project.
>
> BTW, it looks like you're working with GCC 11, but the analyzer has
> changed quite a bit on trunk for GCC 12, so it's worth trying to track
> trunk.
>
> I wonder if it's worth checking for attempts to write to a fd that was
> opened with O_RDONLY, or the converse?  (I'm not sure, just thinking
> aloud - how much state does it make sense to track for a fd?).
>
> Also, at some point, we're going to have to handle "errno" - but given
> that might be somewhat fiddly it's OK to defer that until you're more
> familiar with the code.
>
> >
> > About the fd leak, that's the next thing I'll try to get working.
> > Since
> > you've mentioned that it could be a GSoC project, this is what I'm
> > going to
> > focus on.
>
> Excellent.
>
> Let me know (via this mailing list) if you have any questions.
>
> Thanks
> Dave
>
> >
> > Regards.
> >
> >
> >
> > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <dmalcolm@redhat.com>
> > wrote:
> >
> > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > > > Hi, sir.
> > > >
> > > > I've been trying to understand the static analyzer's code. I
> > > > spent most
> > > > of
> > > > my time learning the state machine's API. I learned how state
> > > > machine's
> > > > on_stmt is supposed to "recognize" specific functions and how
> > > > on_transition
> > > > takes a specific tree from one state to another, and how the
> > > > captured
> > > > states are used by pending_diagnostics to report the errors.
> > > > Furthermore, I
> > > > was able to create a dummy checker that mimicked the behaviour of
> > > > sm-
> > > > file's
> > > > double_fclose and compile GCC with these changes. Is this the
> > > > right way
> > > > of
> > > > learning?
> > >
> > > This sounds great.
> > >
> > > >
> > > > As you've mentioned on the projects page that you would like to
> > > > add
> > > > more
> > > > support for some POSIX APIs. Can you please write (or refer me to
> > > > a) a
> > > > simple C program that uses such an API (and also what the
> > > > analyzer
> > > > should
> > > > have done) so that I can attempt to add such a checker to the
> > > > analyzer.
> > >
> > > A couple of project ideas:
> > >
> > > (i) treat data coming from a network connection as tainted, by
> > > somehow
> > > teaching the analyzer about networking APIs.  Ideally: look at some
> > > subset of historical CVEs involving network-facing attacks on user-
> > > space daemons, and find a way to detect them in the analyzer (need
> > > to
> > > find a way to mark the incoming data as tainted, so that the
> > > analyer
> > > "know" about the trust boundary - that the incoming data needs to
> > > be
> > > sanitized and treated with extra caution; see
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> > > for my attempts to do this for the Linux kernel).
> > >
> > > Obviously this is potentially a huge project, so maybe just picking
> > > a
> > > tiny subset and getting that working as a proof-of-concept would be
> > > a
> > > good GSoC project.  Maybe find an old CVE that someone has written
> > > a
> > > good write-up for, and think about "how could GCC/-fanalyzer have
> > > spotted it?"
> > >
> > > (ii) add leak-detection for POSIX file descriptors: i.e. the
> > > integer
> > > values returned by "open", "dup", etc.  It would be good to have a
> > > check that the user's code doesn't leak these values e.g. on error-
> > > handling paths, by failing to close a file-descriptor (and not
> > > storing
> > > it anywhere).  I think that much of this could be done by analogy
> > > with
> > > the sm-file.cc code.
> > >
> > >
> > > >
> > > > Also, I didn't realize the complexity of adding SARIF when I
> > > > mentioned
> > > > it.
> > > > I'd rather work on adding more checkers.
> > >
> > > Fair enough.
> > >
> > > Hope this above is constructive.
> > >
> > > Dave
> > >
> > > >
> > > > Regards.
> > > >
> > > > Mir Immad
> > > >
> > > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan017@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Sir,
> > > > >
> > > > > I've been trying to understand the static analyzer's code. I
> > > > > spent
> > > > > most of
> > > > > my time learning the state machine's API. I learned how state
> > > > > machine's
> > > > > on_stmt is supposed to "recognize" specific functions and takes
> > > > > a
> > > > > specific
> > > > > tree from one state to another, and how the captured states are
> > > > > used
> > > > > by
> > > > > pending_diagnostics to report the errors. Furthermore, I was
> > > > > able to
> > > > > create
> > > > > a dummy checker that mimicked the behaviour of sm-file's
> > > > > double_fclose and
> > > > > compile GCC with these changes. Is this the right way of
> > > > > learning?
> > > > >
> > > > > As you've mentioned on the projects page that you would like to
> > > > > add
> > > > > more
> > > > > support for some POSIX APIs. Can you please write (or refer me
> > > > > to a)
> > > > > a
> > > > > simple C program that uses such an API (and also what the
> > > > > analyzer
> > > > > should
> > > > > have done) so that I can attempt to add such a checker to the
> > > > > analyzer.
> > > > >
> > > > > Also, I didn't realize the complexity of adding SARIF when I
> > > > > mentioned it.
> > > > > I'd rather work on adding more checkers.
> > > > >
> > > > > Regards.
> > > > > Mir Immad
> > > > >
> > > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <
> > > > > dmalcolm@redhat.com>
> > > > > wrote:
> > > > >
> > > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > > > HI David,
> > > > > > > I've been tinkering with the static analyzer for the last
> > > > > > > few
> > > > > > > days. I
> > > > > > > find
> > > > > > > the project of adding SARIF output to the analyzer
> > > > > > > intresting.
> > > > > > > I'm
> > > > > > > writing
> > > > > > > this to let you know that I'm trying to learn the codebase.
> > > > > > > Thank you.
> > > > > >
> > > > > > Excellent.
> > > > > >
> > > > > > BTW, I think adding SARIF output would involve working more
> > > > > > with
> > > > > > GCC's
> > > > > > diagnostics subsystem than with the static analyzer, since
> > > > > > (in
> > > > > > theory)
> > > > > > all of the static analyzer's output is passing through the
> > > > > > diagnostics
> > > > > > subsystem - though the static analyzer is probably the only
> > > > > > GCC
> > > > > > component generating diagnostic paths.
> > > > > >
> > > > > > I'm happy to mentor such a project as I maintain both
> > > > > > subsystems
> > > > > > and
> > > > > > SARIF output would benefit both - but it would be rather
> > > > > > tangential
> > > > > > to
> > > > > > the analyzer - so if you had specifically wanted to be
> > > > > > working on
> > > > > > the
> > > > > > guts of the analyzer itself, you may want to pick a different
> > > > > > subproject.
> > > > > >
> > > > > > The SARIF standard is rather long and complicated, and we
> > > > > > would
> > > > > > want to
> > > > > > be compatible with clang's implementation.
> > > > > >
> > > > > > It would be very cool if gcc could also accept SARIF files as
> > > > > > an
> > > > > > *input* format, and emit them as diagnostics; that might help
> > > > > > with
> > > > > > debugging SARIF output.   (I have a old patch for adding JSON
> > > > > > parsing
> > > > > > support to GCC that could be used as a starting point for
> > > > > > this).
> > > > > >
> > > > > > Hope the above makes sense
> > > > > > Dave
> > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
> > > > > > > dmalcolm@redhat.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc
> > > > > > > > wrote:
> > > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > Hi, and welcome.
> > > > > > > >
> > > > > > > > > I intend to work on the static analyzer. Are these
> > > > > > > > > documents
> > > > > > > > > enough to
> > > > > > > > > get
> > > > > > > > > started: https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > >
> > >
> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > There are also some high-level notes here:
> > > > > > > >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > > > >
> > > > > > > > Also, given that the analyzer is part of GCC, the more
> > > > > > > > general
> > > > > > > > introductions to hacking on GCC will be useful.
> > > > > > > >
> > > > > > > > I recommend creating a trivial C source file with a bug
> > > > > > > > in it
> > > > > > > > (e.g.
> > > > > > > > a
> > > > > > > > 3-line function with a use-after-free), and stepping
> > > > > > > > through
> > > > > > > > the
> > > > > > > > analyzer to get a sense of how it works.
> > > > > > > >
> > > > > > > > Hope this is helpful; don't hesitate to ask questions.
> > > > > > > > Dave
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > >
> > >
> > >
>
>
>

[-- Attachment #2: gcc-leak-output.txt --]
[-- Type: text/plain, Size: 1048 bytes --]

$ cat leak.c

#include <fcntl.h>

void test ()
{
   int fd = open("test", O_RDONLY);
}

void main(){}

$ gcc-11.2.0 leak.c -fanalyzer

leak.c: In function ‘test’:
leak.c:6:1: warning: leak of FILE ‘fd’ [CWE-775] [-Wanalyzer-file-leak]
    6 | }
      | ^
  ‘test’: events 1-2
    |
    |    5 |    int fd = open("test", O_RDONLY);
    |      |             ^~~~~~~~~~~~~~~~~~~~~~
    |      |             |
    |      |             (1) opened here
    |    6 | }
    |      | ~            
    |      | |
    |      | (2) leaks here
    |

$ cat leak-by-not-saving.c

#include <fcntl.h>

void test ()
{
   open("test", O_RDONLY);
}

void main(){}

$ gcc leak-by-not-saving.c -fanalyzer

leak.c: In function ‘test’:
leak.c:5:4: warning: leak by not saving fd [CWE-775] [-Wanalyzer-file-leak]
    5 |    open("test", O_RDONLY);
      |    ^~~~~~~~~~~~~~~~~~~~~~
  ‘test’: event 1
    |
    |    5 |    open("test", O_RDONLY);
    |      |    ^~~~~~~~~~~~~~~~~~~~~~
    |      |    |
    |      |    (1) leak by not saving fd
    |

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-02-01 14:58                 ` Mir Immad
@ 2022-02-13 15:46                   ` Mir Immad
  2022-02-13 22:57                     ` David Malcolm
  0 siblings, 1 reply; 18+ messages in thread
From: Mir Immad @ 2022-02-13 15:46 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

Hi,

I wanted some clarification on bifurcating the exploded graph at call to
open().
Should the analyzer warn for code like this "when open fails" (like strchr
does when  'strchr' returns NULL)

int fd = open("NOFILE", O_RDONLY);
write(fd, "a", 1);

because of the bad file descriptor.
unless it is written like this:
if (!errno)
  write(fd, "a", 1);

Thank you.

On Tue, Feb 1, 2022 at 8:28 PM Mir Immad <mirimnan017@gmail.com> wrote:

> I worked around the leak detection and also handled the case where the fd
> is not saved. I wonder why sm-file hasn't implemented it yet. I'm attaching
> a text file with analyzer warnings. I'm still on gcc-11.2.0, will move to
> v12 next thing.
>
> > I wonder if it's worth checking for attempts to write to a fd that was
> > opened with O_RDONLY, or the converse?
>
> Yes, it does not make sense to warn for write on read-only closed fd or
> converse. But, it does make sense to warn for write operation on valid
> read-only fd. For this, we might have to analyze calls to the open()
> carefully -- get the literal value of the second argument to the call and
> do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY are in it.
> We might have to maintain separate vectors for these and check for them in
> calls to write and read -- warn if write is being called on read-only fd
> and converse. How do we equate them? Does gimple store unique ids for each
> tree? or something like SAME_TREE? Just thinking out loud.
>
> > how much state does it make sense to track for a fd?
> Sorry, I didnt get it. Do you mean how may states I'm using? unchecked,
> closed, null and leak_by_not_saved (for the call-tree that does not have a
> lhs).
>
> > Also, at some point, we're going to have to handle "errno" - but given
> > that might be somewhat fiddly it's OK to defer that until you're more
> > familiar with the code
>
> I'm looking forward to figure it out.
>
> Thank you.
>
>
> On Sat, Jan 29, 2022 at 11:09 PM David Malcolm <dmalcolm@redhat.com>
> wrote:
>
>> On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
>> > Thank you for the detailed information.
>> >
>> > I've been looking into the integer posix  file descriptor APIs and I
>> > decided to write proof-of-concept  checker for them. (not caring
>> > about
>> > errno). The checker tracks the fd returned by open(), warns if dup()
>> > is
>> > called with closed fd otherwise tracks the fd returned by dup(), it
>> > also
>> > warns if read() and write() functions were called on closed fd. I'm
>> > attaching a text file that lists some c sources and warnings by the
>> > static
>> > analyzer. I've used the diagnostic meta-data from sm-file. Is this
>> > something that could also be added to the analyzer?
>>
>> This looks great, and very promising as both new functionality for GCC
>> 13, and as a GSoC 2022 project.
>>
>> BTW, it looks like you're working with GCC 11, but the analyzer has
>> changed quite a bit on trunk for GCC 12, so it's worth trying to track
>> trunk.
>>
>> I wonder if it's worth checking for attempts to write to a fd that was
>> opened with O_RDONLY, or the converse?  (I'm not sure, just thinking
>> aloud - how much state does it make sense to track for a fd?).
>>
>> Also, at some point, we're going to have to handle "errno" - but given
>> that might be somewhat fiddly it's OK to defer that until you're more
>> familiar with the code.
>>
>> >
>> > About the fd leak, that's the next thing I'll try to get working.
>> > Since
>> > you've mentioned that it could be a GSoC project, this is what I'm
>> > going to
>> > focus on.
>>
>> Excellent.
>>
>> Let me know (via this mailing list) if you have any questions.
>>
>> Thanks
>> Dave
>>
>> >
>> > Regards.
>> >
>> >
>> >
>> > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <dmalcolm@redhat.com>
>> > wrote:
>> >
>> > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
>> > > > Hi, sir.
>> > > >
>> > > > I've been trying to understand the static analyzer's code. I
>> > > > spent most
>> > > > of
>> > > > my time learning the state machine's API. I learned how state
>> > > > machine's
>> > > > on_stmt is supposed to "recognize" specific functions and how
>> > > > on_transition
>> > > > takes a specific tree from one state to another, and how the
>> > > > captured
>> > > > states are used by pending_diagnostics to report the errors.
>> > > > Furthermore, I
>> > > > was able to create a dummy checker that mimicked the behaviour of
>> > > > sm-
>> > > > file's
>> > > > double_fclose and compile GCC with these changes. Is this the
>> > > > right way
>> > > > of
>> > > > learning?
>> > >
>> > > This sounds great.
>> > >
>> > > >
>> > > > As you've mentioned on the projects page that you would like to
>> > > > add
>> > > > more
>> > > > support for some POSIX APIs. Can you please write (or refer me to
>> > > > a) a
>> > > > simple C program that uses such an API (and also what the
>> > > > analyzer
>> > > > should
>> > > > have done) so that I can attempt to add such a checker to the
>> > > > analyzer.
>> > >
>> > > A couple of project ideas:
>> > >
>> > > (i) treat data coming from a network connection as tainted, by
>> > > somehow
>> > > teaching the analyzer about networking APIs.  Ideally: look at some
>> > > subset of historical CVEs involving network-facing attacks on user-
>> > > space daemons, and find a way to detect them in the analyzer (need
>> > > to
>> > > find a way to mark the incoming data as tainted, so that the
>> > > analyer
>> > > "know" about the trust boundary - that the incoming data needs to
>> > > be
>> > > sanitized and treated with extra caution; see
>> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
>> > > for my attempts to do this for the Linux kernel).
>> > >
>> > > Obviously this is potentially a huge project, so maybe just picking
>> > > a
>> > > tiny subset and getting that working as a proof-of-concept would be
>> > > a
>> > > good GSoC project.  Maybe find an old CVE that someone has written
>> > > a
>> > > good write-up for, and think about "how could GCC/-fanalyzer have
>> > > spotted it?"
>> > >
>> > > (ii) add leak-detection for POSIX file descriptors: i.e. the
>> > > integer
>> > > values returned by "open", "dup", etc.  It would be good to have a
>> > > check that the user's code doesn't leak these values e.g. on error-
>> > > handling paths, by failing to close a file-descriptor (and not
>> > > storing
>> > > it anywhere).  I think that much of this could be done by analogy
>> > > with
>> > > the sm-file.cc code.
>> > >
>> > >
>> > > >
>> > > > Also, I didn't realize the complexity of adding SARIF when I
>> > > > mentioned
>> > > > it.
>> > > > I'd rather work on adding more checkers.
>> > >
>> > > Fair enough.
>> > >
>> > > Hope this above is constructive.
>> > >
>> > > Dave
>> > >
>> > > >
>> > > > Regards.
>> > > >
>> > > > Mir Immad
>> > > >
>> > > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <mirimnan017@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi Sir,
>> > > > >
>> > > > > I've been trying to understand the static analyzer's code. I
>> > > > > spent
>> > > > > most of
>> > > > > my time learning the state machine's API. I learned how state
>> > > > > machine's
>> > > > > on_stmt is supposed to "recognize" specific functions and takes
>> > > > > a
>> > > > > specific
>> > > > > tree from one state to another, and how the captured states are
>> > > > > used
>> > > > > by
>> > > > > pending_diagnostics to report the errors. Furthermore, I was
>> > > > > able to
>> > > > > create
>> > > > > a dummy checker that mimicked the behaviour of sm-file's
>> > > > > double_fclose and
>> > > > > compile GCC with these changes. Is this the right way of
>> > > > > learning?
>> > > > >
>> > > > > As you've mentioned on the projects page that you would like to
>> > > > > add
>> > > > > more
>> > > > > support for some POSIX APIs. Can you please write (or refer me
>> > > > > to a)
>> > > > > a
>> > > > > simple C program that uses such an API (and also what the
>> > > > > analyzer
>> > > > > should
>> > > > > have done) so that I can attempt to add such a checker to the
>> > > > > analyzer.
>> > > > >
>> > > > > Also, I didn't realize the complexity of adding SARIF when I
>> > > > > mentioned it.
>> > > > > I'd rather work on adding more checkers.
>> > > > >
>> > > > > Regards.
>> > > > > Mir Immad
>> > > > >
>> > > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <
>> > > > > dmalcolm@redhat.com>
>> > > > > wrote:
>> > > > >
>> > > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
>> > > > > > > HI David,
>> > > > > > > I've been tinkering with the static analyzer for the last
>> > > > > > > few
>> > > > > > > days. I
>> > > > > > > find
>> > > > > > > the project of adding SARIF output to the analyzer
>> > > > > > > intresting.
>> > > > > > > I'm
>> > > > > > > writing
>> > > > > > > this to let you know that I'm trying to learn the codebase.
>> > > > > > > Thank you.
>> > > > > >
>> > > > > > Excellent.
>> > > > > >
>> > > > > > BTW, I think adding SARIF output would involve working more
>> > > > > > with
>> > > > > > GCC's
>> > > > > > diagnostics subsystem than with the static analyzer, since
>> > > > > > (in
>> > > > > > theory)
>> > > > > > all of the static analyzer's output is passing through the
>> > > > > > diagnostics
>> > > > > > subsystem - though the static analyzer is probably the only
>> > > > > > GCC
>> > > > > > component generating diagnostic paths.
>> > > > > >
>> > > > > > I'm happy to mentor such a project as I maintain both
>> > > > > > subsystems
>> > > > > > and
>> > > > > > SARIF output would benefit both - but it would be rather
>> > > > > > tangential
>> > > > > > to
>> > > > > > the analyzer - so if you had specifically wanted to be
>> > > > > > working on
>> > > > > > the
>> > > > > > guts of the analyzer itself, you may want to pick a different
>> > > > > > subproject.
>> > > > > >
>> > > > > > The SARIF standard is rather long and complicated, and we
>> > > > > > would
>> > > > > > want to
>> > > > > > be compatible with clang's implementation.
>> > > > > >
>> > > > > > It would be very cool if gcc could also accept SARIF files as
>> > > > > > an
>> > > > > > *input* format, and emit them as diagnostics; that might help
>> > > > > > with
>> > > > > > debugging SARIF output.   (I have a old patch for adding JSON
>> > > > > > parsing
>> > > > > > support to GCC that could be used as a starting point for
>> > > > > > this).
>> > > > > >
>> > > > > > Hope the above makes sense
>> > > > > > Dave
>> > > > > >
>> > > > > > >
>> > > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
>> > > > > > > dmalcolm@redhat.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc
>> > > > > > > > wrote:
>> > > > > > > > > Hi everyone,
>> > > > > > > >
>> > > > > > > > Hi, and welcome.
>> > > > > > > >
>> > > > > > > > > I intend to work on the static analyzer. Are these
>> > > > > > > > > documents
>> > > > > > > > > enough to
>> > > > > > > > > get
>> > > > > > > > > started: https://gcc.gnu.org/onlinedocs/gccint and
>> > > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > > >
>> > >
>> > >
>> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
>> > > > > > > >
>> > > > > > > > Yes.
>> > > > > > > >
>> > > > > > > > There are also some high-level notes here:
>> > > > > > > >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
>> > > > > > > >
>> > > > > > > > Also, given that the analyzer is part of GCC, the more
>> > > > > > > > general
>> > > > > > > > introductions to hacking on GCC will be useful.
>> > > > > > > >
>> > > > > > > > I recommend creating a trivial C source file with a bug
>> > > > > > > > in it
>> > > > > > > > (e.g.
>> > > > > > > > a
>> > > > > > > > 3-line function with a use-after-free), and stepping
>> > > > > > > > through
>> > > > > > > > the
>> > > > > > > > analyzer to get a sense of how it works.
>> > > > > > > >
>> > > > > > > > Hope this is helpful; don't hesitate to ask questions.
>> > > > > > > > Dave
>> > > > > > > >
>> > > > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > >
>> > >
>> > >
>>
>>
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-02-13 15:46                   ` Mir Immad
@ 2022-02-13 22:57                     ` David Malcolm
  2022-02-13 23:05                       ` David Malcolm
  0 siblings, 1 reply; 18+ messages in thread
From: David Malcolm @ 2022-02-13 22:57 UTC (permalink / raw)
  To: Mir Immad; +Cc: gcc

On Sun, 2022-02-13 at 21:16 +0530, Mir Immad wrote:
> Hi,
> 
> I wanted some clarification on bifurcating the exploded graph at call
> to
> open().
> Should the analyzer warn for code like this "when open fails" (like
> strchr
> does when  'strchr' returns NULL)
> 
> int fd = open("NOFILE", O_RDONLY);
> write(fd, "a", 1);
> 
> because of the bad file descriptor.
> unless it is written like this:
> if (!errno)
>   write(fd, "a", 1);

"open" returns a non-negative integer on success, and returns -1 and
sets errno on an error.  As I understand it, errno is never cleared
unless an error occurs, so errno could already be set due to a prior
failure.  Hence I believe the correct way to write such code is to
check fd for being -1, rather than checking errno.  

It's probably best to bifurcate the state at the open call, into
"success" and "failure" outcomes, so that we can track that the caller
"owns" the filedescriptor on success, and thus we can complain about fd
leaks on paths that discard the value without closing it.

We can then warn if we see -1 passed as the fd value into "write",
since that implies that the open call failed.  If we bifurcate the
state at the open call, then for:

int fd = open("NOFILE", O_RDONLY);
write(fd, "a", 1);

we'd have the execution paths:

(a) "when 'open' succeeds" (setting fd to a conjured_svalue >=0, and
setting sm-state on that value)

(b) "when 'open' fails" (setting fd to -1, and errno to a
conjured_svalue known to be nonzero)

and at the call to "write" on path (b) we see the -1 passed to it, and
can complain accordingly. 

I'm not sure if we also want to track that the file was O_RDONLY (which
suggests that the "write" should fail), but that would be a nice bonus.

Hope this is helpful
Dave



> 
> Thank you.
> 
> On Tue, Feb 1, 2022 at 8:28 PM Mir Immad <mirimnan017@gmail.com> wrote:
> 
> > I worked around the leak detection and also handled the case where
> > the fd
> > is not saved. I wonder why sm-file hasn't implemented it yet. I'm
> > attaching
> > a text file with analyzer warnings. I'm still on gcc-11.2.0, will
> > move to
> > v12 next thing.
> > 
> > > I wonder if it's worth checking for attempts to write to a fd that
> > > was
> > > opened with O_RDONLY, or the converse?
> > 
> > Yes, it does not make sense to warn for write on read-only closed fd
> > or
> > converse. But, it does make sense to warn for write operation on
> > valid
> > read-only fd. For this, we might have to analyze calls to the open()
> > carefully -- get the literal value of the second argument to the call
> > and
> > do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY are
> > in it.
> > We might have to maintain separate vectors for these and check for
> > them in
> > calls to write and read -- warn if write is being called on read-only
> > fd
> > and converse. How do we equate them? Does gimple store unique ids for
> > each
> > tree? or something like SAME_TREE? Just thinking out loud.
> > 
> > > how much state does it make sense to track for a fd?
> > Sorry, I didnt get it. Do you mean how may states I'm using?
> > unchecked,
> > closed, null and leak_by_not_saved (for the call-tree that does not
> > have a
> > lhs).
> > 
> > > Also, at some point, we're going to have to handle "errno" - but
> > > given
> > > that might be somewhat fiddly it's OK to defer that until you're
> > > more
> > > familiar with the code
> > 
> > I'm looking forward to figure it out.
> > 
> > Thank you.
> > 
> > 
> > On Sat, Jan 29, 2022 at 11:09 PM David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > > On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
> > > > Thank you for the detailed information.
> > > > 
> > > > I've been looking into the integer posix  file descriptor APIs
> > > > and I
> > > > decided to write proof-of-concept  checker for them. (not caring
> > > > about
> > > > errno). The checker tracks the fd returned by open(), warns if
> > > > dup()
> > > > is
> > > > called with closed fd otherwise tracks the fd returned by dup(),
> > > > it
> > > > also
> > > > warns if read() and write() functions were called on closed fd.
> > > > I'm
> > > > attaching a text file that lists some c sources and warnings by
> > > > the
> > > > static
> > > > analyzer. I've used the diagnostic meta-data from sm-file. Is
> > > > this
> > > > something that could also be added to the analyzer?
> > > 
> > > This looks great, and very promising as both new functionality for
> > > GCC
> > > 13, and as a GSoC 2022 project.
> > > 
> > > BTW, it looks like you're working with GCC 11, but the analyzer has
> > > changed quite a bit on trunk for GCC 12, so it's worth trying to
> > > track
> > > trunk.
> > > 
> > > I wonder if it's worth checking for attempts to write to a fd that
> > > was
> > > opened with O_RDONLY, or the converse?  (I'm not sure, just
> > > thinking
> > > aloud - how much state does it make sense to track for a fd?).
> > > 
> > > Also, at some point, we're going to have to handle "errno" - but
> > > given
> > > that might be somewhat fiddly it's OK to defer that until you're
> > > more
> > > familiar with the code.
> > > 
> > > > 
> > > > About the fd leak, that's the next thing I'll try to get working.
> > > > Since
> > > > you've mentioned that it could be a GSoC project, this is what
> > > > I'm
> > > > going to
> > > > focus on.
> > > 
> > > Excellent.
> > > 
> > > Let me know (via this mailing list) if you have any questions.
> > > 
> > > Thanks
> > > Dave
> > > 
> > > > 
> > > > Regards.
> > > > 
> > > > 
> > > > 
> > > > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <
> > > > dmalcolm@redhat.com>
> > > > wrote:
> > > > 
> > > > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > > > > > Hi, sir.
> > > > > > 
> > > > > > I've been trying to understand the static analyzer's code. I
> > > > > > spent most
> > > > > > of
> > > > > > my time learning the state machine's API. I learned how state
> > > > > > machine's
> > > > > > on_stmt is supposed to "recognize" specific functions and how
> > > > > > on_transition
> > > > > > takes a specific tree from one state to another, and how the
> > > > > > captured
> > > > > > states are used by pending_diagnostics to report the errors.
> > > > > > Furthermore, I
> > > > > > was able to create a dummy checker that mimicked the
> > > > > > behaviour of
> > > > > > sm-
> > > > > > file's
> > > > > > double_fclose and compile GCC with these changes. Is this the
> > > > > > right way
> > > > > > of
> > > > > > learning?
> > > > > 
> > > > > This sounds great.
> > > > > 
> > > > > > 
> > > > > > As you've mentioned on the projects page that you would like
> > > > > > to
> > > > > > add
> > > > > > more
> > > > > > support for some POSIX APIs. Can you please write (or refer
> > > > > > me to
> > > > > > a) a
> > > > > > simple C program that uses such an API (and also what the
> > > > > > analyzer
> > > > > > should
> > > > > > have done) so that I can attempt to add such a checker to the
> > > > > > analyzer.
> > > > > 
> > > > > A couple of project ideas:
> > > > > 
> > > > > (i) treat data coming from a network connection as tainted, by
> > > > > somehow
> > > > > teaching the analyzer about networking APIs.  Ideally: look at
> > > > > some
> > > > > subset of historical CVEs involving network-facing attacks on
> > > > > user-
> > > > > space daemons, and find a way to detect them in the analyzer
> > > > > (need
> > > > > to
> > > > > find a way to mark the incoming data as tainted, so that the
> > > > > analyer
> > > > > "know" about the trust boundary - that the incoming data needs
> > > > > to
> > > > > be
> > > > > sanitized and treated with extra caution; see
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> > > > > for my attempts to do this for the Linux kernel).
> > > > > 
> > > > > Obviously this is potentially a huge project, so maybe just
> > > > > picking
> > > > > a
> > > > > tiny subset and getting that working as a proof-of-concept
> > > > > would be
> > > > > a
> > > > > good GSoC project.  Maybe find an old CVE that someone has
> > > > > written
> > > > > a
> > > > > good write-up for, and think about "how could GCC/-fanalyzer
> > > > > have
> > > > > spotted it?"
> > > > > 
> > > > > (ii) add leak-detection for POSIX file descriptors: i.e. the
> > > > > integer
> > > > > values returned by "open", "dup", etc.  It would be good to
> > > > > have a
> > > > > check that the user's code doesn't leak these values e.g. on
> > > > > error-
> > > > > handling paths, by failing to close a file-descriptor (and not
> > > > > storing
> > > > > it anywhere).  I think that much of this could be done by
> > > > > analogy
> > > > > with
> > > > > the sm-file.cc code.
> > > > > 
> > > > > 
> > > > > > 
> > > > > > Also, I didn't realize the complexity of adding SARIF when I
> > > > > > mentioned
> > > > > > it.
> > > > > > I'd rather work on adding more checkers.
> > > > > 
> > > > > Fair enough.
> > > > > 
> > > > > Hope this above is constructive.
> > > > > 
> > > > > Dave
> > > > > 
> > > > > > 
> > > > > > Regards.
> > > > > > 
> > > > > > Mir Immad
> > > > > > 
> > > > > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <
> > > > > > mirimnan017@gmail.com>
> > > > > > wrote:
> > > > > > 
> > > > > > > Hi Sir,
> > > > > > > 
> > > > > > > I've been trying to understand the static analyzer's code.
> > > > > > > I
> > > > > > > spent
> > > > > > > most of
> > > > > > > my time learning the state machine's API. I learned how
> > > > > > > state
> > > > > > > machine's
> > > > > > > on_stmt is supposed to "recognize" specific functions and
> > > > > > > takes
> > > > > > > a
> > > > > > > specific
> > > > > > > tree from one state to another, and how the captured states
> > > > > > > are
> > > > > > > used
> > > > > > > by
> > > > > > > pending_diagnostics to report the errors. Furthermore, I
> > > > > > > was
> > > > > > > able to
> > > > > > > create
> > > > > > > a dummy checker that mimicked the behaviour of sm-file's
> > > > > > > double_fclose and
> > > > > > > compile GCC with these changes. Is this the right way of
> > > > > > > learning?
> > > > > > > 
> > > > > > > As you've mentioned on the projects page that you would
> > > > > > > like to
> > > > > > > add
> > > > > > > more
> > > > > > > support for some POSIX APIs. Can you please write (or refer
> > > > > > > me
> > > > > > > to a)
> > > > > > > a
> > > > > > > simple C program that uses such an API (and also what the
> > > > > > > analyzer
> > > > > > > should
> > > > > > > have done) so that I can attempt to add such a checker to
> > > > > > > the
> > > > > > > analyzer.
> > > > > > > 
> > > > > > > Also, I didn't realize the complexity of adding SARIF when
> > > > > > > I
> > > > > > > mentioned it.
> > > > > > > I'd rather work on adding more checkers.
> > > > > > > 
> > > > > > > Regards.
> > > > > > > Mir Immad
> > > > > > > 
> > > > > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <
> > > > > > > dmalcolm@redhat.com>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > > > > > HI David,
> > > > > > > > > I've been tinkering with the static analyzer for the
> > > > > > > > > last
> > > > > > > > > few
> > > > > > > > > days. I
> > > > > > > > > find
> > > > > > > > > the project of adding SARIF output to the analyzer
> > > > > > > > > intresting.
> > > > > > > > > I'm
> > > > > > > > > writing
> > > > > > > > > this to let you know that I'm trying to learn the
> > > > > > > > > codebase.
> > > > > > > > > Thank you.
> > > > > > > > 
> > > > > > > > Excellent.
> > > > > > > > 
> > > > > > > > BTW, I think adding SARIF output would involve working
> > > > > > > > more
> > > > > > > > with
> > > > > > > > GCC's
> > > > > > > > diagnostics subsystem than with the static analyzer,
> > > > > > > > since
> > > > > > > > (in
> > > > > > > > theory)
> > > > > > > > all of the static analyzer's output is passing through
> > > > > > > > the
> > > > > > > > diagnostics
> > > > > > > > subsystem - though the static analyzer is probably the
> > > > > > > > only
> > > > > > > > GCC
> > > > > > > > component generating diagnostic paths.
> > > > > > > > 
> > > > > > > > I'm happy to mentor such a project as I maintain both
> > > > > > > > subsystems
> > > > > > > > and
> > > > > > > > SARIF output would benefit both - but it would be rather
> > > > > > > > tangential
> > > > > > > > to
> > > > > > > > the analyzer - so if you had specifically wanted to be
> > > > > > > > working on
> > > > > > > > the
> > > > > > > > guts of the analyzer itself, you may want to pick a
> > > > > > > > different
> > > > > > > > subproject.
> > > > > > > > 
> > > > > > > > The SARIF standard is rather long and complicated, and we
> > > > > > > > would
> > > > > > > > want to
> > > > > > > > be compatible with clang's implementation.
> > > > > > > > 
> > > > > > > > It would be very cool if gcc could also accept SARIF
> > > > > > > > files as
> > > > > > > > an
> > > > > > > > *input* format, and emit them as diagnostics; that might
> > > > > > > > help
> > > > > > > > with
> > > > > > > > debugging SARIF output.   (I have a old patch for adding
> > > > > > > > JSON
> > > > > > > > parsing
> > > > > > > > support to GCC that could be used as a starting point for
> > > > > > > > this).
> > > > > > > > 
> > > > > > > > Hope the above makes sense
> > > > > > > > Dave
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
> > > > > > > > > dmalcolm@redhat.com>
> > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc
> > > > > > > > > > wrote:
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > 
> > > > > > > > > > Hi, and welcome.
> > > > > > > > > > 
> > > > > > > > > > > I intend to work on the static analyzer. Are these
> > > > > > > > > > > documents
> > > > > > > > > > > enough to
> > > > > > > > > > > get
> > > > > > > > > > > started: https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > 
> > > > > 
> > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > > > > > > 
> > > > > > > > > > Yes.
> > > > > > > > > > 
> > > > > > > > > > There are also some high-level notes here:
> > > > > > > > > >   
> > > > > > > > > > https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > > > > > > 
> > > > > > > > > > Also, given that the analyzer is part of GCC, the
> > > > > > > > > > more
> > > > > > > > > > general
> > > > > > > > > > introductions to hacking on GCC will be useful.
> > > > > > > > > > 
> > > > > > > > > > I recommend creating a trivial C source file with a
> > > > > > > > > > bug
> > > > > > > > > > in it
> > > > > > > > > > (e.g.
> > > > > > > > > > a
> > > > > > > > > > 3-line function with a use-after-free), and stepping
> > > > > > > > > > through
> > > > > > > > > > the
> > > > > > > > > > analyzer to get a sense of how it works.
> > > > > > > > > > 
> > > > > > > > > > Hope this is helpful; don't hesitate to ask
> > > > > > > > > > questions.
> > > > > > > > > > Dave
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > 
> > > 
> > > 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-02-13 22:57                     ` David Malcolm
@ 2022-02-13 23:05                       ` David Malcolm
  2022-04-04 16:16                         ` Mir Immad
  0 siblings, 1 reply; 18+ messages in thread
From: David Malcolm @ 2022-02-13 23:05 UTC (permalink / raw)
  To: Mir Immad; +Cc: gcc

On Sun, 2022-02-13 at 17:57 -0500, David Malcolm wrote:
> On Sun, 2022-02-13 at 21:16 +0530, Mir Immad wrote:
> > Hi,
> > 
> > I wanted some clarification on bifurcating the exploded graph at
> > call
> > to
> > open().
> > Should the analyzer warn for code like this "when open fails" (like
> > strchr
> > does when  'strchr' returns NULL)
> > 
> > int fd = open("NOFILE", O_RDONLY);
> > write(fd, "a", 1);
> > 
> > because of the bad file descriptor.
> > unless it is written like this:
> > if (!errno)
> >   write(fd, "a", 1);
> 
> "open" returns a non-negative integer on success, and returns -1 and
> sets errno on an error.  As I understand it, errno is never cleared

"modified" would be a better word than "cleared" here, sorry

Dave

> unless an error occurs, so errno could already be set due to a prior
> failure.  Hence I believe the correct way to write such code is to
> check fd for being -1, rather than checking errno.  
> 
> It's probably best to bifurcate the state at the open call, into
> "success" and "failure" outcomes, so that we can track that the
> caller
> "owns" the filedescriptor on success, and thus we can complain about
> fd
> leaks on paths that discard the value without closing it.
> 
> We can then warn if we see -1 passed as the fd value into "write",
> since that implies that the open call failed.  If we bifurcate the
> state at the open call, then for:
> 
> int fd = open("NOFILE", O_RDONLY);
> write(fd, "a", 1);
> 
> we'd have the execution paths:
> 
> (a) "when 'open' succeeds" (setting fd to a conjured_svalue >=0, and
> setting sm-state on that value)
> 
> (b) "when 'open' fails" (setting fd to -1, and errno to a
> conjured_svalue known to be nonzero)
> 
> and at the call to "write" on path (b) we see the -1 passed to it,
> and
> can complain accordingly. 
> 
> I'm not sure if we also want to track that the file was O_RDONLY
> (which
> suggests that the "write" should fail), but that would be a nice
> bonus.
> 
> Hope this is helpful
> Dave
> 
> 
> 
> > 
> > Thank you.
> > 
> > On Tue, Feb 1, 2022 at 8:28 PM Mir Immad <mirimnan017@gmail.com>
> > wrote:
> > 
> > > I worked around the leak detection and also handled the case
> > > where
> > > the fd
> > > is not saved. I wonder why sm-file hasn't implemented it yet. I'm
> > > attaching
> > > a text file with analyzer warnings. I'm still on gcc-11.2.0, will
> > > move to
> > > v12 next thing.
> > > 
> > > > I wonder if it's worth checking for attempts to write to a fd
> > > > that
> > > > was
> > > > opened with O_RDONLY, or the converse?
> > > 
> > > Yes, it does not make sense to warn for write on read-only closed
> > > fd
> > > or
> > > converse. But, it does make sense to warn for write operation on
> > > valid
> > > read-only fd. For this, we might have to analyze calls to the
> > > open()
> > > carefully -- get the literal value of the second argument to the
> > > call
> > > and
> > > do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY
> > > are
> > > in it.
> > > We might have to maintain separate vectors for these and check
> > > for
> > > them in
> > > calls to write and read -- warn if write is being called on read-
> > > only
> > > fd
> > > and converse. How do we equate them? Does gimple store unique ids
> > > for
> > > each
> > > tree? or something like SAME_TREE? Just thinking out loud.
> > > 
> > > > how much state does it make sense to track for a fd?
> > > Sorry, I didnt get it. Do you mean how may states I'm using?
> > > unchecked,
> > > closed, null and leak_by_not_saved (for the call-tree that does
> > > not
> > > have a
> > > lhs).
> > > 
> > > > Also, at some point, we're going to have to handle "errno" -
> > > > but
> > > > given
> > > > that might be somewhat fiddly it's OK to defer that until
> > > > you're
> > > > more
> > > > familiar with the code
> > > 
> > > I'm looking forward to figure it out.
> > > 
> > > Thank you.
> > > 
> > > 
> > > On Sat, Jan 29, 2022 at 11:09 PM David Malcolm <
> > > dmalcolm@redhat.com>
> > > wrote:
> > > 
> > > > On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
> > > > > Thank you for the detailed information.
> > > > > 
> > > > > I've been looking into the integer posix  file descriptor
> > > > > APIs
> > > > > and I
> > > > > decided to write proof-of-concept  checker for them. (not
> > > > > caring
> > > > > about
> > > > > errno). The checker tracks the fd returned by open(), warns
> > > > > if
> > > > > dup()
> > > > > is
> > > > > called with closed fd otherwise tracks the fd returned by
> > > > > dup(),
> > > > > it
> > > > > also
> > > > > warns if read() and write() functions were called on closed
> > > > > fd.
> > > > > I'm
> > > > > attaching a text file that lists some c sources and warnings
> > > > > by
> > > > > the
> > > > > static
> > > > > analyzer. I've used the diagnostic meta-data from sm-file. Is
> > > > > this
> > > > > something that could also be added to the analyzer?
> > > > 
> > > > This looks great, and very promising as both new functionality
> > > > for
> > > > GCC
> > > > 13, and as a GSoC 2022 project.
> > > > 
> > > > BTW, it looks like you're working with GCC 11, but the analyzer
> > > > has
> > > > changed quite a bit on trunk for GCC 12, so it's worth trying
> > > > to
> > > > track
> > > > trunk.
> > > > 
> > > > I wonder if it's worth checking for attempts to write to a fd
> > > > that
> > > > was
> > > > opened with O_RDONLY, or the converse?  (I'm not sure, just
> > > > thinking
> > > > aloud - how much state does it make sense to track for a fd?).
> > > > 
> > > > Also, at some point, we're going to have to handle "errno" -
> > > > but
> > > > given
> > > > that might be somewhat fiddly it's OK to defer that until
> > > > you're
> > > > more
> > > > familiar with the code.
> > > > 
> > > > > 
> > > > > About the fd leak, that's the next thing I'll try to get
> > > > > working.
> > > > > Since
> > > > > you've mentioned that it could be a GSoC project, this is
> > > > > what
> > > > > I'm
> > > > > going to
> > > > > focus on.
> > > > 
> > > > Excellent.
> > > > 
> > > > Let me know (via this mailing list) if you have any questions.
> > > > 
> > > > Thanks
> > > > Dave
> > > > 
> > > > > 
> > > > > Regards.
> > > > > 
> > > > > 
> > > > > 
> > > > > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <
> > > > > dmalcolm@redhat.com>
> > > > > wrote:
> > > > > 
> > > > > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > > > > > > Hi, sir.
> > > > > > > 
> > > > > > > I've been trying to understand the static analyzer's
> > > > > > > code. I
> > > > > > > spent most
> > > > > > > of
> > > > > > > my time learning the state machine's API. I learned how
> > > > > > > state
> > > > > > > machine's
> > > > > > > on_stmt is supposed to "recognize" specific functions and
> > > > > > > how
> > > > > > > on_transition
> > > > > > > takes a specific tree from one state to another, and how
> > > > > > > the
> > > > > > > captured
> > > > > > > states are used by pending_diagnostics to report the
> > > > > > > errors.
> > > > > > > Furthermore, I
> > > > > > > was able to create a dummy checker that mimicked the
> > > > > > > behaviour of
> > > > > > > sm-
> > > > > > > file's
> > > > > > > double_fclose and compile GCC with these changes. Is this
> > > > > > > the
> > > > > > > right way
> > > > > > > of
> > > > > > > learning?
> > > > > > 
> > > > > > This sounds great.
> > > > > > 
> > > > > > > 
> > > > > > > As you've mentioned on the projects page that you would
> > > > > > > like
> > > > > > > to
> > > > > > > add
> > > > > > > more
> > > > > > > support for some POSIX APIs. Can you please write (or
> > > > > > > refer
> > > > > > > me to
> > > > > > > a) a
> > > > > > > simple C program that uses such an API (and also what the
> > > > > > > analyzer
> > > > > > > should
> > > > > > > have done) so that I can attempt to add such a checker to
> > > > > > > the
> > > > > > > analyzer.
> > > > > > 
> > > > > > A couple of project ideas:
> > > > > > 
> > > > > > (i) treat data coming from a network connection as tainted,
> > > > > > by
> > > > > > somehow
> > > > > > teaching the analyzer about networking APIs.  Ideally: look
> > > > > > at
> > > > > > some
> > > > > > subset of historical CVEs involving network-facing attacks
> > > > > > on
> > > > > > user-
> > > > > > space daemons, and find a way to detect them in the
> > > > > > analyzer
> > > > > > (need
> > > > > > to
> > > > > > find a way to mark the incoming data as tainted, so that
> > > > > > the
> > > > > > analyer
> > > > > > "know" about the trust boundary - that the incoming data
> > > > > > needs
> > > > > > to
> > > > > > be
> > > > > > sanitized and treated with extra caution; see
> > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> > > > > > for my attempts to do this for the Linux kernel).
> > > > > > 
> > > > > > Obviously this is potentially a huge project, so maybe just
> > > > > > picking
> > > > > > a
> > > > > > tiny subset and getting that working as a proof-of-concept
> > > > > > would be
> > > > > > a
> > > > > > good GSoC project.  Maybe find an old CVE that someone has
> > > > > > written
> > > > > > a
> > > > > > good write-up for, and think about "how could GCC/-
> > > > > > fanalyzer
> > > > > > have
> > > > > > spotted it?"
> > > > > > 
> > > > > > (ii) add leak-detection for POSIX file descriptors: i.e.
> > > > > > the
> > > > > > integer
> > > > > > values returned by "open", "dup", etc.  It would be good to
> > > > > > have a
> > > > > > check that the user's code doesn't leak these values e.g.
> > > > > > on
> > > > > > error-
> > > > > > handling paths, by failing to close a file-descriptor (and
> > > > > > not
> > > > > > storing
> > > > > > it anywhere).  I think that much of this could be done by
> > > > > > analogy
> > > > > > with
> > > > > > the sm-file.cc code.
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Also, I didn't realize the complexity of adding SARIF
> > > > > > > when I
> > > > > > > mentioned
> > > > > > > it.
> > > > > > > I'd rather work on adding more checkers.
> > > > > > 
> > > > > > Fair enough.
> > > > > > 
> > > > > > Hope this above is constructive.
> > > > > > 
> > > > > > Dave
> > > > > > 
> > > > > > > 
> > > > > > > Regards.
> > > > > > > 
> > > > > > > Mir Immad
> > > > > > > 
> > > > > > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <
> > > > > > > mirimnan017@gmail.com>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > Hi Sir,
> > > > > > > > 
> > > > > > > > I've been trying to understand the static analyzer's
> > > > > > > > code.
> > > > > > > > I
> > > > > > > > spent
> > > > > > > > most of
> > > > > > > > my time learning the state machine's API. I learned how
> > > > > > > > state
> > > > > > > > machine's
> > > > > > > > on_stmt is supposed to "recognize" specific functions
> > > > > > > > and
> > > > > > > > takes
> > > > > > > > a
> > > > > > > > specific
> > > > > > > > tree from one state to another, and how the captured
> > > > > > > > states
> > > > > > > > are
> > > > > > > > used
> > > > > > > > by
> > > > > > > > pending_diagnostics to report the errors. Furthermore,
> > > > > > > > I
> > > > > > > > was
> > > > > > > > able to
> > > > > > > > create
> > > > > > > > a dummy checker that mimicked the behaviour of sm-
> > > > > > > > file's
> > > > > > > > double_fclose and
> > > > > > > > compile GCC with these changes. Is this the right way
> > > > > > > > of
> > > > > > > > learning?
> > > > > > > > 
> > > > > > > > As you've mentioned on the projects page that you would
> > > > > > > > like to
> > > > > > > > add
> > > > > > > > more
> > > > > > > > support for some POSIX APIs. Can you please write (or
> > > > > > > > refer
> > > > > > > > me
> > > > > > > > to a)
> > > > > > > > a
> > > > > > > > simple C program that uses such an API (and also what
> > > > > > > > the
> > > > > > > > analyzer
> > > > > > > > should
> > > > > > > > have done) so that I can attempt to add such a checker
> > > > > > > > to
> > > > > > > > the
> > > > > > > > analyzer.
> > > > > > > > 
> > > > > > > > Also, I didn't realize the complexity of adding SARIF
> > > > > > > > when
> > > > > > > > I
> > > > > > > > mentioned it.
> > > > > > > > I'd rather work on adding more checkers.
> > > > > > > > 
> > > > > > > > Regards.
> > > > > > > > Mir Immad
> > > > > > > > 
> > > > > > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <
> > > > > > > > dmalcolm@redhat.com>
> > > > > > > > wrote:
> > > > > > > > 
> > > > > > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > > > > > > HI David,
> > > > > > > > > > I've been tinkering with the static analyzer for
> > > > > > > > > > the
> > > > > > > > > > last
> > > > > > > > > > few
> > > > > > > > > > days. I
> > > > > > > > > > find
> > > > > > > > > > the project of adding SARIF output to the analyzer
> > > > > > > > > > intresting.
> > > > > > > > > > I'm
> > > > > > > > > > writing
> > > > > > > > > > this to let you know that I'm trying to learn the
> > > > > > > > > > codebase.
> > > > > > > > > > Thank you.
> > > > > > > > > 
> > > > > > > > > Excellent.
> > > > > > > > > 
> > > > > > > > > BTW, I think adding SARIF output would involve
> > > > > > > > > working
> > > > > > > > > more
> > > > > > > > > with
> > > > > > > > > GCC's
> > > > > > > > > diagnostics subsystem than with the static analyzer,
> > > > > > > > > since
> > > > > > > > > (in
> > > > > > > > > theory)
> > > > > > > > > all of the static analyzer's output is passing
> > > > > > > > > through
> > > > > > > > > the
> > > > > > > > > diagnostics
> > > > > > > > > subsystem - though the static analyzer is probably
> > > > > > > > > the
> > > > > > > > > only
> > > > > > > > > GCC
> > > > > > > > > component generating diagnostic paths.
> > > > > > > > > 
> > > > > > > > > I'm happy to mentor such a project as I maintain both
> > > > > > > > > subsystems
> > > > > > > > > and
> > > > > > > > > SARIF output would benefit both - but it would be
> > > > > > > > > rather
> > > > > > > > > tangential
> > > > > > > > > to
> > > > > > > > > the analyzer - so if you had specifically wanted to
> > > > > > > > > be
> > > > > > > > > working on
> > > > > > > > > the
> > > > > > > > > guts of the analyzer itself, you may want to pick a
> > > > > > > > > different
> > > > > > > > > subproject.
> > > > > > > > > 
> > > > > > > > > The SARIF standard is rather long and complicated,
> > > > > > > > > and we
> > > > > > > > > would
> > > > > > > > > want to
> > > > > > > > > be compatible with clang's implementation.
> > > > > > > > > 
> > > > > > > > > It would be very cool if gcc could also accept SARIF
> > > > > > > > > files as
> > > > > > > > > an
> > > > > > > > > *input* format, and emit them as diagnostics; that
> > > > > > > > > might
> > > > > > > > > help
> > > > > > > > > with
> > > > > > > > > debugging SARIF output.   (I have a old patch for
> > > > > > > > > adding
> > > > > > > > > JSON
> > > > > > > > > parsing
> > > > > > > > > support to GCC that could be used as a starting point
> > > > > > > > > for
> > > > > > > > > this).
> > > > > > > > > 
> > > > > > > > > Hope the above makes sense
> > > > > > > > > Dave
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
> > > > > > > > > > dmalcolm@redhat.com>
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via
> > > > > > > > > > > Gcc
> > > > > > > > > > > wrote:
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > 
> > > > > > > > > > > Hi, and welcome.
> > > > > > > > > > > 
> > > > > > > > > > > > I intend to work on the static analyzer. Are
> > > > > > > > > > > > these
> > > > > > > > > > > > documents
> > > > > > > > > > > > enough to
> > > > > > > > > > > > get
> > > > > > > > > > > > started:
> > > > > > > > > > > > https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > 
> > > > > > 
> > > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > > > > > > > 
> > > > > > > > > > > Yes.
> > > > > > > > > > > 
> > > > > > > > > > > There are also some high-level notes here:
> > > > > > > > > > >   
> > > > > > > > > > > https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > > > > > > > 
> > > > > > > > > > > Also, given that the analyzer is part of GCC, the
> > > > > > > > > > > more
> > > > > > > > > > > general
> > > > > > > > > > > introductions to hacking on GCC will be useful.
> > > > > > > > > > > 
> > > > > > > > > > > I recommend creating a trivial C source file with
> > > > > > > > > > > a
> > > > > > > > > > > bug
> > > > > > > > > > > in it
> > > > > > > > > > > (e.g.
> > > > > > > > > > > a
> > > > > > > > > > > 3-line function with a use-after-free), and
> > > > > > > > > > > stepping
> > > > > > > > > > > through
> > > > > > > > > > > the
> > > > > > > > > > > analyzer to get a sense of how it works.
> > > > > > > > > > > 
> > > > > > > > > > > Hope this is helpful; don't hesitate to ask
> > > > > > > > > > > questions.
> > > > > > > > > > > Dave
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > 
> > > > 
> > > > 
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-02-13 23:05                       ` David Malcolm
@ 2022-04-04 16:16                         ` Mir Immad
  2022-04-08 18:19                           ` David Malcolm
  0 siblings, 1 reply; 18+ messages in thread
From: Mir Immad @ 2022-04-04 16:16 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

Hi David,

Sorry for such late reply. I've been busy with classes and exams.

As the contributor applications are opening, I would like to put forward a
proposal for a medium project for extending the static analyzer to work
with POSIX file descriptor APIs. As evident in this thread, I've been
interested in this project for quite some time, and have in fact written
some proof-of-concept code. I'm pretty much familiar with the parts of the
code base  which are relevant for this project and am confident in
completing it in given time frame.

Do you have any inputs on how to start writing the proposal? Should I start
by reading the past year proposals?

Regards,

Immad MIr


On Mon, Feb 14, 2022 at 4:35 AM David Malcolm <dmalcolm@redhat.com> wrote:

> On Sun, 2022-02-13 at 17:57 -0500, David Malcolm wrote:
> > On Sun, 2022-02-13 at 21:16 +0530, Mir Immad wrote:
> > > Hi,
> > >
> > > I wanted some clarification on bifurcating the exploded graph at
> > > call
> > > to
> > > open().
> > > Should the analyzer warn for code like this "when open fails" (like
> > > strchr
> > > does when  'strchr' returns NULL)
> > >
> > > int fd = open("NOFILE", O_RDONLY);
> > > write(fd, "a", 1);
> > >
> > > because of the bad file descriptor.
> > > unless it is written like this:
> > > if (!errno)
> > >   write(fd, "a", 1);
> >
> > "open" returns a non-negative integer on success, and returns -1 and
> > sets errno on an error.  As I understand it, errno is never cleared
>
> "modified" would be a better word than "cleared" here, sorry
>
> Dave
>
> > unless an error occurs, so errno could already be set due to a prior
> > failure.  Hence I believe the correct way to write such code is to
> > check fd for being -1, rather than checking errno.
> >
> > It's probably best to bifurcate the state at the open call, into
> > "success" and "failure" outcomes, so that we can track that the
> > caller
> > "owns" the filedescriptor on success, and thus we can complain about
> > fd
> > leaks on paths that discard the value without closing it.
> >
> > We can then warn if we see -1 passed as the fd value into "write",
> > since that implies that the open call failed.  If we bifurcate the
> > state at the open call, then for:
> >
> > int fd = open("NOFILE", O_RDONLY);
> > write(fd, "a", 1);
> >
> > we'd have the execution paths:
> >
> > (a) "when 'open' succeeds" (setting fd to a conjured_svalue >=0, and
> > setting sm-state on that value)
> >
> > (b) "when 'open' fails" (setting fd to -1, and errno to a
> > conjured_svalue known to be nonzero)
> >
> > and at the call to "write" on path (b) we see the -1 passed to it,
> > and
> > can complain accordingly.
> >
> > I'm not sure if we also want to track that the file was O_RDONLY
> > (which
> > suggests that the "write" should fail), but that would be a nice
> > bonus.
> >
> > Hope this is helpful
> > Dave
> >
> >
> >
> > >
> > > Thank you.
> > >
> > > On Tue, Feb 1, 2022 at 8:28 PM Mir Immad <mirimnan017@gmail.com>
> > > wrote:
> > >
> > > > I worked around the leak detection and also handled the case
> > > > where
> > > > the fd
> > > > is not saved. I wonder why sm-file hasn't implemented it yet. I'm
> > > > attaching
> > > > a text file with analyzer warnings. I'm still on gcc-11.2.0, will
> > > > move to
> > > > v12 next thing.
> > > >
> > > > > I wonder if it's worth checking for attempts to write to a fd
> > > > > that
> > > > > was
> > > > > opened with O_RDONLY, or the converse?
> > > >
> > > > Yes, it does not make sense to warn for write on read-only closed
> > > > fd
> > > > or
> > > > converse. But, it does make sense to warn for write operation on
> > > > valid
> > > > read-only fd. For this, we might have to analyze calls to the
> > > > open()
> > > > carefully -- get the literal value of the second argument to the
> > > > call
> > > > and
> > > > do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY
> > > > are
> > > > in it.
> > > > We might have to maintain separate vectors for these and check
> > > > for
> > > > them in
> > > > calls to write and read -- warn if write is being called on read-
> > > > only
> > > > fd
> > > > and converse. How do we equate them? Does gimple store unique ids
> > > > for
> > > > each
> > > > tree? or something like SAME_TREE? Just thinking out loud.
> > > >
> > > > > how much state does it make sense to track for a fd?
> > > > Sorry, I didnt get it. Do you mean how may states I'm using?
> > > > unchecked,
> > > > closed, null and leak_by_not_saved (for the call-tree that does
> > > > not
> > > > have a
> > > > lhs).
> > > >
> > > > > Also, at some point, we're going to have to handle "errno" -
> > > > > but
> > > > > given
> > > > > that might be somewhat fiddly it's OK to defer that until
> > > > > you're
> > > > > more
> > > > > familiar with the code
> > > >
> > > > I'm looking forward to figure it out.
> > > >
> > > > Thank you.
> > > >
> > > >
> > > > On Sat, Jan 29, 2022 at 11:09 PM David Malcolm <
> > > > dmalcolm@redhat.com>
> > > > wrote:
> > > >
> > > > > On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
> > > > > > Thank you for the detailed information.
> > > > > >
> > > > > > I've been looking into the integer posix  file descriptor
> > > > > > APIs
> > > > > > and I
> > > > > > decided to write proof-of-concept  checker for them. (not
> > > > > > caring
> > > > > > about
> > > > > > errno). The checker tracks the fd returned by open(), warns
> > > > > > if
> > > > > > dup()
> > > > > > is
> > > > > > called with closed fd otherwise tracks the fd returned by
> > > > > > dup(),
> > > > > > it
> > > > > > also
> > > > > > warns if read() and write() functions were called on closed
> > > > > > fd.
> > > > > > I'm
> > > > > > attaching a text file that lists some c sources and warnings
> > > > > > by
> > > > > > the
> > > > > > static
> > > > > > analyzer. I've used the diagnostic meta-data from sm-file. Is
> > > > > > this
> > > > > > something that could also be added to the analyzer?
> > > > >
> > > > > This looks great, and very promising as both new functionality
> > > > > for
> > > > > GCC
> > > > > 13, and as a GSoC 2022 project.
> > > > >
> > > > > BTW, it looks like you're working with GCC 11, but the analyzer
> > > > > has
> > > > > changed quite a bit on trunk for GCC 12, so it's worth trying
> > > > > to
> > > > > track
> > > > > trunk.
> > > > >
> > > > > I wonder if it's worth checking for attempts to write to a fd
> > > > > that
> > > > > was
> > > > > opened with O_RDONLY, or the converse?  (I'm not sure, just
> > > > > thinking
> > > > > aloud - how much state does it make sense to track for a fd?).
> > > > >
> > > > > Also, at some point, we're going to have to handle "errno" -
> > > > > but
> > > > > given
> > > > > that might be somewhat fiddly it's OK to defer that until
> > > > > you're
> > > > > more
> > > > > familiar with the code.
> > > > >
> > > > > >
> > > > > > About the fd leak, that's the next thing I'll try to get
> > > > > > working.
> > > > > > Since
> > > > > > you've mentioned that it could be a GSoC project, this is
> > > > > > what
> > > > > > I'm
> > > > > > going to
> > > > > > focus on.
> > > > >
> > > > > Excellent.
> > > > >
> > > > > Let me know (via this mailing list) if you have any questions.
> > > > >
> > > > > Thanks
> > > > > Dave
> > > > >
> > > > > >
> > > > > > Regards.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <
> > > > > > dmalcolm@redhat.com>
> > > > > > wrote:
> > > > > >
> > > > > > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > > > > > > > Hi, sir.
> > > > > > > >
> > > > > > > > I've been trying to understand the static analyzer's
> > > > > > > > code. I
> > > > > > > > spent most
> > > > > > > > of
> > > > > > > > my time learning the state machine's API. I learned how
> > > > > > > > state
> > > > > > > > machine's
> > > > > > > > on_stmt is supposed to "recognize" specific functions and
> > > > > > > > how
> > > > > > > > on_transition
> > > > > > > > takes a specific tree from one state to another, and how
> > > > > > > > the
> > > > > > > > captured
> > > > > > > > states are used by pending_diagnostics to report the
> > > > > > > > errors.
> > > > > > > > Furthermore, I
> > > > > > > > was able to create a dummy checker that mimicked the
> > > > > > > > behaviour of
> > > > > > > > sm-
> > > > > > > > file's
> > > > > > > > double_fclose and compile GCC with these changes. Is this
> > > > > > > > the
> > > > > > > > right way
> > > > > > > > of
> > > > > > > > learning?
> > > > > > >
> > > > > > > This sounds great.
> > > > > > >
> > > > > > > >
> > > > > > > > As you've mentioned on the projects page that you would
> > > > > > > > like
> > > > > > > > to
> > > > > > > > add
> > > > > > > > more
> > > > > > > > support for some POSIX APIs. Can you please write (or
> > > > > > > > refer
> > > > > > > > me to
> > > > > > > > a) a
> > > > > > > > simple C program that uses such an API (and also what the
> > > > > > > > analyzer
> > > > > > > > should
> > > > > > > > have done) so that I can attempt to add such a checker to
> > > > > > > > the
> > > > > > > > analyzer.
> > > > > > >
> > > > > > > A couple of project ideas:
> > > > > > >
> > > > > > > (i) treat data coming from a network connection as tainted,
> > > > > > > by
> > > > > > > somehow
> > > > > > > teaching the analyzer about networking APIs.  Ideally: look
> > > > > > > at
> > > > > > > some
> > > > > > > subset of historical CVEs involving network-facing attacks
> > > > > > > on
> > > > > > > user-
> > > > > > > space daemons, and find a way to detect them in the
> > > > > > > analyzer
> > > > > > > (need
> > > > > > > to
> > > > > > > find a way to mark the incoming data as tainted, so that
> > > > > > > the
> > > > > > > analyer
> > > > > > > "know" about the trust boundary - that the incoming data
> > > > > > > needs
> > > > > > > to
> > > > > > > be
> > > > > > > sanitized and treated with extra caution; see
> > > > > > >
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> > > > > > > for my attempts to do this for the Linux kernel).
> > > > > > >
> > > > > > > Obviously this is potentially a huge project, so maybe just
> > > > > > > picking
> > > > > > > a
> > > > > > > tiny subset and getting that working as a proof-of-concept
> > > > > > > would be
> > > > > > > a
> > > > > > > good GSoC project.  Maybe find an old CVE that someone has
> > > > > > > written
> > > > > > > a
> > > > > > > good write-up for, and think about "how could GCC/-
> > > > > > > fanalyzer
> > > > > > > have
> > > > > > > spotted it?"
> > > > > > >
> > > > > > > (ii) add leak-detection for POSIX file descriptors: i.e.
> > > > > > > the
> > > > > > > integer
> > > > > > > values returned by "open", "dup", etc.  It would be good to
> > > > > > > have a
> > > > > > > check that the user's code doesn't leak these values e.g.
> > > > > > > on
> > > > > > > error-
> > > > > > > handling paths, by failing to close a file-descriptor (and
> > > > > > > not
> > > > > > > storing
> > > > > > > it anywhere).  I think that much of this could be done by
> > > > > > > analogy
> > > > > > > with
> > > > > > > the sm-file.cc code.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Also, I didn't realize the complexity of adding SARIF
> > > > > > > > when I
> > > > > > > > mentioned
> > > > > > > > it.
> > > > > > > > I'd rather work on adding more checkers.
> > > > > > >
> > > > > > > Fair enough.
> > > > > > >
> > > > > > > Hope this above is constructive.
> > > > > > >
> > > > > > > Dave
> > > > > > >
> > > > > > > >
> > > > > > > > Regards.
> > > > > > > >
> > > > > > > > Mir Immad
> > > > > > > >
> > > > > > > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <
> > > > > > > > mirimnan017@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Sir,
> > > > > > > > >
> > > > > > > > > I've been trying to understand the static analyzer's
> > > > > > > > > code.
> > > > > > > > > I
> > > > > > > > > spent
> > > > > > > > > most of
> > > > > > > > > my time learning the state machine's API. I learned how
> > > > > > > > > state
> > > > > > > > > machine's
> > > > > > > > > on_stmt is supposed to "recognize" specific functions
> > > > > > > > > and
> > > > > > > > > takes
> > > > > > > > > a
> > > > > > > > > specific
> > > > > > > > > tree from one state to another, and how the captured
> > > > > > > > > states
> > > > > > > > > are
> > > > > > > > > used
> > > > > > > > > by
> > > > > > > > > pending_diagnostics to report the errors. Furthermore,
> > > > > > > > > I
> > > > > > > > > was
> > > > > > > > > able to
> > > > > > > > > create
> > > > > > > > > a dummy checker that mimicked the behaviour of sm-
> > > > > > > > > file's
> > > > > > > > > double_fclose and
> > > > > > > > > compile GCC with these changes. Is this the right way
> > > > > > > > > of
> > > > > > > > > learning?
> > > > > > > > >
> > > > > > > > > As you've mentioned on the projects page that you would
> > > > > > > > > like to
> > > > > > > > > add
> > > > > > > > > more
> > > > > > > > > support for some POSIX APIs. Can you please write (or
> > > > > > > > > refer
> > > > > > > > > me
> > > > > > > > > to a)
> > > > > > > > > a
> > > > > > > > > simple C program that uses such an API (and also what
> > > > > > > > > the
> > > > > > > > > analyzer
> > > > > > > > > should
> > > > > > > > > have done) so that I can attempt to add such a checker
> > > > > > > > > to
> > > > > > > > > the
> > > > > > > > > analyzer.
> > > > > > > > >
> > > > > > > > > Also, I didn't realize the complexity of adding SARIF
> > > > > > > > > when
> > > > > > > > > I
> > > > > > > > > mentioned it.
> > > > > > > > > I'd rather work on adding more checkers.
> > > > > > > > >
> > > > > > > > > Regards.
> > > > > > > > > Mir Immad
> > > > > > > > >
> > > > > > > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <
> > > > > > > > > dmalcolm@redhat.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > > > > > > > HI David,
> > > > > > > > > > > I've been tinkering with the static analyzer for
> > > > > > > > > > > the
> > > > > > > > > > > last
> > > > > > > > > > > few
> > > > > > > > > > > days. I
> > > > > > > > > > > find
> > > > > > > > > > > the project of adding SARIF output to the analyzer
> > > > > > > > > > > intresting.
> > > > > > > > > > > I'm
> > > > > > > > > > > writing
> > > > > > > > > > > this to let you know that I'm trying to learn the
> > > > > > > > > > > codebase.
> > > > > > > > > > > Thank you.
> > > > > > > > > >
> > > > > > > > > > Excellent.
> > > > > > > > > >
> > > > > > > > > > BTW, I think adding SARIF output would involve
> > > > > > > > > > working
> > > > > > > > > > more
> > > > > > > > > > with
> > > > > > > > > > GCC's
> > > > > > > > > > diagnostics subsystem than with the static analyzer,
> > > > > > > > > > since
> > > > > > > > > > (in
> > > > > > > > > > theory)
> > > > > > > > > > all of the static analyzer's output is passing
> > > > > > > > > > through
> > > > > > > > > > the
> > > > > > > > > > diagnostics
> > > > > > > > > > subsystem - though the static analyzer is probably
> > > > > > > > > > the
> > > > > > > > > > only
> > > > > > > > > > GCC
> > > > > > > > > > component generating diagnostic paths.
> > > > > > > > > >
> > > > > > > > > > I'm happy to mentor such a project as I maintain both
> > > > > > > > > > subsystems
> > > > > > > > > > and
> > > > > > > > > > SARIF output would benefit both - but it would be
> > > > > > > > > > rather
> > > > > > > > > > tangential
> > > > > > > > > > to
> > > > > > > > > > the analyzer - so if you had specifically wanted to
> > > > > > > > > > be
> > > > > > > > > > working on
> > > > > > > > > > the
> > > > > > > > > > guts of the analyzer itself, you may want to pick a
> > > > > > > > > > different
> > > > > > > > > > subproject.
> > > > > > > > > >
> > > > > > > > > > The SARIF standard is rather long and complicated,
> > > > > > > > > > and we
> > > > > > > > > > would
> > > > > > > > > > want to
> > > > > > > > > > be compatible with clang's implementation.
> > > > > > > > > >
> > > > > > > > > > It would be very cool if gcc could also accept SARIF
> > > > > > > > > > files as
> > > > > > > > > > an
> > > > > > > > > > *input* format, and emit them as diagnostics; that
> > > > > > > > > > might
> > > > > > > > > > help
> > > > > > > > > > with
> > > > > > > > > > debugging SARIF output.   (I have a old patch for
> > > > > > > > > > adding
> > > > > > > > > > JSON
> > > > > > > > > > parsing
> > > > > > > > > > support to GCC that could be used as a starting point
> > > > > > > > > > for
> > > > > > > > > > this).
> > > > > > > > > >
> > > > > > > > > > Hope the above makes sense
> > > > > > > > > > Dave
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
> > > > > > > > > > > dmalcolm@redhat.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via
> > > > > > > > > > > > Gcc
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > Hi, and welcome.
> > > > > > > > > > > >
> > > > > > > > > > > > > I intend to work on the static analyzer. Are
> > > > > > > > > > > > > these
> > > > > > > > > > > > > documents
> > > > > > > > > > > > > enough to
> > > > > > > > > > > > > get
> > > > > > > > > > > > > started:
> > > > > > > > > > > > > https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > > > > > > > >
> > > > > > > > > > > > Yes.
> > > > > > > > > > > >
> > > > > > > > > > > > There are also some high-level notes here:
> > > > > > > > > > > >
> > > > > > > > > > > > https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > > > > > > > >
> > > > > > > > > > > > Also, given that the analyzer is part of GCC, the
> > > > > > > > > > > > more
> > > > > > > > > > > > general
> > > > > > > > > > > > introductions to hacking on GCC will be useful.
> > > > > > > > > > > >
> > > > > > > > > > > > I recommend creating a trivial C source file with
> > > > > > > > > > > > a
> > > > > > > > > > > > bug
> > > > > > > > > > > > in it
> > > > > > > > > > > > (e.g.
> > > > > > > > > > > > a
> > > > > > > > > > > > 3-line function with a use-after-free), and
> > > > > > > > > > > > stepping
> > > > > > > > > > > > through
> > > > > > > > > > > > the
> > > > > > > > > > > > analyzer to get a sense of how it works.
> > > > > > > > > > > >
> > > > > > > > > > > > Hope this is helpful; don't hesitate to ask
> > > > > > > > > > > > questions.
> > > > > > > > > > > > Dave
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> >
>
>
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-04-04 16:16                         ` Mir Immad
@ 2022-04-08 18:19                           ` David Malcolm
  0 siblings, 0 replies; 18+ messages in thread
From: David Malcolm @ 2022-04-08 18:19 UTC (permalink / raw)
  To: Mir Immad; +Cc: gcc

On Mon, 2022-04-04 at 21:46 +0530, Mir Immad wrote:
> Hi David,
> 
> Sorry for such late reply. I've been busy with classes and exams.
> 
> As the contributor applications are opening, I would like to put
> forward a
> proposal for a medium project for extending the static analyzer to work
> with POSIX file descriptor APIs. As evident in this thread, I've been
> interested in this project for quite some time, and have in fact
> written
> some proof-of-concept code. I'm pretty much familiar with the parts of
> the
> code base  which are relevant for this project and am confident in
> completing it in given time frame.
> 
> Do you have any inputs on how to start writing the proposal? Should I
> start
> by reading the past year proposals?

You should read:
  https://gcc.gnu.org/wiki/SummerOfCode#Before_you_apply
  https://gcc.gnu.org/wiki/SummerOfCode#Application

Beyond the stuff listed there, what I'm hoping to see from a successful
candidate is:

(a) that they already know enough C/C++ to be able to work with the
project

(b) some level of understanding of the subject area

Patches to the project itself are probably the most direct way to
demonstrate both of these, and IIRC you've already sent some proof-of-
concept code, so your proposal should contain links to those.

Links to code you've written is also helpful (class projects, etc), or
to if you've done any papers/theses, etc

Hope this is helpful
Dave


> 
> Regards,
> 
> Immad MIr
> 
> 
> On Mon, Feb 14, 2022 at 4:35 AM David Malcolm <dmalcolm@redhat.com>
> wrote:
> 
> > On Sun, 2022-02-13 at 17:57 -0500, David Malcolm wrote:
> > > On Sun, 2022-02-13 at 21:16 +0530, Mir Immad wrote:
> > > > Hi,
> > > > 
> > > > I wanted some clarification on bifurcating the exploded graph at
> > > > call
> > > > to
> > > > open().
> > > > Should the analyzer warn for code like this "when open fails"
> > > > (like
> > > > strchr
> > > > does when  'strchr' returns NULL)
> > > > 
> > > > int fd = open("NOFILE", O_RDONLY);
> > > > write(fd, "a", 1);
> > > > 
> > > > because of the bad file descriptor.
> > > > unless it is written like this:
> > > > if (!errno)
> > > >   write(fd, "a", 1);
> > > 
> > > "open" returns a non-negative integer on success, and returns -1
> > > and
> > > sets errno on an error.  As I understand it, errno is never cleared
> > 
> > "modified" would be a better word than "cleared" here, sorry
> > 
> > Dave
> > 
> > > unless an error occurs, so errno could already be set due to a
> > > prior
> > > failure.  Hence I believe the correct way to write such code is to
> > > check fd for being -1, rather than checking errno.
> > > 
> > > It's probably best to bifurcate the state at the open call, into
> > > "success" and "failure" outcomes, so that we can track that the
> > > caller
> > > "owns" the filedescriptor on success, and thus we can complain
> > > about
> > > fd
> > > leaks on paths that discard the value without closing it.
> > > 
> > > We can then warn if we see -1 passed as the fd value into "write",
> > > since that implies that the open call failed.  If we bifurcate the
> > > state at the open call, then for:
> > > 
> > > int fd = open("NOFILE", O_RDONLY);
> > > write(fd, "a", 1);
> > > 
> > > we'd have the execution paths:
> > > 
> > > (a) "when 'open' succeeds" (setting fd to a conjured_svalue >=0,
> > > and
> > > setting sm-state on that value)
> > > 
> > > (b) "when 'open' fails" (setting fd to -1, and errno to a
> > > conjured_svalue known to be nonzero)
> > > 
> > > and at the call to "write" on path (b) we see the -1 passed to it,
> > > and
> > > can complain accordingly.
> > > 
> > > I'm not sure if we also want to track that the file was O_RDONLY
> > > (which
> > > suggests that the "write" should fail), but that would be a nice
> > > bonus.
> > > 
> > > Hope this is helpful
> > > Dave
> > > 
> > > 
> > > 
> > > > 
> > > > Thank you.
> > > > 
> > > > On Tue, Feb 1, 2022 at 8:28 PM Mir Immad <mirimnan017@gmail.com>
> > > > wrote:
> > > > 
> > > > > I worked around the leak detection and also handled the case
> > > > > where
> > > > > the fd
> > > > > is not saved. I wonder why sm-file hasn't implemented it yet.
> > > > > I'm
> > > > > attaching
> > > > > a text file with analyzer warnings. I'm still on gcc-11.2.0,
> > > > > will
> > > > > move to
> > > > > v12 next thing.
> > > > > 
> > > > > > I wonder if it's worth checking for attempts to write to a fd
> > > > > > that
> > > > > > was
> > > > > > opened with O_RDONLY, or the converse?
> > > > > 
> > > > > Yes, it does not make sense to warn for write on read-only
> > > > > closed
> > > > > fd
> > > > > or
> > > > > converse. But, it does make sense to warn for write operation
> > > > > on
> > > > > valid
> > > > > read-only fd. For this, we might have to analyze calls to the
> > > > > open()
> > > > > carefully -- get the literal value of the second argument to
> > > > > the
> > > > > call
> > > > > and
> > > > > do bit-mask decomposition on it to find if O_RDONLY or O_WRONLY
> > > > > are
> > > > > in it.
> > > > > We might have to maintain separate vectors for these and check
> > > > > for
> > > > > them in
> > > > > calls to write and read -- warn if write is being called on
> > > > > read-
> > > > > only
> > > > > fd
> > > > > and converse. How do we equate them? Does gimple store unique
> > > > > ids
> > > > > for
> > > > > each
> > > > > tree? or something like SAME_TREE? Just thinking out loud.
> > > > > 
> > > > > > how much state does it make sense to track for a fd?
> > > > > Sorry, I didnt get it. Do you mean how may states I'm using?
> > > > > unchecked,
> > > > > closed, null and leak_by_not_saved (for the call-tree that does
> > > > > not
> > > > > have a
> > > > > lhs).
> > > > > 
> > > > > > Also, at some point, we're going to have to handle "errno" -
> > > > > > but
> > > > > > given
> > > > > > that might be somewhat fiddly it's OK to defer that until
> > > > > > you're
> > > > > > more
> > > > > > familiar with the code
> > > > > 
> > > > > I'm looking forward to figure it out.
> > > > > 
> > > > > Thank you.
> > > > > 
> > > > > 
> > > > > On Sat, Jan 29, 2022 at 11:09 PM David Malcolm <
> > > > > dmalcolm@redhat.com>
> > > > > wrote:
> > > > > 
> > > > > > On Sat, 2022-01-29 at 20:22 +0530, Mir Immad wrote:
> > > > > > > Thank you for the detailed information.
> > > > > > > 
> > > > > > > I've been looking into the integer posix  file descriptor
> > > > > > > APIs
> > > > > > > and I
> > > > > > > decided to write proof-of-concept  checker for them. (not
> > > > > > > caring
> > > > > > > about
> > > > > > > errno). The checker tracks the fd returned by open(), warns
> > > > > > > if
> > > > > > > dup()
> > > > > > > is
> > > > > > > called with closed fd otherwise tracks the fd returned by
> > > > > > > dup(),
> > > > > > > it
> > > > > > > also
> > > > > > > warns if read() and write() functions were called on closed
> > > > > > > fd.
> > > > > > > I'm
> > > > > > > attaching a text file that lists some c sources and
> > > > > > > warnings
> > > > > > > by
> > > > > > > the
> > > > > > > static
> > > > > > > analyzer. I've used the diagnostic meta-data from sm-file.
> > > > > > > Is
> > > > > > > this
> > > > > > > something that could also be added to the analyzer?
> > > > > > 
> > > > > > This looks great, and very promising as both new
> > > > > > functionality
> > > > > > for
> > > > > > GCC
> > > > > > 13, and as a GSoC 2022 project.
> > > > > > 
> > > > > > BTW, it looks like you're working with GCC 11, but the
> > > > > > analyzer
> > > > > > has
> > > > > > changed quite a bit on trunk for GCC 12, so it's worth trying
> > > > > > to
> > > > > > track
> > > > > > trunk.
> > > > > > 
> > > > > > I wonder if it's worth checking for attempts to write to a fd
> > > > > > that
> > > > > > was
> > > > > > opened with O_RDONLY, or the converse?  (I'm not sure, just
> > > > > > thinking
> > > > > > aloud - how much state does it make sense to track for a
> > > > > > fd?).
> > > > > > 
> > > > > > Also, at some point, we're going to have to handle "errno" -
> > > > > > but
> > > > > > given
> > > > > > that might be somewhat fiddly it's OK to defer that until
> > > > > > you're
> > > > > > more
> > > > > > familiar with the code.
> > > > > > 
> > > > > > > 
> > > > > > > About the fd leak, that's the next thing I'll try to get
> > > > > > > working.
> > > > > > > Since
> > > > > > > you've mentioned that it could be a GSoC project, this is
> > > > > > > what
> > > > > > > I'm
> > > > > > > going to
> > > > > > > focus on.
> > > > > > 
> > > > > > Excellent.
> > > > > > 
> > > > > > Let me know (via this mailing list) if you have any
> > > > > > questions.
> > > > > > 
> > > > > > Thanks
> > > > > > Dave
> > > > > > 
> > > > > > > 
> > > > > > > Regards.
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > On Wed, Jan 26, 2022 at 7:56 PM David Malcolm <
> > > > > > > dmalcolm@redhat.com>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > On Mon, 2022-01-24 at 01:41 +0530, Mir Immad wrote:
> > > > > > > > > Hi, sir.
> > > > > > > > > 
> > > > > > > > > I've been trying to understand the static analyzer's
> > > > > > > > > code. I
> > > > > > > > > spent most
> > > > > > > > > of
> > > > > > > > > my time learning the state machine's API. I learned how
> > > > > > > > > state
> > > > > > > > > machine's
> > > > > > > > > on_stmt is supposed to "recognize" specific functions
> > > > > > > > > and
> > > > > > > > > how
> > > > > > > > > on_transition
> > > > > > > > > takes a specific tree from one state to another, and
> > > > > > > > > how
> > > > > > > > > the
> > > > > > > > > captured
> > > > > > > > > states are used by pending_diagnostics to report the
> > > > > > > > > errors.
> > > > > > > > > Furthermore, I
> > > > > > > > > was able to create a dummy checker that mimicked the
> > > > > > > > > behaviour of
> > > > > > > > > sm-
> > > > > > > > > file's
> > > > > > > > > double_fclose and compile GCC with these changes. Is
> > > > > > > > > this
> > > > > > > > > the
> > > > > > > > > right way
> > > > > > > > > of
> > > > > > > > > learning?
> > > > > > > > 
> > > > > > > > This sounds great.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > As you've mentioned on the projects page that you would
> > > > > > > > > like
> > > > > > > > > to
> > > > > > > > > add
> > > > > > > > > more
> > > > > > > > > support for some POSIX APIs. Can you please write (or
> > > > > > > > > refer
> > > > > > > > > me to
> > > > > > > > > a) a
> > > > > > > > > simple C program that uses such an API (and also what
> > > > > > > > > the
> > > > > > > > > analyzer
> > > > > > > > > should
> > > > > > > > > have done) so that I can attempt to add such a checker
> > > > > > > > > to
> > > > > > > > > the
> > > > > > > > > analyzer.
> > > > > > > > 
> > > > > > > > A couple of project ideas:
> > > > > > > > 
> > > > > > > > (i) treat data coming from a network connection as
> > > > > > > > tainted,
> > > > > > > > by
> > > > > > > > somehow
> > > > > > > > teaching the analyzer about networking APIs.  Ideally:
> > > > > > > > look
> > > > > > > > at
> > > > > > > > some
> > > > > > > > subset of historical CVEs involving network-facing
> > > > > > > > attacks
> > > > > > > > on
> > > > > > > > user-
> > > > > > > > space daemons, and find a way to detect them in the
> > > > > > > > analyzer
> > > > > > > > (need
> > > > > > > > to
> > > > > > > > find a way to mark the incoming data as tainted, so that
> > > > > > > > the
> > > > > > > > analyer
> > > > > > > > "know" about the trust boundary - that the incoming data
> > > > > > > > needs
> > > > > > > > to
> > > > > > > > be
> > > > > > > > sanitized and treated with extra caution; see
> > > > > > > > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584372.html
> > > > > > > > for my attempts to do this for the Linux kernel).
> > > > > > > > 
> > > > > > > > Obviously this is potentially a huge project, so maybe
> > > > > > > > just
> > > > > > > > picking
> > > > > > > > a
> > > > > > > > tiny subset and getting that working as a proof-of-
> > > > > > > > concept
> > > > > > > > would be
> > > > > > > > a
> > > > > > > > good GSoC project.  Maybe find an old CVE that someone
> > > > > > > > has
> > > > > > > > written
> > > > > > > > a
> > > > > > > > good write-up for, and think about "how could GCC/-
> > > > > > > > fanalyzer
> > > > > > > > have
> > > > > > > > spotted it?"
> > > > > > > > 
> > > > > > > > (ii) add leak-detection for POSIX file descriptors: i.e.
> > > > > > > > the
> > > > > > > > integer
> > > > > > > > values returned by "open", "dup", etc.  It would be good
> > > > > > > > to
> > > > > > > > have a
> > > > > > > > check that the user's code doesn't leak these values e.g.
> > > > > > > > on
> > > > > > > > error-
> > > > > > > > handling paths, by failing to close a file-descriptor
> > > > > > > > (and
> > > > > > > > not
> > > > > > > > storing
> > > > > > > > it anywhere).  I think that much of this could be done by
> > > > > > > > analogy
> > > > > > > > with
> > > > > > > > the sm-file.cc code.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Also, I didn't realize the complexity of adding SARIF
> > > > > > > > > when I
> > > > > > > > > mentioned
> > > > > > > > > it.
> > > > > > > > > I'd rather work on adding more checkers.
> > > > > > > > 
> > > > > > > > Fair enough.
> > > > > > > > 
> > > > > > > > Hope this above is constructive.
> > > > > > > > 
> > > > > > > > Dave
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Regards.
> > > > > > > > > 
> > > > > > > > > Mir Immad
> > > > > > > > > 
> > > > > > > > > On Sun, Jan 23, 2022, 11:04 PM Mir Immad <
> > > > > > > > > mirimnan017@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > > Hi Sir,
> > > > > > > > > > 
> > > > > > > > > > I've been trying to understand the static analyzer's
> > > > > > > > > > code.
> > > > > > > > > > I
> > > > > > > > > > spent
> > > > > > > > > > most of
> > > > > > > > > > my time learning the state machine's API. I learned
> > > > > > > > > > how
> > > > > > > > > > state
> > > > > > > > > > machine's
> > > > > > > > > > on_stmt is supposed to "recognize" specific functions
> > > > > > > > > > and
> > > > > > > > > > takes
> > > > > > > > > > a
> > > > > > > > > > specific
> > > > > > > > > > tree from one state to another, and how the captured
> > > > > > > > > > states
> > > > > > > > > > are
> > > > > > > > > > used
> > > > > > > > > > by
> > > > > > > > > > pending_diagnostics to report the errors.
> > > > > > > > > > Furthermore,
> > > > > > > > > > I
> > > > > > > > > > was
> > > > > > > > > > able to
> > > > > > > > > > create
> > > > > > > > > > a dummy checker that mimicked the behaviour of sm-
> > > > > > > > > > file's
> > > > > > > > > > double_fclose and
> > > > > > > > > > compile GCC with these changes. Is this the right way
> > > > > > > > > > of
> > > > > > > > > > learning?
> > > > > > > > > > 
> > > > > > > > > > As you've mentioned on the projects page that you
> > > > > > > > > > would
> > > > > > > > > > like to
> > > > > > > > > > add
> > > > > > > > > > more
> > > > > > > > > > support for some POSIX APIs. Can you please write (or
> > > > > > > > > > refer
> > > > > > > > > > me
> > > > > > > > > > to a)
> > > > > > > > > > a
> > > > > > > > > > simple C program that uses such an API (and also what
> > > > > > > > > > the
> > > > > > > > > > analyzer
> > > > > > > > > > should
> > > > > > > > > > have done) so that I can attempt to add such a
> > > > > > > > > > checker
> > > > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > analyzer.
> > > > > > > > > > 
> > > > > > > > > > Also, I didn't realize the complexity of adding SARIF
> > > > > > > > > > when
> > > > > > > > > > I
> > > > > > > > > > mentioned it.
> > > > > > > > > > I'd rather work on adding more checkers.
> > > > > > > > > > 
> > > > > > > > > > Regards.
> > > > > > > > > > Mir Immad
> > > > > > > > > > 
> > > > > > > > > > On Mon, Jan 17, 2022 at 5:41 AM David Malcolm <
> > > > > > > > > > dmalcolm@redhat.com>
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > > On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> > > > > > > > > > > > HI David,
> > > > > > > > > > > > I've been tinkering with the static analyzer for
> > > > > > > > > > > > the
> > > > > > > > > > > > last
> > > > > > > > > > > > few
> > > > > > > > > > > > days. I
> > > > > > > > > > > > find
> > > > > > > > > > > > the project of adding SARIF output to the
> > > > > > > > > > > > analyzer
> > > > > > > > > > > > intresting.
> > > > > > > > > > > > I'm
> > > > > > > > > > > > writing
> > > > > > > > > > > > this to let you know that I'm trying to learn the
> > > > > > > > > > > > codebase.
> > > > > > > > > > > > Thank you.
> > > > > > > > > > > 
> > > > > > > > > > > Excellent.
> > > > > > > > > > > 
> > > > > > > > > > > BTW, I think adding SARIF output would involve
> > > > > > > > > > > working
> > > > > > > > > > > more
> > > > > > > > > > > with
> > > > > > > > > > > GCC's
> > > > > > > > > > > diagnostics subsystem than with the static
> > > > > > > > > > > analyzer,
> > > > > > > > > > > since
> > > > > > > > > > > (in
> > > > > > > > > > > theory)
> > > > > > > > > > > all of the static analyzer's output is passing
> > > > > > > > > > > through
> > > > > > > > > > > the
> > > > > > > > > > > diagnostics
> > > > > > > > > > > subsystem - though the static analyzer is probably
> > > > > > > > > > > the
> > > > > > > > > > > only
> > > > > > > > > > > GCC
> > > > > > > > > > > component generating diagnostic paths.
> > > > > > > > > > > 
> > > > > > > > > > > I'm happy to mentor such a project as I maintain
> > > > > > > > > > > both
> > > > > > > > > > > subsystems
> > > > > > > > > > > and
> > > > > > > > > > > SARIF output would benefit both - but it would be
> > > > > > > > > > > rather
> > > > > > > > > > > tangential
> > > > > > > > > > > to
> > > > > > > > > > > the analyzer - so if you had specifically wanted to
> > > > > > > > > > > be
> > > > > > > > > > > working on
> > > > > > > > > > > the
> > > > > > > > > > > guts of the analyzer itself, you may want to pick a
> > > > > > > > > > > different
> > > > > > > > > > > subproject.
> > > > > > > > > > > 
> > > > > > > > > > > The SARIF standard is rather long and complicated,
> > > > > > > > > > > and we
> > > > > > > > > > > would
> > > > > > > > > > > want to
> > > > > > > > > > > be compatible with clang's implementation.
> > > > > > > > > > > 
> > > > > > > > > > > It would be very cool if gcc could also accept
> > > > > > > > > > > SARIF
> > > > > > > > > > > files as
> > > > > > > > > > > an
> > > > > > > > > > > *input* format, and emit them as diagnostics; that
> > > > > > > > > > > might
> > > > > > > > > > > help
> > > > > > > > > > > with
> > > > > > > > > > > debugging SARIF output.   (I have a old patch for
> > > > > > > > > > > adding
> > > > > > > > > > > JSON
> > > > > > > > > > > parsing
> > > > > > > > > > > support to GCC that could be used as a starting
> > > > > > > > > > > point
> > > > > > > > > > > for
> > > > > > > > > > > this).
> > > > > > > > > > > 
> > > > > > > > > > > Hope the above makes sense
> > > > > > > > > > > Dave
> > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > On Tue, Jan 11, 2022, 7:09 PM David Malcolm <
> > > > > > > > > > > > dmalcolm@redhat.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad
> > > > > > > > > > > > > via
> > > > > > > > > > > > > Gcc
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Hi, and welcome.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > I intend to work on the static analyzer. Are
> > > > > > > > > > > > > > these
> > > > > > > > > > > > > > documents
> > > > > > > > > > > > > > enough to
> > > > > > > > > > > > > > get
> > > > > > > > > > > > > > started:
> > > > > > > > > > > > > > https://gcc.gnu.org/onlinedocs/gccint and
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Yes.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > There are also some high-level notes here:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Also, given that the analyzer is part of GCC,
> > > > > > > > > > > > > the
> > > > > > > > > > > > > more
> > > > > > > > > > > > > general
> > > > > > > > > > > > > introductions to hacking on GCC will be useful.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I recommend creating a trivial C source file
> > > > > > > > > > > > > with
> > > > > > > > > > > > > a
> > > > > > > > > > > > > bug
> > > > > > > > > > > > > in it
> > > > > > > > > > > > > (e.g.
> > > > > > > > > > > > > a
> > > > > > > > > > > > > 3-line function with a use-after-free), and
> > > > > > > > > > > > > stepping
> > > > > > > > > > > > > through
> > > > > > > > > > > > > the
> > > > > > > > > > > > > analyzer to get a sense of how it works.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Hope this is helpful; don't hesitate to ask
> > > > > > > > > > > > > questions.
> > > > > > > > > > > > > Dave
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > 
> > 
> > 
> > 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: GSoC: Working on the static analyzer
  2022-02-14 12:59 ` Basile Starynkevitch
@ 2022-02-14 13:12   ` Basile Starynkevitch
  0 siblings, 0 replies; 18+ messages in thread
From: Basile Starynkevitch @ 2022-02-14 13:12 UTC (permalink / raw)
  To: gcc, mirimnan017


On 2/14/22 13:59, Basile Starynkevitch wrote:
>
> Hello,
>
>
> Mir Immad asked:
>
>> Should the analyzer warn for code like this "when open fails" (like strchr
>> does when  'strchr' returns NULL)
>>
>> int fd = open("NOFILE", O_RDONLY);
>> write(fd, "a", 1);
>>
>> because of the bad file descriptor.
>> unless it is written like this:
>> if (!errno)
>>    write(fd, "a", 1);
>
> My opinion is yes, in most cases. BTW, the write should fail for a 
> read-only file descriptor.
>
>
> A case (on Linux) where a check is probably not needed: isint 
> fd=open("/proc/self/exe", O_RDONLY); or int fd=open ("/dev/random", 
> O_RDONLY); done *near the beginning* of main. There are only 
> pathological cases where they won't succeed. I suspect that except for 
> very critical executable, testing such failures is practically useless.
>
> And your analyzer might start from https://github.com/bstarynk/bismon/ 
> or use https://frama-c.com/ <https://frama-c.com/>
>
>
>
> PS. My pet project is http://refpersys.org/ (Soon generating code 
> compiled by GCC). It is not GCC related.
>



Be of course aware of Rice's theorem 
<https://en.wikipedia.org/wiki/Rice%27s_theorem> 😁 so don't expect 
writing the ultimate, perfect, static source code (or Gimple code) analyzer.


Cheers


-- 
Basile Starynkevitch<basile@starynkevitch.net>
(only mine opinions / les opinions sont miennes uniquement)
92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* GSoC: Working on the static analyzer
       [not found] <mailman.6588.1644793069.2100866.gcc@gcc.gnu.org>
@ 2022-02-14 12:59 ` Basile Starynkevitch
  2022-02-14 13:12   ` Basile Starynkevitch
  0 siblings, 1 reply; 18+ messages in thread
From: Basile Starynkevitch @ 2022-02-14 12:59 UTC (permalink / raw)
  To: gcc, mirimnan017

Hello,


Mir Immad asked:

> Should the analyzer warn for code like this "when open fails" (like strchr
> does when  'strchr' returns NULL)
>
> int fd = open("NOFILE", O_RDONLY);
> write(fd, "a", 1);
>
> because of the bad file descriptor.
> unless it is written like this:
> if (!errno)
>    write(fd, "a", 1);

My opinion is yes, in most cases. BTW, the write should fail for a 
read-only file descriptor.


A case (on Linux) where a check is probably not needed: isint 
fd=open("/proc/self/exe", O_RDONLY); or int fd=open ("/dev/random", 
O_RDONLY); done *near the beginning* of main. There are only 
pathological cases where they won't succeed. I suspect that except for 
very critical executable, testing such failures is practically useless.

And your analyzer might start from https://github.com/bstarynk/bismon/ 
or use https://frama-c.com/ <https://frama-c.com/>



PS. My pet project is http://refpersys.org/ (Soon generating code 
compiled by GCC). It is not GCC related.

-- 
Basile Starynkevitch<basile@starynkevitch.net>
(only mine opinions / les opinions sont miennes uniquement)
92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-04-08 18:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-11  5:33 GSoC: Working on the static analyzer Mir Immad
2022-01-11 13:39 ` David Malcolm
2022-01-14 16:45   ` Mir Immad
2022-01-17  0:11     ` David Malcolm
     [not found]       ` <CAE1-7ozOGbp-sypHeWxJpoLWzNbqYtWrrsnhLK9RdhYrLe6z1w@mail.gmail.com>
2022-01-23 20:11         ` Mir Immad
2022-01-24 14:19           ` Ankur Saini
2022-01-26 14:31             ` David Malcolm
2022-01-26 14:26           ` David Malcolm
2022-01-29 14:52             ` Mir Immad
2022-01-29 17:39               ` David Malcolm
2022-02-01 14:58                 ` Mir Immad
2022-02-13 15:46                   ` Mir Immad
2022-02-13 22:57                     ` David Malcolm
2022-02-13 23:05                       ` David Malcolm
2022-04-04 16:16                         ` Mir Immad
2022-04-08 18:19                           ` David Malcolm
     [not found] <mailman.6588.1644793069.2100866.gcc@gcc.gnu.org>
2022-02-14 12:59 ` Basile Starynkevitch
2022-02-14 13:12   ` Basile Starynkevitch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).