public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* [GSoC][Static Analyzer] Some questions and request for a small patch to work on
@ 2023-02-21 21:26 Shengyu Huang
  2023-02-21 22:55 ` David Malcolm
  0 siblings, 1 reply; 30+ messages in thread
From: Shengyu Huang @ 2023-02-21 21:26 UTC (permalink / raw)
  To: gcc

[-- Attachment #1: Type: text/plain, Size: 885 bytes --]

Dear all,

I want to work on the Static Analyzer project and just started to read the documentation these days, but what’s mentioned in 27.1.6 in the internal document (https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals) seems outdated or not detailed enough. For example, I didn’t find any “TODO/xfail” in the testsuite. Some points sound very interesting to me (specifically, 1) adding function pointer analysis, 2) assumption of reflexivity of constraint-handling, 3) expensive implementation of transitivity in constraint_manager), and I would like to look into these three points if you believe they can be converted into a valid GSoC project. Could you give me more details regarding these three points? 

In addition, would there be anything related and simple enough that I can work on before the application period? 

Best,
Shengyu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-21 21:26 [GSoC][Static Analyzer] Some questions and request for a small patch to work on Shengyu Huang
@ 2023-02-21 22:55 ` David Malcolm
  2023-02-22 14:03   ` Jonathan Wakely
  2023-02-22 14:11   ` Shengyu Huang
  0 siblings, 2 replies; 30+ messages in thread
From: David Malcolm @ 2023-02-21 22:55 UTC (permalink / raw)
  To: Shengyu Huang, gcc

On Tue, 2023-02-21 at 22:26 +0100, Shengyu Huang via Gcc wrote:
> Dear all,

Hi Shengyu, and welcome.
> 
> I want to work on the Static Analyzer project and just started to
> read the documentation these days, 

Excellent!  I'm the author/maintainer of the analyzer, so I would
mentor any such GSoC project.


> but what’s mentioned in 27.1.6 in the internal document
> (https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyz
> er-Internals) seems outdated or not detailed enough. 

Yeah, that section is rather out of date.  Sorry about that.

We now have some function pointer analysis (thanks to Ankur Saini's
GSoC 2021 project).

> For example, I didn’t find any “TODO/xfail” in the testsuite. 

Sorry, when it says "grep for TODO/xfail in the testsuite" I really
meant grep for "TODO" or for "xfail" in the testsuite.

But a better place to look would probably be in our bugzilla; see the
links on the wiki page:
  https://gcc.gnu.org/wiki/StaticAnalyzer 
The "open bugs" list currently has 41 "RFE" bugs ("request for
enhancement" i.e. ideas for new features), some of which might make
suitable GSoC ideas, and/or be of interest to you (ideally both!)

Also, the GSoC wiki page has some project ideas:
  https://gcc.gnu.org/wiki/SummerOfCode#Selected_Project_Ideas

> Some points sound very interesting to me (specifically, 1) adding
> function pointer analysis, 

> 2) assumption of reflexivity of constraint-handling, 3) expensive
> implementation of transitivity in constraint_manager), and I would
> like to look into these three points if you believe they can be
> converted into a valid GSoC project. Could you give me more details
> regarding these three points? 

I believe that (2) and (3) are still true, but I don't think they'd
make good GSoC projects; I think the stuff in bugzilla is much more
interesting and higher priority.

> 
> In addition, would there be anything related and simple enough that I
> can work on before the application period? 

If you haven't seen it yet, you might find my guide to GCC for new
contributors helpful:
  https://gcc-newbies-guide.readthedocs.io/en/latest/index.html

IIRC I saw you post a few days ago about trying to build gcc on your M2
laptop; did you manage this?  Building GCC trunk from a git checkout,
and hacking in a "hello world" warning would be a great place to start.
See the guide above for some hints on how to make this process quicker,
and let me know if you need help if you run into difficulties.  Given
that the analyzer is about emitting warnings, rather than generating
code, it may be that although we don't yet fully support your hardware,
we *might* already support it enough to allow for hacking on the
analyzer, perhaps with some judicious choices of "configure" options.

Hope this is helpful; welcome again.
Dave

> 
> Best,
> Shengyu
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-21 22:55 ` David Malcolm
@ 2023-02-22 14:03   ` Jonathan Wakely
  2023-02-23 11:17     ` James K. Lowden
  2023-02-22 14:11   ` Shengyu Huang
  1 sibling, 1 reply; 30+ messages in thread
From: Jonathan Wakely @ 2023-02-22 14:03 UTC (permalink / raw)
  To: David Malcolm; +Cc: Shengyu Huang, gcc

On Tue, 21 Feb 2023 at 22:55, David Malcolm via Gcc <gcc@gcc.gnu.org> wrote:
> IIRC I saw you post a few days ago about trying to build gcc on your M2
> laptop; did you manage this?  Building GCC trunk from a git checkout,
> and hacking in a "hello world" warning would be a great place to start.
> See the guide above for some hints on how to make this process quicker,
> and let me know if you need help if you run into difficulties.  Given
> that the analyzer is about emitting warnings, rather than generating
> code, it may be that although we don't yet fully support your hardware,
> we *might* already support it enough to allow for hacking on the
> analyzer, perhaps with some judicious choices of "configure" options.

I think GCC trunk won't even build on M2, you need Iain Sandoe's
out-of-tree patches.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-21 22:55 ` David Malcolm
  2023-02-22 14:03   ` Jonathan Wakely
@ 2023-02-22 14:11   ` Shengyu Huang
  2023-02-22 14:53     ` Iain Sandoe
                       ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Shengyu Huang @ 2023-02-22 14:11 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 2313 bytes --]

Hi Dave,


> But a better place to look would probably be in our bugzilla; see the
> links on the wiki page:
>  https://gcc.gnu.org/wiki/StaticAnalyzer 
> The "open bugs" list currently has 41 "RFE" bugs ("request for
> enhancement" i.e. ideas for new features), some of which might make
> suitable GSoC ideas, and/or be of interest to you (ideally both!)
> 
> Also, the GSoC wiki page has some project ideas:
>  https://gcc.gnu.org/wiki/SummerOfCode#Selected_Project_Ideas
> 

Yeah I was also searching for interesting ideas on the bugzilla, and I will communicate to you once I have any more concrete ideas.
> 
> If you haven't seen it yet, you might find my guide to GCC for new
> contributors helpful:
>  https://gcc-newbies-guide.readthedocs.io/en/latest/index.html
> 

Just started reading it. Thanks for sharing!

> IIRC I saw you post a few days ago about trying to build gcc on your M2
> laptop; did you manage this?  Building GCC trunk from a git checkout,
> and hacking in a "hello world" warning would be a great place to start.
> See the guide above for some hints on how to make this process quicker,
> and let me know if you need help if you run into difficulties.  Given
> that the analyzer is about emitting warnings, rather than generating
> code, it may be that although we don't yet fully support your hardware,
> we *might* already support it enough to allow for hacking on the
> analyzer, perhaps with some judicious choices of "configure" options.

I just finished building it today on my laptop (Thanks Iain!). The GCC-12 branch did not work (I got `configure: error: C preprocessor "/lib/cpp" fails sanity check`) but the development branch works for me (haven’t encountered the potential pitfalls mentioned yet after testing it with some simple programs). Besides, I have also set up everything on both my university server and the compile farm machine, so I can test my work on these two machines as well.

What do you mean by a “hello world” warning? You meant to write some simple code like

```
#include <stdio.h>
int main ()
{
  int a;
  printf ("hello world\n");
  return a;
}
```

and to get the warning “warning: use of uninitialized value ‘a’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]”? 

Best,
Shengyu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-22 14:11   ` Shengyu Huang
@ 2023-02-22 14:53     ` Iain Sandoe
  2023-02-22 15:13       ` Iain Sandoe
  2023-02-27 13:35       ` Shengyu Huang
  2023-02-22 15:43     ` David Malcolm
  2023-02-28 14:46     ` [GSoC][Static Analyzer] Ideas for proposal Shengyu Huang
  2 siblings, 2 replies; 30+ messages in thread
From: Iain Sandoe @ 2023-02-22 14:53 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: David Malcolm, GCC Development

Hi Shengyu,

> On 22 Feb 2023, at 14:11, Shengyu Huang via Gcc <gcc@gcc.gnu.org> wrote:
> 
>> IIRC I saw you post a few days ago about trying to build gcc on your M2
>> laptop; did you manage this?  Building GCC trunk from a git checkout,
>> and hacking in a "hello world" warning would be a great place to start.
>> See the guide above for some hints on how to make this process quicker,
>> and let me know if you need help if you run into difficulties.  Given
>> that the analyzer is about emitting warnings, rather than generating
>> code, it may be that although we don't yet fully support your hardware,
>> we *might* already support it enough to allow for hacking on the
>> analyzer, perhaps with some judicious choices of "configure" options.
> 
> I just finished building it today on my laptop (Thanks Iain!). The GCC-12 branch did not work (I got `configure: error: C preprocessor "/lib/cpp" fails sanity check`)

Possibly my bad - there are some additional fixes needed for newer Darwin on the 12 branch (I need to re-release 12.1 Darwin):

https://github.com/iains/gcc-12-branch/tree/gcc-12-1-darwin-pre-r1 
If that doesn’t work please file an issue.

> but the development branch works for me (haven’t encountered the potential pitfalls mentioned yet after testing it with some simple programs).

In general, new work needs to be on trunk anyway - so that this would probably be what you would use.

Note: I _rebase_ the work onto latest trunk from time to time .. (actually overdue at the moment) so you have to keep an eye out for that .

good luck with the GSoC application!
Iain


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-22 14:53     ` Iain Sandoe
@ 2023-02-22 15:13       ` Iain Sandoe
  2023-02-27 13:35       ` Shengyu Huang
  1 sibling, 0 replies; 30+ messages in thread
From: Iain Sandoe @ 2023-02-22 15:13 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: GCC Development, David Malcolm

Hi Shengyu,

oops finger-trouble…

> On 22 Feb 2023, at 14:53, Iain Sandoe <idsandoe@googlemail.com> wrote:
>> On 22 Feb 2023, at 14:11, Shengyu Huang via Gcc <gcc@gcc.gnu.org> wrote:
> 

> Possibly my bad - there are some additional fixes needed for newer Darwin on the 12 branch (I need to re-release 12.1 Darwin):

12.2…

> https://github.com/iains/gcc-12-branch/tree/gcc-12-1-darwin-pre-r1 

https://github.com/iains/gcc-12-branch/tree/gcc-12-2-darwin-pre-r1

> If that doesn’t work please file an issue.
Iain


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-22 14:11   ` Shengyu Huang
  2023-02-22 14:53     ` Iain Sandoe
@ 2023-02-22 15:43     ` David Malcolm
  2023-02-28  9:18       ` Shengyu Huang
  2023-02-28 14:46     ` [GSoC][Static Analyzer] Ideas for proposal Shengyu Huang
  2 siblings, 1 reply; 30+ messages in thread
From: David Malcolm @ 2023-02-22 15:43 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: gcc

On Wed, 2023-02-22 at 15:11 +0100, Shengyu Huang wrote:
> Hi Dave,
> 
> 
> > But a better place to look would probably be in our bugzilla; see
> > the
> > links on the wiki page:
> >  https://gcc.gnu.org/wiki/StaticAnalyzer 
> > The "open bugs" list currently has 41 "RFE" bugs ("request for
> > enhancement" i.e. ideas for new features), some of which might make
> > suitable GSoC ideas, and/or be of interest to you (ideally both!)
> > 
> > Also, the GSoC wiki page has some project ideas:
> >  https://gcc.gnu.org/wiki/SummerOfCode#Selected_Project_Ideas
> > 
> 
> Yeah I was also searching for interesting ideas on the bugzilla, and
> I will communicate to you once I have any more concrete ideas.
> > 
> > If you haven't seen it yet, you might find my guide to GCC for new
> > contributors helpful:
> >  https://gcc-newbies-guide.readthedocs.io/en/latest/index.html
> > 
> 
> Just started reading it. Thanks for sharing!
> 
> > IIRC I saw you post a few days ago about trying to build gcc on
> > your M2
> > laptop; did you manage this?  Building GCC trunk from a git
> > checkout,
> > and hacking in a "hello world" warning would be a great place to
> > start.
> > See the guide above for some hints on how to make this process
> > quicker,
> > and let me know if you need help if you run into difficulties. 
> > Given
> > that the analyzer is about emitting warnings, rather than
> > generating
> > code, it may be that although we don't yet fully support your
> > hardware,
> > we *might* already support it enough to allow for hacking on the
> > analyzer, perhaps with some judicious choices of "configure"
> > options.
> 
> I just finished building it today on my laptop (Thanks Iain!). The
> GCC-12 branch did not work (I got `configure: error: C preprocessor
> "/lib/cpp" fails sanity check`) but the development branch works for
> me (haven’t encountered the potential pitfalls mentioned yet after
> testing it with some simple programs). Besides, I have also set up
> everything on both my university server and the compile farm machine,
> so I can test my work on these two machines as well.
> 
> What do you mean by a “hello world” warning? You meant to write some
> simple code like
> 
> ```
> #include <stdio.h>
> int main ()
> {
>   int a;
>   printf ("hello world\n");
>   return a;
> }
> ```
> 
> and to get the warning “warning: use of uninitialized value ‘a’ [CWE-
> 457] [-Wanalyzer-use-of-uninitialized-value]”? 

Sorry, I was unclear; I was referring to this part of my guide:
https://gcc-newbies-guide.readthedocs.io/en/latest/getting-started.html#hello-world-from-the-compiler

i.e. try writing a new warning that simply emits something like:

test.c:2:1: warning: hello world, I'm compiling 'main'
    2 | int main ()
      |     ^~~~

for each function that it sees.

That way you can easily see that you've modified the compiler, that
your code is running inside it.

Dave


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-22 14:03   ` Jonathan Wakely
@ 2023-02-23 11:17     ` James K. Lowden
  2023-02-23 16:30       ` Jonathan Wakely
  0 siblings, 1 reply; 30+ messages in thread
From: James K. Lowden @ 2023-02-23 11:17 UTC (permalink / raw)
  To: gcc; +Cc: Jonathan Wakely

On Wed, 22 Feb 2023 14:03:36 +0000
Jonathan Wakely via Gcc <gcc@gcc.gnu.org> wrote:

> I think GCC trunk won't even build on M2, you need Iain Sandoe's
> out-of-tree patches.

https://gitlab.cobolworx.com/COBOLworx/gcc-cobol/-/jobs/2822

We've been building on aarch64 based on GCC trunk since December. 

--jkl

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-23 11:17     ` James K. Lowden
@ 2023-02-23 16:30       ` Jonathan Wakely
  0 siblings, 0 replies; 30+ messages in thread
From: Jonathan Wakely @ 2023-02-23 16:30 UTC (permalink / raw)
  To: James K. Lowden; +Cc: gcc

On Thu, 23 Feb 2023 at 15:57, James K. Lowden wrote:
>
> On Wed, 22 Feb 2023 14:03:36 +0000
> Jonathan Wakely via Gcc <gcc@gcc.gnu.org> wrote:
>
> > I think GCC trunk won't even build on M2, you need Iain Sandoe's
> > out-of-tree patches.
>
> https://gitlab.cobolworx.com/COBOLworx/gcc-cobol/-/jobs/2822
>
> We've been building on aarch64 based on GCC trunk since December.

It looks like you're using aarch64-linux-gnu, but we're talking about
macOS on M2. I should have been more precise, but for the vast
majority of end users with M2 hardware, they're using macOS.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-22 14:53     ` Iain Sandoe
  2023-02-22 15:13       ` Iain Sandoe
@ 2023-02-27 13:35       ` Shengyu Huang
  2023-02-27 13:45         ` Iain Sandoe
  2023-02-27 13:49         ` Iain Sandoe
  1 sibling, 2 replies; 30+ messages in thread
From: Shengyu Huang @ 2023-02-27 13:35 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: David Malcolm, GCC Development

[-- Attachment #1: Type: text/plain, Size: 1722 bytes --]

Hi Iain,

Sorry for the late reply. I just built gcc-12-2 on my machine with bootstrap and it worked.

However, maybe due to some missing configuration, building without bootstrap does not work on my machine (development branch or gcc-12-2). Specifically, my configuration is as follows.

```
$HOME/dropbox/projects/gcc-darwin-arm64/configure --disable-bootstrap --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk --prefix=$PWD
```

And I encountered the following error

```
Adding multilib support to Makefile in /Users/kumom/dropbox/projects/gcc-darwin-arm64/libgcc
multidirs=
with_multisubdir=
make: *** [all] Error 2
```

Should I file an issue for that? It is practically impossible to develop on my machine if building without bootstrapping is not possible. 

By the way, is it expected that I need to configure with `--with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk`? Without specifying this, I will encounter (for both —disable-bootstrap and —enable-bootstrap)

```
The directory that should contain system headers does not exist:
  /usr/include
make[3]: *** [stmp-fixinc] Error 1
rm gcov.pod fsf-funding.pod lto-dump.pod gpl.pod cpp.pod gfdl.pod gcc.pod gcov-dump.pod gcov-tool.pod
make[2]: *** [all-stage1-gcc] Error 2
make[1]: *** [stage1-bubble] Error 2
make: *** [all] Error 2
```

Best,
Shengyu

> On 22 Feb 2023, at 15:53, Iain Sandoe <idsandoe@googlemail.com> wrote:
> 
> Possibly my bad - there are some additional fixes needed for newer Darwin on the 12 branch (I need to re-release 12.1 Darwin):
> 
> https://github.com/iains/gcc-12-branch/tree/gcc-12-1-darwin-pre-r1 
> If that doesn’t work please file an issue.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-27 13:35       ` Shengyu Huang
@ 2023-02-27 13:45         ` Iain Sandoe
  2023-02-27 13:51           ` Shengyu Huang
  2023-02-27 13:49         ` Iain Sandoe
  1 sibling, 1 reply; 30+ messages in thread
From: Iain Sandoe @ 2023-02-27 13:45 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: David Malcolm, GCC Development

Hi Shengyu

> On 27 Feb 2023, at 13:35, Shengyu Huang via Gcc <gcc@gcc.gnu.org> wrote:

> Sorry for the late reply. I just built gcc-12-2 on my machine with bootstrap and it worked.

great,
> 
> However, maybe due to some missing configuration, building without bootstrap does not work on my machine (development branch or gcc-12-2). Specifically, my configuration is as follows.
> 
> ```
> $HOME/dropbox/projects/gcc-darwin-arm64/configure --disable-bootstrap --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk --prefix=$PWD
> ```

Please file an issue against the branch on github (since this is a development branch it’s not appropriate to use the GCC bugzilla yet).

I think that there is at least one known issue with non-bootstrap builds (to do with flags not recognised by clang) but not sure, yet, if this is the same case.

In general, if you want to build without boostrapping [as is often the case with development] then the best thing to do is to bootstrap a recent compiler [or the unmodified sources] and then use that GCC as your build compiler (this ensures you have compatible flags and builtins).

thanks
Iain


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-27 13:35       ` Shengyu Huang
  2023-02-27 13:45         ` Iain Sandoe
@ 2023-02-27 13:49         ` Iain Sandoe
  2023-02-27 14:01           ` Floyd, Paul
  1 sibling, 1 reply; 30+ messages in thread
From: Iain Sandoe @ 2023-02-27 13:49 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: GCC Development

Hi Shengyu,

> On 27 Feb 2023, at 13:35, Shengyu Huang via Gcc <gcc@gcc.gnu.org> wrote:
> 

> By the way, is it expected that I need to configure with `--with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk`?

Yes.
 - for some time now macOS no longer installs headers in /usr/include (and on newer versions most of the libraries are omitted from /usr/lib).
 - So you must use a sysroot pointing to a usable SDK.

thanks
Iain


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-27 13:45         ` Iain Sandoe
@ 2023-02-27 13:51           ` Shengyu Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Shengyu Huang @ 2023-02-27 13:51 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: David Malcolm, GCC Development

[-- Attachment #1: Type: text/plain, Size: 429 bytes --]

Hi Iain,

> On 27 Feb 2023, at 14:45, Iain Sandoe <idsandoe@googlemail.com> wrote:
> 
> Please file an issue against the branch on github (since this is a development branch it’s not appropriate to use the GCC bugzilla yet).

The same error occurred for both the development branch and the gcc-12-2 branch. Should I only file an issue in the development repo (https://github.com/iains/gcc-darwin-arm64)?

Best,
Shengyu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-27 13:49         ` Iain Sandoe
@ 2023-02-27 14:01           ` Floyd, Paul
  0 siblings, 0 replies; 30+ messages in thread
From: Floyd, Paul @ 2023-02-27 14:01 UTC (permalink / raw)
  To: gcc

[-- Attachment #1: Type: text/plain, Size: 591 bytes --]


On 27/02/2023 14:49, Iain Sandoe wrote:
> Hi Shengyu,
>> By the way, is it expected that I need to configure with `--with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk`?
>>
>> Yes.
>>   - for some time now macOS no longer installs headers in /usr/include (and on newer versions most of the libraries are omitted from /usr/lib).

The libs are now squirreled away in the DSC (dylib shared cache). For 
our protection.

If you want to script the detection of sysroot you can use

xcrun --sdk macosx --show-sdk-path

(but that doesn't work with old versions of XCode).

A+

Pau

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-22 15:43     ` David Malcolm
@ 2023-02-28  9:18       ` Shengyu Huang
  2023-02-28 23:59         ` David Malcolm
  0 siblings, 1 reply; 30+ messages in thread
From: Shengyu Huang @ 2023-02-28  9:18 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Development

[-- Attachment #1: Type: text/plain, Size: 1258 bytes --]

Hi Dave,

Do you want me to follow the steps 7-10 (https://gcc-newbies-guide.readthedocs.io/en/latest/how-to-improve-the-location-of-a-diagnostic.html) or tell you where I add the code simply? Basically, I added

warning_at (DECL_SOURCE_LOCATION (node->decl), 0, "hello world, I’m compiling %qE", node->decl);

to the loop of cgraph_node inside impl_run_checkrs (logger *logger) of analyzer/engine.cc <http://engine.cc/>. 

(I also tried adding this code to cgraph_node::cgraphunit.cc <http://cgraphunit.cc/> in cgraphunit.cc <http://cgraphunit.cc/>, and then I found out the warning_at is different in that scope…but inform would work.)

Best,
Shengyu

P.S. Shall I continue put mailing list in my cc? Not sure the community wants to receive that many GSoC related emails.

> On 22 Feb 2023, at 16:43, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> Sorry, I was unclear; I was referring to this part of my guide:
> https://gcc-newbies-guide.readthedocs.io/en/latest/getting-started.html#hello-world-from-the-compiler
> 
> i.e. try writing a new warning that simply emits something like:
> 
> test.c:2:1: warning: hello world, I'm compiling 'main'
>    2 | int main ()
>      |     ^~~~
> 
> for each function that it sees.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Ideas for proposal
  2023-02-22 14:11   ` Shengyu Huang
  2023-02-22 14:53     ` Iain Sandoe
  2023-02-22 15:43     ` David Malcolm
@ 2023-02-28 14:46     ` Shengyu Huang
  2023-03-01  0:22       ` David Malcolm
  2 siblings, 1 reply; 30+ messages in thread
From: Shengyu Huang @ 2023-02-28 14:46 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Development

[-- Attachment #1: Type: text/plain, Size: 3138 bytes --]

Hi Dave,

> On 22 Feb 2023, at 15:11, Shengyu Huang <kumom.huang@gmail.com> wrote:
> 
>> But a better place to look would probably be in our bugzilla; see the
>> links on the wiki page:
>>  https://gcc.gnu.org/wiki/StaticAnalyzer 
>> The "open bugs" list currently has 41 "RFE" bugs ("request for
>> enhancement" i.e. ideas for new features), some of which might make
>> suitable GSoC ideas, and/or be of interest to you (ideally both!)
>> 
>> Also, the GSoC wiki page has some project ideas:
>>  https://gcc.gnu.org/wiki/SummerOfCode#Selected_Project_Ideas
>> 
> 
> Yeah I was also searching for interesting ideas on the bugzilla, and I will communicate to you once I have any more concrete ideas.

I spent some time searching through Bugzilla this weekend while familiarizing with the analyzer internals, and I found the following things interesting, and it’d be great if you can give me some preliminary feedback:

1. I am not sure why we added the class `shift_count_negative_diagnostic` in region-model.cc <http://region-model.cc/>, because there is a similar warning issued from c/c-typeck.cc <http://c-typeck.cc/>, and when I compiled with -fanalyzer that has the code `b = b << -1`, I got two warnings that mean the same thing. Maybe interestingly, when I compiled my test case with -O2, I got the warning from -Wshift-count-negative but not from -Wanalyzer-shift-count-negative. Would it be considered as a false negative for the analyzer? 

2. Something related to 1. is PR98447 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98447)

3. PR104955 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104955) still takes a long without -Wno-analyzer-double-free. I’d be interested in further investigating the problem (probably as you said sharing one feasible_graph can fix the problem).

4. What’s the most interesting to me are PR103533 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103533), PR104940 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104940) because I focus on formal methods in my university studies, and I’m currently looking into Dafny internals for my semester project.

5. PR105891 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105891) seems fitted to get started during the project phase, or be used as a warm-up before the official project phase.

6. PR106147 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106147) says you are implementing a prototype already, so I guess I’ll leave it out, but I am also quite interested in this analysis. At a glimpse I am not quite sure why infinite recursion and infinite loop should be treated differently (maybe it’ll become clearer to me once I am more familiar with the internals). In addition, a simple function that looks like this

void re (int c)
{
  if (c > 0)
    re (c + 1);
  else
    re (1);
}

can also be concluded as infinite recursion because there is no base case in all possible paths.

7. Other PRs that interest me: PR106006 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106006) and PR107017 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107017, already mentioned in the GSoC page).

Best,
Shengyu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-28  9:18       ` Shengyu Huang
@ 2023-02-28 23:59         ` David Malcolm
  2023-03-01 11:16           ` Shengyu Huang
  0 siblings, 1 reply; 30+ messages in thread
From: David Malcolm @ 2023-02-28 23:59 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: GCC Development

On Tue, 2023-02-28 at 10:18 +0100, Shengyu Huang wrote:
> Hi Dave,
> 
> Do you want me to follow the steps 7-10
> (https://gcc-newbies-guide.readthedocs.io/en/latest/how-to-improve-th
> e-location-of-a-diagnostic.html) or tell you where I add the code
> simply? Basically, I added
> 
> warning_at (DECL_SOURCE_LOCATION (node->decl), 0, "hello world, I’m
> compiling %qE", node->decl);
> 
> to the loop of cgraph_node inside impl_run_checkrs (logger *logger)
> of analyzer/engine.cc <http://engine.cc/>. 
> 
> (I also tried adding this code to cgraph_node::cgraphunit.cc
> <http://cgraphunit.cc/> in cgraphunit.cc <http://cgraphunit.cc/>, and
> then I found out the warning_at is different in that scope…but inform
> would work.)

Did you get it to output your messages?

The next thing to do might be to try stepping through the code in the
debugger; that's often a good way to learn about a new codebase.  See:
  https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html
and maybe have a look at the support scripts mentioned on that page.


BTW, are you building trunk, or GCC 12?  I've made a *lot* of changes
to the analyzer in trunk, so it would be good for you to be working
with something that's reasonably up-to-date.

> 
> Best,
> Shengyu
> 
> P.S. Shall I continue put mailing list in my cc? Not sure the
> community wants to receive that many GSoC related emails.

I'd prefer to keep the mailing list involved in the conversation, as
other new contributors to GCC might find the info useful, and other
existing GCC contributors might have input on some of the discussion.

Thanks
Dave

> 
> > On 22 Feb 2023, at 16:43, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > Sorry, I was unclear; I was referring to this part of my guide:
> > https://gcc-newbies-guide.readthedocs.io/en/latest/getting-started.html#hello-world-from-the-compiler
> > 
> > i.e. try writing a new warning that simply emits something like:
> > 
> > test.c:2:1: warning: hello world, I'm compiling 'main'
> >    2 | int main ()
> >      |     ^~~~
> > 
> > for each function that it sees.
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Ideas for proposal
  2023-02-28 14:46     ` [GSoC][Static Analyzer] Ideas for proposal Shengyu Huang
@ 2023-03-01  0:22       ` David Malcolm
  2023-03-12 22:20         ` Shengyu Huang
  0 siblings, 1 reply; 30+ messages in thread
From: David Malcolm @ 2023-03-01  0:22 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: GCC Development

On Tue, 2023-02-28 at 15:46 +0100, Shengyu Huang wrote:
> Hi Dave,
> 
> > On 22 Feb 2023, at 15:11, Shengyu Huang <kumom.huang@gmail.com>
> > wrote:
> > 
> > > But a better place to look would probably be in our bugzilla; see
> > > the
> > > links on the wiki page:
> > >  https://gcc.gnu.org/wiki/StaticAnalyzer 
> > > The "open bugs" list currently has 41 "RFE" bugs ("request for
> > > enhancement" i.e. ideas for new features), some of which might
> > > make
> > > suitable GSoC ideas, and/or be of interest to you (ideally both!)
> > > 
> > > Also, the GSoC wiki page has some project ideas:
> > >  https://gcc.gnu.org/wiki/SummerOfCode#Selected_Project_Ideas
> > > 
> > 
> > Yeah I was also searching for interesting ideas on the bugzilla,
> > and I will communicate to you once I have any more concrete ideas.
> 
> I spent some time searching through Bugzilla this weekend while
> familiarizing with the analyzer internals, and I found the following
> things interesting, and it’d be great if you can give me some
> preliminary feedback:

Great; thanks.

> 
> 1. I am not sure why we added the class
> `shift_count_negative_diagnostic` in region-model.cc
> <http://region-model.cc/>, because there is a similar warning issued
> from c/c-typeck.cc <http://c-typeck.cc/>, and when I compiled with -
> fanalyzer that has the code `b = b << -1`, I got two warnings that
> mean the same thing. 

The analyzer works based on execution paths, and so can potentially
find more things that a purely AST-based approach.  That said, I'm not
happy with the signal:noise ratio of that analyzer warning; maybe it
needs some tuning?

> Maybe interestingly, when I compiled my test case with -O2, I got the
> warning from -Wshift-count-negative but not from -Wanalyzer-shift-
> count-negative. Would it be considered as a false negative for the
> analyzer? 

The analyzer runs relatively late compared to most analyzers, on the
gimple-ssa representation after some optimizations have already run, so
that's probably why you didn't get the warning at -O2.

> 
> 2. Something related to 1. is PR98447
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98447)


> 
> 3. PR104955 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104955)
> still takes a long without -Wno-analyzer-double-free. I’d be
> interested in further investigating the problem (probably as you said
> sharing one feasible_graph can fix the problem).

That one involves deep surgery that could affect every analyzer
warning, so I'm worried that it could turn out to be too hard to do as
a first project on the analyzer.  Also, some warnings that have grown a
pending_diagnostic::check_valid_fpath_p vfunc since I wrote that
comment, which could complicate the implementation :-/

> 
> 4. What’s the most interesting to me are PR103533
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103533),

Turning on taint detection by default would be a great project.  It
would be good to run the integration tests:
  https://github.com/davidmalcolm/gcc-analyzer-integration-tests
to see if anything regresses, or if it adds noise - so this might be a
bit of an open-ended project, in that we'd want to fix whatever issues
show up there, as well as the known ones that are documented in that
bug.


>  PR104940 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104940)

I've actually been prototyping SMT solver support in a private branch,
but based on writing out SMT-LIB scripts and invoking z3 as a
subprocess, and parsing the results, which is much slower than it could
be, as I'm reevaluating all the constraints as each one is added,
rather than using an API and sharing/pushing/popping state.

Taking my SMT prototype and turning it into something usable could be a
good project, I think.  I'll look at posting what I've got somewhere
public if you're interested.

> because I focus on formal methods in my university studies, and I’m
> currently looking into Dafny internals for my semester project.
> 
> 5. PR105891 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105891)
> seems fitted to get started during the project phase, or be used as a
> warm-up before the official project phase.

Yeah, but probably would only count as part of full GSoC project.  Last
year Tim Lange did a highly successful GSoC project on the analyzer
which was a grab-bag of adding various new warnings, each of which
wasn't quite a full project.

Could be a good warmup; though I wonder what would show up when running
it on real world C examples (e.g. via my integration test suite).

> 
> 6. PR106147 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106147)
> says you are implementing a prototype already, so I guess I’ll leave
> it out, but I am also quite interested in this analysis. 

GCC 13 has the infinite recursion detection, but I didn't get the
infinite loop detection ready before feature freeze.

If you're interested, I can post my prototype of the latter if you or
someone else wants to take it up and finish it.

> At a glimpse I am not quite sure why infinite recursion and infinite
> loop should be treated differently (maybe it’ll become clearer to me
> once I am more familiar with the internals). In addition, a simple
> function that looks like this
> 
> void re (int c)
> {
>   if (c > 0)
>     re (c + 1);
>   else
>     re (1);
> }
> 
> can also be concluded as infinite recursion because there is no base
> case in all possible paths.

Bother, we don't complain about that yet; so fixing that could be part
of a GSoC project, and would be a good fit with the infinite loop idea.

FWIW I'm about to commit a fix for
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108935
(assuming it passes testing) so the inf-recursion warning is about to
change implementation slightly.

> 
> 7. Other PRs that interest me: PR106006
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106006
i.e.  RFE: analyzer should treat data from a socket as "tainted" 

This might work well with the "Turning on taint detection by default"
project (PR103533).


> and PR107017 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107017,
> already mentioned in the GSoC page).

i.e. "RFE: support printf-style formatted functions in -fanalyzer"

yeah, having someone work on this would be most welcome.


Hope this is helpful; I think there are quite a few viable GSoC
projects in the above.

Thanks
Dave


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-02-28 23:59         ` David Malcolm
@ 2023-03-01 11:16           ` Shengyu Huang
  2023-03-01 13:48             ` David Malcolm
  0 siblings, 1 reply; 30+ messages in thread
From: Shengyu Huang @ 2023-03-01 11:16 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Development

[-- Attachment #1: Type: text/plain, Size: 4232 bytes --]

Hi Dave,

> On 1 Mar 2023, at 00:59, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> Did you get it to output your messages?
> 


Yes, I chose to emit the warning before the supergraph or exploded graph is created (I guess this is enough, right?). I checked out from the trunk a week ago, and I checked out from the latest trunk just now and built from modified source again, by adding a line in the following code in analyzer/engine.cc:

FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node) {
  node->get_untransformed_body ();
  warning_at (DECL_SOURCE_LOCATION (node->decl), 0, "hello world, I’m compiling %qE", node->decl); // ADDED
}

Compiling my own test script without optimizations, I got the output (surprisingly no warning from -Wanalyzer-shift-count-negative anymore):

test.c: In function 'main':
test.c:42:9: warning: left shift count is negative [-Wshift-count-negative]
   42 |   b = b << -1;
      |         ^~
test.c: At top level:
test.c:36:5: warning: hello world, I'm compiling 'main'
   36 | int main()
      |     ^~~~
test.c:27:6: warning: hello world, I'm compiling 're'
   27 | void re (int c)
      |      ^~
test.c:12:6: warning: hello world, I'm compiling 'f'
   12 | void f (unsigned long *p, int r, int i)
      |      ^
test.c:9:5: warning: hello world, I'm compiling 'fun2'
    9 | int fun2()
      |     ^~~~
test.c:4:5: warning: hello world, I'm compiling 'fun1'
    4 | int fun1()
      |     ^~~~
test.c: In function 'main':
test.c:40:8: warning: use of uninitialized value 'a' [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
   40 |   int* c = a;
      |        ^
  'main': events 1-3
    |
    |   38 |   int* a;
    |      |        ^
    |      |        |
    |      |        (1) region created on stack here
    |      |        (2) capacity: 8 bytes
    |   39 |   int b = 'c';
    |   40 |   int* c = a;
    |      |        ~
    |      |        |
    |      |        (3) use of uninitialized value 'a' here
    |
~~

If I compiled it with -O2, I got additionally 

test.c: In function 'f':
test.c:20:34: warning: shift by count ('64') >= precision of type ('64') [-Wanalyzer-shift-count-overflow]
   20 |       p[i--] = b + 1 >= 64 ? 0UL : 1UL << (b + 1);
      |                ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
  'f': events 1-5
    |
    |   16 |   while (i >= 0)
    |      |          ~~^~~~
    |      |            |
    |      |            (1) following 'true' branch (when 'i >= 0')...
    |   17 |   {
    |   18 |     if (n > b)
    |      |        ~    
    |      |        |
    |      |        (2) ...to here
    |      |        (3) following 'true' branch (when 'b < n')...
    |   19 |     {
    |   20 |       p[i--] = b + 1 >= 64 ? 0UL : 1UL << (b + 1);
    |      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                                  |           |
    |      |                                  |           (4) ...to here
    |      |                                  (5) shift by count '64' here
    |


which is documented as a false positive in PR98447.


> 
> The next thing to do might be to try stepping through the code in the
> debugger; that's often a good way to learn about a new codebase.  See:
>  https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html
> and maybe have a look at the support scripts mentioned on that page.
> 

I did try to use gdb more to inspect the internals, but one thing I noticed when using it is that I got `??()` in the backtrace, which I’ve never seen before. Some online sources say it happened due to “corrupted stack”, but I don’t know how that can happen either…However, after pulling changes from the trunk and rebuilding from the source, “??()” disappeared and now I can step through the execution without any problem (previously `step` and `continue` did not work as expected…). Do you have any clues what happened so that I can fix it myself later if that happens again?

Best,
Shengyu

> BTW, are you building trunk, or GCC 12?  I've made a *lot* of changes
> to the analyzer in trunk, so it would be good for you to be working
> with something that's reasonably up-to-date.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Some questions and request for a small patch to work on
  2023-03-01 11:16           ` Shengyu Huang
@ 2023-03-01 13:48             ` David Malcolm
  0 siblings, 0 replies; 30+ messages in thread
From: David Malcolm @ 2023-03-01 13:48 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: GCC Development

On Wed, 2023-03-01 at 12:16 +0100, Shengyu Huang wrote:
> Hi Dave,
> 
> > On 1 Mar 2023, at 00:59, David Malcolm <dmalcolm@redhat.com> wrote:
> > 
> > Did you get it to output your messages?
> > 
> 
> 
> Yes, I chose to emit the warning before the supergraph or exploded
> graph is created (I guess this is enough, right?). I checked out from
> the trunk a week ago, and I checked out from the latest trunk just
> now and built from modified source again, by adding a line in the
> following code in analyzer/engine.cc:
> 
> FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node) {
>   node->get_untransformed_body ();
>   warning_at (DECL_SOURCE_LOCATION (node->decl), 0, "hello world, I’m
> compiling %qE", node->decl); // ADDED
> }
> 
> Compiling my own test script without optimizations, I got the output
> (surprisingly no warning from -Wanalyzer-shift-count-negative
> anymore):
> 
> test.c: In function 'main':
> test.c:42:9: warning: left shift count is negative [-Wshift-count-
> negative]
>    42 |   b = b << -1;
>       |         ^~
> test.c: At top level:
> test.c:36:5: warning: hello world, I'm compiling 'main'
>    36 | int main()
>       |     ^~~~
> test.c:27:6: warning: hello world, I'm compiling 're'
>    27 | void re (int c)
>       |      ^~
> test.c:12:6: warning: hello world, I'm compiling 'f'
>    12 | void f (unsigned long *p, int r, int i)
>       |      ^
> test.c:9:5: warning: hello world, I'm compiling 'fun2'
>     9 | int fun2()
>       |     ^~~~
> test.c:4:5: warning: hello world, I'm compiling 'fun1'
>     4 | int fun1()
>       |     ^~~~

Looks great.

[...snip...]

> 
> 
> 
> > 
> > The next thing to do might be to try stepping through the code in
> > the
> > debugger; that's often a good way to learn about a new codebase. 
> > See:
> >  https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html
> > and maybe have a look at the support scripts mentioned on that
> > page.
> > 
> 
> I did try to use gdb more to inspect the internals, but one thing I
> noticed when using it is that I got `??()` in the backtrace, which
> I’ve never seen before. Some online sources say it happened due to
> “corrupted stack”, but I don’t know how that can happen
> either…However, after pulling changes from the trunk and rebuilding
> from the source, “??()” disappeared and now I can step through the
> execution without any problem (previously `step` and `continue` did
> not work as expected…). Do you have any clues what happened so that I
> can fix it myself later if that happens again?

I've noticed that if I invoke "make" in the top-level build directory,
when it recurses into the "gcc" subdirectory it builds with -O2,
whereas if I invoke "make" directly in the build tree's gcc
subdirectory it builds without optimization, and I get a much better
debugging experience.  Maybe that's what happened?  Because of this, I
sometimes find I have to do "rm analyzer/*.o" in the "gcc" subdirectory
of the build, and rerun "make" in there.  Though even with -O2 my gdb
still gives me function names in the backtrace, albeit on x86_64, and
probably a different version of gdb...

> 
> Best,
> Shengyu
> 
> > BTW, are you building trunk, or GCC 12?  I've made a *lot* of
> > changes
> > to the analyzer in trunk, so it would be good for you to be working
> > with something that's reasonably up-to-date.

Sounds like you're pulling from trunk; good.

Thanks
Dave


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Ideas for proposal
  2023-03-01  0:22       ` David Malcolm
@ 2023-03-12 22:20         ` Shengyu Huang
  2023-03-13 15:51           ` David Malcolm
  0 siblings, 1 reply; 30+ messages in thread
From: Shengyu Huang @ 2023-03-12 22:20 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Development

[-- Attachment #1: Type: text/plain, Size: 2603 bytes --]

Hi Dave,

>> 
>> 4. What’s the most interesting to me are PR103533
>> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103533),
> 
> Turning on taint detection by default would be a great project.  It
> would be good to run the integration tests:
>  https://github.com/davidmalcolm/gcc-analyzer-integration-tests
> to see if anything regresses, or if it adds noise - so this might be a
> bit of an open-ended project, in that we'd want to fix whatever issues
> show up there, as well as the known ones that are documented in that
> bug.
> 

Sorry for replying to you late due to another project from my university. 

Since most other ideas are being worked on by you or not big enough to make a GSoC project, I decided to take on this project and have been getting familiar with the analyzer this weekend. I want to sort several things out before writing the proposal.

1. What should I do with the integration tests?

2. I ran gcc -fanalyzer -fanalyzer-checker=taint ./gcc-src/gcc/testsuite/gcc.dg/analyzer/pr93032-mztools-signed-char.c , but I got different results from what you documented in PR103533:

/usr/bin/ld: /lib/x86_64-linux-gnu/crt1.o: in function `_start':
(.text+0x17): undefined reference to `main'
collect2: error: ld returned 1 exit status

3. What does “ICE” mean when you said “ICE in alt_get_inherited_state in abs-1.c, …”?

4. For the following program, nothing is reported with the taint mode turned on. But there is -Wanalyzer-tained-divisor, is it expected?

__attribute__((tainted_args))
int fun0(int a)
{ return a; }

int main()
{
  int b = 3 / fun0(0);
  return b;
}

5. I guess the project would mostly modify constraint-manager.h and sm-taint.cc <http://sm-taint.cc/>. Or are there other files that you suspect relevant for this project?

6. Is the current implementation based on some papers? I found this (https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf) and this (https://www.ndss-symposium.org/wp-content/uploads/2017/09/Dynamic-Taint-Analysis-for-Automatic-Detection-Analysis-and-SignatureGeneration-of-Exploits-on-Commodity-Software-Dawn-Song.pdf), but haven’t started reading yet. In addition, purging states of the constraint manager sounds like a problem other people may have looked at. Is there any related progress since you documented in PR103533?

As you said, this would be an open-ended project, so it would be very helpful to get some feedback from you so that I know how to draft my proposal. In addition, is it ok to deviate from the proposal after I start working? 

Best,
Shengyu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] Ideas for proposal
  2023-03-12 22:20         ` Shengyu Huang
@ 2023-03-13 15:51           ` David Malcolm
  2023-03-20 17:28             ` [GSoC][Static Analyzer] First proposal draft and a few more questions/requests Shengyu Huang
  0 siblings, 1 reply; 30+ messages in thread
From: David Malcolm @ 2023-03-13 15:51 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: GCC Development

On Sun, 2023-03-12 at 23:20 +0100, Shengyu Huang wrote:
> Hi Dave,
> 
> > > 
> > > 4. What’s the most interesting to me are PR103533
> > > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103533),
> > 
> > Turning on taint detection by default would be a great project.  It
> > would be good to run the integration tests:
> >  https://github.com/davidmalcolm/gcc-analyzer-integration-tests
> > to see if anything regresses, or if it adds noise - so this might
> > be a
> > bit of an open-ended project, in that we'd want to fix whatever
> > issues
> > show up there, as well as the known ones that are documented in
> > that
> > bug.
> > 
> 
> Sorry for replying to you late due to another project from my
> university. 
> 
> Since most other ideas are being worked on by you or not big enough
> to make a GSoC project, I decided to take on this project and have
> been getting familiar with the analyzer this weekend. 

Excellent; thanks.

> I want to sort several things out before writing the proposal.
> 
> 1. What should I do with the integration tests?

First of all, AFAIK I'm the only person who's tried running the
integration tests.  They're the test scripts I wrote to help me
validate my own patches, so there will be rough edges; please let me
know as you run into them, so I can fix/document them.

I have scripts that run the integration test's test.py, passing in the
path to the built gcc and a "run directory" where the builds happen; I
do this for a "control" build of gcc, and for an "experiment" build
that has a patch (each with their own run directory).  This script
attempts to use that gcc to build the various projects, capturing the
diagnostics as lots of little .sarif files in the build dir.

One of these run directories takes about 17G of drive space, and takes
about an hour for me on a fast machine I have (64 cores).  We'll
probably need to get you set up with an account on the gcc compile
farm, which has lots of powerful machines that you can ssh into, unless
your university has something powerful you can use (with plenty of
cores, RAM, and free disk space, e.g. at least 60G of disk)

I then have a script that runs compare-by-warning.py, passing in the
paths to the two run dirs; this recurses through the two rundirs,
loading the .sarif files, and attempts to compare the before vs after
diagnostics.  I've attempted to classify the results I've seen via the
known-issues/*.txt files, so that the comparison has some knowledge
about whether the changes we've seen are e.g.:
- new false positives vs 
- new true positives vs
- false positives going away vs 
- true positives going away 
(etc)

That said, the "Juliet" results are currently rather unwieldy (many
more results than for the other projects, and 9.1G of the 17G by
space), so I tend to move them out of the way before doing the
comparison.

> 
> 2. I ran gcc -fanalyzer -fanalyzer-checker=taint ./gcc-
> src/gcc/testsuite/gcc.dg/analyzer/pr93032-mztools-signed-char.c , but
> I got different results from what you documented in PR103533:
> 
> /usr/bin/ld: /lib/x86_64-linux-gnu/crt1.o: in function `_start':
> (.text+0x17): undefined reference to `main'
> collect2: error: ld returned 1 exit status

gcc's default is to try to compile, assemble, and link into an
executable.  This testcase doesn't have a "main" function, hence the
linker complains.  If you pass "-S", it will merely compile the .c to a
.s assembler file whilst still running the analyzer.

In terms of actually running the test suite via DejaGnu, see:
https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html

I typically use:

  make -k -jN \
  && time make check-gcc \
         RUNTESTFLAGS="-v -v --target_board=unix\{-m32,-m64\} analyzer-torture.exp=*.c analyzer.exp=*.c"

when testing the analyzer regression test suite, where N is the number
of cores on my box

When I run an individual testcase, I do something like:

./xgcc -B. -S -fanalyzer ../../src/PATH_TO_TEST_CASE

in the "gcc" subdirectory of the build directory.


> 
> 3. What does “ICE” mean when you said “ICE in alt_get_inherited_state
> in abs-1.c, …”?

ICE is our jargon for "internal compiler error" i.e. a crash of gcc
itself.

> 
> 4. For the following program, nothing is reported with the taint mode
> turned on. But there is -Wanalyzer-tained-divisor, is it expected?
> 
> __attribute__((tainted_args))
> int fun0(int a)
> { return a; }
> 
> int main()
> {
>   int b = 3 / fun0(0);
>   return b;
> }

Yes: in that the 0 came from the source of the program, rather than
from an attacker, so it's not tainted.  The analyzer doesn't have a
good way to attach state-machine state to a constant, only to other
kinds of symbolic value.

See
  gcc/testsuite/gcc.dg/analyzer/taint-divisor-1.c 
  gcc/testsuite/gcc.dg/plugin/taint-antipatterns-1.c
for examples that ought to report tainted divisors (the former from
"fread", the latter from "copy_from_user" via a plugin)


> 
> 5. I guess the project would mostly modify constraint-manager.h and
> sm-taint.cc <http://sm-taint.cc/>. Or are there other files that you
> suspect relevant for this project?

I think region-model.{cc,h} is likely to be very relevant here, and
possibly program-state.{cc,h}; I think one of the challenges will be to
see to what extent when we enable the taint state machine by default it
bloats the program states (much of which is handled in class
region_model) to the point where the exploded_graph gets much bigger,
and we lose coverage compared to what we had before.  I think we're
going to need to improve state purging so that e.g. if there's a buffer
containing tainted data that only gets used in one part of the function
that we can stop bothering to track its taintedness after it becomes
relevant.

I suspect the project may be rather open-ended, in that it's a case of
turning the feature on, trying it on real-world C projects (as well as
just the regression testsuite), and seeing:
- to what extent it's useful, and 
- to what extent it's spamming the user, and
- what breaks

and fixing the issues you encounter up to the point where it's
reasonable to enable the feature for GCC 14 (hopefully).

> 
> 6. Is the current implementation based on some papers?

I confess I haven't read much in this space; I'm looking forward to
reading the papers you linked to


>  I found this
> (https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf) and this
> (https://www.ndss-symposium.org/wp-content/uploads/2017/09/Dynamic-Ta
> int-Analysis-for-Automatic-Detection-Analysis-and-
> SignatureGeneration-of-Exploits-on-Commodity-Software-Dawn-Song.pdf),
> but haven’t started reading yet. In addition, purging states of the
> constraint manager sounds like a problem other people may have looked
> at. Is there any related progress since you documented in PR103533?
> 
> As you said, this would be an open-ended project, so it would be very
> helpful to get some feedback from you so that I know how to draft my
> proposal. 

(nods)

> In addition, is it ok to deviate from the proposal after I start
> working? 

Yes: as noted above, much of the project would be to try turning it on
for real-world C code, seeing what breaks, and fixing that, so we can't
yet know what that will be.  Depending on how hard the issues are
"success" for the project could be "fixed all issues and enabled it in
trunk for GCC 14" vs "identified and wrote up a set of issues that need
resolving", or somewhere in between.

Hope this makes sense (and isn't too intimidating!)
Dave


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [GSoC][Static Analyzer] First proposal draft and a few more questions/requests
  2023-03-13 15:51           ` David Malcolm
@ 2023-03-20 17:28             ` Shengyu Huang
  2023-03-20 22:50               ` David Malcolm
  0 siblings, 1 reply; 30+ messages in thread
From: Shengyu Huang @ 2023-03-20 17:28 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Development

[-- Attachment #1: Type: text/plain, Size: 4697 bytes --]

Hi Dave,

Thanks for always getting back to me so promptly! I am drafting the proposal today. Here is the link:

https://docs.google.com/document/d/1MRI1R5DaX8kM6DaqRQsEri5Mx2FvHmWv13qe1W0Bj0g/

(The proposal was first written in markdown and then copied pasted to Google Docs, so some formatting may look ugly...)

In the timeline section, I mention your name twice where I expect your input can help me speed up the work. For example, the mentioned paper (https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf) has a section “performance” on page 12 that lists out several solutions to mitigate the exponential blow up in straightforward implementation of symbolic execution, but the current implementation may have some clever tricks already (e.g., purging the states?) that some of the solutions may not be applicable to us.

I can further polish this proposal based on your feedback. I may not be as responsive as you are because I have several deadlines from coursework every week.

>> 1. What should I do with the integration tests?
> 
> First of all, AFAIK I'm the only person who's tried running the
> integration tests.  They're the test scripts I wrote to help me
> validate my own patches, so there will be rough edges; please let me
> know as you run into them, so I can fix/document them.

You append the path “../sarifdump” in results.py, but this path is not in the repo. 

>> 2. I ran gcc -fanalyzer -fanalyzer-checker=taint ./gcc-
>> src/gcc/testsuite/gcc.dg/analyzer/pr93032-mztools-signed-char.c , but
>> I got different results from what you documented in PR103533:
>> 
>> /usr/bin/ld: /lib/x86_64-linux-gnu/crt1.o: in function `_start':
>> (.text+0x17): undefined reference to `main'
>> collect2: error: ld returned 1 exit status
> 
> gcc's default is to try to compile, assemble, and link into an
> executable.  This testcase doesn't have a "main" function, hence the
> linker complains.  If you pass "-S", it will merely compile the .c to a
> .s assembler file whilst still running the analyzer.
> 
> In terms of actually running the test suite via DejaGnu, see:
> https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html
> 
> I typically use:
> 
>  make -k -jN \
>  && time make check-gcc \
>         RUNTESTFLAGS="-v -v --target_board=unix\{-m32,-m64\} analyzer-torture.exp=*.c analyzer.exp=*.c"
> 
> when testing the analyzer regression test suite, where N is the number
> of cores on my box
> 
> When I run an individual testcase, I do something like:
> 
> ./xgcc -B. -S -fanalyzer ../../src/PATH_TO_TEST_CASE
> 
> in the "gcc" subdirectory of the build directory.

Yeah sorry for not taking a good look at the testcase before sending you this question…the tips were very helpful still, thanks a lot!

Under latest trunk, all the individual testcases documented in PR103533 compile with no error (no ICE or state explosion). I double checked that I did turn on -fanalyzer-checker=taint (although it is a bit annoying there is no error or warning when I mistyped it as -fanalyzer-checker=tai8nt). I also ran the test suite via DejaGNU, and there are only four unexpected failures (no unexpected successes) and some unsupported tests:

```
FAIL: gcc.dg/analyzer/file-CWE-1341-example.c (test for excess errors) 
FAIL: gcc.dg/analyzer/pipe-glibc.c (test for excess errors) 
FAIL: gcc.dg/analyzer/file-CWE-1341-example.c (test for excess errors) 
FAIL: gcc.dg/analyzer/pipe-glibc.c (test for excess errors)
```

(Why is the same file reported twice in the summary?)

These testcases are not relevant for taint analysis, but indeed when I turned on the taint mode other checkers are suppressed without any warnings (I guess this should be one of the goals if we don’t manage to turn on the taint mode by default in the end).

Does it mean there are no small testcases that will cause state explosion at the moment? It is a bit tricky for me to have an intuition for where the problem stems when I don’t have a concrete example to investigate…During the project, how often do you expect we need to run the integration tests? I guess we run it whenever we don’t have a small example to work at hand, and iteratively we use the integration test results to construct a minimal example to fix the next encountered issue?

By the way, I have applied for the compile farm account after the first email exchanges and I have been working on compile farm for a while now.


Best,
Shengyu

P.S. There is no more `pr93032-mztools.c` in the testsuit, and the two files `pr93032-mztools-{simplified, signed-char, unsigned-char}.c` do not incur state explosion.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests
  2023-03-20 17:28             ` [GSoC][Static Analyzer] First proposal draft and a few more questions/requests Shengyu Huang
@ 2023-03-20 22:50               ` David Malcolm
  2023-03-26 16:03                 ` Shengyu Huang
  0 siblings, 1 reply; 30+ messages in thread
From: David Malcolm @ 2023-03-20 22:50 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: GCC Development

[-- Attachment #1: Type: text/plain, Size: 7657 bytes --]

On Mon, 2023-03-20 at 18:28 +0100, Shengyu Huang wrote:
> Hi Dave,
> 
> Thanks for always getting back to me so promptly! I am drafting the
> proposal today. Here is the link:
> 
> https://docs.google.com/document/d/1MRI1R5DaX8kM6DaqRQsEri5Mx2FvHmWv13qe1W0Bj0g/
> 
> (The proposal was first written in markdown and then copied pasted to
> Google Docs, so some formatting may look ugly...)

Thanks.

> 
> In the timeline section, I mention your name twice where I expect
> your input can help me speed up the work. For example, the mentioned
> paper (https://users.ece.cmu.edu/~aavgerin/papers/Oakland10.pdf) has
> a section “performance” on page 12 that lists out several solutions
> to mitigate the exponential blow up in straightforward implementation
> of symbolic execution, but the current implementation may have some
> clever tricks already (e.g., purging the states?) that some of the
> solutions may not be applicable to us.

One note on the timeline: I'm planning to take a vacation from July
1st-16th, and will only have intermittent access to a computer during
that time, but hopefully will at least be able to respond to questions.

> 
> I can further polish this proposal based on your feedback. I may not
> be as responsive as you are because I have several deadlines from
> coursework every week.

I seem to have spent the whole day looking at bugzilla reports, so I
hope to look at your draft properly later in the week.

> 
> > > 1. What should I do with the integration tests?
> > 
> > First of all, AFAIK I'm the only person who's tried running the
> > integration tests.  They're the test scripts I wrote to help me
> > validate my own patches, so there will be rough edges; please let
> > me
> > know as you run into them, so I can fix/document them.
> 
> You append the path “../sarifdump” in results.py, but this path is
> not in the repo. 

Sorry about this; it's this repo:
  https://github.com/davidmalcolm/sarif-dump
which I'm simply checking out to a sister directory for now.

> 
> > > 2. I ran gcc -fanalyzer -fanalyzer-checker=taint ./gcc-
> > > src/gcc/testsuite/gcc.dg/analyzer/pr93032-mztools-signed-char.c ,
> > > but
> > > I got different results from what you documented in PR103533:
> > > 
> > > /usr/bin/ld: /lib/x86_64-linux-gnu/crt1.o: in function `_start':
> > > (.text+0x17): undefined reference to `main'
> > > collect2: error: ld returned 1 exit status
> > 
> > gcc's default is to try to compile, assemble, and link into an
> > executable.  This testcase doesn't have a "main" function, hence
> > the
> > linker complains.  If you pass "-S", it will merely compile the .c
> > to a
> > .s assembler file whilst still running the analyzer.
> > 
> > In terms of actually running the test suite via DejaGnu, see:
> > https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html
> > 
> > I typically use:
> > 
> >  make -k -jN \
> >  && time make check-gcc \
> >         RUNTESTFLAGS="-v -v --target_board=unix\{-m32,-m64\}
> > analyzer-torture.exp=*.c analyzer.exp=*.c"
> > 
> > when testing the analyzer regression test suite, where N is the
> > number
> > of cores on my box
> > 
> > When I run an individual testcase, I do something like:
> > 
> > ./xgcc -B. -S -fanalyzer ../../src/PATH_TO_TEST_CASE
> > 
> > in the "gcc" subdirectory of the build directory.
> 
> Yeah sorry for not taking a good look at the testcase before sending
> you this question…the tips were very helpful still, thanks a lot!
> 
> Under latest trunk, all the individual testcases documented in
> PR103533 compile with no error (no ICE or state explosion). I double
> checked that I did turn on -fanalyzer-checker=taint (although it is a
> bit annoying there is no error or warning when I mistyped it as -
> fanalyzer-checker=tai8nt). 
> bit annoying there is no error or warning when I mistyped it as -
> fanalyzer-checker=tai8nt). 

Oops, sorry about that; good catch.  I've filed that as:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109220



> I also ran the test suite via DejaGNU, and there are only four
> unexpected failures (no unexpected successes) and some unsupported
> tests:
> 
> ```
> FAIL: gcc.dg/analyzer/file-CWE-1341-example.c (test for excess
> errors) 
> FAIL: gcc.dg/analyzer/pipe-glibc.c (test for excess errors) 
> FAIL: gcc.dg/analyzer/file-CWE-1341-example.c (test for excess
> errors) 
> FAIL: gcc.dg/analyzer/pipe-glibc.c (test for excess errors)
> ```
> 
> (Why is the same file reported twice in the summary?)

If you're running with -m32,-m64 as above, it's running all the tests
twice: once for 32-bit, and again for 64-bit x86 targets.

> 
> These testcases are not relevant for taint analysis, but indeed when
> I turned on the taint mode other checkers are suppressed without any
> warnings (I guess this should be one of the goals if we don’t manage
> to turn on the taint mode by default in the end).

Yeah,  -fanalyzer-checker=NAME means "just run state machine NAME".

I've been testing the "run with taint-checking by default" via:

diff --git a/gcc/analyzer/sm.cc b/gcc/analyzer/sm.cc
index 1f329cb11d0..31bcf6e2f8e 100644
--- a/gcc/analyzer/sm.cc
+++ b/gcc/analyzer/sm.cc
@@ -188,10 +188,14 @@ make_checkers (auto_delete_vec <state_machine> &out, logger *logger)
   out.safe_push (make_malloc_state_machine (logger));
   out.safe_push (make_fileptr_state_machine (logger));
   out.safe_push (make_fd_state_machine (logger));
+#if 1
+  out.safe_push (make_taint_state_machine (logger));
+#else
   /* The "taint" checker must be explicitly enabled (as it currently
      leads to state explosions that stop the other checkers working).  */
   if (flag_analyzer_checker)
     out.safe_push (make_taint_state_machine (logger));
+#endif
   out.safe_push (make_sensitive_state_machine (logger));
   out.safe_push (make_signal_state_machine (logger));
   out.safe_push (make_va_list_state_machine (logger));

i.e. add the taint machine to the bank of state machines already being
tracked, so that there's that many more opportunities for differences
to occur between state nodes, and thus more risk of "blow up" of the
size of the graph being explored.

I'm attaching a work-in-progress patch that I wrote for this, that does
the above, plus some other related things, but be wary that it's from
an old working copy on my hard drive: it's at least a year out of date,
judging by the copyright dates :-/

> 
> Does it mean there are no small testcases that will cause state
> explosion at the moment? It is a bit tricky for me to have an
> intuition for where the problem stems when I don’t have a concrete
> example to investigate…During the project, how often do you expect we
> need to run the integration tests? I guess we run it whenever we
> don’t have a small example to work at hand, and iteratively we use
> the integration test results to construct a minimal example to fix
> the next encountered issue?

I think if you try the patch to sm.cc above, then you'll see
various existing DejaGnu tests below gcc.dg/analyzer will fail with
state explosions.



> 
> By the way, I have applied for the compile farm account after the
> first email exchanges and I have been working on compile farm for a
> while now.

Excellent.

Dave

> 
> 
> Best,
> Shengyu
> 
> P.S. There is no more `pr93032-mztools.c` in the testsuit, and the
> two files `pr93032-mztools-{simplified, signed-char, unsigned-
> char}.c` do not incur state explosion.
> 
> 


[-- Attachment #2: 0001-FIXME-WIP-on-enabling-taint-state-machine-by-default.patch --]
[-- Type: text/x-patch, Size: 31068 bytes --]

From ab923f680bdfabc5d66d05d078752367cac0a6f8 Mon Sep 17 00:00:00 2001
From: David Malcolm <dmalcolm@redhat.com>
Date: Thu, 2 Dec 2021 14:40:34 -0500
Subject: [PATCH] FIXME: WIP on enabling taint state machine by default (PR
 analyzer/103533)

---
 gcc/analyzer/sm-taint.cc                      |  2 +-
 gcc/analyzer/sm.cc                            |  4 ++
 gcc/doc/invoke.texi                           | 63 ++++---------------
 .../gcc.dg/analyzer/attr-tainted_args-1.c     |  3 -
 gcc/testsuite/gcc.dg/analyzer/fread-1.c       |  2 -
 gcc/testsuite/gcc.dg/analyzer/pr104029.c      |  3 -
 gcc/testsuite/gcc.dg/analyzer/pr93382.c       |  2 -
 .../gcc.dg/analyzer/taint-CVE-2011-2210-1.c   |  3 -
 .../gcc.dg/analyzer/taint-CVE-2020-13143-1.c  |  3 -
 .../gcc.dg/analyzer/taint-CVE-2020-13143-2.c  |  3 -
 .../gcc.dg/analyzer/taint-CVE-2020-13143.h    |  3 -
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c |  2 -
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c |  3 -
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c |  3 -
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c |  3 -
 gcc/testsuite/gcc.dg/analyzer/taint-alloc-5.c |  3 -
 .../gcc.dg/analyzer/taint-assert-BUG_ON.c     |  3 -
 .../analyzer/taint-assert-macro-expansion.c   |  3 -
 .../analyzer/taint-assert-system-header.c     |  3 -
 gcc/testsuite/gcc.dg/analyzer/taint-assert.c  |  3 -
 .../gcc.dg/analyzer/taint-divisor-1.c         |  3 -
 gcc/testsuite/gcc.dg/analyzer/taint-merger.c  |  3 -
 gcc/testsuite/gcc.dg/analyzer/taint-ops.c     |  1 +
 .../gcc.dg/analyzer/taint-read-index-1.c      |  3 -
 .../gcc.dg/analyzer/taint-read-offset-1.c     |  3 -
 gcc/testsuite/gcc.dg/analyzer/taint-realloc.c |  3 -
 gcc/testsuite/gcc.dg/analyzer/taint-size-1.c  |  3 -
 .../analyzer/taint-size-access-attr-1.c       |  3 +-
 .../gcc.dg/analyzer/taint-write-index-1.c     |  3 -
 .../gcc.dg/analyzer/taint-write-offset-1.c    |  3 -
 .../analyzer/torture/taint-read-index-2.c     |  2 -
 .../analyzer/torture/taint-read-index-3.c     |  2 -
 .../plugin/taint-CVE-2011-0521-1-fixed.c      |  2 -
 .../gcc.dg/plugin/taint-CVE-2011-0521-1.c     |  2 -
 .../plugin/taint-CVE-2011-0521-2-fixed.c      |  5 --
 .../gcc.dg/plugin/taint-CVE-2011-0521-2.c     |  2 -
 .../plugin/taint-CVE-2011-0521-3-fixed.c      |  5 --
 .../gcc.dg/plugin/taint-CVE-2011-0521-3.c     |  2 -
 .../gcc.dg/plugin/taint-CVE-2011-0521-4.c     |  4 +-
 .../plugin/taint-CVE-2011-0521-5-fixed.c      |  3 +-
 .../gcc.dg/plugin/taint-CVE-2011-0521-5.c     |  3 +-
 .../gcc.dg/plugin/taint-CVE-2011-0521-6.c     |  3 +-
 .../gcc.dg/plugin/taint-antipatterns-1.c      |  3 +-
 43 files changed, 24 insertions(+), 159 deletions(-)

diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc
index a2b442a4ef2..d69b05fa020 100644
--- a/gcc/analyzer/sm-taint.cc
+++ b/gcc/analyzer/sm-taint.cc
@@ -1,4 +1,4 @@
-/* An experimental state machine, for tracking "taint": unsanitized uses
+/* A state machine for tracking "taint": unsanitized uses
    of data potentially under an attacker's control.
 
    Copyright (C) 2019-2022 Free Software Foundation, Inc.
diff --git a/gcc/analyzer/sm.cc b/gcc/analyzer/sm.cc
index 1f329cb11d0..31bcf6e2f8e 100644
--- a/gcc/analyzer/sm.cc
+++ b/gcc/analyzer/sm.cc
@@ -188,10 +188,14 @@ make_checkers (auto_delete_vec <state_machine> &out, logger *logger)
   out.safe_push (make_malloc_state_machine (logger));
   out.safe_push (make_fileptr_state_machine (logger));
   out.safe_push (make_fd_state_machine (logger));
+#if 1
+  out.safe_push (make_taint_state_machine (logger));
+#else
   /* The "taint" checker must be explicitly enabled (as it currently
      leads to state explosions that stop the other checkers working).  */
   if (flag_analyzer_checker)
     out.safe_push (make_taint_state_machine (logger));
+#endif
   out.safe_push (make_sensitive_state_machine (logger));
   out.safe_push (make_signal_state_machine (logger));
   out.safe_push (make_va_list_state_machine (logger));
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 80d942917bd..a76e38c1c4f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10093,6 +10093,12 @@ Enabling this option effectively enables the following warnings:
 -Wanalyzer-shift-count-negative @gol
 -Wanalyzer-shift-count-overflow @gol
 -Wanalyzer-stale-setjmp-buffer @gol
+-Wanalyzer-tainted-allocation-size @gol
+-Wanalyzer-tainted-array-index @gol
+-Wanalyzer-tainted-assertion @gol
+-Wanalyzer-tainted-divisor @gol
+-Wanalyzer-tainted-offset @gol
+-Wanalyzer-tainted-size @gol
 -Wanalyzer-unsafe-call-within-signal-handler @gol
 -Wanalyzer-use-after-free @gol
 -Wanalyzer-use-of-pointer-in-stale-stack-frame @gol
@@ -10104,13 +10110,6 @@ Enabling this option effectively enables the following warnings:
 -Wanalyzer-write-to-const @gol
 -Wanalyzer-write-to-string-literal @gol
 }
-@ignore
--Wanalyzer-tainted-allocation-size @gol
--Wanalyzer-tainted-array-index @gol
--Wanalyzer-tainted-divisor @gol
--Wanalyzer-tainted-offset @gol
--Wanalyzer-tainted-size @gol
-@end ignore
 
 This option is only available if GCC was configured with analyzer
 support enabled.
@@ -10532,8 +10531,7 @@ no longer exists, and likely lead to a crash (or worse).
 @item -Wno-analyzer-tainted-allocation-size
 @opindex Wanalyzer-tainted-allocation-size
 @opindex Wno-analyzer-tainted-allocation-size
-This warning requires both @option{-fanalyzer} and
-@option{-fanalyzer-checker=taint} to enable it;
+This warning requires @option{-fanalyzer} which enables it;
 use @option{-Wno-analyzer-tainted-allocation-size} to disable it.
 
 This diagnostic warns for paths through the code in which a value
@@ -10548,8 +10546,7 @@ See @uref{https://cwe.mitre.org/data/definitions/789.html, CWE-789: Memory Alloc
 @opindex Wanalyzer-tainted-assertion
 @opindex Wno-analyzer-tainted-assertion
 
-This warning requires both @option{-fanalyzer} and
-@option{-fanalyzer-checker=taint} to enable it;
+This warning requires @option{-fanalyzer} which enables it;
 use @option{-Wno-analyzer-tainted-assertion} to disable it.
 
 This diagnostic warns for paths through the code in which a value
@@ -10610,8 +10607,7 @@ despite the above not being an assertion failure, strictly speaking.
 @item -Wno-analyzer-tainted-array-index
 @opindex Wanalyzer-tainted-array-index
 @opindex Wno-analyzer-tainted-array-index
-This warning requires both @option{-fanalyzer} and
-@option{-fanalyzer-checker=taint} to enable it;
+This warning requires @option{-fanalyzer} which enables it;
 use @option{-Wno-analyzer-tainted-array-index} to disable it.
 
 This diagnostic warns for paths through the code in which a value
@@ -10624,8 +10620,7 @@ See @uref{https://cwe.mitre.org/data/definitions/129.html, CWE-129: Improper Val
 @item -Wno-analyzer-tainted-divisor
 @opindex Wanalyzer-tainted-divisor
 @opindex Wno-analyzer-tainted-divisor
-This warning requires both @option{-fanalyzer} and
-@option{-fanalyzer-checker=taint} to enable it;
+This warning requires @option{-fanalyzer} which enables it;
 use @option{-Wno-analyzer-tainted-divisor} to disable it.
 
 This diagnostic warns for paths through the code in which a value
@@ -10638,8 +10633,7 @@ See @uref{https://cwe.mitre.org/data/definitions/369.html, CWE-369: Divide By Ze
 @item -Wno-analyzer-tainted-offset
 @opindex Wanalyzer-tainted-offset
 @opindex Wno-analyzer-tainted-offset
-This warning requires both @option{-fanalyzer} and
-@option{-fanalyzer-checker=taint} to enable it;
+This warning requires @option{-fanalyzer} which enables it;
 use @option{-Wno-analyzer-tainted-offset} to disable it.
 
 This diagnostic warns for paths through the code in which a value
@@ -10652,8 +10646,7 @@ See @uref{https://cwe.mitre.org/data/definitions/823.html, CWE-823: Use of Out-o
 @item -Wno-analyzer-tainted-size
 @opindex Wanalyzer-tainted-size
 @opindex Wno-analyzer-tainted-size
-This warning requires both @option{-fanalyzer} and
-@option{-fanalyzer-checker=taint} to enable it;
+This warning requires @option{-fanalyzer} which enables it;
 use @option{-Wno-analyzer-tainted-size} to disable it.
 
 This diagnostic warns for paths through the code in which a value
@@ -10897,38 +10890,6 @@ call site, and that are sufficiently complicated (as per
 @opindex fanalyzer-checker
 Restrict the analyzer to run just the named checker, and enable it.
 
-Some checkers are disabled by default (even with @option{-fanalyzer}),
-such as the @code{taint} checker that implements
-@option{-Wanalyzer-tainted-array-index}, and this option is required
-to enable them.
-
-@emph{Note:} currently, @option{-fanalyzer-checker=taint} disables the
-following warnings from @option{-fanalyzer}:
-
-@gccoptlist{ @gol
--Wanalyzer-deref-before-check @gol
--Wanalyzer-double-fclose @gol
--Wanalyzer-double-free @gol
--Wanalyzer-exposure-through-output-file @gol
--Wanalyzer-fd-access-mode-mismatch @gol
--Wanalyzer-fd-double-close @gol
--Wanalyzer-fd-leak @gol
--Wanalyzer-fd-use-after-close @gol
--Wanalyzer-fd-use-without-check @gol
--Wanalyzer-file-leak @gol
--Wanalyzer-free-of-non-heap @gol
--Wanalyzer-malloc-leak @gol
--Wanalyzer-mismatching-deallocation @gol
--Wanalyzer-null-argument @gol
--Wanalyzer-null-dereference @gol
--Wanalyzer-possible-null-argument @gol
--Wanalyzer-possible-null-dereference @gol
--Wanalyzer-unsafe-call-within-signal-handler @gol
--Wanalyzer-use-after-free @gol
--Wanalyzer-va-list-leak @gol
--Wanalyzer-va-list-use-after-va-end @gol
-}
-
 @item -fno-analyzer-feasibility
 @opindex fanalyzer-feasibility
 @opindex fno-analyzer-feasibility
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-1.c b/gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-1.c
index e1d87c9cece..aaae0276deb 100644
--- a/gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-tainted_args-1.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include "analyzer-decls.h"
 
 struct arg_buf
diff --git a/gcc/testsuite/gcc.dg/analyzer/fread-1.c b/gcc/testsuite/gcc.dg/analyzer/fread-1.c
index 593cb7f4aa0..467467ee8a6 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fread-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fread-1.c
@@ -1,5 +1,3 @@
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 typedef __SIZE_TYPE__ size_t;
 
 extern size_t fread (void *, size_t, size_t, void *);
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr104029.c b/gcc/testsuite/gcc.dg/analyzer/pr104029.c
index adf15ed356f..46f1041939d 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr104029.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr104029.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 typedef __SIZE_TYPE__ size_t;
 typedef const void *t_comptype;
 typedef int (*t_compfunc)(t_comptype, t_comptype);
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr93382.c b/gcc/testsuite/gcc.dg/analyzer/pr93382.c
index 1e6612ddc05..91eab2192ad 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr93382.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr93382.c
@@ -1,5 +1,3 @@
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 typedef __SIZE_TYPE__ size_t;
 
 int idx;
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
index b44be993568..fa89bda6f0f 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2011-2210-1.c
@@ -7,9 +7,6 @@
    Fixed in 3d0475119d8722798db5e88f26493f6547a4bb5b on linux-2.6.39.y
    in linux-stable.  */
 
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include "analyzer-decls.h"
 #include "test-uaccess.h"
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
index 328c5799145..1b81c1bb1f8 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-1.c
@@ -1,9 +1,6 @@
 /* See notes in this header.  */
 #include "taint-CVE-2020-13143.h"
 
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 struct configfs_attribute {
 	/* [...snip...] */
 	ssize_t (*store)(struct config_item *, const char *, size_t) /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute' is marked with '__attribute__\\(\\(tainted_args\\)\\)'" } */
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
index c74a460b01e..f53e42bd6aa 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143-2.c
@@ -1,9 +1,6 @@
 /* See notes in this header.  */
 #include "taint-CVE-2020-13143.h"
 
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 struct configfs_attribute {
 	/* [...snip...] */
 	ssize_t (*store)(struct config_item *, const char *, size_t) /* { dg-message "\\(1\\) field 'store' of 'struct configfs_attribute' is marked with '__attribute__\\(\\(tainted_args\\)\\)'" } */
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
index 0ba023539af..93f90d49013 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-CVE-2020-13143.h
@@ -8,9 +8,6 @@
    Fixed by 15753588bcd4bbffae1cca33c8ced5722477fe1f on linux-5.7.y
    in linux-stable.  */
 
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include <stddef.h>
 
 /* Adapted from include/uapi/asm-generic/posix_types.h  */
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c
index cb2db6c69cf..dfb585bc613 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-1.c
@@ -1,5 +1,3 @@
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
 /* { dg-require-effective-target alloca } */
 
 #include "analyzer-decls.h"
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c
index 72dbca5cbf0..68fbce9188c 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-2.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include "analyzer-decls.h"
 #include <stdio.h>
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
index 80d8f0b8247..ce6a3271d2a 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-3.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include "analyzer-decls.h"
 #include <stdio.h>
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
index bd47097b1d5..9df9422491c 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-4.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include "analyzer-decls.h"
 #include <stdio.h>
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-5.c b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-5.c
index 9a159800c61..18dbff0298e 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-alloc-5.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-alloc-5.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include "analyzer-decls.h"
 
 struct foo
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-assert-BUG_ON.c b/gcc/testsuite/gcc.dg/analyzer/taint-assert-BUG_ON.c
index 8aef0a44a6d..328940d2983 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-assert-BUG_ON.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-assert-BUG_ON.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 /* We need this, otherwise the warnings are emitted inside the macros, which
    makes it hard to write the DejaGnu directives.  */
 /* { dg-additional-options " -ftrack-macro-expansion=0" } */
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-assert-macro-expansion.c b/gcc/testsuite/gcc.dg/analyzer/taint-assert-macro-expansion.c
index 24b175a0973..78357ae62b8 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-assert-macro-expansion.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-assert-macro-expansion.c
@@ -2,9 +2,6 @@
    -Wanalyzer-tainted-assertion with macro-tracking enabled
    (the default).  */
 
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 /* { dg-additional-options "-fdiagnostics-show-path-depths" } */
 /* { dg-additional-options "-fdiagnostics-path-format=inline-events -fdiagnostics-show-caret" } */
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-assert-system-header.c b/gcc/testsuite/gcc.dg/analyzer/taint-assert-system-header.c
index a65853c7886..bd47ab79188 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-assert-system-header.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-assert-system-header.c
@@ -3,9 +3,6 @@
    (the default), where the assertion macro is defined in
    a system header.  */
 
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 /* { dg-additional-options "-fdiagnostics-show-path-depths" } */
 /* { dg-additional-options "-fdiagnostics-path-format=inline-events -fdiagnostics-show-caret" } */
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-assert.c b/gcc/testsuite/gcc.dg/analyzer/taint-assert.c
index b09f8c51a16..855ed5f705f 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-assert.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-assert.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 /* We need this, otherwise the warnings are emitted inside the macros, which
    makes it hard to write the DejaGnu directives.  */
 /* { dg-additional-options " -ftrack-macro-expansion=0" } */
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-divisor-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-divisor-1.c
index b7c1faef1c4..438a2095382 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-divisor-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-divisor-1.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include "analyzer-decls.h"
 #include <stdio.h>
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-merger.c b/gcc/testsuite/gcc.dg/analyzer/taint-merger.c
index e4e48f3db03..b7d562b9704 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-merger.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-merger.c
@@ -1,6 +1,3 @@
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-// TODO: remove need for this option
-
 #include "analyzer-decls.h"
 
 int v_start;
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-ops.c b/gcc/testsuite/gcc.dg/analyzer/taint-ops.c
index 729dbe53a0c..753500d5693 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-ops.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-ops.c
@@ -2,6 +2,7 @@
 // TODO: remove need for this option
 /* This test can probably be removed when -fanalyzer enables
    the taint checker by default.  */
+// FIXME: remove it?
 
 #include "analyzer-decls.h"
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-read-index-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-read-index-1.c
index 71c0816fd1f..1ec78b52629 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-read-index-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-read-index-1.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-read-offset-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-read-offset-1.c
index 6db59bcc615..bb5d0930cdb 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-read-offset-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-read-offset-1.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-realloc.c b/gcc/testsuite/gcc.dg/analyzer/taint-realloc.c
index bd0ed0010fb..27e36afea64 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-realloc.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-realloc.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include "analyzer-decls.h"
 #include <stdio.h>
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-size-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-size-1.c
index 0b166f7a86a..e0fa3fb0cb3 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-size-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-size-1.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include "analyzer-decls.h"
 #include <stdio.h>
 #include <stdlib.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-size-access-attr-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-size-access-attr-1.c
index 7d243a9570f..d4da3d77fa1 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-size-access-attr-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-size-access-attr-1.c
@@ -1,8 +1,7 @@
 /* Passing tainted sizes to external functions with attribute ((access)) with
    a size-index.  */
 
-// TODO: remove need for the explicit taint option:
-/* { dg-additional-options "-fanalyzer-checker=taint -fanalyzer-show-duplicate-count" } */
+/* { dg-additional-options "-fanalyzer-show-duplicate-count" } */
 
 #include "analyzer-decls.h"
 #include <stdio.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-write-index-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-write-index-1.c
index cc7ab1ca4f6..62222069578 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-write-index-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-write-index-1.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/taint-write-offset-1.c b/gcc/testsuite/gcc.dg/analyzer/taint-write-offset-1.c
index d0df6220315..21794ce4cf8 100644
--- a/gcc/testsuite/gcc.dg/analyzer/taint-write-offset-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/taint-write-offset-1.c
@@ -1,6 +1,3 @@
-// TODO: remove need for this option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-2.c b/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-2.c
index b3dc177cb14..81421330e8d 100644
--- a/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-2.c
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-2.c
@@ -1,5 +1,3 @@
-// TODO: remove need for the taint option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
 /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } { "" } } */
 
 #define LOWER_LIMIT 5
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-3.c b/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-3.c
index 8eb6061a08b..86bdedede7e 100644
--- a/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-3.c
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/taint-read-index-3.c
@@ -1,5 +1,3 @@
-// TODO: remove need for the taint option:
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
 /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } { "" } } */
 
 struct raw_ep {
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-1-fixed.c b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-1-fixed.c
index 0ca8137c3ef..7ca0c8ee48b 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-1-fixed.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-1-fixed.c
@@ -1,6 +1,4 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint" } */
 /* { dg-require-effective-target analyzer } */
 
 /* See notes in this header.  */
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-1.c b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-1.c
index cde12b3b761..6aa62a30e42 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-1.c
@@ -1,6 +1,4 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint" } */
 /* { dg-require-effective-target analyzer } */
 
 /* See notes in this header.  */
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-2-fixed.c b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-2-fixed.c
index 8a211cefe4e..b962e9dfb79 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-2-fixed.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-2-fixed.c
@@ -1,14 +1,9 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint" } */
 /* { dg-require-effective-target analyzer } */
 
 /* See notes in this header.  */
 #include "taint-CVE-2011-0521.h"
 
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 /* Adapted from drivers/media/dvb/ttpci/av7110_ca.c  */
 
 int dvb_ca_ioctl(struct file *file, unsigned int cmd, void *parg)
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-2.c b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-2.c
index 30cab38e002..70bb4b35320 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-2.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-2.c
@@ -1,6 +1,4 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint" } */
 /* { dg-require-effective-target analyzer } */
 
 /* See notes in this header.  */
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-3-fixed.c b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-3-fixed.c
index b7852b40dcf..df8ba5b4e27 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-3-fixed.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-3-fixed.c
@@ -1,14 +1,9 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint" } */
 /* { dg-require-effective-target analyzer } */
 
 /* See notes in this header.  */
 #include "taint-CVE-2011-0521.h"
 
-// TODO: remove need for this option
-/* { dg-additional-options "-fanalyzer-checker=taint" } */
-
 /* Adapted from drivers/media/dvb/ttpci/av7110_ca.c  */
 
 int dvb_ca_ioctl(struct file *file, unsigned int cmd, void *parg)
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-3.c b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-3.c
index 6b9e034eea7..183348faa71 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-3.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-3.c
@@ -1,6 +1,4 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint" } */
 /* { dg-require-effective-target analyzer } */
 
 /* See notes in this header.  */
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-4.c b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-4.c
index f314c64ce70..3a0e1ff76ca 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-4.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-4.c
@@ -1,8 +1,6 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
 // TODO: remove need for --param=analyzer-max-svalue-depth=25 here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint --param=analyzer-max-svalue-depth=25" } */
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint" } */
+/* { dg-options "-fanalyzer --param=analyzer-max-svalue-depth=25" } */
 /* { dg-require-effective-target analyzer } */
 
 /* See notes in this header.  */
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-5-fixed.c b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-5-fixed.c
index 8cb067c6542..c22d1481900 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-5-fixed.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-5-fixed.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
 // TODO: remove need for --param=analyzer-max-svalue-depth=25 here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint --param=analyzer-max-svalue-depth=25" } */
+/* { dg-options "-fanalyzer --param=analyzer-max-svalue-depth=25" } */
 /* { dg-require-effective-target analyzer } */
 
 /* See notes in this header.  */
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-5.c b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-5.c
index 4ce047902d3..66f334a61a1 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-5.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-5.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
 // TODO: remove need for --param=analyzer-max-svalue-depth=25 here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint --param=analyzer-max-svalue-depth=25" } */
+/* { dg-options "-fanalyzer --param=analyzer-max-svalue-depth=25" } */
 /* { dg-require-effective-target analyzer } */
 
 /* See notes in this header.  */
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-6.c b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-6.c
index c54af790a56..5af799158b1 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-6.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-CVE-2011-0521-6.c
@@ -1,7 +1,6 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
 // TODO: remove need for --param=analyzer-max-svalue-depth=25 here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint --param=analyzer-max-svalue-depth=25" } */
+/* { dg-options "-fanalyzer --param=analyzer-max-svalue-depth=25" } */
 /* { dg-require-effective-target analyzer } */
 
 /* See notes in this header.  */
diff --git a/gcc/testsuite/gcc.dg/plugin/taint-antipatterns-1.c b/gcc/testsuite/gcc.dg/plugin/taint-antipatterns-1.c
index 6bb6f1be25c..cdd9a4f1f50 100644
--- a/gcc/testsuite/gcc.dg/plugin/taint-antipatterns-1.c
+++ b/gcc/testsuite/gcc.dg/plugin/taint-antipatterns-1.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
-// TODO: remove need for -fanalyzer-checker=taint here:
-/* { dg-options "-fanalyzer -fanalyzer-checker=taint" } */
+/* { dg-options "-fanalyzer" } */
 /* { dg-require-effective-target analyzer } */
 
 #include "test-uaccess.h"
-- 
2.26.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests
  2023-03-20 22:50               ` David Malcolm
@ 2023-03-26 16:03                 ` Shengyu Huang
  2023-03-26 17:14                   ` David Malcolm
  0 siblings, 1 reply; 30+ messages in thread
From: Shengyu Huang @ 2023-03-26 16:03 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Development

[-- Attachment #1: Type: text/plain, Size: 1417 bytes --]

Hi Dave,

(I forgot to cc the list in the last email and it was too late to unsend. Sorry for sending you the same email again.)

> On 20 Mar 2023, at 23:50, David Malcolm <dmalcolm@redhat.com <mailto:dmalcolm@redhat.com>> wrote:
> 
> I think if you try the patch to sm.cc <http://sm.cc/> above, then you'll see
> various existing DejaGnu tests below gcc.dg/analyzer will fail with
> state explosions.

After patching on the latest trunk, the DejaGnu tests report two cases with state explosion:

pr93032-mztools-{signed, unsigned}-char.c

I didn’t see any cases with ICE though.

In addition, although I did see “warning: terminating analysis for this program point…” in the test log, nothing was reported when I ran the individual test (with or without gdb)…Did I miss anything?

Just by looking at these test files, it seems that it may have to do with how the analyzer does path selection, because there are many nested conditionals in these two files. As I mentioned in the proposal, it would be curious if this state explosion only happens for taint analysis, because I don’t think there is anything special about taint analysis that would cause state explosion (unless there is some buggy implementation?).

I will look at your latest patch. It seems that there are many useful tips that can help me further investigate the internals of analyzer. Thanks a lot!

Best,
Shengyu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests
  2023-03-26 16:03                 ` Shengyu Huang
@ 2023-03-26 17:14                   ` David Malcolm
  2023-03-26 21:46                     ` Shengyu Huang
  0 siblings, 1 reply; 30+ messages in thread
From: David Malcolm @ 2023-03-26 17:14 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: GCC Development

On Sun, 2023-03-26 at 18:03 +0200, Shengyu Huang wrote:
> Hi Dave,
> 
> (I forgot to cc the list in the last email and it was too late to
> unsend. Sorry for sending you the same email again.)
> 
> > On 20 Mar 2023, at 23:50, David Malcolm
> > <dmalcolm@redhat.com <mailto:dmalcolm@redhat.com>> wrote:
> > 
> > I think if you try the patch to sm.cc <http://sm.cc/> above, then
> > you'll see
> > various existing DejaGnu tests below gcc.dg/analyzer will fail with
> > state explosions.
> 
> After patching on the latest trunk, the DejaGnu tests report two
> cases with state explosion:
> 
> pr93032-mztools-{signed, unsigned}-char.c
> 
> I didn’t see any cases with ICE though.
> 
> In addition, although I did see “warning: terminating analysis for
> this program point…” in the test log, nothing was reported when I ran
> the individual test (with or without gdb)…Did I miss anything?

The warning is coming from -Wanalyzer-too-complex.  This is disabled by
default with -fanalyzer, so you won't see it if you try to compile the
.c file "by hand", but the testsuite enables it by default (in
analyzer.exp).

> 
> Just by looking at these test files, it seems that it may have to do
> with how the analyzer does path selection, because there are many
> nested conditionals in these two files. As I mentioned in the
> proposal, it would be curious if this state explosion only happens
> for taint analysis, because I don’t think there is anything special
> about taint analysis that would cause state explosion (unless there
> is some buggy implementation?).

I has looked into compiling those files with the patch some time ago;
looking at my notes, one issue was with this on-stack buffer:
    char extra[1024];
declared outside the loop.  Inside the loop, it gets modified in
various ways:
    extra[0] = '\0';
and
    if (fread(extra, 1, extsize, fpZip) == extsize) {
where the latter means "extra" becomes tainted.

However "extra" is barely used, and is effectively reset each time
through the loop - but the analyzer doesn't figure that out.  So the
loop analysis explodes, as it tries to keep track of the possibility
that "extra" is still tainted from previous iteration(s), despite the
fact that it's going to be clobbered before it ever gets used.

So one fix might be to extend the state-purging code so that it somehow
"sees" that "extra" gets clobbered before it gets used, and thus we can
purge the tainted state from it.

Hope that makes sense
Dave


> 
> I will look at your latest patch. It seems that there are many useful
> tips that can help me further investigate the internals of analyzer.
> Thanks a lot!
> 
> Best,
> Shengyu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests
  2023-03-26 17:14                   ` David Malcolm
@ 2023-03-26 21:46                     ` Shengyu Huang
  2023-04-01 14:19                       ` Shengyu Huang
  0 siblings, 1 reply; 30+ messages in thread
From: Shengyu Huang @ 2023-03-26 21:46 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Development

[-- Attachment #1: Type: text/plain, Size: 1925 bytes --]

Hi Dave,

> On 26 Mar 2023, at 19:14, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> I has looked into compiling those files with the patch some time ago;
> looking at my notes, one issue was with this on-stack buffer:
>    char extra[1024];
> declared outside the loop.  Inside the loop, it gets modified in
> various ways:
>    extra[0] = '\0';
> and
>    if (fread(extra, 1, extsize, fpZip) == extsize) {
> where the latter means "extra" becomes tainted.
> 
> However "extra" is barely used, and is effectively reset each time
> through the loop - but the analyzer doesn't figure that out.  So the
> loop analysis explodes, as it tries to keep track of the possibility
> that "extra" is still tainted from previous iteration(s), despite the
> fact that it's going to be clobbered before it ever gets used.
> 
> So one fix might be to extend the state-purging code so that it somehow
> "sees" that "extra" gets clobbered before it gets used, and thus we can
> purge the tainted state from it.

Thanks for your notes. I think we may be talking about the same thing? If you look at the updated proposal (I have changed it quite a lot since I first sent it out), you’ll see there is one relevant paper for state merging (although it is slightly different from state purging, I think the goal and general methodology is similar): https://dslab.epfl.ch/pubs/stateMerging.pdf 

I was trying to say if some similar situation happened for other types of checkers, I expected state explosion would also happen. I tried to construct a similar example (with the same kind of reset and nested conditionals + a loop) but for double-free, so far no success yet. I’ll pick it up afterwards, at latest by next Saturday, because I need to prepare for a coming midterm on Friday. I will also put this test case to the proposal because it seems like a very good starting point for the project.

Best,
Shengyu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests
  2023-03-26 21:46                     ` Shengyu Huang
@ 2023-04-01 14:19                       ` Shengyu Huang
  2023-04-02 22:53                         ` David Malcolm
  0 siblings, 1 reply; 30+ messages in thread
From: Shengyu Huang @ 2023-04-01 14:19 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Development

[-- Attachment #1: Type: text/plain, Size: 2729 bytes --]

Hi Dave,

>> 
>> I has looked into compiling those files with the patch some time ago;
>> looking at my notes, one issue was with this on-stack buffer:
>>    char extra[1024];
>> declared outside the loop.  Inside the loop, it gets modified in
>> various ways:
>>    extra[0] = '\0';
>> and
>>    if (fread(extra, 1, extsize, fpZip) == extsize) {
>> where the latter means "extra" becomes tainted.
>> 
>> However "extra" is barely used, and is effectively reset each time
>> through the loop - but the analyzer doesn't figure that out.  So the
>> loop analysis explodes, as it tries to keep track of the possibility
>> that "extra" is still tainted from previous iteration(s), despite the
>> fact that it's going to be clobbered before it ever gets used.
>> 
>> So one fix might be to extend the state-purging code so that it somehow
>> "sees" that "extra" gets clobbered before it gets used, and thus we can
>> purge the tainted state from it.
> 
> Thanks for your notes. I think we may be talking about the same thing? If you look at the updated proposal (I have changed it quite a lot since I first sent it out), you’ll see there is one relevant paper for state merging (although it is slightly different from state purging, I think the goal and general methodology is similar): https://dslab.epfl.ch/pubs/stateMerging.pdf 
> 
> I was trying to say if some similar situation happened for other types of checkers, I expected state explosion would also happen. I tried to construct a similar example (with the same kind of reset and nested conditionals + a loop) but for double-free, so far no success yet. I’ll pick it up afterwards, at latest by next Saturday, because I need to prepare for a coming midterm on Friday. I will also put this test case to the proposal because it seems like a very good starting point for the project.

As promised, below is a small example that causes state explosion without taint state machine involved.

void test()
{
  void *p;
  int a;
  scanf(“%d", &a);
  
  while (a > 0)
  {
     p = malloc (1024);
     if (a > 1)
       free(p);
     a--;
   }
   
   if (a >0)
     free(p);
}

This example not only causes state explosion, but also reports false positive of double-free.

By the way, do you have any feedback regarding my proposal (https://docs.google.com/document/d/1MRI1R5DaX8kM6DaqRQsEri5Mx2FvHmWv13qe1W0Bj0g <https://docs.google.com/document/d/1MRI1R5DaX8kM6DaqRQsEri5Mx2FvHmWv13qe1W0Bj0g/edit>)? I am happy to allocate more time polishing the proposal if you find anything off there. If you prefer me sending it via email again (for ease of reference in the future maybe?), I am happy to do so as well.

Best,
Shengyu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests
  2023-04-01 14:19                       ` Shengyu Huang
@ 2023-04-02 22:53                         ` David Malcolm
  2023-04-03  0:02                           ` Shengyu Huang
  0 siblings, 1 reply; 30+ messages in thread
From: David Malcolm @ 2023-04-02 22:53 UTC (permalink / raw)
  To: Shengyu Huang; +Cc: GCC Development

On Sat, 2023-04-01 at 16:19 +0200, Shengyu Huang wrote:
> Hi Dave,
> 
> > > 
> > > I has looked into compiling those files with the patch some time
> > > ago;
> > > looking at my notes, one issue was with this on-stack buffer:
> > >    char extra[1024];
> > > declared outside the loop.  Inside the loop, it gets modified in
> > > various ways:
> > >    extra[0] = '\0';
> > > and
> > >    if (fread(extra, 1, extsize, fpZip) == extsize) {
> > > where the latter means "extra" becomes tainted.
> > > 
> > > However "extra" is barely used, and is effectively reset each
> > > time
> > > through the loop - but the analyzer doesn't figure that out.  So
> > > the
> > > loop analysis explodes, as it tries to keep track of the
> > > possibility
> > > that "extra" is still tainted from previous iteration(s), despite
> > > the
> > > fact that it's going to be clobbered before it ever gets used.
> > > 
> > > So one fix might be to extend the state-purging code so that it
> > > somehow
> > > "sees" that "extra" gets clobbered before it gets used, and thus
> > > we can
> > > purge the tainted state from it.
> > 
> > Thanks for your notes. I think we may be talking about the same
> > thing? If you look at the updated proposal (I have changed it quite
> > a lot since I first sent it out), you’ll see there is one relevant
> > paper for state merging (although it is slightly different from
> > state purging, I think the goal and general methodology is
> > similar): https://dslab.epfl.ch/pubs/stateMerging.pdf 
> > 
> > I was trying to say if some similar situation happened for other
> > types of checkers, I expected state explosion would also happen. I
> > tried to construct a similar example (with the same kind of reset
> > and nested conditionals + a loop) but for double-free, so far no
> > success yet. I’ll pick it up afterwards, at latest by next
> > Saturday, because I need to prepare for a coming midterm on Friday.
> > I will also put this test case to the proposal because it seems
> > like a very good starting point for the project.
> 
> As promised, below is a small example that causes state explosion
> without taint state machine involved.
> 
> void test()
> {
>   void *p;
>   int a;
>   scanf(“%d", &a);
>   
>   while (a > 0)
>   {
>      p = malloc (1024);
>      if (a > 1)
>        free(p);
>      a--;
>    }
>    
>    if (a >0)
>      free(p);
> }
> 
> This example not only causes state explosion, but also reports false
> positive of double-free.

(nods)

Yeah, our handling of loops isn't great.  There's plenty of opportunity
within a GSoC project for tackling that.

> 
> By the way, do you have any feedback regarding my proposal
> (https://docs.google.com/document/d/1MRI1R5DaX8kM6DaqRQsEri5Mx2FvHmWv
> 13qe1W0Bj0g <
> https://docs.google.com/document/d/1MRI1R5DaX8kM6DaqRQsEri5Mx2FvHmWv1
> 3qe1W0Bj0g/edit>)? I am happy to allocate more time polishing the
> proposal if you find anything off there. If you prefer me sending it
> via email again (for ease of reference in the future maybe?), I am
> happy to do so as well.

Thanks for the proposal. 

Overall, it looks great.  Some notes:
- maybe specify the *GCC* static analyzer you first mention it
- you talk about "timeout" warnings.  The analyzer already can emit a
"timeout" warning of sorts, via -Wanalyzer-too-complex, though this is
based on the complexity of the exploded graph (e.g. # of nodes), rather
than actual timings.  Is the latter the kind of thing you had in mind,
or where you thinking about ways of making the "too complex" heuristics
smarter?  (I confess that you seem much more familiar with the theory
of this than I am!)
- the numbering of your references seems to have gotten out-of-sync; I
see references to [3] as a paper "Schwartz et al", but that's a link to
one of my blog posts.
- do you a link to a github account, or somewhere else that
demonstrates code you've written?  In particular, how is your C++ ?

Note that the deadline for submitting proposals to the official GSoC
website is April 4 - 18:00 UTC (i.e. this coming Tuesday) and that
Google are very strict about that deadline; see:
https://developers.google.com/open-source/gsoc/timeline

Good luck
Dave


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [GSoC][Static Analyzer] First proposal draft and a few more questions/requests
  2023-04-02 22:53                         ` David Malcolm
@ 2023-04-03  0:02                           ` Shengyu Huang
  0 siblings, 0 replies; 30+ messages in thread
From: Shengyu Huang @ 2023-04-03  0:02 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Development

[-- Attachment #1: Type: text/plain, Size: 3963 bytes --]

Hi Dave,

> Overall, it looks great.  Some notes:
> - maybe specify the *GCC* static analyzer you first mention it

Done.

> - you talk about "timeout" warnings.  The analyzer already can emit a
> "timeout" warning of sorts, via -Wanalyzer-too-complex, though this is
> based on the complexity of the exploded graph (e.g. # of nodes), rather
> than actual timings.  Is the latter the kind of thing you had in mind,
> or where you thinking about ways of making the "too complex" heuristics
> smarter?  (I confess that you seem much more familiar with the theory
> of this than I am!)

I was not ware of `-Wanalyzer-too-complex` when I wrote that proposal, and I forgot to rewrite this part. I planned to ask you why we did not turn on this flag by default. To avoid state explosion altogether, it is for sure that we need to bear with false positives in some cases. I am not yet sure what is a good approach to balance the soundness and completeness in symbolic execution, but my intuition (just based on my limited experience with other kinds of formal methods) is that we don’t want to avoid state explosion in all cases because we want to have more precision (that is, we don’t want too many false positives). Imagine a dummy static analyzer that just reports warnings regardless the program. It will not have any state explosion problems, but it will have lots of false positives. Therefore, I think we should consider turning it on by default. Maybe you have other considerations that I missed?

Another point but irrelevant for this project is that we will surely encounter timeout when we integrate SMT solvers in the future (I don’t know whether it is the plan for GCC14). It is just unavoidable…the current approach does not sound transferable to the timeout issued by, say, Z3. Maybe we want a unified approach at some point?

Anyway, this part does not seem too urgent anymore after I know the flag -Wanalyzer-too-complex exists…if you have some working solution in terms of how to handle timeout from SMT solvers, I’d be happy to know.

> - the numbering of your references seems to have gotten out-of-sync; I
> see references to [3] as a paper "Schwartz et al", but that's a link to
> one of my blog posts.

Thanks for letting me know that. Indeed I forgot to fix the numbering after adding your blog to the references.

> - do you a link to a github account, or somewhere else that
> demonstrates code you've written?  In particular, how is your C++ ?
> 

My Github account is https://github.com/kumom, but I would not post any code from my course projects there since it will violate honor code and promote plagiarism (I will attach a small? lab project to you in another private email). I have taken courses like systems programming and computer architecture, where I wrote plenty of C code and some C++ code. For C++, I’ve written maybe just a few thousand lines of code. Unfortunately, in all my previous jobs as student assistant where I coded mainly in Python and TypeScript, my code was neither open source nor owned by me…Now I am working on a semester project (on formal verification) using Dafny and a course project (on compiler design) using Scala, but I admit they are a bit far from C++. I have planned to read Effective C++ after the Easter break before you raised this question, but maybe you can recommend something else that you find helpful. Since I am relative familiar with programming language concepts in general, I believe I will get more fluent at C++ within a short amount of time once I get my hands dirty.

> Note that the deadline for submitting proposals to the official GSoC
> website is April 4 - 18:00 UTC (i.e. this coming Tuesday) and that
> Google are very strict about that deadline; see:
> https://developers.google.com/open-source/gsoc/timeline

Thanks for the reminder. I have kept this in mind and will submit it before the deadline.

Best,
Shengyu


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-04-03  0:03 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-21 21:26 [GSoC][Static Analyzer] Some questions and request for a small patch to work on Shengyu Huang
2023-02-21 22:55 ` David Malcolm
2023-02-22 14:03   ` Jonathan Wakely
2023-02-23 11:17     ` James K. Lowden
2023-02-23 16:30       ` Jonathan Wakely
2023-02-22 14:11   ` Shengyu Huang
2023-02-22 14:53     ` Iain Sandoe
2023-02-22 15:13       ` Iain Sandoe
2023-02-27 13:35       ` Shengyu Huang
2023-02-27 13:45         ` Iain Sandoe
2023-02-27 13:51           ` Shengyu Huang
2023-02-27 13:49         ` Iain Sandoe
2023-02-27 14:01           ` Floyd, Paul
2023-02-22 15:43     ` David Malcolm
2023-02-28  9:18       ` Shengyu Huang
2023-02-28 23:59         ` David Malcolm
2023-03-01 11:16           ` Shengyu Huang
2023-03-01 13:48             ` David Malcolm
2023-02-28 14:46     ` [GSoC][Static Analyzer] Ideas for proposal Shengyu Huang
2023-03-01  0:22       ` David Malcolm
2023-03-12 22:20         ` Shengyu Huang
2023-03-13 15:51           ` David Malcolm
2023-03-20 17:28             ` [GSoC][Static Analyzer] First proposal draft and a few more questions/requests Shengyu Huang
2023-03-20 22:50               ` David Malcolm
2023-03-26 16:03                 ` Shengyu Huang
2023-03-26 17:14                   ` David Malcolm
2023-03-26 21:46                     ` Shengyu Huang
2023-04-01 14:19                       ` Shengyu Huang
2023-04-02 22:53                         ` David Malcolm
2023-04-03  0:02                           ` Shengyu Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).