GCC static analysis branch now available on Compiler Explorer

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* GCC static analysis branch now available on Compiler Explorer
@ 2019-12-10 15:47 David Malcolm
  2019-12-10 16:04 ` Marek Polacek
  2019-12-10 16:36 ` Martin Sebor
  0 siblings, 2 replies; 10+ messages in thread
From: David Malcolm @ 2019-12-10 15:47 UTC (permalink / raw)
  To: gcc

For the adventurous/curious, my static analyzer branch of GCC [1] is
now available on Compiler Explorer (aka godbolt.org) so you can try it
out without building it yourself.  [Thanks to Matt Godbolt, Patrick
Quist and others at the Compiler Explorer project]

On https://godbolt.org/ within the C and C++ languages, select
  "x86-64 gcc (static analysis)"
as the compiler (though strictly speaking only C is in-scope right
now).  It's configured to automatically inject -fanalyzer (just on this
site; it isn't the default in the branch).

Some precanned examples:
  * various malloc issues: https://godbolt.org/z/tnx65n
  * stdio issues: https://godbolt.org/z/4BP-Tj
  * fprintf in signal handler: https://godbolt.org/z/ew7mW6
  * tainted data affecting control flow: https://godbolt.org/z/3v8vSj
  * password-leakage: https://godbolt.org/z/pRPYiv
(the non-malloc examples are much more in "proof-of-concept" territory)

Would it make sense to add an "analyzer" component to our bugzilla,
even though this is still on a branch? (with me as default assignee)

Dave

[1] https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GCC static analysis branch now available on Compiler Explorer
  2019-12-10 15:47 GCC static analysis branch now available on Compiler Explorer David Malcolm
@ 2019-12-10 16:04 ` Marek Polacek
  2019-12-10 17:11   ` David Malcolm
  2019-12-10 16:36 ` Martin Sebor
  1 sibling, 1 reply; 10+ messages in thread
From: Marek Polacek @ 2019-12-10 16:04 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

On Tue, Dec 10, 2019 at 10:46:59AM -0500, David Malcolm wrote:
> For the adventurous/curious, my static analyzer branch of GCC [1] is
> now available on Compiler Explorer (aka godbolt.org) so you can try it
> out without building it yourself.  [Thanks to Matt Godbolt, Patrick
> Quist and others at the Compiler Explorer project]

Congrats!

> On https://godbolt.org/ within the C and C++ languages, select
>   "x86-64 gcc (static analysis)"
> as the compiler (though strictly speaking only C is in-scope right
> now).  It's configured to automatically inject -fanalyzer (just on this
> site; it isn't the default in the branch).
> 
> Some precanned examples:
>   * various malloc issues: https://godbolt.org/z/tnx65n
>   * stdio issues: https://godbolt.org/z/4BP-Tj
>   * fprintf in signal handler: https://godbolt.org/z/ew7mW6
>   * tainted data affecting control flow: https://godbolt.org/z/3v8vSj
>   * password-leakage: https://godbolt.org/z/pRPYiv
> (the non-malloc examples are much more in "proof-of-concept" territory)
> 
> Would it make sense to add an "analyzer" component to our bugzilla,
> even though this is still on a branch? (with me as default assignee)

I think so, we have it for e.g. JIT already, and it's probably just a matter
of time before the analyzer is merged.

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GCC static analysis branch now available on Compiler Explorer
  2019-12-10 15:47 GCC static analysis branch now available on Compiler Explorer David Malcolm
  2019-12-10 16:04 ` Marek Polacek
@ 2019-12-10 16:36 ` Martin Sebor
  2019-12-10 17:57   ` David Malcolm
  1 sibling, 1 reply; 10+ messages in thread
From: Martin Sebor @ 2019-12-10 16:36 UTC (permalink / raw)
  To: David Malcolm, gcc

On 12/10/19 8:46 AM, David Malcolm wrote:
> For the adventurous/curious, my static analyzer branch of GCC [1] is
> now available on Compiler Explorer (aka godbolt.org) so you can try it
> out without building it yourself.  [Thanks to Matt Godbolt, Patrick
> Quist and others at the Compiler Explorer project]
> 
> On https://godbolt.org/ within the C and C++ languages, select
>    "x86-64 gcc (static analysis)"
> as the compiler (though strictly speaking only C is in-scope right
> now).  It's configured to automatically inject -fanalyzer (just on this
> site; it isn't the default in the branch).
> 
> Some precanned examples:
>    * various malloc issues: https://godbolt.org/z/tnx65n
>    * stdio issues: https://godbolt.org/z/4BP-Tj
>    * fprintf in signal handler: https://godbolt.org/z/ew7mW6
>    * tainted data affecting control flow: https://godbolt.org/z/3v8vSj
>    * password-leakage: https://godbolt.org/z/pRPYiv
> (the non-malloc examples are much more in "proof-of-concept" territory)
> 
> Would it make sense to add an "analyzer" component to our bugzilla,
> even though this is still on a branch? (with me as default assignee)

Yes, that would make sense to me for bugs/enhancements specific to
just the analyzer.

I think it's important to have a shared understanding how requests
for new warnings (either completely new features or enhancements to
existing ones) should be triaged in Bugzilla.  For instance, pr82520
is a request to enhance -Wreturn-local-addr to detect other forms of
escaping dangling pointers besides by returning them from functions.
I think the request (and a number of others) would be a good fit for
the analyzer as well, but assigning it to the analyzer component
would suggest that it's suitable only for it.  How do we track
requests that we'd like handled in both?

Martin

> 
> Dave
> 
> [1] https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GCC static analysis branch now available on Compiler Explorer
  2019-12-10 16:04 ` Marek Polacek
@ 2019-12-10 17:11   ` David Malcolm
  2019-12-10 17:20     ` David Malcolm
  0 siblings, 1 reply; 10+ messages in thread
From: David Malcolm @ 2019-12-10 17:11 UTC (permalink / raw)
  To: Marek Polacek; +Cc: gcc

On Tue, 2019-12-10 at 11:04 -0500, Marek Polacek wrote:
> On Tue, Dec 10, 2019 at 10:46:59AM -0500, David Malcolm wrote:
[...]
> > Would it make sense to add an "analyzer" component to our bugzilla,
> > even though this is still on a branch? (with me as default
> > assignee)
> 
> I think so, we have it for e.g. JIT already, and it's probably just a
> matter
> of time before the analyzer is merged.

It turns out I had admin rights to do this, so I've gone ahead and
created an "analyzer" component for the "gcc" product within our
bugzilla; I'm the default assignee.

Dave

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GCC static analysis branch now available on Compiler Explorer
  2019-12-10 17:11   ` David Malcolm
@ 2019-12-10 17:20     ` David Malcolm
  0 siblings, 0 replies; 10+ messages in thread
From: David Malcolm @ 2019-12-10 17:20 UTC (permalink / raw)
  To: Marek Polacek; +Cc: gcc

On Tue, 2019-12-10 at 12:11 -0500, David Malcolm wrote:
> On Tue, 2019-12-10 at 11:04 -0500, Marek Polacek wrote:
> > On Tue, Dec 10, 2019 at 10:46:59AM -0500, David Malcolm wrote:
> [...]
> > > Would it make sense to add an "analyzer" component to our
> > > bugzilla,
> > > even though this is still on a branch? (with me as default
> > > assignee)
> > 
> > I think so, we have it for e.g. JIT already, and it's probably just
> > a
> > matter
> > of time before the analyzer is merged.
> 
> It turns out I had admin rights to do this, so I've gone ahead and
> created an "analyzer" component for the "gcc" product within our
> bugzilla; I'm the default assignee.

I've also created an "analyzer branch" version within BZ for such bugs,
given that they'd be being filed against my branch, rather than trunk.

Dave

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GCC static analysis branch now available on Compiler Explorer
  2019-12-10 16:36 ` Martin Sebor
@ 2019-12-10 17:57   ` David Malcolm
  2019-12-10 19:51     ` Martin Sebor
  0 siblings, 1 reply; 10+ messages in thread
From: David Malcolm @ 2019-12-10 17:57 UTC (permalink / raw)
  To: Martin Sebor, gcc

On Tue, 2019-12-10 at 09:36 -0700, Martin Sebor wrote:
> On 12/10/19 8:46 AM, David Malcolm wrote:
> > For the adventurous/curious, my static analyzer branch of GCC [1]
> > is
> > now available on Compiler Explorer (aka godbolt.org) so you can try
> > it
> > out without building it yourself.  [Thanks to Matt Godbolt, Patrick
> > Quist and others at the Compiler Explorer project]
> > 
> > On https://godbolt.org/ within the C and C++ languages, select
> >    "x86-64 gcc (static analysis)"
> > as the compiler (though strictly speaking only C is in-scope right
> > now).  It's configured to automatically inject -fanalyzer (just on
> > this
> > site; it isn't the default in the branch).
> > 
> > Some precanned examples:
> >    * various malloc issues: https://godbolt.org/z/tnx65n
> >    * stdio issues: https://godbolt.org/z/4BP-Tj
> >    * fprintf in signal handler: https://godbolt.org/z/ew7mW6
> >    * tainted data affecting control flow: 
> > https://godbolt.org/z/3v8vSj
> >    * password-leakage: https://godbolt.org/z/pRPYiv
> > (the non-malloc examples are much more in "proof-of-concept"
> > territory)
> > 
> > Would it make sense to add an "analyzer" component to our bugzilla,
> > even though this is still on a branch? (with me as default
> > assignee)
> 
> Yes, that would make sense to me for bugs/enhancements specific to
> just the analyzer.
> 
> I think it's important to have a shared understanding how requests
> for new warnings (either completely new features or enhancements to
> existing ones) should be triaged in Bugzilla.  For instance, pr82520
> is a request to enhance -Wreturn-local-addr to detect other forms of
> escaping dangling pointers besides by returning them from functions.
> I think the request (and a number of others) would be a good fit for
> the analyzer as well, but assigning it to the analyzer component
> would suggest that it's suitable only for it.  How do we track
> requests that we'd like handled in both?

It feels to me that the question of BZ component is a proxy for a
deeper scope question about what we want to handle in each warning.  If
it's clear that we want an issue handled by both the core code *and* by
the analyzer, then we should have two PRs.  Perhaps discuss in the PRs
which things we expect to be handled by which warning (e.g. some cases
might be capable of being caught by -Wreturn-local-addr, others might
require the deeper analysis of -fanalyzer).

Otherwise, have a single PR and assign it accordingly.

I'm not sure if that answers your question.

Dave

> Martin
> 
> > Dave
> > 
> > [1] https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > 
> > 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GCC static analysis branch now available on Compiler Explorer
  2019-12-10 17:57   ` David Malcolm
@ 2019-12-10 19:51     ` Martin Sebor
  2019-12-10 20:42       ` Jeff Law
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Sebor @ 2019-12-10 19:51 UTC (permalink / raw)
  To: David Malcolm, gcc

On 12/10/19 10:56 AM, David Malcolm wrote:
> On Tue, 2019-12-10 at 09:36 -0700, Martin Sebor wrote:
>> On 12/10/19 8:46 AM, David Malcolm wrote:
>>> For the adventurous/curious, my static analyzer branch of GCC [1]
>>> is
>>> now available on Compiler Explorer (aka godbolt.org) so you can try
>>> it
>>> out without building it yourself.  [Thanks to Matt Godbolt, Patrick
>>> Quist and others at the Compiler Explorer project]
>>>
>>> On https://godbolt.org/ within the C and C++ languages, select
>>>     "x86-64 gcc (static analysis)"
>>> as the compiler (though strictly speaking only C is in-scope right
>>> now).  It's configured to automatically inject -fanalyzer (just on
>>> this
>>> site; it isn't the default in the branch).
>>>
>>> Some precanned examples:
>>>     * various malloc issues: https://godbolt.org/z/tnx65n
>>>     * stdio issues: https://godbolt.org/z/4BP-Tj
>>>     * fprintf in signal handler: https://godbolt.org/z/ew7mW6
>>>     * tainted data affecting control flow:
>>> https://godbolt.org/z/3v8vSj
>>>     * password-leakage: https://godbolt.org/z/pRPYiv
>>> (the non-malloc examples are much more in "proof-of-concept"
>>> territory)
>>>
>>> Would it make sense to add an "analyzer" component to our bugzilla,
>>> even though this is still on a branch? (with me as default
>>> assignee)
>>
>> Yes, that would make sense to me for bugs/enhancements specific to
>> just the analyzer.
>>
>> I think it's important to have a shared understanding how requests
>> for new warnings (either completely new features or enhancements to
>> existing ones) should be triaged in Bugzilla.  For instance, pr82520
>> is a request to enhance -Wreturn-local-addr to detect other forms of
>> escaping dangling pointers besides by returning them from functions.
>> I think the request (and a number of others) would be a good fit for
>> the analyzer as well, but assigning it to the analyzer component
>> would suggest that it's suitable only for it.  How do we track
>> requests that we'd like handled in both?
> 
> It feels to me that the question of BZ component is a proxy for a
> deeper scope question about what we want to handle in each warning.  If
> it's clear that we want an issue handled by both the core code *and* by
> the analyzer, then we should have two PRs.  Perhaps discuss in the PRs
> which things we expect to be handled by which warning (e.g. some cases
> might be capable of being caught by -Wreturn-local-addr, others might
> require the deeper analysis of -fanalyzer).

You're quite right that the two questions are related, and it is
the latter that prompted my comment above.

My concern is that I'm not sure we have a good rule of thumb to
decide whether a given warning is better suited to just the analyzer
or just the middle-end, or that it should be implemented in both.
Or even a good basis for making such decisions.  We have not discussed
yet if we should even worry about overlap (duplicate warnings issued
by both the analyzer and the middle-end).  I asked a few questions to
help get more clarity here in my comments on the initial patch but only
heard from Jeff.

So here are some of my thoughts.  I'd be very interested in yours (or
anyone else's who cares about this subject).

I can think of no warning that's currently implemented in the middle-
end (or that could be implemented there) that would not also be
appropriate for the analyzer.  IIUC, it has the potential for finding
more problems but may be more noisy than a middle-end implementation,
and take longer to analyze code.  Are there any kinds of problems that
you think are not appropriate for it?  Any inherent limitations?

At the same time I also think there's room for and value in providing
at least some support in the middle-end for pretty much every warning
that the analyzer already exposes options for.  They may not find as
many problems as the analyzer eventually will but they don't have as
much overhead (either to implement, or in terms of compile time --
the analysis often benefits optimizations as well).  In addition, by
being tied to the optimizers the detection provides an opportunity
to also prevent the bugs (by folding invalid code away, or inserting
traps or branches around it, perhaps under the control of users as we
have discussed).

In my mind the only reason to have to choose between one and other
is the duplication of effort and resource constraints: we don't have
enough contributors into either area.  There isn't a lot we can about
the latter but we should make an effort to minimize the duplication
(starting with bug triage).

But maybe I'm overanalyzing it and playing it by ear will work just
fine.

Martin

> 
> Otherwise, have a single PR and assign it accordingly.
> 
> I'm not sure if that answers your question.
> 
> Dave
> 
>> Martin
>>
>>> Dave
>>>
>>> [1] https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
>>>
>>>
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GCC static analysis branch now available on Compiler Explorer
  2019-12-10 19:51     ` Martin Sebor
@ 2019-12-10 20:42       ` Jeff Law
  2019-12-11 20:39         ` David Malcolm
  2019-12-12 16:27         ` Martin Sebor
  0 siblings, 2 replies; 10+ messages in thread
From: Jeff Law @ 2019-12-10 20:42 UTC (permalink / raw)
  To: Martin Sebor, David Malcolm, gcc

On Tue, 2019-12-10 at 12:51 -0700, Martin Sebor wrote:
> On 12/10/19 10:56 AM, David Malcolm wrote:
> > On Tue, 2019-12-10 at 09:36 -0700, Martin Sebor wrote:
> > > On 12/10/19 8:46 AM, David Malcolm wrote:
> > > > For the adventurous/curious, my static analyzer branch of GCC
> > > > [1]
> > > > is
> > > > now available on Compiler Explorer (aka godbolt.org) so you can
> > > > try
> > > > it
> > > > out without building it yourself.  [Thanks to Matt Godbolt,
> > > > Patrick
> > > > Quist and others at the Compiler Explorer project]
> > > > 
> > > > On https://godbolt.org/ within the C and C++ languages, select
> > > >     "x86-64 gcc (static analysis)"
> > > > as the compiler (though strictly speaking only C is in-scope
> > > > right
> > > > now).  It's configured to automatically inject -fanalyzer (just
> > > > on
> > > > this
> > > > site; it isn't the default in the branch).
> > > > 
> > > > Some precanned examples:
> > > >     * various malloc issues: https://godbolt.org/z/tnx65n
> > > >     * stdio issues: https://godbolt.org/z/4BP-Tj
> > > >     * fprintf in signal handler: https://godbolt.org/z/ew7mW6
> > > >     * tainted data affecting control flow:
> > > > https://godbolt.org/z/3v8vSj
> > > >     * password-leakage: https://godbolt.org/z/pRPYiv
> > > > (the non-malloc examples are much more in "proof-of-concept"
> > > > territory)
> > > > 
> > > > Would it make sense to add an "analyzer" component to our
> > > > bugzilla,
> > > > even though this is still on a branch? (with me as default
> > > > assignee)
> > > 
> > > Yes, that would make sense to me for bugs/enhancements specific
> > > to
> > > just the analyzer.
> > > 
> > > I think it's important to have a shared understanding how
> > > requests
> > > for new warnings (either completely new features or enhancements
> > > to
> > > existing ones) should be triaged in Bugzilla.  For instance,
> > > pr82520
> > > is a request to enhance -Wreturn-local-addr to detect other forms
> > > of
> > > escaping dangling pointers besides by returning them from
> > > functions.
> > > I think the request (and a number of others) would be a good fit
> > > for
> > > the analyzer as well, but assigning it to the analyzer component
> > > would suggest that it's suitable only for it.  How do we track
> > > requests that we'd like handled in both?
> > 
> > It feels to me that the question of BZ component is a proxy for a
> > deeper scope question about what we want to handle in each
> > warning.  If
> > it's clear that we want an issue handled by both the core code
> > *and* by
> > the analyzer, then we should have two PRs.  Perhaps discuss in the
> > PRs
> > which things we expect to be handled by which warning (e.g. some
> > cases
> > might be capable of being caught by -Wreturn-local-addr, others
> > might
> > require the deeper analysis of -fanalyzer).
> 
> You're quite right that the two questions are related, and it is
> the latter that prompted my comment above.
> 
> My concern is that I'm not sure we have a good rule of thumb to
> decide whether a given warning is better suited to just the analyzer
> or just the middle-end, or that it should be implemented in both.
> Or even a good basis for making such decisions.  We have not
> discussed
> yet if we should even worry about overlap (duplicate warnings issued
> by both the analyzer and the middle-end).  I asked a few questions to
> help get more clarity here in my comments on the initial patch but
> only
> heard from Jeff.
> 
> So here are some of my thoughts.  I'd be very interested in yours (or
> anyone else's who cares about this subject).
> 
> I can think of no warning that's currently implemented in the middle-
> end (or that could be implemented there) that would not also be
> appropriate for the analyzer.  IIUC, it has the potential for finding
> more problems but may be more noisy than a middle-end implementation,
> and take longer to analyze code.  Are there any kinds of problems
> that
> you think are not appropriate for it?  Any inherent limitations?
> 
> At the same time I also think there's room for and value in providing
> at least some support in the middle-end for pretty much every warning
> that the analyzer already exposes options for.  They may not find as
> many problems as the analyzer eventually will but they don't have as
> much overhead (either to implement, or in terms of compile time --
> the analysis often benefits optimizations as well).  In addition, by
> being tied to the optimizers the detection provides an opportunity
> to also prevent the bugs (by folding invalid code away, or inserting
> traps or branches around it, perhaps under the control of users as we
> have discussed).
> 
> In my mind the only reason to have to choose between one and other
> is the duplication of effort and resource constraints: we don't have
> enough contributors into either area.  There isn't a lot we can about
> the latter but we should make an effort to minimize the duplication
> (starting with bug triage).
> 
> But maybe I'm overanalyzing it and playing it by ear will work just
> fine.
I think one of the key components will be whether or not one needs
intra-procedural state tracking to do a reasonable job.  THe double-
free checking is a great example.  You can build one with what's in the
tree today, but the result isn't really useful except for toy examples.
 

To do a reasonable job you've got to explode the graphs and track state
intra-procedurally.  Thus it's a good fit for David's analyzer.

THere will almost certainly be cases where some additional bit of
intra-procedural data tracking will help the existing middle end
analysis/warnings.    But what's in there today works and works
reasonably well.  So yes, some additional data might make
-Wuninitialized better, but it does a pretty good job with the existing
analysis without having to track data across function calls.

jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GCC static analysis branch now available on Compiler Explorer
  2019-12-10 20:42       ` Jeff Law
@ 2019-12-11 20:39         ` David Malcolm
  2019-12-12 16:27         ` Martin Sebor
  1 sibling, 0 replies; 10+ messages in thread
From: David Malcolm @ 2019-12-11 20:39 UTC (permalink / raw)
  To: law, Martin Sebor, gcc

On Tue, 2019-12-10 at 13:42 -0700, Jeff Law wrote:
> On Tue, 2019-12-10 at 12:51 -0700, Martin Sebor wrote:
> > On 12/10/19 10:56 AM, David Malcolm wrote:
> > > On Tue, 2019-12-10 at 09:36 -0700, Martin Sebor wrote:
> > > > On 12/10/19 8:46 AM, David Malcolm wrote:
> > > > > For the adventurous/curious, my static analyzer branch of GCC
> > > > > [1]
> > > > > is
> > > > > now available on Compiler Explorer (aka godbolt.org) so you
> > > > > can
> > > > > try
> > > > > it
> > > > > out without building it yourself.  [Thanks to Matt Godbolt,
> > > > > Patrick
> > > > > Quist and others at the Compiler Explorer project]
> > > > > 
> > > > > On https://godbolt.org/ within the C and C++ languages,
> > > > > select
> > > > >     "x86-64 gcc (static analysis)"
> > > > > as the compiler (though strictly speaking only C is in-scope
> > > > > right
> > > > > now).  It's configured to automatically inject -fanalyzer
> > > > > (just
> > > > > on
> > > > > this
> > > > > site; it isn't the default in the branch).
> > > > > 
> > > > > Some precanned examples:
> > > > >     * various malloc issues: https://godbolt.org/z/tnx65n
> > > > >     * stdio issues: https://godbolt.org/z/4BP-Tj
> > > > >     * fprintf in signal handler: https://godbolt.org/z/ew7mW6
> > > > >     * tainted data affecting control flow:
> > > > > https://godbolt.org/z/3v8vSj
> > > > >     * password-leakage: https://godbolt.org/z/pRPYiv
> > > > > (the non-malloc examples are much more in "proof-of-concept"
> > > > > territory)
> > > > > 
> > > > > Would it make sense to add an "analyzer" component to our
> > > > > bugzilla,
> > > > > even though this is still on a branch? (with me as default
> > > > > assignee)
> > > > 
> > > > Yes, that would make sense to me for bugs/enhancements specific
> > > > to
> > > > just the analyzer.
> > > > 
> > > > I think it's important to have a shared understanding how
> > > > requests
> > > > for new warnings (either completely new features or
> > > > enhancements
> > > > to
> > > > existing ones) should be triaged in Bugzilla.  For instance,
> > > > pr82520
> > > > is a request to enhance -Wreturn-local-addr to detect other
> > > > forms
> > > > of
> > > > escaping dangling pointers besides by returning them from
> > > > functions.
> > > > I think the request (and a number of others) would be a good
> > > > fit
> > > > for
> > > > the analyzer as well, but assigning it to the analyzer
> > > > component
> > > > would suggest that it's suitable only for it.  How do we track
> > > > requests that we'd like handled in both?
> > > 
> > > It feels to me that the question of BZ component is a proxy for a
> > > deeper scope question about what we want to handle in each
> > > warning.  If
> > > it's clear that we want an issue handled by both the core code
> > > *and* by
> > > the analyzer, then we should have two PRs.  Perhaps discuss in
> > > the
> > > PRs
> > > which things we expect to be handled by which warning (e.g. some
> > > cases
> > > might be capable of being caught by -Wreturn-local-addr, others
> > > might
> > > require the deeper analysis of -fanalyzer).
> > 
> > You're quite right that the two questions are related, and it is
> > the latter that prompted my comment above.
> > 
> > My concern is that I'm not sure we have a good rule of thumb to
> > decide whether a given warning is better suited to just the
> > analyzer
> > or just the middle-end, or that it should be implemented in both.
> > Or even a good basis for making such decisions.  We have not
> > discussed
> > yet if we should even worry about overlap (duplicate warnings
> > issued
> > by both the analyzer and the middle-end).  I asked a few questions
> > to
> > help get more clarity here in my comments on the initial patch but
> > only
> > heard from Jeff.
> > 
> > So here are some of my thoughts.  I'd be very interested in yours
> > (or
> > anyone else's who cares about this subject).
> > 
> > I can think of no warning that's currently implemented in the
> > middle-
> > end (or that could be implemented there) that would not also be
> > appropriate for the analyzer.  IIUC, it has the potential for
> > finding
> > more problems but may be more noisy than a middle-end
> > implementation,
> > and take longer to analyze code.  Are there any kinds of problems
> > that
> > you think are not appropriate for it?  Any inherent limitations?
> > 
> > At the same time I also think there's room for and value in
> > providing
> > at least some support in the middle-end for pretty much every
> > warning
> > that the analyzer already exposes options for.  They may not find
> > as
> > many problems as the analyzer eventually will but they don't have
> > as
> > much overhead (either to implement, or in terms of compile time --
> > the analysis often benefits optimizations as well).  In addition,
> > by
> > being tied to the optimizers the detection provides an opportunity
> > to also prevent the bugs (by folding invalid code away, or
> > inserting
> > traps or branches around it, perhaps under the control of users as
> > we
> > have discussed).
> > 
> > In my mind the only reason to have to choose between one and other
> > is the duplication of effort and resource constraints: we don't
> > have
> > enough contributors into either area.  There isn't a lot we can
> > about
> > the latter but we should make an effort to minimize the duplication
> > (starting with bug triage).
> > 
> > But maybe I'm overanalyzing it and playing it by ear will work just
> > fine.
> I think one of the key components will be whether or not one needs
> intra-procedural state tracking to do a reasonable job.  THe double-
> free checking is a great example.  You can build one with what's in
> the
> tree today, but the result isn't really useful except for toy
> examples.

Did you mean "interprocedural" here rather than "intraprocedural"?  If
so, I think I agree.

[I dislike this terminology and wish we had something better; our
jargon earns +1 point for showing off that we know a little Latin, but
-100 points for making communication more difficult (end-rant)]


> To do a reasonable job you've got to explode the graphs and track
> state
> intra-procedurally.  Thus it's a good fit for David's analyzer.

Again: s/intra/inter/ ?

> THere will almost certainly be cases where some additional bit of
> intra-procedural data tracking will help the existing middle end
> analysis/warnings.

Not sure whether you meant intra or inter here; sorry.

>     But what's in there today works and works
> reasonably well.  So yes, some additional data might make
> -Wuninitialized better, but it does a pretty good job with the
> existing
> analysis without having to track data across function calls

Hope I'm following your meaning here.  I think I agree with you, modulo
terminological snafu.

Dave

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: GCC static analysis branch now available on Compiler Explorer
  2019-12-10 20:42       ` Jeff Law
  2019-12-11 20:39         ` David Malcolm
@ 2019-12-12 16:27         ` Martin Sebor
  1 sibling, 0 replies; 10+ messages in thread
From: Martin Sebor @ 2019-12-12 16:27 UTC (permalink / raw)
  To: law, David Malcolm, gcc

On 12/10/19 1:42 PM, Jeff Law wrote:
> On Tue, 2019-12-10 at 12:51 -0700, Martin Sebor wrote:
>> On 12/10/19 10:56 AM, David Malcolm wrote:
>>> On Tue, 2019-12-10 at 09:36 -0700, Martin Sebor wrote:
>>>> On 12/10/19 8:46 AM, David Malcolm wrote:
>>>>> For the adventurous/curious, my static analyzer branch of GCC
>>>>> [1]
>>>>> is
>>>>> now available on Compiler Explorer (aka godbolt.org) so you can
>>>>> try
>>>>> it
>>>>> out without building it yourself.  [Thanks to Matt Godbolt,
>>>>> Patrick
>>>>> Quist and others at the Compiler Explorer project]
>>>>>
>>>>> On https://godbolt.org/ within the C and C++ languages, select
>>>>>      "x86-64 gcc (static analysis)"
>>>>> as the compiler (though strictly speaking only C is in-scope
>>>>> right
>>>>> now).  It's configured to automatically inject -fanalyzer (just
>>>>> on
>>>>> this
>>>>> site; it isn't the default in the branch).
>>>>>
>>>>> Some precanned examples:
>>>>>      * various malloc issues: https://godbolt.org/z/tnx65n
>>>>>      * stdio issues: https://godbolt.org/z/4BP-Tj
>>>>>      * fprintf in signal handler: https://godbolt.org/z/ew7mW6
>>>>>      * tainted data affecting control flow:
>>>>> https://godbolt.org/z/3v8vSj
>>>>>      * password-leakage: https://godbolt.org/z/pRPYiv
>>>>> (the non-malloc examples are much more in "proof-of-concept"
>>>>> territory)
>>>>>
>>>>> Would it make sense to add an "analyzer" component to our
>>>>> bugzilla,
>>>>> even though this is still on a branch? (with me as default
>>>>> assignee)
>>>>
>>>> Yes, that would make sense to me for bugs/enhancements specific
>>>> to
>>>> just the analyzer.
>>>>
>>>> I think it's important to have a shared understanding how
>>>> requests
>>>> for new warnings (either completely new features or enhancements
>>>> to
>>>> existing ones) should be triaged in Bugzilla.  For instance,
>>>> pr82520
>>>> is a request to enhance -Wreturn-local-addr to detect other forms
>>>> of
>>>> escaping dangling pointers besides by returning them from
>>>> functions.
>>>> I think the request (and a number of others) would be a good fit
>>>> for
>>>> the analyzer as well, but assigning it to the analyzer component
>>>> would suggest that it's suitable only for it.  How do we track
>>>> requests that we'd like handled in both?
>>>
>>> It feels to me that the question of BZ component is a proxy for a
>>> deeper scope question about what we want to handle in each
>>> warning.  If
>>> it's clear that we want an issue handled by both the core code
>>> *and* by
>>> the analyzer, then we should have two PRs.  Perhaps discuss in the
>>> PRs
>>> which things we expect to be handled by which warning (e.g. some
>>> cases
>>> might be capable of being caught by -Wreturn-local-addr, others
>>> might
>>> require the deeper analysis of -fanalyzer).
>>
>> You're quite right that the two questions are related, and it is
>> the latter that prompted my comment above.
>>
>> My concern is that I'm not sure we have a good rule of thumb to
>> decide whether a given warning is better suited to just the analyzer
>> or just the middle-end, or that it should be implemented in both.
>> Or even a good basis for making such decisions.  We have not
>> discussed
>> yet if we should even worry about overlap (duplicate warnings issued
>> by both the analyzer and the middle-end).  I asked a few questions to
>> help get more clarity here in my comments on the initial patch but
>> only
>> heard from Jeff.
>>
>> So here are some of my thoughts.  I'd be very interested in yours (or
>> anyone else's who cares about this subject).
>>
>> I can think of no warning that's currently implemented in the middle-
>> end (or that could be implemented there) that would not also be
>> appropriate for the analyzer.  IIUC, it has the potential for finding
>> more problems but may be more noisy than a middle-end implementation,
>> and take longer to analyze code.  Are there any kinds of problems
>> that
>> you think are not appropriate for it?  Any inherent limitations?
>>
>> At the same time I also think there's room for and value in providing
>> at least some support in the middle-end for pretty much every warning
>> that the analyzer already exposes options for.  They may not find as
>> many problems as the analyzer eventually will but they don't have as
>> much overhead (either to implement, or in terms of compile time --
>> the analysis often benefits optimizations as well).  In addition, by
>> being tied to the optimizers the detection provides an opportunity
>> to also prevent the bugs (by folding invalid code away, or inserting
>> traps or branches around it, perhaps under the control of users as we
>> have discussed).
>>
>> In my mind the only reason to have to choose between one and other
>> is the duplication of effort and resource constraints: we don't have
>> enough contributors into either area.  There isn't a lot we can about
>> the latter but we should make an effort to minimize the duplication
>> (starting with bug triage).
>>
>> But maybe I'm overanalyzing it and playing it by ear will work just
>> fine.
> I think one of the key components will be whether or not one needs
> intra-procedural state tracking to do a reasonable job.  THe double-
> free checking is a great example.  You can build one with what's in the
> tree today, but the result isn't really useful except for toy examples.
>   
> 
> To do a reasonable job you've got to explode the graphs and track state
> intra-procedurally.  Thus it's a good fit for David's analyzer.

To do a thorough job, sure.  But detecting bugs without inlining
everything is also valuable and (as you note below) the middle-end
does a reasonable job of it despite the limited interprocedural
analysis.  Code that allocates and deallocates memory in the same
function (or in functions inlined into it) isn't uncommon; it's
even the basis of some optimizations.  So I think having a simple
double-free checker in the middle-end would be beneficial as well.

> THere will almost certainly be cases where some additional bit of
> intra-procedural data tracking will help the existing middle end
> analysis/warnings.    But what's in there today works and works
> reasonably well.  So yes, some additional data might make
> -Wuninitialized better, but it does a pretty good job with the existing
> analysis without having to track data across function calls.

Double-free bugs are a small subset of the more general class of
using objects outside their lifetime: because it either hasn't
started yet (i.e., they're not initialized) or it has ended (e.g.,
after a call to free).  Below is a double-free example caused by
an uninitialized read.

As another example, replace the call to free below with strcpy.
It's far more readily exploitable than the double-free but
a checker that looks just for double free won't find it.
Regrettably, -Wmaybe-uninitialized doesn't diagnose either
of these bugs but that can be fixed.  (The analyzer preview
on Godolt doesn't either, and that too I'm sure can be fixed.)

My point here is that exposing the uninitialized detection logic
to the same deep analysis as the analyzer employs for double-free
would help find not just all the double-free bugs but many others
as well.

   __attribute__ ((noipa)) void f (void *p) { }

   __attribute__ ((noipa)) void g (int i, int j)
   {
     struct S { void *p; } s;
     if (i)
       s.p = __builtin_malloc (32);

     f (&s);

     if (j)
       __builtin_free (s.p);
   }

   int main (void)
   {
     g (1, 1);
     g (0, 1);
   }

I view the capabilities of the middle-end as complementary to
the analyzer, but I don't see either as ideally suited to dealing
with any particular kinds of bugs.  They each have different goals,
strengths, and limitations  By not being subject to the same
constraints as the inliner, the analyzer can do a better job
detecting bugs in complex code across function boundaries.
The middle-end, on the other hand, can use its more localized
analysis not only to detect potential bugs but also to prevent
them from taking effect at runtime, or to emit more efficient
object code.

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-12-12 16:27 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-10 15:47 GCC static analysis branch now available on Compiler Explorer David Malcolm
2019-12-10 16:04 ` Marek Polacek
2019-12-10 17:11   ` David Malcolm
2019-12-10 17:20     ` David Malcolm
2019-12-10 16:36 ` Martin Sebor
2019-12-10 17:57   ` David Malcolm
2019-12-10 19:51     ` Martin Sebor
2019-12-10 20:42       ` Jeff Law
2019-12-11 20:39         ` David Malcolm
2019-12-12 16:27         ` Martin Sebor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).