public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* -Wanalyzer-use-of-uninitialized-value always shadows -Wanalyzer-out-of-bounds
@ 2023-04-05 17:50 Benjamin Priour
  2023-04-06  0:32 ` David Malcolm
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Priour @ 2023-04-05 17:50 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 1971 bytes --]

Hi David,

I used the below code snippet to experiment with out-of-bounds (OOB) on
trunk. Three things occurred that I believe could see some improvement. See
https://godbolt.org/z/57n459cEs for the warnings.

int consecutive_oob_in_frame ()
{
    int arr[] = {1,2,3,4,5,6,7};
    int y1 = arr[9]; // only  this one get warnings (3*2 actually), expect
only 1 OOB though
    int y2 = arr[10]; // expect a warning too, despite fooling with asm
    int y3 = arr[50]; // expect a warning for that one too (see asm)
    return (y1+y2+y3);
}

int goo () {
    int x = consecutive_oob_in_frame (); // causes another pair of warnings
    return 2 * x;
}

int main () {
    goo (); // causes another pair of warning
    consecutive_oob_in_frame (); // silent
    int x [] = {1,2};
    x[5]; /* silent, probably because another set of OOB warnings
    has already been issued with this frame being the source */
    return 0;
}


First, as the subject line reads, I get a
-Wanalyzer-use-of-uninitialized-value for each -Wanalyzer-out-of-bounds. I
feel it might be too much, as fixing the OOB would fix the former.
So maybe only OOB could result in a warning here ?

Second, it seems that if a frame was a cause for a OOB (either by
containing the spurious code or by being a caller to such code), it will
only emit one set of warning, rather than at each unique compromising
statements.

Finally, I think the diagnostic path should only go at deep as the
declaration of the injurious index.

What do you think ? If they also make sense to you I will open a RFE and
try my hands at fixing them.

Also, have you considered extending the current call summaries model
(symbolic execution from what you told me), to implement a partial VASCO
model ? There would still be no need for distributive problems.
Maybe a start could be that functions without parameters working on
non-mutable global data should not generate EN more than once, rather than
at each call sites.

Best,
Benjamin.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: -Wanalyzer-use-of-uninitialized-value always shadows -Wanalyzer-out-of-bounds
  2023-04-05 17:50 -Wanalyzer-use-of-uninitialized-value always shadows -Wanalyzer-out-of-bounds Benjamin Priour
@ 2023-04-06  0:32 ` David Malcolm
  2023-04-06 11:02   ` Benjamin Priour
  0 siblings, 1 reply; 4+ messages in thread
From: David Malcolm @ 2023-04-06  0:32 UTC (permalink / raw)
  To: Benjamin Priour; +Cc: gcc

On Wed, 2023-04-05 at 19:50 +0200, Benjamin Priour wrote:
> Hi David,
> 
> I used the below code snippet to experiment with out-of-bounds (OOB)
> on
> trunk. Three things occurred that I believe could see some
> improvement. See
> https://godbolt.org/z/57n459cEs for the warnings.
> 
> int consecutive_oob_in_frame ()
> {
>     int arr[] = {1,2,3,4,5,6,7};
>     int y1 = arr[9]; // only  this one get warnings (3*2 actually),
> expect
> only 1 OOB though
>     int y2 = arr[10]; // expect a warning too, despite fooling with
> asm
>     int y3 = arr[50]; // expect a warning for that one too (see asm)
>     return (y1+y2+y3);
> }
> 
> int goo () {
>     int x = consecutive_oob_in_frame (); // causes another pair of
> warnings
>     return 2 * x;
> }
> 
> int main () {
>     goo (); // causes another pair of warning
>     consecutive_oob_in_frame (); // silent
>     int x [] = {1,2};
>     x[5]; /* silent, probably because another set of OOB warnings
>     has already been issued with this frame being the source */
>     return 0;
> }

There's quite a bit of duplication here.  My recollection is that
there's code in the analyzer that's meant to be eliminating some of
this e.g. we want to show the OOB when consecutive_oob_in_frame is
called directly; we *don't* want to show it when
consecutive_oob_in_frame is called by goo.  Perhaps this deduplication
code isn't working?  Can you reproduce similar behavior with C, or is
it specific to C++?


> 
> First, as the subject line reads, I get a
> -Wanalyzer-use-of-uninitialized-value for each -Wanalyzer-out-of-
> bounds. I
> feel it might be too much, as fixing the OOB would fix the former.
> So maybe only OOB could result in a warning here ?

Yes, that's a good point.  Can you file a bug about this in bugzilla
please?  (and feel free to assign it to yourself if you want to have a
go at fixing it)

Maybe we could fix this by having region_model::check_region_bounds
return a bool that signifies if the access is valid, and propagating
that value up through callers so that we can return a non-
poisoned_svalue at the point where we'd normally return an
"uninitialized" poisoned_svalue.

Alternatively, we could simply terminate any analysis path in which an
OOB access is detected (by implementing the
pending_diagnostic::terminate_path_p virtual function for class
out_of_bounds).

> 
> Second, it seems that if a frame was a cause for a OOB (either by
> containing the spurious code or by being a caller to such code), it
> will
> only emit one set of warning, rather than at each unique compromising
> statements.

Maybe.  There's a pending_diagnostic::supercedes_p virtual function
that perhaps we could implement for out_of_bounds (or its subclasses).

> 
> Finally, I think the diagnostic path should only go at deep as the
> declaration of the injurious index.

I'm not quite sure what you mean by this, sorry.

> 
> What do you think ? If they also make sense to you I will open a RFE
> and
> try my hands at fixing them.

Please do.

> 
> Also, have you considered extending the current call summaries model
> (symbolic execution from what you told me), to implement a partial
> VASCO
> model ? There would still be no need for distributive problems.
> Maybe a start could be that functions without parameters working on
> non-mutable global data should not generate EN more than once, rather
> than
> at each call sites.

Please forgive my ignorance; do you have a link to a paper describing
this?


Thanks
Dave


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: -Wanalyzer-use-of-uninitialized-value always shadows -Wanalyzer-out-of-bounds
  2023-04-06  0:32 ` David Malcolm
@ 2023-04-06 11:02   ` Benjamin Priour
  2023-04-06 21:59     ` David Malcolm
  0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Priour @ 2023-04-06 11:02 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 5904 bytes --]

Hi David,
I haven't yet looked into your suggestions, probably won't have time until
tomorrow actually :/
Still, here are some updates

On Thu, Apr 6, 2023 at 2:32 AM David Malcolm <dmalcolm@redhat.com> wrote:

> On Wed, 2023-04-05 at 19:50 +0200, Benjamin Priour wrote:
> > Hi David,
> >
> > I used the below code snippet to experiment with out-of-bounds (OOB)
> > on
> > trunk. Three things occurred that I believe could see some
> > improvement. See
> > https://godbolt.org/z/57n459cEs for the warnings.
> >
> > int consecutive_oob_in_frame ()
> > {
> >     int arr[] = {1,2,3,4,5,6,7};
> >     int y1 = arr[9]; // only  this one get warnings (3*2 actually),
> > expect
> > only 1 OOB though
> >     int y2 = arr[10]; // expect a warning too, despite fooling with
> > asm
> >     int y3 = arr[50]; // expect a warning for that one too (see asm)
> >     return (y1+y2+y3);
> > }
> >
> > int goo () {
> >     int x = consecutive_oob_in_frame (); // causes another pair of
> > warnings
> >     return 2 * x;
> > }
> >
> > int main () {
> >     goo (); // causes another pair of warning
> >     consecutive_oob_in_frame (); // silent
> >     int x [] = {1,2};
> >     x[5]; /* silent, probably because another set of OOB warnings
> >     has already been issued with this frame being the source */
> >     return 0;
> > }
>
>

> There's quite a bit of duplication here.  My recollection is that
> there's code in the analyzer that's meant to be eliminating some of
> this e.g. we want to show the OOB when consecutive_oob_in_frame is
> called directly; we *don't* want to show it when
> consecutive_oob_in_frame is called by goo.  Perhaps this deduplication
> code isn't working?  Can you reproduce similar behavior with C, or is
> it specific to C++?
>
>
Identical behavior both in C and C++. I will look at this code, any hint at
where it starts ?
Otherwise I would find it the good old way.


>
> >
> > First, as the subject line reads, I get a
> > -Wanalyzer-use-of-uninitialized-value for each -Wanalyzer-out-of-
> > bounds. I
> > feel it might be too much, as fixing the OOB would fix the former.
> > So maybe only OOB could result in a warning here ?
>
> Yes, that's a good point.  Can you file a bug about this in bugzilla
> please?  (and feel free to assign it to yourself if you want to have a
> go at fixing it)
>

Unfortunately the Assignee field is grayed out for me in both enter_bug.cgi
and show_bug.cgi.
I've also created a new tracker bug for out-of-bounds, as there is a number
of related bugs.


>
> Maybe we could fix this by having region_model::check_region_bounds
> return a bool that signifies if the access is valid, and propagating
> that value up through callers so that we can return a non-
> poisoned_svalue at the point where we'd normally return an
> "uninitialized" poisoned_svalue.
>
> Alternatively, we could simply terminate any analysis path in which an
> OOB access is detected (by implementing the
> pending_diagnostic::terminate_path_p virtual function for class
> out_of_bounds).
>

I'm adding your suggestions as comment to the filed bugs so as to not
forget them.


>
> >
> > Second, it seems that if a frame was a cause for a OOB (either by
> > containing the spurious code or by being a caller to such code), it
> > will
> > only emit one set of warning, rather than at each unique compromising
> > statements.
>
> Maybe.  There's a pending_diagnostic::supercedes_p virtual function
> that perhaps we could implement for out_of_bounds (or its subclasses).
>
>

> >
> > Finally, I think the diagnostic path should only go at deep as the
> > declaration of the injurious index.
>
> I'm not quite sure what you mean by this, sorry.
>
>
Indeed not the best explanation so far. I was actually sort of suggesting
to only emit OOB only on direct call sites,
you did too, so in a way you have answered me on this.

Just an addition though: if there is an OOB independent of its enclosing
function's parameters, I think
it might make sense to not emit for this particular OOB outside the
function definition itself.
Meaning that no OOB should be emitted on call sites to this function for
this particular array access.
(Typically, consecutive_oob_in_frame () shouldn't have resulted in more
than one warning, since the OOB within is independent of its parameters).

I believe right now the expected behavior is to issue warnings only on
actual function calls, so that a function never called
won't result in warnings. As a result, the initial analysis of each
functions should never results in warnings -actually the case for
malloc-leak,
not for OOB though-.
Thus we would need to tweak this into actually diagnosing the issues on
initial analysis -those that can be at least-, so that they are saved for a
later
use whenever the function is actually called. Then we would emit them once,
and only once, because by nature these diagnostics are parameters
independent.
I hope I made it clearer, not more convoluted.

>
> > Also, have you considered extending the current call summaries model
> > (symbolic execution from what you told me), to implement a partial
> > VASCO
> > model ? There would still be no need for distributive problems.
> > Maybe a start could be that functions without parameters working on
> > non-mutable global data should not generate EN more than once, rather
> > than
> > at each call sites.
>
> Please forgive my ignorance; do you have a link to a paper describing
> this?
>
>
> Thanks
> Dave
>
>
See https://dl.acm.org/doi/pdf/10.1145/2487568.2487569 and I would also
recommend watching
https://www.youtube.com/watch?v=_RI7sTxX2_o&list=PLamk8lFsMyPXrUIQm5naAQ08aK2ctv6gE&index=29
along or before the read,
it genuinely made it way more digest for me :-)

Best,
Benjamin

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: -Wanalyzer-use-of-uninitialized-value always shadows -Wanalyzer-out-of-bounds
  2023-04-06 11:02   ` Benjamin Priour
@ 2023-04-06 21:59     ` David Malcolm
  0 siblings, 0 replies; 4+ messages in thread
From: David Malcolm @ 2023-04-06 21:59 UTC (permalink / raw)
  To: Benjamin Priour; +Cc: gcc

On Thu, 2023-04-06 at 13:02 +0200, Benjamin Priour wrote:
> Hi David,
> I haven't yet looked into your suggestions, probably won't have time
> until
> tomorrow actually :/
> Still, here are some updates
> 
> On Thu, Apr 6, 2023 at 2:32 AM David Malcolm <dmalcolm@redhat.com>
> wrote:
> 
> > On Wed, 2023-04-05 at 19:50 +0200, Benjamin Priour wrote:
> > > Hi David,
> 

[...snip...]

> > There's quite a bit of duplication here.  My recollection is that
> > there's code in the analyzer that's meant to be eliminating some of
> > this e.g. we want to show the OOB when consecutive_oob_in_frame is
> > called directly; we *don't* want to show it when
> > consecutive_oob_in_frame is called by goo.  Perhaps this
> > deduplication
> > code isn't working?  Can you reproduce similar behavior with C, or
> > is
> > it specific to C++?
> > 
> > 
> Identical behavior both in C and C++. I will look at this code, any
> hint at
> where it starts ?
> Otherwise I would find it the good old way.

It's in diagnostic-manager.cc:
diagnostic_manager::emit_saved_diagnostics uses class dedupe_winners to
try to deduplicate the various saved_diagnostic instances, using
dedupe_key::operator== (and also its hash function).  One of the things
dedupe_key::operator== uses is saved_diagnostic::operator==, which does
the bulk of the work to decide "are these two really the 'same'
diagnotic?"


> > > 
> > > First, as the subject line reads, I get a
> > > -Wanalyzer-use-of-uninitialized-value for each -Wanalyzer-out-of-
> > > bounds. I
> > > feel it might be too much, as fixing the OOB would fix the
> > > former.
> > > So maybe only OOB could result in a warning here ?
> > 
> > Yes, that's a good point.  Can you file a bug about this in
> > bugzilla
> > please?  (and feel free to assign it to yourself if you want to
> > have a
> > go at fixing it)
> > 
> 
> Unfortunately the Assignee field is grayed out for me in both
> enter_bug.cgi
> and show_bug.cgi.
> I've also created a new tracker bug for out-of-bounds, as there is a
> number
> of related bugs.

I think you get more access rights in our bugzilla if you log in with a
gcc.gnu.org email address.  You can request one here:
  https://sourceware.org/cgi-bin/pdw/ps_form.cgi
and cite me as approving the request.


> > 
> > Maybe we could fix this by having region_model::check_region_bounds
> > return a bool that signifies if the access is valid, and
> > propagating
> > that value up through callers so that we can return a non-
> > poisoned_svalue at the point where we'd normally return an
> > "uninitialized" poisoned_svalue.
> > 
> > Alternatively, we could simply terminate any analysis path in which
> > an
> > OOB access is detected (by implementing the
> > pending_diagnostic::terminate_path_p virtual function for class
> > out_of_bounds).
> > 
> 
> I'm adding your suggestions as comment to the filed bugs so as to not
> forget them.

Thanks.


> > > Second, it seems that if a frame was a cause for a OOB (either by
> > > containing the spurious code or by being a caller to such code),
> > > it
> > > will
> > > only emit one set of warning, rather than at each unique
> > > compromising
> > > statements.
> > 
> > Maybe.  There's a pending_diagnostic::supercedes_p virtual function
> > that perhaps we could implement for out_of_bounds (or its
> > subclasses).
> > 
> > 
> 
> > > 
> > > Finally, I think the diagnostic path should only go at deep as
> > > the
> > > declaration of the injurious index.
> > 
> > I'm not quite sure what you mean by this, sorry.
> > 
> > 
> Indeed not the best explanation so far. I was actually sort of
> suggesting
> to only emit OOB only on direct call sites,
> you did too, so in a way you have answered me on this.
> 
> Just an addition though: if there is an OOB independent of its
> enclosing
> function's parameters, I think
> it might make sense to not emit for this particular OOB outside the
> function definition itself.
> Meaning that no OOB should be emitted on call sites to this function
> for
> this particular array access.
> (Typically, consecutive_oob_in_frame () shouldn't have resulted in
> more
> than one warning, since the OOB within is independent of its
> parameters).

Yes; I think we're in agreement here.

> 
> I believe right now the expected behavior is to issue warnings only
> on
> actual function calls, so that a function never called
> won't result in warnings. 

IIRC that was the case in GCC 10, but I changed it in
8a2750086d57d1a2251d9239fa4e6c2dc9ec3a86.

> As a result, the initial analysis of each
> functions should never results in warnings -actually the case for
> malloc-leak,
> not for OOB though-.
> Thus we would need to tweak this into actually diagnosing the issues
> on
> initial analysis -those that can be at least-, so that they are saved
> for a
> later
> use whenever the function is actually called. Then we would emit them
> once,
> and only once, because by nature these diagnostics are parameters
> independent.
> I hope I made it clearer, not more convoluted.
> 
> > 
> > > Also, have you considered extending the current call summaries
> > > model
> > > (symbolic execution from what you told me), to implement a
> > > partial
> > > VASCO
> > > model ? There would still be no need for distributive problems.
> > > Maybe a start could be that functions without parameters working
> > > on
> > > non-mutable global data should not generate EN more than once,
> > > rather
> > > than
> > > at each call sites.
> > 
> > Please forgive my ignorance; do you have a link to a paper
> > describing
> > this?
> > 
> > 
> > Thanks
> > Dave
> > 
> > 
> See https://dl.acm.org/doi/pdf/10.1145/2487568.2487569 and I would
> also
> recommend watching
> https://www.youtube.com/watch?v=_RI7sTxX2_o&list=PLamk8lFsMyPXrUIQm5naAQ08aK2ctv6gE&index=29
> along or before the read,
> it genuinely made it way more digest for me :-)

Thanks
Dave


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-04-06 21:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-05 17:50 -Wanalyzer-use-of-uninitialized-value always shadows -Wanalyzer-out-of-bounds Benjamin Priour
2023-04-06  0:32 ` David Malcolm
2023-04-06 11:02   ` Benjamin Priour
2023-04-06 21:59     ` David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).