From: Tim Lange <mail@tim-lange.me>
To: David Malcolm <dmalcolm@redhat.com>
Cc: GCC Mailing List <gcc@gcc.gnu.org>
Subject: Re: fanalyzer: debugging zero state machine
Date: Sun, 12 Jun 2022 20:27:02 +0200 [thread overview]
Message-ID: <29NDDR.H0A8YV139ST52@tim-lange.me> (raw)
In-Reply-To: <6d52e46e0ee72a9185bc9cd969e7f1553086585e.camel@redhat.com>
On Do, Jun 9 2022 at 13:40:06 -0400, David Malcolm
<dmalcolm@redhat.com> wrote:
> On Thu, 2022-06-09 at 16:49 +0200, Tim Lange wrote:
>>
>> > On Mi, Jun 8 2022 at 11:12:52 -0400, David Malcolm
>> <dmalcolm@redhat.com> wrote:
>> > > On Wed, 2022-06-08 at 01:42 +0200, Tim Lange wrote:
>> > >
>> > > Hi Dave,
>
> Hi Tim; various responses inline below...
>
>> > >
>> > > I did spent some time to think about the zero state machine. I
>> first
>> > > thought about distinguishing between "assigned zero" and "EQ 0
>> > > condition on the path" for cases where e.g. unreachable() is
>> used
>> to
>> > > say that some variable will never be zero according to the
>> programmer.
>> > > In that case the dev might not want zero warnings from
>> conditions
>> > > outside the if body itself for dev build where unreachable
>> expands
>> to
>> > > some panic exit. But as the condition constraint is not pruned
>> when the
>> > > state machine is distinguishing the states, I'm not sure how to
>> know
>> > > whether the analysis already left the if body?
>> >
>> > The analyzer works on the gimple-ssa representation, which uses
>> basic
>> > blocks in a CFG, rather than an AST, so the only remants we have
>> of
>> > scope is in "clobber" statements (a special kind of assignment
>> stmt),
>> > which the gimplify adds as variables go out of scope.
>> If the constraints only lived until the immediate dominator of `if
>> (cond)`, I could easily distinguish:
>> 1. if (x == 0) && still inside the if => zero
>> 2. if (x == 0) && outside if => maybe zero
>> but as this seems to be not the case, if I want to distinguish 1. &
>> 2.,
>> I'd have to find another way.
>> >
>> > For pruning, the analyzer's state_machine class has a
>> "can_purge_p"
>> > virtual function:
>> >
>> > /* Return true if it safe to discard the given state (to help
>> > when simplifying state objects).
>> > States that need leak detection should return false. */
>> > virtual bool can_purge_p (state_t s) const = 0;
>> >
>> > which should return true for a "zeroness" state machine, in that
>> we
>> > always consider pruning states for svalues that aren't needed
>> anymore
>> > along a path.
>> Is implemented and returns true.
>> >
>> > Is there some other kind of state explosion you're running into?
>> It's
>> > hard to figure this out further without seeing code.
>> No, my code is by far not that mature to be tested. I just had in my
>> head that I wanted to find out if I can distinguish the two cases.
>> >
>> >
>> > > Also, while trying out different things, it seems simple
>> assignments on
>> > > phi like here
>> > > int x;
>> > > if (argc == 1) {
>> > > x = 1; // x_5
>> > > } else {
>> > > x = 0; // x_4
>> > > }
>> > > printf("%i", 5 / x); // x_2
>> > > automatically work such that x_2 already inherits the state
>> from
>> > > x_4/x_5 without me doing anything inside my sm's on_phi
>> function.
>> Same
>> > > for the simple y = 0; x = y; case. Where does this happen
>> inside
>> the
>> > > code?
>> >
>> > With the caveat that I'm seeing your code, what's probably
>> happening
>> is
>> > that we have either:
>> >
>> > BB (a):
>> > x_5 = 1;
>> > goto BB (c);
>> >
>> > BB (b):
>> > x_4 = 0;
>> > goto BB (c);
>> >
>> > BB (c):
>> > x_2 = PHI (x_5 from (a), x_4 from (b));
>> I compiled it with -g, so this one is like the dumped gimple.
>> >
>> > or (at higher optimization levels):
>> >
>> > BB (a):
>> > goto BB (c);
>> >
>> > BB (b):
>> > goto BB (c);
>> >
>> > BB (c):
>> > x_2 = PHI (1 from (a), 0 from (b));
>> >
>> > and I think that at the phi node we have
>> region_model::handle_phi,
>> > which is setting x_2 to either the constant 1 or the constant 0
>> in
>> the
>> > store, and is calling the on_phi vfunc, leading to on_phi being
>> called
>> > for all state machines.
>> Thanks, that is the case. The set_value inside handle_phi seems to
>> this
>> for me.
>> >
>> > BTW, are you implementing an override for this vfunc:
>> > virtual state_machine::state_t get_default_state (const svalue *)
>> > const;
>> >
>> > to capture the inherently known zeroness/nonzeroness of
>> constant_svalue
>> > instances? That would make those constants have that state.
>> Yeah, I saw that on your nullness check. I tried it on a small
>> example
>> with and without, but didn't noticed a difference in warnings
>> (except
>> for not having zero(x_4) inside the supergraph.dot). So if I
>> understood
>> this right, this is just to have one state less for that
>> variable/value[0]?
>
> The states are stored in sm_state_map using a mapping from svalue to
> state.
>
> Given e.g. a parameter, this will be "initial_svalue (parm)", but in
> the above case where x_4 either has value 1 or has value 0, it's not
> storing a state for x_4; it's storing a state for 0 or a state for 1.
> So without implementing the get_default_state vfunc, the sm_state_maps
> will gradually aquire explicit "this is zero" or "this is nonzero" for
> all of the constants that get used, leading to an explosion of states
> for all the different combinations of constants that have been
> encountered along an execution path, which is probably not going to be
> useful. If you do implement the get_default_state vfunc for
> constants,
> then the constants will implicitly have the zero vs nonzero states,
> and
> this information won't get stored in the sm_state_map instances.
>
> program_state::can_merge_with_p compares the sm_state_map instances
> between the two program_state instances - and so if the
> get_default_state vfunc is implemented for constants, the peer
> sm_state_maps for the "zero" state machine will both be empty, and
> thus
> equal, and thus it will merge state.
>
> So it might make sense to revamp this sm_state_map comparison so that
> we consider implicit states in sm_state_maps when merging the store,
> so
> that we see that "1" and "0" have different sm-states for the "zero"
> state machine, and thus we shouldn't merge states for these. See
> binding_cluster::can_merge_p, which calls svalue::can_merge_p. It
> might make sense to have svalue::can_merge_p "know" about
> sm-state-maps
> for this.
>
> svalue::can_merge_p calls model_merger::mergeable_svalue_p, so that we
> can reject merging values that explicitly have different state-machine
> states, but that code could be reworked so that it considers whether a
> pair of svalues can be merged, using implicit state-machine states
> (and
> thus have it reject the merger of a zero with a nonzero constant).
>
> Or, whilst prototyping, you could simply hack in a rejection of
> merging
> zero vs non-zero integer values into svalue::can_merge_p.
>
>>
>> If that is right, is it also favorable to "merge" the stop state and
>> non_zero state inside the zero state machine because - for now -
>> there
>> is no warning planned on non-zero values?
>>
>> [0] less states are favorable because then the analyzer maybe has
>> less
>> different enodes to visit and thus less runtime(?)
>> >
>> > Thanks
>> > Dave
>> >
>> > > - Tim
>>
>> Also another question unrelated to the ones before. I do have a
>> weird
>> bug in my zero sm[1] but I'm unsure where my sm is flawed. Take for
>> example, the probably simplest case:
>> int x = 0;h
>> printf("%d", 42 / x);
>> If I use inform to emit a notice for my state machine result, it
>> seems
>> to be correct and I do get following traversal order (by printing
>> the
>> gimple stmt inside on_stmt):
>> x_2 = 0;
>> _1 = 42 % x_2;
>> # .MEM_4 = VDEF <.MEM_3(D)>
>> printf ("%d", _1);
>> _5 = 0;
>> <L0>:
>> # VUSE <.MEM_4>
>> return _5;
>> But if i use my zero_diagnostics and sm_ctxt->warn to emit the
>> warning,
>> I get an unexpected traversal order of:
>> x_2 = 0;
>> _1 = 42 / x_2;
>> # .MEM_4 = VDEF <.MEM_3(D)>
>> printf ("%d", _1);
>> _5 = 0;
>> <L0>:
>> # VUSE <.MEM_4>
>> return _5;
>> x_2 = 0; <-- why do I get these stmts again but they
>> are not as
>> duplicated in the e-supergraph?
>> _1 = 42 / x_2;
>> _5 = 0;
>> _1 = 42 / x_2;
>> _5 = 0;
>
> What exactly is your test source file?
It is just the two lines wrapped into main():
#include <stdio.h>
int main(int argc, char **argv) {
int x = 0;
printf("%d", 42 / x);
return 0;
}
>
> Note that a function being called by another function will get
> explored
> in the exploded_graph twice: once being analyzed "standalone", and
> again being analyzed as called from the other function.
>
> When you say "I get these stmts again", do you mean the state machine
> code is seeing them again, or the region model code?
In the state_machine::on_stmt function.
>
> sm_ctxt->warn (new foo)
>
> will save an instance of a pending_diagnostic into the
> diagnotic_manager, associating it with a particular exploded_node (and
> some other info).
>
> After the exploded_graph has been built, the diagnostic_manager does
> some deduplication work, and then tries to find a feasible path to
> each
> pending_diagnostic's exploded node.
>
> This feasibility analysis reruns much of the region_model code that
> was
> run when building the exploded graph. So that's another way you can
> "see" stmts repeatedly.
Okay, that probably explains it. I had two distinct problems where I
thought they might be both related to seeing the statements again:
| 13 | int x = 0;
| | ^
| | |
| | (1) set to zero here
| | (3) set to zero here
| 14 | printf("%d", 42 / x);
| | ~~~~~~~~~~~~~~~~~~~~
| | |
| | (4) division by zero
| 15 |
| 16 | return 0;
| | ~
| | |
| | (2) set to zero here
| | (5) set to zero here
* In line 13, there are two state changes while I'd expect only one.
=> I re-looked at the other state machines and they transfer to the
stop state after the warning. If I do the same, my problem in line 13
is fixed.
* The other problem in line 16 is that return 0 is not relevant toward
the division by zero.
=> After looking at the diagnostic code, I think those should be
filtered out but aren't because x (in gimple x_2) and return 0 (in
gimple _5) share the (int)0 state. Inside
diagnostic_manager::prune_for_sm_diagnostic there is a check that the
state_change->m_sval is equal to the sval of the saved_diagnostic,
which is not enough to filter unrelated events in the zero state
machine case.
I tried to filter the event by comparing the state of interest but that
doesn't work if the unrelated event is after the related event on the
path.
Similarly, I tried to add a new field m_var to state_change_event and
provide this as a comparison but in most functions that call the
constructor of state_change_event, the var is not available.
Is there is any easy way to filter these unrelated events without
considerable changes [1]?
By the way, I can trigger the same behavior on the malloc null_deref
with gcc version 12.1.1 shipped in Fedora 36. Assume this simple
program:
#include <stddef.h>
int main (void)
{
int *p = NULL;
*p = 42;
int *q = NULL;
return 0;
}
results in the following warning:
/home/tim/Projects/simple_c/main.c: In function ‘main’:
/home/tim/Projects/simple_c/main.c:12:6: warning: dereference of NULL
‘p’ [CWE-476] [-Wanalyzer-null-dereference]
12 | *p = 42;
| ~~~^~~~
‘main’: events 1-4
|
| 11 | int *p = NULL;
| | ^
| | |
| | (1) ‘p’ is NULL
| 12 | *p = 42;
| | ~~~~~~~
| | |
| | (4) dereference of NULL ‘p’
| 13 |
| 14 | int *q = NULL;
| | ~
| | |
| | (2) ‘p’ is NULL
| | (3) ‘p’ is NULL
>
>
>> I tracked the cause down the call stack to
>> m_saved_diagnostics.safe_push (sd); inside
>> diagnostic_manager::add_diagnostic. For whatever reason, the pushing
>> of
>> diagnostics bricks the zero sm. It might be a embarrassing error on
>> my
>> side, but I'm stuck on this and can't seem to find what I'm doing
>> wrong.
>
> FWIW, assuming you're using the gdb support scripts:
>
> https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html#support-scripts-for-gdb
> there's a handy:
>
> (gdb) break-on-saved-diagnostic
>
> command which adds a breakpoint on the analyzer's diagnostic_manager
> saving a pending_diagnostic.
>
> You might also what to try -fdump-analyzer-stderr, which will dump
> lots
> of information on what the analyzer is doing to stderr, which although
> verbose is usually very helpful in figuring out why the analyzer is
> doing something.
>
> Hope this is helpful
Definitely helped me understand more of the internals!
- Tim
> Dave
>
>
>
>>
>> - Tim
>>
>> [1]
>>
>>
>> https://github.com/timll/gcc/blob/castsize/gcc/analyzer/sm-zero.cc#L181
>>
>>
>
>
next prev parent reply other threads:[~2022-06-12 18:27 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-09 14:49 Tim Lange
2022-06-09 17:40 ` David Malcolm
2022-06-12 18:27 ` Tim Lange [this message]
2022-06-13 19:25 ` David Malcolm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=29NDDR.H0A8YV139ST52@tim-lange.me \
--to=mail@tim-lange.me \
--cc=dmalcolm@redhat.com \
--cc=gcc@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).