From: David Malcolm <dmalcolm@redhat.com>
To: Tim Lange <mail@tim-lange.me>
Cc: GCC Mailing List <gcc@gcc.gnu.org>
Subject: Re: fanalyzer: debugging zero state machine
Date: Mon, 13 Jun 2022 15:25:27 -0400 [thread overview]
Message-ID: <f772148cbb6be38356ec86a84f94d566f66410c6.camel@redhat.com> (raw)
In-Reply-To: <29NDDR.H0A8YV139ST52@tim-lange.me>
On Sun, 2022-06-12 at 20:27 +0200, Tim Lange wrote:
>
>
> On Do, Jun 9 2022 at 13:40:06 -0400, David Malcolm
> <dmalcolm@redhat.com> wrote:
> > On Thu, 2022-06-09 at 16:49 +0200, Tim Lange wrote:
> > >
> > > > On Mi, Jun 8 2022 at 11:12:52 -0400, David Malcolm
> > > <dmalcolm@redhat.com> wrote:
> > > > > On Wed, 2022-06-08 at 01:42 +0200, Tim Lange wrote:
> > > > >
> > > > > Hi Dave,
> >
> > Hi Tim; various responses inline below...
> >
> > > > >
> > > > > I did spent some time to think about the zero state machine.
> > > I
> > > first
> > > > > thought about distinguishing between "assigned zero" and "EQ
> > > 0
> > > > > condition on the path" for cases where e.g. unreachable() is
> > > used
> > > to
> > > > > say that some variable will never be zero according to the
> > > programmer.
> > > > > In that case the dev might not want zero warnings from
> > > conditions
> > > > > outside the if body itself for dev build where unreachable
> > > expands
> > > to
> > > > > some panic exit. But as the condition constraint is not
> > > pruned
> > > when the
> > > > > state machine is distinguishing the states, I'm not sure how
> > > to
> > > know
> > > > > whether the analysis already left the if body?
> > > >
> > > > The analyzer works on the gimple-ssa representation, which uses
> > > basic
> > > > blocks in a CFG, rather than an AST, so the only remants we
> > > have
> > > of
> > > > scope is in "clobber" statements (a special kind of assignment
> > > stmt),
> > > > which the gimplify adds as variables go out of scope.
> > > If the constraints only lived until the immediate dominator of `if
> > > (cond)`, I could easily distinguish:
> > > 1. if (x == 0) && still inside the if => zero
> > > 2. if (x == 0) && outside if => maybe zero
> > > but as this seems to be not the case, if I want to distinguish 1.
> > > &
> > > 2.,
> > > I'd have to find another way.
> > > >
> > > > For pruning, the analyzer's state_machine class has a
> > > "can_purge_p"
> > > > virtual function:
> > > >
> > > > /* Return true if it safe to discard the given state (to help
> > > > when simplifying state objects).
> > > > States that need leak detection should return false. */
> > > > virtual bool can_purge_p (state_t s) const = 0;
> > > >
> > > > which should return true for a "zeroness" state machine, in
> > > that
> > > we
> > > > always consider pruning states for svalues that aren't needed
> > > anymore
> > > > along a path.
> > > Is implemented and returns true.
> > > >
> > > > Is there some other kind of state explosion you're running
> > > into?
> > > It's
> > > > hard to figure this out further without seeing code.
> > > No, my code is by far not that mature to be tested. I just had in
> > > my
> > > head that I wanted to find out if I can distinguish the two cases.
> > > >
> > > >
> > > > > Also, while trying out different things, it seems simple
> > > assignments on
> > > > > phi like here
> > > > > int x;
> > > > > if (argc == 1) {
> > > > > x = 1; // x_5
> > > > > } else {
> > > > > x = 0; // x_4
> > > > > }
> > > > > printf("%i", 5 / x); // x_2
> > > > > automatically work such that x_2 already inherits the state
> > > from
> > > > > x_4/x_5 without me doing anything inside my sm's on_phi
> > > function.
> > > Same
> > > > > for the simple y = 0; x = y; case. Where does this happen
> > > inside
> > > the
> > > > > code?
> > > >
> > > > With the caveat that I'm seeing your code, what's probably
> > > happening
> > > is
> > > > that we have either:
> > > >
> > > > BB (a):
> > > > x_5 = 1;
> > > > goto BB (c);
> > > >
> > > > BB (b):
> > > > x_4 = 0;
> > > > goto BB (c);
> > > >
> > > > BB (c):
> > > > x_2 = PHI (x_5 from (a), x_4 from (b));
> > > I compiled it with -g, so this one is like the dumped gimple.
> > > >
> > > > or (at higher optimization levels):
> > > >
> > > > BB (a):
> > > > goto BB (c);
> > > >
> > > > BB (b):
> > > > goto BB (c);
> > > >
> > > > BB (c):
> > > > x_2 = PHI (1 from (a), 0 from (b));
> > > >
> > > > and I think that at the phi node we have
> > > region_model::handle_phi,
> > > > which is setting x_2 to either the constant 1 or the constant 0
> > > in
> > > the
> > > > store, and is calling the on_phi vfunc, leading to on_phi being
> > > called
> > > > for all state machines.
> > > Thanks, that is the case. The set_value inside handle_phi seems to
> > > this
> > > for me.
> > > >
> > > > BTW, are you implementing an override for this vfunc:
> > > > virtual state_machine::state_t get_default_state (const svalue
> > > *)
> > > > const;
> > > >
> > > > to capture the inherently known zeroness/nonzeroness of
> > > constant_svalue
> > > > instances? That would make those constants have that state.
> > > Yeah, I saw that on your nullness check. I tried it on a small
> > > example
> > > with and without, but didn't noticed a difference in warnings
> > > (except
> > > for not having zero(x_4) inside the supergraph.dot). So if I
> > > understood
> > > this right, this is just to have one state less for that
> > > variable/value[0]?
> >
> > The states are stored in sm_state_map using a mapping from svalue to
> > state.
> >
> > Given e.g. a parameter, this will be "initial_svalue (parm)", but in
> > the above case where x_4 either has value 1 or has value 0, it's not
> > storing a state for x_4; it's storing a state for 0 or a state for 1.
> > So without implementing the get_default_state vfunc, the
> > sm_state_maps
> > will gradually aquire explicit "this is zero" or "this is nonzero"
> > for
> > all of the constants that get used, leading to an explosion of states
> > for all the different combinations of constants that have been
> > encountered along an execution path, which is probably not going to
> > be
> > useful. If you do implement the get_default_state vfunc for
> > constants,
> > then the constants will implicitly have the zero vs nonzero states,
> > and
> > this information won't get stored in the sm_state_map instances.
> >
> > program_state::can_merge_with_p compares the sm_state_map instances
> > between the two program_state instances - and so if the
> > get_default_state vfunc is implemented for constants, the peer
> > sm_state_maps for the "zero" state machine will both be empty, and
> > thus
> > equal, and thus it will merge state.
> >
> > So it might make sense to revamp this sm_state_map comparison so that
> > we consider implicit states in sm_state_maps when merging the store,
> > so
> > that we see that "1" and "0" have different sm-states for the "zero"
> > state machine, and thus we shouldn't merge states for these. See
> > binding_cluster::can_merge_p, which calls svalue::can_merge_p. It
> > might make sense to have svalue::can_merge_p "know" about
> > sm-state-maps
> > for this.
> >
> > svalue::can_merge_p calls model_merger::mergeable_svalue_p, so that
> > we
> > can reject merging values that explicitly have different state-
> > machine
> > states, but that code could be reworked so that it considers whether
> > a
> > pair of svalues can be merged, using implicit state-machine states
> > (and
> > thus have it reject the merger of a zero with a nonzero constant).
> >
> > Or, whilst prototyping, you could simply hack in a rejection of
> > merging
> > zero vs non-zero integer values into svalue::can_merge_p.
> >
> > >
> > > If that is right, is it also favorable to "merge" the stop state
> > > and
> > > non_zero state inside the zero state machine because - for now -
> > > there
> > > is no warning planned on non-zero values?
> > >
> > > [0] less states are favorable because then the analyzer maybe has
> > > less
> > > different enodes to visit and thus less runtime(?)
> > > >
> > > > Thanks
> > > > Dave
> > > >
> > > > > - Tim
> > >
> > > Also another question unrelated to the ones before. I do have a
> > > weird
> > > bug in my zero sm[1] but I'm unsure where my sm is flawed. Take
> > > for
> > > example, the probably simplest case:
> > > int x = 0;h
> > > printf("%d", 42 / x);
> > > If I use inform to emit a notice for my state machine result, it
> > > seems
> > > to be correct and I do get following traversal order (by printing
> > > the
> > > gimple stmt inside on_stmt):
> > > x_2 = 0;
> > > _1 = 42 % x_2;
> > > # .MEM_4 = VDEF <.MEM_3(D)>
> > > printf ("%d", _1);
> > > _5 = 0;
> > > <L0>:
> > > # VUSE <.MEM_4>
> > > return _5;
> > > But if i use my zero_diagnostics and sm_ctxt->warn to emit the
> > > warning,
> > > I get an unexpected traversal order of:
> > > x_2 = 0;
> > > _1 = 42 / x_2;
> > > # .MEM_4 = VDEF <.MEM_3(D)>
> > > printf ("%d", _1);
> > > _5 = 0;
> > > <L0>:
> > > # VUSE <.MEM_4>
> > > return _5;
> > > x_2 = 0; <-- why do I get these stmts again but
> > > they
> > > are not as
> > > duplicated in the e-supergraph?
> > > _1 = 42 / x_2;
> > > _5 = 0;
> > > _1 = 42 / x_2;
> > > _5 = 0;
> >
> > What exactly is your test source file?
> It is just the two lines wrapped into main():
> #include <stdio.h>
> int main(int argc, char **argv) {
> int x = 0;
> printf("%d", 42 / x);
> return 0;
> }
> >
> > Note that a function being called by another function will get
> > explored
> > in the exploded_graph twice: once being analyzed "standalone", and
> > again being analyzed as called from the other function.
> >
> > When you say "I get these stmts again", do you mean the state machine
> > code is seeing them again, or the region model code?
> In the state_machine::on_stmt function.
> >
> > sm_ctxt->warn (new foo)
> >
> > will save an instance of a pending_diagnostic into the
> > diagnotic_manager, associating it with a particular exploded_node
> > (and
> > some other info).
> >
> > After the exploded_graph has been built, the diagnostic_manager does
> > some deduplication work, and then tries to find a feasible path to
> > each
> > pending_diagnostic's exploded node.
> >
> > This feasibility analysis reruns much of the region_model code that
> > was
> > run when building the exploded graph. So that's another way you can
> > "see" stmts repeatedly.
> Okay, that probably explains it. I had two distinct problems where I
> thought they might be both related to seeing the statements again:
> | 13 | int x = 0;
> | | ^
> | | |
> | | (1) set to zero here
> | | (3) set to zero here
> | 14 | printf("%d", 42 / x);
> | | ~~~~~~~~~~~~~~~~~~~~
> | | |
> | | (4) division by zero
> | 15 |
> | 16 | return 0;
> | | ~
> | | |
> | | (2) set to zero here
> | | (5) set to zero here
Aha: when you say "seeing the statements again", you're referring to
the diagnostic_path that's emitted.
gcc/analyzer/diagnostic-manager.cc finds a suitable path through the
exploded graph for the warning, and then walks the path, generating a
list of checker_event instances. It then "optimizes" this list of
checker_events, and the result is what then gets visualized in ASCII
art.
Something really weird's happening in the above.
The way diagnostic-manager.cc generates events could definitely be
improved - it tries to track the state of interest, and thus generate
events that relate to it, but it only knows about state machine
changes, not region_model changes, and it doesn't do a good job of the
former.
I've got some ideas for rewriting it which I'm hoping to get to later
this month, once I finish my work on SARIF.
Basically, my plan is to update the exploded_graph exploration so that
I capture a list of state changes on each edge, capturing the sequence
of how each source state became each destination state, in a reversible
way, so that for any given diagnostic I hope we'll be able to walk
backwards along a graph and see what the changes were, and where they
came from.
Given that, I think it's best if you put your state machine work on
hold for now, and focus on the other tests you were interested in.
FWIW, I just added some more ideas to the tracker bug here:
https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=105887&hide_resolved=1
at least one of which looks fairly easy (105947).
> * In line 13, there are two state changes while I'd expect only one.
> => I re-looked at the other state machines and they transfer to the
> stop state after the warning. If I do the same, my problem in line 13
> is fixed.
> * The other problem in line 16 is that return 0 is not relevant toward
> the division by zero.
> => After looking at the diagnostic code, I think those should be
> filtered out but aren't because x (in gimple x_2) and return 0 (in
> gimple _5) share the (int)0 state. Inside
> diagnostic_manager::prune_for_sm_diagnostic there is a check that the
> state_change->m_sval is equal to the sval of the saved_diagnostic,
> which is not enough to filter unrelated events in the zero state
> machine case.
I don't know that it makes much sense to track the states on constants;
it makes more sense for things like the initial_value (x), if x were a
parameter.
It may be that the current implementation of state machines in the
analyzer just isn't a good fit for this test.
> I tried to filter the event by comparing the state of interest but that
> doesn't work if the unrelated event is after the related event on the
> path.
> Similarly, I tried to add a new field m_var to state_change_event and
> provide this as a comparison but in most functions that call the
> constructor of state_change_event, the var is not available.
>
> Is there is any easy way to filter these unrelated events without
> considerable changes [1]?
>
> By the way, I can trigger the same behavior on the malloc null_deref
> with gcc version 12.1.1 shipped in Fedora 36. Assume this simple
> program:
> #include <stddef.h>
> int main (void)
> {
> int *p = NULL;
> *p = 42;
>
> int *q = NULL;
>
> return 0;
> }
> results in the following warning:
> /home/tim/Projects/simple_c/main.c: In function ‘main’:
> /home/tim/Projects/simple_c/main.c:12:6: warning: dereference of NULL
> ‘p’ [CWE-476] [-Wanalyzer-null-dereference]
> 12 | *p = 42;
> | ~~~^~~~
> ‘main’: events 1-4
> |
> | 11 | int *p = NULL;
> | | ^
> | | |
> | | (1) ‘p’ is NULL
> | 12 | *p = 42;
> | | ~~~~~~~
> | | |
> | | (4) dereference of NULL ‘p’
> | 13 |
> | 14 | int *q = NULL;
> | | ~
> | | |
> | | (2) ‘p’ is NULL
> | | (3) ‘p’ is NULL
>
That definitely looks like a bug in the code I was talking about
earlier.
I've filed this as:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105958
Thanks
Dave
> >
> >
> > > I tracked the cause down the call stack to
> > > m_saved_diagnostics.safe_push (sd); inside
> > > diagnostic_manager::add_diagnostic. For whatever reason, the
> > > pushing
> > > of
> > > diagnostics bricks the zero sm. It might be a embarrassing error
> > > on
> > > my
> > > side, but I'm stuck on this and can't seem to find what I'm doing
> > > wrong.
> >
> > FWIW, assuming you're using the gdb support scripts:
> >
> >
> > https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html#support-scripts-for-gdb
> > there's a handy:
> >
> > (gdb) break-on-saved-diagnostic
> >
> > command which adds a breakpoint on the analyzer's diagnostic_manager
> > saving a pending_diagnostic.
> >
> > You might also what to try -fdump-analyzer-stderr, which will dump
> > lots
> > of information on what the analyzer is doing to stderr, which
> > although
> > verbose is usually very helpful in figuring out why the analyzer is
> > doing something.
> >
> > Hope this is helpful
> Definitely helped me understand more of the internals!
>
> - Tim
> > Dave
> >
> >
> >
> > >
> > > - Tim
> > >
> > > [1]
> > >
> > >
> > >
> > > https://github.com/timll/gcc/blob/castsize/gcc/analyzer/sm-zero.cc#L181
> > >
> > >
> >
> >
>
>
prev parent reply other threads:[~2022-06-13 19:25 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-09 14:49 Tim Lange
2022-06-09 17:40 ` David Malcolm
2022-06-12 18:27 ` Tim Lange
2022-06-13 19:25 ` David Malcolm [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f772148cbb6be38356ec86a84f94d566f66410c6.camel@redhat.com \
--to=dmalcolm@redhat.com \
--cc=gcc@gcc.gnu.org \
--cc=mail@tim-lange.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).