fanalyzer: debugging zero state machine

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* fanalyzer: debugging zero state machine
@ 2022-06-09 14:49 Tim Lange
  2022-06-09 17:40 ` David Malcolm
  0 siblings, 1 reply; 4+ messages in thread
From: Tim Lange @ 2022-06-09 14:49 UTC (permalink / raw)
  To: dmalcolm, GCC Mailing List


 > On Mi, Jun 8 2022 at 11:12:52 -0400, David Malcolm 
<dmalcolm@redhat.com> wrote:
 > > On Wed, 2022-06-08 at 01:42 +0200, Tim Lange wrote:
 > >
 > > Hi Dave,
 > >
 > > I did spent some time to think about the zero state machine. I 
first
 > > thought about distinguishing between "assigned zero" and "EQ 0
 > > condition on the path" for cases where e.g. unreachable() is used 
to
 > > say that some variable will never be zero according to the 
programmer.
 > > In that case the dev might not want zero warnings from conditions
 > > outside the if body itself for dev build where unreachable expands 
to
 > > some panic exit. But as the condition constraint is not pruned 
when the
 > > state machine is distinguishing the states, I'm not sure how to 
know
 > > whether the analysis already left the if body?
 >
 > The analyzer works on the gimple-ssa representation, which uses basic
 > blocks in a CFG, rather than an AST, so the only remants we have of
 > scope is in "clobber" statements (a special kind of assignment stmt),
 > which the gimplify adds as variables go out of scope.
If the constraints only lived until the immediate dominator of `if 
(cond)`, I could easily distinguish:
1. if (x == 0) && still inside the if => zero
2. if (x == 0) && outside if => maybe zero
but as this seems to be not the case, if I want to distinguish 1. & 2., 
I'd have to find another way.
 >
 > For pruning, the analyzer's state_machine class has a "can_purge_p"
 > virtual function:
 >
 > /* Return true if it safe to discard the given state (to help
 > when simplifying state objects).
 > States that need leak detection should return false. */
 > virtual bool can_purge_p (state_t s) const = 0;
 >
 > which should return true for a "zeroness" state machine, in that we
 > always consider pruning states for svalues that aren't needed anymore
 > along a path.
Is implemented and returns true.
 >
 > Is there some other kind of state explosion you're running into? It's
 > hard to figure this out further without seeing code.
No, my code is by far not that mature to be tested. I just had in my 
head that I wanted to find out if I can distinguish the two cases.
 >
 >
 > > Also, while trying out different things, it seems simple 
assignments on
 > > phi like here
 > > int x;
 > > if (argc == 1) {
 > > x = 1; // x_5
 > > } else {
 > > x = 0; // x_4
 > > }
 > > printf("%i", 5 / x); // x_2
 > > automatically work such that x_2 already inherits the state from
 > > x_4/x_5 without me doing anything inside my sm's on_phi function. 
Same
 > > for the simple y = 0; x = y; case. Where does this happen inside 
the
 > > code?
 >
 > With the caveat that I'm seeing your code, what's probably happening 
is
 > that we have either:
 >
 > BB (a):
 > x_5 = 1;
 > goto BB (c);
 >
 > BB (b):
 > x_4 = 0;
 > goto BB (c);
 >
 > BB (c):
 > x_2 = PHI (x_5 from (a), x_4 from (b));
I compiled it with -g, so this one is like the dumped gimple.
 >
 > or (at higher optimization levels):
 >
 > BB (a):
 > goto BB (c);
 >
 > BB (b):
 > goto BB (c);
 >
 > BB (c):
 > x_2 = PHI (1 from (a), 0 from (b));
 >
 > and I think that at the phi node we have region_model::handle_phi,
 > which is setting x_2 to either the constant 1 or the constant 0 in 
the
 > store, and is calling the on_phi vfunc, leading to on_phi being 
called
 > for all state machines.
Thanks, that is the case. The set_value inside handle_phi seems to this 
for me.
 >
 > BTW, are you implementing an override for this vfunc:
 > virtual state_machine::state_t get_default_state (const svalue *)
 > const;
 >
 > to capture the inherently known zeroness/nonzeroness of 
constant_svalue
 > instances? That would make those constants have that state.
Yeah, I saw that on your nullness check. I tried it on a small example 
with and without, but didn't noticed a difference in warnings (except 
for not having zero(x_4) inside the supergraph.dot). So if I understood 
this right, this is just to have one state less for that 
variable/value[0]?

If that is right, is it also favorable to "merge" the stop state and 
non_zero state inside the zero state machine because - for now - there 
is no warning planned on non-zero values?

[0] less states are favorable because then the analyzer maybe has less 
different enodes to visit and thus less runtime(?)
 >
 > Thanks
 > Dave
 >
 > > - Tim

Also another question unrelated to the ones before. I do have a weird 
bug in my zero sm[1] but I'm unsure where my sm is flawed. Take for 
example, the probably simplest case:
  int x = 0;h
  printf("%d", 42 / x);
If I use inform to emit a notice for my state machine result, it seems 
to be correct and I do get following traversal order (by printing the 
gimple stmt inside on_stmt):
  x_2 = 0;
  _1 = 42 % x_2;
  # .MEM_4 = VDEF <.MEM_3(D)>
  printf ("%d", _1);
  _5 = 0;
  <L0>:
  # VUSE <.MEM_4>
  return _5;
But if i use my zero_diagnostics and sm_ctxt->warn to emit the warning, 
I get an unexpected traversal order of:
  x_2 = 0;
  _1 = 42 / x_2;
  # .MEM_4 = VDEF <.MEM_3(D)>
  printf ("%d", _1);
  _5 = 0;
  <L0>:
  # VUSE <.MEM_4>
  return _5;
  x_2 = 0; 		<-- why do I get these stmts again but they are not as 
duplicated in the e-supergraph?
  _1 = 42 / x_2;
  _5 = 0;
  _1 = 42 / x_2;
  _5 = 0;
I tracked the cause down the call stack to 
m_saved_diagnostics.safe_push (sd); inside 
diagnostic_manager::add_diagnostic. For whatever reason, the pushing of 
diagnostics bricks the zero sm. It might be a embarrassing error on my 
side, but I'm stuck on this and can't seem to find what I'm doing wrong.

- Tim

[1] 
https://github.com/timll/gcc/blob/castsize/gcc/analyzer/sm-zero.cc#L181



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: fanalyzer: debugging zero state machine
  2022-06-09 14:49 fanalyzer: debugging zero state machine Tim Lange
@ 2022-06-09 17:40 ` David Malcolm
  2022-06-12 18:27   ` Tim Lange
  0 siblings, 1 reply; 4+ messages in thread
From: David Malcolm @ 2022-06-09 17:40 UTC (permalink / raw)
  To: Tim Lange, GCC Mailing List

On Thu, 2022-06-09 at 16:49 +0200, Tim Lange wrote:
> 
>  > On Mi, Jun 8 2022 at 11:12:52 -0400, David Malcolm 
> <dmalcolm@redhat.com> wrote:
>  > > On Wed, 2022-06-08 at 01:42 +0200, Tim Lange wrote:
>  > >
>  > > Hi Dave,

Hi Tim; various responses inline below...

>  > >
>  > > I did spent some time to think about the zero state machine. I 
> first
>  > > thought about distinguishing between "assigned zero" and "EQ 0
>  > > condition on the path" for cases where e.g. unreachable() is
> used 
> to
>  > > say that some variable will never be zero according to the 
> programmer.
>  > > In that case the dev might not want zero warnings from
> conditions
>  > > outside the if body itself for dev build where unreachable
> expands 
> to
>  > > some panic exit. But as the condition constraint is not pruned 
> when the
>  > > state machine is distinguishing the states, I'm not sure how to 
> know
>  > > whether the analysis already left the if body?
>  >
>  > The analyzer works on the gimple-ssa representation, which uses
> basic
>  > blocks in a CFG, rather than an AST, so the only remants we have
> of
>  > scope is in "clobber" statements (a special kind of assignment
> stmt),
>  > which the gimplify adds as variables go out of scope.
> If the constraints only lived until the immediate dominator of `if 
> (cond)`, I could easily distinguish:
> 1. if (x == 0) && still inside the if => zero
> 2. if (x == 0) && outside if => maybe zero
> but as this seems to be not the case, if I want to distinguish 1. &
> 2., 
> I'd have to find another way.
>  >
>  > For pruning, the analyzer's state_machine class has a
> "can_purge_p"
>  > virtual function:
>  >
>  > /* Return true if it safe to discard the given state (to help
>  > when simplifying state objects).
>  > States that need leak detection should return false. */
>  > virtual bool can_purge_p (state_t s) const = 0;
>  >
>  > which should return true for a "zeroness" state machine, in that
> we
>  > always consider pruning states for svalues that aren't needed
> anymore
>  > along a path.
> Is implemented and returns true.
>  >
>  > Is there some other kind of state explosion you're running into?
> It's
>  > hard to figure this out further without seeing code.
> No, my code is by far not that mature to be tested. I just had in my 
> head that I wanted to find out if I can distinguish the two cases.
>  >
>  >
>  > > Also, while trying out different things, it seems simple 
> assignments on
>  > > phi like here
>  > > int x;
>  > > if (argc == 1) {
>  > > x = 1; // x_5
>  > > } else {
>  > > x = 0; // x_4
>  > > }
>  > > printf("%i", 5 / x); // x_2
>  > > automatically work such that x_2 already inherits the state from
>  > > x_4/x_5 without me doing anything inside my sm's on_phi
> function. 
> Same
>  > > for the simple y = 0; x = y; case. Where does this happen inside
> the
>  > > code?
>  >
>  > With the caveat that I'm seeing your code, what's probably
> happening 
> is
>  > that we have either:
>  >
>  > BB (a):
>  > x_5 = 1;
>  > goto BB (c);
>  >
>  > BB (b):
>  > x_4 = 0;
>  > goto BB (c);
>  >
>  > BB (c):
>  > x_2 = PHI (x_5 from (a), x_4 from (b));
> I compiled it with -g, so this one is like the dumped gimple.
>  >
>  > or (at higher optimization levels):
>  >
>  > BB (a):
>  > goto BB (c);
>  >
>  > BB (b):
>  > goto BB (c);
>  >
>  > BB (c):
>  > x_2 = PHI (1 from (a), 0 from (b));
>  >
>  > and I think that at the phi node we have region_model::handle_phi,
>  > which is setting x_2 to either the constant 1 or the constant 0 in
> the
>  > store, and is calling the on_phi vfunc, leading to on_phi being 
> called
>  > for all state machines.
> Thanks, that is the case. The set_value inside handle_phi seems to
> this 
> for me.
>  >
>  > BTW, are you implementing an override for this vfunc:
>  > virtual state_machine::state_t get_default_state (const svalue *)
>  > const;
>  >
>  > to capture the inherently known zeroness/nonzeroness of 
> constant_svalue
>  > instances? That would make those constants have that state.
> Yeah, I saw that on your nullness check. I tried it on a small
> example 
> with and without, but didn't noticed a difference in warnings (except
> for not having zero(x_4) inside the supergraph.dot). So if I
> understood 
> this right, this is just to have one state less for that 
> variable/value[0]?

The states are stored in sm_state_map using a mapping from svalue to
state.

Given e.g. a parameter, this will be "initial_svalue (parm)", but in
the above case where x_4 either has value 1 or has value 0, it's not
storing a state for x_4; it's storing a state for 0 or a state for 1. 
So without implementing the get_default_state vfunc, the sm_state_maps
will gradually aquire explicit "this is zero" or "this is nonzero" for
all of the constants that get used, leading to an explosion of states
for all the different combinations of constants that have been
encountered along an execution path, which is probably not going to be
useful.  If you do implement the get_default_state vfunc for constants,
then the constants will implicitly have the zero vs nonzero states, and
this information won't get stored in the sm_state_map instances.

program_state::can_merge_with_p compares the sm_state_map instances
between the two program_state instances - and so if the
get_default_state vfunc is implemented for constants, the peer
sm_state_maps for the "zero" state machine will both be empty, and thus
equal, and thus it will merge state.

So it might make sense to revamp this sm_state_map comparison so that
we consider implicit states in sm_state_maps when merging the store, so
that we see that "1" and "0" have different sm-states for the "zero"
state machine, and thus we shouldn't merge states for these.  See
binding_cluster::can_merge_p, which calls svalue::can_merge_p.  It
might make sense to have svalue::can_merge_p "know" about sm-state-maps
for this.  

svalue::can_merge_p calls model_merger::mergeable_svalue_p, so that we
can reject merging values that explicitly have different state-machine
states, but that code could be reworked so that it considers whether a
pair of svalues can be merged, using implicit state-machine states (and
thus have it reject the merger of a zero with a nonzero constant).

Or, whilst prototyping, you could simply hack in a rejection of merging
zero vs non-zero integer values into svalue::can_merge_p.

> 
> If that is right, is it also favorable to "merge" the stop state and 
> non_zero state inside the zero state machine because - for now -
> there 
> is no warning planned on non-zero values?
> 
> [0] less states are favorable because then the analyzer maybe has
> less 
> different enodes to visit and thus less runtime(?)
>  >
>  > Thanks
>  > Dave
>  >
>  > > - Tim
> 
> Also another question unrelated to the ones before. I do have a weird
> bug in my zero sm[1] but I'm unsure where my sm is flawed. Take for 
> example, the probably simplest case:
>   int x = 0;h
>   printf("%d", 42 / x);
> If I use inform to emit a notice for my state machine result, it
> seems 
> to be correct and I do get following traversal order (by printing the
> gimple stmt inside on_stmt):
>   x_2 = 0;
>   _1 = 42 % x_2;
>   # .MEM_4 = VDEF <.MEM_3(D)>
>   printf ("%d", _1);
>   _5 = 0;
>   <L0>:
>   # VUSE <.MEM_4>
>   return _5;
> But if i use my zero_diagnostics and sm_ctxt->warn to emit the
> warning, 
> I get an unexpected traversal order of:
>   x_2 = 0;
>   _1 = 42 / x_2;
>   # .MEM_4 = VDEF <.MEM_3(D)>
>   printf ("%d", _1);
>   _5 = 0;
>   <L0>:
>   # VUSE <.MEM_4>
>   return _5;
>   x_2 = 0;              <-- why do I get these stmts again but they
> are not as 
> duplicated in the e-supergraph?
>   _1 = 42 / x_2;
>   _5 = 0;
>   _1 = 42 / x_2;
>   _5 = 0;

What exactly is your test source file?

Note that a function being called by another function will get explored
in the exploded_graph twice: once being analyzed "standalone", and
again being analyzed as called from the other function.

When you say "I get these stmts again", do you mean the state machine
code is seeing them again, or the region model code?

  sm_ctxt->warn (new foo)

will save an instance of a pending_diagnostic into the
diagnotic_manager, associating it with a particular exploded_node (and
some other info).

After the exploded_graph has been built, the diagnostic_manager does
some deduplication work, and then tries to find a feasible path to each
pending_diagnostic's exploded node.

This feasibility analysis reruns much of the region_model code that was
run when building the exploded graph.  So that's another way you can
"see" stmts repeatedly.

> I tracked the cause down the call stack to 
> m_saved_diagnostics.safe_push (sd); inside 
> diagnostic_manager::add_diagnostic. For whatever reason, the pushing
> of 
> diagnostics bricks the zero sm. It might be a embarrassing error on
> my 
> side, but I'm stuck on this and can't seem to find what I'm doing
> wrong.

FWIW, assuming you're using the gdb support scripts:
  https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html#support-scripts-for-gdb
there's a handy:

(gdb) break-on-saved-diagnostic

command which adds a breakpoint on the analyzer's diagnostic_manager
saving a pending_diagnostic.

You might also what to try -fdump-analyzer-stderr, which will dump lots
of information on what the analyzer is doing to stderr, which although
verbose is usually very helpful in figuring out why the analyzer is
doing something.

Hope this is helpful
Dave

> 
> - Tim
> 
> [1] 
>  
> https://github.com/timll/gcc/blob/castsize/gcc/analyzer/sm-zero.cc#L181
> 
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: fanalyzer: debugging zero state machine
  2022-06-09 17:40 ` David Malcolm
@ 2022-06-12 18:27   ` Tim Lange
  2022-06-13 19:25     ` David Malcolm
  0 siblings, 1 reply; 4+ messages in thread
From: Tim Lange @ 2022-06-12 18:27 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Mailing List



On Do, Jun 9 2022 at 13:40:06 -0400, David Malcolm 
<dmalcolm@redhat.com> wrote:
> On Thu, 2022-06-09 at 16:49 +0200, Tim Lange wrote:
>> 
>>   > On Mi, Jun 8 2022 at 11:12:52 -0400, David Malcolm
>>  <dmalcolm@redhat.com> wrote:
>>   > > On Wed, 2022-06-08 at 01:42 +0200, Tim Lange wrote:
>>   > >
>>   > > Hi Dave,
> 
> Hi Tim; various responses inline below...
> 
>>   > >
>>   > > I did spent some time to think about the zero state machine. I
>>  first
>>   > > thought about distinguishing between "assigned zero" and "EQ 0
>>   > > condition on the path" for cases where e.g. unreachable() is
>>  used
>>  to
>>   > > say that some variable will never be zero according to the
>>  programmer.
>>   > > In that case the dev might not want zero warnings from
>>  conditions
>>   > > outside the if body itself for dev build where unreachable
>>  expands
>>  to
>>   > > some panic exit. But as the condition constraint is not pruned
>>  when the
>>   > > state machine is distinguishing the states, I'm not sure how to
>>  know
>>   > > whether the analysis already left the if body?
>>   >
>>   > The analyzer works on the gimple-ssa representation, which uses
>>  basic
>>   > blocks in a CFG, rather than an AST, so the only remants we have
>>  of
>>   > scope is in "clobber" statements (a special kind of assignment
>>  stmt),
>>   > which the gimplify adds as variables go out of scope.
>>  If the constraints only lived until the immediate dominator of `if
>>  (cond)`, I could easily distinguish:
>>  1. if (x == 0) && still inside the if => zero
>>  2. if (x == 0) && outside if => maybe zero
>>  but as this seems to be not the case, if I want to distinguish 1. &
>>  2.,
>>  I'd have to find another way.
>>   >
>>   > For pruning, the analyzer's state_machine class has a
>>  "can_purge_p"
>>   > virtual function:
>>   >
>>   > /* Return true if it safe to discard the given state (to help
>>   > when simplifying state objects).
>>   > States that need leak detection should return false. */
>>   > virtual bool can_purge_p (state_t s) const = 0;
>>   >
>>   > which should return true for a "zeroness" state machine, in that
>>  we
>>   > always consider pruning states for svalues that aren't needed
>>  anymore
>>   > along a path.
>>  Is implemented and returns true.
>>   >
>>   > Is there some other kind of state explosion you're running into?
>>  It's
>>   > hard to figure this out further without seeing code.
>>  No, my code is by far not that mature to be tested. I just had in my
>>  head that I wanted to find out if I can distinguish the two cases.
>>   >
>>   >
>>   > > Also, while trying out different things, it seems simple
>>  assignments on
>>   > > phi like here
>>   > > int x;
>>   > > if (argc == 1) {
>>   > > x = 1; // x_5
>>   > > } else {
>>   > > x = 0; // x_4
>>   > > }
>>   > > printf("%i", 5 / x); // x_2
>>   > > automatically work such that x_2 already inherits the state 
>> from
>>   > > x_4/x_5 without me doing anything inside my sm's on_phi
>>  function.
>>  Same
>>   > > for the simple y = 0; x = y; case. Where does this happen 
>> inside
>>  the
>>   > > code?
>>   >
>>   > With the caveat that I'm seeing your code, what's probably
>>  happening
>>  is
>>   > that we have either:
>>   >
>>   > BB (a):
>>   > x_5 = 1;
>>   > goto BB (c);
>>   >
>>   > BB (b):
>>   > x_4 = 0;
>>   > goto BB (c);
>>   >
>>   > BB (c):
>>   > x_2 = PHI (x_5 from (a), x_4 from (b));
>>  I compiled it with -g, so this one is like the dumped gimple.
>>   >
>>   > or (at higher optimization levels):
>>   >
>>   > BB (a):
>>   > goto BB (c);
>>   >
>>   > BB (b):
>>   > goto BB (c);
>>   >
>>   > BB (c):
>>   > x_2 = PHI (1 from (a), 0 from (b));
>>   >
>>   > and I think that at the phi node we have 
>> region_model::handle_phi,
>>   > which is setting x_2 to either the constant 1 or the constant 0 
>> in
>>  the
>>   > store, and is calling the on_phi vfunc, leading to on_phi being
>>  called
>>   > for all state machines.
>>  Thanks, that is the case. The set_value inside handle_phi seems to
>>  this
>>  for me.
>>   >
>>   > BTW, are you implementing an override for this vfunc:
>>   > virtual state_machine::state_t get_default_state (const svalue *)
>>   > const;
>>   >
>>   > to capture the inherently known zeroness/nonzeroness of
>>  constant_svalue
>>   > instances? That would make those constants have that state.
>>  Yeah, I saw that on your nullness check. I tried it on a small
>>  example
>>  with and without, but didn't noticed a difference in warnings 
>> (except
>>  for not having zero(x_4) inside the supergraph.dot). So if I
>>  understood
>>  this right, this is just to have one state less for that
>>  variable/value[0]?
> 
> The states are stored in sm_state_map using a mapping from svalue to
> state.
> 
> Given e.g. a parameter, this will be "initial_svalue (parm)", but in
> the above case where x_4 either has value 1 or has value 0, it's not
> storing a state for x_4; it's storing a state for 0 or a state for 1.
> So without implementing the get_default_state vfunc, the sm_state_maps
> will gradually aquire explicit "this is zero" or "this is nonzero" for
> all of the constants that get used, leading to an explosion of states
> for all the different combinations of constants that have been
> encountered along an execution path, which is probably not going to be
> useful.  If you do implement the get_default_state vfunc for 
> constants,
> then the constants will implicitly have the zero vs nonzero states, 
> and
> this information won't get stored in the sm_state_map instances.
> 
> program_state::can_merge_with_p compares the sm_state_map instances
> between the two program_state instances - and so if the
> get_default_state vfunc is implemented for constants, the peer
> sm_state_maps for the "zero" state machine will both be empty, and 
> thus
> equal, and thus it will merge state.
> 
> So it might make sense to revamp this sm_state_map comparison so that
> we consider implicit states in sm_state_maps when merging the store, 
> so
> that we see that "1" and "0" have different sm-states for the "zero"
> state machine, and thus we shouldn't merge states for these.  See
> binding_cluster::can_merge_p, which calls svalue::can_merge_p.  It
> might make sense to have svalue::can_merge_p "know" about 
> sm-state-maps
> for this.
> 
> svalue::can_merge_p calls model_merger::mergeable_svalue_p, so that we
> can reject merging values that explicitly have different state-machine
> states, but that code could be reworked so that it considers whether a
> pair of svalues can be merged, using implicit state-machine states 
> (and
> thus have it reject the merger of a zero with a nonzero constant).
> 
> Or, whilst prototyping, you could simply hack in a rejection of 
> merging
> zero vs non-zero integer values into svalue::can_merge_p.
> 
>> 
>>  If that is right, is it also favorable to "merge" the stop state and
>>  non_zero state inside the zero state machine because - for now -
>>  there
>>  is no warning planned on non-zero values?
>> 
>>  [0] less states are favorable because then the analyzer maybe has
>>  less
>>  different enodes to visit and thus less runtime(?)
>>   >
>>   > Thanks
>>   > Dave
>>   >
>>   > > - Tim
>> 
>>  Also another question unrelated to the ones before. I do have a 
>> weird
>>  bug in my zero sm[1] but I'm unsure where my sm is flawed. Take for
>>  example, the probably simplest case:
>>    int x = 0;h
>>    printf("%d", 42 / x);
>>  If I use inform to emit a notice for my state machine result, it
>>  seems
>>  to be correct and I do get following traversal order (by printing 
>> the
>>  gimple stmt inside on_stmt):
>>    x_2 = 0;
>>    _1 = 42 % x_2;
>>    # .MEM_4 = VDEF <.MEM_3(D)>
>>    printf ("%d", _1);
>>    _5 = 0;
>>    <L0>:
>>    # VUSE <.MEM_4>
>>    return _5;
>>  But if i use my zero_diagnostics and sm_ctxt->warn to emit the
>>  warning,
>>  I get an unexpected traversal order of:
>>    x_2 = 0;
>>    _1 = 42 / x_2;
>>    # .MEM_4 = VDEF <.MEM_3(D)>
>>    printf ("%d", _1);
>>    _5 = 0;
>>    <L0>:
>>    # VUSE <.MEM_4>
>>    return _5;
>>    x_2 = 0;              <-- why do I get these stmts again but they
>>  are not as
>>  duplicated in the e-supergraph?
>>    _1 = 42 / x_2;
>>    _5 = 0;
>>    _1 = 42 / x_2;
>>    _5 = 0;
> 
> What exactly is your test source file?
It is just the two lines wrapped into main():
  #include <stdio.h>
  int main(int argc, char **argv) {
    int x = 0;
    printf("%d", 42 / x);
    return 0;
  }
> 
> Note that a function being called by another function will get 
> explored
> in the exploded_graph twice: once being analyzed "standalone", and
> again being analyzed as called from the other function.
> 
> When you say "I get these stmts again", do you mean the state machine
> code is seeing them again, or the region model code?
In the state_machine::on_stmt function.
> 
>   sm_ctxt->warn (new foo)
> 
> will save an instance of a pending_diagnostic into the
> diagnotic_manager, associating it with a particular exploded_node (and
> some other info).
> 
> After the exploded_graph has been built, the diagnostic_manager does
> some deduplication work, and then tries to find a feasible path to 
> each
> pending_diagnostic's exploded node.
> 
> This feasibility analysis reruns much of the region_model code that 
> was
> run when building the exploded graph.  So that's another way you can
> "see" stmts repeatedly.
Okay, that probably explains it. I had two distinct problems where I 
thought they might be both related to seeing the statements again:
    |   13 |   int x = 0;
    |      |       ^
    |      |       |
    |      |       (1) set to zero here
    |      |       (3) set to zero here
    |   14 |   printf("%d", 42 / x);
    |      |   ~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (4) division by zero
    |   15 |
    |   16 |   return 0;
    |      |          ~
    |      |          |
    |      |          (2) set to zero here
    |      |          (5) set to zero here
* In line 13, there are two state changes while I'd expect only one.
=> I re-looked at the other state machines and they transfer to the 
stop state after the warning. If I do the same, my problem in line 13 
is fixed.
* The other problem in line 16 is that return 0 is not relevant toward 
the division by zero.
=> After looking at the diagnostic code, I think those should be 
filtered out but aren't because x (in gimple x_2) and return 0 (in 
gimple _5) share the (int)0 state. Inside 
diagnostic_manager::prune_for_sm_diagnostic there is a check that the 
state_change->m_sval is equal to the sval of the saved_diagnostic, 
which is not enough to filter unrelated events in the zero state 
machine case.
I tried to filter the event by comparing the state of interest but that 
doesn't work if the unrelated event is after the related event on the 
path.
Similarly, I tried to add a new field m_var to state_change_event and 
provide this as a comparison but in most functions that call the 
constructor of state_change_event, the var is not available.

Is there is any easy way to filter these unrelated events without 
considerable changes [1]?

By the way, I can trigger the same behavior on the malloc null_deref 
with gcc version 12.1.1 shipped in Fedora 36. Assume this simple 
program:
  #include <stddef.h>
  int main (void)
  {
    int *p = NULL;
    *p = 42;

    int *q = NULL;

    return 0;
  }
results in the following warning:
/home/tim/Projects/simple_c/main.c: In function ‘main’:
/home/tim/Projects/simple_c/main.c:12:6: warning: dereference of NULL 
‘p’ [CWE-476] [-Wanalyzer-null-dereference]
   12 |   *p = 42;
      |   ~~~^~~~
  ‘main’: events 1-4
    |
    |   11 |   int *p = NULL;
    |      |        ^
    |      |        |
    |      |        (1) ‘p’ is NULL
    |   12 |   *p = 42;
    |      |   ~~~~~~~
    |      |      |
    |      |      (4) dereference of NULL ‘p’
    |   13 |
    |   14 |   int *q = NULL;
    |      |        ~
    |      |        |
    |      |        (2) ‘p’ is NULL
    |      |        (3) ‘p’ is NULL

> 
> 
>>  I tracked the cause down the call stack to
>>  m_saved_diagnostics.safe_push (sd); inside
>>  diagnostic_manager::add_diagnostic. For whatever reason, the pushing
>>  of
>>  diagnostics bricks the zero sm. It might be a embarrassing error on
>>  my
>>  side, but I'm stuck on this and can't seem to find what I'm doing
>>  wrong.
> 
> FWIW, assuming you're using the gdb support scripts:
>   
> https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html#support-scripts-for-gdb
> there's a handy:
> 
> (gdb) break-on-saved-diagnostic
> 
> command which adds a breakpoint on the analyzer's diagnostic_manager
> saving a pending_diagnostic.
> 
> You might also what to try -fdump-analyzer-stderr, which will dump 
> lots
> of information on what the analyzer is doing to stderr, which although
> verbose is usually very helpful in figuring out why the analyzer is
> doing something.
> 
> Hope this is helpful
Definitely helped me understand more of the internals!

- Tim
> Dave
> 
> 
> 
>> 
>>  - Tim
>> 
>>  [1]
>> 
>>  
>> https://github.com/timll/gcc/blob/castsize/gcc/analyzer/sm-zero.cc#L181
>> 
>> 
> 
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: fanalyzer: debugging zero state machine
  2022-06-12 18:27   ` Tim Lange
@ 2022-06-13 19:25     ` David Malcolm
  0 siblings, 0 replies; 4+ messages in thread
From: David Malcolm @ 2022-06-13 19:25 UTC (permalink / raw)
  To: Tim Lange; +Cc: GCC Mailing List

On Sun, 2022-06-12 at 20:27 +0200, Tim Lange wrote:
> 
> 
> On Do, Jun 9 2022 at 13:40:06 -0400, David Malcolm 
> <dmalcolm@redhat.com> wrote:
> > On Thu, 2022-06-09 at 16:49 +0200, Tim Lange wrote:
> > > 
> > >   > On Mi, Jun 8 2022 at 11:12:52 -0400, David Malcolm
> > >  <dmalcolm@redhat.com> wrote:
> > >   > > On Wed, 2022-06-08 at 01:42 +0200, Tim Lange wrote:
> > >   > >
> > >   > > Hi Dave,
> > 
> > Hi Tim; various responses inline below...
> > 
> > >   > >
> > >   > > I did spent some time to think about the zero state machine.
> > > I
> > >  first
> > >   > > thought about distinguishing between "assigned zero" and "EQ
> > > 0
> > >   > > condition on the path" for cases where e.g. unreachable() is
> > >  used
> > >  to
> > >   > > say that some variable will never be zero according to the
> > >  programmer.
> > >   > > In that case the dev might not want zero warnings from
> > >  conditions
> > >   > > outside the if body itself for dev build where unreachable
> > >  expands
> > >  to
> > >   > > some panic exit. But as the condition constraint is not
> > > pruned
> > >  when the
> > >   > > state machine is distinguishing the states, I'm not sure how
> > > to
> > >  know
> > >   > > whether the analysis already left the if body?
> > >   >
> > >   > The analyzer works on the gimple-ssa representation, which uses
> > >  basic
> > >   > blocks in a CFG, rather than an AST, so the only remants we
> > > have
> > >  of
> > >   > scope is in "clobber" statements (a special kind of assignment
> > >  stmt),
> > >   > which the gimplify adds as variables go out of scope.
> > >  If the constraints only lived until the immediate dominator of `if
> > >  (cond)`, I could easily distinguish:
> > >  1. if (x == 0) && still inside the if => zero
> > >  2. if (x == 0) && outside if => maybe zero
> > >  but as this seems to be not the case, if I want to distinguish 1.
> > > &
> > >  2.,
> > >  I'd have to find another way.
> > >   >
> > >   > For pruning, the analyzer's state_machine class has a
> > >  "can_purge_p"
> > >   > virtual function:
> > >   >
> > >   > /* Return true if it safe to discard the given state (to help
> > >   > when simplifying state objects).
> > >   > States that need leak detection should return false. */
> > >   > virtual bool can_purge_p (state_t s) const = 0;
> > >   >
> > >   > which should return true for a "zeroness" state machine, in
> > > that
> > >  we
> > >   > always consider pruning states for svalues that aren't needed
> > >  anymore
> > >   > along a path.
> > >  Is implemented and returns true.
> > >   >
> > >   > Is there some other kind of state explosion you're running
> > > into?
> > >  It's
> > >   > hard to figure this out further without seeing code.
> > >  No, my code is by far not that mature to be tested. I just had in
> > > my
> > >  head that I wanted to find out if I can distinguish the two cases.
> > >   >
> > >   >
> > >   > > Also, while trying out different things, it seems simple
> > >  assignments on
> > >   > > phi like here
> > >   > > int x;
> > >   > > if (argc == 1) {
> > >   > > x = 1; // x_5
> > >   > > } else {
> > >   > > x = 0; // x_4
> > >   > > }
> > >   > > printf("%i", 5 / x); // x_2
> > >   > > automatically work such that x_2 already inherits the state
> > > from
> > >   > > x_4/x_5 without me doing anything inside my sm's on_phi
> > >  function.
> > >  Same
> > >   > > for the simple y = 0; x = y; case. Where does this happen 
> > > inside
> > >  the
> > >   > > code?
> > >   >
> > >   > With the caveat that I'm seeing your code, what's probably
> > >  happening
> > >  is
> > >   > that we have either:
> > >   >
> > >   > BB (a):
> > >   > x_5 = 1;
> > >   > goto BB (c);
> > >   >
> > >   > BB (b):
> > >   > x_4 = 0;
> > >   > goto BB (c);
> > >   >
> > >   > BB (c):
> > >   > x_2 = PHI (x_5 from (a), x_4 from (b));
> > >  I compiled it with -g, so this one is like the dumped gimple.
> > >   >
> > >   > or (at higher optimization levels):
> > >   >
> > >   > BB (a):
> > >   > goto BB (c);
> > >   >
> > >   > BB (b):
> > >   > goto BB (c);
> > >   >
> > >   > BB (c):
> > >   > x_2 = PHI (1 from (a), 0 from (b));
> > >   >
> > >   > and I think that at the phi node we have 
> > > region_model::handle_phi,
> > >   > which is setting x_2 to either the constant 1 or the constant 0
> > > in
> > >  the
> > >   > store, and is calling the on_phi vfunc, leading to on_phi being
> > >  called
> > >   > for all state machines.
> > >  Thanks, that is the case. The set_value inside handle_phi seems to
> > >  this
> > >  for me.
> > >   >
> > >   > BTW, are you implementing an override for this vfunc:
> > >   > virtual state_machine::state_t get_default_state (const svalue
> > > *)
> > >   > const;
> > >   >
> > >   > to capture the inherently known zeroness/nonzeroness of
> > >  constant_svalue
> > >   > instances? That would make those constants have that state.
> > >  Yeah, I saw that on your nullness check. I tried it on a small
> > >  example
> > >  with and without, but didn't noticed a difference in warnings 
> > > (except
> > >  for not having zero(x_4) inside the supergraph.dot). So if I
> > >  understood
> > >  this right, this is just to have one state less for that
> > >  variable/value[0]?
> > 
> > The states are stored in sm_state_map using a mapping from svalue to
> > state.
> > 
> > Given e.g. a parameter, this will be "initial_svalue (parm)", but in
> > the above case where x_4 either has value 1 or has value 0, it's not
> > storing a state for x_4; it's storing a state for 0 or a state for 1.
> > So without implementing the get_default_state vfunc, the
> > sm_state_maps
> > will gradually aquire explicit "this is zero" or "this is nonzero"
> > for
> > all of the constants that get used, leading to an explosion of states
> > for all the different combinations of constants that have been
> > encountered along an execution path, which is probably not going to
> > be
> > useful.  If you do implement the get_default_state vfunc for 
> > constants,
> > then the constants will implicitly have the zero vs nonzero states,
> > and
> > this information won't get stored in the sm_state_map instances.
> > 
> > program_state::can_merge_with_p compares the sm_state_map instances
> > between the two program_state instances - and so if the
> > get_default_state vfunc is implemented for constants, the peer
> > sm_state_maps for the "zero" state machine will both be empty, and 
> > thus
> > equal, and thus it will merge state.
> > 
> > So it might make sense to revamp this sm_state_map comparison so that
> > we consider implicit states in sm_state_maps when merging the store, 
> > so
> > that we see that "1" and "0" have different sm-states for the "zero"
> > state machine, and thus we shouldn't merge states for these.  See
> > binding_cluster::can_merge_p, which calls svalue::can_merge_p.  It
> > might make sense to have svalue::can_merge_p "know" about 
> > sm-state-maps
> > for this.
> > 
> > svalue::can_merge_p calls model_merger::mergeable_svalue_p, so that
> > we
> > can reject merging values that explicitly have different state-
> > machine
> > states, but that code could be reworked so that it considers whether
> > a
> > pair of svalues can be merged, using implicit state-machine states 
> > (and
> > thus have it reject the merger of a zero with a nonzero constant).
> > 
> > Or, whilst prototyping, you could simply hack in a rejection of 
> > merging
> > zero vs non-zero integer values into svalue::can_merge_p.
> > 
> > > 
> > >  If that is right, is it also favorable to "merge" the stop state
> > > and
> > >  non_zero state inside the zero state machine because - for now -
> > >  there
> > >  is no warning planned on non-zero values?
> > > 
> > >  [0] less states are favorable because then the analyzer maybe has
> > >  less
> > >  different enodes to visit and thus less runtime(?)
> > >   >
> > >   > Thanks
> > >   > Dave
> > >   >
> > >   > > - Tim
> > > 
> > >  Also another question unrelated to the ones before. I do have a 
> > > weird
> > >  bug in my zero sm[1] but I'm unsure where my sm is flawed. Take
> > > for
> > >  example, the probably simplest case:
> > >    int x = 0;h
> > >    printf("%d", 42 / x);
> > >  If I use inform to emit a notice for my state machine result, it
> > >  seems
> > >  to be correct and I do get following traversal order (by printing 
> > > the
> > >  gimple stmt inside on_stmt):
> > >    x_2 = 0;
> > >    _1 = 42 % x_2;
> > >    # .MEM_4 = VDEF <.MEM_3(D)>
> > >    printf ("%d", _1);
> > >    _5 = 0;
> > >    <L0>:
> > >    # VUSE <.MEM_4>
> > >    return _5;
> > >  But if i use my zero_diagnostics and sm_ctxt->warn to emit the
> > >  warning,
> > >  I get an unexpected traversal order of:
> > >    x_2 = 0;
> > >    _1 = 42 / x_2;
> > >    # .MEM_4 = VDEF <.MEM_3(D)>
> > >    printf ("%d", _1);
> > >    _5 = 0;
> > >    <L0>:
> > >    # VUSE <.MEM_4>
> > >    return _5;
> > >    x_2 = 0;              <-- why do I get these stmts again but
> > > they
> > >  are not as
> > >  duplicated in the e-supergraph?
> > >    _1 = 42 / x_2;
> > >    _5 = 0;
> > >    _1 = 42 / x_2;
> > >    _5 = 0;
> > 
> > What exactly is your test source file?
> It is just the two lines wrapped into main():
>   #include <stdio.h>
>   int main(int argc, char **argv) {
>     int x = 0;
>     printf("%d", 42 / x);
>     return 0;
>   }
> > 
> > Note that a function being called by another function will get 
> > explored
> > in the exploded_graph twice: once being analyzed "standalone", and
> > again being analyzed as called from the other function.
> > 
> > When you say "I get these stmts again", do you mean the state machine
> > code is seeing them again, or the region model code?
> In the state_machine::on_stmt function.
> > 
> >   sm_ctxt->warn (new foo)
> > 
> > will save an instance of a pending_diagnostic into the
> > diagnotic_manager, associating it with a particular exploded_node
> > (and
> > some other info).
> > 
> > After the exploded_graph has been built, the diagnostic_manager does
> > some deduplication work, and then tries to find a feasible path to 
> > each
> > pending_diagnostic's exploded node.
> > 
> > This feasibility analysis reruns much of the region_model code that
> > was
> > run when building the exploded graph.  So that's another way you can
> > "see" stmts repeatedly.
> Okay, that probably explains it. I had two distinct problems where I 
> thought they might be both related to seeing the statements again:
>     |   13 |   int x = 0;
>     |      |       ^
>     |      |       |
>     |      |       (1) set to zero here
>     |      |       (3) set to zero here
>     |   14 |   printf("%d", 42 / x);
>     |      |   ~~~~~~~~~~~~~~~~~~~~
>     |      |   |
>     |      |   (4) division by zero
>     |   15 |
>     |   16 |   return 0;
>     |      |          ~
>     |      |          |
>     |      |          (2) set to zero here
>     |      |          (5) set to zero here

Aha: when you say "seeing the statements again", you're referring to
the diagnostic_path that's emitted.

gcc/analyzer/diagnostic-manager.cc finds a suitable path through the
exploded graph for the warning, and then walks the path, generating a
list of checker_event instances.  It then "optimizes" this list of
checker_events, and the result is what then gets visualized in ASCII
art.

Something really weird's happening in the above.

The way diagnostic-manager.cc generates events could definitely be
improved - it tries to track the state of interest, and thus generate
events that relate to it, but it only knows about state machine
changes, not region_model changes, and it doesn't do a good job of the
former.

I've got some ideas for rewriting it which I'm hoping to get to later
this month, once I finish my work on SARIF.

Basically, my plan is to update the exploded_graph exploration so that
I capture a list of state changes on each edge, capturing the sequence
of how each source state became each destination state, in a reversible
way, so that for any given diagnostic I hope we'll be able to walk
backwards along a graph and see what the changes were, and where they
came from.

Given that, I think it's best if you put your state machine work on
hold for now, and focus on the other tests you were interested in.

FWIW, I just added some more ideas to the tracker bug here:
  https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=105887&hide_resolved=1

at least one of which looks fairly easy (105947).

> * In line 13, there are two state changes while I'd expect only one.
> => I re-looked at the other state machines and they transfer to the 
> stop state after the warning. If I do the same, my problem in line 13
> is fixed.
> * The other problem in line 16 is that return 0 is not relevant toward 
> the division by zero.
> => After looking at the diagnostic code, I think those should be 
> filtered out but aren't because x (in gimple x_2) and return 0 (in 
> gimple _5) share the (int)0 state. Inside 
> diagnostic_manager::prune_for_sm_diagnostic there is a check that the
> state_change->m_sval is equal to the sval of the saved_diagnostic, 
> which is not enough to filter unrelated events in the zero state 
> machine case.

I don't know that it makes much sense to track the states on constants;
it makes more sense for things like the initial_value (x), if x were a
parameter.

It may be that the current implementation of state machines in the
analyzer just isn't a good fit for this test.

> I tried to filter the event by comparing the state of interest but that
> doesn't work if the unrelated event is after the related event on the
> path.
> Similarly, I tried to add a new field m_var to state_change_event and
> provide this as a comparison but in most functions that call the 
> constructor of state_change_event, the var is not available.
> 
> Is there is any easy way to filter these unrelated events without 
> considerable changes [1]?
> 
> By the way, I can trigger the same behavior on the malloc null_deref 
> with gcc version 12.1.1 shipped in Fedora 36. Assume this simple 
> program:
>   #include <stddef.h>
>   int main (void)
>   {
>     int *p = NULL;
>     *p = 42;
> 
>     int *q = NULL;
> 
>     return 0;
>   }
> results in the following warning:
> /home/tim/Projects/simple_c/main.c: In function ‘main’:
> /home/tim/Projects/simple_c/main.c:12:6: warning: dereference of NULL
> ‘p’ [CWE-476] [-Wanalyzer-null-dereference]
>    12 |   *p = 42;
>       |   ~~~^~~~
>   ‘main’: events 1-4
>     |
>     |   11 |   int *p = NULL;
>     |      |        ^
>     |      |        |
>     |      |        (1) ‘p’ is NULL
>     |   12 |   *p = 42;
>     |      |   ~~~~~~~
>     |      |      |
>     |      |      (4) dereference of NULL ‘p’
>     |   13 |
>     |   14 |   int *q = NULL;
>     |      |        ~
>     |      |        |
>     |      |        (2) ‘p’ is NULL
>     |      |        (3) ‘p’ is NULL
> 

That definitely looks like a bug in the code I was talking about
earlier.

I've filed this as:
   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105958


Thanks
Dave


> > 
> > 
> > >  I tracked the cause down the call stack to
> > >  m_saved_diagnostics.safe_push (sd); inside
> > >  diagnostic_manager::add_diagnostic. For whatever reason, the
> > > pushing
> > >  of
> > >  diagnostics bricks the zero sm. It might be a embarrassing error
> > > on
> > >  my
> > >  side, but I'm stuck on this and can't seem to find what I'm doing
> > >  wrong.
> > 
> > FWIW, assuming you're using the gdb support scripts:
> >   
> >  
> > https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html#support-scripts-for-gdb
> > there's a handy:
> > 
> > (gdb) break-on-saved-diagnostic
> > 
> > command which adds a breakpoint on the analyzer's diagnostic_manager
> > saving a pending_diagnostic.
> > 
> > You might also what to try -fdump-analyzer-stderr, which will dump 
> > lots
> > of information on what the analyzer is doing to stderr, which
> > although
> > verbose is usually very helpful in figuring out why the analyzer is
> > doing something.
> > 
> > Hope this is helpful
> Definitely helped me understand more of the internals!
> 
> - Tim
> > Dave
> > 
> > 
> > 
> > > 
> > >  - Tim
> > > 
> > >  [1]
> > > 
> > >  
> > >  
> > > https://github.com/timll/gcc/blob/castsize/gcc/analyzer/sm-zero.cc#L181
> > > 
> > > 
> > 
> > 
> 
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-06-13 19:25 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-09 14:49 fanalyzer: debugging zero state machine Tim Lange
2022-06-09 17:40 ` David Malcolm
2022-06-12 18:27   ` Tim Lange
2022-06-13 19:25     ` David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).