From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from www523.your-server.de (www523.your-server.de [159.69.224.22]) by sourceware.org (Postfix) with ESMTPS id 817203857B88 for ; Sun, 12 Jun 2022 18:27:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 817203857B88 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=tim-lange.me Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tim-lange.me DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=tim-lange.me; s=default2108; h=Content-Transfer-Encoding:Content-Type: MIME-Version:References:In-Reply-To:Message-Id:Cc:To:Subject:From:Date:Sender :Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID; bh=ZAa7qna9g1pO5pVyfWvLmQgrAxVu78x3CQVj+nSMYbA=; b=iuaG0zU1tCNw8GzLQPhfz/Aioo uRUTyty37aXAEzRoYlgXE7skidzndg1ZfJbMQXIovxOEf+J6XEluNBj/RxxROf8YktWp7nAPgZtIb ZuFUiZIqTBFK82w01C7X/pd8lgnZ7WLWS0vZcowJG8+LzGGB3Vw907vvrM/pgkEo3MAwZDR9hSMTb xSPADrhdCv9bklqx9FDJHTD+tREoFj64MwCdRJ9LlnDICuBfNzn1v2OVr04I+Il46Qy3h+ZhexOM9 VXmAxGdmFIRzNq0nDLyJCspBBRDIST7bK/MeEsbFYGN8a3XOz9/+8kgoIJj+CQGLnF8suTyp7MRKf Bs4haAzw==; Received: from sslproxy05.your-server.de ([78.46.172.2]) by www523.your-server.de with esmtpsa (TLSv1.3:TLS_AES_256_GCM_SHA384:256) (Exim 4.92.3) (envelope-from ) id 1o0SIm-000Arz-Ur; Sun, 12 Jun 2022 20:27:09 +0200 Received: from [2a02:908:1864:dbe0::73b5] (helo=fedora) by sslproxy05.your-server.de with esmtpsa (TLSv1.3:TLS_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1o0SIm-000Emn-QL; Sun, 12 Jun 2022 20:27:08 +0200 Date: Sun, 12 Jun 2022 20:27:02 +0200 From: Tim Lange Subject: Re: fanalyzer: debugging zero state machine To: David Malcolm Cc: GCC Mailing List Message-Id: <29NDDR.H0A8YV139ST52@tim-lange.me> In-Reply-To: <6d52e46e0ee72a9185bc9cd969e7f1553086585e.camel@redhat.com> References: <57T7DR.KTUNZBPTS69U1@tim-lange.me> <6d52e46e0ee72a9185bc9cd969e7f1553086585e.camel@redhat.com> X-Mailer: geary/40.0 MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251; format=flowed Content-Transfer-Encoding: quoted-printable X-Authenticated-Sender: mail@tim-lange.me X-Virus-Scanned: Clear (ClamAV 0.103.6/26570/Sun Jun 12 10:15:28 2022) X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_INFOUSMEBIZ, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 12 Jun 2022 18:27:13 -0000 On Do, Jun 9 2022 at 13:40:06 -0400, David Malcolm=20 wrote: > On Thu, 2022-06-09 at 16:49 +0200, Tim Lange wrote: >>=20 >> > On Mi, Jun 8 2022 at 11:12:52 -0400, David Malcolm >> wrote: >> > > On Wed, 2022-06-08 at 01:42 +0200, Tim Lange wrote: >> > > >> > > Hi Dave, >=20 > Hi Tim; various responses inline below... >=20 >> > > >> > > I did spent some time to think about the zero state machine. I >> first >> > > thought about distinguishing between "assigned zero" and "EQ 0 >> > > condition on the path" for cases where e.g. unreachable() is >> used >> to >> > > say that some variable will never be zero according to the >> programmer. >> > > In that case the dev might not want zero warnings from >> conditions >> > > outside the if body itself for dev build where unreachable >> expands >> to >> > > some panic exit. But as the condition constraint is not pruned >> when the >> > > state machine is distinguishing the states, I'm not sure how to >> know >> > > whether the analysis already left the if body? >> > >> > The analyzer works on the gimple-ssa representation, which uses >> basic >> > blocks in a CFG, rather than an AST, so the only remants we have >> of >> > scope is in "clobber" statements (a special kind of assignment >> stmt), >> > which the gimplify adds as variables go out of scope. >> If the constraints only lived until the immediate dominator of `if >> (cond)`, I could easily distinguish: >> 1. if (x =3D=3D 0) && still inside the if =3D> zero >> 2. if (x =3D=3D 0) && outside if =3D> maybe zero >> but as this seems to be not the case, if I want to distinguish 1. & >> 2., >> I'd have to find another way. >> > >> > For pruning, the analyzer's state_machine class has a >> "can_purge_p" >> > virtual function: >> > >> > /* Return true if it safe to discard the given state (to help >> > when simplifying state objects). >> > States that need leak detection should return false. */ >> > virtual bool can_purge_p (state_t s) const =3D 0; >> > >> > which should return true for a "zeroness" state machine, in that >> we >> > always consider pruning states for svalues that aren't needed >> anymore >> > along a path. >> Is implemented and returns true. >> > >> > Is there some other kind of state explosion you're running into? >> It's >> > hard to figure this out further without seeing code. >> No, my code is by far not that mature to be tested. I just had in my >> head that I wanted to find out if I can distinguish the two cases. >> > >> > >> > > Also, while trying out different things, it seems simple >> assignments on >> > > phi like here >> > > int x; >> > > if (argc =3D=3D 1) { >> > > x =3D 1; // x_5 >> > > } else { >> > > x =3D 0; // x_4 >> > > } >> > > printf("%i", 5 / x); // x_2 >> > > automatically work such that x_2 already inherits the state=20 >> from >> > > x_4/x_5 without me doing anything inside my sm's on_phi >> function. >> Same >> > > for the simple y =3D 0; x =3D y; case. Where does this happen=20 >> inside >> the >> > > code? >> > >> > With the caveat that I'm seeing your code, what's probably >> happening >> is >> > that we have either: >> > >> > BB (a): >> > x_5 =3D 1; >> > goto BB (c); >> > >> > BB (b): >> > x_4 =3D 0; >> > goto BB (c); >> > >> > BB (c): >> > x_2 =3D PHI (x_5 from (a), x_4 from (b)); >> I compiled it with -g, so this one is like the dumped gimple. >> > >> > or (at higher optimization levels): >> > >> > BB (a): >> > goto BB (c); >> > >> > BB (b): >> > goto BB (c); >> > >> > BB (c): >> > x_2 =3D PHI (1 from (a), 0 from (b)); >> > >> > and I think that at the phi node we have=20 >> region_model::handle_phi, >> > which is setting x_2 to either the constant 1 or the constant 0=20 >> in >> the >> > store, and is calling the on_phi vfunc, leading to on_phi being >> called >> > for all state machines. >> Thanks, that is the case. The set_value inside handle_phi seems to >> this >> for me. >> > >> > BTW, are you implementing an override for this vfunc: >> > virtual state_machine::state_t get_default_state (const svalue *) >> > const; >> > >> > to capture the inherently known zeroness/nonzeroness of >> constant_svalue >> > instances? That would make those constants have that state. >> Yeah, I saw that on your nullness check. I tried it on a small >> example >> with and without, but didn't noticed a difference in warnings=20 >> (except >> for not having zero(x_4) inside the supergraph.dot). So if I >> understood >> this right, this is just to have one state less for that >> variable/value[0]? >=20 > The states are stored in sm_state_map using a mapping from svalue to > state. >=20 > Given e.g. a parameter, this will be "initial_svalue (parm)", but in > the above case where x_4 either has value 1 or has value 0, it's not > storing a state for x_4; it's storing a state for 0 or a state for 1. > So without implementing the get_default_state vfunc, the sm_state_maps > will gradually aquire explicit "this is zero" or "this is nonzero" for > all of the constants that get used, leading to an explosion of states > for all the different combinations of constants that have been > encountered along an execution path, which is probably not going to be > useful. If you do implement the get_default_state vfunc for=20 > constants, > then the constants will implicitly have the zero vs nonzero states,=20 > and > this information won't get stored in the sm_state_map instances. >=20 > program_state::can_merge_with_p compares the sm_state_map instances > between the two program_state instances - and so if the > get_default_state vfunc is implemented for constants, the peer > sm_state_maps for the "zero" state machine will both be empty, and=20 > thus > equal, and thus it will merge state. >=20 > So it might make sense to revamp this sm_state_map comparison so that > we consider implicit states in sm_state_maps when merging the store,=20 > so > that we see that "1" and "0" have different sm-states for the "zero" > state machine, and thus we shouldn't merge states for these. See > binding_cluster::can_merge_p, which calls svalue::can_merge_p. It > might make sense to have svalue::can_merge_p "know" about=20 > sm-state-maps > for this. >=20 > svalue::can_merge_p calls model_merger::mergeable_svalue_p, so that we > can reject merging values that explicitly have different state-machine > states, but that code could be reworked so that it considers whether a > pair of svalues can be merged, using implicit state-machine states=20 > (and > thus have it reject the merger of a zero with a nonzero constant). >=20 > Or, whilst prototyping, you could simply hack in a rejection of=20 > merging > zero vs non-zero integer values into svalue::can_merge_p. >=20 >>=20 >> If that is right, is it also favorable to "merge" the stop state and >> non_zero state inside the zero state machine because - for now - >> there >> is no warning planned on non-zero values? >>=20 >> [0] less states are favorable because then the analyzer maybe has >> less >> different enodes to visit and thus less runtime(?) >> > >> > Thanks >> > Dave >> > >> > > - Tim >>=20 >> Also another question unrelated to the ones before. I do have a=20 >> weird >> bug in my zero sm[1] but I'm unsure where my sm is flawed. Take for >> example, the probably simplest case: >> int x =3D 0;h >> printf("%d", 42 / x); >> If I use inform to emit a notice for my state machine result, it >> seems >> to be correct and I do get following traversal order (by printing=20 >> the >> gimple stmt inside on_stmt): >> x_2 =3D 0; >> _1 =3D 42 % x_2; >> # .MEM_4 =3D VDEF <.MEM_3(D)> >> printf ("%d", _1); >> _5 =3D 0; >> : >> # VUSE <.MEM_4> >> return _5; >> But if i use my zero_diagnostics and sm_ctxt->warn to emit the >> warning, >> I get an unexpected traversal order of: >> x_2 =3D 0; >> _1 =3D 42 / x_2; >> # .MEM_4 =3D VDEF <.MEM_3(D)> >> printf ("%d", _1); >> _5 =3D 0; >> : >> # VUSE <.MEM_4> >> return _5; >> x_2 =3D 0; <-- why do I get these stmts again but they >> are not as >> duplicated in the e-supergraph? >> _1 =3D 42 / x_2; >> _5 =3D 0; >> _1 =3D 42 / x_2; >> _5 =3D 0; >=20 > What exactly is your test source file? It is just the two lines wrapped into main(): #include int main(int argc, char **argv) { int x =3D 0; printf("%d", 42 / x); return 0; } >=20 > Note that a function being called by another function will get=20 > explored > in the exploded_graph twice: once being analyzed "standalone", and > again being analyzed as called from the other function. >=20 > When you say "I get these stmts again", do you mean the state machine > code is seeing them again, or the region model code? In the state_machine::on_stmt function. >=20 > sm_ctxt->warn (new foo) >=20 > will save an instance of a pending_diagnostic into the > diagnotic_manager, associating it with a particular exploded_node (and > some other info). >=20 > After the exploded_graph has been built, the diagnostic_manager does > some deduplication work, and then tries to find a feasible path to=20 > each > pending_diagnostic's exploded node. >=20 > This feasibility analysis reruns much of the region_model code that=20 > was > run when building the exploded graph. So that's another way you can > "see" stmts repeatedly. Okay, that probably explains it. I had two distinct problems where I=20 thought they might be both related to seeing the statements again: | 13 | int x =3D 0; | | ^ | | | | | (1) set to zero here | | (3) set to zero here | 14 | printf("%d", 42 / x); | | ~~~~~~~~~~~~~~~~~~~~ | | | | | (4) division by zero | 15 | | 16 | return 0; | | ~ | | | | | (2) set to zero here | | (5) set to zero here * In line 13, there are two state changes while I'd expect only one. =3D> I re-looked at the other state machines and they transfer to the=20 stop state after the warning. If I do the same, my problem in line 13=20 is fixed. * The other problem in line 16 is that return 0 is not relevant toward=20 the division by zero. =3D> After looking at the diagnostic code, I think those should be=20 filtered out but aren't because x (in gimple x_2) and return 0 (in=20 gimple _5) share the (int)0 state. Inside=20 diagnostic_manager::prune_for_sm_diagnostic there is a check that the=20 state_change->m_sval is equal to the sval of the saved_diagnostic,=20 which is not enough to filter unrelated events in the zero state=20 machine case. I tried to filter the event by comparing the state of interest but that=20 doesn't work if the unrelated event is after the related event on the=20 path. Similarly, I tried to add a new field m_var to state_change_event and=20 provide this as a comparison but in most functions that call the=20 constructor of state_change_event, the var is not available. Is there is any easy way to filter these unrelated events without=20 considerable changes [1]? By the way, I can trigger the same behavior on the malloc null_deref=20 with gcc version 12.1.1 shipped in Fedora 36. Assume this simple=20 program: #include int main (void) { int *p =3D NULL; *p =3D 42; int *q =3D NULL; return 0; } results in the following warning: /home/tim/Projects/simple_c/main.c: In function =91main=92: /home/tim/Projects/simple_c/main.c:12:6: warning: dereference of NULL=20 =91p=92 [CWE-476] [-Wanalyzer-null-dereference] 12 | *p =3D 42; | ~~~^~~~ =91main=92: events 1-4 | | 11 | int *p =3D NULL; | | ^ | | | | | (1) =91p=92 is NULL | 12 | *p =3D 42; | | ~~~~~~~ | | | | | (4) dereference of NULL =91p=92 | 13 | | 14 | int *q =3D NULL; | | ~ | | | | | (2) =91p=92 is NULL | | (3) =91p=92 is NULL >=20 >=20 >> I tracked the cause down the call stack to >> m_saved_diagnostics.safe_push (sd); inside >> diagnostic_manager::add_diagnostic. For whatever reason, the pushing >> of >> diagnostics bricks the zero sm. It might be a embarrassing error on >> my >> side, but I'm stuck on this and can't seem to find what I'm doing >> wrong. >=20 > FWIW, assuming you're using the gdb support scripts: > =20 > https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html#support= -scripts-for-gdb > there's a handy: >=20 > (gdb) break-on-saved-diagnostic >=20 > command which adds a breakpoint on the analyzer's diagnostic_manager > saving a pending_diagnostic. >=20 > You might also what to try -fdump-analyzer-stderr, which will dump=20 > lots > of information on what the analyzer is doing to stderr, which although > verbose is usually very helpful in figuring out why the analyzer is > doing something. >=20 > Hope this is helpful Definitely helped me understand more of the internals! - Tim > Dave >=20 >=20 >=20 >>=20 >> - Tim >>=20 >> [1] >>=20 >> =20 >> https://github.com/timll/gcc/blob/castsize/gcc/analyzer/sm-zero.cc#L181 >>=20 >>=20 >=20 >=20