From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-40130.protonmail.ch (mail-40130.protonmail.ch [185.70.40.130]) by sourceware.org (Postfix) with ESMTPS id EA5EF385041C for ; Sun, 21 Feb 2021 05:27:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org EA5EF385041C Date: Sun, 21 Feb 2021 05:27:07 +0000 To: David Malcolm From: "brian.sobulefsky" Cc: "gcc@gcc.gnu.org" Reply-To: "brian.sobulefsky" Subject: Re: Constraints and branching in -fanalyzer Message-ID: In-Reply-To: <5f1dd86eca9c04e437aed56ca2522405b19bf1c6.camel@redhat.com> References: <7be0053794b8b06245f011bfb8fa8c7185a653c1.camel@redhat.com> <2kjoI-lyNji8HmMtXXtDIAN8A6fOHZzTSVfFanfvT4AVnqDwk5ULV5LMA349hKnWdNdMJzR-Z6xBXveZp2P8bVukhcLYh_Ejuo6IRwCQPTQ=@protonmail.com> <5f1dd86eca9c04e437aed56ca2522405b19bf1c6.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Feb 2021 05:27:24 -0000 To be clear, I only solved the lesser problem if(idx-- > 0) __analyzer_eval(idx >=3D 0); which is a stepping stone problem. I correctly surmised that this was faili= ng (with the prefix operator and -=3D operator working as expected) because th= e condition that is constrainted in the postfix problem is the old value for = idx while the condition being evaluated is the new value. I can send you a patc= h, but the short version is the initial value of idx is constrained, then a bi= nop_svalue is stored and eventually ends up in __analyzer_eval. Adding a case in constraint_manager::eval_condition to take apart binop svalues and recur the way you are imagining would be necessary is basically all that is neede= d to solve that one. Currently, the constraint_manager is just looking at that binop_svalue and determining it does not know any rules for it, becaus= e the rule it knows about is actually for one of its arguments. I was hoping this would be it for the original loop problem, but like I sai= d, it goes from saying "UNKNOWN" twice to saying "TRUE UNKNOWN" which I found out happens for the other operators in a for loop as well. The first true is my binop_svalue handler, but the second UNKNOWN is the merging of the bottom of the loop back with the entry point. Since that is fairly abstract, when I found the case I told you about, I decided to see if I could fix it, because merging >0 with =3D0 into >=3D0 for a linear CFG should not be too hard. Sent with ProtonMail Secure Email. =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90 Original Me= ssage =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90 On Saturday, February 20, 2021 12:42 PM, David Malcolm wrote: > [Moving this discussion from offlist to the GCC mailing list (with > permission) and tweaking the subject] > > On Sat, 2021-02-20 at 02:57 +0000, brian.sobulefsky wrote: > > > Yeah, its a lot to take in. For the last one, it was just about > > storing and retrieving data and I ignored everything else about the > > analyzer, and that was hard enough. > > Well done on making it this far; I'm impressed that you're diving into > some of the more difficult aspects of this code, and seem to be coping. > > > I am working on PR94362, which originates from a false positive found > > compiling openssl. It effectivly amounted to not knowing that idx >=3D > > 0 within the loop for(; idx-- >0 ;). > > It turns out there are two problems here. One has to do with the > > postfix operator, and yes, the analyzer currently does not know that > > i >=3D 0 within an if block like if(idx-- > 0). That problem was easy > > and I got it to work within a few days with a relatively simple > > patch. I thought I was going to be submitting again. > > The second part is hard. It has to do with state merging and has > > nothing to do with the postfix operator. It fails for all sorts of > > operators when looping. In fact, the following fails: > > if(idx < 0) > > =C2=A0 idx =3D 0; > > __analyzer_eval(idx >=3D 0); > > which is devastating if you are hoping the analyzer can "understand" > > a loop. Even with my first fix (which gives one TRUE when run on a > > for loop), there is the second "iterated" pass, which ends up with a > > widening_svalue (I'm still trying to wrap my head around that one > > too), that gives an UNKNOWN > > FWIW "widening" in this context is taken from abstract interpretation; > see e.g. the early papers by Patrick and Radhia Cousot; the idea is to > jump ahead of an infinitely descending chain of values to instead go > directly to a fixed point in a (small) finite number of steps. (I've > not attempted the narrowing approach, which refines it further to get a > tighter approximation). > > > So I am trying to follow how states are merged, but that means I need > > to at least sort of understand the graphs. I do know that the actual > > merging follows in the PK_AFTER_SUPERNODE branch, with the call to > > node->on_edge, which eventually gets you to maybe_update_for_edge and > > the for_each_fact iterator. > > I have spent far too many hours poring over graph dumps from the > analyzer, and yes, grokking the state merging is painful, and I'm sure > there are many bugs. > > Are you familiar with the various dump formats for the graph? In > particular the .dot ones? FWIW I use xdot.py for viewing them: > https://github.com/jrfonseca/xdot.py > (and indeed am the maintainer of the Fedora package for it); it has a > relatively quick and scalable UI for navigating graphs, but at some > point even it can't cope. > I started writing a dedicated visualizer that uses some of xdot.py's > classes: > https://github.com/davidmalcolm/gcc-analyzer-viewer > but it's early days for that. > > > I watched a merge in the debugger yesterday for the if example above > > and watched the unknown_svalues be made for the merged state, but it > > was still too much to take in all at once for me to know where the > > solution is. > > One other nasty problem with the state merging code is that any time I > touch it, there are knock-on effects where other things break (e.g. > loop analysis stops converging), and as I fix those, yet more things > break, which is demoralizing (3 steps forward, 2 steps back). > > Finding ways to break problems down into smaller chunks seems to be the > key here. > > It sounds like you're making progress with the: > > if (idx < 0) > idx =3D 0; > __analyzer_eval (idx >=3D 0); > > case. Does your fix at least work outside of a loop, without > regressing things? (Or, if it does, what regresses?) If so, then it > could be turned into a minimal testcase and we could at least fix that. > > FWIW I've experimented with better ways to handle loops in the > analyzer. One approach is that GCC already has its own loop analysis > framework. At the point where -fanalyzer runs, the IR has captured the > nesting structure of loops in the code, so we might want to make use of > that in our heuristics. Sadly, as far as I can tell, any attempts to > go beyond that and reuse GCC's scalar-value-evolution code (for > handling loop bounds and iterations) require us to enable modification > of the CFGs, which is a no-no for -fanalyzer. > > (Loop handling is one of the most important and difficult issues within > the analyzer implementation. That said, in the last few days I've been > ignoring it, and have been focusing instead on a rewrite of how we find > the shortest feasibile path for each diagnostic, since there's another > cluster of analyzer bugs relating to false negatives in that; I hope to > land a big fix for feasibilty handling next week - and to finish > reviewing your existing patch (sorry!)) > > Hope this is helpful. I'm quoting the rest of our exchange below (for > the mailing list) > > Dave > > > Sent with ProtonMail Secure Email. > > =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90 Origina= l Message =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80= =90 > > On Friday, February 19, 2021 5:48 PM, David Malcolm < > > dmalcolm@redhat.com> wrote: > > > > > On Sat, 2021-02-20 at 00:40 +0000, brian.sobulefsky wrote: > > > > > > > Without asking you to give a lecture on what is effectively in > > > > the > > > > source, can you give a 30,000 ft. view on your intent between the > > > > "supers" and the "explodeds" (i.e. supergraph, supernode, > > > > superedge > > > > vs. exploded graph, exploded node, exploded edge). I understand > > > > that > > > > each are a derived class from your dgraph one directory level up, > > > > I > > > > mean, what was your design intent. > > > > > > Does: > > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html > > > answer your question? > > > The main point of the supergraph is to combine all of the CFGs into > > > one > > > big directed graph. > > > The point of the exploded graph is to combine that with dataflow, > > > or, > > > at least, "interesting" aspects of data flow, where "interesting" > > > roughly means "differences in state-machine state". > > > IIRC, both of these are from > > > "Precise Interprocedural Dataflow Analysis via Graph Reachability" > > > (Thomas Reps, Susan Horwitz and Mooly Sagiv), > > > though I'm not using the algorithm they're using, more the high- > > > level > > > ideas of the two above paragraphs. > > > Having an exploded graph means I can generate sequences of events > > > by > > > considering paths through the exploded graph. > > > Does this clarify things? > > > (BTW, yes; I've seen your latest patch; I hope to get to it > > > tomorrow) > > > Dave > > > > > > > Sent with ProtonMail Secure Email. > > > > =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90 Ori= ginal Message =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90= =E2=80=90 > > > > On Tuesday, February 16, 2021 3:10 PM, David Malcolm < > > > > dmalcolm@redhat.com> wrote: > > > > > > > > > On Tue, 2021-02-16 at 22:15 +0000, brian.sobulefsky wrote: > > > > > > > > > > > Hi David, > > > > > > I am having a hard time understanding how the > > > > > > constraint_managers > > > > > > update with various types of branching. Usually I can figure > > > > > > things > > > > > > out by following it at a particular memory location (an > > > > > > svalue or > > > > > > region at a particular memory location never changes kind and > > > > > > can > > > > > > be > > > > > > read later and get the value stored there). I am being > > > > > > defeated > > > > > > by > > > > > > the constraint_manager. I will watch two equivalence classes > > > > > > and > > > > > > a > > > > > > constraint be added and then on the next pass see that there > > > > > > are > > > > > > no > > > > > > classes or constraints in that manager. Is there an obvious > > > > > > reason > > > > > > that this would happen? > > > > > > > > > > constraint_manager instances get copied a lot: typically at > > > > > each > > > > > new > > > > > state you're working with a fresh instance that got created via > > > > > the > > > > > copy constructor from the previous state. > > > > > > > > > > > FWIW, I was expecting the way it worked to be that when you > > > > > > reach > > > > > > a > > > > > > branch at say, an if or a for loop condition, two managers > > > > > > are > > > > > > created, one for the condition and one for the negation of > > > > > > the > > > > > > condition. Is this be how it works? > > > > > > > > > > Roughly. The constraint_manager instances are "owned" by > > > > > region_model > > > > > instances, themselves within program_state instances. > > > > > At a branch, within exploded_graph::process_node, for > > > > > PK_AFTER_SUPERNODE it iterates through successor supernodes, > > > > > creating a > > > > > (next_point, next_state) pair for each outedge, and calls > > > > > exploded_node::on_edge on them. > > > > > This in turn calls program_state::on_edge, which applies the > > > > > edge > > > > > flags > > > > > to the program state, and, in particular, the > > > > > constraint_manager; > > > > > either adding constraints, or rejecting the edge as infeasible. > > > > > At > > > > > an > > > > > if statement, one edge will have EDGE_FLAG_TRUE, the other > > > > > EDGE_FLAG_FALSE. > > > > > The output of the supergraph .dot dump shows the edge flags. > > > > > Hope that makes sense. > > > > > (one drawback of this approach is we can't bifurcate state > > > > > within a > > > > > node, only at edges. Ideally I'd like to eventually fix that, > > > > > but > > > > > it > > > > > would be a big reworking) > > > > > > > > > > > Thanks, > > > > > > Brian > > > > > > Sent with ProtonMail Secure Email.