From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id 70FE13850427 for ; Tue, 23 Feb 2021 01:57:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 70FE13850427 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-175-kK9g4u2GPEKTRt1CctRy8Q-1; Mon, 22 Feb 2021 20:57:34 -0500 X-MC-Unique: kK9g4u2GPEKTRt1CctRy8Q-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E78FB80196C; Tue, 23 Feb 2021 01:57:32 +0000 (UTC) Received: from t14s.localdomain (ovpn-114-98.phx2.redhat.com [10.3.114.98]) by smtp.corp.redhat.com (Postfix) with ESMTP id 64BA419C78; Tue, 23 Feb 2021 01:57:32 +0000 (UTC) Message-ID: <584b3ab9a1d7c9d09d78131b3b3ccfe972c3cc27.camel@redhat.com> Subject: Re: Constraints and branching in -fanalyzer From: David Malcolm To: "brian.sobulefsky" Cc: "gcc@gcc.gnu.org" Date: Mon, 22 Feb 2021 20:57:31 -0500 In-Reply-To: References: <7be0053794b8b06245f011bfb8fa8c7185a653c1.camel@redhat.com> <2kjoI-lyNji8HmMtXXtDIAN8A6fOHZzTSVfFanfvT4AVnqDwk5ULV5LMA349hKnWdNdMJzR-Z6xBXveZp2P8bVukhcLYh_Ejuo6IRwCQPTQ=@protonmail.com> <5f1dd86eca9c04e437aed56ca2522405b19bf1c6.camel@redhat.com> User-Agent: Evolution 3.38.3 (3.38.3-1.fc33) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-6.2 required=5.0 tests=BAYES_00, BODY_8BITS, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Feb 2021 01:57:40 -0000 On Sun, 2021-02-21 at 05:27 +0000, brian.sobulefsky wrote: > To be clear, I only solved the lesser problem > > if(idx-- > 0) >   __analyzer_eval(idx >= 0); > > which is a stepping stone problem. I correctly surmised that this was > failing > (with the prefix operator and -= operator working as expected) > because the > condition that is constrainted in the postfix problem is the old > value for idx > while the condition being evaluated is the new value. I can send you > a patch, > but the short version is the initial value of idx is constrained, > then a binop_svalue > is stored and eventually ends up in __analyzer_eval. Adding a case in > constraint_manager::eval_condition to take apart binop svalues and > recur > the way you are imagining would be necessary is basically all that is > needed > to solve that one. Currently, the constraint_manager is just looking > at > that binop_svalue and determining it does not know any rules for it, > because > the rule it knows about is actually for one of its arguments. > > I was hoping this would be it for the original loop problem, but like > I said, > it goes from saying "UNKNOWN" twice to saying "TRUE UNKNOWN" which I > found out happens for the other operators in a for loop as well. The > first > true is my binop_svalue handler, but the second UNKNOWN is the > merging of > the bottom of the loop back with the entry point. > > Since that is fairly abstract, when I found the case I told you > about, > I decided to see if I could fix it, because merging >0 with =0 into > >=0 > for a linear CFG should not be too hard. I think it's probably best if you post the patch that you have so far (which as I understand it fixes the non-looping case), since it sounds like a useful base to work from. Thanks Dave > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ > On Saturday, February 20, 2021 12:42 PM, David Malcolm < > dmalcolm@redhat.com> wrote: > > > [Moving this discussion from offlist to the GCC mailing list (with > > permission) and tweaking the subject] > > > > On Sat, 2021-02-20 at 02:57 +0000, brian.sobulefsky wrote: > > > > > Yeah, its a lot to take in. For the last one, it was just about > > > storing and retrieving data and I ignored everything else about > > > the > > > analyzer, and that was hard enough. > > > > Well done on making it this far; I'm impressed that you're diving > > into > > some of the more difficult aspects of this code, and seem to be > > coping. > > > > > I am working on PR94362, which originates from a false positive > > > found > > > compiling openssl. It effectivly amounted to not knowing that idx > > > >= > > > 0 within the loop for(; idx-- >0 ;). > > > It turns out there are two problems here. One has to do with the > > > postfix operator, and yes, the analyzer currently does not know > > > that > > > i >= 0 within an if block like if(idx-- > 0). That problem was > > > easy > > > and I got it to work within a few days with a relatively simple > > > patch. I thought I was going to be submitting again. > > > The second part is hard. It has to do with state merging and has > > > nothing to do with the postfix operator. It fails for all sorts > > > of > > > operators when looping. In fact, the following fails: > > > if(idx < 0) > > >   idx = 0; > > > __analyzer_eval(idx >= 0); > > > which is devastating if you are hoping the analyzer can > > > "understand" > > > a loop. Even with my first fix (which gives one TRUE when run on > > > a > > > for loop), there is the second "iterated" pass, which ends up > > > with a > > > widening_svalue (I'm still trying to wrap my head around that one > > > too), that gives an UNKNOWN > > > > FWIW "widening" in this context is taken from abstract > > interpretation; > > see e.g. the early papers by Patrick and Radhia Cousot; the idea is > > to > > jump ahead of an infinitely descending chain of values to instead > > go > > directly to a fixed point in a (small) finite number of steps. > > (I've > > not attempted the narrowing approach, which refines it further to > > get a > > tighter approximation). > > > > > So I am trying to follow how states are merged, but that means I > > > need > > > to at least sort of understand the graphs. I do know that the > > > actual > > > merging follows in the PK_AFTER_SUPERNODE branch, with the call > > > to > > > node->on_edge, which eventually gets you to maybe_update_for_edge > > > and > > > the for_each_fact iterator. > > > > I have spent far too many hours poring over graph dumps from the > > analyzer, and yes, grokking the state merging is painful, and I'm > > sure > > there are many bugs. > > > > Are you familiar with the various dump formats for the graph? In > > particular the .dot ones? FWIW I use xdot.py for viewing them: > > https://github.com/jrfonseca/xdot.py > > (and indeed am the maintainer of the Fedora package for it); it has > > a > > relatively quick and scalable UI for navigating graphs, but at some > > point even it can't cope. > > I started writing a dedicated visualizer that uses some of > > xdot.py's > > classes: > > https://github.com/davidmalcolm/gcc-analyzer-viewer > > but it's early days for that. > > > > > I watched a merge in the debugger yesterday for the if example > > > above > > > and watched the unknown_svalues be made for the merged state, but > > > it > > > was still too much to take in all at once for me to know where > > > the > > > solution is. > > > > One other nasty problem with the state merging code is that any > > time I > > touch it, there are knock-on effects where other things break (e.g. > > loop analysis stops converging), and as I fix those, yet more > > things > > break, which is demoralizing (3 steps forward, 2 steps back). > > > > Finding ways to break problems down into smaller chunks seems to be > > the > > key here. > > > > It sounds like you're making progress with the: > > > > if (idx < 0) > > idx = 0; > > __analyzer_eval (idx >= 0); > > > > case. Does your fix at least work outside of a loop, without > > regressing things? (Or, if it does, what regresses?) If so, then it > > could be turned into a minimal testcase and we could at least fix > > that. > > > > FWIW I've experimented with better ways to handle loops in the > > analyzer. One approach is that GCC already has its own loop > > analysis > > framework. At the point where -fanalyzer runs, the IR has captured > > the > > nesting structure of loops in the code, so we might want to make > > use of > > that in our heuristics. Sadly, as far as I can tell, any attempts > > to > > go beyond that and reuse GCC's scalar-value-evolution code (for > > handling loop bounds and iterations) require us to enable > > modification > > of the CFGs, which is a no-no for -fanalyzer. > > > > (Loop handling is one of the most important and difficult issues > > within > > the analyzer implementation. That said, in the last few days I've > > been > > ignoring it, and have been focusing instead on a rewrite of how we > > find > > the shortest feasibile path for each diagnostic, since there's > > another > > cluster of analyzer bugs relating to false negatives in that; I > > hope to > > land a big fix for feasibilty handling next week - and to finish > > reviewing your existing patch (sorry!)) > > > > Hope this is helpful. I'm quoting the rest of our exchange below > > (for > > the mailing list) > > > > Dave > > > > > Sent with ProtonMail Secure Email. > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ > > > On Friday, February 19, 2021 5:48 PM, David Malcolm < > > > dmalcolm@redhat.com> wrote: > > > > > > > On Sat, 2021-02-20 at 00:40 +0000, brian.sobulefsky wrote: > > > > > > > > > Without asking you to give a lecture on what is effectively > > > > > in > > > > > the > > > > > source, can you give a 30,000 ft. view on your intent between > > > > > the > > > > > "supers" and the "explodeds" (i.e. supergraph, supernode, > > > > > superedge > > > > > vs. exploded graph, exploded node, exploded edge). I > > > > > understand > > > > > that > > > > > each are a derived class from your dgraph one directory level > > > > > up, > > > > > I > > > > > mean, what was your design intent. > > > > > > > > Does: > > > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html > > > > answer your question? > > > > The main point of the supergraph is to combine all of the CFGs > > > > into > > > > one > > > > big directed graph. > > > > The point of the exploded graph is to combine that with > > > > dataflow, > > > > or, > > > > at least, "interesting" aspects of data flow, where > > > > "interesting" > > > > roughly means "differences in state-machine state". > > > > IIRC, both of these are from > > > > "Precise Interprocedural Dataflow Analysis via Graph > > > > Reachability" > > > > (Thomas Reps, Susan Horwitz and Mooly Sagiv), > > > > though I'm not using the algorithm they're using, more the > > > > high- > > > > level > > > > ideas of the two above paragraphs. > > > > Having an exploded graph means I can generate sequences of > > > > events > > > > by > > > > considering paths through the exploded graph. > > > > Does this clarify things? > > > > (BTW, yes; I've seen your latest patch; I hope to get to it > > > > tomorrow) > > > > Dave > > > > > > > > > Sent with ProtonMail Secure Email. > > > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ > > > > > On Tuesday, February 16, 2021 3:10 PM, David Malcolm < > > > > > dmalcolm@redhat.com> wrote: > > > > > > > > > > > On Tue, 2021-02-16 at 22:15 +0000, brian.sobulefsky wrote: > > > > > > > > > > > > > Hi David, > > > > > > > I am having a hard time understanding how the > > > > > > > constraint_managers > > > > > > > update with various types of branching. Usually I can > > > > > > > figure > > > > > > > things > > > > > > > out by following it at a particular memory location (an > > > > > > > svalue or > > > > > > > region at a particular memory location never changes kind > > > > > > > and > > > > > > > can > > > > > > > be > > > > > > > read later and get the value stored there). I am being > > > > > > > defeated > > > > > > > by > > > > > > > the constraint_manager. I will watch two equivalence > > > > > > > classes > > > > > > > and > > > > > > > a > > > > > > > constraint be added and then on the next pass see that > > > > > > > there > > > > > > > are > > > > > > > no > > > > > > > classes or constraints in that manager. Is there an > > > > > > > obvious > > > > > > > reason > > > > > > > that this would happen? > > > > > > > > > > > > constraint_manager instances get copied a lot: typically at > > > > > > each > > > > > > new > > > > > > state you're working with a fresh instance that got created > > > > > > via > > > > > > the > > > > > > copy constructor from the previous state. > > > > > > > > > > > > > FWIW, I was expecting the way it worked to be that when > > > > > > > you > > > > > > > reach > > > > > > > a > > > > > > > branch at say, an if or a for loop condition, two > > > > > > > managers > > > > > > > are > > > > > > > created, one for the condition and one for the negation > > > > > > > of > > > > > > > the > > > > > > > condition. Is this be how it works? > > > > > > > > > > > > Roughly. The constraint_manager instances are "owned" by > > > > > > region_model > > > > > > instances, themselves within program_state instances. > > > > > > At a branch, within exploded_graph::process_node, for > > > > > > PK_AFTER_SUPERNODE it iterates through successor > > > > > > supernodes, > > > > > > creating a > > > > > > (next_point, next_state) pair for each outedge, and calls > > > > > > exploded_node::on_edge on them. > > > > > > This in turn calls program_state::on_edge, which applies > > > > > > the > > > > > > edge > > > > > > flags > > > > > > to the program state, and, in particular, the > > > > > > constraint_manager; > > > > > > either adding constraints, or rejecting the edge as > > > > > > infeasible. > > > > > > At > > > > > > an > > > > > > if statement, one edge will have EDGE_FLAG_TRUE, the other > > > > > > EDGE_FLAG_FALSE. > > > > > > The output of the supergraph .dot dump shows the edge > > > > > > flags. > > > > > > Hope that makes sense. > > > > > > (one drawback of this approach is we can't bifurcate state > > > > > > within a > > > > > > node, only at edges. Ideally I'd like to eventually fix > > > > > > that, > > > > > > but > > > > > > it > > > > > > would be a big reworking) > > > > > > > > > > > > > Thanks, > > > > > > > Brian > > > > > > > Sent with ProtonMail Secure Email. > >