From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dmalcolm@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [216.205.24.124])
 by sourceware.org (Postfix) with ESMTP id 70FE13850427
 for <gcc@gcc.gnu.org>; Tue, 23 Feb 2021 01:57:38 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 70FE13850427
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
 [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-175-kK9g4u2GPEKTRt1CctRy8Q-1; Mon, 22 Feb 2021 20:57:34 -0500
X-MC-Unique: kK9g4u2GPEKTRt1CctRy8Q-1
Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com
 [10.5.11.23])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E78FB80196C;
 Tue, 23 Feb 2021 01:57:32 +0000 (UTC)
Received: from t14s.localdomain (ovpn-114-98.phx2.redhat.com [10.3.114.98])
 by smtp.corp.redhat.com (Postfix) with ESMTP id 64BA419C78;
 Tue, 23 Feb 2021 01:57:32 +0000 (UTC)
Message-ID: <584b3ab9a1d7c9d09d78131b3b3ccfe972c3cc27.camel@redhat.com>
Subject: Re: Constraints and branching in -fanalyzer
From: David Malcolm <dmalcolm@redhat.com>
To: "brian.sobulefsky" <brian.sobulefsky@protonmail.com>
Cc: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Date: Mon, 22 Feb 2021 20:57:31 -0500
In-Reply-To: <aloOy3uJYIRYnz59OiMgu3gYW5E5WiWv9yXzNp28hi_yJ9cXWHmGXEBUOn6AZQa7VxfPeTDVOENFYEWT5ornHMSvIKGnERDZLwejieymGdg=@protonmail.com>
References: <jucWR-YKKzti_-_bnQ6Os96YcUMJk35eNiKqoy_ioDDO1SAOvoQEDxxfoDFMBMwvmwbliD-nAGT1iKZUfhAcK2eRtec9hj0Pm4dQoQ5cZXI=@protonmail.com>
 <7be0053794b8b06245f011bfb8fa8c7185a653c1.camel@redhat.com>
 <QeY37NrkPfWTosbvrowGKHcmhj0NO0n8q73Bl4kxlgbxupdObJBzKifFFPWbWNxn2QUT3xHFFuQxdhM31g3_nX8Vt01T4BUuV-YJvv7ROTY=@protonmail.com>
 <e88bb83ccdf26d0220054d3c40236458812a4739.camel@redhat.com>
 <2kjoI-lyNji8HmMtXXtDIAN8A6fOHZzTSVfFanfvT4AVnqDwk5ULV5LMA349hKnWdNdMJzR-Z6xBXveZp2P8bVukhcLYh_Ejuo6IRwCQPTQ=@protonmail.com>
 <5f1dd86eca9c04e437aed56ca2522405b19bf1c6.camel@redhat.com>
 <aloOy3uJYIRYnz59OiMgu3gYW5E5WiWv9yXzNp28hi_yJ9cXWHmGXEBUOn6AZQa7VxfPeTDVOENFYEWT5ornHMSvIKGnERDZLwejieymGdg=@protonmail.com>
User-Agent: Evolution 3.38.3 (3.38.3-1.fc33)
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-6.2 required=5.0 tests=BAYES_00, BODY_8BITS,
 DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF,
 KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 23 Feb 2021 01:57:40 -0000

On Sun, 2021-02-21 at 05:27 +0000, brian.sobulefsky wrote:
> To be clear, I only solved the lesser problem
> 
> if(idx-- > 0)
>   __analyzer_eval(idx >= 0);
> 
> which is a stepping stone problem. I correctly surmised that this was
> failing
> (with the prefix operator and -= operator working as expected)
> because the
> condition that is constrainted in the postfix problem is the old
> value for idx
> while the condition being evaluated is the new value. I can send you
> a patch,
> but the short version is the initial value of idx is constrained,
> then a binop_svalue
> is stored and eventually ends up in __analyzer_eval. Adding a case in
> constraint_manager::eval_condition to take apart binop svalues and
> recur
> the way you are imagining would be necessary is basically all that is
> needed
> to solve that one. Currently, the constraint_manager is just looking
> at
> that binop_svalue and determining it does not know any rules for it,
> because
> the rule it knows about is actually for one of its arguments.
> 
> I was hoping this would be it for the original loop problem, but like
> I said,
> it goes from saying "UNKNOWN" twice to saying "TRUE UNKNOWN" which I
> found out happens for the other operators in a for loop as well. The
> first
> true is my binop_svalue handler, but the second UNKNOWN is the
> merging of
> the bottom of the loop back with the entry point.
> 
> Since that is fairly abstract, when I found the case I told you
> about,
> I decided to see if I could fix it, because merging >0 with =0 into
> >=0
> for a linear CFG should not be too hard.

I think it's probably best if you post the patch that you have so far
(which as I understand it fixes the non-looping case), since it sounds
like a useful base to work from.

Thanks
Dave

> 
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Saturday, February 20, 2021 12:42 PM, David Malcolm <
> dmalcolm@redhat.com> wrote:
> 
> > [Moving this discussion from offlist to the GCC mailing list (with
> > permission) and tweaking the subject]
> > 
> > On Sat, 2021-02-20 at 02:57 +0000, brian.sobulefsky wrote:
> > 
> > > Yeah, its a lot to take in. For the last one, it was just about
> > > storing and retrieving data and I ignored everything else about
> > > the
> > > analyzer, and that was hard enough.
> > 
> > Well done on making it this far; I'm impressed that you're diving
> > into
> > some of the more difficult aspects of this code, and seem to be
> > coping.
> > 
> > > I am working on PR94362, which originates from a false positive
> > > found
> > > compiling openssl. It effectivly amounted to not knowing that idx
> > > >=
> > > 0 within the loop for(; idx-- >0 ;).
> > > It turns out there are two problems here. One has to do with the
> > > postfix operator, and yes, the analyzer currently does not know
> > > that
> > > i >= 0 within an if block like if(idx-- > 0). That problem was
> > > easy
> > > and I got it to work within a few days with a relatively simple
> > > patch. I thought I was going to be submitting again.
> > > The second part is hard. It has to do with state merging and has
> > > nothing to do with the postfix operator. It fails for all sorts
> > > of
> > > operators when looping. In fact, the following fails:
> > > if(idx < 0)
> > >   idx = 0;
> > > __analyzer_eval(idx >= 0);
> > > which is devastating if you are hoping the analyzer can
> > > "understand"
> > > a loop. Even with my first fix (which gives one TRUE when run on
> > > a
> > > for loop), there is the second "iterated" pass, which ends up
> > > with a
> > > widening_svalue (I'm still trying to wrap my head around that one
> > > too), that gives an UNKNOWN
> > 
> > FWIW "widening" in this context is taken from abstract
> > interpretation;
> > see e.g. the early papers by Patrick and Radhia Cousot; the idea is
> > to
> > jump ahead of an infinitely descending chain of values to instead
> > go
> > directly to a fixed point in a (small) finite number of steps.
> > (I've
> > not attempted the narrowing approach, which refines it further to
> > get a
> > tighter approximation).
> > 
> > > So I am trying to follow how states are merged, but that means I
> > > need
> > > to at least sort of understand the graphs. I do know that the
> > > actual
> > > merging follows in the PK_AFTER_SUPERNODE branch, with the call
> > > to
> > > node->on_edge, which eventually gets you to maybe_update_for_edge
> > > and
> > > the for_each_fact iterator.
> > 
> > I have spent far too many hours poring over graph dumps from the
> > analyzer, and yes, grokking the state merging is painful, and I'm
> > sure
> > there are many bugs.
> > 
> > Are you familiar with the various dump formats for the graph? In
> > particular the .dot ones? FWIW I use xdot.py for viewing them:
> > https://github.com/jrfonseca/xdot.py
> > (and indeed am the maintainer of the Fedora package for it); it has
> > a
> > relatively quick and scalable UI for navigating graphs, but at some
> > point even it can't cope.
> > I started writing a dedicated visualizer that uses some of
> > xdot.py's
> > classes:
> > https://github.com/davidmalcolm/gcc-analyzer-viewer
> > but it's early days for that.
> > 
> > > I watched a merge in the debugger yesterday for the if example
> > > above
> > > and watched the unknown_svalues be made for the merged state, but
> > > it
> > > was still too much to take in all at once for me to know where
> > > the
> > > solution is.
> > 
> > One other nasty problem with the state merging code is that any
> > time I
> > touch it, there are knock-on effects where other things break (e.g.
> > loop analysis stops converging), and as I fix those, yet more
> > things
> > break, which is demoralizing (3 steps forward, 2 steps back).
> > 
> > Finding ways to break problems down into smaller chunks seems to be
> > the
> > key here.
> > 
> > It sounds like you're making progress with the:
> > 
> > if (idx < 0)
> > idx = 0;
> > __analyzer_eval (idx >= 0);
> > 
> > case. Does your fix at least work outside of a loop, without
> > regressing things? (Or, if it does, what regresses?) If so, then it
> > could be turned into a minimal testcase and we could at least fix
> > that.
> > 
> > FWIW I've experimented with better ways to handle loops in the
> > analyzer. One approach is that GCC already has its own loop
> > analysis
> > framework. At the point where -fanalyzer runs, the IR has captured
> > the
> > nesting structure of loops in the code, so we might want to make
> > use of
> > that in our heuristics. Sadly, as far as I can tell, any attempts
> > to
> > go beyond that and reuse GCC's scalar-value-evolution code (for
> > handling loop bounds and iterations) require us to enable
> > modification
> > of the CFGs, which is a no-no for -fanalyzer.
> > 
> > (Loop handling is one of the most important and difficult issues
> > within
> > the analyzer implementation. That said, in the last few days I've
> > been
> > ignoring it, and have been focusing instead on a rewrite of how we
> > find
> > the shortest feasibile path for each diagnostic, since there's
> > another
> > cluster of analyzer bugs relating to false negatives in that; I
> > hope to
> > land a big fix for feasibilty handling next week - and to finish
> > reviewing your existing patch (sorry!))
> > 
> > Hope this is helpful. I'm quoting the rest of our exchange below
> > (for
> > the mailing list)
> > 
> > Dave
> > 
> > > Sent with ProtonMail Secure Email.
> > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > On Friday, February 19, 2021 5:48 PM, David Malcolm <
> > > dmalcolm@redhat.com> wrote:
> > > 
> > > > On Sat, 2021-02-20 at 00:40 +0000, brian.sobulefsky wrote:
> > > > 
> > > > > Without asking you to give a lecture on what is effectively
> > > > > in
> > > > > the
> > > > > source, can you give a 30,000 ft. view on your intent between
> > > > > the
> > > > > "supers" and the "explodeds" (i.e. supergraph, supernode,
> > > > > superedge
> > > > > vs. exploded graph, exploded node, exploded edge). I
> > > > > understand
> > > > > that
> > > > > each are a derived class from your dgraph one directory level
> > > > > up,
> > > > > I
> > > > > mean, what was your design intent.
> > > > 
> > > > Does:
> > > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html
> > > > answer your question?
> > > > The main point of the supergraph is to combine all of the CFGs
> > > > into
> > > > one
> > > > big directed graph.
> > > > The point of the exploded graph is to combine that with
> > > > dataflow,
> > > > or,
> > > > at least, "interesting" aspects of data flow, where
> > > > "interesting"
> > > > roughly means "differences in state-machine state".
> > > > IIRC, both of these are from
> > > > "Precise Interprocedural Dataflow Analysis via Graph
> > > > Reachability"
> > > > (Thomas Reps, Susan Horwitz and Mooly Sagiv),
> > > > though I'm not using the algorithm they're using, more the
> > > > high-
> > > > level
> > > > ideas of the two above paragraphs.
> > > > Having an exploded graph means I can generate sequences of
> > > > events
> > > > by
> > > > considering paths through the exploded graph.
> > > > Does this clarify things?
> > > > (BTW, yes; I've seen your latest patch; I hope to get to it
> > > > tomorrow)
> > > > Dave
> > > > 
> > > > > Sent with ProtonMail Secure Email.
> > > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > > On Tuesday, February 16, 2021 3:10 PM, David Malcolm <
> > > > > dmalcolm@redhat.com> wrote:
> > > > > 
> > > > > > On Tue, 2021-02-16 at 22:15 +0000, brian.sobulefsky wrote:
> > > > > > 
> > > > > > > Hi David,
> > > > > > > I am having a hard time understanding how the
> > > > > > > constraint_managers
> > > > > > > update with various types of branching. Usually I can
> > > > > > > figure
> > > > > > > things
> > > > > > > out by following it at a particular memory location (an
> > > > > > > svalue or
> > > > > > > region at a particular memory location never changes kind
> > > > > > > and
> > > > > > > can
> > > > > > > be
> > > > > > > read later and get the value stored there). I am being
> > > > > > > defeated
> > > > > > > by
> > > > > > > the constraint_manager. I will watch two equivalence
> > > > > > > classes
> > > > > > > and
> > > > > > > a
> > > > > > > constraint be added and then on the next pass see that
> > > > > > > there
> > > > > > > are
> > > > > > > no
> > > > > > > classes or constraints in that manager. Is there an
> > > > > > > obvious
> > > > > > > reason
> > > > > > > that this would happen?
> > > > > > 
> > > > > > constraint_manager instances get copied a lot: typically at
> > > > > > each
> > > > > > new
> > > > > > state you're working with a fresh instance that got created
> > > > > > via
> > > > > > the
> > > > > > copy constructor from the previous state.
> > > > > > 
> > > > > > > FWIW, I was expecting the way it worked to be that when
> > > > > > > you
> > > > > > > reach
> > > > > > > a
> > > > > > > branch at say, an if or a for loop condition, two
> > > > > > > managers
> > > > > > > are
> > > > > > > created, one for the condition and one for the negation
> > > > > > > of
> > > > > > > the
> > > > > > > condition. Is this be how it works?
> > > > > > 
> > > > > > Roughly. The constraint_manager instances are "owned" by
> > > > > > region_model
> > > > > > instances, themselves within program_state instances.
> > > > > > At a branch, within exploded_graph::process_node, for
> > > > > > PK_AFTER_SUPERNODE it iterates through successor
> > > > > > supernodes,
> > > > > > creating a
> > > > > > (next_point, next_state) pair for each outedge, and calls
> > > > > > exploded_node::on_edge on them.
> > > > > > This in turn calls program_state::on_edge, which applies
> > > > > > the
> > > > > > edge
> > > > > > flags
> > > > > > to the program state, and, in particular, the
> > > > > > constraint_manager;
> > > > > > either adding constraints, or rejecting the edge as
> > > > > > infeasible.
> > > > > > At
> > > > > > an
> > > > > > if statement, one edge will have EDGE_FLAG_TRUE, the other
> > > > > > EDGE_FLAG_FALSE.
> > > > > > The output of the supergraph .dot dump shows the edge
> > > > > > flags.
> > > > > > Hope that makes sense.
> > > > > > (one drawback of this approach is we can't bifurcate state
> > > > > > within a
> > > > > > node, only at edges. Ideally I'd like to eventually fix
> > > > > > that,
> > > > > > but
> > > > > > it
> > > > > > would be a big reworking)
> > > > > > 
> > > > > > > Thanks,
> > > > > > > Brian
> > > > > > > Sent with ProtonMail Secure Email.
> 
>