From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <brian.sobulefsky@protonmail.com>
Received: from mail-40130.protonmail.ch (mail-40130.protonmail.ch
 [185.70.40.130])
 by sourceware.org (Postfix) with ESMTPS id EA5EF385041C
 for <gcc@gcc.gnu.org>; Sun, 21 Feb 2021 05:27:19 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org EA5EF385041C
Date: Sun, 21 Feb 2021 05:27:07 +0000
To: David Malcolm <dmalcolm@redhat.com>
From: "brian.sobulefsky" <brian.sobulefsky@protonmail.com>
Cc: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Reply-To: "brian.sobulefsky" <brian.sobulefsky@protonmail.com>
Subject: Re: Constraints and branching in -fanalyzer
Message-ID: <aloOy3uJYIRYnz59OiMgu3gYW5E5WiWv9yXzNp28hi_yJ9cXWHmGXEBUOn6AZQa7VxfPeTDVOENFYEWT5ornHMSvIKGnERDZLwejieymGdg=@protonmail.com>
In-Reply-To: <5f1dd86eca9c04e437aed56ca2522405b19bf1c6.camel@redhat.com>
References: <jucWR-YKKzti_-_bnQ6Os96YcUMJk35eNiKqoy_ioDDO1SAOvoQEDxxfoDFMBMwvmwbliD-nAGT1iKZUfhAcK2eRtec9hj0Pm4dQoQ5cZXI=@protonmail.com>
 <7be0053794b8b06245f011bfb8fa8c7185a653c1.camel@redhat.com>
 <QeY37NrkPfWTosbvrowGKHcmhj0NO0n8q73Bl4kxlgbxupdObJBzKifFFPWbWNxn2QUT3xHFFuQxdhM31g3_nX8Vt01T4BUuV-YJvv7ROTY=@protonmail.com>
 <e88bb83ccdf26d0220054d3c40236458812a4739.camel@redhat.com>
 <2kjoI-lyNji8HmMtXXtDIAN8A6fOHZzTSVfFanfvT4AVnqDwk5ULV5LMA349hKnWdNdMJzR-Z6xBXveZp2P8bVukhcLYh_Ejuo6IRwCQPTQ=@protonmail.com>
 <5f1dd86eca9c04e437aed56ca2522405b19bf1c6.camel@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, BODY_8BITS,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM,
 KAM_SHORT, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL,
 SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Sun, 21 Feb 2021 05:27:24 -0000

To be clear, I only solved the lesser problem

if(idx-- > 0)
  __analyzer_eval(idx >=3D 0);

which is a stepping stone problem. I correctly surmised that this was faili=
ng
(with the prefix operator and -=3D operator working as expected) because th=
e
condition that is constrainted in the postfix problem is the old value for =
idx
while the condition being evaluated is the new value. I can send you a patc=
h,
but the short version is the initial value of idx is constrained, then a bi=
nop_svalue
is stored and eventually ends up in __analyzer_eval. Adding a case in
constraint_manager::eval_condition to take apart binop svalues and recur
the way you are imagining would be necessary is basically all that is neede=
d
to solve that one. Currently, the constraint_manager is just looking at
that binop_svalue and determining it does not know any rules for it, becaus=
e
the rule it knows about is actually for one of its arguments.

I was hoping this would be it for the original loop problem, but like I sai=
d,
it goes from saying "UNKNOWN" twice to saying "TRUE UNKNOWN" which I
found out happens for the other operators in a for loop as well. The first
true is my binop_svalue handler, but the second UNKNOWN is the merging of
the bottom of the loop back with the entry point.

Since that is fairly abstract, when I found the case I told you about,
I decided to see if I could fix it, because merging >0 with =3D0 into >=3D0
for a linear CFG should not be too hard.


Sent with ProtonMail Secure Email.

=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90 Original Me=
ssage =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90
On Saturday, February 20, 2021 12:42 PM, David Malcolm <dmalcolm@redhat.com=
> wrote:

> [Moving this discussion from offlist to the GCC mailing list (with
> permission) and tweaking the subject]
>
> On Sat, 2021-02-20 at 02:57 +0000, brian.sobulefsky wrote:
>
> > Yeah, its a lot to take in. For the last one, it was just about
> > storing and retrieving data and I ignored everything else about the
> > analyzer, and that was hard enough.
>
> Well done on making it this far; I'm impressed that you're diving into
> some of the more difficult aspects of this code, and seem to be coping.
>
> > I am working on PR94362, which originates from a false positive found
> > compiling openssl. It effectivly amounted to not knowing that idx >=3D
> > 0 within the loop for(; idx-- >0 ;).
> > It turns out there are two problems here. One has to do with the
> > postfix operator, and yes, the analyzer currently does not know that
> > i >=3D 0 within an if block like if(idx-- > 0). That problem was easy
> > and I got it to work within a few days with a relatively simple
> > patch. I thought I was going to be submitting again.
> > The second part is hard. It has to do with state merging and has
> > nothing to do with the postfix operator. It fails for all sorts of
> > operators when looping. In fact, the following fails:
> > if(idx < 0)
> > =C2=A0 idx =3D 0;
> > __analyzer_eval(idx >=3D 0);
> > which is devastating if you are hoping the analyzer can "understand"
> > a loop. Even with my first fix (which gives one TRUE when run on a
> > for loop), there is the second "iterated" pass, which ends up with a
> > widening_svalue (I'm still trying to wrap my head around that one
> > too), that gives an UNKNOWN
>
> FWIW "widening" in this context is taken from abstract interpretation;
> see e.g. the early papers by Patrick and Radhia Cousot; the idea is to
> jump ahead of an infinitely descending chain of values to instead go
> directly to a fixed point in a (small) finite number of steps. (I've
> not attempted the narrowing approach, which refines it further to get a
> tighter approximation).
>
> > So I am trying to follow how states are merged, but that means I need
> > to at least sort of understand the graphs. I do know that the actual
> > merging follows in the PK_AFTER_SUPERNODE branch, with the call to
> > node->on_edge, which eventually gets you to maybe_update_for_edge and
> > the for_each_fact iterator.
>
> I have spent far too many hours poring over graph dumps from the
> analyzer, and yes, grokking the state merging is painful, and I'm sure
> there are many bugs.
>
> Are you familiar with the various dump formats for the graph? In
> particular the .dot ones? FWIW I use xdot.py for viewing them:
> https://github.com/jrfonseca/xdot.py
> (and indeed am the maintainer of the Fedora package for it); it has a
> relatively quick and scalable UI for navigating graphs, but at some
> point even it can't cope.
> I started writing a dedicated visualizer that uses some of xdot.py's
> classes:
> https://github.com/davidmalcolm/gcc-analyzer-viewer
> but it's early days for that.
>
> > I watched a merge in the debugger yesterday for the if example above
> > and watched the unknown_svalues be made for the merged state, but it
> > was still too much to take in all at once for me to know where the
> > solution is.
>
> One other nasty problem with the state merging code is that any time I
> touch it, there are knock-on effects where other things break (e.g.
> loop analysis stops converging), and as I fix those, yet more things
> break, which is demoralizing (3 steps forward, 2 steps back).
>
> Finding ways to break problems down into smaller chunks seems to be the
> key here.
>
> It sounds like you're making progress with the:
>
> if (idx < 0)
> idx =3D 0;
> __analyzer_eval (idx >=3D 0);
>
> case. Does your fix at least work outside of a loop, without
> regressing things? (Or, if it does, what regresses?) If so, then it
> could be turned into a minimal testcase and we could at least fix that.
>
> FWIW I've experimented with better ways to handle loops in the
> analyzer. One approach is that GCC already has its own loop analysis
> framework. At the point where -fanalyzer runs, the IR has captured the
> nesting structure of loops in the code, so we might want to make use of
> that in our heuristics. Sadly, as far as I can tell, any attempts to
> go beyond that and reuse GCC's scalar-value-evolution code (for
> handling loop bounds and iterations) require us to enable modification
> of the CFGs, which is a no-no for -fanalyzer.
>
> (Loop handling is one of the most important and difficult issues within
> the analyzer implementation. That said, in the last few days I've been
> ignoring it, and have been focusing instead on a rewrite of how we find
> the shortest feasibile path for each diagnostic, since there's another
> cluster of analyzer bugs relating to false negatives in that; I hope to
> land a big fix for feasibilty handling next week - and to finish
> reviewing your existing patch (sorry!))
>
> Hope this is helpful. I'm quoting the rest of our exchange below (for
> the mailing list)
>
> Dave
>
> > Sent with ProtonMail Secure Email.
> > =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90 Origina=
l Message =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=
=90
> > On Friday, February 19, 2021 5:48 PM, David Malcolm <
> > dmalcolm@redhat.com> wrote:
> >
> > > On Sat, 2021-02-20 at 00:40 +0000, brian.sobulefsky wrote:
> > >
> > > > Without asking you to give a lecture on what is effectively in
> > > > the
> > > > source, can you give a 30,000 ft. view on your intent between the
> > > > "supers" and the "explodeds" (i.e. supergraph, supernode,
> > > > superedge
> > > > vs. exploded graph, exploded node, exploded edge). I understand
> > > > that
> > > > each are a derived class from your dgraph one directory level up,
> > > > I
> > > > mean, what was your design intent.
> > >
> > > Does:
> > > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html
> > > answer your question?
> > > The main point of the supergraph is to combine all of the CFGs into
> > > one
> > > big directed graph.
> > > The point of the exploded graph is to combine that with dataflow,
> > > or,
> > > at least, "interesting" aspects of data flow, where "interesting"
> > > roughly means "differences in state-machine state".
> > > IIRC, both of these are from
> > > "Precise Interprocedural Dataflow Analysis via Graph Reachability"
> > > (Thomas Reps, Susan Horwitz and Mooly Sagiv),
> > > though I'm not using the algorithm they're using, more the high-
> > > level
> > > ideas of the two above paragraphs.
> > > Having an exploded graph means I can generate sequences of events
> > > by
> > > considering paths through the exploded graph.
> > > Does this clarify things?
> > > (BTW, yes; I've seen your latest patch; I hope to get to it
> > > tomorrow)
> > > Dave
> > >
> > > > Sent with ProtonMail Secure Email.
> > > > =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90 Ori=
ginal Message =E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=E2=80=90=
=E2=80=90
> > > > On Tuesday, February 16, 2021 3:10 PM, David Malcolm <
> > > > dmalcolm@redhat.com> wrote:
> > > >
> > > > > On Tue, 2021-02-16 at 22:15 +0000, brian.sobulefsky wrote:
> > > > >
> > > > > > Hi David,
> > > > > > I am having a hard time understanding how the
> > > > > > constraint_managers
> > > > > > update with various types of branching. Usually I can figure
> > > > > > things
> > > > > > out by following it at a particular memory location (an
> > > > > > svalue or
> > > > > > region at a particular memory location never changes kind and
> > > > > > can
> > > > > > be
> > > > > > read later and get the value stored there). I am being
> > > > > > defeated
> > > > > > by
> > > > > > the constraint_manager. I will watch two equivalence classes
> > > > > > and
> > > > > > a
> > > > > > constraint be added and then on the next pass see that there
> > > > > > are
> > > > > > no
> > > > > > classes or constraints in that manager. Is there an obvious
> > > > > > reason
> > > > > > that this would happen?
> > > > >
> > > > > constraint_manager instances get copied a lot: typically at
> > > > > each
> > > > > new
> > > > > state you're working with a fresh instance that got created via
> > > > > the
> > > > > copy constructor from the previous state.
> > > > >
> > > > > > FWIW, I was expecting the way it worked to be that when you
> > > > > > reach
> > > > > > a
> > > > > > branch at say, an if or a for loop condition, two managers
> > > > > > are
> > > > > > created, one for the condition and one for the negation of
> > > > > > the
> > > > > > condition. Is this be how it works?
> > > > >
> > > > > Roughly. The constraint_manager instances are "owned" by
> > > > > region_model
> > > > > instances, themselves within program_state instances.
> > > > > At a branch, within exploded_graph::process_node, for
> > > > > PK_AFTER_SUPERNODE it iterates through successor supernodes,
> > > > > creating a
> > > > > (next_point, next_state) pair for each outedge, and calls
> > > > > exploded_node::on_edge on them.
> > > > > This in turn calls program_state::on_edge, which applies the
> > > > > edge
> > > > > flags
> > > > > to the program state, and, in particular, the
> > > > > constraint_manager;
> > > > > either adding constraints, or rejecting the edge as infeasible.
> > > > > At
> > > > > an
> > > > > if statement, one edge will have EDGE_FLAG_TRUE, the other
> > > > > EDGE_FLAG_FALSE.
> > > > > The output of the supergraph .dot dump shows the edge flags.
> > > > > Hope that makes sense.
> > > > > (one drawback of this approach is we can't bifurcate state
> > > > > within a
> > > > > node, only at edges. Ideally I'd like to eventually fix that,
> > > > > but
> > > > > it
> > > > > would be a big reworking)
> > > > >
> > > > > > Thanks,
> > > > > > Brian
> > > > > > Sent with ProtonMail Secure Email.