Constraints and branching in -fanalyzer

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

From: David Malcolm <dmalcolm@redhat.com>
To: "brian.sobulefsky" <brian.sobulefsky@protonmail.com>, gcc@gcc.gnu.org
Subject: Constraints and branching in -fanalyzer
Date: Sat, 20 Feb 2021 15:42:07 -0500	[thread overview]
Message-ID: <5f1dd86eca9c04e437aed56ca2522405b19bf1c6.camel@redhat.com> (raw)
In-Reply-To: <2kjoI-lyNji8HmMtXXtDIAN8A6fOHZzTSVfFanfvT4AVnqDwk5ULV5LMA349hKnWdNdMJzR-Z6xBXveZp2P8bVukhcLYh_Ejuo6IRwCQPTQ=@protonmail.com>

[Moving this discussion from offlist to the GCC mailing list (with
permission) and tweaking the subject]

On Sat, 2021-02-20 at 02:57 +0000, brian.sobulefsky wrote:
> Yeah, its a lot to take in. For the last one, it was just about
> storing and retrieving data and I ignored everything else about the
> analyzer, and that was hard enough.

Well done on making it this far; I'm impressed that you're diving into
some of the more difficult aspects of this code, and seem to be coping.

> I am working on PR94362, which originates from a false positive found
> compiling openssl. It effectivly amounted to not knowing that idx >=
> 0 within the loop for(; idx-- >0 ;).
> 
> It turns out there are two problems here. One has to do with the
> postfix operator, and yes, the analyzer currently does not know that
> i >= 0 within an if block like if(idx-- > 0). That problem was easy
> and I got it to work within a few days with a relatively simple
> patch. I thought I was going to be submitting again.
> 
> The second part is hard. It has to do with state merging and has
> nothing to do with the postfix operator. It fails for all sorts of
> operators when looping. In fact, the following fails:
> 
> if(idx < 0)
>   idx = 0;
> __analyzer_eval(idx >= 0);
> 
> which is devastating if you are hoping the analyzer can "understand"
> a loop. Even with my first fix (which gives one TRUE when run on a
> for loop), there is the second "iterated" pass, which ends up with a
> widening_svalue (I'm still trying to wrap my head around that one
> too), that gives an UNKNOWN

FWIW "widening" in this context is taken from abstract interpretation;
see e.g. the early papers by Patrick and Radhia Cousot; the idea is to
jump ahead of an infinitely descending chain of values to instead go
directly to a fixed point in a (small) finite number of steps.  (I've
not attempted the narrowing approach, which refines it further to get a
tighter approximation).


> So I am trying to follow how states are merged, but that means I need
> to at least sort of understand the graphs. I do know that the actual
> merging follows in the PK_AFTER_SUPERNODE branch, with the call to
> node->on_edge, which eventually gets you to maybe_update_for_edge and
> the for_each_fact iterator.

I have spent far too many hours poring over graph dumps from the
analyzer, and yes, grokking the state merging is painful, and I'm sure
there are many bugs.

Are you familiar with the various dump formats for the graph?  In
particular the .dot ones?  FWIW I use xdot.py for viewing them:
  https://github.com/jrfonseca/xdot.py
(and indeed am the maintainer of the Fedora package for it); it has a
relatively quick and scalable UI for navigating graphs, but at some
point even it can't cope.
I started writing a dedicated visualizer that uses some of xdot.py's
classes:
  https://github.com/davidmalcolm/gcc-analyzer-viewer
but it's early days for that.



> I watched a merge in the debugger yesterday for the if example above
> and watched the unknown_svalues be made for the merged state, but it
> was still too much to take in all at once for me to know where the
> solution is.

One other nasty problem with the state merging code is that any time I
touch it, there are knock-on effects where other things break (e.g.
loop analysis stops converging), and as I fix those, yet more things
break, which is demoralizing (3 steps forward, 2 steps back).

Finding ways to break problems down into smaller chunks seems to be the
key here.

It sounds like you're making progress with the:

  if (idx < 0)
     idx = 0;
  __analyzer_eval (idx >= 0);

case.  Does your fix at least work outside of a loop, without
regressing things? (Or, if it does, what regresses?)  If so, then it
could be turned into a minimal testcase and we could at least fix that.


FWIW I've experimented with better ways to handle loops in the
analyzer.  One approach is that GCC already has its own loop analysis
framework.  At the point where -fanalyzer runs, the IR has captured the
nesting structure of loops in the code, so we might want to make use of
that in our heuristics.  Sadly, as far as I can tell, any attempts to
go beyond that and reuse GCC's scalar-value-evolution code (for
handling loop bounds and iterations) require us to enable modification
of the CFGs, which is a no-no for -fanalyzer.

(Loop handling is one of the most important and difficult issues within
the analyzer implementation.  That said, in the last few days I've been
ignoring it, and have been focusing instead on a rewrite of how we find
the shortest feasibile path for each diagnostic, since there's another
cluster of analyzer bugs relating to false negatives in that; I hope to
land a big fix for feasibilty handling next week - and to finish
reviewing your existing patch (sorry!))

Hope this is helpful.  I'm quoting the rest of our exchange below (for
the mailing list)

Dave


> 
> Sent with ProtonMail Secure Email.
> 
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Friday, February 19, 2021 5:48 PM, David Malcolm <   
> dmalcolm@redhat.com> wrote:
> 
> > On Sat, 2021-02-20 at 00:40 +0000, brian.sobulefsky wrote:
> > 
> > > Without asking you to give a lecture on what is effectively in
> > > the
> > > source, can you give a 30,000 ft. view on your intent between the
> > > "supers" and the "explodeds" (i.e. supergraph, supernode,
> > > superedge
> > > vs. exploded graph, exploded node, exploded edge). I understand
> > > that
> > > each are a derived class from your dgraph one directory level up,
> > > I
> > > mean, what was your design intent.
> > 
> > Does:
> > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html
> > answer your question?
> > 
> > The main point of the supergraph is to combine all of the CFGs into
> > one
> > big directed graph.
> > 
> > The point of the exploded graph is to combine that with dataflow,
> > or,
> > at least, "interesting" aspects of data flow, where "interesting"
> > roughly means "differences in state-machine state".
> > 
> > IIRC, both of these are from
> > "Precise Interprocedural Dataflow Analysis via Graph Reachability"
> > (Thomas Reps, Susan Horwitz and Mooly Sagiv),
> > though I'm not using the algorithm they're using, more the high-
> > level
> > ideas of the two above paragraphs.
> > 
> > Having an exploded graph means I can generate sequences of events
> > by
> > considering paths through the exploded graph.
> > 
> > Does this clarify things?
> > 
> > (BTW, yes; I've seen your latest patch; I hope to get to it
> > tomorrow)
> > 
> > Dave
> > 
> > > Sent with ProtonMail Secure Email.
> > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > On Tuesday, February 16, 2021 3:10 PM, David Malcolm <
> > > dmalcolm@redhat.com> wrote:
> > > 
> > > > On Tue, 2021-02-16 at 22:15 +0000, brian.sobulefsky wrote:
> > > > 
> > > > > Hi David,
> > > > > I am having a hard time understanding how the
> > > > > constraint_managers
> > > > > update with various types of branching. Usually I can figure
> > > > > things
> > > > > out by following it at a particular memory location (an
> > > > > svalue or
> > > > > region at a particular memory location never changes kind and
> > > > > can
> > > > > be
> > > > > read later and get the value stored there). I am being
> > > > > defeated
> > > > > by
> > > > > the constraint_manager. I will watch two equivalence classes
> > > > > and
> > > > > a
> > > > > constraint be added and then on the next pass see that there
> > > > > are
> > > > > no
> > > > > classes or constraints in that manager. Is there an obvious
> > > > > reason
> > > > > that this would happen?
> > > > 
> > > > constraint_manager instances get copied a lot: typically at
> > > > each
> > > > new
> > > > state you're working with a fresh instance that got created via
> > > > the
> > > > copy constructor from the previous state.
> > > > 
> > > > > FWIW, I was expecting the way it worked to be that when you
> > > > > reach
> > > > > a
> > > > > branch at say, an if or a for loop condition, two managers
> > > > > are
> > > > > created, one for the condition and one for the negation of
> > > > > the
> > > > > condition. Is this be how it works?
> > > > 
> > > > Roughly. The constraint_manager instances are "owned" by
> > > > region_model
> > > > instances, themselves within program_state instances.
> > > > At a branch, within exploded_graph::process_node, for
> > > > PK_AFTER_SUPERNODE it iterates through successor supernodes,
> > > > creating a
> > > > (next_point, next_state) pair for each outedge, and calls
> > > > exploded_node::on_edge on them.
> > > > This in turn calls program_state::on_edge, which applies the
> > > > edge
> > > > flags
> > > > to the program state, and, in particular, the
> > > > constraint_manager;
> > > > either adding constraints, or rejecting the edge as infeasible.
> > > > At
> > > > an
> > > > if statement, one edge will have EDGE_FLAG_TRUE, the other
> > > > EDGE_FLAG_FALSE.
> > > > The output of the supergraph .dot dump shows the edge flags.
> > > > Hope that makes sense.
> > > > (one drawback of this approach is we can't bifurcate state
> > > > within a
> > > > node, only at edges. Ideally I'd like to eventually fix that,
> > > > but
> > > > it
> > > > would be a big reworking)
> > > > 
> > > > > Thanks,
> > > > > Brian
> > > > > Sent with ProtonMail Secure Email.
> 
>

next      parent reply	other threads:[~2021-02-20 20:42 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <jucWR-YKKzti_-_bnQ6Os96YcUMJk35eNiKqoy_ioDDO1SAOvoQEDxxfoDFMBMwvmwbliD-nAGT1iKZUfhAcK2eRtec9hj0Pm4dQoQ5cZXI=@protonmail.com>
     [not found] ` <7be0053794b8b06245f011bfb8fa8c7185a653c1.camel@redhat.com>
     [not found]   ` <QeY37NrkPfWTosbvrowGKHcmhj0NO0n8q73Bl4kxlgbxupdObJBzKifFFPWbWNxn2QUT3xHFFuQxdhM31g3_nX8Vt01T4BUuV-YJvv7ROTY=@protonmail.com>
     [not found]     ` <e88bb83ccdf26d0220054d3c40236458812a4739.camel@redhat.com>
     [not found]       ` <2kjoI-lyNji8HmMtXXtDIAN8A6fOHZzTSVfFanfvT4AVnqDwk5ULV5LMA349hKnWdNdMJzR-Z6xBXveZp2P8bVukhcLYh_Ejuo6IRwCQPTQ=@protonmail.com>
2021-02-20 20:42         ` David Malcolm [this message]
2021-02-21  5:27           ` brian.sobulefsky
2021-02-23  1:57             ` David Malcolm
2021-02-23 17:26               ` brian.sobulefsky
2021-02-26  4:23                 ` brian.sobulefsky
2021-02-26 16:36                   ` David Malcolm
2021-02-26 23:59                     ` brian.sobulefsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5f1dd86eca9c04e437aed56ca2522405b19bf1c6.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=brian.sobulefsky@protonmail.com \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).