It's been my plan since finally wrapping my head around Bodik's thesis 
to revamp how we handle jump threading to use some of the principles 
from his thesis.  In particular, the back substitution and 
simplification model feels like the right long-term direction.

Sebastian's FSM threader was the first step on that path (gcc-5). 
Exploiting that threader for more than just FSM loops was the next big 
step (gcc-6).

This patch takes the next step -- disentangling that new jump threading 
code from the old threading code and VRP/DOM.

The key thing to realize here is that the backwards (FSM) jump threader 
does not inherently need the DOM tables nor the ASSERT_EXPRs from VRP to 
do its job.  That is, it can and should run completely independently of 
DOM/VRP (though one day it may exploit range information that a prior 
VRP pass has computed).
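
As a rough illustration of the kind of path a backwards threader looks
for, here is a hand-written C sketch.  It is not taken from the patch or
from the GCC testsuite; the function names and the little state machine
are made up for this example.  In count_words, every value assigned to
st is a constant, so walking backwards from the switch along the loop's
back edge is enough to tell which case runs next, with no DOM tables or
ASSERT_EXPRs involved.  count_words_threaded shows by hand roughly what
the control flow looks like once those paths jump straight to their
targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FSM for illustration: counts space-separated words.  */
enum state { OUT, IN };

int count_words(const char *s)
{
    assert(s != NULL);
    enum state st = OUT;
    int words = 0;

    for (; *s; s++) {
        /* At this point st is a merge of constants (OUT or IN), one per
           incoming path, so each path determines the case taken.  */
        switch (st) {
        case OUT:
            if (*s != ' ') { words++; st = IN; }
            break;
        case IN:
            if (*s == ' ') st = OUT;
            break;
        }
    }
    return words;
}

/* The same function with the switch threaded away by hand: each path
   that sets the state jumps directly to the code for that state.  */
int count_words_threaded(const char *s)
{
    assert(s != NULL);
    int words = 0;

out:
    if (!*s) return words;
    if (*s++ != ' ') { words++; goto in; }
    goto out;
in:
    if (!*s) return words;
    if (*s++ == ' ') goto out;
    goto in;
}
```

Both functions return the same count for any input string; the second
one simply never re-tests the state variable.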

By moving the backwards threader into its own pass, we can run it prior 
to DOM/VRP, which allows DOM/VRP to work on a simpler CFG with larger 
extended basic blocks.

Removing unexecutable paths before VRP also has the nice effect of 
eliminating false positives in some work Aldy is doing around 
out-of-bounds array index warnings.
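
A hypothetical example of the shape of code involved (invented for this
note, not taken from Aldy's work or from the patch): the path that
leaves idx at 4 can never reach the array access, because ok is 0 on
that path.  A range analysis that merges idx at the join point before
looking at the second test may see the range [0, 4] at the access and
consider it suspicious.  If the infeasible path has already been
threaded to the "return -1", only [0, 3] remains.  Whether GCC actually
warns on this exact function is not claimed here.

```c
#include <assert.h>

static const int table[4] = { 10, 20, 30, 40 };

/* Hypothetical lookup with a sentinel index that is never used.  */
int lookup(int i)
{
    int idx = 4;    /* one past the end; only live when ok == 0 */
    int ok = 0;

    if (i >= 0 && i < 4) {
        idx = i;
        ok = 1;
    }

    /* Join point: idx is 4 on one incoming path, [0, 3] on the other.
       On the path where ok == 0 the test below is known to be false,
       so that path can jump directly to the final return.  */
    if (ok) {
        assert(idx >= 0 && idx < 4);    /* what threading exposes */
        return table[idx];
    }
    return -1;
}
```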

We can remove all the calls to the backwards threader from the old style 
threader.  The way the FSM bits were wired into the old threader caused 
redundant path evaluations.  That can be easily fixed with the FSM bits 
in their own pass.  The net result is a 25% reduction in paths examined by the 
FSM threader.

Finally, we end up threading more jumps.  I don't have the numbers 
handy anymore, but when I ran this through my tester there was a clear 
decrease in the number of runtime jumps.

So what are the downsides?

With the threader in its own pass, a checking-enabled compiler makes 
more calls into the CFG & SSA verification routines.  So the 
compile-time improvement is lost when checking is enabled.

The backward threader does not combine identical jump threading paths 
with different starting edges into a single jump threading path with 
multiple entry points.  This is primarily a code size issue, but can have 
a secondary effect on performance.  I know how to fix this and it's on 
the list for gcc-7 along with further cleanups.
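
To make the code size point concrete, here is a toy model in C.  It does
not use GCC's data structures; struct thread_path and both functions are
assumptions made up for this sketch.  Each path records the block its
starting edge comes from and the blocks that must be duplicated.
Without sharing, every path gets its own copies.  With sharing, paths
that duplicate the same block sequence reuse one set of copies and only
differ in their entry edge.

```c
#include <assert.h>
#include <string.h>

#define MAX_BLOCKS 8

/* Hypothetical description of one jump threading path.  */
struct thread_path {
    int entry_pred;             /* source block of the starting edge */
    int nblocks;                /* number of blocks to duplicate */
    int blocks[MAX_BLOCKS];     /* those blocks, in path order */
};

/* Two paths are mergeable in this model if they duplicate the same
   block sequence, regardless of their starting edge.  */
static int same_blocks(const struct thread_path *a,
                       const struct thread_path *b)
{
    return a->nblocks == b->nblocks
        && memcmp(a->blocks, b->blocks,
                  (size_t)a->nblocks * sizeof(int)) == 0;
}

/* Block copies made when every path is duplicated separately.  */
int copies_unshared(const struct thread_path *p, int n)
{
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += p[i].nblocks;
    return total;
}

/* Block copies made when identical paths share one set of copies.  */
int copies_shared(const struct thread_path *p, int n)
{
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; i++) {
        int seen = 0;
        for (int j = 0; j < i; j++)
            if (same_blocks(&p[i], &p[j])) { seen = 1; break; }
        if (!seen)
            total += p[i].nblocks;
    }
    return total;
}
```

For example, two paths entering from blocks 1 and 2 that both duplicate
blocks {5, 6}, plus a third path that duplicates {7}, cost five block
copies unshared and three when shared.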


Bootstrapped and regression tested on x86_64 linux.  Installing on the 
trunk momentarily.

Jeff