On 11/12/2015 11:32 AM, Jeff Law wrote:
> On 11/12/2015 10:05 AM, Jeff Law wrote:
>>> But IIRC you mentioned it should enable vectorization or so?  In this case
>>> that's obviously too late.
>> The opposite.  Path splitting interferes with if-conversion &
>> vectorization.  Path splitting mucks up the CFG enough that
>> if-conversion won't fire and as a result vectorization is inhibited.  It
>> also creates multi-latch loops, which isn't a great situation either.
>>
>> It *may* be the case that dropping it that far down in the pipeline and
>> making the modifications necessary to handle simple latches may in turn
>> make the path splitting code play better with if-conversion and
>> vectorization and avoid creation of multi-latch loops.  At least that's
>> how it looks on paper when I draw out the CFG manipulations.
>>
>> I'll do some experiments.
> It doesn't look too terrible to revamp the recognition code to work
> later in the pipeline with simple latches.  Sadly, that doesn't seem to
> have fixed the bad interactions with if-conversion.
>
> *But* that does open up the possibility of moving the path splitting
> pass even deeper in the pipeline -- in particular we can move it past
> the vectorizer.  That may be a win.
>
> So the big question is whether or not we'll still see enough benefits
> from having it so late in the pipeline.  It's still early enough that we
> get DOM, VRP, reassoc, forwprop, phiopt, etc.
>
> Ajit, I'll pass along an updated patch after doing some more testing.
So here's what I'm working with.  It runs after the vectorizer now.

Ajit, if you could benchmark this it would be greatly appreciated.  I 
know you saw significant improvements on one or more benchmarks in the 
past.  It'd be good to know that the updated placement of the pass 
doesn't invalidate the gains you saw.

With the updated pass placement, we don't have to worry about switching 
the pass on/off based on whether or not the vectorizer & if-conversion 
are enabled.  So that hackery is gone.

I think I've tightened the test that identifies the diamond patterns we
want, so it's stricter about what it accepts.  The call to ignore_bb_p is
part of that test, so that we're actually looking at the right block now
that the transformation runs with simple latches.
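For anyone reviewing the recognition change, here is a toy sketch of what a strict "diamond feeding a simple latch" check looks like. It is not the patch's code: struct block and find_join_to_split are hypothetical stand-ins rather than GCC's basic_block API, and the sketch does not model ignore_bb_p or the dominator queries the real pass may use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy CFG node; NOT GCC's basic_block. */
struct block {
  struct block *preds[2];
  size_t npreds;
  struct block *succs[2];
  size_t nsuccs;
};

/* Given a simple latch, return the join block of a strict diamond
   that could be duplicated, or NULL if the shape does not match.  */
struct block *find_join_to_split(struct block *latch) {
  assert(latch != NULL);

  /* Simple latch: one predecessor, one successor (the loop header). */
  if (latch->npreds != 1 || latch->nsuccs != 1)
    return NULL;

  /* The block feeding the latch must merge exactly two paths and
     flow only into the latch.  */
  struct block *join = latch->preds[0];
  if (join->npreds != 2 || join->nsuccs != 1)
    return NULL;

  /* Each arm must be a single-entry, single-exit block...  */
  struct block *a = join->preds[0];
  struct block *b = join->preds[1];
  if (a->npreds != 1 || b->npreds != 1 || a->nsuccs != 1 || b->nsuccs != 1)
    return NULL;

  /* ...and both arms must hang off the same two-way branch.  */
  struct block *cond = a->preds[0];
  if (cond != b->preds[0] || cond->nsuccs != 2)
    return NULL;

  return join;
}
```

Requiring both arms to be single-entry, single-exit blocks off one branch is what makes this a "strict" diamond; a triangle (one arm empty) is deliberately rejected here.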

I've also put a graphical comment before perform_path_splitting which
hopefully makes the CFG transformation a bit clearer.
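As a rough source-level picture of the transformation (an illustration under stated assumptions, not output of the pass): assume a loop body that is a diamond whose join block feeds a simple latch. Path splitting duplicates the join block into each arm, so each path gets its own copy that later passes (DOM, VRP, etc.) can specialize. The functions sum_before, sum_after and check_equivalent are hypothetical examples, not taken from the patch or its testsuite.

```c
#include <assert.h>

/* Before: both arms of the diamond fall into a shared join block. */
int sum_before(const int *v, int n) {
  int acc = 0;
  for (int i = 0; i < n; i++) {
    int t;
    if (v[i] & 1)
      t = v[i] * 3;
    else
      t = v[i] / 2;
    acc += t;             /* join block: merges the two paths */
  }
  return acc;
}

/* After (conceptually): the join block is duplicated into each arm. */
int sum_after(const int *v, int n) {
  int acc = 0;
  for (int i = 0; i < n; i++) {
    if (v[i] & 1) {
      int t = v[i] * 3;
      acc += t;           /* copy of the join on the "odd" path */
    } else {
      int t = v[i] / 2;
      acc += t;           /* copy of the join on the "even" path */
    }
  }
  return acc;
}

/* Sanity check: the transformation must not change the result. */
void check_equivalent(const int *v, int n) {
  assert(sum_before(v, n) == sum_after(v, n));
}
```

The duplication only pays off if later passes can exploit the per-path context; otherwise it is just code growth, which is part of why pass placement matters for the benchmarks.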

This bootstraps and regression tests cleanly on x86_64-linux-gnu.