From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-417003-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 90809 invoked by alias); 10 Dec 2015 20:08:50 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 90082 invoked by uid 89); 10 Dec 2015 20:08:49 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 10 Dec 2015 20:08:48 +0000
Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26])	by mx1.redhat.com (Postfix) with ESMTPS id B53EA67C53;	Thu, 10 Dec 2015 20:08:46 +0000 (UTC)
Received: from localhost.localdomain (ovpn-113-32.phx2.redhat.com [10.3.113.32])	by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id tBAK8jLs006750;	Thu, 10 Dec 2015 15:08:46 -0500
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
To: Richard Biener <richard.guenther@gmail.com>
References: <37378DC5BCD0EE48BA4B082E0B55DFAA41F3F56C@XAP-PVEXMBX02.xlnx.xilinx.com> <37378DC5BCD0EE48BA4B082E0B55DFAA41F42541@XAP-PVEXMBX02.xlnx.xilinx.com> <CAFiYyc23ZRxFv4S=XKQUX2VgT113GDZVVjOYViSoYy4yP11cdg@mail.gmail.com> <37378DC5BCD0EE48BA4B082E0B55DFAA4295155D@XAP-PVEXMBX02.xlnx.xilinx.com> <37378DC5BCD0EE48BA4B082E0B55DFAA4295ADCB@XAP-PVEXMBX02.xlnx.xilinx.com> <55D4F921.2020708@redhat.com> <37378DC5BCD0EE48BA4B082E0B55DFAA4297704C@XAP-PVEXMBX02.xlnx.xilinx.com> <5643A732.4040707@redhat.com> <CAFiYyc07fxKy=pxo4t81M8WVOx9PLnzmMthkbBm52wub93dErQ@mail.gmail.com> <5644C6CC.90203@redhat.com> <5644DB59.9040809@redhat.com> <56450B62.4090404@redhat.com> <CAFiYyc06=_w4+B0C1d8y0rgCmNCy7vvsFfdvCqstnh6t5zsqbA@mail.gmail.com> <56460F19.5010009@redhat.com> <0B62FFB6-DF7A-4080-A655-3E51070E1DEE@gmail.com> <564646AA.5030300@redhat.com> <564673DA.3020403@redhat.com> <CAFiYyc2WXOS5kJFeKPa0aqaWbHV-CnDeWJ98zRa5X=FbAZ2THw@mail.gmail.com>
Cc: Ajit Kumar Agarwal <ajit.kumar.agarwal@xilinx.com>,        GCC Patches <gcc-patches@gcc.gnu.org>,        Vinod Kathail <vinodk@xilinx.com>,        Shail Aditya Gupta <shailadi@xilinx.com>,        Vidhumouli Hunsigida <vidhum@xilinx.com>,        Nagaraju Mekala <nmekala@xilinx.com>
From: Jeff Law <law@redhat.com>
Message-ID: <5669DBCD.1060507@redhat.com>
Date: Thu, 10 Dec 2015 20:08:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <CAFiYyc2WXOS5kJFeKPa0aqaWbHV-CnDeWJ98zRa5X=FbAZ2THw@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2015-12/txt/msg01179.txt.bz2

On 12/03/2015 07:38 AM, Richard Biener wrote:
> This pass is now enabled by default with -Os but has no limits on the amount of
> stmts it copies.
The more statements it copies, the more likely it is that the path 
spitting will turn out to be useful!  It's counter-intuitive.

The primary benefit AFAICT with path splitting is that it exposes 
additional CSE, DCE, etc opportunities.

IIRC  Ajit posited that it could help with live/conflict analysis, I 
never saw that, and with the changes to push splitting deeper into the 
pipeline I'd further life/conflict analysis since that work also 
involved preserving the single latch property.


  It also will make all loops with this shape have at least two
> exits (if the resulting loop will be disambiguated the inner loop will
> have two exits).
> Having more than one exit will disable almost all loop optimizations after it.
Hmmm, the updated code keeps the single latch property, but I'm pretty 
sure it won't keep a single exit policy.

To keep a single exit policy would require keeping an additional block 
around.  Each of the split paths would unconditionally transfer to this 
new block.  The new block would then either transfer to the latch block 
or out of the loop.


>
> The pass itself documents the transform it does but does zero to motivate it.
>
> What's the benefit of this pass (apart from disrupting further optimizations)?
It's essentially building superblocks in a special case to enable 
additional CSE, DCE and the like.

Unfortunately what is is missing is heuristics and de-duplication.  The 
former to drive cases when it's not useful and the latter to reduce 
codesize for any statements that did not participate in optimizations 
when they were duplicated.

The de-duplication is the "sink-statements-through-phi" problems, cross 
jumping, tail merging and the like class of problems.

It was only after I approved this code after twiddling it for Ajit that 
I came across Honza's tracer implementation, which may in fact be 
retargettable to these loops and do a better job.  I haven't 
experimented with that.


>
> I can see a _single_ case where duplicating the latch will allow threading
> one of the paths through the loop header to eliminate the original exit.  Then
> disambiguation may create a nice nested loop out of this.  Of course that
> is only profitable again if you know the remaining single exit of the inner
> loop (exiting to the outer one) is executed infrequently (thus the inner loop
> actually loops).
It wasn't ever about threading.

>
> But no checks other than on the CFG shape exist (oh, it checks it will
> at _least_ copy two stmts!).
Again, the more statements it copies the more likely it is to be 
profitable.  Think superblocks to expose CSE, DCE and the like.

>
> Given the profitability constraints above (well, correct me if I am
> wrong on these)
> it looks like the whole transform should be done within the FSM threading
> code which might be able to compute whether there will be an inner loop
> with a single exit only.
While it shares some concepts with jump threading, I don't think the 
transformation belongs in jump threading.

>
> I'm inclined to request the pass to be removed again or at least disabled by
> default.
I wouldn't lose any sleep if we disabled by default or removed, 
particularly if we can repurpose Honza's code.  In fact, I might 
strongly support the former until we hear back from Ajit on performance 
data.

I also keep coming back to Click's paper on code motion -- in that 
context, copying statements would be a way to break dependencies and 
give the global code motion algorithm more freedom.  The advantage of 
doing it in a framework like Click's is it's got a built-in sinking step.


>
> What closed source benchmark was this transform invented for?
I think it was EEMBC or Coremark.  Ajit should know for sure.  I was 
actually still hoping to see benchmark results from Ajit to confirm the 
new placement in the pipeline didn't negate all the benefits he saw.

jeff