Subject: Re: [00/23] Make fwprop use an on-the-side RTL SSA representation
To: Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org>,
 richard.sandiford@arm.com
References: <mpth7ptad81.fsf@arm.com>
 <a24fe294-25a0-4bf4-0fac-bb0df079fb96@redhat.com> <mptlfeogl6h.fsf@arm.com>
From: Jeff Law <law@redhat.com>
Message-ID: <ca3512c8-6398-d8a0-da50-0ad1e327d136@redhat.com>
Date: Sun, 29 Nov 2020 23:45:17 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.4.0
MIME-Version: 1.0
In-Reply-To: <mptlfeogl6h.fsf@arm.com>


On 11/26/20 9:03 AM, Richard Sandiford wrote:
> Thanks for the reviews.
>
> Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On 11/13/20 1:10 AM, Richard Sandiford via Gcc-patches wrote:
>>> Just after GCC 10 stage 1 closed (oops), I posted a patch to add a new
>>> combine pass.  One of its main aims was to allow instructions to move
>>> around where necessary in order to make a combination possible.
>>> It also tried to parallelise instructions that use the same resource.
>>>
>>> That pass contained its own code for maintaining limited def-use chains.
>>> When I posted the patch, Segher asked why we wanted yet another piece
>>> of pass-specific code to do that.  Although I had specific reasons
>>> (which I explained at the time) I've gradually come round to agreeing
>>> that that was a flaw.
>>>
>>> This series of patches is the result of a Covid-time project to add
>>> a more general, pass-agnostic framework.  There are two parts:
>>> adding the framework itself, and using it to make fwprop.c faster.
>>>
>>> The framework part
>>> ------------------
>>>
>>> The framework provides an optional, on-the-side SSA view of existing
>>> RTL instructions.  Each instruction gets a list of definitions and a
>>> list of uses, with each use having a single definition.  Phi nodes
>>> handle cases in which there are multiple possible definitions of a
>>> register on entry to a basic block.  There are also routines for
>>> updating instructions while keeping the SSA representation intact.
>>>
>>> The aim is only to provide a different view of existing RTL instructions.
>>> Unlike gimple, and unlike (IIRC) the old RTL SSA project from way back,
>>> the new framework isn't a “native” SSA representation.  This means that
>>> all inputs to a phi node for a register R are also definitions of
>>> register R; no move operation is “hidden” in the phi node.
>> Hmm, I'm trying to parse what the last phrase means.  Does it mean that
>> the "hidden copy" problem for out-of-ssa is avoided?  And if so, how is
>> that maintained over time?  Things like copy-prop will tend to introduce
>> those issues even if they didn't originally exist.
> Yeah, the phi nodes simply say which definition of register R provides
> the value of R on a particular incoming edge.  That definition will
> itself be a phi node for R, an artificial definition of R created by DF
> (e.g. for incoming function arguments or for EH data registers), or an
> actual instruction that sets R.
>
> In other words, the SSA form is a purely on-the-side thing and the
> underlying RTL instructions are maintained in the same way as normal.
> The SSA form can be deleted at any time without performing a separate
> out-of-ssa step.  In that respect it's different from cfglayout,
> for example.
>
> One of the goals was to allow the SSA form to be used even after RA,
> where invisible copies would be more problematic.
Right.  But what I'm struggling a bit with is whether or not we have to
put restrictions on what passes can do with that on-the-side data
structure.  While I think we can have that on-the-side data structure be
conservatively correct, I think we have to make sure that we don't allow
changes to the on-the-side data structure to occur that ultimately we
can't reflect into RTL.

I may need to go back and re-read the lost copy problem literature.  But
it's definitely an area that I'm concerned about.
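
To make the property under discussion concrete, here is a minimal sketch,
assuming a much-simplified model of the side table.  All of the names
(Def, SsaView, all_copy_free, discard) are hypothetical and are not taken
from the patch series.  It only illustrates the invariant that every phi
input for register R is itself a definition of R, so that throwing the
view away needs no copy insertion.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of an on-the-side SSA view.
// None of these names are taken from the patch series.
struct Def {
  enum Kind { Insn, Artificial, Phi };
  Kind kind;
  unsigned regno;                    // register that this definition sets
  std::vector<std::size_t> inputs;   // phi only: reaching def per incoming edge
};

struct SsaView {
  std::vector<Def> defs;  // side table; the RTL insn stream is stored elsewhere

  // True if every phi input is itself a definition of the phi's register,
  // i.e. no phi stands for a move between two different registers.
  bool all_copy_free() const {
    for (const Def &d : defs) {
      if (d.kind != Def::Phi) continue;
      for (std::size_t in : d.inputs)
        if (in >= defs.size() || defs[in].regno != d.regno) return false;
    }
    return true;
  }

  // Dropping the view touches only the side table, so no out-of-ssa
  // step (and no copy insertion) is needed.
  void discard() { defs.clear(); }
};
```

A pass that wanted to stay on the safe side could, under this model, check
all_copy_free() after each update and refuse any change that would break it.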


>> It's unfortunate that there are no DCE passes abutting
>> fwprop, as DCE is really easy in an SSA world.
> fwprop.c calls delete_trivially_dead_insns, so it does some light DCE.
> One thing I wanted to do (but ran out of time to do) was get the main
> SSA insn-change routine (rtl_ssa::change_insns) to record when an
> instruction becomes dead, and then perform DCE as part of the later
> rtl_ssa::perform_pending_updates step.  This would be much cheaper
> than doing another full scan of the instruction stream (which is what
> delete_trivially_dead_insns needs to do).
>
> Unfortunately, I suspect we're relying on this delete_trivially_dead_insns
> call to delete instructions that became dead during earlier passes, not just
> those that become dead during fwprop.c.  So I guess we would need a full
> DCE at some point: making fwprop.c clean up its own mess might not be
> enough.
Oh, yeah, if it's using delete_trivially_dead_insns, then it already has
a mini-DCE, and using the SSA algorithm would seem to be a step forward.

I don't necessarily see incoming dead code as that big of a problem.
Ultimately it's still going to look like an SSA definition with no uses
in the on-the-side data structure, right?  So an SSA-based DCE should be
able to clean up the mess from fwprop as well as any incoming dead code.
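
As a rough illustration of the worklist idea, here is a minimal sketch,
assuming a toy model in which each instruction defines one value and
records which earlier instructions it reads.  The names Insn and ssa_dce
are hypothetical; this is not the rtl_ssa interface, and it ignores phi
cycles.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy model: one definition per instruction.
struct Insn {
  std::vector<std::size_t> uses;  // indices of the insns whose results are read
  bool has_side_effects;          // stores, calls, volatile accesses, ...
};

// Returns a flag per instruction saying whether it is dead.
std::vector<bool> ssa_dce(const std::vector<Insn> &insns) {
  // Count the uses of each definition.
  std::vector<std::size_t> use_count(insns.size(), 0);
  for (const Insn &i : insns)
    for (std::size_t d : i.uses) ++use_count[d];

  // Seed the worklist with side-effect-free definitions that have no uses.
  // This picks up incoming dead code as well as code made dead by the pass.
  std::vector<bool> dead(insns.size(), false);
  std::vector<std::size_t> worklist;
  for (std::size_t n = 0; n < insns.size(); ++n)
    if (use_count[n] == 0 && !insns[n].has_side_effects)
      worklist.push_back(n);

  // Deleting an instruction removes its uses, which may kill its inputs.
  while (!worklist.empty()) {
    std::size_t n = worklist.back();
    worklist.pop_back();
    dead[n] = true;
    for (std::size_t d : insns[n].uses)
      if (--use_count[d] == 0 && !insns[d].has_side_effects)
        worklist.push_back(d);
  }
  return dead;
}
```

The cost is proportional to the number of use edges visited, with no
separate scan to discover which definitions are unused.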
>
>>> * The SSA code groups blocks into extended basic blocks, with the
>>>   EBBs rather than individual blocks having phi nodes.  
>> So I haven't looked at the patch, but the usual place to put PHIs is at
>> the dominance frontier.  But extra PHIs just increase time/memory and
>> shouldn't affect correctness.
> Yeah, the phis still are at dominance frontiers (except for certain
> cases where we use degenerate phis to maintain a linear RPO view;
> see the doc patch for more details about that).  It wasn't clear from
> the description above, but I was really talking about a pure data
> structure choice: once we have both BBs and EBBs, the phis naturally
> attach to the EBB data structure rather than the BB data structure,
> since second and subsequent BBs in an EBB have a single predecessor
> and so never need phi nodes.
Certainly it's the case that the dominance frontier must be at the start
of an EBB.  So inserting PHIs at the start of EBBs should be correct.
But my recollection was that if you do it naively you end up with
unnecessary PHIs.  I don't think we have to do a "no useless PHIs"
algorithm, though; we just have to do something sensible.  It's my
suspicion that all the work in the early days of SSA to minimize PHIs
isn't as important as it used to be.
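
For reference, the data-structure point can be sketched as follows,
assuming blocks are numbered in reverse postorder and that a block joins
its predecessor's EBB whenever it has exactly one predecessor.  That
grouping rule is a simplification chosen for illustration, and the names
ebb_heads and may_need_phi are hypothetical; the patch may form EBBs
differently.

```cpp
#include <cstddef>
#include <vector>

// preds[b] lists the predecessors of block b.  Blocks are assumed to be
// numbered in reverse postorder, so a single predecessor has a lower index.
std::vector<std::size_t>
ebb_heads(const std::vector<std::vector<std::size_t>> &preds) {
  std::vector<std::size_t> head(preds.size());
  for (std::size_t b = 0; b < preds.size(); ++b) {
    // A block with exactly one (earlier) predecessor extends that
    // predecessor's EBB; any other block starts a new EBB.
    bool joins = preds[b].size() == 1 && preds[b][0] < b;
    head[b] = joins ? head[preds[b][0]] : b;
  }
  return head;
}

// Only the first block of an EBB can have several incoming definitions
// of a register, so only EBB heads are candidates for phi nodes.
bool may_need_phi(const std::vector<std::size_t> &head, std::size_t b) {
  return head[b] == b;
}
```

On a diamond-shaped CFG this puts the entry block and both arms into one
EBB and makes the join block the head of a second one.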

>
>>> * The framework also provides live range information for registers
>>>   within an extended basic block and allows instructions to move within
>>>   their EBB.  It might be useful to allow further movement in future;
>>>   I just don't have a use case for it yet.
>> Yup.   You could do something like Click's algorithm to schedule the
>> instructions in a block to maximize CSE opportunities on top of this.
> Yeah.
I noticed that you've got a lot of the infrastructure to do this already
:-) 

>
>>> * One advantage of the new infrastructure is that it gives
>>>   recog_for_combine-like behaviour: if recog wants to add clobbers
>>>   of things like the flags register, the SSA code will make sure
>>>   that the flags register is free.
>> I look more at the intersection between combine and SSA as an
>> opportunity to combine on extended blocks, simplify the "does dataflow
>> allow this combination" logic, drop the need to build/maintain LOG_LINKS
>> and more generally simplify note distribution.
> Yeah, my ultimate goal (for GCC 12, I hope this time for real) is still
> to provide an SSA version of combine.  Initially it could sit alongside
> the existing combine pass, and perhaps be run only after RA by default.
> (Testing locally, that seems to give nice results, and reduces the
> pressure to reimplement everything in combine.c in one go.)
>
> But the point above was instead that, at the moment, combine is the
> only pass that can add (say) new clobbers of the flags register as
> part of a recog.  I think ideally *all* passes should be able to do that.
> But passes would then need to track the live ranges of the flags
> register in order to tell when the flags register is free.  One of the
> side-benefits of the SSA stuff is that it can do this in amortised
> sublinear complexity.  So RTL SSA provides its own interface to recog
> that can do the same things as recog_for_combine does.
Sweet.
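
A minimal sketch of the kind of query involved, assuming live ranges of
the flags register within one EBB are kept as non-overlapping intervals
keyed by their defining program point.  std::map is used here purely as
a stand-in that gives logarithmic lookups; the structure the patch
actually uses is not described in this message, and the names LiveRanges
and free_across are hypothetical.

```cpp
#include <map>

// Hypothetical model: program points are ints within one EBB.  Each entry
// maps the point that sets the register to the point of its last use.
using LiveRanges = std::map<int, int>;

// True if no value of the register is live across point p, i.e. there is
// no range with start < p < last use.  A new clobber at p would then not
// destroy a value that a later instruction still needs.
bool free_across(const LiveRanges &ranges, int p) {
  auto it = ranges.lower_bound(p);        // first range starting at or after p
  if (it == ranges.begin()) return true;  // no range starts before p
  --it;                                   // last range starting before p
  return it->second <= p;                 // free unless p precedes its last use
}
```

A recog wrapper could call something like this before accepting a pattern
that needs an extra clobber, and reject the match when it returns false.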


Jeff