From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-194811-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 36258 invoked by alias); 19 Dec 2017 00:07:42 -0000
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
Received: (qmail 36245 invoked by uid 89); 19 Dec 2017 00:07:41 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,KAM_SHORT,SPF_PASS autolearn=ham version=3.3.2 spammy=HContent-type:plain, regional, clark, Clark
X-HELO: pv33p33im-asmtp002.me.com
Received: from pv33p33im-asmtp002.me.com (HELO pv33p33im-asmtp002.me.com) (17.142.241.11) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 19 Dec 2017 00:07:39 +0000
Received: from process-dkim-sign-daemon.pv33p33im-asmtp002.me.com by pv33p33im-asmtp002.me.com (Oracle Communications Messaging Server 8.0.1.2.20170607 64bit (built Jun  7 2017)) id <0P1600C00KOZES00@pv33p33im-asmtp002.me.com> for gcc@gcc.gnu.org; Tue, 19 Dec 2017 00:07:24 +0000 (GMT)
Received: from icloud.com ([127.0.0.1]) by pv33p33im-asmtp002.me.com (Oracle Communications Messaging Server 8.0.1.2.20170607 64bit (built Jun  7 2017)) with ESMTPSA id <0P16001A4LNXTG20@pv33p33im-asmtp002.me.com>; Tue, 19 Dec 2017 00:07:14 +0000 (GMT)
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-12-18_17:,, signatures=0
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 clxscore=1011 suspectscore=20 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000 definitions=main-1712190000
Content-type: text/plain; charset=utf-8
MIME-version: 1.0 (Mac OS X Mail 11.2 \(3445.5.20\))
Subject: Re: Register Allocation Graph Coloring algorithm and Others
From: Michael Clark <michaeljclark@mac.com>
In-reply-to: <615F0DCE4D5873A9@mac.com>
Date: Tue, 19 Dec 2017 00:07:00 -0000
Cc: vmakarov@redhat.com, LewisR9@cf.ac.uk, dag@cray.com, stoklund@2pi.dk, GCC Development <gcc@gcc.gnu.org>, LLVM Developers Mailing List <llvm-dev@lists.llvm.org>
Content-transfer-encoding: quoted-printable
Message-id: <62E41783-1145-4036-9FD4-75D3B0C22DE6@mac.com>
References: <615F0DCE4D5873A9@mac.com>
To: Leslie Zhai <lesliezhai@llvm.org.cn>
X-IsSubscribed: yes
X-SW-Source: 2017-12/txt/msg00113.txt.bz2

Hi Leslie,

I suggest adding these 3 papers to your reading list.

	Register allocation for programs in SSA-form
	Sebastian Hack, Daniel Grund, and Gerhard Goos
	http://www.rw.cdl.uni-saarland.de/~grund/papers/cc06-ra_ssa.pdf

	Simple and Efficient Construction of Static Single Assignment Form
	Matthias Braun , Sebastian Buchwald , Sebastian Hack , Roland Lei=C3=9Fa ,=
 Christoph Mallon , and Andreas Zwinkau
	https://www.info.uni-karlsruhe.de/uploads/publikationen/braun13cc.pdf

	Optimal register allocation for SSA-form programs in polynomial time
	Sebastian Hack, Gerhard Goos
	http://web.cs.ucla.edu/~palsberg/course/cs232/papers/HackGoos-ipl06.pdf

A lot of the earlier literature regarding the register allocation problem d=
escribes the general graph colouring problem as NP-complete, however previo=
us research in Register Allocation has been in the context of programs that=
 were not in SSA form. i.e. the Chaitin-Briggs paper states that register a=
llocation is NP-complete and proposes an iterative algorithm.

If one reads Hack Goos, there is the discovery that programs that are in SS=
A form have chordal graphs, and the colouring algorithm for chordal graphs =
can be completed in polynomial time. After the cost of building the interfe=
rence graph, it is possible to perform register allocation in a single pass=
. The key is in not modifying the graph.

If one has frequency for each basic block, then one can sort basic blocks b=
y frequency, allocating the highest frequency blocks first, and where possi=
ble assigning the same physcial register to all virtual registers in each p=
hi node (to avoid copies). At program points where the live set is larger t=
han k, the set of physical registers, one spills the the register that has =
the largest distance between its next use. At least that is how I am thinki=
ng about this problem. I am also a novice with regards to register allocati=
on.

I intend to develop a register allocator for use in this JIT engine:
=09
	rv8: a high performance RISC-V to x86 binary translator
	Michael Clark, Bruce Hoult
	https://carrv.github.io/papers/clark-rv8-carrv2017.pdf

Our JIT already performs almost twice as fast a QEMU and we are using a sta=
tic register allocation, and QEMU i believe has a register allocator. We ar=
e mapping a 31 register RISC architecture to a 16 register CISC architectur=
e. I have statistics on the current slow-down, and it appears to mainly be =
L1 accesses to spilled registers. I believe after we have implemented a reg=
ister allocator in our JIT we will be able to run RISC-V binaries at near n=
ative performance on x86-64. In fact given we are performing runtime profil=
e guided optimisation, it may even be possible for some benchmarks to run f=
aster than register allocations that are based on static estimates of loop =
frequencies.

	https://anarch128.org/~mclark/rv8-slides.pdf
	https://rv8.io/bench

We=E2=80=99ve still got a long way to go to get to ~1:1 performance for RIS=
C-V on x86-64, but I think it is possible, at least for some codes=E2=80=A6

Regards,
Michael.

> On 15/12/2017, at 4:18 PM, Leslie Zhai <lesliezhai@llvm.org.cn> wrote:
>=20
> Hi GCC and LLVM developers,
>=20
> I am learning Register Allocation algorithms and I am clear that:
>=20
> * Unlimited VirtReg (pseudo) -> limited or fixed or alias[1] PhysReg (har=
d)
>=20
> * Memory (20 - 100 cycles) is expensive than Register (1 cycle), but it h=
as to spill code when PhysReg is unavailable
>=20
> * Folding spill code into instructions, handling register coallescing, sp=
litting live ranges, doing rematerialization, doing shrink wrapping are har=
der than RegAlloc
>=20
> * LRA and IRA is default Passes in RA for GCC:
>=20
> $ /opt/gcc-git/bin/gcc hello.c
> DEBUG: ../../gcc/lra.c, lra_init_once, line 2441
> DEBUG: ../../gcc/ira-build.c, ira_build, line 3409
>=20
> * Greedy is default Pass for LLVM
>=20
> But I have some questions, please give me some hint, thanks a lot!
>=20
> * IRA is regional register allocator performing graph coloring on a top-d=
own traversal of nested regions, is it Global? compares with Local LRA
>=20
> * The papers by Briggs and Chaiten contradict[2] themselves when examine =
the text of the paper vs. the pseudocode provided?
>=20
> * Why  interference graph is expensive to build[3]?
>=20
> And I am practicing[4] to use HEA, developed by Dr. Rhydian Lewis, for LL=
VM firstly.
>=20
>=20
> [1] https://reviews.llvm.org/D39712
>=20
> [2] http://lists.llvm.org/pipermail/llvm-dev/2008-March/012940.html
>=20
> [3] https://github.com/joaotavio/llvm-register-allocator
>=20
> [4] https://github.com/xiangzhai/llvm/tree/avr/include/llvm/CodeGen/GCol
>=20
> --=20
> Regards,
> Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/
>=20
>=20
>=20