From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 36258 invoked by alias); 19 Dec 2017 00:07:42 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 36245 invoked by uid 89); 19 Dec 2017 00:07:41 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,KAM_SHORT,SPF_PASS autolearn=ham version=3.3.2 spammy=HContent-type:plain, regional, clark, Clark X-HELO: pv33p33im-asmtp002.me.com Received: from pv33p33im-asmtp002.me.com (HELO pv33p33im-asmtp002.me.com) (17.142.241.11) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 19 Dec 2017 00:07:39 +0000 Received: from process-dkim-sign-daemon.pv33p33im-asmtp002.me.com by pv33p33im-asmtp002.me.com (Oracle Communications Messaging Server 8.0.1.2.20170607 64bit (built Jun 7 2017)) id <0P1600C00KOZES00@pv33p33im-asmtp002.me.com> for gcc@gcc.gnu.org; Tue, 19 Dec 2017 00:07:24 +0000 (GMT) Received: from icloud.com ([127.0.0.1]) by pv33p33im-asmtp002.me.com (Oracle Communications Messaging Server 8.0.1.2.20170607 64bit (built Jun 7 2017)) with ESMTPSA id <0P16001A4LNXTG20@pv33p33im-asmtp002.me.com>; Tue, 19 Dec 2017 00:07:14 +0000 (GMT) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-12-18_17:,, signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 clxscore=1011 suspectscore=20 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1707230000 definitions=main-1712190000 Content-type: text/plain; charset=utf-8 MIME-version: 1.0 (Mac OS X Mail 11.2 \(3445.5.20\)) Subject: Re: Register Allocation Graph Coloring algorithm and Others From: Michael Clark In-reply-to: <615F0DCE4D5873A9@mac.com> Date: Tue, 19 Dec 2017 00:07:00 -0000 Cc: vmakarov@redhat.com, LewisR9@cf.ac.uk, dag@cray.com, stoklund@2pi.dk, GCC Development , LLVM Developers Mailing List Content-transfer-encoding: quoted-printable Message-id: <62E41783-1145-4036-9FD4-75D3B0C22DE6@mac.com> References: <615F0DCE4D5873A9@mac.com> To: Leslie Zhai X-IsSubscribed: yes X-SW-Source: 2017-12/txt/msg00113.txt.bz2 Hi Leslie, I suggest adding these 3 papers to your reading list. Register allocation for programs in SSA-form Sebastian Hack, Daniel Grund, and Gerhard Goos http://www.rw.cdl.uni-saarland.de/~grund/papers/cc06-ra_ssa.pdf Simple and Efficient Construction of Static Single Assignment Form Matthias Braun , Sebastian Buchwald , Sebastian Hack , Roland Lei=C3=9Fa ,= Christoph Mallon , and Andreas Zwinkau https://www.info.uni-karlsruhe.de/uploads/publikationen/braun13cc.pdf Optimal register allocation for SSA-form programs in polynomial time Sebastian Hack, Gerhard Goos http://web.cs.ucla.edu/~palsberg/course/cs232/papers/HackGoos-ipl06.pdf A lot of the earlier literature regarding the register allocation problem d= escribes the general graph colouring problem as NP-complete, however previo= us research in Register Allocation has been in the context of programs that= were not in SSA form. i.e. the Chaitin-Briggs paper states that register a= llocation is NP-complete and proposes an iterative algorithm. If one reads Hack Goos, there is the discovery that programs that are in SS= A form have chordal graphs, and the colouring algorithm for chordal graphs = can be completed in polynomial time. After the cost of building the interfe= rence graph, it is possible to perform register allocation in a single pass= . The key is in not modifying the graph. If one has frequency for each basic block, then one can sort basic blocks b= y frequency, allocating the highest frequency blocks first, and where possi= ble assigning the same physcial register to all virtual registers in each p= hi node (to avoid copies). At program points where the live set is larger t= han k, the set of physical registers, one spills the the register that has = the largest distance between its next use. At least that is how I am thinki= ng about this problem. I am also a novice with regards to register allocati= on. I intend to develop a register allocator for use in this JIT engine: =09 rv8: a high performance RISC-V to x86 binary translator Michael Clark, Bruce Hoult https://carrv.github.io/papers/clark-rv8-carrv2017.pdf Our JIT already performs almost twice as fast a QEMU and we are using a sta= tic register allocation, and QEMU i believe has a register allocator. We ar= e mapping a 31 register RISC architecture to a 16 register CISC architectur= e. I have statistics on the current slow-down, and it appears to mainly be = L1 accesses to spilled registers. I believe after we have implemented a reg= ister allocator in our JIT we will be able to run RISC-V binaries at near n= ative performance on x86-64. In fact given we are performing runtime profil= e guided optimisation, it may even be possible for some benchmarks to run f= aster than register allocations that are based on static estimates of loop = frequencies. https://anarch128.org/~mclark/rv8-slides.pdf https://rv8.io/bench We=E2=80=99ve still got a long way to go to get to ~1:1 performance for RIS= C-V on x86-64, but I think it is possible, at least for some codes=E2=80=A6 Regards, Michael. > On 15/12/2017, at 4:18 PM, Leslie Zhai wrote: >=20 > Hi GCC and LLVM developers, >=20 > I am learning Register Allocation algorithms and I am clear that: >=20 > * Unlimited VirtReg (pseudo) -> limited or fixed or alias[1] PhysReg (har= d) >=20 > * Memory (20 - 100 cycles) is expensive than Register (1 cycle), but it h= as to spill code when PhysReg is unavailable >=20 > * Folding spill code into instructions, handling register coallescing, sp= litting live ranges, doing rematerialization, doing shrink wrapping are har= der than RegAlloc >=20 > * LRA and IRA is default Passes in RA for GCC: >=20 > $ /opt/gcc-git/bin/gcc hello.c > DEBUG: ../../gcc/lra.c, lra_init_once, line 2441 > DEBUG: ../../gcc/ira-build.c, ira_build, line 3409 >=20 > * Greedy is default Pass for LLVM >=20 > But I have some questions, please give me some hint, thanks a lot! >=20 > * IRA is regional register allocator performing graph coloring on a top-d= own traversal of nested regions, is it Global? compares with Local LRA >=20 > * The papers by Briggs and Chaiten contradict[2] themselves when examine = the text of the paper vs. the pseudocode provided? >=20 > * Why interference graph is expensive to build[3]? >=20 > And I am practicing[4] to use HEA, developed by Dr. Rhydian Lewis, for LL= VM firstly. >=20 >=20 > [1] https://reviews.llvm.org/D39712 >=20 > [2] http://lists.llvm.org/pipermail/llvm-dev/2008-March/012940.html >=20 > [3] https://github.com/joaotavio/llvm-register-allocator >=20 > [4] https://github.com/xiangzhai/llvm/tree/avr/include/llvm/CodeGen/GCol >=20 > --=20 > Regards, > Leslie Zhai - https://reviews.llvm.org/p/xiangzhai/ >=20 >=20 >=20