From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 5272 invoked by alias); 22 Nov 2005 19:26:45 -0000 Received: (qmail 5262 invoked by uid 22791); 22 Nov 2005 19:26:44 -0000 X-Spam-Check-By: sourceware.org Received: from e31.co.us.ibm.com (HELO e31.co.us.ibm.com) (32.97.110.149) by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 22 Nov 2005 19:26:42 +0000 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e31.co.us.ibm.com (8.12.11/8.12.11) with ESMTP id jAMJQev4013690 for ; Tue, 22 Nov 2005 14:26:40 -0500 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VERS6.8) with ESMTP id jAMJQJks099286 for ; Tue, 22 Nov 2005 12:26:19 -0700 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id jAMJQe6l032444 for ; Tue, 22 Nov 2005 12:26:40 -0700 Received: from [192.168.4.1] (vorma.rchland.ibm.com [9.10.86.186]) by d03av03.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id jAMJQbhi032329; Tue, 22 Nov 2005 12:26:39 -0700 Subject: Re: Register Allocation From: Peter Bergner To: Andrew MacLeod Cc: gcc mailing list In-Reply-To: <1132246411.10098.153.camel@localhost.localdomain> References: <1132246411.10098.153.camel@localhost.localdomain> Content-Type: text/plain Date: Tue, 22 Nov 2005 19:26:00 -0000 Message-Id: <1132687596.7748.32.camel@otta.rchland.ibm.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-IsSubscribed: yes Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org X-SW-Source: 2005-11/txt/msg01080.txt.bz2 On Thu, 2005-11-17 at 11:53 -0500, Andrew MacLeod wrote: > I have been contemplating building a GCC register allocator from scratch > for some time. To that end, I have put together a bit of a document > given a high level overview of the various components I think would > benefit GCC, and a rough description of how they would work together. First off, let me join the others in thanking you for submitting your document for review. I'm particularly glad to see you included early instruction selection in your design, since I view the lack of that as one of the major problems with the current register allocator. Let me also say that I look forward to helping out anyway I can with this project. Peter Live Range Disjointer [page(s) 6]: * As an FYI for others, this is also know in Chaitin's paper as "Right Number of Names" and also as "web analysis". Instruction Selection [page(s) 7-8]: * Yes! IMHO, the lack of early instruction selection is one of the major problems with the current allocator. * Note, better instruction scheduling could make good use of early instruction selection too. * Cost model (which includes register pressure info) for determining which native register class to use is good. Register Coalescing [page(s) 8]: * We will probably need some method for telling the coalescer that a particular copy (or type of copy/copies) should either be completely ignored (ie, do not coalesce it even if it looks coalescable) or to think very hard before it does. Examples of copies we might want to skip would be copies inserted to satisfy instruction constraints or some type of spilling/splitting code. Examples of copies we might want to think about before blindly coalescing might be pseudo - hard register copies, or copies that would lead to an unacceptable increase in register constraints. Global Allocation [page(s) 10]: * I'd like to keep the design flexible enough (if it's at all possible) in case we want to attempt interprocedural register allocation in the future. * I'd also like to allow Classic Chaitin and Briggs algorithms which do iterative spilling (ie, color -> spill -> color ...). This would be beneficial in that it gives us a well known baseline to which all other algorithms/spilling heuristics can be compared to. It has the added benefit of making publishing easier if your baseline allocator is already implemented. Spiller [page(s) 10-11]: * If we are doing a single pass Chaitin style allocation, then I agree that the spiller would map all unallocated pseudos into hardware registers. However, if we are doing a multi pass Chaitin style allocation, then the spiller should not map the unallocated pseudos to hardware registers, but simply insert the required spill code leaving the spilled pseudos as pseudos. The spiller should be flexible enough to do both. * I can also envision someone wanting to spill only a subset of the unallocated pseudos, so we should handle that scenario. * Now that I think of it, Vlad's idea seems vaguely familiar to Load/Store Range Analysis from PLDI93. It's been so long since I read the paper, so am not 100% certain. I don't recall hearing of anyone actually using Load/Store Range Analysis. Spill Location Optimizer [page(s) 11]: * The IBM iSeries backend has this optimization. During spilling, it inserts copies to/from spill pseudos (essentially like another register class) which represent the stores/loads from the stack. These spill pseudos can then be dead code eliminated, coalesced and colored (using an interference graph) just like any other normal pseudo. Normal Chaitin coloring (using k = infinity) does a good job of minimizing the amount of stack space used for spilled pseudos. Reg_Info [page(s) 16-17]: * I have some questions about how your register group mechanism will work in practice, but that can probably wait for a while. Insn Annotations [page(s) 17-18]: * I like the idea of easy access to the register usage info provided by the insn annotations. RTL isn't really setup for making that easy. * Encoding register pressure summary info as +1, -2 or 0 is fine, but it is not a total solution. In particular, it doesn't handle register clobbers adequately. An example would be the volatile registers being clobbered on a function call. Also, for register classes with distinct subclasses (eg, GPRS and a subset of GPRS capable of acting as index registers), would you have separate counters? Spill Cost Engine [page(s) 26-29]: * The register allocator should not be estimating the execution frequency of a basic block as 10^nesting level. That information should be coming from the cfg which comes from profile data or from a good static profile. The problem with 10^loop nesting level is that we can overestimate the spill costs for some pseudos. For example: while (...) { if (...) else