From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-122157-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 5272 invoked by alias); 22 Nov 2005 19:26:45 -0000
Received: (qmail 5262 invoked by uid 22791); 22 Nov 2005 19:26:44 -0000
X-Spam-Check-By: sourceware.org
Received: from e31.co.us.ibm.com (HELO e31.co.us.ibm.com) (32.97.110.149)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 22 Nov 2005 19:26:42 +0000
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) 	by e31.co.us.ibm.com (8.12.11/8.12.11) with ESMTP id jAMJQev4013690 	for <gcc@gcc.gnu.org>; Tue, 22 Nov 2005 14:26:40 -0500
Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) 	by westrelay02.boulder.ibm.com (8.12.10/NCO/VERS6.8) with ESMTP id jAMJQJks099286 	for <gcc@gcc.gnu.org>; Tue, 22 Nov 2005 12:26:19 -0700
Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) 	by d03av03.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id jAMJQe6l032444 	for <gcc@gcc.gnu.org>; Tue, 22 Nov 2005 12:26:40 -0700
Received: from [192.168.4.1] (vorma.rchland.ibm.com [9.10.86.186]) 	by d03av03.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id jAMJQbhi032329; 	Tue, 22 Nov 2005 12:26:39 -0700
Subject: Re: Register Allocation
From: Peter Bergner <bergner@vnet.ibm.com>
To: Andrew MacLeod <amacleod@redhat.com>
Cc: gcc mailing list <gcc@gcc.gnu.org>
In-Reply-To: <1132246411.10098.153.camel@localhost.localdomain>
References: <1132246411.10098.153.camel@localhost.localdomain>
Content-Type: text/plain
Date: Tue, 22 Nov 2005 19:26:00 -0000
Message-Id: <1132687596.7748.32.camel@otta.rchland.ibm.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2005-11/txt/msg01080.txt.bz2

On Thu, 2005-11-17 at 11:53 -0500, Andrew MacLeod wrote:
> I have been contemplating building a GCC register allocator from scratch
> for some time.  To that end, I have put together a bit of a document
> given a high level overview of the various components I think would
> benefit GCC, and a rough description of how they would work together.  

First off, let me join the others in thanking you for submitting
your document for review.  I'm particularly glad to see you included
early instruction selection in your design, since I view the lack of
that as one of the major problems with the current register allocator.
Let me also say that I look forward to helping out anyway I can with
this project.

Peter


Live Range Disjointer [page(s) 6]:
    * As an FYI for others, this is also know in Chaitin's paper as
      "Right Number of Names" and also as "web analysis".

Instruction Selection [page(s) 7-8]:
    * Yes!  IMHO, the lack of early instruction selection is one of
      the major problems with the current allocator.
    * Note, better instruction scheduling could make good use
      of early instruction selection too.
    * Cost model (which includes register pressure info) for determining
      which native register class to use is good.

Register Coalescing [page(s) 8]:
    * We will probably need some method for telling the coalescer
      that a particular copy (or type of copy/copies) should either
      be completely ignored (ie, do not coalesce it even if it looks
      coalescable) or to think very hard before it does.  Examples of
      copies we might want to skip would be copies inserted to satisfy
      instruction constraints or some type of spilling/splitting code.
      Examples of copies we might want to think about before blindly
      coalescing might be pseudo - hard register copies, or copies
      that would lead to an unacceptable increase in register 
      constraints.

Global Allocation [page(s) 10]:
    * I'd like to keep the design flexible enough (if it's at all
      possible) in case we want to attempt interprocedural register
      allocation in the future.
    * I'd also like to allow Classic Chaitin and Briggs algorithms
      which do iterative spilling (ie, color -> spill -> color ...).
      This would be beneficial in that it gives us a well known
      baseline to which all other algorithms/spilling heuristics can
      be compared to.  It has the added benefit of making publishing
      easier if your baseline allocator is already implemented.

Spiller [page(s) 10-11]:
    * If we are doing a single pass Chaitin style allocation, then
      I agree that the spiller would map all unallocated pseudos
      into hardware registers.  However, if we are doing a multi
      pass Chaitin style allocation, then the spiller should not
      map the unallocated pseudos to hardware registers, but simply
      insert the required spill code leaving the spilled pseudos as
      pseudos.  The spiller should be flexible enough to do both.
    * I can also envision someone wanting to spill only a subset
      of the unallocated pseudos, so we should handle that scenario.
    * Now that I think of it, Vlad's idea seems vaguely familiar to
      Load/Store Range Analysis from PLDI93.  It's been so long since
      I read the paper, so am not 100% certain.  I don't recall hearing
      of anyone actually using Load/Store Range Analysis.

Spill Location Optimizer [page(s) 11]:
    * The IBM iSeries backend has this optimization.  During spilling,
      it inserts copies to/from spill pseudos (essentially like another
      register class) which represent the stores/loads from the stack.
      These spill pseudos can then be dead code eliminated, coalesced
      and colored (using an interference graph) just like any other
      normal pseudo.  Normal Chaitin coloring (using k = infinity) does
      a good job of minimizing the amount of stack space used for
      spilled pseudos.

Reg_Info [page(s) 16-17]:
    * I have some questions about how your register group mechanism
      will work in practice, but that can probably wait for a while.

Insn Annotations [page(s) 17-18]:
    * I like the idea of easy access to the register usage info
      provided by the insn annotations.  RTL isn't really setup
      for making that easy.
    * Encoding register pressure summary info as +1, -2 or 0 is fine,
      but it is not a total solution.  In particular, it doesn't
      handle register clobbers adequately.  An example would be the
      volatile registers being clobbered on a function call.  Also,
      for register classes with distinct subclasses (eg, GPRS and a
      subset of GPRS capable of acting as index registers), would you
      have separate counters?

Spill Cost Engine [page(s) 26-29]:
    * The register allocator should not be estimating the execution
      frequency of a basic block as 10^nesting level.  That information
      should be coming from the cfg which comes from profile data or
      from a good static profile.  The problem with 10^loop nesting
      level is that we can overestimate the spill costs for some
      pseudos.  For example:
    	while (...) {
    	  <use of "a">
    	  if (...)
    	    <use of "b">
    	  else
    	    <use of "b"
    	}
      In the code above, "b"'s spill cost will be twice that of "a",
      when they really should have the same spill cost.