From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-157320-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 32252 invoked by alias); 16 Oct 2009 14:57:43 -0000
Received: (qmail 32227 invoked by uid 22791); 16 Oct 2009 14:57:42 -0000
X-SWARE-Spam-Status: No, hits=-1.3 required=5.0 	tests=AWL,BAYES_00,KAM_BADIPHTTP,NORMAL_HTTP_TO_IP,SPF_HELO_PASS,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 16 Oct 2009 14:57:37 +0000
Received: from int-mx04.intmail.prod.int.phx2.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.17]) 	by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id n9GEvZrI017850; 	Fri, 16 Oct 2009 10:57:35 -0400
Received: from [10.11.13.132] (vpn-13-132.rdu.redhat.com [10.11.13.132]) 	by int-mx04.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id n9GEvYYL031571; 	Fri, 16 Oct 2009 10:57:35 -0400
Message-ID: <4AD888ED.1020102@redhat.com>
Date: Fri, 16 Oct 2009 15:23:00 -0000
From: Vladimir Makarov <vmakarov@redhat.com>
User-Agent: Thunderbird 2.0.0.23 (X11/20090825)
MIME-Version: 1.0
To: Ian Bolton <bolton@IceraSemi.com>
CC: gcc@gcc.gnu.org
Subject: Re: Understanding IRA
References: <4D60B0700D1DB54A8C0C6E9BE69163700BD6DB18@EXCHANGEVS.IceraSemi.local>
In-Reply-To: <4D60B0700D1DB54A8C0C6E9BE69163700BD6DB18@EXCHANGEVS.IceraSemi.local>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2009-10/txt/msg00366.txt.bz2

Ian Bolton wrote:
> Hi Vladimir,
>
> I have just begun working for Icera Inc, on a private port of GCC,
> and my first task has been to investigate the performance impact of
> using the Chaitin-Briggs algorithm of IRA vs the priority algorithm
> that we currently have enabled in our production compiler (due to not
> defining IRA_COVER_CLASSES).
>
> After running various tests, I measured little performance changes
> except for some test cases that have regressed quite considerably.
> My suspicion was that these regressions were due to our architecture
> having some instructions that can only use a subset of the register
> bank, so I set about trying to understand IRA enough to confirm this.
>
That is quite possible.  Chaitin-Briggs algorithm has some problems when 
there are instructions using specific register subsets.  It is well 
known and some researchers tried to solve the problem.  The most known 
article about this is "A generalized algorithm for graph-coloring 
register allocation":

http://130.203.133.121:8080/viewdoc/summary?doi=10.1.1.107.113

I am still working on this problem but I don't know when it will be 
finally in gcc.
> I now believe I have now reached a point where I understand IRA
> enough to start thinking about how I can improve these regressed
> cases, but I thought it might be a good idea to formulate an email
> with my current understanding of IRA in it, so you could correct any
> errors before I continue.  (I figure this exchange between us will
> also serve useful to other newbies in the field of GCC register
> allocation!)
>
The biggest problem of GCC RA is still reload pass.  It frequently 
changes decisions of IRA not in a good way.
> Without further ado, here is my current understanding of IRA
> (numbered for ease of reference).
>
> 1. You can enable full IRA, with Chaitin-Briggs coloring, by defining
> IRA_COVER_CLASSES, but you can just as easily disable it again by
> supplying -fira-algorithm=priority at the command line.  (The latter
> is useful to know because it means you can compare two versions
> without recompiling.)
>
Correct.
> 2. The priority algorithm allocates in an order determined by the
> allocno_priority_compare_func and calls into assign_hard_reg in that
> order to determine which allocno gets what register (or memory).
>
Correct.
> 3. The CB algorithm puts allocnos into colored and uncolored buckets,
> according to conflicts between allocnos.  The more conflicts, the
> more chance of being put in the uncolored bucket.  Being in the
> uncolored bucket increases an allocno's chance of being spilled;
> being in the colored bucket guarantees that it will get a hard
> register.
>
I'd say if allocno goes to the coloring stack from the uncolored bucket, 
it will be probably spilled.  Going to the coloring stack from coloring 
bucket means that the allocno most probably get a hard register.  In 
theory it should be guaranteed but in practice it might not happen 
because of multi-registers allocnos, hard register conflicts with 
allocno, non-profitability of some hard registers (e.g. callee-saved vs 
callee-clobbered registers).

Being in colored or uncolored bucket is defined by trivial colorability 
criteria which is classically based on conflicts left in the graph for 
given allocno.  The criteria can be more complicated and exact but it 
might slow down RA a lot.
> 4. When you run at -O0, the conflicts aspect of IRA is disabled
> (ira_conflict_p is set to false), and this subsequently causes the
> fast_allocation function to be used instead of the do_coloring IRA
> function.  (This makes sense since you cannot separate into buckets
> without inspecting conflicts.)
>
Correct.  Building conflicts is an expansive operation.  Therefore -O0 
(or fast) RA is based on live ranges.
> 5. Assuming ira_conflicts_p is enabled and you are using the CB
> algorithm, the two buckets are sorted using a
> bucket_allocno_compare_func, and this ordering is important, since it
> dictates the order each allocno will be pushed onto the stack and
> hence the order it will be popped from it (and assign_hard_reg
> called).
>
Correct.  But I should say it is only used for currently colored 
allocnos.  You can not reorder two allocnos when putting  the first 
allocno on the stack (in order words, removing trivially colored allocno 
from the graph) results in making the second allocno trivially colored.
> 6. Allocnos from the colorable bucket always go on the stack first
> (in the order they were sorted) and then uncolorable ones, in an
> order based on which is the best candidate for spilling.  Each time
> an allocno is added to the stack, conflicts may be reduced and there
> is the possibility of moving one or more allocnos from the
> uncolorable bucket to the colorable one (so they end up being added
> to the stack sooner than the other uncolorable ones).  Near the end
> of processing the uncolorable bucket, there comes an opportunity to
> move all remaining allocnos to the colorable bucket (due to few
> remaining conflicts) and these get put on the stack last, and hence
> popped first, and so they all get registers.
Correct.   It is how Chaitin-Briggs coloring works and such description 
can be found in any compiler book.
>  These final uncolorables
> will be those that had the highest spill cost and/or the most
> remaining conflicts.
>
Sometimes such allocnos get hard register (it is implemented by 
optimistic coloring which is the biggest invention of Briggs).  It is 
very hard to predict when it happens.
> 7. The spill cost and priority calculation for determining which
> uncolorable to put on the stack first is important, for two main
> reasons: firstly, whichever allocno is picked is a more likely
> candidate for spilling; secondly, it will free other allocnos up to
> be added to the nicer colored bucket, from which an allocno always
> receives a hard register.
>
Correct with mentioned above problem that allocnos from colored bucket 
might not get a hard register.

Chaitin-Briggs algorithm is good at coloring but not good at spilling 
which should be used on register pressure criteria and this is very 
different from # conflicts.  Priority coloring, imho, better dealing 
with spilling because its priority takes register pressure into account 
by using live range length.  From well known RA algorithms, ELS 
(extended linear scan RA) probably does the best decision about spilling 
but it is not good about coloring (especially for code with complicated 
CFG).
> If you could let me know if and where I've gone wrong with any of the
> above, I would be extremely grateful.  Then, once we are on the same
> page, I hope you could also make some suggestions as to how I might
> help IRA work well with our instructions that can only use a subset
> of the register bank.
>
Subsets are complicated issue.  I've been working on different 
approaches to this problem for year without significant improvements.  
As I wrote I hope it finally will be in GCC in some time and necessity 
for IRA_COVER_CLASS will finally go away.
> Best regards, Ian Bolton (Compiler Engineer, Icera Inc.)