From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 32252 invoked by alias); 16 Oct 2009 14:57:43 -0000 Received: (qmail 32227 invoked by uid 22791); 16 Oct 2009 14:57:42 -0000 X-SWARE-Spam-Status: No, hits=-1.3 required=5.0 tests=AWL,BAYES_00,KAM_BADIPHTTP,NORMAL_HTTP_TO_IP,SPF_HELO_PASS,SPF_PASS X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 16 Oct 2009 14:57:37 +0000 Received: from int-mx04.intmail.prod.int.phx2.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.17]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id n9GEvZrI017850; Fri, 16 Oct 2009 10:57:35 -0400 Received: from [10.11.13.132] (vpn-13-132.rdu.redhat.com [10.11.13.132]) by int-mx04.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id n9GEvYYL031571; Fri, 16 Oct 2009 10:57:35 -0400 Message-ID: <4AD888ED.1020102@redhat.com> Date: Fri, 16 Oct 2009 15:23:00 -0000 From: Vladimir Makarov User-Agent: Thunderbird 2.0.0.23 (X11/20090825) MIME-Version: 1.0 To: Ian Bolton CC: gcc@gcc.gnu.org Subject: Re: Understanding IRA References: <4D60B0700D1DB54A8C0C6E9BE69163700BD6DB18@EXCHANGEVS.IceraSemi.local> In-Reply-To: <4D60B0700D1DB54A8C0C6E9BE69163700BD6DB18@EXCHANGEVS.IceraSemi.local> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org X-SW-Source: 2009-10/txt/msg00366.txt.bz2 Ian Bolton wrote: > Hi Vladimir, > > I have just begun working for Icera Inc, on a private port of GCC, > and my first task has been to investigate the performance impact of > using the Chaitin-Briggs algorithm of IRA vs the priority algorithm > that we currently have enabled in our production compiler (due to not > defining IRA_COVER_CLASSES). > > After running various tests, I measured little performance changes > except for some test cases that have regressed quite considerably. > My suspicion was that these regressions were due to our architecture > having some instructions that can only use a subset of the register > bank, so I set about trying to understand IRA enough to confirm this. > That is quite possible. Chaitin-Briggs algorithm has some problems when there are instructions using specific register subsets. It is well known and some researchers tried to solve the problem. The most known article about this is "A generalized algorithm for graph-coloring register allocation": http://130.203.133.121:8080/viewdoc/summary?doi=10.1.1.107.113 I am still working on this problem but I don't know when it will be finally in gcc. > I now believe I have now reached a point where I understand IRA > enough to start thinking about how I can improve these regressed > cases, but I thought it might be a good idea to formulate an email > with my current understanding of IRA in it, so you could correct any > errors before I continue. (I figure this exchange between us will > also serve useful to other newbies in the field of GCC register > allocation!) > The biggest problem of GCC RA is still reload pass. It frequently changes decisions of IRA not in a good way. > Without further ado, here is my current understanding of IRA > (numbered for ease of reference). > > 1. You can enable full IRA, with Chaitin-Briggs coloring, by defining > IRA_COVER_CLASSES, but you can just as easily disable it again by > supplying -fira-algorithm=priority at the command line. (The latter > is useful to know because it means you can compare two versions > without recompiling.) > Correct. > 2. The priority algorithm allocates in an order determined by the > allocno_priority_compare_func and calls into assign_hard_reg in that > order to determine which allocno gets what register (or memory). > Correct. > 3. The CB algorithm puts allocnos into colored and uncolored buckets, > according to conflicts between allocnos. The more conflicts, the > more chance of being put in the uncolored bucket. Being in the > uncolored bucket increases an allocno's chance of being spilled; > being in the colored bucket guarantees that it will get a hard > register. > I'd say if allocno goes to the coloring stack from the uncolored bucket, it will be probably spilled. Going to the coloring stack from coloring bucket means that the allocno most probably get a hard register. In theory it should be guaranteed but in practice it might not happen because of multi-registers allocnos, hard register conflicts with allocno, non-profitability of some hard registers (e.g. callee-saved vs callee-clobbered registers). Being in colored or uncolored bucket is defined by trivial colorability criteria which is classically based on conflicts left in the graph for given allocno. The criteria can be more complicated and exact but it might slow down RA a lot. > 4. When you run at -O0, the conflicts aspect of IRA is disabled > (ira_conflict_p is set to false), and this subsequently causes the > fast_allocation function to be used instead of the do_coloring IRA > function. (This makes sense since you cannot separate into buckets > without inspecting conflicts.) > Correct. Building conflicts is an expansive operation. Therefore -O0 (or fast) RA is based on live ranges. > 5. Assuming ira_conflicts_p is enabled and you are using the CB > algorithm, the two buckets are sorted using a > bucket_allocno_compare_func, and this ordering is important, since it > dictates the order each allocno will be pushed onto the stack and > hence the order it will be popped from it (and assign_hard_reg > called). > Correct. But I should say it is only used for currently colored allocnos. You can not reorder two allocnos when putting the first allocno on the stack (in order words, removing trivially colored allocno from the graph) results in making the second allocno trivially colored. > 6. Allocnos from the colorable bucket always go on the stack first > (in the order they were sorted) and then uncolorable ones, in an > order based on which is the best candidate for spilling. Each time > an allocno is added to the stack, conflicts may be reduced and there > is the possibility of moving one or more allocnos from the > uncolorable bucket to the colorable one (so they end up being added > to the stack sooner than the other uncolorable ones). Near the end > of processing the uncolorable bucket, there comes an opportunity to > move all remaining allocnos to the colorable bucket (due to few > remaining conflicts) and these get put on the stack last, and hence > popped first, and so they all get registers. Correct. It is how Chaitin-Briggs coloring works and such description can be found in any compiler book. > These final uncolorables > will be those that had the highest spill cost and/or the most > remaining conflicts. > Sometimes such allocnos get hard register (it is implemented by optimistic coloring which is the biggest invention of Briggs). It is very hard to predict when it happens. > 7. The spill cost and priority calculation for determining which > uncolorable to put on the stack first is important, for two main > reasons: firstly, whichever allocno is picked is a more likely > candidate for spilling; secondly, it will free other allocnos up to > be added to the nicer colored bucket, from which an allocno always > receives a hard register. > Correct with mentioned above problem that allocnos from colored bucket might not get a hard register. Chaitin-Briggs algorithm is good at coloring but not good at spilling which should be used on register pressure criteria and this is very different from # conflicts. Priority coloring, imho, better dealing with spilling because its priority takes register pressure into account by using live range length. From well known RA algorithms, ELS (extended linear scan RA) probably does the best decision about spilling but it is not good about coloring (especially for code with complicated CFG). > If you could let me know if and where I've gone wrong with any of the > above, I would be extremely grateful. Then, once we are on the same > page, I hope you could also make some suggestions as to how I might > help IRA work well with our instructions that can only use a subset > of the register bank. > Subsets are complicated issue. I've been working on different approaches to this problem for year without significant improvements. As I wrote I hope it finally will be in GCC in some time and necessity for IRA_COVER_CLASS will finally go away. > Best regards, Ian Bolton (Compiler Engineer, Icera Inc.)