public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* RE: Understanding IRA
@ 2009-11-05 17:36 Ian Bolton
  2009-11-05 18:05 ` Ian Bolton
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-05 17:36 UTC (permalink / raw)
  To: Jeff Law, Vladimir Makarov; +Cc: gcc

(NOTE: this is a repost to gcc@gcc.gnu.org of a private message
originally sent to just Jeff Law and Vladimir Makarov.)

> Vladimir Makarov wrote:
> Jeff Law wrote:
> > On 11/03/09 09:29, Ian Bolton wrote:
> >> Hi again, Vladimir,
> >>
> >> I am pleased to report some performance improvements after altering
> >> ira-costs.c.  A key benchmark for us has improved by 5%.
> >>
> >> Specifically, in record_reg_classes(), after the alt_cost has been
> >> calculated and it will be applied to pp->mem_cost and pp->cost[k],
I
> >> check whether this particular operand wanted one of our BOTTOM_REGS
> >> (r0-r15) and I further increase the pp->mem_cost by an arbitrary
> >> amount and also increase pp->cost[k] by an arbitrary amount if k
> >> does not represent the BOTTOM_REGS class.  My aim here is to nudge
> >> IRA in the right direction for operands that just want BOTTOM_REGS.
> >>
> >> After experimenting with different values for my "arbitrary
> >> amounts", I discovered some that successfully made IRA more likely
> >> to give BOTTOM_REGS to those instructions/operands that want
> >> BOTTOM_REGS, since any other regs and memory ended up with high
> >> enough costs for IRA to try and avoid using them.
> >>
> >> I have included a snippet from my version of record_reg_classes()
> >> below:
> >>
> >>
> > What I don't understand at this point is why the current mechanisms
> in
> > IRA aren't showing a lower cost for using BOTTOM_REGS (or a higher
> > cost for TOP_REGS).  i.e.  I don't think any of this should be
> > necessary as IRA should already be doing something similar.
> >
> > This may be a case where your backend hasn't indicated that TOP_REGS
> > has a higher cost than BOTTOM_REGS in particular situations.
> >
> I am agree with Jeff.  It is hard to understand what you are doing
> without the architecture knowledge and some macro values in your port
> (IRA_COVER_CLASSES, MEMORY_MOVE_COST, and REGISTER_MOVE_COST).
> 
> I'd also add that besides right macro value definitions, you could use
> insn alternative hints near register constraints like ? or even *.
>

FYI, my IRA_COVER_CLASSES is simply this:

#define IRA_COVER_CLASSES	\
{					\
  GENERAL_REGS,  			\
  LIM_REG_CLASSES 		\
}

REGISTER_MOVE_COST is currently set to the default of 2, in defaults.h.
On our architecture, it takes 1 cycle to move from one register to
another, so I assume 2 would be the correct number if this is meant to
represent the cost of moving there and back again?  Or is the cost 2
to represent the cycle impact (1) plus the size impact (1)?

MEMORY_MOVE_COST is set to be 4 + memory_move_secondary_cost in
reload.h.  I've not delved into what is happening in that function for
my particular case, but I see it mentions REGISTER_MOVE_COST in there
too.

(As you can see from my previous email, I have increased the costs for
allocnos where we want BOTTOM_REGS and get TOP_CREGS or MEM by 6*freq
and 20*freq, respectively.  I do not know whether these numbers are
sensible at this stage, since I do not fully understand what the costs
are meant to represent.)

After Jeff mentioned that IRA should already be doing the right thing,
I had another look at the 176r.ira dump file and realised that I had
originally only seen "Pass 0 for finding allocno costs" in that file.
In fact, some costs for allocnos are added in Pass 1.

For example, in unmodified IRA with priority algorithm and mixed
region:

a0(r230,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,0 C_REGS:0,0 MEM:32000
a1(r203,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,8000 C_REGS:0,8000
MEM:8000
...
a29(r203,l1) costs: BOTTOM_REGS:0,0 TOP_CREGS:8000,8000 C_REGS:8000,8000
MEM:20000

From looking at 174r.sched1, I can see that r203 is defined in l0 and
used as an operand that needed BOTTOM_REGS in l1.  Notice that the
allocno cost is 0 for l0, but the total cost is 8000, since this has
been passed on from the 8000 for allocno in l1.

r203 ends up getting register #23, which is not a BOTTOM_REG, so we need
to generate a move in region l1 before we can use this value.

What is interesting is that in my modified version of IRA, both lines
that mention r203 have an allocno cost:

a0(r230,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,0 C_REGS:0,0 MEM:32000
a1(r203,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:12000,50000
C_REGS:12000,50000 MEM:48000
a29(r203,l1) costs: BOTTOM_REGS:0,0 TOP_CREGS:38000,38000
C_REGS:38000,38000 MEM:138000

I'm not really sure how my change in record_reg_classes has ended up
altering the allocno cost in l0, but that does seem to be the key.
I will continue investigating this.

Another thing you can notice from the above cost output is that allocnos
that are going to be used for memory operations are getting their MEM
cost set high, to prevent them being put in MEM.  (r230 gets used
several
times to load in values and store out values.)

But in my modified IRA, my grossly inflated MEM costs for allocnos that
want BOTTOM_REGS are causing some of those allocnos to get a higher
priority than these ones that are to be used as memory address.  See
that a29/r203 has a MEM score of 138000, which means it gets first pick
of the available registers, so more chance of getting the BOTTOM_REG
that it wants.

I wonder if the value being used for these memory address allocnos could
be tuned.  I also wonder if we could persuade them to prefer TOP_CREGS,
thus leaving more BOTTOM_REGS for those that want them.

Best regards,
Ian

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Understanding IRA
@ 2009-10-16 14:22 Ian Bolton
  2009-10-16 15:23 ` Vladimir Makarov
  2009-10-16 15:45 ` Vladimir Makarov
  0 siblings, 2 replies; 30+ messages in thread
From: Ian Bolton @ 2009-10-16 14:22 UTC (permalink / raw)
  To: vmakarov; +Cc: gcc

Hi Vladimir,

I have just begun working for Icera Inc, on a private port of GCC,
and my first task has been to investigate the performance impact of
using the Chaitin-Briggs algorithm of IRA vs the priority algorithm
that we currently have enabled in our production compiler (due to not
defining IRA_COVER_CLASSES).

After running various tests, I measured little performance changes
except for some test cases that have regressed quite considerably.
My suspicion was that these regressions were due to our architecture
having some instructions that can only use a subset of the register
bank, so I set about trying to understand IRA enough to confirm this.

I now believe I have now reached a point where I understand IRA
enough to start thinking about how I can improve these regressed
cases, but I thought it might be a good idea to formulate an email
with my current understanding of IRA in it, so you could correct any
errors before I continue.  (I figure this exchange between us will
also serve useful to other newbies in the field of GCC register
allocation!)

Without further ado, here is my current understanding of IRA
(numbered for ease of reference).

1. You can enable full IRA, with Chaitin-Briggs coloring, by defining
IRA_COVER_CLASSES, but you can just as easily disable it again by
supplying -fira-algorithm=priority at the command line.  (The latter
is useful to know because it means you can compare two versions
without recompiling.)

2. The priority algorithm allocates in an order determined by the
allocno_priority_compare_func and calls into assign_hard_reg in that
order to determine which allocno gets what register (or memory).

3. The CB algorithm puts allocnos into colored and uncolored buckets,
according to conflicts between allocnos.  The more conflicts, the
more chance of being put in the uncolored bucket.  Being in the
uncolored bucket increases an allocno's chance of being spilled;
being in the colored bucket guarantees that it will get a hard
register.

4. When you run at -O0, the conflicts aspect of IRA is disabled
(ira_conflict_p is set to false), and this subsequently causes the
fast_allocation function to be used instead of the do_coloring IRA
function.  (This makes sense since you cannot separate into buckets
without inspecting conflicts.)

5. Assuming ira_conflicts_p is enabled and you are using the CB
algorithm, the two buckets are sorted using a
bucket_allocno_compare_func, and this ordering is important, since it
dictates the order each allocno will be pushed onto the stack and
hence the order it will be popped from it (and assign_hard_reg
called).

6. Allocnos from the colorable bucket always go on the stack first
(in the order they were sorted) and then uncolorable ones, in an
order based on which is the best candidate for spilling.  Each time
an allocno is added to the stack, conflicts may be reduced and there
is the possibility of moving one or more allocnos from the
uncolorable bucket to the colorable one (so they end up being added
to the stack sooner than the other uncolorable ones).  Near the end
of processing the uncolorable bucket, there comes an opportunity to
move all remaining allocnos to the colorable bucket (due to few
remaining conflicts) and these get put on the stack last, and hence
popped first, and so they all get registers. These final uncolorables
will be those that had the highest spill cost and/or the most
remaining conflicts.

7. The spill cost and priority calculation for determining which
uncolorable to put on the stack first is important, for two main
reasons: firstly, whichever allocno is picked is a more likely
candidate for spilling; secondly, it will free other allocnos up to
be added to the nicer colored bucket, from which an allocno always
receives a hard register.

If you could let me know if and where I've gone wrong with any of the
above, I would be extremely grateful.  Then, once we are on the same
page, I hope you could also make some suggestions as to how I might
help IRA work well with our instructions that can only use a subset
of the register bank.

Best regards, Ian Bolton (Compiler Engineer, Icera Inc.)

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2009-12-07 13:30 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-05 17:36 Understanding IRA Ian Bolton
2009-11-05 18:05 ` Ian Bolton
2009-11-06 12:53   ` Dave Hudson
2009-11-09 14:13     ` Ian Bolton
2009-11-10 12:19       ` Dave Hudson
2009-11-10 17:21     ` Jeff Law
2009-11-10 17:38       ` Ian Bolton
2009-11-11 15:19         ` Ian Bolton
2009-11-11 16:12           ` Jeff Law
2009-11-11 17:04           ` Vladimir Makarov
2009-11-11 18:36             ` Ian Bolton
2009-11-11 20:09               ` Ian Bolton
2009-11-16 17:35                 ` Ian Bolton
     [not found]                   ` <4B01BB87.6020902@redhat.com>
2009-11-19 15:41                     ` Ian Bolton
     [not found]                       ` <4B1451C7.2010207@redhat.com>
2009-12-02 20:29                         ` Ian Bolton
2009-12-03 19:16                           ` Jeff Law
2009-12-07 13:30                             ` Ian Bolton
  -- strict thread matches above, loose matches on Subject: below --
2009-10-16 14:22 Ian Bolton
2009-10-16 15:23 ` Vladimir Makarov
2009-10-16 16:19   ` Jeff Law
2009-10-19 19:21     ` Ian Bolton
2009-10-19 21:09       ` Vladimir Makarov
2009-10-23  7:33       ` Jeff Law
2009-11-04 17:52         ` Ian Bolton
2009-11-04 19:49           ` Jeff Law
2009-10-16 15:45 ` Vladimir Makarov
2009-11-03 16:29   ` Ian Bolton
2009-11-03 23:02     ` Jeff Law
2009-11-04 17:13       ` Vladimir Makarov
2009-11-05  0:23         ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).