RE: Understanding IRA

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Ian Bolton" <bolton@IceraSemi.com>
To: "Jeff Law" <law@redhat.com>
Cc: "Vladimir Makarov" <vmakarov@redhat.com>,
		"Dave Hudson" <gcc@blueteddy.net>, 	<gcc@gcc.gnu.org>
Subject: RE: Understanding IRA
Date: Wed, 02 Dec 2009 20:29:00 -0000	[thread overview]
Message-ID: <4D60B0700D1DB54A8C0C6E9BE69163700C8D36DE@EXCHANGEVS.IceraSemi.local> (raw)
In-Reply-To: <4B1451C7.2010207@redhat.com>

Jeff Law wrote:
> Ian Bolton wrote: 
> > Initial results showed that IRA was moving input arguments out of
> their
> > BOTTOM_REGS (e.g. $c1) into TOP_CREGS to do work on them, since it
> > thought TOP_CREGS were less costly to use, despite the cost of the
> move
> > instruction to get the input argument into a TOP_CREG.
> >
> That may indicate a cost scaling issue or more general weaknesses in
> IRA's cost modeling.
> 
> > I addressed this problem by splitting my register bank a little
> > differently: instead of making a distinction between BOTTOM_REGS and
> > TOP_CREGS, I made it so there was only a penalty if you used one of
> the
> > non-argument BOTTOM_REGS (i.e. a callee-save BOTTOM_REG).  This meant
> > that IRA was happy to leave input arguments in their BOTTOM_REGS but
> > erred towards using TOP_CREGS once the caller-save BOTTOM_REGS had
> run
> > out.  This was an improvement, but there was still a case where these
> > '?' penalties were not aligned with reality:
> >
> > T1 = A + B; // can use any register, TOP_CREGS appears cheaper
> > T2 = A - C; // can use any register, TOP_CREGS appears cheaper
> > T3 = A&  D; // must use BOTTOM_REGS
> >
> > The constraints for the first two instructions show that TOP_CREGS is
> > cheaper, but then you have to plant a move to get A into a BOTTOM_REG
> > to do the AND; in reality, we know it cheaper to have A in a
> BOTTOM_REG
> > all along, but the '?' constraint suggests there is a cost in doing
> this
> > for the ADD and SUB and so IRA will put A in a TOP_CREG at first and
> > incur the cost of the move because it is still cheaper than the costs
> I
> > have defined in with my constraints.  I don't believe there is a way
> to
> > communicate a conditional cost, so I'm thinking that constraints are
> not
> > the solution for me at this time.  What are your thoughts?
> >
> See above.  This might be a problem with scaling or weaknesses in IRA's
> costing model.   In theory IRA accumulates the cost of using each class
> over the set of insns using a particular pseudo.

I had an epiphany this morning and came up with an idea to achieve the
lookahead I thought I needed, thereby making the costs created by '?' a
lot more concrete and reliable.

Firstly, I have altered the alt_cost adjustment (for '?') in ira-costs.c,
so that it only happens on the second pass *and* only when the first pass
has determined that this allocno never needs a BOTTOM_REG.

The first condition (that it only occurs on the second pass) is there so
that the preferred class calculated for an allocno is based on hard
constraints, as opposed to the fuzzier constraints of '?'.  Without this
change, the second pass cannot use the preferred class to correctly add
costs for each class that doesn't intersect with the preferred class.

e.g. If an insn has an allocno as an operand that requires BOTTOM_REGS,
then we want the cost for TOP_CREGS for that allocno in that operand to
be higher to show the cost of moving it into BOTTOM_REGS.  But if we let
the '?' constraints dictate the pref class, and this allocno appears in
other insns where the '?' constraint has appeared, then TOP_CREGS may
end up being the preferred class and so this insn, which actually needs
BOTTOM_REGS for its operand, will end increasing the costs for any class
that doesn't intersect with TOP_CREGS (i.e. BOTTOM_REGS).

I'm thinking that this change will be generally useful.  What are your
thoughts?

The second condition is determined by me storing a new variable in each
allocno on the first pass to flag whether it ever appears as an operand
that must have BOTTOM_REGS.  On the second pass, I can then only penalise
an allocno for using my precious BOTTOM_REGS if I have already determined
that it will never need them.

This change is probably too specific to our case at the moment, but it
could probably be made generic, if it has use to other architectures.

These two changes together make the example I created above (where A
appears in multiple operations) correctly allocate a BOTTOM_REG from the
outset.  I am yet to run benchmarks with this solution, but I think it
will end up being similar to my alternate REG_ALLOC_ORDER, where I just
gave out TOP_CREGS first.

Sadly, I don't think either approach will handle the case where there is
low pressure and we first grab a TOP_CREG for something like an ADD (due
to our constraints telling us to) then we free up that register cos it's
not needed anymore and then we have to use a different (BOTTOM_REG) reg
for something like an AND; we should have just used just one BOTTOM_REG.

I guess solving this would involve bigger changes to the algorithm, so
that each allocno is aware of how many conflicting allocnos want the
BOTTOM_REGS.  We can then only penalise them for considering using
BOTTOM_REGS if we know it will impact someone.

> 
> >
> >> You might try something like this:
> >>
> >>     1. Crank up the callee-saved register cost adjustment in
> >> assign_hard_reg so that it's scaled based on REG_FREQ.  That will
> >> probably lead to some regressions based on my experiments.
> >>
> >>     2. Then add a check that completely avoids the cost adjustment
> in
> >> cases where we pushed a MAY_SPILL_P allocno.  This was on my todo
> list,
> >> but I haven't got to it yet.
> >>
> >> If you wanted to get fancy, you could track the maximum number of
> >> neighbors in each class as allocnos are pushed and use that to
> adjust
> >> how many registers are cost adjusted in assign_hard_reg.  The idea
> >> being
> >> the more neighbors the allocno has, the  more callee-saved regsiters
> >> we're likely to need.
> >>
> >> You could also try to account for the fact that once allocated, the
> >> callee saved regsiter is available "for free" to non-conflicting
> >> allocnos.   So you could iterate over those and decrease the penalty
> >> for
> >> using a callee saved register on the current allocno.  Given the
> >> interfaces provided by IRA, this could be compile-time expensive.
> >>
> >>
> > Your "small improvement to IRA allocations" patch posted yesterday
> > achieves something very similar to this, right?
> >
> Not really.  That small improvement is a well known heuristic to make
> it
> more likely that a node with a high degree can be colored by
> encouraging
> the use of the same color for conflicting nodes which do not conflict
> with each other.
> 
> It may help in some of your cases, but they really aren't the
> motivation
> behind the change.

I think I understand!

> 
> > In the case of our BOTTOM_REGS and TOP_CREGS split, whether this pays
> > off depends on which allocno is colored first.
> Absolutely.   Note that we use a tiny adjustment to costing for this
> heuristic so that, in general, this heuristic only applies when  when
> other costing heuristics don't result in a particular register
> preference.

Yes, you don't want to undo all the cleverness done previously.  I do
wonder if the costs should all be multiplied by ten so we can distinguish
between a cost of 1 and something that should be less than 1 but not 0.

> 
> [ ... ]
> > Obviously, this will make things even more expensive for compilation,
> but
> > I'm not sure I can get what I require without some kind of "look-
> ahead"
> > mechanism like in the two ideas above.  Unless I simply change the
> > coloring algorithm so that we always color those that need
> BOTTOM_REGS
> > first?  I think this is effectively what I was achieving with my
> early
> > hacks involving the cost adjustments of 6 and 20.
> >
> Well, always coloring BOTTOM_REGs first simply means that you're
> bypassing the costing metrics rather than fixing or enhancing them.
> The class of problems you're running into aren't unique to your
> processor and thus I'd rather see you helping with fixing the costing
> heuristics which will help just about everyone.

I implemented the early-coloring approach and got good performance on
most benchmarks, but a 3% regression on one due to planting far more
loads and stores.  I guess there were some allocnos that really wanted
any register, but all the early-colored ones got colored first and
pinched the ones they would have used.  I have had some ideas for
refining this heuristic, but your email today, and my subsequent
epiphany, led to me returning to the constraints approach.

> It's also worth remembering that allocation is based on heuristics,
> which often don't lead to perfect allocation, merely a good one.

I have quoted this to my boss! ;-)

Cheers,
Ian

> 
>

next prev parent reply	other threads:[~2009-12-02 20:29 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-05 17:36 Ian Bolton
2009-11-05 18:05 ` Ian Bolton
2009-11-06 12:53   ` Dave Hudson
2009-11-09 14:13     ` Ian Bolton
2009-11-10 12:19       ` Dave Hudson
2009-11-10 17:21     ` Jeff Law
2009-11-10 17:38       ` Ian Bolton
2009-11-11 15:19         ` Ian Bolton
2009-11-11 16:12           ` Jeff Law
2009-11-11 17:04           ` Vladimir Makarov
2009-11-11 18:36             ` Ian Bolton
2009-11-11 20:09               ` Ian Bolton
2009-11-16 17:35                 ` Ian Bolton
     [not found]                   ` <4B01BB87.6020902@redhat.com>
2009-11-19 15:41                     ` Ian Bolton
     [not found]                       ` <4B1451C7.2010207@redhat.com>
2009-12-02 20:29                         ` Ian Bolton [this message]
2009-12-03 19:16                           ` Jeff Law
2009-12-07 13:30                             ` Ian Bolton
  -- strict thread matches above, loose matches on Subject: below --
2009-10-16 14:22 Ian Bolton
2009-10-16 15:23 ` Vladimir Makarov
2009-10-16 16:19   ` Jeff Law
2009-10-19 19:21     ` Ian Bolton
2009-10-19 21:09       ` Vladimir Makarov
2009-10-23  7:33       ` Jeff Law
2009-11-04 17:52         ` Ian Bolton
2009-11-04 19:49           ` Jeff Law
2009-10-16 15:45 ` Vladimir Makarov
2009-11-03 16:29   ` Ian Bolton
2009-11-03 23:02     ` Jeff Law
2009-11-04 17:13       ` Vladimir Makarov
2009-11-05  0:23         ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D60B0700D1DB54A8C0C6E9BE69163700C8D36DE@EXCHANGEVS.IceraSemi.local \
    --to=bolton@icerasemi.com \
    --cc=gcc@blueteddy.net \
    --cc=gcc@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=vmakarov@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).