RE: Understanding IRA

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* RE: Understanding IRA
@ 2009-11-05 17:36 Ian Bolton
  2009-11-05 18:05 ` Ian Bolton
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-05 17:36 UTC (permalink / raw)
  To: Jeff Law, Vladimir Makarov; +Cc: gcc

(NOTE: this is a repost to gcc@gcc.gnu.org of a private message
originally sent to just Jeff Law and Vladimir Makarov.)

> Vladimir Makarov wrote:
> Jeff Law wrote:
> > On 11/03/09 09:29, Ian Bolton wrote:
> >> Hi again, Vladimir,
> >>
> >> I am pleased to report some performance improvements after altering
> >> ira-costs.c.  A key benchmark for us has improved by 5%.
> >>
> >> Specifically, in record_reg_classes(), after the alt_cost has been
> >> calculated and it will be applied to pp->mem_cost and pp->cost[k],
I
> >> check whether this particular operand wanted one of our BOTTOM_REGS
> >> (r0-r15) and I further increase the pp->mem_cost by an arbitrary
> >> amount and also increase pp->cost[k] by an arbitrary amount if k
> >> does not represent the BOTTOM_REGS class.  My aim here is to nudge
> >> IRA in the right direction for operands that just want BOTTOM_REGS.
> >>
> >> After experimenting with different values for my "arbitrary
> >> amounts", I discovered some that successfully made IRA more likely
> >> to give BOTTOM_REGS to those instructions/operands that want
> >> BOTTOM_REGS, since any other regs and memory ended up with high
> >> enough costs for IRA to try and avoid using them.
> >>
> >> I have included a snippet from my version of record_reg_classes()
> >> below:
> >>
> >>
> > What I don't understand at this point is why the current mechanisms
> in
> > IRA aren't showing a lower cost for using BOTTOM_REGS (or a higher
> > cost for TOP_REGS).  i.e.  I don't think any of this should be
> > necessary as IRA should already be doing something similar.
> >
> > This may be a case where your backend hasn't indicated that TOP_REGS
> > has a higher cost than BOTTOM_REGS in particular situations.
> >
> I am agree with Jeff.  It is hard to understand what you are doing
> without the architecture knowledge and some macro values in your port
> (IRA_COVER_CLASSES, MEMORY_MOVE_COST, and REGISTER_MOVE_COST).
> 
> I'd also add that besides right macro value definitions, you could use
> insn alternative hints near register constraints like ? or even *.
>

FYI, my IRA_COVER_CLASSES is simply this:

#define IRA_COVER_CLASSES	\
{					\
  GENERAL_REGS,  			\
  LIM_REG_CLASSES 		\
}

REGISTER_MOVE_COST is currently set to the default of 2, in defaults.h.
On our architecture, it takes 1 cycle to move from one register to
another, so I assume 2 would be the correct number if this is meant to
represent the cost of moving there and back again?  Or is the cost 2
to represent the cycle impact (1) plus the size impact (1)?

MEMORY_MOVE_COST is set to be 4 + memory_move_secondary_cost in
reload.h.  I've not delved into what is happening in that function for
my particular case, but I see it mentions REGISTER_MOVE_COST in there
too.

(As you can see from my previous email, I have increased the costs for
allocnos where we want BOTTOM_REGS and get TOP_CREGS or MEM by 6*freq
and 20*freq, respectively.  I do not know whether these numbers are
sensible at this stage, since I do not fully understand what the costs
are meant to represent.)

After Jeff mentioned that IRA should already be doing the right thing,
I had another look at the 176r.ira dump file and realised that I had
originally only seen "Pass 0 for finding allocno costs" in that file.
In fact, some costs for allocnos are added in Pass 1.

For example, in unmodified IRA with priority algorithm and mixed
region:

a0(r230,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,0 C_REGS:0,0 MEM:32000
a1(r203,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,8000 C_REGS:0,8000
MEM:8000
...
a29(r203,l1) costs: BOTTOM_REGS:0,0 TOP_CREGS:8000,8000 C_REGS:8000,8000
MEM:20000

From looking at 174r.sched1, I can see that r203 is defined in l0 and
used as an operand that needed BOTTOM_REGS in l1.  Notice that the
allocno cost is 0 for l0, but the total cost is 8000, since this has
been passed on from the 8000 for allocno in l1.

r203 ends up getting register #23, which is not a BOTTOM_REG, so we need
to generate a move in region l1 before we can use this value.

What is interesting is that in my modified version of IRA, both lines
that mention r203 have an allocno cost:

a0(r230,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,0 C_REGS:0,0 MEM:32000
a1(r203,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:12000,50000
C_REGS:12000,50000 MEM:48000
a29(r203,l1) costs: BOTTOM_REGS:0,0 TOP_CREGS:38000,38000
C_REGS:38000,38000 MEM:138000

I'm not really sure how my change in record_reg_classes has ended up
altering the allocno cost in l0, but that does seem to be the key.
I will continue investigating this.

Another thing you can notice from the above cost output is that allocnos
that are going to be used for memory operations are getting their MEM
cost set high, to prevent them being put in MEM.  (r230 gets used
several
times to load in values and store out values.)

But in my modified IRA, my grossly inflated MEM costs for allocnos that
want BOTTOM_REGS are causing some of those allocnos to get a higher
priority than these ones that are to be used as memory address.  See
that a29/r203 has a MEM score of 138000, which means it gets first pick
of the available registers, so more chance of getting the BOTTOM_REG
that it wants.

I wonder if the value being used for these memory address allocnos could
be tuned.  I also wonder if we could persuade them to prefer TOP_CREGS,
thus leaving more BOTTOM_REGS for those that want them.

Best regards,
Ian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-11-05 17:36 Understanding IRA Ian Bolton
@ 2009-11-05 18:05 ` Ian Bolton
  2009-11-06 12:53   ` Dave Hudson
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-05 18:05 UTC (permalink / raw)
  To: Jeff Law, Vladimir Makarov; +Cc: gcc


> Ian Bolton wrote:
> 
> (NOTE: this is a repost to gcc@gcc.gnu.org of a private message
> originally sent to just Jeff Law and Vladimir Makarov.)
> 
> > Vladimir Makarov wrote:
> > Jeff Law wrote:
> > > On 11/03/09 09:29, Ian Bolton wrote:
> > >> Hi again, Vladimir,
> > >>
> > >> I am pleased to report some performance improvements after
> altering
> > >> ira-costs.c.  A key benchmark for us has improved by 5%.
> > >>
> > >> Specifically, in record_reg_classes(), after the alt_cost has
been
> > >> calculated and it will be applied to pp->mem_cost and
pp->cost[k],
> I
> > >> check whether this particular operand wanted one of our
> BOTTOM_REGS
> > >> (r0-r15) and I further increase the pp->mem_cost by an arbitrary
> > >> amount and also increase pp->cost[k] by an arbitrary amount if k
> > >> does not represent the BOTTOM_REGS class.  My aim here is to
nudge
> > >> IRA in the right direction for operands that just want
> BOTTOM_REGS.
> > >>
> > >> After experimenting with different values for my "arbitrary
> > >> amounts", I discovered some that successfully made IRA more
likely
> > >> to give BOTTOM_REGS to those instructions/operands that want
> > >> BOTTOM_REGS, since any other regs and memory ended up with high
> > >> enough costs for IRA to try and avoid using them.
> > >>
> > >> I have included a snippet from my version of record_reg_classes()
> > >> below:
> > >>
> > >>
> > > What I don't understand at this point is why the current
mechanisms
> > in
> > > IRA aren't showing a lower cost for using BOTTOM_REGS (or a higher
> > > cost for TOP_REGS).  i.e.  I don't think any of this should be
> > > necessary as IRA should already be doing something similar.
> > >
> > > This may be a case where your backend hasn't indicated that
> TOP_REGS
> > > has a higher cost than BOTTOM_REGS in particular situations.
> > >
> > I am agree with Jeff.  It is hard to understand what you are doing
> > without the architecture knowledge and some macro values in your
port
> > (IRA_COVER_CLASSES, MEMORY_MOVE_COST, and REGISTER_MOVE_COST).
> >
> > I'd also add that besides right macro value definitions, you could
> use
> > insn alternative hints near register constraints like ? or even *.
> >
> 
> FYI, my IRA_COVER_CLASSES is simply this:
> 
> #define IRA_COVER_CLASSES	\
> {					\
>   GENERAL_REGS,  			\
>   LIM_REG_CLASSES 		\
> }
> 
> REGISTER_MOVE_COST is currently set to the default of 2, in
defaults.h.
> On our architecture, it takes 1 cycle to move from one register to
> another, so I assume 2 would be the correct number if this is meant to
> represent the cost of moving there and back again?  Or is the cost 2
> to represent the cycle impact (1) plus the size impact (1)?
> 
> MEMORY_MOVE_COST is set to be 4 + memory_move_secondary_cost in
> reload.h.  I've not delved into what is happening in that function for
> my particular case, but I see it mentions REGISTER_MOVE_COST in there
> too.
> 
> (As you can see from my previous email, I have increased the costs for
> allocnos where we want BOTTOM_REGS and get TOP_CREGS or MEM by 6*freq
> and 20*freq, respectively.  I do not know whether these numbers are
> sensible at this stage, since I do not fully understand what the costs
> are meant to represent.)
> 
> After Jeff mentioned that IRA should already be doing the right thing,
> I had another look at the 176r.ira dump file and realised that I had
> originally only seen "Pass 0 for finding allocno costs" in that file.
> In fact, some costs for allocnos are added in Pass 1.
> 
> For example, in unmodified IRA with priority algorithm and mixed
> region:
> 
> a0(r230,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,0 C_REGS:0,0 MEM:32000
> a1(r203,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,8000 C_REGS:0,8000
> MEM:8000
> ...
> a29(r203,l1) costs: BOTTOM_REGS:0,0 TOP_CREGS:8000,8000
> C_REGS:8000,8000
> MEM:20000
> 
> From looking at 174r.sched1, I can see that r203 is defined in l0 and
> used as an operand that needed BOTTOM_REGS in l1.  Notice that the
> allocno cost is 0 for l0, but the total cost is 8000, since this has
> been passed on from the 8000 for allocno in l1.
> 
> r203 ends up getting register #23, which is not a BOTTOM_REG, so we
> need
> to generate a move in region l1 before we can use this value.
> 
> What is interesting is that in my modified version of IRA, both lines
> that mention r203 have an allocno cost:
> 
> a0(r230,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:0,0 C_REGS:0,0 MEM:32000
> a1(r203,l0) costs: BOTTOM_REGS:0,0 TOP_CREGS:12000,50000
> C_REGS:12000,50000 MEM:48000
> a29(r203,l1) costs: BOTTOM_REGS:0,0 TOP_CREGS:38000,38000
> C_REGS:38000,38000 MEM:138000
> 
> I'm not really sure how my change in record_reg_classes has ended up
> altering the allocno cost in l0, but that does seem to be the key.
> I will continue investigating this.
> 
> Another thing you can notice from the above cost output is that
> allocnos
> that are going to be used for memory operations are getting their MEM
> cost set high, to prevent them being put in MEM.  (r230 gets used
> several
> times to load in values and store out values.)
> 
> But in my modified IRA, my grossly inflated MEM costs for allocnos
that
> want BOTTOM_REGS are causing some of those allocnos to get a higher
> priority than these ones that are to be used as memory address.  See
> that a29/r203 has a MEM score of 138000, which means it gets first
pick
> of the available registers, so more chance of getting the BOTTOM_REG
> that it wants.
> 
> I wonder if the value being used for these memory address allocnos
> could
> be tuned.  I also wonder if we could persuade them to prefer
TOP_CREGS,
> thus leaving more BOTTOM_REGS for those that want them.
> 

I think I may have made a breakthrough!

As mentioned above, IRA is correctly increasing the cost for TOP_REGS
when an allocno in region 1 is being used in one of our special
instructions that needs BOTTOM_REGS.  We can also see that IRA is
passing that allocno cost up to the associated allocno in region 0 and
altering the total_cost for that allocno.

However, it does not appear as though IRA is doing anything with the
total_costs, apart from using it to determine the preferred class for
the allocno.  When the main logic of the algorithms comes into play,
we only look at allocno_costs, which do not take into account the
future usage of that register in the allocno in region 1.

I believe that if the decisions were made based on total_costs then I
would get a better allocation with existing IRA.  Before I experiment
with this, I was wondering what the motivation is for only looking
at allocno_costs?

BTW, I did look into using the ! and * constraints, but I don't think
they could help.  Our architecture is like Motorola 68k in that we have
some instructions that can use all the registers and some that can
only use the subset (BOTTOM_REGS).  The latter type can only specify
"b" for their constraint, since they can only go in BOTTOM_REGS.  The
former type might benefit from "b,!t", so we are more likely to get
things in BOTTOM_REGS for when they are later needed there, but the
flip-side is that we might also fill up BOTTOM_REGS when actually there
was no subsequent need for the value to be in BOTTOM_REGS.  I may have
misunderstood how all this works, but it looks like constraints will
not help here and, in fact, the total_costs calculations that I
mention above are precisely the way IRA passes information about
register requirements upwards.

Best regards,
Ian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-11-05 18:05 ` Ian Bolton
@ 2009-11-06 12:53   ` Dave Hudson
  2009-11-09 14:13     ` Ian Bolton
  2009-11-10 17:21     ` Jeff Law
  0 siblings, 2 replies; 30+ messages in thread
From: Dave Hudson @ 2009-11-06 12:53 UTC (permalink / raw)
  To: Ian Bolton; +Cc: Jeff Law, Vladimir Makarov, gcc

On Thu, 2009-11-05 at 18:05 +0000, Ian Bolton wrote:
> I think I may have made a breakthrough!
> 
> As mentioned above, IRA is correctly increasing the cost for TOP_REGS
> when an allocno in region 1 is being used in one of our special
> instructions that needs BOTTOM_REGS.  We can also see that IRA is
> passing that allocno cost up to the associated allocno in region 0 and
> altering the total_cost for that allocno.
> 
> However, it does not appear as though IRA is doing anything with the
> total_costs, apart from using it to determine the preferred class for
> the allocno.  When the main logic of the algorithms comes into play,
> we only look at allocno_costs, which do not take into account the
> future usage of that register in the allocno in region 1.
> 
> I believe that if the decisions were made based on total_costs then I
> would get a better allocation with existing IRA.  Before I experiment
> with this, I was wondering what the motivation is for only looking
> at allocno_costs?
> 
> BTW, I did look into using the ! and * constraints, but I don't think
> they could help.  Our architecture is like Motorola 68k in that we have
> some instructions that can use all the registers and some that can
> only use the subset (BOTTOM_REGS).  The latter type can only specify
> "b" for their constraint, since they can only go in BOTTOM_REGS.  The
> former type might benefit from "b,!t", so we are more likely to get
> things in BOTTOM_REGS for when they are later needed there, but the
> flip-side is that we might also fill up BOTTOM_REGS when actually there
> was no subsequent need for the value to be in BOTTOM_REGS.  I may have
> misunderstood how all this works, but it looks like constraints will
> not help here and, in fact, the total_costs calculations that I
> mention above are precisely the way IRA passes information about
> register requirements upwards.

I've been working on gcc for an ISA (Ubicom32) that seems to have some
similarities to the problem you're seeing (we have some regs that can be
used for many things but not all) and was seeing a ton a pointless moves
introduced by reload.

I've ended up trying quite a few different strategies including defining
different cover classes and the strategy we're now using is to leave the
cover classes the same way you have them but found that the most
critical thing was to ensure that REGNO_REG_CLASS was returning a
minimal class correctly.  Once I had this right then IRA started to do
the right thing most of the time (-Os still isn't great but -O2 usually
looks very good now).  We still see some problems but they're usually a
result of reload messing things up or suboptimal handling of
auto-incrementing in MEMs.

I haven't found using "*" and "!" helped much except where we had a
couple of different ways of encoding the same behaviour and where we
wanted to avoid using one unless the register allocations worked out
right.

Cheers,
Dave

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-11-06 12:53   ` Dave Hudson
@ 2009-11-09 14:13     ` Ian Bolton
  2009-11-10 12:19       ` Dave Hudson
  2009-11-10 17:21     ` Jeff Law
  1 sibling, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-09 14:13 UTC (permalink / raw)
  To: Dave Hudson; +Cc: Jeff Law, Vladimir Makarov, gcc

Dave Hudson wrote:
> On Thu, 2009-11-05 at 18:05 +0000, Ian Bolton wrote:
> > I think I may have made a breakthrough!
> >
> > As mentioned above, IRA is correctly increasing the cost for TOP_REGS
> > when an allocno in region 1 is being used in one of our special
> > instructions that needs BOTTOM_REGS.  We can also see that IRA is
> > passing that allocno cost up to the associated allocno in region 0
> and
> > altering the total_cost for that allocno.
> >
> > However, it does not appear as though IRA is doing anything with the
> > total_costs, apart from using it to determine the preferred class for
> > the allocno.  When the main logic of the algorithms comes into play,
> > we only look at allocno_costs, which do not take into account the
> > future usage of that register in the allocno in region 1.
> >
> > I believe that if the decisions were made based on total_costs then I
> > would get a better allocation with existing IRA.  Before I experiment
> > with this, I was wondering what the motivation is for only looking
> > at allocno_costs?
> >
> > BTW, I did look into using the ! and * constraints, but I don't think
> > they could help.  Our architecture is like Motorola 68k in that we
> have
> > some instructions that can use all the registers and some that can
> > only use the subset (BOTTOM_REGS).  The latter type can only specify
> > "b" for their constraint, since they can only go in BOTTOM_REGS.  The
> > former type might benefit from "b,!t", so we are more likely to get
> > things in BOTTOM_REGS for when they are later needed there, but the
> > flip-side is that we might also fill up BOTTOM_REGS when actually
> there
> > was no subsequent need for the value to be in BOTTOM_REGS.  I may
> have
> > misunderstood how all this works, but it looks like constraints will
> > not help here and, in fact, the total_costs calculations that I
> > mention above are precisely the way IRA passes information about
> > register requirements upwards.
> 
> I've been working on gcc for an ISA (Ubicom32) that seems to have some
> similarities to the problem you're seeing (we have some regs that can
> be
> used for many things but not all) and was seeing a ton a pointless
> moves
> introduced by reload.
> 
> I've ended up trying quite a few different strategies including
> defining
> different cover classes and the strategy we're now using is to leave
> the
> cover classes the same way you have them but found that the most
> critical thing was to ensure that REGNO_REG_CLASS was returning a
> minimal class correctly.  Once I had this right then IRA started to do
> the right thing most of the time (-Os still isn't great but -O2 usually
> looks very good now).  We still see some problems but they're usually a
> result of reload messing things up or suboptimal handling of
> auto-incrementing in MEMs.

Thanks for the info, Dave.

Given that C_REGS = r0-r31, BOTTOM_REGS = r0-r15, TOP_CREGS = r16-r31, 
when you say "minimal class", does that mean that I should change my
macro from this:

/* Map a register number to a register class.  */
#define REGNO_REG_CLASS(REGNO)				\
  (BOTTOM_C_REG_P (REGNO) ? BOTTOM_REGS : 		\
   (REGNO) == LINK_POINTER_REGNUM ? LINK_REGS : 	\
   C_REG_P (REGNO) ? C_REGS : 				\
   D_REG_P (REGNO) ? D_REGS : 				\
   CSR_REG_P (REGNO) ? CSR_REGS : 			\
   (unsigned)(REGNO) < FIRST_PSEUDO_REGISTER ? INTERNAL_REGS : ALL_REGS)

to this (see C_REG_P line for change):

/* Map a register number to a register class.  */
#define REGNO_REG_CLASS(REGNO)				\
  (BOTTOM_C_REG_P (REGNO) ? BOTTOM_REGS : 		\
   (REGNO) == LINK_POINTER_REGNUM ? LINK_REGS : 	\
   C_REG_P (REGNO) ? TOP_CREGS : 				\
   D_REG_P (REGNO) ? D_REGS : 				\
   CSR_REG_P (REGNO) ? CSR_REGS : 			\
   (unsigned)(REGNO) < FIRST_PSEUDO_REGISTER ? INTERNAL_REGS : ALL_REGS)

I made the change and nothing noticeable happened, but maybe once I fix
whatever else needs fixing then the second version of the macro will be
better?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-11-09 14:13     ` Ian Bolton
@ 2009-11-10 12:19       ` Dave Hudson
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Hudson @ 2009-11-10 12:19 UTC (permalink / raw)
  To: Ian Bolton; +Cc: Jeff Law, Vladimir Makarov, gcc

On Mon, 2009-11-09 at 14:13 +0000, Ian Bolton wrote:
> Dave Hudson wrote:
> > I've been working on gcc for an ISA (Ubicom32) that seems to have some
> > similarities to the problem you're seeing (we have some regs that can
> > be
> > used for many things but not all) and was seeing a ton a pointless
> > moves
> > introduced by reload.
> > 
> > I've ended up trying quite a few different strategies including
> > defining
> > different cover classes and the strategy we're now using is to leave
> > the
> > cover classes the same way you have them but found that the most
> > critical thing was to ensure that REGNO_REG_CLASS was returning a
> > minimal class correctly.  Once I had this right then IRA started to do
> > the right thing most of the time (-Os still isn't great but -O2 usually
> > looks very good now).  We still see some problems but they're usually a
> > result of reload messing things up or suboptimal handling of
> > auto-incrementing in MEMs.
> 
> Thanks for the info, Dave.
> 
> Given that C_REGS = r0-r31, BOTTOM_REGS = r0-r15, TOP_CREGS = r16-r31, 
> when you say "minimal class", does that mean that I should change my
> macro from this:
> 
> /* Map a register number to a register class.  */
> #define REGNO_REG_CLASS(REGNO)				\
>   (BOTTOM_C_REG_P (REGNO) ? BOTTOM_REGS : 		\
>    (REGNO) == LINK_POINTER_REGNUM ? LINK_REGS : 	\
>    C_REG_P (REGNO) ? C_REGS : 				\
>    D_REG_P (REGNO) ? D_REGS : 				\
>    CSR_REG_P (REGNO) ? CSR_REGS : 			\
>    (unsigned)(REGNO) < FIRST_PSEUDO_REGISTER ? INTERNAL_REGS : ALL_REGS)
> 
> to this (see C_REG_P line for change):
> 
> /* Map a register number to a register class.  */
> #define REGNO_REG_CLASS(REGNO)				\
>   (BOTTOM_C_REG_P (REGNO) ? BOTTOM_REGS : 		\
>    (REGNO) == LINK_POINTER_REGNUM ? LINK_REGS : 	\
>    C_REG_P (REGNO) ? TOP_CREGS : 				\
>    D_REG_P (REGNO) ? D_REGS : 				\
>    CSR_REG_P (REGNO) ? CSR_REGS : 			\
>    (unsigned)(REGNO) < FIRST_PSEUDO_REGISTER ? INTERNAL_REGS : ALL_REGS)
> 
> I made the change and nothing noticeable happened, but maybe once I fix
> whatever else needs fixing then the second version of the macro will be
> better?

I think this looks more like what I had to do.  FWIW I found it easier
to use an array in our port rather than using a cascading conditional
sequence (I can't remember which port I got that idea from):

#define REG_CLASS_CONTENTS					\
{								\
  {0x00000000, 0x00000000},	/* No regs */			\
  {0x0000ffff, 0x00000000},	/* DATA_REGS */			\
  {0x00010000, 0x00000000},	/* FDPIC_REG */			\
  {0x20fe0000, 0x00000000},	/* ADDRESS_REGS */		\
  {0x20ff0000, 0x00000000},	/* ALL_ADDRESS_REGS */		\
  {0x0a000000, 0x00000000},	/* ACC_LO_REGS */		\
  {0x0f000000, 0x00000000},	/* ACC_REGS */			\
  {0x40000000, 0x00000000},	/* CC_REG */			\
  {0x0f00ffff, 0x00000000},	/* DATA_ACC_REGS */		\
  {0x10000000, 0x00000000},	/* SOURGE3_REG */		\
  {0x80000000, 0x0000007f},	/* SPECIAL_REGS */		\
  {0xbfffffff, 0x0000007f},	/* GENERAL_REGS */		\
  {0xbfffffff, 0x0000007f}	/* ALL_REGS	*/		\
}

extern enum reg_class const
ubicom32_regclass_map[FIRST_PSEUDO_REGISTER];

/* A C expression whose value is a register class containing hard
register
   REGNO.  In general there is more than one such class; choose a class
which
   is "minimal", meaning that no smaller class also contains the
register.  */
#define REGNO_REG_CLASS(REGNO) (ubicom32_regclass_map[REGNO])

and:

enum reg_class const ubicom32_regclass_map[FIRST_PSEUDO_REGISTER] =
{
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  DATA_REGS, 
  FDPIC_REG, 
  ADDRESS_REGS, 
  ADDRESS_REGS, 
  ADDRESS_REGS, 
  ADDRESS_REGS, 
  ADDRESS_REGS, 
  ADDRESS_REGS, 
  ADDRESS_REGS, 
  ACC_REGS,
  ACC_LO_REGS,
  ACC_REGS,
  ACC_LO_REGS,
  SOURCE3_REG,
  ADDRESS_REGS,
  NO_REGS,			/* CC_REG must be NO_REGS */
  SPECIAL_REGS,
  SPECIAL_REGS,
  SPECIAL_REGS,
  SPECIAL_REGS,
  SPECIAL_REGS,
  SPECIAL_REGS,
  SPECIAL_REGS,
  SPECIAL_REGS
};



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-11-06 12:53   ` Dave Hudson
  2009-11-09 14:13     ` Ian Bolton
@ 2009-11-10 17:21     ` Jeff Law
  2009-11-10 17:38       ` Ian Bolton
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff Law @ 2009-11-10 17:21 UTC (permalink / raw)
  To: Dave Hudson; +Cc: Ian Bolton, Vladimir Makarov, gcc

On 11/06/09 05:53, Dave Hudson wrote:
>   the most
> critical thing was to ensure that REGNO_REG_CLASS was returning a
> minimal class correctly.
I believe that's been documented as the right thing to do for about 15 
years :-)   So, yes, you definitely want REGNO_REG_CLASS to return the 
smallest class to which a register belongs.

Jeff


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-11-10 17:21     ` Jeff Law
@ 2009-11-10 17:38       ` Ian Bolton
  2009-11-11 15:19         ` Ian Bolton
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-10 17:38 UTC (permalink / raw)
  To: Jeff Law, Dave Hudson; +Cc: Vladimir Makarov, gcc

> On 11/06/09 05:53, Dave Hudson wrote:
> >   the most
> > critical thing was to ensure that REGNO_REG_CLASS was returning a
> > minimal class correctly.
> I believe that's been documented as the right thing to do for about 15
> years :-)   So, yes, you definitely want REGNO_REG_CLASS to return the
> smallest class to which a register belongs.
> 
> Jeff

So I should definitely change my function then!  If something isn't
BOTTOM_REGS and it is in C_REGS then the smallest class it's in is
actually TOP_CREGS.

BTW, today I have achieved some good results with existing IRA by doing
the following:

1) Changed the REG_ALLOC_ORDER so that TOP_CREGS are given out before
BOTTOM_REGS.  My previous hacked version worked by increasing the
priority of those that wanted BOTTOM_REGS, so they got first pick; this
new version makes them wait their turn, but ensures those with higher
priority take TOP_CREGS before BOTTOM_REGS.

The analogy, I think, is of giving out meals on an airplane.  Most
people will eat meat or vegetarian meals, whereas vegetarians only
want vegetarian meals.  My hacked version was effectively allowing
the vegetarians to push to the front of the queue and get their meals;
this new version works by encouraging the meat eaters to eat the meaty
meals first, leaving more suitable meals for the vegetarians further
down the plane.

2) I have forced propagation of allocnos to parent regions with the
following hack in find_allocno_class_costs():

{
  /* Propagate costs to upper levels in the region
     tree.  */
  parent_a_num = ALLOCNO_NUM (parent_a);
  for (k = 0; k < cost_classes_num; k++)
    COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k]
      += COSTS_OF_ALLOCNO (total_costs, a_num)->cost[k];
  COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost
    += COSTS_OF_ALLOCNO (total_costs, a_num)->mem_cost;
  /* BEGIN IGB-IRA CHANGE 2 */
  /* Force total_costs to propagate upwards, by setting
     allocno_costs to be total_costs */
  for (k = 0; k < cost_classes_num; k++)
    COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->cost[k]
      = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k];
    COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->mem_cost
      = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost;
  /* END IGB-IRA CHANGE 2 */
}

I don't know why propagation isn't happening normally, but
this total_costs hack achieves the same thing for me at the
moment.  Is there any information I could provide to help
someone tell me why propagation isn't happening?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-11-10 17:38       ` Ian Bolton
@ 2009-11-11 15:19         ` Ian Bolton
  2009-11-11 16:12           ` Jeff Law
  2009-11-11 17:04           ` Vladimir Makarov
  0 siblings, 2 replies; 30+ messages in thread
From: Ian Bolton @ 2009-11-11 15:19 UTC (permalink / raw)
  To: Ian Bolton, Jeff Law, Dave Hudson; +Cc: Vladimir Makarov, gcc

Yesterday, I wrote:
> BTW, today I have achieved some good results with existing IRA by doing
> the following:
> 
> 1) Changed the REG_ALLOC_ORDER so that TOP_CREGS are given out before
> BOTTOM_REGS.  My previous hacked version worked by increasing the
> priority of those that wanted BOTTOM_REGS, so they got first pick; this
> new version makes them wait their turn, but ensures those with higher
> priority take TOP_CREGS before BOTTOM_REGS.
> 
> The analogy, I think, is of giving out meals on an airplane.  Most
> people will eat meat or vegetarian meals, whereas vegetarians only
> want vegetarian meals.  My hacked version was effectively allowing
> the vegetarians to push to the front of the queue and get their meals;
> this new version works by encouraging the meat eaters to eat the meaty
> meals first, leaving more suitable meals for the vegetarians further
> down the plane.
> 
> 2) I have forced propagation of allocnos to parent regions with the
> following hack in find_allocno_class_costs():
> 
> {
>   /* Propagate costs to upper levels in the region
>      tree.  */
>   parent_a_num = ALLOCNO_NUM (parent_a);
>   for (k = 0; k < cost_classes_num; k++)
>     COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k]
>       += COSTS_OF_ALLOCNO (total_costs, a_num)->cost[k];
>   COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost
>     += COSTS_OF_ALLOCNO (total_costs, a_num)->mem_cost;
>   /* BEGIN IGB-IRA CHANGE 2 */
>   /* Force total_costs to propagate upwards, by setting
>      allocno_costs to be total_costs */
>   for (k = 0; k < cost_classes_num; k++)
>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->cost[k]
>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k];
>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->mem_cost
>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost;
>   /* END IGB-IRA CHANGE 2 */
> }
> 
> I don't know why propagation isn't happening normally, but
> this total_costs hack achieves the same thing for me at the
> moment.  Is there any information I could provide to help
> someone tell me why propagation isn't happening?

Good news!  I have been able to remove my "total_costs" hack
above by instead commenting out the following line from ira_build()
in ira-build.c:

  remove_unnecessary_regions (false);

For my situation at least, this function is preventing the
propagation of useful allocno information from region 1 to
region 0.  Without my change, the allocation for a pseudo in
region 0 is not aware of how that pseudo will be used inside
a loop in region 1; the real impact of this is that we need
to then do a register move *inside the loop* into a valid
register class for the instruction in region 1.

I do not believe this will impact anyone with a regular
register set, but for any architecture where some instructions
can only use a subset of the register bank, I believe that
all regions are always necessary, since cost information
for each allocno is relevant and important.

I still need to do some more testing in regards this "fix",
but I wanted to put my findings out there as soon as possible
for comment from the experts.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-11-11 15:19         ` Ian Bolton
@ 2009-11-11 16:12           ` Jeff Law
  2009-11-11 17:04           ` Vladimir Makarov
  1 sibling, 0 replies; 30+ messages in thread
From: Jeff Law @ 2009-11-11 16:12 UTC (permalink / raw)
  To: Ian Bolton; +Cc: Dave Hudson, Vladimir Makarov, gcc

On 11/11/09 08:18, Ian Bolton wrote:
>
> Good news!  I have been able to remove my "total_costs" hack
> above by instead commenting out the following line from ira_build()
> in ira-build.c:
>
>    remove_unnecessary_regions (false);
>    
Which is probably an indication of a problem elsewhere.  However, it's 
definitely a good find in that you can put the two compilers side by 
side and debug them to figure out why calling this function inhibits 
propagation.;


> I do not believe this will impact anyone with a regular
> register set, but for any architecture where some instructions
> can only use a subset of the register bank, I believe that
> all regions are always necessary, since cost information
> for each allocno is relevant and important.
>    
FWIW, almost every chip has certain irregularities in their register 
set, so improvements in how the irregularities are handled is definitely 
a good thing across the board.

jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-11-11 15:19         ` Ian Bolton
  2009-11-11 16:12           ` Jeff Law
@ 2009-11-11 17:04           ` Vladimir Makarov
  2009-11-11 18:36             ` Ian Bolton
  1 sibling, 1 reply; 30+ messages in thread
From: Vladimir Makarov @ 2009-11-11 17:04 UTC (permalink / raw)
  To: Ian Bolton; +Cc: Jeff Law, Dave Hudson, gcc

Ian Bolton wrote:
> Yesterday, I wrote:
>   
>> BTW, today I have achieved some good results with existing IRA by doing
>> the following:
>>
>> 1) Changed the REG_ALLOC_ORDER so that TOP_CREGS are given out before
>> BOTTOM_REGS.  My previous hacked version worked by increasing the
>> priority of those that wanted BOTTOM_REGS, so they got first pick; this
>> new version makes them wait their turn, but ensures those with higher
>> priority take TOP_CREGS before BOTTOM_REGS.
>>
>> The analogy, I think, is of giving out meals on an airplane.  Most
>> people will eat meat or vegetarian meals, whereas vegetarians only
>> want vegetarian meals.  My hacked version was effectively allowing
>> the vegetarians to push to the front of the queue and get their meals;
>> this new version works by encouraging the meat eaters to eat the meaty
>> meals first, leaving more suitable meals for the vegetarians further
>> down the plane.
>>
>> 2) I have forced propagation of allocnos to parent regions with the
>> following hack in find_allocno_class_costs():
>>
>> {
>>   /* Propagate costs to upper levels in the region
>>      tree.  */
>>   parent_a_num = ALLOCNO_NUM (parent_a);
>>   for (k = 0; k < cost_classes_num; k++)
>>     COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k]
>>       += COSTS_OF_ALLOCNO (total_costs, a_num)->cost[k];
>>   COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost
>>     += COSTS_OF_ALLOCNO (total_costs, a_num)->mem_cost;
>>   /* BEGIN IGB-IRA CHANGE 2 */
>>   /* Force total_costs to propagate upwards, by setting
>>      allocno_costs to be total_costs */
>>   for (k = 0; k < cost_classes_num; k++)
>>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->cost[k]
>>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k];
>>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->mem_cost
>>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost;
>>   /* END IGB-IRA CHANGE 2 */
>> }
>>
>> I don't know why propagation isn't happening normally, but
>> this total_costs hack achieves the same thing for me at the
>> moment.  Is there any information I could provide to help
>> someone tell me why propagation isn't happening?
>>     
>
> Good news!  I have been able to remove my "total_costs" hack
> above by instead commenting out the following line from ira_build()
> in ira-build.c:
>
>   remove_unnecessary_regions (false);
>
> For my situation at least, this function is preventing the
> propagation of useful allocno information from region 1 to
> region 0.  Without my change, the allocation for a pseudo in
> region 0 is not aware of how that pseudo will be used inside
> a loop in region 1; the real impact of this is that we need
> to then do a register move *inside the loop* into a valid
> register class for the instruction in region 1.
>
> I do not believe this will impact anyone with a regular
> register set, but for any architecture where some instructions
> can only use a subset of the register bank, I believe that
> all regions are always necessary, since cost information
> for each allocno is relevant and important.
>
> I still need to do some more testing in regards this "fix",
> but I wanted to put my findings out there as soon as possible
> for comment from the experts.
>   
The function (remove_unnecessary_regions) is used to increase RA speed 
by decreasing number of regions.  The region is removed if the register 
pressure is not high in the region in other words if the probability to 
spill pseudos on the region border to improve RA in the region is small.

But when the region is removed and correspondingly allocnos (see 
function remove_unecessary_allocnos) in the region the info including 
hard register costs is propagated to the  corresponding allocnos in the 
parent region (see function propagate_some_info_from_allocno).

I've just checked the code it looks ok to me.  Ian, it would be nice if 
you figure out why the propagation does not happen.   As Jeff wrote, 
fixing that would be important practically for any target.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-11-11 17:04           ` Vladimir Makarov
@ 2009-11-11 18:36             ` Ian Bolton
  2009-11-11 20:09               ` Ian Bolton
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-11 18:36 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Jeff Law, Dave Hudson, gcc

Vladimir Makarov wrote:
> Ian Bolton wrote:
> > Yesterday, I wrote:
> >
> >> BTW, today I have achieved some good results with existing IRA by
> doing
> >> the following:
> >>
> >> 1) Changed the REG_ALLOC_ORDER so that TOP_CREGS are given out
> before
> >> BOTTOM_REGS.  My previous hacked version worked by increasing the
> >> priority of those that wanted BOTTOM_REGS, so they got first pick;
> this
> >> new version makes them wait their turn, but ensures those with
> higher
> >> priority take TOP_CREGS before BOTTOM_REGS.
> >>
> >> The analogy, I think, is of giving out meals on an airplane.  Most
> >> people will eat meat or vegetarian meals, whereas vegetarians only
> >> want vegetarian meals.  My hacked version was effectively allowing
> >> the vegetarians to push to the front of the queue and get their
> meals;
> >> this new version works by encouraging the meat eaters to eat the
> meaty
> >> meals first, leaving more suitable meals for the vegetarians further
> >> down the plane.
> >>
> >> 2) I have forced propagation of allocnos to parent regions with the
> >> following hack in find_allocno_class_costs():
> >>
> >> {
> >>   /* Propagate costs to upper levels in the region
> >>      tree.  */
> >>   parent_a_num = ALLOCNO_NUM (parent_a);
> >>   for (k = 0; k < cost_classes_num; k++)
> >>     COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k]
> >>       += COSTS_OF_ALLOCNO (total_costs, a_num)->cost[k];
> >>   COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost
> >>     += COSTS_OF_ALLOCNO (total_costs, a_num)->mem_cost;
> >>   /* BEGIN IGB-IRA CHANGE 2 */
> >>   /* Force total_costs to propagate upwards, by setting
> >>      allocno_costs to be total_costs */
> >>   for (k = 0; k < cost_classes_num; k++)
> >>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->cost[k]
> >>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k];
> >>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->mem_cost
> >>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost;
> >>   /* END IGB-IRA CHANGE 2 */
> >> }
> >>
> >> I don't know why propagation isn't happening normally, but
> >> this total_costs hack achieves the same thing for me at the
> >> moment.  Is there any information I could provide to help
> >> someone tell me why propagation isn't happening?
> >>
> >
> > Good news!  I have been able to remove my "total_costs" hack
> > above by instead commenting out the following line from ira_build()
> > in ira-build.c:
> >
> >   remove_unnecessary_regions (false);
> >
> > For my situation at least, this function is preventing the
> > propagation of useful allocno information from region 1 to
> > region 0.  Without my change, the allocation for a pseudo in
> > region 0 is not aware of how that pseudo will be used inside
> > a loop in region 1; the real impact of this is that we need
> > to then do a register move *inside the loop* into a valid
> > register class for the instruction in region 1.
> >
> > I do not believe this will impact anyone with a regular
> > register set, but for any architecture where some instructions
> > can only use a subset of the register bank, I believe that
> > all regions are always necessary, since cost information
> > for each allocno is relevant and important.
> >
> > I still need to do some more testing in regards this "fix",
> > but I wanted to put my findings out there as soon as possible
> > for comment from the experts.
> >
> The function (remove_unnecessary_regions) is used to increase RA speed
> by decreasing number of regions.  The region is removed if the register
> pressure is not high in the region in other words if the probability to
> spill pseudos on the region border to improve RA in the region is
> small.
> 
> But when the region is removed and correspondingly allocnos (see
> function remove_unecessary_allocnos) in the region the info including
> hard register costs is propagated to the  corresponding allocnos in the
> parent region (see function propagate_some_info_from_allocno).
> 
> I've just checked the code it looks ok to me.  Ian, it would be nice if
> you figure out why the propagation does not happen.   As Jeff wrote,
> fixing that would be important practically for any target.

(I'm in the middle of a hefty compile at the moment, so I can't do any
more debugging yet, but I figured I'd think out loud on this list in the
mean time.)

At first, I thought the problem was that propagate_some_info_from_allocno()
appears to only pass up information about the cover class, as opposed
to every cost_class (which was what my total_costs hack did).  However, I see
that the propagate_allocno_info() function, which is now being called when
I comment out remove_unnecessary_regions(), also only passes up information
about the cover class, so I don't think it's propagation that's the issue.

Looking at ira_build(), I see that there is a call to create_caps() just
after the call to propagate_allocno_info() - both of which were guarded by
the condition more_one_region_p(), which is only true when I comment out
the call to remove_unnecessary_regions().  Looking at create_caps(), I see
that a Cap will be created for each of my region 1 allocnos.  Looking further
down in the ira_build() function, there is a call to ira_build_conflicts(),
which makes a lot of references to Caps.  Perhaps it's the existence of Caps
that matters?

I see in my benchmark that there are some unfortunate allocnos that require
BOTTOM_REGS, but they only live in Region 1.  Am I right in thinking that,
without the caps, there is no way for them to get their voice heard during
the allocation for Region 0?

Maybe, when an irregular register set exists, whether there is high pressure
in the region or not, the Caps should be made to ensure that you don't end
up with move instructions happening within the loop?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-11-11 18:36             ` Ian Bolton
@ 2009-11-11 20:09               ` Ian Bolton
  2009-11-16 17:35                 ` Ian Bolton
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-11 20:09 UTC (permalink / raw)
  To: Ian Bolton, Vladimir Makarov; +Cc: Jeff Law, Dave Hudson, gcc

Ian Bolton wrote:
> Vladimir Makarov wrote:
> > Ian Bolton wrote:
> > > Yesterday, I wrote:
> > >
> > >> BTW, today I have achieved some good results with existing IRA by
> > doing
> > >> the following:
> > >>
> > >> 1) Changed the REG_ALLOC_ORDER so that TOP_CREGS are given out
> > before
> > >> BOTTOM_REGS.  My previous hacked version worked by increasing the
> > >> priority of those that wanted BOTTOM_REGS, so they got first pick;
> > this
> > >> new version makes them wait their turn, but ensures those with
> > higher
> > >> priority take TOP_CREGS before BOTTOM_REGS.
> > >>
> > >> The analogy, I think, is of giving out meals on an airplane.  Most
> > >> people will eat meat or vegetarian meals, whereas vegetarians only
> > >> want vegetarian meals.  My hacked version was effectively allowing
> > >> the vegetarians to push to the front of the queue and get their
> > meals;
> > >> this new version works by encouraging the meat eaters to eat the
> > meaty
> > >> meals first, leaving more suitable meals for the vegetarians
> further
> > >> down the plane.
> > >>
> > >> 2) I have forced propagation of allocnos to parent regions with
> the
> > >> following hack in find_allocno_class_costs():
> > >>
> > >> {
> > >>   /* Propagate costs to upper levels in the region
> > >>      tree.  */
> > >>   parent_a_num = ALLOCNO_NUM (parent_a);
> > >>   for (k = 0; k < cost_classes_num; k++)
> > >>     COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k]
> > >>       += COSTS_OF_ALLOCNO (total_costs, a_num)->cost[k];
> > >>   COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost
> > >>     += COSTS_OF_ALLOCNO (total_costs, a_num)->mem_cost;
> > >>   /* BEGIN IGB-IRA CHANGE 2 */
> > >>   /* Force total_costs to propagate upwards, by setting
> > >>      allocno_costs to be total_costs */
> > >>   for (k = 0; k < cost_classes_num; k++)
> > >>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->cost[k]
> > >>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k];
> > >>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->mem_cost
> > >>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost;
> > >>   /* END IGB-IRA CHANGE 2 */
> > >> }
> > >>
> > >> I don't know why propagation isn't happening normally, but
> > >> this total_costs hack achieves the same thing for me at the
> > >> moment.  Is there any information I could provide to help
> > >> someone tell me why propagation isn't happening?
> > >>
> > >
> > > Good news!  I have been able to remove my "total_costs" hack
> > > above by instead commenting out the following line from ira_build()
> > > in ira-build.c:
> > >
> > >   remove_unnecessary_regions (false);
> > >
> > > For my situation at least, this function is preventing the
> > > propagation of useful allocno information from region 1 to
> > > region 0.  Without my change, the allocation for a pseudo in
> > > region 0 is not aware of how that pseudo will be used inside
> > > a loop in region 1; the real impact of this is that we need
> > > to then do a register move *inside the loop* into a valid
> > > register class for the instruction in region 1.
> > >
> > > I do not believe this will impact anyone with a regular
> > > register set, but for any architecture where some instructions
> > > can only use a subset of the register bank, I believe that
> > > all regions are always necessary, since cost information
> > > for each allocno is relevant and important.
> > >
> > > I still need to do some more testing in regards this "fix",
> > > but I wanted to put my findings out there as soon as possible
> > > for comment from the experts.
> > >
> > The function (remove_unnecessary_regions) is used to increase RA
> speed
> > by decreasing number of regions.  The region is removed if the
> register
> > pressure is not high in the region in other words if the probability
> to
> > spill pseudos on the region border to improve RA in the region is
> > small.
> >
> > But when the region is removed and correspondingly allocnos (see
> > function remove_unecessary_allocnos) in the region the info including
> > hard register costs is propagated to the  corresponding allocnos in
> the
> > parent region (see function propagate_some_info_from_allocno).
> >
> > I've just checked the code it looks ok to me.  Ian, it would be nice
> if
> > you figure out why the propagation does not happen.   As Jeff wrote,
> > fixing that would be important practically for any target.
> 
> (I'm in the middle of a hefty compile at the moment, so I can't do any
> more debugging yet, but I figured I'd think out loud on this list in
> the
> mean time.)
> 
> At first, I thought the problem was that
> propagate_some_info_from_allocno()
> appears to only pass up information about the cover class, as opposed
> to every cost_class (which was what my total_costs hack did).  However,
> I see
> that the propagate_allocno_info() function, which is now being called
> when
> I comment out remove_unnecessary_regions(), also only passes up
> information
> about the cover class, so I don't think it's propagation that's the
> issue.
> 
> Looking at ira_build(), I see that there is a call to create_caps()
> just
> after the call to propagate_allocno_info() - both of which were guarded
> by
> the condition more_one_region_p(), which is only true when I comment
> out
> the call to remove_unnecessary_regions().  Looking at create_caps(), I
> see
> that a Cap will be created for each of my region 1 allocnos.  Looking
> further
> down in the ira_build() function, there is a call to
> ira_build_conflicts(),
> which makes a lot of references to Caps.  Perhaps it's the existence of
> Caps
> that matters?
> 
> I see in my benchmark that there are some unfortunate allocnos that
> require
> BOTTOM_REGS, but they only live in Region 1.  Am I right in thinking
> that,
> without the caps, there is no way for them to get their voice heard
> during
> the allocation for Region 0?
> 
> Maybe, when an irregular register set exists, whether there is high
> pressure
> in the region or not, the Caps should be made to ensure that you don't
> end
> up with move instructions happening within the loop?

I've just changed my code to turn back on the remove_unnecessary_regions
function and ... I fear I have taken us all on a wild goose chase!

It seems that my alternate REG_ALLOC_ORDER, where TOP_CREGS are given out
before BOTTOM_REGS, is enough on its own to get the allocation right!  I
didn't notice this because I added the REG_ALLOC_ORDER change as the final
piece of the puzzle after the total_costs hack.

So I guess the key for any irregular architectures is to hold back the
special registers where possible, so the instructions that need them can
get them by showing they have a higher cost for the non-special registers.

Otherwise, IRA cannot do a good job because the preferences of lower
priority instructions that want BOTTOM_REGS will be irrelevant if higher
priority instructions (such as those working on memory addresses) that
just want any register have ignorantly taken those BOTTOM_REGS already.

I must admit that I have not been looking at the CB algorithm recently
though, so further work may still be needed, although some tests I just
ran confirmed that CB and Priority algorithm got the same performance.

I may also need to tweak our REGISTER_MOVE_COST and MEMORY_MOVE_COST.

Thanks for all your help so far.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-11-11 20:09               ` Ian Bolton
@ 2009-11-16 17:35                 ` Ian Bolton
       [not found]                   ` <4B01BB87.6020902@redhat.com>
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-16 17:35 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Jeff Law, Dave Hudson, gcc

Ian Bolton wrote:
> Ian Bolton wrote:
> > Vladimir Makarov wrote:
> > > Ian Bolton wrote:
> > > > Yesterday, I wrote:
> > > >
> > > >> BTW, today I have achieved some good results with existing IRA
> by
> > > doing
> > > >> the following:
> > > >>
> > > >> 1) Changed the REG_ALLOC_ORDER so that TOP_CREGS are given out
> > > before
> > > >> BOTTOM_REGS.  My previous hacked version worked by increasing
> the
> > > >> priority of those that wanted BOTTOM_REGS, so they got first
> pick;
> > > this
> > > >> new version makes them wait their turn, but ensures those with
> > > higher
> > > >> priority take TOP_CREGS before BOTTOM_REGS.
> > > >>
> > > >> The analogy, I think, is of giving out meals on an airplane.
> Most
> > > >> people will eat meat or vegetarian meals, whereas vegetarians
> only
> > > >> want vegetarian meals.  My hacked version was effectively
> allowing
> > > >> the vegetarians to push to the front of the queue and get their
> > > meals;
> > > >> this new version works by encouraging the meat eaters to eat the
> > > meaty
> > > >> meals first, leaving more suitable meals for the vegetarians
> > further
> > > >> down the plane.
> > > >>
> > > >> 2) I have forced propagation of allocnos to parent regions with
> > the
> > > >> following hack in find_allocno_class_costs():
> > > >>
> > > >> {
> > > >>   /* Propagate costs to upper levels in the region
> > > >>      tree.  */
> > > >>   parent_a_num = ALLOCNO_NUM (parent_a);
> > > >>   for (k = 0; k < cost_classes_num; k++)
> > > >>     COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k]
> > > >>       += COSTS_OF_ALLOCNO (total_costs, a_num)->cost[k];
> > > >>   COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost
> > > >>     += COSTS_OF_ALLOCNO (total_costs, a_num)->mem_cost;
> > > >>   /* BEGIN IGB-IRA CHANGE 2 */
> > > >>   /* Force total_costs to propagate upwards, by setting
> > > >>      allocno_costs to be total_costs */
> > > >>   for (k = 0; k < cost_classes_num; k++)
> > > >>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->cost[k]
> > > >>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->cost[k];
> > > >>     COSTS_OF_ALLOCNO (allocno_costs, parent_a_num)->mem_cost
> > > >>       = COSTS_OF_ALLOCNO (total_costs, parent_a_num)->mem_cost;
> > > >>   /* END IGB-IRA CHANGE 2 */
> > > >> }
> > > >>
> > > >> I don't know why propagation isn't happening normally, but
> > > >> this total_costs hack achieves the same thing for me at the
> > > >> moment.  Is there any information I could provide to help
> > > >> someone tell me why propagation isn't happening?
> > > >>
> > > >
> > > > Good news!  I have been able to remove my "total_costs" hack
> > > > above by instead commenting out the following line from
> ira_build()
> > > > in ira-build.c:
> > > >
> > > >   remove_unnecessary_regions (false);
> > > >
> > > > For my situation at least, this function is preventing the
> > > > propagation of useful allocno information from region 1 to
> > > > region 0.  Without my change, the allocation for a pseudo in
> > > > region 0 is not aware of how that pseudo will be used inside
> > > > a loop in region 1; the real impact of this is that we need
> > > > to then do a register move *inside the loop* into a valid
> > > > register class for the instruction in region 1.
> > > >
> > > > I do not believe this will impact anyone with a regular
> > > > register set, but for any architecture where some instructions
> > > > can only use a subset of the register bank, I believe that
> > > > all regions are always necessary, since cost information
> > > > for each allocno is relevant and important.
> > > >
> > > > I still need to do some more testing in regards this "fix",
> > > > but I wanted to put my findings out there as soon as possible
> > > > for comment from the experts.
> > > >
> > > The function (remove_unnecessary_regions) is used to increase RA
> > speed
> > > by decreasing number of regions.  The region is removed if the
> > register
> > > pressure is not high in the region in other words if the
> probability
> > to
> > > spill pseudos on the region border to improve RA in the region is
> > > small.
> > >
> > > But when the region is removed and correspondingly allocnos (see
> > > function remove_unecessary_allocnos) in the region the info
> including
> > > hard register costs is propagated to the  corresponding allocnos in
> > the
> > > parent region (see function propagate_some_info_from_allocno).
> > >
> > > I've just checked the code it looks ok to me.  Ian, it would be
> nice
> > if
> > > you figure out why the propagation does not happen.   As Jeff
> wrote,
> > > fixing that would be important practically for any target.
> >
> > (I'm in the middle of a hefty compile at the moment, so I can't do
> any
> > more debugging yet, but I figured I'd think out loud on this list in
> > the
> > mean time.)
> >
> > At first, I thought the problem was that
> > propagate_some_info_from_allocno()
> > appears to only pass up information about the cover class, as opposed
> > to every cost_class (which was what my total_costs hack did).
> However,
> > I see
> > that the propagate_allocno_info() function, which is now being called
> > when
> > I comment out remove_unnecessary_regions(), also only passes up
> > information
> > about the cover class, so I don't think it's propagation that's the
> > issue.
> >
> > Looking at ira_build(), I see that there is a call to create_caps()
> > just
> > after the call to propagate_allocno_info() - both of which were
> guarded
> > by
> > the condition more_one_region_p(), which is only true when I comment
> > out
> > the call to remove_unnecessary_regions().  Looking at create_caps(),
> I
> > see
> > that a Cap will be created for each of my region 1 allocnos.  Looking
> > further
> > down in the ira_build() function, there is a call to
> > ira_build_conflicts(),
> > which makes a lot of references to Caps.  Perhaps it's the existence
> of
> > Caps
> > that matters?
> >
> > I see in my benchmark that there are some unfortunate allocnos that
> > require
> > BOTTOM_REGS, but they only live in Region 1.  Am I right in thinking
> > that,
> > without the caps, there is no way for them to get their voice heard
> > during
> > the allocation for Region 0?
> >
> > Maybe, when an irregular register set exists, whether there is high
> > pressure
> > in the region or not, the Caps should be made to ensure that you
> don't
> > end
> > up with move instructions happening within the loop?
> 
> I've just changed my code to turn back on the
> remove_unnecessary_regions
> function and ... I fear I have taken us all on a wild goose chase!
> 
> It seems that my alternate REG_ALLOC_ORDER, where TOP_CREGS are given
> out
> before BOTTOM_REGS, is enough on its own to get the allocation right!

I have discovered a regression in the Huffman benchmark.  The new
REG_ALLOC_ORDER gives out the upper regs first, as intended.  But then
one of these upper regs becomes dead and could be re-used for more work,
except that it can't be re-used because subsequent instructions want
a BOTTOM_REG.  As we have to spill and restore for each callee-save
register, this costs two instructions; previously, the old REG_ALLOC_ORDER
gave a BOTTOM_REG out, which then got re-used for more work when it became
dead.  I expect this kind of thing will happen anywhere that register
pressure is low and lead to regressions all over the place.

The question is: how to fix this?  I have initially put the REG_ALLOC_ORDER
back to how it was and changed the operand constraints in our MD file,
so each of the "apathetic" instructions, will either take a 't' (TOP_CREG)
or '?b' (BOTTOM_REG).  The '?' shows that this alternative is slightly more
costly than using 't'.  On the benchmark that benefitted the most from
the new REG_ALLOC_ORDER, these constraints are almost achieving the same
thing.  It is only "almost" there because I am struggling with how to show
two alternatives for loads and stores, which currently have an 'm'
constraint.

I want to show that it is slightly more costly to have the address in a 'b',
as opposed to a 't', but I can't see how to show this in the .md file.  Maybe
it isn't possible?  Even if I make an alternate memory constraint (e.g. '?k'),
I'm not sure how to show that this one only works with BOTTOM_REGS as its
base register.

Another approach for fixing is to keep my new REG_ALLOC_ORDER and
investigate why IRA does not realise it is more costly to adhere to the
REG_ALLOC_ORDER (due to having to save/restore one extra register) versus
using a BOTTOM_REG instead that could be re-used once the initial use becomes
dead.  The hard register costs when assigning for the first use don't appear
to distinguish between any of the callee-save registers; is it possible for
IRA to realise it will be able to re-use a BOTTOM_REG (and eliminate a
save/restore) and thereby increase the cost for using an upper reg?

The final approach I have thought of is for IRA to dynamically select
the REG_ALLOC_ORDER based on how many callee-save registers it thinks it
will need.  For example, if it thinks it doesn't need all of them, it can
use the original REG_ALLOC_ORDER, so we can limit the number of save/restore
instructions; if it is going to use all of them, then we need to make sure
we hold back the BOTTOM_REGS for those that need them.  Perhaps I could
force an extra pass so that IRA tries both of the REG_ALLOC_ORDERs and keeps
the one that did the best?

As always, your thoughts and advice would be much appreciated!

^ permalink raw reply	[flat|nested] 30+ messages in thread

[parent not found: <4B01BB87.6020902@redhat.com>]

* RE: Understanding IRA
       [not found]                   ` <4B01BB87.6020902@redhat.com>
@ 2009-11-19 15:41                     ` Ian Bolton
       [not found]                       ` <4B1451C7.2010207@redhat.com>
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-19 15:41 UTC (permalink / raw)
  To: Jeff Law; +Cc: Vladimir Makarov, Dave Hudson, gcc

Jeff Law wrote: 
> On 11/16/09 10:33, Ian Bolton wrote:
> > The question is: how to fix this?  I have initially put the
> REG_ALLOC_ORDER
> > back to how it was and changed the operand constraints in our MD
> file,
> > so each of the "apathetic" instructions, will either take a 't'
> (TOP_CREG)
> > or '?b' (BOTTOM_REG).  The '?' shows that this alternative is
> slightly more
> > costly than using 't'.  On the benchmark that benefitted the most
> from
> > the new REG_ALLOC_ORDER, these constraints are almost achieving the
> same
> > thing.  It is only "almost" there because I am struggling with how to
> show
> > two alternatives for loads and stores, which currently have an 'm'
> > constraint.
> >
> I'm not aware of any way to describe this to IRA.  In theory I guess
> IRA
> could be twiddled to use  TARGET_ADDRESS_COST to derive some kind of
> cost difference based on the registers used in the MEM, but it seems
> rather hackish.

I found somewhere in record_address_reg to achieve what I needed:

  for (k = 0; k < cost_classes_num; k++)
  {
    i = cost_classes[k];
    pp->cost[k]
      += (ira_get_may_move_cost (Pmode, i, rclass, true) * scale) / 2;

    /* Slightly nudge memory addresses away from using BOTTOM_REGS and
       C_REGS, so they take TOP_CREGS instead - should this pseudo later
       need BOTTOM_REGS, there will be a higher cost to use TOP_CREGS
       and it will still get BOTTOM_REGS. This is equivalent to adding a
       ?b on each instruction that currently has a 'm' constraint.

       Writing this generically might look something like:

       pp->cost[k] += TARGET_ADDRESS_EXTRA_COST_P(cost_classes[k])
                      ? (scale/2) : 0;
    */
    if (cost_classes[k] == BOTTOM_REGS || cost_classes[k] == C_REGS)
      pp->cost[k] += (scale) / 2;
  }

I was then able to alter all our register-agnostic instructions in our
.md file to take either a 't' for TOP_CREGS for a '?b' for BOTTOM_REGS.
Initial results showed that IRA was moving input arguments out of their
BOTTOM_REGS (e.g. $c1) into TOP_CREGS to do work on them, since it
thought TOP_CREGS were less costly to use, despite the cost of the move
instruction to get the input argument into a TOP_CREG.

I wondered if this was because we add 2 to the alt_cost for each '?' and
our REG_MOVE_COST is also 2, but this becomes irrelevant anyway if you
do a lot of work with the input argument, since each potential use in a
BOTTOM_REG incurs some kind of penalty (one per '?' seen) which will
eventually persuade IRA that leaving the input argument in the
BOTTOM_REG it arrived in is more costly than moving it to a TOP_CREG.

I addressed this problem by splitting my register bank a little
differently: instead of making a distinction between BOTTOM_REGS and
TOP_CREGS, I made it so there was only a penalty if you used one of the
non-argument BOTTOM_REGS (i.e. a callee-save BOTTOM_REG).  This meant
that IRA was happy to leave input arguments in their BOTTOM_REGS but
erred towards using TOP_CREGS once the caller-save BOTTOM_REGS had run
out.  This was an improvement, but there was still a case where these
'?' penalties were not aligned with reality:

T1 = A + B; // can use any register, TOP_CREGS appears cheaper
T2 = A - C; // can use any register, TOP_CREGS appears cheaper
T3 = A & D; // must use BOTTOM_REGS

The constraints for the first two instructions show that TOP_CREGS is
cheaper, but then you have to plant a move to get A into a BOTTOM_REG
to do the AND; in reality, we know it cheaper to have A in a BOTTOM_REG
all along, but the '?' constraint suggests there is a cost in doing this
for the ADD and SUB and so IRA will put A in a TOP_CREG at first and
incur the cost of the move because it is still cheaper than the costs I
have defined in with my constraints.  I don't believe there is a way to
communicate a conditional cost, so I'm thinking that constraints are not
the solution for me at this time.  What are your thoughts?

> You might try something like this:
> 
>    1. Crank up the callee-saved register cost adjustment in
> assign_hard_reg so that it's scaled based on REG_FREQ.  That will
> probably lead to some regressions based on my experiments.
> 
>    2. Then add a check that completely avoids the cost adjustment in
> cases where we pushed a MAY_SPILL_P allocno.  This was on my todo list,
> but I haven't got to it yet.
> 
> If you wanted to get fancy, you could track the maximum number of
> neighbors in each class as allocnos are pushed and use that to adjust
> how many registers are cost adjusted in assign_hard_reg.  The idea
> being
> the more neighbors the allocno has, the  more callee-saved regsiters
> we're likely to need.
> 
> You could also try to account for the fact that once allocated, the
> callee saved regsiter is available "for free" to non-conflicting
> allocnos.   So you could iterate over those and decrease the penalty
> for
> using a callee saved register on the current allocno.  Given the
> interfaces provided by IRA, this could be compile-time expensive.
> 

Your "small improvement to IRA allocations" patch posted yesterday
achieves something very similar to this, right?

In the case of our BOTTOM_REGS and TOP_CREGS split, whether this pays
off depends on which allocno is colored first.  If we color the one that
needs BOTTOM_REGS first then we will later reuse that BOTTOM_REG to do
work that can be done in any register.  But if we color the one that can
be done in any register first, my REG_ALLOC_ORDER is giving out TOP_CREGS
first, which then can't be reused later to do the BOTTOM_REGS work.

I guess when we are coloring an agnostic allocno that will take any reg,
we could check if any of the non-conflicting allocnos want a BOTTOM_REG
and therefore give out a BOTTOM_REG since it can be reused later.
Alternatively, we could walk over all conflicting allocnos when coloring
this one and check for how many conflicting allocnos require BOTTOM_REGS
and make sure we don't give one out now if we will otherwise run out.

Obviously, this will make things even more expensive for compilation, but
I'm not sure I can get what I require without some kind of "look-ahead"
mechanism like in the two ideas above.  Unless I simply change the
coloring algorithm so that we always color those that need BOTTOM_REGS
first?  I think this is effectively what I was achieving with my early
hacks involving the cost adjustments of 6 and 20.

Thanks for reading, 
Ian

^ permalink raw reply	[flat|nested] 30+ messages in thread

[parent not found: <4B1451C7.2010207@redhat.com>]

* RE: Understanding IRA
       [not found]                       ` <4B1451C7.2010207@redhat.com>
@ 2009-12-02 20:29                         ` Ian Bolton
  2009-12-03 19:16                           ` Jeff Law
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-12-02 20:29 UTC (permalink / raw)
  To: Jeff Law; +Cc: Vladimir Makarov, Dave Hudson, gcc

Jeff Law wrote:
> Ian Bolton wrote: 
> > Initial results showed that IRA was moving input arguments out of
> their
> > BOTTOM_REGS (e.g. $c1) into TOP_CREGS to do work on them, since it
> > thought TOP_CREGS were less costly to use, despite the cost of the
> move
> > instruction to get the input argument into a TOP_CREG.
> >
> That may indicate a cost scaling issue or more general weaknesses in
> IRA's cost modeling.
> 
> > I addressed this problem by splitting my register bank a little
> > differently: instead of making a distinction between BOTTOM_REGS and
> > TOP_CREGS, I made it so there was only a penalty if you used one of
> the
> > non-argument BOTTOM_REGS (i.e. a callee-save BOTTOM_REG).  This meant
> > that IRA was happy to leave input arguments in their BOTTOM_REGS but
> > erred towards using TOP_CREGS once the caller-save BOTTOM_REGS had
> run
> > out.  This was an improvement, but there was still a case where these
> > '?' penalties were not aligned with reality:
> >
> > T1 = A + B; // can use any register, TOP_CREGS appears cheaper
> > T2 = A - C; // can use any register, TOP_CREGS appears cheaper
> > T3 = A&  D; // must use BOTTOM_REGS
> >
> > The constraints for the first two instructions show that TOP_CREGS is
> > cheaper, but then you have to plant a move to get A into a BOTTOM_REG
> > to do the AND; in reality, we know it cheaper to have A in a
> BOTTOM_REG
> > all along, but the '?' constraint suggests there is a cost in doing
> this
> > for the ADD and SUB and so IRA will put A in a TOP_CREG at first and
> > incur the cost of the move because it is still cheaper than the costs
> I
> > have defined in with my constraints.  I don't believe there is a way
> to
> > communicate a conditional cost, so I'm thinking that constraints are
> not
> > the solution for me at this time.  What are your thoughts?
> >
> See above.  This might be a problem with scaling or weaknesses in IRA's
> costing model.   In theory IRA accumulates the cost of using each class
> over the set of insns using a particular pseudo.

I had an epiphany this morning and came up with an idea to achieve the
lookahead I thought I needed, thereby making the costs created by '?' a
lot more concrete and reliable.

Firstly, I have altered the alt_cost adjustment (for '?') in ira-costs.c,
so that it only happens on the second pass *and* only when the first pass
has determined that this allocno never needs a BOTTOM_REG.

The first condition (that it only occurs on the second pass) is there so
that the preferred class calculated for an allocno is based on hard
constraints, as opposed to the fuzzier constraints of '?'.  Without this
change, the second pass cannot use the preferred class to correctly add
costs for each class that doesn't intersect with the preferred class.

e.g. If an insn has an allocno as an operand that requires BOTTOM_REGS,
then we want the cost for TOP_CREGS for that allocno in that operand to
be higher to show the cost of moving it into BOTTOM_REGS.  But if we let
the '?' constraints dictate the pref class, and this allocno appears in
other insns where the '?' constraint has appeared, then TOP_CREGS may
end up being the preferred class and so this insn, which actually needs
BOTTOM_REGS for its operand, will end increasing the costs for any class
that doesn't intersect with TOP_CREGS (i.e. BOTTOM_REGS).

I'm thinking that this change will be generally useful.  What are your
thoughts?

The second condition is determined by me storing a new variable in each
allocno on the first pass to flag whether it ever appears as an operand
that must have BOTTOM_REGS.  On the second pass, I can then only penalise
an allocno for using my precious BOTTOM_REGS if I have already determined
that it will never need them.

This change is probably too specific to our case at the moment, but it
could probably be made generic, if it has use to other architectures.

These two changes together make the example I created above (where A
appears in multiple operations) correctly allocate a BOTTOM_REG from the
outset.  I am yet to run benchmarks with this solution, but I think it
will end up being similar to my alternate REG_ALLOC_ORDER, where I just
gave out TOP_CREGS first.

Sadly, I don't think either approach will handle the case where there is
low pressure and we first grab a TOP_CREG for something like an ADD (due
to our constraints telling us to) then we free up that register cos it's
not needed anymore and then we have to use a different (BOTTOM_REG) reg
for something like an AND; we should have just used just one BOTTOM_REG.

I guess solving this would involve bigger changes to the algorithm, so
that each allocno is aware of how many conflicting allocnos want the
BOTTOM_REGS.  We can then only penalise them for considering using
BOTTOM_REGS if we know it will impact someone.

> 
> >
> >> You might try something like this:
> >>
> >>     1. Crank up the callee-saved register cost adjustment in
> >> assign_hard_reg so that it's scaled based on REG_FREQ.  That will
> >> probably lead to some regressions based on my experiments.
> >>
> >>     2. Then add a check that completely avoids the cost adjustment
> in
> >> cases where we pushed a MAY_SPILL_P allocno.  This was on my todo
> list,
> >> but I haven't got to it yet.
> >>
> >> If you wanted to get fancy, you could track the maximum number of
> >> neighbors in each class as allocnos are pushed and use that to
> adjust
> >> how many registers are cost adjusted in assign_hard_reg.  The idea
> >> being
> >> the more neighbors the allocno has, the  more callee-saved regsiters
> >> we're likely to need.
> >>
> >> You could also try to account for the fact that once allocated, the
> >> callee saved regsiter is available "for free" to non-conflicting
> >> allocnos.   So you could iterate over those and decrease the penalty
> >> for
> >> using a callee saved register on the current allocno.  Given the
> >> interfaces provided by IRA, this could be compile-time expensive.
> >>
> >>
> > Your "small improvement to IRA allocations" patch posted yesterday
> > achieves something very similar to this, right?
> >
> Not really.  That small improvement is a well known heuristic to make
> it
> more likely that a node with a high degree can be colored by
> encouraging
> the use of the same color for conflicting nodes which do not conflict
> with each other.
> 
> It may help in some of your cases, but they really aren't the
> motivation
> behind the change.

I think I understand!

> 
> > In the case of our BOTTOM_REGS and TOP_CREGS split, whether this pays
> > off depends on which allocno is colored first.
> Absolutely.   Note that we use a tiny adjustment to costing for this
> heuristic so that, in general, this heuristic only applies when  when
> other costing heuristics don't result in a particular register
> preference.

Yes, you don't want to undo all the cleverness done previously.  I do
wonder if the costs should all be multiplied by ten so we can distinguish
between a cost of 1 and something that should be less than 1 but not 0.

> 
> [ ... ]
> > Obviously, this will make things even more expensive for compilation,
> but
> > I'm not sure I can get what I require without some kind of "look-
> ahead"
> > mechanism like in the two ideas above.  Unless I simply change the
> > coloring algorithm so that we always color those that need
> BOTTOM_REGS
> > first?  I think this is effectively what I was achieving with my
> early
> > hacks involving the cost adjustments of 6 and 20.
> >
> Well, always coloring BOTTOM_REGs first simply means that you're
> bypassing the costing metrics rather than fixing or enhancing them.
> The class of problems you're running into aren't unique to your
> processor and thus I'd rather see you helping with fixing the costing
> heuristics which will help just about everyone.

I implemented the early-coloring approach and got good performance on
most benchmarks, but a 3% regression on one due to planting far more
loads and stores.  I guess there were some allocnos that really wanted
any register, but all the early-colored ones got colored first and
pinched the ones they would have used.  I have had some ideas for
refining this heuristic, but your email today, and my subsequent
epiphany, led to me returning to the constraints approach.

> It's also worth remembering that allocation is based on heuristics,
> which often don't lead to perfect allocation, merely a good one.

I have quoted this to my boss! ;-)

Cheers,
Ian

> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-12-02 20:29                         ` Ian Bolton
@ 2009-12-03 19:16                           ` Jeff Law
  2009-12-07 13:30                             ` Ian Bolton
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Law @ 2009-12-03 19:16 UTC (permalink / raw)
  To: Ian Bolton; +Cc: Vladimir Makarov, Dave Hudson, gcc

On 12/02/09 13:29, Ian Bolton wrote:
>
> I had an epiphany this morning and came up with an idea to achieve the
> lookahead I thought I needed, thereby making the costs created by '?' a
> lot more concrete and reliable.
>
> Firstly, I have altered the alt_cost adjustment (for '?') in ira-costs.c,
> so that it only happens on the second pass *and* only when the first pass
> has determined that this allocno never needs a BOTTOM_REG.
>
> The first condition (that it only occurs on the second pass) is there so
> that the preferred class calculated for an allocno is based on hard
> constraints, as opposed to the fuzzier constraints of '?'.  Without this
> change, the second pass cannot use the preferred class to correctly add
> costs for each class that doesn't intersect with the preferred class.
>
> e.g. If an insn has an allocno as an operand that requires BOTTOM_REGS,
> then we want the cost for TOP_CREGS for that allocno in that operand to
> be higher to show the cost of moving it into BOTTOM_REGS.  But if we let
> the '?' constraints dictate the pref class, and this allocno appears in
> other insns where the '?' constraint has appeared, then TOP_CREGS may
> end up being the preferred class and so this insn, which actually needs
> BOTTOM_REGS for its operand, will end increasing the costs for any class
> that doesn't intersect with TOP_CREGS (i.e. BOTTOM_REGS).
>
> I'm thinking that this change will be generally useful.  What are your
> thoughts?
>    
Step #1 seems a lot like adding '*' before the '?'.   The idea behind 
'*' was to make it possible to ignore a constraint modifier such as '?' 
or '!' when choosing register preferences.   I haven't looked at IRA's 
implementation of preferencing, but I suspect it mostly lifted the old 
allocator's preferencing code, so I'd expect we're still supporting '*'.



> The second condition is determined by me storing a new variable in each
> allocno on the first pass to flag whether it ever appears as an operand
> that must have BOTTOM_REGS.  On the second pass, I can then only penalise
> an allocno for using my precious BOTTOM_REGS if I have already determined
> that it will never need them.
>
> This change is probably too specific to our case at the moment, but it
> could probably be made generic, if it has use to other architectures.
>    
I think we already have code to do something like this via 
CONFLICT_HARD_REGNO_COSTS.  Though I must admit, I haven't entirely 
wrapped my head around those costs and how they're used yet.

> Sadly, I don't think either approach will handle the case where there is
> low pressure and we first grab a TOP_CREG for something like an ADD (due
> to our constraints telling us to) then we free up that register cos it's
> not needed anymore and then we have to use a different (BOTTOM_REG) reg
> for something like an AND; we should have just used just one BOTTOM_REG.
>    
That would actually be something helped by the patch we were discussing 
in a prior message.  The only trick is the two allocnos would have to be 
connected by conflicting with some 3rd allocno.  The 3rd allocno may not 
be a high degree, but the heuristic will still trigger and slightly 
encourage the ADD and AND allocnos to share the same reg.

>> Absolutely.   Note that we use a tiny adjustment to costing for this
>> heuristic so that, in general, this heuristic only applies when  when
>> other costing heuristics don't result in a particular register
>> preference.
>>      
> Yes, you don't want to undo all the cleverness done previously.  I do
> wonder if the costs should all be multiplied by ten so we can distinguish
> between a cost of 1 and something that should be less than 1 but not 0.
>    
Note that most costs are scaled based on REG_FREQ (which is derived from 
BLOCK_FREQ).  So there's already a multiplier effect.  So adding an 
unscaled "1" to allow us to detect this situation precisely.

Jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-12-03 19:16                           ` Jeff Law
@ 2009-12-07 13:30                             ` Ian Bolton
  0 siblings, 0 replies; 30+ messages in thread
From: Ian Bolton @ 2009-12-07 13:30 UTC (permalink / raw)
  To: Jeff Law; +Cc: Vladimir Makarov, Dave Hudson, gcc

Jeff Law wrote:
> >
> > I had an epiphany this morning and came up with an idea to achieve
> the
> > lookahead I thought I needed, thereby making the costs created by '?'
> a
> > lot more concrete and reliable.
> >
> > Firstly, I have altered the alt_cost adjustment (for '?') in ira-
> costs.c,
> > so that it only happens on the second pass *and* only when the first
> pass
> > has determined that this allocno never needs a BOTTOM_REG.
> >
> > The first condition (that it only occurs on the second pass) is there
> so
> > that the preferred class calculated for an allocno is based on hard
> > constraints, as opposed to the fuzzier constraints of '?'.  Without
> this
> > change, the second pass cannot use the preferred class to correctly
> add
> > costs for each class that doesn't intersect with the preferred class.
> >
> > e.g. If an insn has an allocno as an operand that requires
> BOTTOM_REGS,
> > then we want the cost for TOP_CREGS for that allocno in that operand
> to
> > be higher to show the cost of moving it into BOTTOM_REGS.  But if we
> let
> > the '?' constraints dictate the pref class, and this allocno appears
> in
> > other insns where the '?' constraint has appeared, then TOP_CREGS may
> > end up being the preferred class and so this insn, which actually
> needs
> > BOTTOM_REGS for its operand, will end increasing the costs for any
> class
> > that doesn't intersect with TOP_CREGS (i.e. BOTTOM_REGS).
> >
> > I'm thinking that this change will be generally useful.  What are
> your
> > thoughts?
> >
> Step #1 seems a lot like adding '*' before the '?'.   The idea behind
> '*' was to make it possible to ignore a constraint modifier such as '?'
> or '!' when choosing register preferences.   I haven't looked at IRA's
> implementation of preferencing, but I suspect it mostly lifted the old
> allocator's preferencing code, so I'd expect we're still supporting
> '*'.

I just had a look through the code and '*' means the '?' is completely
ignored during costing.  A quick grep shows me that '?' would then only
have meaning in reload.c, reload1.c and postreload.c.

This is the code I have from record_reg_classes() in ira-costs.c:

		case '*':
		  /* Ignore the next letter for this pass.  */
		  c = *++p;
		  break;

The comment suggests it will be ignored in "this pass", but the code
makes it get ignored in both passes during ira-costs.c.  Maybe this is a
bug?  Perhaps it should read:

		case '*':
		  /* Ignore the next letter for this pass.  */
		  if (!allocno_pref)
                c = *++p;
		  break;

If I changed all my '?' constraints to be '*?' then the above
change would achieve what my hack does.

> > The second condition is determined by me storing a new variable in
> each
> > allocno on the first pass to flag whether it ever appears as an
> operand
> > that must have BOTTOM_REGS.  On the second pass, I can then only
> penalise
> > an allocno for using my precious BOTTOM_REGS if I have already
> determined
> > that it will never need them.
> >
> > This change is probably too specific to our case at the moment, but
> it
> > could probably be made generic, if it has use to other architectures.
> >
> I think we already have code to do something like this via
> CONFLICT_HARD_REGNO_COSTS.  Though I must admit, I haven't entirely
> wrapped my head around those costs and how they're used yet.

I was hoping I would be able to avoid all the conflict-related stuff, but
this may be where I have to go next, although I am considering returning
to my elegant solution of just giving out TOP_REGS first unless someone
specifically needs BOTTOM_REGS.  This does perform slightly worse for low
reg pressure cases (due to the case I mention below), but is the best
solution I have come up with for the high pressure cases.

> > Sadly, I don't think either approach will handle the case where there
> is
> > low pressure and we first grab a TOP_CREG for something like an ADD
> (due
> > to our constraints telling us to) then we free up that register cos
> it's
> > not needed anymore and then we have to use a different (BOTTOM_REG)
> reg
> > for something like an AND; we should have just used just one
> BOTTOM_REG.
> >
> That would actually be something helped by the patch we were discussing
> in a prior message.  The only trick is the two allocnos would have to
> be
> connected by conflicting with some 3rd allocno.  The 3rd allocno may
> not
> be a high degree, but the heuristic will still trigger and slightly
> encourage the ADD and AND allocnos to share the same reg.

Yeah, but how do I ensure that the one that needs the BOTTOM_REG gets
colored first, thereby leaving its register free to be re-used?  This
little tweak only kicks in when you are coloring the first of the non-
conflicting pair, right?  If the agnostic allocno is colored first, it
will grab the TOP_REG, thereby causing your tweak to slightly decrement
the cost of using that TOP_REG for the other allocno, to encourage
sharing, and it will end up with a TOP_REG that it can't use.

I have thought of using the need for BOTTOM_REGS as a tie-breaker when
coloring (in allocno_bucket_compare_func), so that two non-conflicting
ones that otherwise have the same priority will end up being ordered
so that the one that needs BOTTOM_REGS gets colored first.  What are
your thoughts on this approach?

Cheers,
Ian

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Understanding IRA
@ 2009-10-16 14:22 Ian Bolton
  2009-10-16 15:23 ` Vladimir Makarov
  2009-10-16 15:45 ` Vladimir Makarov
  0 siblings, 2 replies; 30+ messages in thread
From: Ian Bolton @ 2009-10-16 14:22 UTC (permalink / raw)
  To: vmakarov; +Cc: gcc

Hi Vladimir,

I have just begun working for Icera Inc, on a private port of GCC,
and my first task has been to investigate the performance impact of
using the Chaitin-Briggs algorithm of IRA vs the priority algorithm
that we currently have enabled in our production compiler (due to not
defining IRA_COVER_CLASSES).

After running various tests, I measured little performance changes
except for some test cases that have regressed quite considerably.
My suspicion was that these regressions were due to our architecture
having some instructions that can only use a subset of the register
bank, so I set about trying to understand IRA enough to confirm this.

I now believe I have now reached a point where I understand IRA
enough to start thinking about how I can improve these regressed
cases, but I thought it might be a good idea to formulate an email
with my current understanding of IRA in it, so you could correct any
errors before I continue.  (I figure this exchange between us will
also serve useful to other newbies in the field of GCC register
allocation!)

Without further ado, here is my current understanding of IRA
(numbered for ease of reference).

1. You can enable full IRA, with Chaitin-Briggs coloring, by defining
IRA_COVER_CLASSES, but you can just as easily disable it again by
supplying -fira-algorithm=priority at the command line.  (The latter
is useful to know because it means you can compare two versions
without recompiling.)

2. The priority algorithm allocates in an order determined by the
allocno_priority_compare_func and calls into assign_hard_reg in that
order to determine which allocno gets what register (or memory).

3. The CB algorithm puts allocnos into colored and uncolored buckets,
according to conflicts between allocnos.  The more conflicts, the
more chance of being put in the uncolored bucket.  Being in the
uncolored bucket increases an allocno's chance of being spilled;
being in the colored bucket guarantees that it will get a hard
register.

4. When you run at -O0, the conflicts aspect of IRA is disabled
(ira_conflict_p is set to false), and this subsequently causes the
fast_allocation function to be used instead of the do_coloring IRA
function.  (This makes sense since you cannot separate into buckets
without inspecting conflicts.)

5. Assuming ira_conflicts_p is enabled and you are using the CB
algorithm, the two buckets are sorted using a
bucket_allocno_compare_func, and this ordering is important, since it
dictates the order each allocno will be pushed onto the stack and
hence the order it will be popped from it (and assign_hard_reg
called).

6. Allocnos from the colorable bucket always go on the stack first
(in the order they were sorted) and then uncolorable ones, in an
order based on which is the best candidate for spilling.  Each time
an allocno is added to the stack, conflicts may be reduced and there
is the possibility of moving one or more allocnos from the
uncolorable bucket to the colorable one (so they end up being added
to the stack sooner than the other uncolorable ones).  Near the end
of processing the uncolorable bucket, there comes an opportunity to
move all remaining allocnos to the colorable bucket (due to few
remaining conflicts) and these get put on the stack last, and hence
popped first, and so they all get registers. These final uncolorables
will be those that had the highest spill cost and/or the most
remaining conflicts.

7. The spill cost and priority calculation for determining which
uncolorable to put on the stack first is important, for two main
reasons: firstly, whichever allocno is picked is a more likely
candidate for spilling; secondly, it will free other allocnos up to
be added to the nicer colored bucket, from which an allocno always
receives a hard register.

If you could let me know if and where I've gone wrong with any of the
above, I would be extremely grateful.  Then, once we are on the same
page, I hope you could also make some suggestions as to how I might
help IRA work well with our instructions that can only use a subset
of the register bank.

Best regards, Ian Bolton (Compiler Engineer, Icera Inc.)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-10-16 14:22 Ian Bolton
@ 2009-10-16 15:23 ` Vladimir Makarov
  2009-10-16 16:19   ` Jeff Law
  2009-10-16 15:45 ` Vladimir Makarov
  1 sibling, 1 reply; 30+ messages in thread
From: Vladimir Makarov @ 2009-10-16 15:23 UTC (permalink / raw)
  To: Ian Bolton; +Cc: gcc

Ian Bolton wrote:
> Hi Vladimir,
>
> I have just begun working for Icera Inc, on a private port of GCC,
> and my first task has been to investigate the performance impact of
> using the Chaitin-Briggs algorithm of IRA vs the priority algorithm
> that we currently have enabled in our production compiler (due to not
> defining IRA_COVER_CLASSES).
>
> After running various tests, I measured little performance changes
> except for some test cases that have regressed quite considerably.
> My suspicion was that these regressions were due to our architecture
> having some instructions that can only use a subset of the register
> bank, so I set about trying to understand IRA enough to confirm this.
>
That is quite possible.  Chaitin-Briggs algorithm has some problems when 
there are instructions using specific register subsets.  It is well 
known and some researchers tried to solve the problem.  The most known 
article about this is "A generalized algorithm for graph-coloring 
register allocation":

http://130.203.133.121:8080/viewdoc/summary?doi=10.1.1.107.113

I am still working on this problem but I don't know when it will be 
finally in gcc.
> I now believe I have now reached a point where I understand IRA
> enough to start thinking about how I can improve these regressed
> cases, but I thought it might be a good idea to formulate an email
> with my current understanding of IRA in it, so you could correct any
> errors before I continue.  (I figure this exchange between us will
> also serve useful to other newbies in the field of GCC register
> allocation!)
>
The biggest problem of GCC RA is still reload pass.  It frequently 
changes decisions of IRA not in a good way.
> Without further ado, here is my current understanding of IRA
> (numbered for ease of reference).
>
> 1. You can enable full IRA, with Chaitin-Briggs coloring, by defining
> IRA_COVER_CLASSES, but you can just as easily disable it again by
> supplying -fira-algorithm=priority at the command line.  (The latter
> is useful to know because it means you can compare two versions
> without recompiling.)
>
Correct.
> 2. The priority algorithm allocates in an order determined by the
> allocno_priority_compare_func and calls into assign_hard_reg in that
> order to determine which allocno gets what register (or memory).
>
Correct.
> 3. The CB algorithm puts allocnos into colored and uncolored buckets,
> according to conflicts between allocnos.  The more conflicts, the
> more chance of being put in the uncolored bucket.  Being in the
> uncolored bucket increases an allocno's chance of being spilled;
> being in the colored bucket guarantees that it will get a hard
> register.
>
I'd say if allocno goes to the coloring stack from the uncolored bucket, 
it will be probably spilled.  Going to the coloring stack from coloring 
bucket means that the allocno most probably get a hard register.  In 
theory it should be guaranteed but in practice it might not happen 
because of multi-registers allocnos, hard register conflicts with 
allocno, non-profitability of some hard registers (e.g. callee-saved vs 
callee-clobbered registers).

Being in colored or uncolored bucket is defined by trivial colorability 
criteria which is classically based on conflicts left in the graph for 
given allocno.  The criteria can be more complicated and exact but it 
might slow down RA a lot.
> 4. When you run at -O0, the conflicts aspect of IRA is disabled
> (ira_conflict_p is set to false), and this subsequently causes the
> fast_allocation function to be used instead of the do_coloring IRA
> function.  (This makes sense since you cannot separate into buckets
> without inspecting conflicts.)
>
Correct.  Building conflicts is an expansive operation.  Therefore -O0 
(or fast) RA is based on live ranges.
> 5. Assuming ira_conflicts_p is enabled and you are using the CB
> algorithm, the two buckets are sorted using a
> bucket_allocno_compare_func, and this ordering is important, since it
> dictates the order each allocno will be pushed onto the stack and
> hence the order it will be popped from it (and assign_hard_reg
> called).
>
Correct.  But I should say it is only used for currently colored 
allocnos.  You can not reorder two allocnos when putting  the first 
allocno on the stack (in order words, removing trivially colored allocno 
from the graph) results in making the second allocno trivially colored.
> 6. Allocnos from the colorable bucket always go on the stack first
> (in the order they were sorted) and then uncolorable ones, in an
> order based on which is the best candidate for spilling.  Each time
> an allocno is added to the stack, conflicts may be reduced and there
> is the possibility of moving one or more allocnos from the
> uncolorable bucket to the colorable one (so they end up being added
> to the stack sooner than the other uncolorable ones).  Near the end
> of processing the uncolorable bucket, there comes an opportunity to
> move all remaining allocnos to the colorable bucket (due to few
> remaining conflicts) and these get put on the stack last, and hence
> popped first, and so they all get registers.
Correct.   It is how Chaitin-Briggs coloring works and such description 
can be found in any compiler book.
>  These final uncolorables
> will be those that had the highest spill cost and/or the most
> remaining conflicts.
>
Sometimes such allocnos get hard register (it is implemented by 
optimistic coloring which is the biggest invention of Briggs).  It is 
very hard to predict when it happens.
> 7. The spill cost and priority calculation for determining which
> uncolorable to put on the stack first is important, for two main
> reasons: firstly, whichever allocno is picked is a more likely
> candidate for spilling; secondly, it will free other allocnos up to
> be added to the nicer colored bucket, from which an allocno always
> receives a hard register.
>
Correct with mentioned above problem that allocnos from colored bucket 
might not get a hard register.

Chaitin-Briggs algorithm is good at coloring but not good at spilling 
which should be used on register pressure criteria and this is very 
different from # conflicts.  Priority coloring, imho, better dealing 
with spilling because its priority takes register pressure into account 
by using live range length.  From well known RA algorithms, ELS 
(extended linear scan RA) probably does the best decision about spilling 
but it is not good about coloring (especially for code with complicated 
CFG).
> If you could let me know if and where I've gone wrong with any of the
> above, I would be extremely grateful.  Then, once we are on the same
> page, I hope you could also make some suggestions as to how I might
> help IRA work well with our instructions that can only use a subset
> of the register bank.
>
Subsets are complicated issue.  I've been working on different 
approaches to this problem for year without significant improvements.  
As I wrote I hope it finally will be in GCC in some time and necessity 
for IRA_COVER_CLASS will finally go away.
> Best regards, Ian Bolton (Compiler Engineer, Icera Inc.)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-10-16 15:23 ` Vladimir Makarov
@ 2009-10-16 16:19   ` Jeff Law
  2009-10-19 19:21     ` Ian Bolton
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Law @ 2009-10-16 16:19 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Ian Bolton, gcc

On 10/16/09 08:53, Vladimir Makarov wrote:
>
> The biggest problem of GCC RA is still reload pass.  It frequently 
> changes decisions of IRA not in a good way.
Agreed.  Not only may reload make a bad choice, it's horribly 
unpredictable.  Trivial changes often lead to drastically different 
reloading decisions which in turn drastically change the final output.

One of my favorites right now is the round-robin selection of spill 
registers to encourage reload inheritance.  While I certainly understand 
the reasoning behind the code, it's amazingly frustrating to watch 
reload choose the worst possible spill register simply because of the 
round-robin selection.

I've got a little hack in the reload-v2 branch which queries IRA to 
mitigate this problem, but it's merely a short-term hack until I can 
make reload inheritance go away.

jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-10-16 16:19   ` Jeff Law
@ 2009-10-19 19:21     ` Ian Bolton
  2009-10-19 21:09       ` Vladimir Makarov
  2009-10-23  7:33       ` Jeff Law
  0 siblings, 2 replies; 30+ messages in thread
From: Ian Bolton @ 2009-10-19 19:21 UTC (permalink / raw)
  To: Jeff Law, Vladimir Makarov; +Cc: gcc

Hi Jeff and Vladimir.

Jeff: I'd be interested in trying the patch if you can send it my way.

Vladimir: Today I have seen reasonable improvements on
progressively-trimmed-down versions of a *single* test case by applying
some cost adjustments in the case when an operand wants one of our
bottom registers and IRA is considering giving it an alternative
register or memory.  I will be experimenting with different values for
these two new costs and some more test cases tomorrow.

I wonder if the optimal register allocator will just end up being one
that tries all the algorithms and picks the best output for a given
case?  When it comes to optimisation, I prefer to bend the rules if at
all possible!

Best regards,
Ian

-----Original Message-----
From: Jeff Law [mailto:law@redhat.com] 
Sent: 16 October 2009 16:45
To: Vladimir Makarov
Cc: Ian Bolton; gcc@gcc.gnu.org
Subject: Re: Understanding IRA

On 10/16/09 08:53, Vladimir Makarov wrote:
>
> The biggest problem of GCC RA is still reload pass.  It frequently 
> changes decisions of IRA not in a good way.
Agreed.  Not only may reload make a bad choice, it's horribly 
unpredictable.  Trivial changes often lead to drastically different 
reloading decisions which in turn drastically change the final output.

One of my favorites right now is the round-robin selection of spill 
registers to encourage reload inheritance.  While I certainly understand

the reasoning behind the code, it's amazingly frustrating to watch 
reload choose the worst possible spill register simply because of the 
round-robin selection.

I've got a little hack in the reload-v2 branch which queries IRA to 
mitigate this problem, but it's merely a short-term hack until I can 
make reload inheritance go away.

jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-10-19 19:21     ` Ian Bolton
@ 2009-10-19 21:09       ` Vladimir Makarov
  2009-10-23  7:33       ` Jeff Law
  1 sibling, 0 replies; 30+ messages in thread
From: Vladimir Makarov @ 2009-10-19 21:09 UTC (permalink / raw)
  To: Ian Bolton; +Cc: Jeff Law, gcc

Ian Bolton wrote:
> Hi Jeff and Vladimir.
>
> Jeff: I'd be interested in trying the patch if you can send it my way.
>
> Vladimir: Today I have seen reasonable improvements on
> progressively-trimmed-down versions of a *single* test case by applying
> some cost adjustments in the case when an operand wants one of our
> bottom registers and IRA is considering giving it an alternative
> register or memory.  I will be experimenting with different values for
> these two new costs and some more test cases tomorrow.
>
> I wonder if the optimal register allocator will just end up being one
> that tries all the algorithms and picks the best output for a given
> case?  When it comes to optimisation, I prefer to bend the rules if at
> all possible!
>
>   
  I did not try progressive register allocator (in which I have some 
doubts) but I did a lot of experiments recently (last year) with ILP 
based register allocators: from simple spill model (which pseudos should 
be spilled) to RA with coalescing, live range splitting, 
rematerialization, and code selection (it needs  models on hard reg 
levels) including some models closed to Apel's and two Wilken's approaches.

  The simplest model can be only used to solve medium-size functions in 
reasonable time (about 10 minutes).  But even this model can not be used 
for function like reload.c::find_reloads.  More complex model problems 
can not be solved even for many tiny benchmarks (like heapsort) in 
reasonable time.  Although I used GLPK which is much slower than the 
best package (CPLEX).  Other algorithms for soliving ILP (like Balas' 
one) are even much worse.  The improvement for the simplest model was 
not significant unless profiling was used (in this case some SPECInt2000 
benchmarks were improved upto 20% on x86).

  In any case, I did not find optimal RA based on ILP is practical.  
About 4 years ago, I tried QAP (quadratic assignment problem) model 
approaches with the same conclusion.  Although I did not try multi 
commodity flow network approach for which there are better specialized 
algorithms because there are no good GPL-based packages for this 
(unfortunately the best one MCF has no such license). It could be an 
interesting research.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-10-19 19:21     ` Ian Bolton
  2009-10-19 21:09       ` Vladimir Makarov
@ 2009-10-23  7:33       ` Jeff Law
  2009-11-04 17:52         ` Ian Bolton
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff Law @ 2009-10-23  7:33 UTC (permalink / raw)
  To: Ian Bolton; +Cc: Vladimir Makarov, gcc

On 10/19/09 12:30, Ian Bolton wrote:
> Hi Jeff and Vladimir.
>
> Jeff: I'd be interested in trying the patch if you can send it my way.
>    
It's nothing special.

/* Return nonzero if REGNO is a particularly bad choice for reloading X.  */
static int
ira_bad_reload_regno_1 (int regno, rtx x)
{
   int x_regno;
   ira_allocno_t a;
   enum reg_class pref;

   /* We only deal with pseudo regs.  */
   if (! x || GET_CODE (x) != REG)
     return 0;

   x_regno = REGNO (x);
   if (x_regno < FIRST_PSEUDO_REGISTER)
     return 0;

   /* If the pseudo prefers REGNO explicitly, then do not consider
      REGNO a bad spill choice.  */
   pref = reg_preferred_class (x_regno);
   if (reg_class_size[pref] == 1
&& TEST_HARD_REG_BIT (reg_class_contents[pref], regno))
     return 0;

   /* If the pseudo conflicts with REGNO, then we consider REGNO a
      poor choice for a reload regno.  */
   a = ira_regno_allocno_map[x_regno];
   if (TEST_HARD_REG_BIT (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a), regno))
     return 1;

   return 0;
}

/* Return nonzero if REGNO is a particularly bad choice for reloading
    IN or OUT.  */
int
ira_bad_reload_regno (int regno, rtx in, rtx out)
{
   return (ira_bad_reload_regno_1 (regno, in)
           || ira_bad_reload_regno_1 (regno, out));
}

Then change the loop in allocate_reload_reg to iterate 3 times intead of 
2.  And add this fragment



  if (pass == 1
&& ira_bad_reload_regno (regnum, rld[r].in, rld[r].out))
                 continue;


To body of hte conditional starting with

if ((reload_reg_free_p ...


It's really just a hack.  I don't want to spend much time on that code 
as ultimately I want it all to go away.

Jeff




^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-10-23  7:33       ` Jeff Law
@ 2009-11-04 17:52         ` Ian Bolton
  2009-11-04 19:49           ` Jeff Law
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-04 17:52 UTC (permalink / raw)
  To: Jeff Law; +Cc: Vladimir Makarov, gcc

Hi Jeff,

From an empirical perspective, the value of your patch is hard to
determine at this stage - one benchmark improved about 0.5% but others
have marginally regressed.

From an intellectual perspective, however, your patch is clearly a Good
Thing.  If I am understanding correctly, your aim is to prevent cases
where reload evicts a pseudo from a register that conflicts with the
pseudo that we are trying to reload, thereby undoing the clever cost-
based logic in IRA that gave the register to the current owner in the
first place?

My guess is that the performance drop could be attributed to reload
being lucky in some cases and your patch is preventing this luck from
happening.  Whilst I'm more of a risk-taker by nature, my employer
would prefer more predictable behaviour from its compiler, so we will
likely commit your patch to our development branch. Thanks very much!

Is there an ETA on when reload will be gone away? ;-)

Best regards,
Ian
 
> -----Original Message-----
> From: Jeff Law [mailto:law@redhat.com]
> Sent: 22 October 2009 23:05
> To: Ian Bolton
> Cc: Vladimir Makarov; gcc@gcc.gnu.org
> Subject: Re: Understanding IRA
> 
> On 10/19/09 12:30, Ian Bolton wrote:
> > Hi Jeff and Vladimir.
> >
> > Jeff: I'd be interested in trying the patch if you can send it my
> way.
> >
> It's nothing special.
> 
> /* Return nonzero if REGNO is a particularly bad choice for reloading
> X.  */
> static int
> ira_bad_reload_regno_1 (int regno, rtx x)
> {
>    int x_regno;
>    ira_allocno_t a;
>    enum reg_class pref;
> 
>    /* We only deal with pseudo regs.  */
>    if (! x || GET_CODE (x) != REG)
>      return 0;
> 
>    x_regno = REGNO (x);
>    if (x_regno < FIRST_PSEUDO_REGISTER)
>      return 0;
> 
>    /* If the pseudo prefers REGNO explicitly, then do not consider
>       REGNO a bad spill choice.  */
>    pref = reg_preferred_class (x_regno);
>    if (reg_class_size[pref] == 1
> && TEST_HARD_REG_BIT (reg_class_contents[pref], regno))
>      return 0;
> 
>    /* If the pseudo conflicts with REGNO, then we consider REGNO a
>       poor choice for a reload regno.  */
>    a = ira_regno_allocno_map[x_regno];
>    if (TEST_HARD_REG_BIT (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a),
regno))
>      return 1;
> 
>    return 0;
> }
> 
> /* Return nonzero if REGNO is a particularly bad choice for reloading
>     IN or OUT.  */
> int
> ira_bad_reload_regno (int regno, rtx in, rtx out)
> {
>    return (ira_bad_reload_regno_1 (regno, in)
>            || ira_bad_reload_regno_1 (regno, out));
> }
> 
> Then change the loop in allocate_reload_reg to iterate 3 times intead
> of
> 2.  And add this fragment
> 
> 
> 
>   if (pass == 1
> && ira_bad_reload_regno (regnum, rld[r].in, rld[r].out))
>                  continue;
> 
> 
> To body of hte conditional starting with
> 
> if ((reload_reg_free_p ...
> 
> 
> It's really just a hack.  I don't want to spend much time on that
code
> as ultimately I want it all to go away.
> 
> Jeff
> 
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-11-04 17:52         ` Ian Bolton
@ 2009-11-04 19:49           ` Jeff Law
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Law @ 2009-11-04 19:49 UTC (permalink / raw)
  To: Ian Bolton; +Cc: Vladimir Makarov, gcc

On 11/04/09 10:52, Ian Bolton wrote:
> Hi Jeff,
>
>  From an empirical perspective, the value of your patch is hard to
> determine at this stage - one benchmark improved about 0.5% but others
> have marginally regressed.
>    
It's a hack, no doubt about it.  Your results are about what I expected, 
a few things get better a few things get worse.

The purpose of the patch was twofold.  First, hopefully make reload's 
actions more predictable.  Second hopefully improve the overall code in 
at least some marginal fashion.

I didn't want to burn a lot of time on it since the reload inheritance 
is, IMHO, the biggest ball of hair in reload that I want to kill.  If 
I'm successful in killing the need for reload inheritance, then this 
little hack silently goes away.


>  From an intellectual perspective, however, your patch is clearly a Good
> Thing.  If I am understanding correctly, your aim is to prevent cases
> where reload evicts a pseudo from a register that conflicts with the
> pseudo that we are trying to reload, thereby undoing the clever cost-
> based logic in IRA that gave the register to the current owner in the
> first place?
>    
Basically, yes.  Note that we still allow eviction if no other register 
can be found.

> My guess is that the performance drop could be attributed to reload
> being lucky in some cases and your patch is preventing this luck from
> happening.
Precisely.


>    Whilst I'm more of a risk-taker by nature, my employer
> would prefer more predictable behaviour from its compiler, so we will
> likely commit your patch to our development branch. Thanks very much!
>    
There's definitely a lot more that could be done with this code.  If it 
were going to be around for a while, you'd want to sort the spill 
register array based on a number of criteria.  Part of those criteria 
would be the conflicts & costs recorded by IRA, whether or not the spill 
reg holds a value interesting for this insn, etc.  Given my goals, I 
didn't want to spend that much time on it.


> Is there an ETA on when reload will be gone away? ;-)
>    
It'll be a long time, though I'd like to start staging in pieces of the 
work for 4.6.

jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-10-16 14:22 Ian Bolton
  2009-10-16 15:23 ` Vladimir Makarov
@ 2009-10-16 15:45 ` Vladimir Makarov
  2009-11-03 16:29   ` Ian Bolton
  1 sibling, 1 reply; 30+ messages in thread
From: Vladimir Makarov @ 2009-10-16 15:45 UTC (permalink / raw)
  To: Ian Bolton; +Cc: gcc

Ian Bolton wrote:
>  I hope you could also make some suggestions as to how I might
> help IRA work well with our instructions that can only use a subset
> of the register bank.
>   
I forgot to write: thanks,  it would be interesting for me to see your 
suggestions :)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: Understanding IRA
  2009-10-16 15:45 ` Vladimir Makarov
@ 2009-11-03 16:29   ` Ian Bolton
  2009-11-03 23:02     ` Jeff Law
  0 siblings, 1 reply; 30+ messages in thread
From: Ian Bolton @ 2009-11-03 16:29 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc

Hi again, Vladimir,

I am pleased to report some performance improvements after altering
ira-costs.c.  A key benchmark for us has improved by 5%.

Specifically, in record_reg_classes(), after the alt_cost has been
calculated and it will be applied to pp->mem_cost and pp->cost[k], I
check whether this particular operand wanted one of our BOTTOM_REGS
(r0-r15) and I further increase the pp->mem_cost by an arbitrary
amount and also increase pp->cost[k] by an arbitrary amount if k
does not represent the BOTTOM_REGS class.  My aim here is to nudge
IRA in the right direction for operands that just want BOTTOM_REGS.

After experimenting with different values for my "arbitrary
amounts", I discovered some that successfully made IRA more likely
to give BOTTOM_REGS to those instructions/operands that want
BOTTOM_REGS, since any other regs and memory ended up with high
enough costs for IRA to try and avoid using them.

I have included a snippet from my version of record_reg_classes()
below:

====================================
op_cost_add = alt_cost * frequency;
/* Finally, update the costs with the information we've
   calculated about this alternative.  */
for (i = 0; i < n_ops; i++)
  if (REG_P (ops[i]) && REGNO (ops[i]) >= FIRST_PSEUDO_REGISTER)
    {
      struct costs *pp = op_costs[i], *qq = this_op_costs[i];
      int scale = 1 + (recog_data.operand_type[i] == OP_INOUT);

      /* If this operand really wanted a BOTTOM_REG, add an extra
         cost onto memory to nudge IRA away from putting it in
         memory */
      if (allocno_pref &&
          allocno_pref[ALLOCNO_NUM(ira_curr_regno_allocno_map
                                   [REGNO (ops[i])])]
          == BOTTOM_REGS)
        {                
          pp->mem_cost = MIN (pp->mem_cost,
                              (qq->mem_cost + op_cost_add +
             (flag_ira_preferred_register_cost_memory * frequency))
                              * scale);
        }
      else
        {
          pp->mem_cost = MIN (pp->mem_cost,
                              (qq->mem_cost + op_cost_add)
                              * scale);
        }

	for (k = 0; k < cost_classes_num; k++)
        {
          /* If this operand really wanted a BOTTOM_REG, add an
             extra cost onto any register class that isn't 
             BOTTOM_REGS to nudge IRA away from putting it in a
             hard register of that class */
          if (allocno_pref &&
              allocno_pref[ALLOCNO_NUM(ira_curr_regno_allocno_map

                                       [REGNO (ops[i])])]
              == BOTTOM_REGS)
          {              
            switch(cost_classes[k])
              {
                case BOTTOM_REGS:
                  op_cost_add = alt_cost * frequency;
                  break;
                case TOP_CREGS:
                case C_REGS:
                  op_cost_add = (alt_cost +
                         flag_ira_preferred_register_cost_register)
                         * frequency;
                  break;
                default:
                  op_cost_add = alt_cost * frequency;
                  break;
              }
          }

          pp->cost[k] = MIN (pp->cost[k],
                             (qq->cost[k] + op_cost_add)
                             * scale);
        }
 }
====================================

So far, I have found the best value for
flag_ira_preferred_register_cost_memory to be 20 and the best value
for flag_ira_preferred_register_cost_register to be 6.  I appreciate
that these numbers do not really correlate with the other cost units
but they were the ones that made the impact. 

In terms of coloring algorithms, we are still seeing better
performance with the priority algorithm on our benchmarks, but the
cost adjustments above improved both priority algorithm and the CB
algorithm, with ira-region=mixed and ira-region=one.

If you have any thoughts you'd like to share then I'd definitely be
interested, but this post is mainly because you said in a previous
email that you wanted to hear my suggestions :)

Best regards,
Ian

>Ian Bolton wrote:
>>  I hope you could also make some suggestions as to how I might
>>  help IRA work well with our instructions that can only use a
>>  subset of the register bank.
>>   
>I forgot to write: thanks,  it would be interesting for me to see
>your suggestions :)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-11-03 16:29   ` Ian Bolton
@ 2009-11-03 23:02     ` Jeff Law
  2009-11-04 17:13       ` Vladimir Makarov
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Law @ 2009-11-03 23:02 UTC (permalink / raw)
  To: Ian Bolton; +Cc: Vladimir Makarov, gcc

On 11/03/09 09:29, Ian Bolton wrote:
> Hi again, Vladimir,
>
> I am pleased to report some performance improvements after altering
> ira-costs.c.  A key benchmark for us has improved by 5%.
>
> Specifically, in record_reg_classes(), after the alt_cost has been
> calculated and it will be applied to pp->mem_cost and pp->cost[k], I
> check whether this particular operand wanted one of our BOTTOM_REGS
> (r0-r15) and I further increase the pp->mem_cost by an arbitrary
> amount and also increase pp->cost[k] by an arbitrary amount if k
> does not represent the BOTTOM_REGS class.  My aim here is to nudge
> IRA in the right direction for operands that just want BOTTOM_REGS.
>
> After experimenting with different values for my "arbitrary
> amounts", I discovered some that successfully made IRA more likely
> to give BOTTOM_REGS to those instructions/operands that want
> BOTTOM_REGS, since any other regs and memory ended up with high
> enough costs for IRA to try and avoid using them.
>
> I have included a snippet from my version of record_reg_classes()
> below:
>
>    
What I don't understand at this point is why the current mechanisms in 
IRA aren't showing a lower cost for using BOTTOM_REGS (or a higher cost 
for TOP_REGS).  i.e.  I don't think any of this should be necessary as 
IRA should already be doing something similar.

This may be a case where your backend hasn't indicated that TOP_REGS has 
a higher cost than BOTTOM_REGS in particular situations.

jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-11-03 23:02     ` Jeff Law
@ 2009-11-04 17:13       ` Vladimir Makarov
  2009-11-05  0:23         ` Jeff Law
  0 siblings, 1 reply; 30+ messages in thread
From: Vladimir Makarov @ 2009-11-04 17:13 UTC (permalink / raw)
  To: Jeff Law; +Cc: Ian Bolton, gcc

Jeff Law wrote:
> On 11/03/09 09:29, Ian Bolton wrote:
>> Hi again, Vladimir,
>>
>> I am pleased to report some performance improvements after altering
>> ira-costs.c.  A key benchmark for us has improved by 5%.
>>
>> Specifically, in record_reg_classes(), after the alt_cost has been
>> calculated and it will be applied to pp->mem_cost and pp->cost[k], I
>> check whether this particular operand wanted one of our BOTTOM_REGS
>> (r0-r15) and I further increase the pp->mem_cost by an arbitrary
>> amount and also increase pp->cost[k] by an arbitrary amount if k
>> does not represent the BOTTOM_REGS class.  My aim here is to nudge
>> IRA in the right direction for operands that just want BOTTOM_REGS.
>>
>> After experimenting with different values for my "arbitrary
>> amounts", I discovered some that successfully made IRA more likely
>> to give BOTTOM_REGS to those instructions/operands that want
>> BOTTOM_REGS, since any other regs and memory ended up with high
>> enough costs for IRA to try and avoid using them.
>>
>> I have included a snippet from my version of record_reg_classes()
>> below:
>>
>>    
> What I don't understand at this point is why the current mechanisms in 
> IRA aren't showing a lower cost for using BOTTOM_REGS (or a higher 
> cost for TOP_REGS).  i.e.  I don't think any of this should be 
> necessary as IRA should already be doing something similar.
>
> This may be a case where your backend hasn't indicated that TOP_REGS 
> has a higher cost than BOTTOM_REGS in particular situations.
>
I am agree with Jeff.  It is hard to understand what you are doing 
without the architecture knowledge and some macro values in your port 
(IRA_COVER_CLASSES, MEMORY_MOVE_COST, and REGISTER_MOVE_COST).

I'd also add that besides right macro value definitions, you could use 
insn alternative hints near register constraints like ? or even *.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Understanding IRA
  2009-11-04 17:13       ` Vladimir Makarov
@ 2009-11-05  0:23         ` Jeff Law
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Law @ 2009-11-05  0:23 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Ian Bolton, gcc

On 11/04/09 10:14, Vladimir Makarov wrote:
>
> I am agree with Jeff.  It is hard to understand what you are doing 
> without the architecture knowledge and some macro values in your port 
> (IRA_COVER_CLASSES, MEMORY_MOVE_COST, and REGISTER_MOVE_COST).
>
> I'd also add that besides right macro value definitions, you could use 
> insn alternative hints near register constraints like ? or even *.
I was wondering about the constraints in particular.  ISTM that for 
insns where TOP_REGS and BOTTOM_REGS have differing costs that the 
constraint letters for TOP_REGS ought to have a '?' to show they're 
slightly more costly than BOTTOM_REGS.

Alternately IRA_HARD_REGNO_ADD_COST_MULTIPLER might be used to get 
BOTTOM_REGS preferred without having to add a zillion '?' modifiers to 
the constraints in the machine description.

jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2009-12-07 13:30 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-05 17:36 Understanding IRA Ian Bolton
2009-11-05 18:05 ` Ian Bolton
2009-11-06 12:53   ` Dave Hudson
2009-11-09 14:13     ` Ian Bolton
2009-11-10 12:19       ` Dave Hudson
2009-11-10 17:21     ` Jeff Law
2009-11-10 17:38       ` Ian Bolton
2009-11-11 15:19         ` Ian Bolton
2009-11-11 16:12           ` Jeff Law
2009-11-11 17:04           ` Vladimir Makarov
2009-11-11 18:36             ` Ian Bolton
2009-11-11 20:09               ` Ian Bolton
2009-11-16 17:35                 ` Ian Bolton
     [not found]                   ` <4B01BB87.6020902@redhat.com>
2009-11-19 15:41                     ` Ian Bolton
     [not found]                       ` <4B1451C7.2010207@redhat.com>
2009-12-02 20:29                         ` Ian Bolton
2009-12-03 19:16                           ` Jeff Law
2009-12-07 13:30                             ` Ian Bolton
  -- strict thread matches above, loose matches on Subject: below --
2009-10-16 14:22 Ian Bolton
2009-10-16 15:23 ` Vladimir Makarov
2009-10-16 16:19   ` Jeff Law
2009-10-19 19:21     ` Ian Bolton
2009-10-19 21:09       ` Vladimir Makarov
2009-10-23  7:33       ` Jeff Law
2009-11-04 17:52         ` Ian Bolton
2009-11-04 19:49           ` Jeff Law
2009-10-16 15:45 ` Vladimir Makarov
2009-11-03 16:29   ` Ian Bolton
2009-11-03 23:02     ` Jeff Law
2009-11-04 17:13       ` Vladimir Makarov
2009-11-05  0:23         ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).