Register Allocation

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Register Allocation
@ 2004-09-22  1:21 Adrian Strätling
  2004-09-22  5:22 ` tm_gccmail
  0 siblings, 1 reply; 41+ messages in thread
From: Adrian Strätling @ 2004-09-22  1:21 UTC (permalink / raw)
  To: GCC

Hi,

I'm working on a target that does support up to 8 parallel instructions, 
but they have to be grouped at compile time. The scheduler could do much 
better than it currently does if there were less (output) depencies.

Is it possible to tell the register allocator not to reduce the register 
count to the minimum?

example:
( || means that this insn is executed in parallel to the last one )

1)    mvkl  L4,     b4      \
2)    mvkh  L4,     b4      +   push ret_label
3)    stw   b4,     *--b15  /
   || mvkl  times2, b4      \
4)    mvkh  times2, b4      +   branch
5)    b     b4              /
L4:

If the second temporary were not assigned to b4 but to b5, the schedule 
would be shorter:

1)    mvkl  L4,     b4
   || mvkl  times2, b5
2)    mvkh  L4,     b4
   || mvkh  times2, b5
3)    stw   b4,     *--b15    
   || b     b5
L4:

Thanks,
Adrian

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2004-09-22  1:21 Register Allocation Adrian Strätling
@ 2004-09-22  5:22 ` tm_gccmail
  2004-10-04 14:13   ` Nick Ing-Simmons
  0 siblings, 1 reply; 41+ messages in thread
From: tm_gccmail @ 2004-09-22  5:22 UTC (permalink / raw)
  To: Adrian Strätling; +Cc: GCC

On Tue, 21 Sep 2004, Adrian StrÃ¤tling wrote:

> Hi,
> 
> I'm working on a target that does support up to 8 parallel instructions, 
> but they have to be grouped at compile time. The scheduler could do much 
> better than it currently does if there were less (output) depencies.
> 
> Is it possible to tell the register allocator not to reduce the register 
> count to the minimum?

I think you're trying to do this the wrong way.

The first scheduling pass is responsible for reordering the instructions
so the pseudo lifetimes are overlapping. 

This forces the register allocator to allocate the pseudos to different
hardware registers.

You need to look at why the first scheduling pass isn't reordering the
instructions on your target.

Toshi

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2004-09-22  5:22 ` tm_gccmail
@ 2004-10-04 14:13   ` Nick Ing-Simmons
  0 siblings, 0 replies; 41+ messages in thread
From: Nick Ing-Simmons @ 2004-10-04 14:13 UTC (permalink / raw)
  To: tm_gccmail; +Cc: GCC, Adrian Strätling

Tm Gccmail <tm_gccmail@kloo.net> writes:
>On Tue, 21 Sep 2004, Adrian Strätling wrote:
>
>> Hi,
>> 
>> I'm working on a target that does support up to 8 parallel instructions, 
>> but they have to be grouped at compile time. The scheduler could do much 
>> better than it currently does if there were less (output) depencies.
>> 
>> Is it possible to tell the register allocator not to reduce the register 
>> count to the minimum?
>
>I think you're trying to do this the wrong way.

Or at least looking in the wrong place.
My c6x port went out of its way to create new psuedo registers
so that 1st scheduling pass could do this. But I seem to recall 
cse kept eliminiating them.


>
>The first scheduling pass is responsible for reordering the instructions
>so the pseudo lifetimes are overlapping. 
>
>This forces the register allocator to allocate the pseudos to different
>hardware registers.
>
>You need to look at why the first scheduling pass isn't reordering the
>instructions on your target.


>
>Toshi

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: register allocation
  2011-01-05 14:44       ` roy rosen
  2011-01-05 15:26         ` Jeff Law
@ 2011-01-11 16:11         ` Vladimir Makarov
  1 sibling, 0 replies; 41+ messages in thread
From: Vladimir Makarov @ 2011-01-11 16:11 UTC (permalink / raw)
  To: roy rosen; +Cc: Jeff Law, gcc

On 01/05/2011 09:44 AM, roy rosen wrote:
> 2011/1/3 Jeff Law<law@redhat.com>:
>> On 12/27/10 08:43, roy rosen wrote:
>>>> I'd recommend to try ira-improv branch.  I think that part of the problem
>>>> is
>>>> in usage of cover classes.  The branch removes the cover classes and
>>>> permits
>>>> IRA to use intersected register classes and that helps to assign better
>>>> hard
>>>> registers.
>>>>
>>>>
>>> I tried now this branch and got better results for some cases but
>>> still in other cases I get lots of redundent register copies.
>>> I might be missing something from the gcc history but I wonder why do
>>> we need to limit the coloring stage to select a hard reg from a class
>>> that was chosen by a prior stage.
>> It was a design decision with the introduction of IRA.  It made certain
>> problems easier to resolve at the time and in reality, most of the time the
>> set of legitimate and profitable hard registers for a given pseudo maps to a
>> register class reasonably well.
>>
>>> Why not simply put in the interference graph edges for all registers
>>> which are not possible for a pseudo and let the coloring algorithm
>>> select the best hard reg.
>> That's largely what the ira-improv branch does.  Register classes at that
>> point are used primarily to drive the costing model.
> Actually, I tried on this branch disabling the improve_allocation
> function and now it is doing a great job.
> For some reason the improve_allocation function only damaged the good
> allocation that was done.
>
This function is pretty straight forward.  It always improve allocation 
cost in given cost model.  So either it only seems that the code is 
worse or the cost model is wrong for some reasons (it might be wrong 
definitions of target cost macros or the IRA cost model is inadequate).  
It is hard to say without more information.  You could send me IRA dump 
with and without improve_allocation.  You might want send the dumps only 
for me if you want keep confidentiality about you port.
> In order to look at that I am trying to understand the conflict table: I see
>
> ;; a3(r255,l0) conflicts: a4(r243,l0) a6(r129,l0) a8(r126,l0)
> a9(r254,l0) a10(r256,l0) a11(r257,l0) a12(r291,w0,l0) a12(r291,w1,l0)
> a13(r316,w0,l0) a13(r316,w1,l0) a14(r318,w0,l0) a14(r318,w1,l0)
> a15(r319,w0,l0) a15(r319,w1,l0) a16(r321,w0,l0) a16(r321,w1,l0)
> a5(r253,l0) a7(r252,l0) a17(r261,l0)
> ;;     total conflict hard regs: 53
> ;;     conflict hard regs: 53
>
> I see here conflicts of the pseudo with other pseudos and conflict
> with a hard reg - all are result of live range data.
> How is the constraint data which limits a pseudo in an insn to be of a
> certain class gets into this table?
Constraints defines profitable allocno class (for example, in one insn 
the pseudo should be in floating point hard register and in many others 
it should be in general hard registers.  In this case IRA most probably 
will use general register class for the allocno).  But only reload pass 
really deals with constraints.
> I have expected also all hard regs which are not allowed for this
> pseudo because of constraints in the insns to be also in the conflict
> table. I guess I miss something...
>
IRA reports only conflicting hard registers from allocno class to keep 
this list short.
> If it isn't there then how is it guranteed that the pseudo would be
> allocated to a hard reg which is allowed by the constraints?
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: register allocation
  2011-01-03 15:41     ` Jeff Law
  2011-01-05 14:44       ` roy rosen
@ 2011-01-11 15:53       ` Vladimir Makarov
  1 sibling, 0 replies; 41+ messages in thread
From: Vladimir Makarov @ 2011-01-11 15:53 UTC (permalink / raw)
  To: Jeff Law; +Cc: roy rosen, gcc

On 01/03/2011 10:41 AM, Jeff Law wrote:
> On 12/27/10 08:43, roy rosen wrote:
>>>
>>> I'd recommend to try ira-improv branch.  I think that part of the 
>>> problem is
>>> in usage of cover classes.  The branch removes the cover classes and 
>>> permits
>>> IRA to use intersected register classes and that helps to assign 
>>> better hard
>>> registers.
>>>
>>>
Sorry for the delay with the answer.  I was on vacation last week.
>> I tried now this branch and got better results for some cases but
>> still in other cases I get lots of redundent register copies.
>> I might be missing something from the gcc history but I wonder why do
>> we need to limit the coloring stage to select a hard reg from a class
>> that was chosen by a prior stage.
In some cases, using all hard registers could improve the allocation.  
For example, usage of a pseudo for transferring value from one memory to 
another is such a case.  In this case, we could use general hard 
register or floating hard registers if there are no free general hard 
registers.

But in the most cases, usage of all hard registers is terrible idea for 
coloring because it results in wrong pseudos spilling.  For example, if 
usage of floating point hard register is less profitable than memory, 
although it will never get floating point hard register in function 
assign_hard_reg whose decision is based on costs, we will still put the 
pseudo as colorable on the stack because even if there are not enough 
general hard register, CB algorithm still count it as colorable because 
there are floating point hard registers available.  The order of pseudos 
on the stack actually defines what pseudos will be spilled.

Also original CB works on non-intersected register classes.

Those are reasons for the current IRA implementation.

>>
> It was a design decision with the introduction of IRA.  It made 
> certain problems easier to resolve at the time and in reality, most of 
> the time the set of legitimate and profitable hard registers for a 
> given pseudo maps to a register class reasonably well.
>
>> Why not simply put in the interference graph edges for all registers
>> which are not possible for a pseudo and let the coloring algorithm
>> select the best hard reg.
It is already implemented but in more efficient way through macro 
ALLOCNO_AVAILABLE_REGS_NUMS.
>>
> That's largely what the ira-improv branch does.  Register classes at 
> that point are used primarily to drive the costing model.
>
In other words, ira-improv permits CB works on intersected register 
classess and do better allocation for the 1st example above.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: register allocation
  2011-01-05 14:44       ` roy rosen
@ 2011-01-05 15:26         ` Jeff Law
  2011-01-11 16:11         ` Vladimir Makarov
  1 sibling, 0 replies; 41+ messages in thread
From: Jeff Law @ 2011-01-05 15:26 UTC (permalink / raw)
  To: roy rosen; +Cc: Vladimir Makarov, gcc

>>> Why not simply put in the interference graph edges for all registers
>>> which are not possible for a pseudo and let the coloring algorithm
>>> select the best hard reg.
>> That's largely what the ira-improv branch does.  Register classes at that
>> point are used primarily to drive the costing model.
> Actually, I tried on this branch disabling the improve_allocation
> function and now it is doing a great job.
> For some reason the improve_allocation function only damaged the good
> allocation that was done.
I'm sure Vlad will want to take a look at that, particularly if you've 
got a testcase for a target that's part of the mainline sources.  The 
whole point of that function is to spill a set of allocnos and using the 
hard regs for different allocnos when it seems profitable to do so.

> In order to look at that I am trying to understand the conflict table: I see
>
> ;; a3(r255,l0) conflicts: a4(r243,l0) a6(r129,l0) a8(r126,l0)
> a9(r254,l0) a10(r256,l0) a11(r257,l0) a12(r291,w0,l0) a12(r291,w1,l0)
> a13(r316,w0,l0) a13(r316,w1,l0) a14(r318,w0,l0) a14(r318,w1,l0)
> a15(r319,w0,l0) a15(r319,w1,l0) a16(r321,w0,l0) a16(r321,w1,l0)
> a5(r253,l0) a7(r252,l0) a17(r261,l0)
> ;;     total conflict hard regs: 53
> ;;     conflict hard regs: 53
>
> I see here conflicts of the pseudo with other pseudos and conflict
> with a hard reg - all are result of live range data.
Right.

> How is the constraint data which limits a pseudo in an insn to be of a
> certain class gets into this table?
Constraints are generally not used to drive conflict data, but instead 
are  to build the set of potential hard registers that can be used to 
satisfy a particular allocno and costing models.

> I have expected also all hard regs which are not allowed for this
> pseudo because of constraints in the insns to be also in the conflict
> table. I guess I miss something...
If a hard register can't be used for a particular allocno, then it's 
wasteful (in terms of memory and compilation speed) to represent 
conflicts between the unusable hard reg and the allocno.

> If it isn't there then how is it guranteed that the pseudo would be
> allocated to a hard reg which is allowed by the constraints?
The mental picture you should probably work with is build a union of all 
the registers allowed by the constraints.  Those are the set of hard 
registers that might be assigned to an allocno.  If a hard register 
isn't allowed by any constraint referenced by a particular allocno, then 
that hard register will not be part of the set of registers considered 
by IRA for that particular allocno.

Jeff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: register allocation
  2011-01-03 15:41     ` Jeff Law
@ 2011-01-05 14:44       ` roy rosen
  2011-01-05 15:26         ` Jeff Law
  2011-01-11 16:11         ` Vladimir Makarov
  2011-01-11 15:53       ` Vladimir Makarov
  1 sibling, 2 replies; 41+ messages in thread
From: roy rosen @ 2011-01-05 14:44 UTC (permalink / raw)
  To: Jeff Law; +Cc: Vladimir Makarov, gcc

2011/1/3 Jeff Law <law@redhat.com>:
> On 12/27/10 08:43, roy rosen wrote:
>>>
>>> I'd recommend to try ira-improv branch.  I think that part of the problem
>>> is
>>> in usage of cover classes.  The branch removes the cover classes and
>>> permits
>>> IRA to use intersected register classes and that helps to assign better
>>> hard
>>> registers.
>>>
>>>
>> I tried now this branch and got better results for some cases but
>> still in other cases I get lots of redundent register copies.
>> I might be missing something from the gcc history but I wonder why do
>> we need to limit the coloring stage to select a hard reg from a class
>> that was chosen by a prior stage.
>
> It was a design decision with the introduction of IRA.  It made certain
> problems easier to resolve at the time and in reality, most of the time the
> set of legitimate and profitable hard registers for a given pseudo maps to a
> register class reasonably well.
>
>> Why not simply put in the interference graph edges for all registers
>> which are not possible for a pseudo and let the coloring algorithm
>> select the best hard reg.
>
> That's largely what the ira-improv branch does.  Register classes at that
> point are used primarily to drive the costing model.

Actually, I tried on this branch disabling the improve_allocation
function and now it is doing a great job.
For some reason the improve_allocation function only damaged the good
allocation that was done.

In order to look at that I am trying to understand the conflict table: I see

;; a3(r255,l0) conflicts: a4(r243,l0) a6(r129,l0) a8(r126,l0)
a9(r254,l0) a10(r256,l0) a11(r257,l0) a12(r291,w0,l0) a12(r291,w1,l0)
a13(r316,w0,l0) a13(r316,w1,l0) a14(r318,w0,l0) a14(r318,w1,l0)
a15(r319,w0,l0) a15(r319,w1,l0) a16(r321,w0,l0) a16(r321,w1,l0)
a5(r253,l0) a7(r252,l0) a17(r261,l0)
;;     total conflict hard regs: 53
;;     conflict hard regs: 53

I see here conflicts of the pseudo with other pseudos and conflict
with a hard reg - all are result of live range data.
How is the constraint data which limits a pseudo in an insn to be of a
certain class gets into this table?
I have expected also all hard regs which are not allowed for this
pseudo because of constraints in the insns to be also in the conflict
table. I guess I miss something...

If it isn't there then how is it guranteed that the pseudo would be
allocated to a hard reg which is allowed by the constraints?

Thanks, Roy.


>
> jeff
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: register allocation
  2010-12-27 15:43   ` roy rosen
@ 2011-01-03 15:41     ` Jeff Law
  2011-01-05 14:44       ` roy rosen
  2011-01-11 15:53       ` Vladimir Makarov
  0 siblings, 2 replies; 41+ messages in thread
From: Jeff Law @ 2011-01-03 15:41 UTC (permalink / raw)
  To: roy rosen; +Cc: Vladimir Makarov, gcc

On 12/27/10 08:43, roy rosen wrote:
>>
>> I'd recommend to try ira-improv branch.  I think that part of the problem is
>> in usage of cover classes.  The branch removes the cover classes and permits
>> IRA to use intersected register classes and that helps to assign better hard
>> registers.
>>
>>
> I tried now this branch and got better results for some cases but
> still in other cases I get lots of redundent register copies.
> I might be missing something from the gcc history but I wonder why do
> we need to limit the coloring stage to select a hard reg from a class
> that was chosen by a prior stage.
It was a design decision with the introduction of IRA.  It made certain 
problems easier to resolve at the time and in reality, most of the time 
the set of legitimate and profitable hard registers for a given pseudo 
maps to a register class reasonably well.

> Why not simply put in the interference graph edges for all registers
> which are not possible for a pseudo and let the coloring algorithm
> select the best hard reg.
That's largely what the ira-improv branch does.  Register classes at 
that point are used primarily to drive the costing model.

jeff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: register allocation
  2010-12-23 16:48 ` Vladimir Makarov
  2010-12-23 17:22   ` Jeff Law
@ 2010-12-27 15:43   ` roy rosen
  2011-01-03 15:41     ` Jeff Law
  1 sibling, 1 reply; 41+ messages in thread
From: roy rosen @ 2010-12-27 15:43 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc

2010/12/23 Vladimir Makarov <vmakarov@redhat.com>:
> On 12/23/2010 03:13 AM, roy rosen wrote:
>>
>> Hi All,
>>
>> I am looking at the code generated by my port and it seems that I have
>> a problem that too many copies between registers are generated.
>> I looked a bit at the register allocation and wanted to verify that I
>> understand its behavior.
>>
>> Is that true that it first chooses a register class for each pseodo
>> and only then starts coloring?
>>
> Yes, that is true.
>>
>> I think that my problem is that in my architecture there are two
>> register classes which can do all arithmetic operation but class X can
>> also do loads and stores and class Y can also do DSP operations.
>>
>> So when there are for example two DSP operations and between them some
>> arithmetic operations I expect to use only class Y but GCC prefers to
>> copy registers and do the arithmetic operations using X because for
>> some reason it determined that the prefered class for the registers in
>> the arithmetic operations is X.
>>
>> It seems that determining the class does not look at the whole flow
>> but rather looks only at insns in which the register appears.
>>
> Defining classes for pseudos is already one of the most expensive operation
> in IRA.  Looking at the flow would make it even more complicated (I even
> don't know how to use this to improve the allocation because it means live
> range splitting before coloring and before defining classes which could help
> do live range splitting reasonably taking register pressure into account).
>>
>> Do I understand the situation correctly?
>
> Yes, I guess.
>>
>> Is there something I can do about it?
>
> I'd recommend to try ira-improv branch.  I think that part of the problem is
> in usage of cover classes.  The branch removes the cover classes and permits
> IRA to use intersected register classes and that helps to assign better hard
> registers.
>
>

I tried now this branch and got better results for some cases but
still in other cases I get lots of redundent register copies.
I might be missing something from the gcc history but I wonder why do
we need to limit the coloring stage to select a hard reg from a class
that was chosen by a prior stage.
Why not simply put in the interference graph edges for all registers
which are not possible for a pseudo and let the coloring algorithm
select the best hard reg.
It seems that choosing a class for a pseudo before the coloring only
makes it impossible for the coloring to get to the best solution.

Roy.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: register allocation
  2010-12-23 16:48 ` Vladimir Makarov
@ 2010-12-23 17:22   ` Jeff Law
  2010-12-27 15:43   ` roy rosen
  1 sibling, 0 replies; 41+ messages in thread
From: Jeff Law @ 2010-12-23 17:22 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: roy rosen, gcc

On 12/23/10 09:50, Vladimir Makarov wrote:
>
> Defining classes for pseudos is already one of the most expensive 
> operation in IRA.  Looking at the flow would make it even more 
> complicated (I even don't know how to use this to improve the 
> allocation because it means live range splitting before coloring and 
> before defining classes which could help do live range splitting 
> reasonably taking register pressure into account).
I've often wondered if we could use some of the class information to 
guide range splitting.  If a pseudo has contexts where it must be in 
class A and other contexts where it could be in class B, then there may 
be a reasonable split point where we could split the pseudo so that the 
split pseudos could be allocated into A & B respectively.

I looked at this eons ago with trying to split pseudos which had to be 
allocated to a particular hard reg over a small range, but could be 
allocated in a much larger class of regs elsewhere.  It worked, but was 
unmaintainable.  The other downside is we had defined the problem so 
narrowly that while the generated code clearly looked better, the net 
effect was unmeasurable as the reloads we avoided were typically outside 
of loops.

Jeff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: register allocation
  2010-12-23  8:13 register allocation roy rosen
@ 2010-12-23 16:48 ` Vladimir Makarov
  2010-12-23 17:22   ` Jeff Law
  2010-12-27 15:43   ` roy rosen
  0 siblings, 2 replies; 41+ messages in thread
From: Vladimir Makarov @ 2010-12-23 16:48 UTC (permalink / raw)
  To: roy rosen; +Cc: gcc

On 12/23/2010 03:13 AM, roy rosen wrote:
> Hi All,
>
> I am looking at the code generated by my port and it seems that I have
> a problem that too many copies between registers are generated.
> I looked a bit at the register allocation and wanted to verify that I
> understand its behavior.
>
> Is that true that it first chooses a register class for each pseodo
> and only then starts coloring?
>
Yes, that is true.
> I think that my problem is that in my architecture there are two
> register classes which can do all arithmetic operation but class X can
> also do loads and stores and class Y can also do DSP operations.
>
> So when there are for example two DSP operations and between them some
> arithmetic operations I expect to use only class Y but GCC prefers to
> copy registers and do the arithmetic operations using X because for
> some reason it determined that the prefered class for the registers in
> the arithmetic operations is X.
>
> It seems that determining the class does not look at the whole flow
> but rather looks only at insns in which the register appears.
>
Defining classes for pseudos is already one of the most expensive 
operation in IRA.  Looking at the flow would make it even more 
complicated (I even don't know how to use this to improve the allocation 
because it means live range splitting before coloring and before 
defining classes which could help do live range splitting reasonably 
taking register pressure into account).
> Do I understand the situation correctly?
Yes, I guess.
> Is there something I can do about it?
I'd recommend to try ira-improv branch.  I think that part of the 
problem is in usage of cover classes.  The branch removes the cover 
classes and permits IRA to use intersected register classes and that 
helps to assign better hard registers.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* register allocation
@ 2010-12-23  8:13 roy rosen
  2010-12-23 16:48 ` Vladimir Makarov
  0 siblings, 1 reply; 41+ messages in thread
From: roy rosen @ 2010-12-23  8:13 UTC (permalink / raw)
  To: gcc

Hi All,

I am looking at the code generated by my port and it seems that I have
a problem that too many copies between registers are generated.
I looked a bit at the register allocation and wanted to verify that I
understand its behavior.

Is that true that it first chooses a register class for each pseodo
and only then starts coloring?

I think that my problem is that in my architecture there are two
register classes which can do all arithmetic operation but class X can
also do loads and stores and class Y can also do DSP operations.

So when there are for example two DSP operations and between them some
arithmetic operations I expect to use only class Y but GCC prefers to
copy registers and do the arithmetic operations using X because for
some reason it determined that the prefered class for the registers in
the arithmetic operations is X.

It seems that determining the class does not look at the whole flow
but rather looks only at insns in which the register appears.

Do I understand the situation correctly?
Is there something I can do about it?

Thanks, Roy.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
@ 2005-11-24 20:51 Joern RENNECKE
  0 siblings, 0 replies; 41+ messages in thread
From: Joern RENNECKE @ 2005-11-24 20:51 UTC (permalink / raw)
  To: Ian Lance Taylor, Andrew MacLeod; +Cc: gcc mailing list

In http://gcc.gnu.org/ml/gcc/2005-11/msg01163.html, Ian Lance Taylor wrote:

> Either way, register elimination can cause addresses which were valid
> to become invalid, typically because valid offsets from the frame
> pointer become invalid offsets from the stack pointer.  So that needs
> to be cleaned up somewhere.

This is not just about some requiring some cleanup somewhere.  Register
elimination and stack slot allocation determine the exact addresses that
are used, which in turn determine what reload inheritance is possible for
address reloads that are for stack slots which are close together on the
stack.  Getting this right is essential to avoid performance degradation on
some platforms.  These targets typically use LEGITIMIZE_RELOAD_ADDRESS to
split out-of-range addresses into a normal form with a base address load
and a memory access using this base with a small offset.

On the other hand, the hard register spills appear to offer a new
opportunity: we have talked about shrink-wrapping code in the past, but
have never implemented this in gcc.
I think that register saves/restores can be considered
special cases of hard register spills.  In order to do this efficiently,
there would have to be some interface with the target to exploit insn
sequences that can save/restore multiple registers more efficiently in
bulk, .e.g load/store multiple, or auto-increment use on targets that
are otherwise ACCUMULATE_OUTGOING_ARGS.  On the other hand, these
techniques can also help when we need to spill multiple hard registers
around a tight loop.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-23 14:06   ` Michael Matz
@ 2005-11-23 20:50     ` Peter Bergner
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Bergner @ 2005-11-23 20:50 UTC (permalink / raw)
  To: Michael Matz; +Cc: Andrew MacLeod, gcc mailing list

On Wed, 2005-11-23 at 15:05 +0100, Michael Matz wrote:
> > Spill Cost Engine [page(s) 26-29]:
> >     * The register allocator should not be estimating the execution
> >       frequency of a basic block as 10^nesting level.  That information
> >       should be coming from the cfg which comes from profile data or
> >       from a good static profile.  The problem with 10^loop nesting
> >       level is that we can overestimate the spill costs for some
> >       pseudos.  For example:
> >     	while (...) {
> >     	  <use of "a">
> >     	  if (...)
> >     	    <use of "b">
> >     	  else
> >     	    <use of "b"
> >     	}
> >       In the code above, "b"'s spill cost will be twice that of "a",
> >       when they really should have the same spill cost.
> 
> Nearly.  "b" _is_ more costly to spill, code size wise.  All else being 
> equal it's better to spill "a" in this case.  But the cost is of course 
> not twice as large, as you say.  I.e. I agree with you that the metric 
> should be based exclusively on the BB frequencies attached to the CFG, not 
> any nesting level.  Also like in new-ra ;)

The spill cost for a pseudo in a classic Chaitin/Briggs allocator does
not take number of spill instructions inserted into account, so "b"'s
spill cost would be twice that of "a" if we were to use 10^nesting
level.  That said, I think we're all in agreement that using basic
block frequencies from the cfg is the correct thing to do and that
taking static spill instruction counts into account is a good idea
which Andrew's proposal does by using it as a tie breaker.

I assume it goes without saying that when using -Os, spill cost will
be used as the tie breaker when two pseudos have the same static spill
instruction counts.

Peter

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-23 17:07   ` Andrew MacLeod
@ 2005-11-23 20:43     ` Ian Lance Taylor
  0 siblings, 0 replies; 41+ messages in thread
From: Ian Lance Taylor @ 2005-11-23 20:43 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc mailing list

Andrew MacLeod <amacleod@redhat.com> writes:

> > The current reload pass includes general heuristics to handle
> > reloading memory addresses.  This code knows things like "if stack
> > pointer plus displacement is not a valid memory address, try loading
> > the displacement into a register."  Many targets currently rely on
> > those heuristics to generate valid code.  I haven't been able to quite
> > pin down where this happens in your proposal.  For example, it's easy
> > for an address to use the frame pointer and be valid before reload,
> > and then for reload to eliminate the frame pointer (in fact, in your
> > scheme, what does frame pointer elimination?) and produce an offset
> > from the stack pointer which is invalid.  That is, spill code or frame
> > pointer elimination can generate invalid address, and something needs
> > to fix them up.  Where does that happen, and how?
> > 
> 
> 
> To be fair, I haven't given register elimination a lot of thought yet. I
> was presuming it could be inserted as a pass or as a separate component
> of the spiller.  Let me get back to you, I must investigate the
> issues :-). A couple of prime examples would be helpful :-)  I'll write
> up a new section for it.

The way gcc currently works, register elimination can not always be a
separate pass (naturally that might be different in your proposal).
The MIPS16, for example, requires a frame pointer if the frame size
does not fit in a signed 16-bit offset.  But the frame size can change
during the spilling process.  So you when you start spilling you might
think that you can use the stack pointer for all references to the
stack frame, and you might use the frame pointer as a general
register.  But then spilling might cause the stack frame to be 32768
bytes or larger.  So then you need to do the spilling (or, really,
global register allocation) all over again, but this time you can't
use the frame pointer.

However, in fairness, for most targets register elimination is based
on factors that don't change during the register allocation process,
like whether the function is a leaf or not.  In those cases you can
decide at the start of the spilling process exactly which registers
you can eliminate, and you could eliminate them in a separate pass.

Either way, register elimination can cause addresses which were valid
to become invalid, typically because valid offsets from the frame
pointer become invalid offsets from the stack pointer.  So that needs
to be cleaned up somewhere.  In current gcc, it's handled inside
reload, because it is a special case of the general case of pushing
constants into memory due to register constraints, or rematerializing
values from the stack or from global variables.

Waving my hands wildly, it seems to me that it's not quite enough to
have descriptions of how to spill every register class.  It seems to
me that you also need descriptions of how to handle each addressing
mode, and in particular how to convert invalid addresses generated
during the register elimination or spilling process into valid
addresses.  This is straightforward for most CPUs, but can get
moderately ugly for something like 68020 (where you have a lot of
choices) or SH (where practically everything needs to use r0).

> Scheduling should be able to use heuristics and perhaps some of the
> tools of the allocator (such as register pressure and number of register
> available) to prevent itself from being too stupid if so desired.

The way gcc currently works, it's really hard for the scheduler to
have decent, portable, register pressure heuristics, because we
haven't done instruction selection and we haven't generated spill
code.  I spent a lot of time optimizing for XScale, and having reload
insert an unexpected register spill can be a 10% performance hit.

I think that what I would like to see out of the scheduler would be
some way to say things like "if you find you have to spill at
instruction X, try moving instruction Y from before instruction X to
after instruction X and see if that gives you better register
allocation."

But this is pie in the sky stuff anyhow.

Thanks for your reply.

Ian

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-22 19:26 ` Peter Bergner
                     ` (2 preceding siblings ...)
  2005-11-23 14:06   ` Michael Matz
@ 2005-11-23 17:08   ` Andrew MacLeod
  3 siblings, 0 replies; 41+ messages in thread
From: Andrew MacLeod @ 2005-11-23 17:08 UTC (permalink / raw)
  To: Peter Bergner; +Cc: gcc mailing list

On Tue, 2005-11-22 at 13:26 -0600, Peter Bergner wrote:

> Register Coalescing [page(s) 8]:
>     * We will probably need some method for telling the coalescer
>       that a particular copy (or type of copy/copies) should either
>       be completely ignored (ie, do not coalesce it even if it looks
>       coalescable) or to think very hard before it does.  Examples of
>       copies we might want to skip would be copies inserted to satisfy
>       instruction constraints or some type of spilling/splitting code.
>       Examples of copies we might want to think about before blindly
>       coalescing might be pseudo - hard register copies, or copies
>       that would lead to an unacceptable increase in register 
>       constraints.

sure, this wouldn't be hard to do.  that could be flagged right in the
insn annotation of the copy.

> 
> Global Allocation [page(s) 10]:
>     * I'd like to keep the design flexible enough (if it's at all
>       possible) in case we want to attempt interprocedural register
>       allocation in the future.

I see no reason why that couldn't be done.

>     * I'd also like to allow Classic Chaitin and Briggs algorithms
>       which do iterative spilling (ie, color -> spill -> color ...).
>       This would be beneficial in that it gives us a well known
>       baseline to which all other algorithms/spilling heuristics can
>       be compared to.  It has the added benefit of making publishing
>       easier if your baseline allocator is already implemented.
> 

The very very first cut may well do this. It avoids writing much of
spiller and gets you going. Just spew out stores to spill :-)
Regardless, it would be trivial to do this after the fact when you
already have the pass written.

> Spiller [page(s) 10-11]:
>     * If we are doing a single pass Chaitin style allocation, then
>       I agree that the spiller would map all unallocated pseudos
>       into hardware registers.  However, if we are doing a multi
>       pass Chaitin style allocation, then the spiller should not
>       map the unallocated pseudos to hardware registers, but simply
>       insert the required spill code leaving the spilled pseudos as
>       pseudos.  The spiller should be flexible enough to do both.

That would be pretty straightforward. 

>     * I can also envision someone wanting to spill only a subset
>       of the unallocated pseudos, so we should handle that scenario.

You could easily do this by flagging pseudos you don't want spill code
generated for.  The spiller would simply ignore those so flagged. Or
vice versa.

> Insn Annotations [page(s) 17-18]:

>     * Encoding register pressure summary info as +1, -2 or 0 is fine,
>       but it is not a total solution.  In particular, it doesn't
>       handle register clobbers adequately.  An example would be the

True. we might need an additional value to represent that, or some other
mechanism. I figured we could work out those kinds of details when we
get closer to actually implementing it. The register pressure engine is
a bit further out than some of the rest of them, so I didn't put much
brainpower into it. It could evolve into something else easily. This
just seemed like a quick and dirty way to go.

>       for register classes with distinct subclasses (eg, GPRS and a
>       subset of GPRS capable of acting as index registers), would you
>       have separate counters?

I wasn't planning to track class subranges.  It might be an enhancement
someone may find interesting to try, and the intent is stated (perhaps
subtly) to make sure that adding additional register pressure values is
not difficult.  There is the expectation that at some point we may want
to track FPRs, GPRs and possibly all classes at the same time.  There is
no reason that couldn't be extended to include any particular subset you
found interesting.  This would have to be a separate value.

> 
> Spill Cost Engine [page(s) 26-29]:
>     * The register allocator should not be estimating the execution
>       frequency of a basic block as 10^nesting level.  That information
>       should be coming from the cfg which comes from profile data or
>       from a good static profile.  The problem with 10^loop nesting

Absolutely correct. That was simply thrown in as as simple starting
point.  I had forgotten that we now have static/dynamic CFG info always
available, so certainly we use those values rather than an ancient
mechanism that assumed nothing was available.  I will update the
document.

>       level is that we can overestimate the spill costs for some
>       pseudos.  For example:
>     	while (...) {
>     	  <use of "a">
>     	  if (...)
>     	    <use of "b">
>     	  else
>     	    <use of "b"
>     	}
>       In the code above, "b"'s spill cost will be twice that of "a",
>       when they really should have the same spill cost.

yes, using the CFG info they will be almost the same. B would have a
slightly higher instruction count, so if that is used to break ties, it
will be ever so slightly less favored, all other things being equal.

Thanks for the comments!

Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-20  0:37 ` Steven Bosscher
@ 2005-11-23 17:08   ` Andrew MacLeod
  0 siblings, 0 replies; 41+ messages in thread
From: Andrew MacLeod @ 2005-11-23 17:08 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc

On Sun, 2005-11-20 at 01:37 +0100, Steven Bosscher wrote:
> On Thursday 17 November 2005 17:53, Andrew MacLeod wrote:
> > http://people.redhat.com/dnovillo/rable.pdf
> 
> How are the insn annotations and caches you propose different from
> what df.c already does?
> 

I don't know, I haven't really looked at them. They may very well be
similar. I indicated we need them, and if these are suitable for use, we
can either use them or twist them to my own nefarious purposes :-). I
will likely need to add additional information at a minimum.

Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-19 19:31 ` Ian Lance Taylor
  2005-11-19 20:20   ` Denis Chertykov
  2005-11-20  0:20   ` Giovanni Bajo
@ 2005-11-23 17:07   ` Andrew MacLeod
  2005-11-23 20:43     ` Ian Lance Taylor
  2 siblings, 1 reply; 41+ messages in thread
From: Andrew MacLeod @ 2005-11-23 17:07 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc mailing list

On Fri, 2005-11-18 at 22:18 -0800, Ian Lance Taylor wrote:
> Andrew MacLeod <amacleod@redhat.com> writes:

> Secondary/tertiary reloads and reload_in/out patterns are apparently
> subsumed by the Spill Engine.  Porting a target to the new framework
> is going to require providing the Spill Engine with the appropriate
> instructions for storing and loading each native register class,
> including a description of any temporary or scratch registers
> required.  That seems reasonable.  One minor thing that I don't quite
> understand is that bit about reaching a limit of SPILLIDs.  For
> targets with limited displacement, what matters is when you start
> needing larger displacements, which as far as I can see has more to do
> with the overall stack frame size than the number of SPILLIDs.
> 

There is a correlation between the number of SPILLIDs and the amount of
stack space you are going to need. If each spill takes 4 bytes, and
there is 1000 bytes available in the limited displacement pattern
(already taken any fixed requirements out), when SPILLID reaches 250,
you either have to try to compress SPILLIDs by some mechanism (such as
colouring them), or switch to the less efficient pattern which uses
other registers.  Since displacements aren't assigned yet, the number of
SPILLIDs is the only measure there is to decide when to do this. It is
possible that you could commit these spills to memory at this point, if
that served a purpose. 

If the less efficient store doesn't actually have any different register
requirements, there is no need to go through this. The correct insn
would be chosen based on the eventually assigned displacement.  

> 
> Reload inheritance is a complex idea threaded through reload.  The
> goal is to reuse values which happen to already be in registers
> whenever possible.  In particular, to do this for reloads required for
> a single insn.  A typical simple example is 'a++' where 'a' winds up
> living on the stack at a large unrepresentable offset.  You don't want
> to wind up computing the address of 'a' twice, although that is what a
> naive spill code implementation.  That example is too simple, I
> suppose, but it has the right general flavor.
> 

As the spiller steps through the program, it tracks exactly what is in
each hardware register. It also uses lookahead to see how register are
used in the near future. When a value is loaded, every attempt is made
to reuse that value.  This is one of the reasons spilled pseudos are
simply left alone until the spiller sees them. The two uses of the
address calculation of A should already be in the same pseudo when the
spiller sees them. It will simply reuse the value if they are close to
each other and it is possible with the various other register
constraints.  

> Reload inheritance is particularly important on machines with limited
> numbers of registers and limited addressing capability--e.g., SH,
> Thumb, MIPS16.  The current reload implementation needs to do reload
> inheritance as it goes, or it will run out of spill registers in some
> cases.  That is why a reload CSE pass after reload is not enough to
> solve the problem.

Which is one of the reasons the spiller understands how to spill
hardware registers. Sometimes there is no spill register available,
and/or there is a value which is used multiple times close together,
but no register available over the range. It may be better to
temporarily spill a value already assigned to a hardware register for a
period and free up the additional register.  Thats where the lookahead
comes in real handy :-)

> The current reload pass includes general heuristics to handle
> reloading memory addresses.  This code knows things like "if stack
> pointer plus displacement is not a valid memory address, try loading
> the displacement into a register."  Many targets currently rely on
> those heuristics to generate valid code.  I haven't been able to quite
> pin down where this happens in your proposal.  For example, it's easy
> for an address to use the frame pointer and be valid before reload,
> and then for reload to eliminate the frame pointer (in fact, in your
> scheme, what does frame pointer elimination?) and produce an offset
> from the stack pointer which is invalid.  That is, spill code or frame
> pointer elimination can generate invalid address, and something needs
> to fix them up.  Where does that happen, and how?
> 

To be fair, I haven't given register elimination a lot of thought yet. I
was presuming it could be inserted as a pass or as a separate component
of the spiller.  Let me get back to you, I must investigate the
issues :-). A couple of prime examples would be helpful :-)  I'll write
up a new section for it.

> le.
> I don't want to argue that your scheme needs to solve every problem.
> But I think that any attempt to tackle the register allocator should
> think seriously about better integration with the scheduler.  I don't
> see anything about that in your proposal.

There isn't :-). I have never been a proponent of integrating register
allocation and scheduling. Quite the opposite, I prefer keeping them
separate.  They can do some limited communication, but should not be
tightly intertwined. Either should be capable of running without knowing
the other exists. 

Scheduling should be able to use heuristics and perhaps some of the
tools of the allocator (such as register pressure and number of register
available) to prevent itself from being too stupid if so desired.

Conversely, register allocation may be able to call into the scheduler
to determine if a spill or rematerialization sequence is cheaper/more
expensive in some locations than others, and that sort of thing. The
register allocator can help the scheduler by trying to spread its use of
register out a bit. Using the minimum number of register is not alway
the right thing :-) These types of things are all easy to add after the
fact without affecting the design in any way.  (I suppose anyone what
wanted to experiment with it would be able to add a scheduling pass or
scheduling aware optimization(s) to the allocator)

Yet another one of those religious subjects :-)

Thanks for the comments!
Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-22 19:26 ` Peter Bergner
  2005-11-22 21:55   ` Steven Bosscher
       [not found]   ` <200511222256.13823.>
@ 2005-11-23 14:06   ` Michael Matz
  2005-11-23 20:50     ` Peter Bergner
  2005-11-23 17:08   ` Andrew MacLeod
  3 siblings, 1 reply; 41+ messages in thread
From: Michael Matz @ 2005-11-23 14:06 UTC (permalink / raw)
  To: Peter Bergner; +Cc: Andrew MacLeod, gcc mailing list

Hi,

On Tue, 22 Nov 2005, Peter Bergner wrote:

> Spill Location Optimizer [page(s) 11]:
>     * The IBM iSeries backend has this optimization.  During spilling,
>       it inserts copies to/from spill pseudos (essentially like another
>       register class) which represent the stores/loads from the stack.
>       These spill pseudos can then be dead code eliminated, coalesced
>       and colored (using an interference graph) just like any other
>       normal pseudo.  Normal Chaitin coloring (using k = infinity) does
>       a good job of minimizing the amount of stack space used for
>       spilled pseudos.

This is btw. also done by the new-ra branch.  Instead of spilling to stack 
directly it spills to special new pseudo regs.  The obvious problem with 
that is a phase ordering problem, namely that if you only have pseudo 
stack locations (the pseudo regs in this case) you don't know the final 
insn sequence (e.g. if the final stack offset happens to be 
unrepresentable so that insns are necessary to actually construct the 
stack address for the load/store).  That's why new-ra leaves the stack 
slots as pseudos only for one round, and then assign real stack positions 
to them (and recolors the possibly new insns and affected pseudos).

> Spill Cost Engine [page(s) 26-29]:
>     * The register allocator should not be estimating the execution
>       frequency of a basic block as 10^nesting level.  That information
>       should be coming from the cfg which comes from profile data or
>       from a good static profile.  The problem with 10^loop nesting
>       level is that we can overestimate the spill costs for some
>       pseudos.  For example:
>     	while (...) {
>     	  <use of "a">
>     	  if (...)
>     	    <use of "b">
>     	  else
>     	    <use of "b"
>     	}
>       In the code above, "b"'s spill cost will be twice that of "a",
>       when they really should have the same spill cost.

Nearly.  "b" _is_ more costly to spill, code size wise.  All else being 
equal it's better to spill "a" in this case.  But the cost is of course 
not twice as large, as you say.  I.e. I agree with you that the metric 
should be based exclusively on the BB frequencies attached to the CFG, not 
any nesting level.  Also like in new-ra ;)

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
       [not found]   ` <200511222256.13823.>
@ 2005-11-22 22:58     ` Peter Bergner
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Bergner @ 2005-11-22 22:58 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc, Andrew MacLeod, Daniel Berlin

On Tue, 2005-11-22 at 22:56 +0100, an unknown sender wrote:
> On Tuesday 22 November 2005 20:26, Peter Bergner wrote:
> > Insn Annotations [page(s) 17-18]:
> >     * I like the idea of easy access to the register usage info
> >       provided by the insn annotations.  RTL isn't really setup
> >       for making that easy.
> 
> But it is if you use df.c.   Really, it is.  It is right there:
> reg-use and reg-def chains per insn, UD and DU chains, etc.

I meant to say that I like the idea of easy access to an insn's
register usage info,  Whether that comes from insn annotations
or from df.c, I personally don't care.  I just don't want to
hunt around the RTL insn for them, since that's time consuming.


Peter



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-22 19:26 ` Peter Bergner
@ 2005-11-22 21:55   ` Steven Bosscher
       [not found]   ` <200511222256.13823.>
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 41+ messages in thread
From: Steven Bosscher @ 2005-11-22 21:55 UTC (permalink / raw)
  To: gcc; +Cc: Peter Bergner, Andrew MacLeod, Daniel Berlin

On Tuesday 22 November 2005 20:26, Peter Bergner wrote:
> Insn Annotations [page(s) 17-18]:
>     * I like the idea of easy access to the register usage info
>       provided by the insn annotations.  RTL isn't really setup
>       for making that easy.

But it is if you use df.c.   Really, it is.  It is right there:
reg-use and reg-def chains per insn, UD and DU chains, etc.

> Spill Cost Engine [page(s) 26-29]:
>     * The register allocator should not be estimating the execution
>       frequency of a basic block as 10^nesting level.  That information
>       should be coming from the cfg which comes from profile data or
>       from a good static profile.

The profile information or branch predictions are available in the
CFG.  In fact, even the current Chow-like allocator uses it.  See
allocno_compare in global.c.

Gr.
Steven

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-17 16:53 Andrew MacLeod
                   ` (4 preceding siblings ...)
  2005-11-20  0:37 ` Steven Bosscher
@ 2005-11-22 19:26 ` Peter Bergner
  2005-11-22 21:55   ` Steven Bosscher
                     ` (3 more replies)
  5 siblings, 4 replies; 41+ messages in thread
From: Peter Bergner @ 2005-11-22 19:26 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc mailing list

On Thu, 2005-11-17 at 11:53 -0500, Andrew MacLeod wrote:
> I have been contemplating building a GCC register allocator from scratch
> for some time.  To that end, I have put together a bit of a document
> given a high level overview of the various components I think would
> benefit GCC, and a rough description of how they would work together.  

First off, let me join the others in thanking you for submitting
your document for review.  I'm particularly glad to see you included
early instruction selection in your design, since I view the lack of
that as one of the major problems with the current register allocator.
Let me also say that I look forward to helping out anyway I can with
this project.

Peter


Live Range Disjointer [page(s) 6]:
    * As an FYI for others, this is also know in Chaitin's paper as
      "Right Number of Names" and also as "web analysis".

Instruction Selection [page(s) 7-8]:
    * Yes!  IMHO, the lack of early instruction selection is one of
      the major problems with the current allocator.
    * Note, better instruction scheduling could make good use
      of early instruction selection too.
    * Cost model (which includes register pressure info) for determining
      which native register class to use is good.

Register Coalescing [page(s) 8]:
    * We will probably need some method for telling the coalescer
      that a particular copy (or type of copy/copies) should either
      be completely ignored (ie, do not coalesce it even if it looks
      coalescable) or to think very hard before it does.  Examples of
      copies we might want to skip would be copies inserted to satisfy
      instruction constraints or some type of spilling/splitting code.
      Examples of copies we might want to think about before blindly
      coalescing might be pseudo - hard register copies, or copies
      that would lead to an unacceptable increase in register 
      constraints.

Global Allocation [page(s) 10]:
    * I'd like to keep the design flexible enough (if it's at all
      possible) in case we want to attempt interprocedural register
      allocation in the future.
    * I'd also like to allow Classic Chaitin and Briggs algorithms
      which do iterative spilling (ie, color -> spill -> color ...).
      This would be beneficial in that it gives us a well known
      baseline to which all other algorithms/spilling heuristics can
      be compared to.  It has the added benefit of making publishing
      easier if your baseline allocator is already implemented.

Spiller [page(s) 10-11]:
    * If we are doing a single pass Chaitin style allocation, then
      I agree that the spiller would map all unallocated pseudos
      into hardware registers.  However, if we are doing a multi
      pass Chaitin style allocation, then the spiller should not
      map the unallocated pseudos to hardware registers, but simply
      insert the required spill code leaving the spilled pseudos as
      pseudos.  The spiller should be flexible enough to do both.
    * I can also envision someone wanting to spill only a subset
      of the unallocated pseudos, so we should handle that scenario.
    * Now that I think of it, Vlad's idea seems vaguely familiar to
      Load/Store Range Analysis from PLDI93.  It's been so long since
      I read the paper, so am not 100% certain.  I don't recall hearing
      of anyone actually using Load/Store Range Analysis.

Spill Location Optimizer [page(s) 11]:
    * The IBM iSeries backend has this optimization.  During spilling,
      it inserts copies to/from spill pseudos (essentially like another
      register class) which represent the stores/loads from the stack.
      These spill pseudos can then be dead code eliminated, coalesced
      and colored (using an interference graph) just like any other
      normal pseudo.  Normal Chaitin coloring (using k = infinity) does
      a good job of minimizing the amount of stack space used for
      spilled pseudos.

Reg_Info [page(s) 16-17]:
    * I have some questions about how your register group mechanism
      will work in practice, but that can probably wait for a while.

Insn Annotations [page(s) 17-18]:
    * I like the idea of easy access to the register usage info
      provided by the insn annotations.  RTL isn't really setup
      for making that easy.
    * Encoding register pressure summary info as +1, -2 or 0 is fine,
      but it is not a total solution.  In particular, it doesn't
      handle register clobbers adequately.  An example would be the
      volatile registers being clobbered on a function call.  Also,
      for register classes with distinct subclasses (eg, GPRS and a
      subset of GPRS capable of acting as index registers), would you
      have separate counters?

Spill Cost Engine [page(s) 26-29]:
    * The register allocator should not be estimating the execution
      frequency of a basic block as 10^nesting level.  That information
      should be coming from the cfg which comes from profile data or
      from a good static profile.  The problem with 10^loop nesting
      level is that we can overestimate the spill costs for some
      pseudos.  For example:
    	while (...) {
    	  <use of "a">
    	  if (...)
    	    <use of "b">
    	  else
    	    <use of "b"
    	}
      In the code above, "b"'s spill cost will be twice that of "a",
      when they really should have the same spill cost.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-17 16:53 Andrew MacLeod
                   ` (3 preceding siblings ...)
  2005-11-19 19:31 ` Ian Lance Taylor
@ 2005-11-20  0:37 ` Steven Bosscher
  2005-11-23 17:08   ` Andrew MacLeod
  2005-11-22 19:26 ` Peter Bergner
  5 siblings, 1 reply; 41+ messages in thread
From: Steven Bosscher @ 2005-11-20  0:37 UTC (permalink / raw)
  To: gcc; +Cc: Andrew MacLeod

On Thursday 17 November 2005 17:53, Andrew MacLeod wrote:
> http://people.redhat.com/dnovillo/rable.pdf

How are the insn annotations and caches you propose different from
what df.c already does?

Gr.
Steven

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-19 19:31 ` Ian Lance Taylor
  2005-11-19 20:20   ` Denis Chertykov
@ 2005-11-20  0:20   ` Giovanni Bajo
  2005-11-23 17:07   ` Andrew MacLeod
  2 siblings, 0 replies; 41+ messages in thread
From: Giovanni Bajo @ 2005-11-20  0:20 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

Ian Lance Taylor <ian@airs.com> wrote:

> Reload inheritance is a complex idea threaded through reload.

In fact, this was cleared up in the reload-branch (as documented here:
http://gcc.gnu.org/wiki/BerndSchmidt), which seems that nobody has enough skill
and time to get into a mergeable state. Now that we're entering Stage 1, it'd
be great if somebody like you could find some time to work on it :)

Giovanni Bajo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-19 19:31 ` Ian Lance Taylor
@ 2005-11-19 20:20   ` Denis Chertykov
  2005-11-20  0:20   ` Giovanni Bajo
  2005-11-23 17:07   ` Andrew MacLeod
  2 siblings, 0 replies; 41+ messages in thread
From: Denis Chertykov @ 2005-11-19 20:20 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Andrew MacLeod, gcc mailing list

Ian Lance Taylor <ian@airs.com> writes:

> The current reload pass includes general heuristics to handle
> reloading memory addresses.  This code knows things like "if stack
> pointer plus displacement is not a valid memory address, try loading
> the displacement into a register."  Many targets currently rely on
> those heuristics to generate valid code.  I haven't been able to quite
> pin down where this happens in your proposal.  For example, it's easy
> for an address to use the frame pointer and be valid before reload,
> and then for reload to eliminate the frame pointer (in fact, in your
> scheme, what does frame pointer elimination?) and produce an offset
> from the stack pointer which is invalid.  That is, spill code or frame
> pointer elimination can generate invalid address, and something needs
> to fix them up.  Where does that happen, and how?

The rable.pdf completely miss description of registers elimination.
It's important and difficult part of RA.
(The "old new-ra" also havn't eliminatin)

Denis.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-17 16:53 Andrew MacLeod
                   ` (2 preceding siblings ...)
  2005-11-18  9:53 ` Giovanni Bajo
@ 2005-11-19 19:31 ` Ian Lance Taylor
  2005-11-19 20:20   ` Denis Chertykov
                     ` (2 more replies)
  2005-11-20  0:37 ` Steven Bosscher
  2005-11-22 19:26 ` Peter Bergner
  5 siblings, 3 replies; 41+ messages in thread
From: Ian Lance Taylor @ 2005-11-19 19:31 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc mailing list

Andrew MacLeod <amacleod@redhat.com> writes:

> The document is intended as a starting point and consists mostly of my
> thoughts at the moment. By the time the underlying RTL bits are done, I
> would like it to have evolved to include input from others.  The more
> useful comments there are, the better the chance of us getting a decent
> allocator. 

Thanks for thinking about this and writing this up.

When I look at reload, there are a few things that make the code
overly complex: reload inheritance, general hueristics to handle
memory addresses which are intended to work on all machines, and
secondary/tertiary reloads and the reload_in/out patterns.  So I'd
like to quickly look at these in your framework.

Secondary/tertiary reloads and reload_in/out patterns are apparently
subsumed by the Spill Engine.  Porting a target to the new framework
is going to require providing the Spill Engine with the appropriate
instructions for storing and loading each native register class,
including a description of any temporary or scratch registers
required.  That seems reasonable.  One minor thing that I don't quite
understand is that bit about reaching a limit of SPILLIDs.  For
targets with limited displacement, what matters is when you start
needing larger displacements, which as far as I can see has more to do
with the overall stack frame size than the number of SPILLIDs.

Reload inheritance is a complex idea threaded through reload.  The
goal is to reuse values which happen to already be in registers
whenever possible.  In particular, to do this for reloads required for
a single insn.  A typical simple example is 'a++' where 'a' winds up
living on the stack at a large unrepresentable offset.  You don't want
to wind up computing the address of 'a' twice, although that is what a
naive spill code implementation.  That example is too simple, I
suppose, but it has the right general flavor.

Reload inheritance is particularly important on machines with limited
numbers of registers and limited addressing capability--e.g., SH,
Thumb, MIPS16.  The current reload implementation needs to do reload
inheritance as it goes, or it will run out of spill registers in some
cases.  That is why a reload CSE pass after reload is not enough to
solve the problem.

I would be interested in your thoughts on handling this general issue
in your framework.  One approach that seems reasonably promising to me
is to first do pseudo spill code for the instructions in a basic
block, then run CSE over the pseudo code, and only then identify real
spill registers to fit into the pseudo code.

The current reload pass includes general heuristics to handle
reloading memory addresses.  This code knows things like "if stack
pointer plus displacement is not a valid memory address, try loading
the displacement into a register."  Many targets currently rely on
those heuristics to generate valid code.  I haven't been able to quite
pin down where this happens in your proposal.  For example, it's easy
for an address to use the frame pointer and be valid before reload,
and then for reload to eliminate the frame pointer (in fact, in your
scheme, what does frame pointer elimination?) and produce an offset
from the stack pointer which is invalid.  That is, spill code or frame
pointer elimination can generate invalid address, and something needs
to fix them up.  Where does that happen, and how?

The other main thing that bugs me about the current register
allocation scheme is that there is no communication with the
scheduler.  The scheduler can easily increase register pressure so
much that the resulting code is worse than if the scheduler had never
run at all.  Naturally this happens particularly on architectures with
relatively few registers and relatively deep pipelines, like XScale.
I don't want to argue that your scheme needs to solve every problem.
But I think that any attempt to tackle the register allocator should
think seriously about better integration with the scheduler.  I don't
see anything about that in your proposal.

Ian

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-18  9:53 ` Giovanni Bajo
@ 2005-11-18 15:28   ` Andrew MacLeod
  0 siblings, 0 replies; 41+ messages in thread
From: Andrew MacLeod @ 2005-11-18 15:28 UTC (permalink / raw)
  To: Giovanni Bajo; +Cc: gcc

On Fri, 2005-11-18 at 10:53 +0100, Giovanni Bajo wrote:
> Andrew MacLeod <amacleod@redhat.com> wrote:

> 1) Do you believe there will be sub-parts of this project which could be
> carried on succesfully and efficiently by programmers without previous RTL
> experience? IIUC, the optimizers will be basically abstracted by RTL details,
> but I was thinking of something within the critical path.
> 

Some aspects require a little less intimate knowledge of RTL. I think
the insn annotation and operand summary could be done with just basic
RTL understanding, and then the live range code, if it were built on top
of that, would require very little RTL knowledge. 

The RTL library, insn selector, and insn rewriter will be the most
knowledge intensive.  The degree of knowledge required for the rest of
the critical path goes down a bit to where a more basic knowledge should
be all that is necessary. 

> 2) As for the new tables needed by the RTL library, I suppose they will be
> generated by some new gen* program. Did you consider using a scripting language
> as a fast prototype to munge .md files and generate those tables? I believe it
> would allow faster initial development and more flexibility in changes. Much
> later, it can be rewritten in C.
> 

Thats quite possible, I hadn't really thought about that too much.  I
was actually thinking about looking into commandeering one or more of
the gen programs and having them generate the tables at the same time as
what they currently generate. That might be a bad idea, as I have never
really looked at them.  There is a relationship between the assembler
table and the insn alternative table for instance, so I figured it might
be possible to enhance it for what I want instead of doing something
from scratch.

If working with an existing gen program turns out to not be a good idea,
perhaps a scripting language might be helpful, but I don't have much
experience with any of them. 

Andrew

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-17 16:53 Andrew MacLeod
  2005-11-18  2:55 ` Mark Mitchell
  2005-11-18  3:27 ` Daniel Jacobowitz
@ 2005-11-18  9:53 ` Giovanni Bajo
  2005-11-18 15:28   ` Andrew MacLeod
  2005-11-19 19:31 ` Ian Lance Taylor
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Giovanni Bajo @ 2005-11-18  9:53 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc

Andrew MacLeod <amacleod@redhat.com> wrote:

> It is my intention over the next few months to do some of the initial
> underlying infrastructure bits upon which the entire document is
> based. Presuming that proceeds OK and I can build up the data
> structures I am looking for, I'll move on from there.  If anyone
> wants to help, I'm sure there will be some juicy things to do.

1) Do you believe there will be sub-parts of this project which could be
carried on succesfully and efficiently by programmers without previous RTL
experience? IIUC, the optimizers will be basically abstracted by RTL details,
but I was thinking of something within the critical path.

2) As for the new tables needed by the RTL library, I suppose they will be
generated by some new gen* program. Did you consider using a scripting language
as a fast prototype to munge .md files and generate those tables? I believe it
would allow faster initial development and more flexibility in changes. Much
later, it can be rewritten in C.

Giovanni Bajo

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-17 16:53 Andrew MacLeod
  2005-11-18  2:55 ` Mark Mitchell
@ 2005-11-18  3:27 ` Daniel Jacobowitz
  2005-11-18  9:53 ` Giovanni Bajo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Daniel Jacobowitz @ 2005-11-18  3:27 UTC (permalink / raw)
  To: gcc

On Thu, Nov 17, 2005 at 11:53:31AM -0500, Andrew MacLeod wrote:
> The document is intended as a starting point and consists mostly of my
> thoughts at the moment. By the time the underlying RTL bits are done, I
> would like it to have evolved to include input from others.  The more
> useful comments there are, the better the chance of us getting a decent
> allocator. 

I don't have any useful comments, but I want to thank you for doing
this: it seems like a sound design, and a decent strategy to get GCC
moving in an area where it has been stuck in a ditch for a while now.

-- 
Daniel Jacobowitz
CodeSourcery, LLC

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2005-11-17 16:53 Andrew MacLeod
@ 2005-11-18  2:55 ` Mark Mitchell
  2005-11-18  3:27 ` Daniel Jacobowitz
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Mark Mitchell @ 2005-11-18  2:55 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc mailing list

Andrew MacLeod wrote:
> It must be the season for this sort of thing :-)

I have not yet read the entire document, but I would very much like to
applaud both the goal of improving register allocation, and the spirit
in which you've approached it: in particular, posting a design and
getting comments before starting implementation.

I'm glad to see that strategy taking hold as a general approach for
major development initiatives. :-)

-- 
Mark Mitchell
CodeSourcery, LLC
mark@codesourcery.com
(916) 791-8304

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Register Allocation
@ 2005-11-17 16:53 Andrew MacLeod
  2005-11-18  2:55 ` Mark Mitchell
                   ` (5 more replies)
  0 siblings, 6 replies; 41+ messages in thread
From: Andrew MacLeod @ 2005-11-17 16:53 UTC (permalink / raw)
  To: gcc mailing list

It must be the season for this sort of thing :-)

I have been contemplating building a GCC register allocator from scratch
for some time.  To that end, I have put together a bit of a document
given a high level overview of the various components I think would
benefit GCC, and a rough description of how they would work together.  

It is my intention over the next few months to do some of the initial
underlying infrastructure bits upon which the entire document is based.
Presuming that proceeds OK and I can build up the data structures I am
looking for, I'll move on from there.  If anyone wants to help, I'm sure
there will be some juicy things to do. 

Anyone who wishes to provide constructive comments about what is in the
write up, feel free to send them to me.  I also know there are other
projects ongoing which could be related to this work in some fashion.
Anyone reading this document who is involved with those projects and
sees something useful here or in those projects that could be combined
in some way is encouraged to let me know their thoughts and ideas as
well. 

The document is intended as a starting point and consists mostly of my
thoughts at the moment. By the time the underlying RTL bits are done, I
would like it to have evolved to include input from others.  The more
useful comments there are, the better the chance of us getting a decent
allocator. 

The .pdf file is currently available at:

http://people.redhat.com/dnovillo/rable.pdf

(Thanks dnovillo :-)

Andrew 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: register allocation
  2004-05-02 13:27 register allocation Qiong Cai
  2004-05-02 16:56 ` Daniel Berlin
@ 2004-05-03  7:07 ` Michael Matz
  1 sibling, 0 replies; 41+ messages in thread
From: Michael Matz @ 2004-05-03  7:07 UTC (permalink / raw)
  To: Qiong Cai; +Cc: gcc

Hi,

On Sun, 2 May 2004, Qiong Cai wrote:

> * Besides those ra*.[c,h] files, is tthere any other source files related 
> to register allocation?

Those Daniel mentioned, plus some support code in df.* for -fnew-ra.  
Strictly speaking also reload* belong to the register allocator.  Look at 
them last.

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: register allocation
  2004-05-02 13:27 register allocation Qiong Cai
@ 2004-05-02 16:56 ` Daniel Berlin
  2004-05-03  7:07 ` Michael Matz
  1 sibling, 0 replies; 41+ messages in thread
From: Daniel Berlin @ 2004-05-02 16:56 UTC (permalink / raw)
  To: Qiong Cai; +Cc: gcc


On May 2, 2004, at 9:27 AM, Qiong Cai wrote:

> Hi,
>
> I'm going to study register allocation in GCC.

What do you mean "study"?

To be honest, we already know a lot about GCC register allocation, it's 
shortcomings, etc.
This is true for both allocators.

>   Currently GCC has
> two register allocators.  Here some questions:

> * Is there any performance results(eg. spec2k results) available for
> these two allocators?
Yes, for various platforms.
Check the mailing list archives.

> * Besides those ra*.[c,h] files, is tthere any other source files 
> related
> to register allocation?
local-alloc, global-alloc (the "old" allocator)


> * Which function, if available, calcuate the register pressure,
> which is the defined as the max number of live variable for a basic 
> block?
>
None.

> Many thanks.
>
> Qiong

^ permalink raw reply	[flat|nested] 41+ messages in thread

* register allocation
@ 2004-05-02 13:27 Qiong Cai
  2004-05-02 16:56 ` Daniel Berlin
  2004-05-03  7:07 ` Michael Matz
  0 siblings, 2 replies; 41+ messages in thread
From: Qiong Cai @ 2004-05-02 13:27 UTC (permalink / raw)
  To: gcc

Hi,

I'm going to study register allocation in GCC.  Currently GCC has 
two register allocators.  Here some questions:

* Is there any performance results(eg. spec2k results) available for
these two allocators?
* Besides those ra*.[c,h] files, is tthere any other source files related 
to register allocation?
* Which function, if available, calcuate the register pressure, 
which is the defined as the max number of live variable for a basic block?

Many thanks.

Qiong

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2004-03-26 22:26   ` Andrew MacLeod
@ 2004-03-27 18:22     ` Andi Kleen
  0 siblings, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2004-03-27 18:22 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: gcc

Andrew MacLeod <amacleod@redhat.com> writes:
>
> tree-ssa will be doing this type of thing automatically shortly. Its a
> one line patch to enable it. (attached). I haven't turned it on yet
> because it can make a real mess of the debug information and we'd like
> to fix that :-).

Could you perhaps make that a command line flag so that people who
don't care about debug information can get the better code ?

-Andi


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2004-03-26 22:21 ` Vladimir N. Makarov
@ 2004-03-26 22:26   ` Andrew MacLeod
  2004-03-27 18:22     ` Andi Kleen
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew MacLeod @ 2004-03-26 22:26 UTC (permalink / raw)
  To: Vladimir N. Makarov; +Cc: John Lu, gcc mailing list

[-- Attachment #1: Type: text/plain, Size: 1287 bytes --]

On Fri, 2004-03-26 at 15:16, Vladimir N. Makarov wrote:
> John Lu wrote:
> 
> >Hi,
> >
> >I've been working on a port based on gcc-3.3 and I've noticed that
> >better assembly code is generated if the C source has
> >separate variables declared for distinct live ranges.
> >
> >  
> >
> Pesudo-register renaming is made by -fweb.  It was written by Jan 
> Hubicka.  Probably, this option should be set up by default (although 
> there is no practically improvement for SPECInt2000 for P4, some tests 
> are really faster).  Another positive thing of usage this option is 
> better scheduling because of less anti-dependences.
> 

tree-ssa will be doing this type of thing automatically shortly. Its a
one line patch to enable it. (attached). I haven't turned it on yet
because it can make a real mess of the debug information and we'd like
to fix that :-).

Andrew

PS. tree-ssa optimizes that program to:

foo1 (p1, p2)
{
  int i.15;
  int total;
  int i;

<bb 0>:
  total = 0;
  i = 0;

<L0>:;
  total = *((int *)((unsigned int)i * 4) + p1) + total;
  i = i + 1;
  if (i <= 99) goto <L0>; else goto <L12>;

<L12>:;
  i.15 = 0;

<L6>:;
  total = *((int *)((unsigned int)i.15 * 4) + p1) + total;
  i.15 = i.15 + 2;
  if (i.15 <= 99) goto <L6>; else goto <L5>;

<L5>:;
  return total;
}


[-- Attachment #2: DD --]
[-- Type: text/plain, Size: 900 bytes --]


	* tree-ssa (rewrite_out_of_ssa): Disable coalescing of variables not
	related by a copy that share the same root variable.


Index: tree-outof-ssa.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/Attic/tree-outof-ssa.c,v
retrieving revision 1.1.2.1
diff -c -p -r1.1.2.1 tree-outof-ssa.c
*** tree-outof-ssa.c	19 Mar 2004 02:07:25 -0000	1.1.2.1
--- tree-outof-ssa.c	24 Mar 2004 19:56:28 -0000
*************** rewrite_out_of_ssa (void)
*** 1970,1977 ****
  {
    var_map map;
    int var_flags = 0;
!   int ssa_flags = (SSANORM_REMOVE_ALL_PHIS | SSANORM_USE_COALESCE_LIST
! 		   | SSANORM_COALESCE_PARTITIONS);
  
    eliminate_virtual_phis ();
  
--- 1970,1976 ----
  {
    var_map map;
    int var_flags = 0;
!   int ssa_flags = (SSANORM_REMOVE_ALL_PHIS | SSANORM_USE_COALESCE_LIST);
  
    eliminate_virtual_phis ();
  

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register Allocation
  2004-03-26 22:21 Register Allocation John Lu
@ 2004-03-26 22:21 ` Vladimir N. Makarov
  2004-03-26 22:26   ` Andrew MacLeod
  0 siblings, 1 reply; 41+ messages in thread
From: Vladimir N. Makarov @ 2004-03-26 22:21 UTC (permalink / raw)
  To: John Lu; +Cc: gcc

John Lu wrote:

>Hi,
>
>I've been working on a port based on gcc-3.3 and I've noticed that
>better assembly code is generated if the C source has
>separate variables declared for distinct live ranges.
>
>  
>
Pesudo-register renaming is made by -fweb.  It was written by Jan 
Hubicka.  Probably, this option should be set up by default (although 
there is no practically improvement for SPECInt2000 for P4, some tests 
are really faster).  Another positive thing of usage this option is 
better scheduling because of less anti-dependences.

Different live ranges of a varaible sometimes has less conflicts and 
never more.  Therefore there is more probablility that varaible values 
will be in registers.  New ra works on webs (it is definitions and 
usages of pseudo-register which can have different names).  So it should 
work for new ra too.  But there are a lot factors which could prevent 
generates a better code.

Vlad

>  
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Register Allocation
@ 2004-03-26 22:21 John Lu
  2004-03-26 22:21 ` Vladimir N. Makarov
  0 siblings, 1 reply; 41+ messages in thread
From: John Lu @ 2004-03-26 22:21 UTC (permalink / raw)
  To: gcc

Hi,

I've been working on a port based on gcc-3.3 and I've noticed that
better assembly code is generated if the C source has
separate variables declared for distinct live ranges.

For example,

     int foo1(int *p1, int *p2) {
       int i;  /* one index variable for two live ranges */
       int total;

       total=0;
       for (i=0; i<100; i++) {
         total+=p1[i];
       }

       for (i=0; i<100; i+=2) {
         total+=p1[i];
       }

       return(total);
     }

produces worse code than:

     int foo2(int *p1, int *p2) {
       int i1,i2;  /* two index variables for two live ranges */
       int total;

       total=0;
       for (i1=0; i1<100; i1++) {
         total+=p1[i1];
       }

       for (i2=0; i2<100; i2+=2) {
         total+=p1[i2];
       }

       return(total);
     }

In my port, i1 is allocated a loop register which supports faster looping,
and i2 is allocated to a gpr because a decrement by two is needed.  When
only one index variable is declared, "i" is allocated to a loop register,
but this creates bad code for the second loop, since decrement by two is
not supported  for loop registers.  This happens with and without the
"-fnew-ra" option.

I was wondering if anyone else has seen this or am I doing something wrong
in my port.

Thanks,
John Lu

^ permalink raw reply	[flat|nested] 41+ messages in thread

* register Allocation
@ 2002-03-12  6:21 Danish Samad
  0 siblings, 0 replies; 41+ messages in thread
From: Danish Samad @ 2002-03-12  6:21 UTC (permalink / raw)
  To: gcc, syed_rauf_ul_hassan


Hello
 The problem I am getting is that I have a pattern for
addition of Single Integer value. When I give the half
intger values It uses subreg to load the values. and
then perform the single integer addition. The
unoptimized RTL is


(insn 30 28 31 (set (subreg:HI (reg:SI 45) 1)
        (mem/f:HI (reg:SI 43) 0)) -1 (nil)
    (nil))

(insn 33 31 34 (set (subreg:HI (reg:SI 46) 1)
        (mem/f:HI (reg:SI 44) 0)) -1 (nil)
    (nil))

(insn 34 33 36 (set (reg:SI 47)
        (plus:SI (reg:SI 45)
            (reg:SI 46))) -1 (nil)
    (nil))

After GREG the RTL becomes

(insn 30 28 31 (set (reg:HI 1 r1)
        (mem/f:HI (reg:SI 18 a2) 0)) 2 {*movhi_insn}
(nil)
    (nil))

(note 31 30 33 "" NOTE_INSN_DELETED)

(insn 33 31 34 (set (reg:HI 3 r3)
        (mem/f:HI (reg:SI 17 a1) 0)) 2 {*movhi_insn}
(nil)
    (nil))

(insn 34 33 36 (set (reg:SI 0 r0)
        (plus:SI (reg:SI 0 r0)
            (reg:SI 2 r2))) 4 {addsi3} (nil)
    (nil))

r1 should be r0 and  r3 should be r2
What could be causing this problem




__________________________________________________
Do You Yahoo!?
Try FREE Yahoo! Mail - the world's greatest free email!
http://mail.yahoo.com/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Register allocation
  1997-10-14  5:51 Register allocation Thomas Koenig
@ 1998-12-21 22:38 ` Jeffrey A Law
  0 siblings, 0 replies; 41+ messages in thread
From: Jeffrey A Law @ 1998-12-21 22:38 UTC (permalink / raw)
  To: Thomas KÃƒÂ¶nig; +Cc: egcs

  In message <199710141250.NAA03152@mvmap66.ciw.uni-karlsruhe.de>you write:
  > egcs 971008 with haifa enabled generates two unnecessary register
  > moves for the function for Linux i386-glibc1:
  > 
  > typedef struct pt
  > {
  >     int x;
  >     int y;
  >     struct pt *n;
  > } pt;
  > 
  > #define WALL(a) ((a)->n == 0)
  > #define SQR(a) ((double) (a)* (double) (a))
  > 
  > double e_point_point(pt *a, pt *b)
  > {
  >     double res;
  >     res = SQR(a->x - b->x) + SQR(a->y - b->y);
  >     if (WALL(a) || WALL(b)) {
  > 	res *= 4;
  >     }
  >     return res;
  > }
Yea.  This is is a generic problem with how we've handled 2 address machines.

In the past we relied on reload to rearrange operands so that one of the
inputs would match one of the ouputs.

To do so requires spilling and potentially useless register copies as seen
in this case.

We've got a pass (regmove) that is supposed to improve the code we generate
for 2 address machines, but it does not help this case for a couple of reasons.

  * The source/dest operands of the subtractions are not related by copy
    insns at the time regmove runs.

  * The code to copy a source operand to a destination operand does not
    handle cases where the source operand is a MEM (as happens with this
    code.

I've got a hack to regmove (which I'm going to ask Jim & Joern to look at)
which addresses both problems.

I get the following code for your testcase when using that patch.  Note there
is not a single copy insn in the resulting code *AND* the subtractions operate
on regs, not memory operands :-)



.globl e_point_point
	.type	 e_point_point,@function
e_point_point:
	pushl %ebp	# 83	movsi-2
	movl %esp,%ebp	# 85	movsi+2/1
	pushl %ebx	# 86	movsi-2
	movl 12(%ebp),%ebx	# 6	movsi+2/2
	movl 8(%ebp),%ecx	# 4	movsi+2/2
	movl (%ebx),%edx	# 15	movsi+2/2
	movl (%ecx),%eax	# 80	movsi+2/2
	subl %edx,%eax	# 17	subsi3+1/1
	movl 4(%ebx),%edx	# 30	movsi+2/2
	pushl %eax	# 18	*addsidi3_1-3
	fildl (%esp)
	addl $4,%esp
	movl 4(%ecx),%eax	# 77	movsi+2/2
	fmul %st(0),%st	# 26	ffshi_1+1/1
	subl %edx,%eax	# 32	subsi3+1/1
	pushl %eax	# 33	*addsidi3_1-3
	fildl (%esp)
	addl $4,%esp
	fmul %st(0),%st	# 41	ffshi_1+1/1
	faddp %st,%st(1)	# 42	ffshi_1+1/1
	cmpl $0,8(%ecx)	# 46	tstsi_1
	je .L4	# 47	bleu+1
	cmpl $0,8(%ebx)	# 50	tstsi_1
	jne .L3	# 51	bleu+1
.L4:
	fmull .LC0	# 59	ffshi_1+1/1
.L3:
	popl %ebx	# 88	pop
	movl %ebp,%esp	# 89	epilogue_set_stack_ptr
	popl %ebp	# 90	pop
	ret	# 91	return_internal
.Lfe1:
	.size	 e_point_point,.Lfe1-e_point_point
	.ident	"GCC: (GNU) egcs-2.92.31 19981220 (gcc2 ss-980609 experimental)"


  > 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Register allocation
@ 1997-10-14  5:51 Thomas Koenig
  1998-12-21 22:38 ` Jeffrey A Law
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Koenig @ 1997-10-14  5:51 UTC (permalink / raw)
  To: egcs

egcs 971008 with haifa enabled generates two unnecessary register
moves for the function for Linux i386-glibc1:

typedef struct pt
{
    int x;
    int y;
    struct pt *n;
} pt;

#define WALL(a) ((a)->n == 0)
#define SQR(a) ((double) (a)* (double) (a))

double e_point_point(pt *a, pt *b)
{
    double res;
    res = SQR(a->x - b->x) + SQR(a->y - b->y);
    if (WALL(a) || WALL(b)) {
	res *= 4;
    }
    return res;
}

Here's the assembly output:

	.file	"point.c"
	.version	"01.01"
/ GNU C version egcs-2.90.12 971008 (gcc2-970802 experimental) (i586-pc-linux-gnulibc1) compiled by GNU C version egcs-2.90.12 971008 (gcc2-970802 experimental).
/ options passed:  -O6 -fomit-frame-pointer -fno-exceptions
/ options enabled:  -fdefer-pop -fomit-frame-pointer -fcse-follow-jumps
/ -fcse-skip-blocks -fexpensive-optimizations -fthread-jumps
/ -fstrength-reduce -fpeephole -fforce-mem -ffunction-cse
/ -finline-functions -finline -fkeep-static-consts -fcaller-saves
/ -fpcc-struct-return -frerun-cse-after-loop -fschedule-insns2
/ -fsched-interblock -fsched-spec -fcommon -fverbose-asm -fgnu-linker
/ -fregmove -falias-check -fargument-alias -m80387 -mhard-float
/ -mno-soft-float -mieee-fp -mfp-ret-in-387 -mschedule-prologue
/ -mcpu=pentium -march=pentium

gcc2_compiled.:
.section	.rodata
	.align 4
.LC1:
	.long 0x0,0x40100000
.text
	.align 4
.globl e_point_point
	.type	 e_point_point,@function
e_point_point:
	pushl %ebx
	movl 8(%esp),%edx
	movl 12(%esp),%ecx
	movl (%edx),%ebx
	movl (%ecx),%eax
	fmul %st(0),%st
	subl %eax,%ebx
	movl %ebx,%eax
        ^^^^^^^^^^^^^^
	pushl %eax
	fildl (%esp)
	addl $4,%esp
	fmul %st(0),%st
	faddp %st,%st(1)
	cmpl $0,8(%edx)
	je .L3
	cmpl $0,8(%ecx)
	jne .L2
.L3:
	fldl .LC1
	fmulp %st,%st(1)
.L2:
	popl %ebx
	ret
.Lfe1:
	.size	 e_point_point,.Lfe1-e_point_point
	.ident	"GCC: (GNU) egcs-2.90.12 971008 (gcc2-970802 experimental)"

Both of these register moves are unnecessary, and when I replace the
first one with the more obvious

	subl %eax,%ebx
	pushl %ebx
	movl 4(%edx),%ebx
	fildl (%esp)

the resulting code is indeed faster.
-- 
Thomas Koenig, Thomas.Koenig@ciw.uni-karlsruhe.de, ig25@dkauni2.bitnet.
The joy of engineering is to find a straight line on a double
logarithmic diagram.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2011-01-11 16:11 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-09-22  1:21 Register Allocation Adrian Strätling
2004-09-22  5:22 ` tm_gccmail
2004-10-04 14:13   ` Nick Ing-Simmons
  -- strict thread matches above, loose matches on Subject: below --
2010-12-23  8:13 register allocation roy rosen
2010-12-23 16:48 ` Vladimir Makarov
2010-12-23 17:22   ` Jeff Law
2010-12-27 15:43   ` roy rosen
2011-01-03 15:41     ` Jeff Law
2011-01-05 14:44       ` roy rosen
2011-01-05 15:26         ` Jeff Law
2011-01-11 16:11         ` Vladimir Makarov
2011-01-11 15:53       ` Vladimir Makarov
2005-11-24 20:51 Register Allocation Joern RENNECKE
2005-11-17 16:53 Andrew MacLeod
2005-11-18  2:55 ` Mark Mitchell
2005-11-18  3:27 ` Daniel Jacobowitz
2005-11-18  9:53 ` Giovanni Bajo
2005-11-18 15:28   ` Andrew MacLeod
2005-11-19 19:31 ` Ian Lance Taylor
2005-11-19 20:20   ` Denis Chertykov
2005-11-20  0:20   ` Giovanni Bajo
2005-11-23 17:07   ` Andrew MacLeod
2005-11-23 20:43     ` Ian Lance Taylor
2005-11-20  0:37 ` Steven Bosscher
2005-11-23 17:08   ` Andrew MacLeod
2005-11-22 19:26 ` Peter Bergner
2005-11-22 21:55   ` Steven Bosscher
     [not found]   ` <200511222256.13823.>
2005-11-22 22:58     ` Peter Bergner
2005-11-23 14:06   ` Michael Matz
2005-11-23 20:50     ` Peter Bergner
2005-11-23 17:08   ` Andrew MacLeod
2004-05-02 13:27 register allocation Qiong Cai
2004-05-02 16:56 ` Daniel Berlin
2004-05-03  7:07 ` Michael Matz
2004-03-26 22:21 Register Allocation John Lu
2004-03-26 22:21 ` Vladimir N. Makarov
2004-03-26 22:26   ` Andrew MacLeod
2004-03-27 18:22     ` Andi Kleen
2002-03-12  6:21 register Allocation Danish Samad
1997-10-14  5:51 Register allocation Thomas Koenig
1998-12-21 22:38 ` Jeffrey A Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).