public inbox for gcc@gcc.gnu.org
* Register allocation cost question
@ 2023-10-10 15:11 Andrew Stubbs
  2023-10-10 19:09 ` Segher Boessenkool
  2023-10-11  6:54 ` Chung-Lin Tang
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Stubbs @ 2023-10-10 15:11 UTC (permalink / raw)
  To: gcc mailing list

Hi all,

I'm trying to add a new register set to the GCN port, but I've hit a 
problem I don't understand.

There are 256 new registers (each a 2048-bit vector register), but the 
register file has to be divided between all the running hardware 
threads; if you can use fewer registers you can get more parallelism, 
which means that it's important that they're allocated in order.

The problem is that they're not allocated in order. Somehow the IRA pass 
is calculating different costs for the registers within the class. It 
seems to prefer registers a32, a96, a160, and a224.

The internal regnos are 448, 512, 576 and 640. These are not random numbers! 
They are all multiples of 64: the 6 least significant bits are zero.

What could cause this? Did I overrun some magic limit? What target hook 
might I have miscoded?

I'm also seeing wrong-code bugs when I allow more than 32 new registers, 
but that might be an unrelated problem. Or the allocation is broken? I'm 
still analyzing this.

If it matters, ... the new registers can't be used for general purposes, 
so I'm trying to set them up as a temporary spill destination. This 
means they're typically not busy. It feels like it shouldn't be this 
hard... :(

Thanks in advance.

Andrew

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Register allocation cost question
  2023-10-10 15:11 Register allocation cost question Andrew Stubbs
@ 2023-10-10 19:09 ` Segher Boessenkool
  2023-10-11  8:57   ` Andrew Stubbs
  2023-10-11 14:49   ` Andrew Stubbs
  2023-10-11  6:54 ` Chung-Lin Tang
  1 sibling, 2 replies; 7+ messages in thread
From: Segher Boessenkool @ 2023-10-10 19:09 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: gcc mailing list

Hi Andrew,

On Tue, Oct 10, 2023 at 04:11:18PM +0100, Andrew Stubbs wrote:
> I'm also seeing wrong-code bugs when I allow more than 32 new registers, 
> but that might be an unrelated problem. Or the allocation is broken? I'm 
> still analyzing this.

It could be connected.  Neither thing should happen.

> If it matters, ... the new registers can't be used for general purposes, 

What does this mean?  I think you mean they *can* be used for anything,
you just don't want to (maybe it is slow)?  If you make them allocatable
registers, they *will* be allocated for anything the compiler deems a
good idea.

> so I'm trying to set them up as a temporary spill destination. This 
> means they're typically not busy. It feels like it shouldn't be this 
> hard... :(

So what did you do, put them later in the allocation order?  Make their
register_move_cost higher than for normal registers (but still below
memory_move_cost)?  Or what?  TARGET_SPILL_CLASS maybe?


Segher

* Re: Register allocation cost question
  2023-10-10 15:11 Register allocation cost question Andrew Stubbs
  2023-10-10 19:09 ` Segher Boessenkool
@ 2023-10-11  6:54 ` Chung-Lin Tang
  2023-10-11  8:58   ` Andrew Stubbs
  1 sibling, 1 reply; 7+ messages in thread
From: Chung-Lin Tang @ 2023-10-11  6:54 UTC (permalink / raw)
  To: Andrew Stubbs, gcc mailing list



On 2023/10/10 11:11 PM, Andrew Stubbs wrote:
> The problem is that they're not allocated in order. Somehow the IRA pass 
> is calculating different costs for the registers within the class. It 
> seems to prefer registers a32, a96, a160, and a224.

Have you tried experimenting with REG_ALLOC_ORDER? I see that the GCN port currently isn't using this target macro.

Chung-Lin

* Re: Register allocation cost question
  2023-10-10 19:09 ` Segher Boessenkool
@ 2023-10-11  8:57   ` Andrew Stubbs
  2023-10-11 14:49   ` Andrew Stubbs
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Stubbs @ 2023-10-11  8:57 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc mailing list



On 10/10/2023 20:09, Segher Boessenkool wrote:
> Hi Andrew,
> 
> On Tue, Oct 10, 2023 at 04:11:18PM +0100, Andrew Stubbs wrote:
>> I'm also seeing wrong-code bugs when I allow more than 32 new registers,
>> but that might be an unrelated problem. Or the allocation is broken? I'm
>> still analyzing this.
> 
> It could be connected.  Neither thing should happen.
> 
>> If it matters, ... the new registers can't be used for general purposes,
> 
> What does this mean?  I think you mean they *can* be used for anything,
> you just don't want to (maybe it is slow)?  If you make them allocatable
> registers, they *will* be allocated for anything the compiler deems a
> good idea.

Nope, the "Accelerator VGPR" registers are exclusively for the use of 
the new matrix multiply instructions that we don't support (yet).

The compiler is free to use them for storing data, but no ordinary 
instructions can operate on them.

>> so I'm trying to set them up as a temporary spill destination. This
>> means they're typically not busy. It feels like it shouldn't be this
>> hard... :(
> 
> So what did you do, put them later in the allocation order?  Make their
> register_move_cost higher than for normal registers (but still below
> memory_move_cost)?  Or what?  TARGET_SPILL_CLASS maybe?

We put them in a new register class, with a new constraint, and 
implemented the move instructions (only) with new alternatives for the 
new class. Then implemented TARGET_SPILL_CLASS in the obvious way.

All this is working just fine as long as only 32 of the new registers 
are unfixed (a0-a31); the code even runs correctly and I can see the 
spilling happening as intended.

If I enable register a32 then it prefers that, and I get wrong code. 
Using that register ought to be logically correct, albeit suboptimal, so 
I don't understand that either.

Andrew

* Re: Register allocation cost question
  2023-10-11  6:54 ` Chung-Lin Tang
@ 2023-10-11  8:58   ` Andrew Stubbs
  2023-10-11 10:56     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Stubbs @ 2023-10-11  8:58 UTC (permalink / raw)
  To: Chung-Lin Tang, gcc mailing list

On 11/10/2023 07:54, Chung-Lin Tang wrote:
> Have you tried experimenting with REG_ALLOC_ORDER? I see that the GCN port currently isn't using this target macro.

The default definition is 0, 1, 2, 3, 4, ... and is already the desired 
behaviour.

Andrew

* Re: Register allocation cost question
  2023-10-11  8:58   ` Andrew Stubbs
@ 2023-10-11 10:56     ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Earnshaw (lists) @ 2023-10-11 10:56 UTC (permalink / raw)
  To: Andrew Stubbs, Chung-Lin Tang, gcc mailing list

On 11/10/2023 09:58, Andrew Stubbs wrote:
> On 11/10/2023 07:54, Chung-Lin Tang wrote:
>> Have you tried experimenting with REG_ALLOC_ORDER? I see that the GCN port currently isn't using this target macro.
> 
> The default definition is 0, 1, 2, 3, 4, ... and is already the desired behaviour.
> 
> Andrew

You may need to define HONOR_REG_ALLOC_ORDER though.

* Re: Register allocation cost question
  2023-10-10 19:09 ` Segher Boessenkool
  2023-10-11  8:57   ` Andrew Stubbs
@ 2023-10-11 14:49   ` Andrew Stubbs
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Stubbs @ 2023-10-11 14:49 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc mailing list

On 10/10/2023 20:09, Segher Boessenkool wrote:
> Hi Andrew,
> 
> On Tue, Oct 10, 2023 at 04:11:18PM +0100, Andrew Stubbs wrote:
>> I'm also seeing wrong-code bugs when I allow more than 32 new registers,
>> but that might be an unrelated problem. Or the allocation is broken? I'm
>> still analyzing this.
> 
> It could be connected.  Neither thing should happen.

This is now confirmed to be unrelated: the instruction moving values 
from the new registers to the old ones must be followed by a no-op in 
certain instruction combinations, because GCN has only partial hardware 
dependency detection.

The register allocation is therefore valid (at least in the testcases 
I've been looking at).

The question of why it prefers registers with round numbers remains open 
(and important for optimization reasons).

Andrew

end of thread

Thread overview: 7+ messages
2023-10-10 15:11 Register allocation cost question Andrew Stubbs
2023-10-10 19:09 ` Segher Boessenkool
2023-10-11  8:57   ` Andrew Stubbs
2023-10-11 14:49   ` Andrew Stubbs
2023-10-11  6:54 ` Chung-Lin Tang
2023-10-11  8:58   ` Andrew Stubbs
2023-10-11 10:56     ` Richard Earnshaw (lists)