public inbox for gcc@gcc.gnu.org
* [MRISC32] Not getting scaled index addressing in loops
@ 2022-05-28  6:51 m
  2022-05-28  7:00 ` m
  2022-06-22 19:53 ` Andrew Pinski
  0 siblings, 2 replies; 4+ messages in thread
From: m @ 2022-05-28  6:51 UTC (permalink / raw)
  To: gcc

Hello!

I maintain a fork of GCC which adds support for my custom CPU ISA, 
MRISC32 (the machine description can be found here: 
https://github.com/mrisc32/gcc-mrisc32/tree/mbitsnbites/mrisc32/gcc/config/mrisc32 
).

I recently discovered that scaled index addressing (i.e. MEM[base + 
index * scale]) does not work inside loops, but I have not been able to 
figure out why.

I believe that I have all the plumbing in the MD that's required 
(MAX_REGS_PER_ADDRESS, REGNO_OK_FOR_BASE_P, REGNO_OK_FOR_INDEX_P, etc), 
and I have verified that scaled index addressing is used in trivial 
cases like this:

char carray[100];
short sarray[100];
int iarray[100];

void single_element(int idx, int value) {
  carray[idx] = value; // OK
  sarray[idx] = value; // OK
  iarray[idx] = value; // OK
}

...which produces the expected machine code similar to this:

    stb r2, [r3, r1]   // OK
    sth r2, [r3, r1*2] // OK
    stw r2, [r3, r1*4] // OK

However, when the array assignment happens inside a loop, only the char 
version uses index addressing. The other sizes (short and int) will be 
transformed into code where the addresses are stored in registers that 
are incremented by +2 and +4 respectively.

void loop(void) {
  for (int idx = 0; idx < 100; ++idx) {
    carray[idx] = idx; // OK
    sarray[idx] = idx; // BAD
    iarray[idx] = idx; // BAD
  }
}

...which produces:

.L4:
    sth r1, [r3]     // BAD
    stw r1, [r2]     // BAD
    stb r1, [r5, r1] // OK
    add r1, r1, #1
    sne r4, r1, #100
    add r3, r3, #2   // (BAD)
    add r2, r2, #4   // (BAD)
    bs  r4, .L4
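
For readers less used to reading the assembly, the difference between the two loop shapes can be written back in C. This is only an illustrative sketch: the function names loop_indexed and loop_pointer_bump are made up for this example, and the second function approximates what the generated loop above does rather than reproducing the compiler's exact output.

```c
char carray[100];
short sarray[100];
int iarray[100];

/* Indexed form: a single induction variable (idx). Each store address is
   base + idx * sizeof(element), which maps directly onto scaled index
   addressing such as [r3, r1*2] or [r3, r1*4]. */
static void loop_indexed(void) {
  for (int idx = 0; idx < 100; ++idx) {
    carray[idx] = (char)idx;
    sarray[idx] = (short)idx;
    iarray[idx] = idx;
  }
}

/* Pointer-bump form: roughly the shape of the generated loop. The short
   and int stores each get their own pointer, advanced by the element size
   (+2 and +4 bytes) on every iteration, while the char store still uses
   the loop counter as an index. */
static void loop_pointer_bump(void) {
  short *sp = sarray;
  int *ip = iarray;
  for (int idx = 0; idx < 100; ++idx) {
    carray[idx] = (char)idx;
    *sp++ = (short)idx;
    *ip++ = idx;
  }
}
```

Both functions store the same values; the second one simply trades the scaled address computation for two extra registers and two extra adds per iteration.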

I would expect scaled index addressing to be used in loops too, just as 
is done for AArch64 for instance. I have dug around in the machine 
description, but I can't really figure out what's wrong.

For reference, here is the same code in Compiler Explorer, including the 
code generated for AArch64 for comparison: https://godbolt.org/z/drzfjsxf7

Passing -da (dump RTL all) to gcc, I can see that the decision to not 
use index addressing has been made already in *.253r.expand.

Does anyone have any hints about what could be wrong and where I should 
start looking?

Regards,

   Marcus



* Re: [MRISC32] Not getting scaled index addressing in loops
  2022-05-28  6:51 [MRISC32] Not getting scaled index addressing in loops m
@ 2022-05-28  7:00 ` m
  2022-06-22 19:53 ` Andrew Pinski
  1 sibling, 0 replies; 4+ messages in thread
From: m @ 2022-05-28  7:00 UTC (permalink / raw)
  To: gcc

I'm sorry about the messed up code formatting (I blame the WYSIWYG). I 
hope the message gets through anyway (have a look at the Compiler 
Explorer link - https://godbolt.org/z/drzfjsxf7 - it has all the code).

/Marcus


* Re: [MRISC32] Not getting scaled index addressing in loops
  2022-05-28  6:51 [MRISC32] Not getting scaled index addressing in loops m
  2022-05-28  7:00 ` m
@ 2022-06-22 19:53 ` Andrew Pinski
  2022-06-23  6:01   ` m
  1 sibling, 1 reply; 4+ messages in thread
From: Andrew Pinski @ 2022-06-22 19:53 UTC (permalink / raw)
  To: m; +Cc: GCC Mailing List

On Fri, May 27, 2022 at 11:52 PM m <m@bitsnbites.eu> wrote:
>
> Hello!
>
> I maintain a fork of GCC which adds support for my custom CPU ISA,
> MRISC32 (the machine description can be found here:
> https://github.com/mrisc32/gcc-mrisc32/tree/mbitsnbites/mrisc32/gcc/config/mrisc32
> ).
>
> I recently discovered that scaled index addressing (i.e. MEM[base +
> index * scale]) does not work inside loops, but I have not been able to
> figure out why.
>
> I believe that I have all the plumbing in the MD that's required
> (MAX_REGS_PER_ADDRESS, REGNO_OK_FOR_BASE_P, REGNO_OK_FOR_INDEX_P, etc),
> and I have verified that scaled index addressing is used in trivial
> cases like this:
>
> char carray[100];
> short sarray[100];
> int iarray[100];
>
> void single_element(int idx, int value) {
>   carray[idx] = value; // OK
>   sarray[idx] = value; // OK
>   iarray[idx] = value; // OK
> }
>
> ...which produces the expected machine code similar to this:
>
>     stb r2, [r3, r1]   // OK
>     sth r2, [r3, r1*2] // OK
>     stw r2, [r3, r1*4] // OK
>
> However, when the array assignment happens inside a loop, only the char
> version uses index addressing. The other sizes (short and int) will be
> transformed into code where the addresses are stored in registers that
> are incremented by +2 and +4 respectively.
>
> void loop(void) {
>   for (int idx = 0; idx < 100; ++idx) {
>     carray[idx] = idx; // OK
>     sarray[idx] = idx; // BAD
>     iarray[idx] = idx; // BAD
>   }
> }
>
> ...which produces:
>
> .L4:
>     sth r1, [r3]     // BAD
>     stw r1, [r2]     // BAD
>     stb r1, [r5, r1] // OK
>     add r1, r1, #1
>     sne r4, r1, #100
>     add r3, r3, #2   // (BAD)
>     add r2, r2, #4   // (BAD)
>     bs  r4, .L4
>
> I would expect scaled index addressing to be used in loops too, just as
> is done for AArch64 for instance. I have dug around in the machine
> description, but I can't really figure out what's wrong.
>
> For reference, here is the same code in Compiler Explorer, including the
> code generated for AArch64 for comparison: https://godbolt.org/z/drzfjsxf7
>
> Passing -da (dump RTL all) to gcc, I can see that the decision to not
> use index addressing has been made already in *.253r.expand.

The problem is that your cost model for the indexing is incorrect; IV-OPTs
uses TARGET_ADDRESS_COST to figure out the cost of each case.
So if you don't have that hook implemented, the default one is used,
and that will be incorrect in many cases.
You can inspect the IV-OPTs costs and decisions by using the ivopts dump:
-fdump-tree-ivopts-details
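
As a rough illustration of why the address cost matters here, the trade-off can be pictured with a toy model. This is emphatically not GCC's actual IV-OPTs algorithm: the struct, the function names and the cost formulas below are all assumptions invented for this sketch. It only shows how an overly expensive scaled-index address can tip the choice toward one pointer per array.

```c
#include <stdbool.h>

/* Toy model only: NOT GCC's real IV-OPTs cost computation. */
struct toy_costs {
  int plain_addr;  /* assumed cost of an address like [reg]             */
  int scaled_addr; /* assumed cost of an address like [reg, reg*scale]  */
  int iv_step;     /* assumed cost of one "add" per induction variable  */
};

/* Strategy A: one index IV shared by all stores, each store using a
   scaled-index address. */
static int cost_shared_index(const struct toy_costs *c, int n_stores) {
  return n_stores * c->scaled_addr + c->iv_step;
}

/* Strategy B: one pointer IV per store, plus the loop counter itself. */
static int cost_pointer_per_store(const struct toy_costs *c, int n_stores) {
  return n_stores * c->plain_addr + (n_stores + 1) * c->iv_step;
}

/* True when the shared-index strategy is at least as cheap. */
static bool prefers_scaled_index(const struct toy_costs *c, int n_stores) {
  return cost_shared_index(c, n_stores) <= cost_pointer_per_store(c, n_stores);
}
```

In this model, if scaled addresses are reported as cheap the shared index wins, and if they are reported as much more expensive than a plain register address the per-pointer form wins, which matches the shape of the loop code in the original report.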

Thanks,
Andrew Pinski


>
> Does anyone have any hints about what could be wrong and where I should
> start looking?
>
> Regards,
>
>    Marcus
>


* Re: [MRISC32] Not getting scaled index addressing in loops
  2022-06-22 19:53 ` Andrew Pinski
@ 2022-06-23  6:01   ` m
  0 siblings, 0 replies; 4+ messages in thread
From: m @ 2022-06-23  6:01 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: GCC Mailing List



On 2022-06-22, Andrew Pinski wrote:
> On Fri, May 27, 2022 at 11:52 PM m <m@bitsnbites.eu> wrote:
>> Hello!
>>
>> I maintain a fork of GCC which adds support for my custom CPU ISA,
>> MRISC32 (the machine description can be found here:
>> https://github.com/mrisc32/gcc-mrisc32/tree/mbitsnbites/mrisc32/gcc/config/mrisc32
>> ).
>>
>> I recently discovered that scaled index addressing (i.e. MEM[base +
>> index * scale]) does not work inside loops, but I have not been able to
>> figure out why.
>>
>> I believe that I have all the plumbing in the MD that's required
>> (MAX_REGS_PER_ADDRESS, REGNO_OK_FOR_BASE_P, REGNO_OK_FOR_INDEX_P, etc),
>> and I have verified that scaled index addressing is used in trivial
>> cases like this:
>>
>> char carray[100];
>> short sarray[100];
>> int iarray[100];
>>
>> void single_element(int idx, int value) {
>>   carray[idx] = value; // OK
>>   sarray[idx] = value; // OK
>>   iarray[idx] = value; // OK
>> }
>>
>> ...which produces the expected machine code similar to this:
>>
>>     stb r2, [r3, r1]   // OK
>>     sth r2, [r3, r1*2] // OK
>>     stw r2, [r3, r1*4] // OK
>>
>> However, when the array assignment happens inside a loop, only the char
>> version uses index addressing. The other sizes (short and int) will be
>> transformed into code where the addresses are stored in registers that
>> are incremented by +2 and +4 respectively.
>>
>> void loop(void) {
>>   for (int idx = 0; idx < 100; ++idx) {
>>     carray[idx] = idx; // OK
>>     sarray[idx] = idx; // BAD
>>     iarray[idx] = idx; // BAD
>>   }
>> }
>>
>> ...which produces:
>>
>> .L4:
>>     sth r1, [r3]     // BAD
>>     stw r1, [r2]     // BAD
>>     stb r1, [r5, r1] // OK
>>     add r1, r1, #1
>>     sne r4, r1, #100
>>     add r3, r3, #2   // (BAD)
>>     add r2, r2, #4   // (BAD)
>>     bs  r4, .L4
>>
>> I would expect scaled index addressing to be used in loops too, just as
>> is done for AArch64 for instance. I have dug around in the machine
>> description, but I can't really figure out what's wrong.
>>
>> For reference, here is the same code in Compiler Explorer, including the
>> code generated for AArch64 for comparison: https://godbolt.org/z/drzfjsxf7
>>
>> Passing -da (dump RTL all) to gcc, I can see that the decision to not
>> use index addressing has been made already in *.253r.expand.
> The problem is that your cost model for the indexing is incorrect; IV-OPTs
> uses TARGET_ADDRESS_COST to figure out the cost of each case.
> So if you don't have that hook implemented, the default one is used,
> and that will be incorrect in many cases.
> You can inspect the IV-OPTs costs and decisions by using the ivopts dump:
> -fdump-tree-ivopts-details
>
> Thanks,
> Andrew Pinski

Thank you Andrew!

I added a TARGET_ADDRESS_COST implementation that just returns zero,
as a test, and sure enough scaled indexed addressing was used.

Now I will just have to figure out a more accurate implementation for
my architecture.
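
For anyone following along, a test hook of that kind might look roughly like the sketch below. It is not the actual code from the MRISC32 port: the function name mrisc32_address_cost is hypothetical, and the three typedefs are stand-ins added only so the fragment compiles outside the GCC tree (in a real backend those types come from GCC's own headers and must not be redefined). The hook signature follows the GCC internals documentation for TARGET_ADDRESS_COST.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types so this sketch builds on its own. Inside GCC these are
   provided by the compiler's headers. */
typedef struct rtx_def *rtx;
typedef int machine_mode;
typedef unsigned char addr_space_t;

/* Experimental hook (hypothetical name): report every legitimate address
   as free, so scaled-index forms are never penalized relative to a plain
   register address. Useful as a test, not as a tuned cost model. */
static int
mrisc32_address_cost (rtx addr, machine_mode mode, addr_space_t as, bool speed)
{
  (void) addr;
  (void) mode;
  (void) as;
  (void) speed;
  return 0;
}

/* In the backend source file, the hook is registered by overriding the
   target macro before the target structure is initialized. */
#undef TARGET_ADDRESS_COST
#define TARGET_ADDRESS_COST mrisc32_address_cost
```

A more accurate version would presumably inspect the address RTX and return different costs for plain, offset and scaled-index forms, and possibly distinguish the speed and size cases.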

Regards,

   Marcus

>
>> Does anyone have any hints about what could be wrong and where I should
>> start looking?
>>
>> Regards,
>>
>>     Marcus
>>

