* Move insn out of the way
@ 2011-08-10 11:20 Paulo J. Matos
2011-08-10 11:40 ` Richard Guenther
0 siblings, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 11:20 UTC (permalink / raw)
To: gcc
Hi,
I am having a size optimisation issue with GCC 4.6.1.
The problem boils down to the fact that I have no idea of the best way
to hint to GCC that a given insn would make more sense someplace else.
The C code is simple (x is a global):
uint32_t x;
int16_t mask(uint32_t a)
{
    return (x & a) == a;
}
int16_t is QImode and uint32_t is HImode.
After combine the insn chain (which is unmodified all the way to ira) is
(in simplified form):
regQI 27 <- regQI AH [a]
regQI 28 <- regQI AL [a+1]
regQI AL <- andQI(regQI 28, memQI(symbolrefQI(x) + 1))
regQI AH <- andQI(regQI 27, memQI(symbolrefQI(x)))
regQI 30 <- regQI AL
regQI 29 <- regQI AH
regQI 24 <- 1
if regQI 29 != regQI 27
goto labelref 20
if regQI 30 != regQI 28
goto labelref 20
goto labelref 22
labelref 20
regQI 24 <- 0
labelref 22
regQI AL <- regQI 24
The problem resides in `regQI 24 <- 1' being before the jumps.
Since regQI 24 is going to AL, IRA decides to allocate regQI 24 to AL,
which creates loads of conflicts and reloads. If that same insn were
moved to after the jumps and before the `goto labelref 22', all would
be fine, because by then regs 27, 28, 29 and 30 are dead.
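For reference, the placement described above would look like this in the
same simplified form (the `regQI 24 <- 1' moved down past the compares):

regQI 30 <- regQI AL
regQI 29 <- regQI AH
if regQI 29 != regQI 27
goto labelref 20
if regQI 30 != regQI 28
goto labelref 20
regQI 24 <- 1
goto labelref 22
labelref 20
regQI 24 <- 0
labelref 22
regQI AL <- regQI 24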
It's obviously hard to point to a solution, but I was wondering whether
there's a way to hint to GCC that moving an insn might improve the code
it issues, or whether I should look into why an existing pass is not
already doing that.
Cheers,
--
PMatos
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Move insn out of the way
2011-08-10 11:20 Move insn out of the way Paulo J. Matos
@ 2011-08-10 11:40 ` Richard Guenther
2011-08-10 11:42 ` Richard Guenther
2011-08-10 13:46 ` Paulo J. Matos
0 siblings, 2 replies; 15+ messages in thread
From: Richard Guenther @ 2011-08-10 11:40 UTC (permalink / raw)
To: Paulo J. Matos; +Cc: gcc
On Wed, Aug 10, 2011 at 12:29 PM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
> Hi,
>
> I am having a size optimisation issue with GCC-4.6.1.
> The problem boils down to the fact that I have no idea on the best way to
> hint to GCC that a given insn would make more sense someplace else.
>
> The C code is simple:
> int16_t mask(uint32_t a)
> {
> return (x & a) == a;
> }
>
> int16_t is QImode and uint32_t is HImode.
> After combine the insn chain (which is unmodified all the way to ira) is (in
> simplified form):
> regQI 27 <- regQI AH [a]
> regQI 28 <- regQI AL [a+1]
> regQI AL <- andQI(regQI 28, memQI(symbolrefQI(x) + 1))
> regQI AH <- andQI(regQI 27, memQI(symbolrefQI(x))
> regQI 30 <- regQI AL
> regQI 29 <- regQI AH
> regQI 24 <- 1
> if regQI 29 != regQI 27
> goto labelref 20
> if regQI 30 != regQI 28
> goto labelref 20
> goto labelref 22
> labelref 20
> regQI 24 <- 0
> labelref 22
> regQI AL <- regQI 24
>
> The problem resides in `regQI 24 <- 1' being before the jumps.
> Since regQI 24 is going to AL, IRA decides to allocate regQI 24 to AL, which
> creates loads of conflicts and reloads. If that same insn would be moved to
> after the jumps and before the `goto labelref 22' then all would be fine
> cause by then regs 27, 28, 29, 30 are dead.
>
> It's obviously hard to point to a solution but I was wondering if there's a
> way to hint to GCC that moving an insn might help the code issue. Or if I
> should look into a why an existing pass is not already doing that.
On x86 we expand the code to (((xl & al) ^ al) | ((xh & ah) ^ ah)) == 0,
which is then if-converted. Modified testcase:
long long x;
_Bool __attribute__((regparm(2))) mask (long long a)
{
return (x & a) == a;
}
on i?86 gets you
mask:
.LFB0:
.cfi_startproc
pushl %ebx
.cfi_def_cfa_offset 8
.cfi_offset 3, -8
movl %eax, %ebx
andl x, %ebx
movl %edx, %ecx
andl x+4, %ecx
xorl %ebx, %eax
xorl %ecx, %edx
orl %edx, %eax
sete %al
popl %ebx
.cfi_restore 3
.cfi_def_cfa_offset 4
ret
so I wonder if you should investigate why the xor variant doesn't trigger
for you? On i?86 if-conversion probably solves your specific issue,
but I guess the initial expansion is where you could improve placement
of the 1 (after all, the 0 is after the jumps).
Richard.
> Cheers,
>
> --
> PMatos
>
>
* Re: Move insn out of the way
2011-08-10 11:40 ` Richard Guenther
@ 2011-08-10 11:42 ` Richard Guenther
2011-08-10 13:55 ` Paulo J. Matos
2011-08-10 13:46 ` Paulo J. Matos
1 sibling, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-10 11:42 UTC (permalink / raw)
To: Paulo J. Matos; +Cc: gcc, Vladimir N. Makarov
On Wed, Aug 10, 2011 at 1:40 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Aug 10, 2011 at 12:29 PM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
>> Hi,
>>
>> I am having a size optimisation issue with GCC-4.6.1.
>> The problem boils down to the fact that I have no idea on the best way to
>> hint to GCC that a given insn would make more sense someplace else.
>>
>> The C code is simple:
>> int16_t mask(uint32_t a)
>> {
>> return (x & a) == a;
>> }
>>
>> int16_t is QImode and uint32_t is HImode.
>> After combine the insn chain (which is unmodified all the way to ira) is (in
>> simplified form):
>> regQI 27 <- regQI AH [a]
>> regQI 28 <- regQI AL [a+1]
>> regQI AL <- andQI(regQI 28, memQI(symbolrefQI(x) + 1))
>> regQI AH <- andQI(regQI 27, memQI(symbolrefQI(x))
>> regQI 30 <- regQI AL
>> regQI 29 <- regQI AH
>> regQI 24 <- 1
>> if regQI 29 != regQI 27
>> goto labelref 20
>> if regQI 30 != regQI 28
>> goto labelref 20
>> goto labelref 22
>> labelref 20
>> regQI 24 <- 0
>> labelref 22
>> regQI AL <- regQI 24
>>
>> The problem resides in `regQI 24 <- 1' being before the jumps.
>> Since regQI 24 is going to AL, IRA decides to allocate regQI 24 to AL, which
>> creates loads of conflicts and reloads. If that same insn would be moved to
>> after the jumps and before the `goto labelref 22' then all would be fine
>> cause by then regs 27, 28, 29, 30 are dead.
>>
>> It's obviously hard to point to a solution but I was wondering if there's a
>> way to hint to GCC that moving an insn might help the code issue. Or if I
>> should look into a why an existing pass is not already doing that.
>
> On x86 we expand the code to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0
> which is then if-converted. Modified testcase:
>
> long long x;
> _Bool __attribute__((regparm(2))) mask (long long a)
> {
> return (x & a) == a;
> }
>
> on i?86 gets you
>
> mask:
> .LFB0:
> .cfi_startproc
> pushl %ebx
> .cfi_def_cfa_offset 8
> .cfi_offset 3, -8
> movl %eax, %ebx
> andl x, %ebx
> movl %edx, %ecx
> andl x+4, %ecx
> xorl %ebx, %eax
> xorl %ecx, %edx
> orl %edx, %eax
> sete %al
> popl %ebx
> .cfi_restore 3
> .cfi_def_cfa_offset 4
> ret
>
> so I wonder if you should investigate why the xor variant doesn't trigger
> for you? On i?86 if-conversion probably solves your specific issue,
> but I guess the initial expansion is where you could improve placement
> of the 1 (after all, the 0 is after the jumps).
Oh, and I wonder if/why IRA can/does not rematerialize the constant
instead of spilling it. Might be a cost issue that it doesn't delay
allocating a reg for 1 as that is cheap to reload (is it?).
Richard.
> Richard.
>
>> Cheers,
>>
>> --
>> PMatos
>>
>>
>
* Re: Move insn out of the way
2011-08-10 11:40 ` Richard Guenther
2011-08-10 11:42 ` Richard Guenther
@ 2011-08-10 13:46 ` Paulo J. Matos
2011-08-10 13:51 ` Richard Guenther
1 sibling, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 13:46 UTC (permalink / raw)
To: gcc
On 10/08/11 12:40, Richard Guenther wrote:
>
> On x86 we expand the code to ((xl& al) ^ al) | ((xh& ah) ^ ah) == 0
> which is then if-converted. Modified testcase:
>
> long long x;
> _Bool __attribute__((regparm(2))) mask (long long a)
> {
> return (x& a) == a;
> }
>
> on i?86 gets you
>
> mask:
> .LFB0:
> .cfi_startproc
> pushl %ebx
> .cfi_def_cfa_offset 8
> .cfi_offset 3, -8
> movl %eax, %ebx
> andl x, %ebx
> movl %edx, %ecx
> andl x+4, %ecx
> xorl %ebx, %eax
> xorl %ecx, %edx
> orl %edx, %eax
> sete %al
> popl %ebx
> .cfi_restore 3
> .cfi_def_cfa_offset 4
> ret
>
> so I wonder if you should investigate why the xor variant doesn't trigger
> for you?
I can reproduce this result in GCC 4.6.1 for x86, but I can't quite
understand what you mean. From inspecting the dump files it seems that
the if-conversion is done manually at expand time: the final dump before
expand still shows the original (x & a) == a, yet after expand the RTL
already contains xor, ior, etc. So I guess I would need to do something
similar in my backend. I can't, however, find where in i386.md/i386.c
this is actually happening.
> On i?86 if-conversion probably solves your specific issue,
> but I guess the initial expansion is where you could improve placement
> of the 1 (after all, the 0 is after the jumps).
>
This is happening in my own backend, so I guess anything implemented to
do this if-conversion for i386 also needs to be implemented in my
backend. Can you point me to the i386 code so I can take a look at it?
Cheers,
--
PMatos
* Re: Move insn out of the way
2011-08-10 13:46 ` Paulo J. Matos
@ 2011-08-10 13:51 ` Richard Guenther
2011-08-10 14:14 ` Paulo J. Matos
0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-10 13:51 UTC (permalink / raw)
To: Paulo J. Matos; +Cc: gcc
On Wed, Aug 10, 2011 at 3:46 PM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
> On 10/08/11 12:40, Richard Guenther wrote:
>>
>> On x86 we expand the code to ((xl& al) ^ al) | ((xh& ah) ^ ah) == 0
>> which is then if-converted. Modified testcase:
>>
>> long long x;
>> _Bool __attribute__((regparm(2))) mask (long long a)
>> {
>> return (x& a) == a;
>> }
>>
>> on i?86 gets you
>>
>> mask:
>> .LFB0:
>> .cfi_startproc
>> pushl %ebx
>> .cfi_def_cfa_offset 8
>> .cfi_offset 3, -8
>> movl %eax, %ebx
>> andl x, %ebx
>> movl %edx, %ecx
>> andl x+4, %ecx
>> xorl %ebx, %eax
>> xorl %ecx, %edx
>> orl %edx, %eax
>> sete %al
>> popl %ebx
>> .cfi_restore 3
>> .cfi_def_cfa_offset 4
>> ret
>>
>> so I wonder if you should investigate why the xor variant doesn't trigger
>> for you?
>
> I can reproduce this result in GCC 4.6.1 for x86.
> I can't understand what you mean by this though. From inspecting the logs it
> seems that the if-conversion is done manually at expand time. The final pass
> before expand shows the original (x & a) == a, however, after expand the rtl
> already contains xor, ior, etc. So I guess I would need to do something
> similar in my backend. I can't however, find in the i386(.md|.c) where this
> is actually happening.
>
>> On i?86 if-conversion probably solves your specific issue,
>> but I guess the initial expansion is where you could improve placement
>> of the 1 (after all, the 0 is after the jumps).
>>
>
> This is happening on my own backend so I guess anything that is implemented
> to do if-conversion on i386 needs to be implemented also on my backend. Can
> you point me to the code on i386 so I can take a look at it?
I think it's all happening in generic code via do_store_flag.
Richard.
> Cheers,
>
> --
> PMatos
>
>
* Re: Move insn out of the way
2011-08-10 11:42 ` Richard Guenther
@ 2011-08-10 13:55 ` Paulo J. Matos
[not found] ` <4E431BD8.8060705@redhat.com>
0 siblings, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 13:55 UTC (permalink / raw)
To: gcc
On 10/08/11 12:42, Richard Guenther wrote:
>
> Oh, and I wonder if/why IRA can/does not rematerialize the constant
> instead of spilling it. Might be a cost issue that it doesn't delay
> allocating a reg for 1 as that is cheap to reload (is it?).
>
I would indeed expect IRA to move the constant assignment. However it
doesn't. The cost of a constant as per RTX_COSTS is 1 since it takes
exactly one instruction to actually do that (optimizing for size).
--
PMatos
* Re: Move insn out of the way
2011-08-10 13:51 ` Richard Guenther
@ 2011-08-10 14:14 ` Paulo J. Matos
0 siblings, 0 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 14:14 UTC (permalink / raw)
To: gcc
On 10/08/11 14:51, Richard Guenther wrote:
>
> I think it's all happening in generic code via do_store_flag.
>
Ah, now I understand your previous question. I wonder if it's not
triggered because I don't have cstore<mode>4 defined. It might be that,
but I have to look deeper.
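For reference, a cstorehi4 expander in a machine description would look
roughly like the following. This is only a hypothetical skeleton, not
taken from any real port; what matters is the operand order the middle
end expects from cstore<mode>4: result, comparison operator, then the
two compared values.

(define_expand "cstorehi4"
  [(set (match_operand:QI 0 "register_operand")
        (match_operator:QI 1 "ordered_comparison_operator"
          [(match_operand:HI 2 "register_operand")
           (match_operand:HI 3 "general_operand")]))]
  ""
  "/* emit the target's compare and store-flag sequence here */")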
--
PMatos
* Re: Move insn out of the way
[not found] ` <4E431BD8.8060705@redhat.com>
@ 2011-08-11 8:12 ` Paulo J. Matos
2011-08-11 8:49 ` Richard Guenther
2011-08-11 12:22 ` Paulo J. Matos
1 sibling, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-11 8:12 UTC (permalink / raw)
To: Vladimir Makarov; +Cc: gcc, Richard Guenther
On Thu, Aug 11, 2011 at 1:01 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> I can not reproduce the problem. It would be nice to give all info (the
> code without includes and all options). In this case I could have more info
> to say more definitely about the reason of the problem in IRA.
>
One of the issues with these problems of mine is that they are tied to
my backend, but not always. I think I managed to reproduce a similar
result in the avr backend using GCC 4.6.1.
test.c:
long long x;
_Bool mask (long long a)
{
return (x & a) == a;
}
$ avr-cc1 -Os test.c
This generates the following assembler:
mask:
push r13
push r14
push r15
push r16
push r17
/* prologue: function */
/* frame size = 0 */
/* stack size = 5 */
.L__stack_usage = 5
lds r14,x
and r14,r18
lds r15,x+1
and r15,r19
lds r16,x+2
and r16,r20
lds r17,x+3
and r17,r21
lds r27,x+4
and r27,r22
lds r26,x+5
and r26,r23
lds r31,x+6
and r31,r24
lds r30,x+7
and r30,r25
clr r13
inc r13
cp r14,r18
brne .L3
cp r15,r19
brne .L3
cp r16,r20
brne .L3
cp r17,r21
brne .L3
cp r27,r22
brne .L3
cp r26,r23
brne .L3
cp r31,r24
brne .L3
cpse r30,r25
.L3:
clr r13
.L2:
mov r24,r13
/* epilogue start */
pop r17
pop r16
pop r15
pop r14
pop r13
ret
.size mask, .-mask
.comm x,8,1
I can't tell how good or bad this assembler is, but I note a couple of
similarities with my backend's assembler output:
- It doesn't do the if-conversion Richard suggested, so (x & a) == a
is not converted to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0.
- The assignment of 1 to r13 is done as 'clr r13; inc r13' _before_ the
jumps; the only assignment to r13 after the jumps is 'clr r13' to set up
the return value, as in my case.
I am not sure whether this situation causes a lot of register pressure;
I think it doesn't on avr, but it does in my backend. AVR has 32
registers to play with, while mine can only deal with 3 in the
destination operand position.
--
PMatos
* Re: Move insn out of the way
2011-08-11 8:12 ` Paulo J. Matos
@ 2011-08-11 8:49 ` Richard Guenther
2011-08-11 14:27 ` Vladimir Makarov
0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-11 8:49 UTC (permalink / raw)
To: Paulo J. Matos; +Cc: Vladimir Makarov, gcc
On Thu, Aug 11, 2011 at 10:11 AM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
> On Thu, Aug 11, 2011 at 1:01 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> I can not reproduce the problem. It would be nice to give all info (the
>> code without includes and all options). In this case I could have more info
>> to say more definitely about the reason of the problem in IRA.
>>
>
> One of the issue with these problems of mine is that they are tied to
> my backend, but not always. I think I managed to reproduce a similar
> result in the avr backend using GCC4.6.1
>
> test.c:
> long long x;
> _Bool mask (long long a)
> {
> return (x & a) == a;
> }
>
> $ avr-cc1 -Os test.c
>
> This generates the following assembler:
> mask:
> push r13
> push r14
> push r15
> push r16
> push r17
> /* prologue: function */
> /* frame size = 0 */
> /* stack size = 5 */
> .L__stack_usage = 5
> lds r14,x
> and r14,r18
> lds r15,x+1
> and r15,r19
> lds r16,x+2
> and r16,r20
> lds r17,x+3
> and r17,r21
> lds r27,x+4
> and r27,r22
> lds r26,x+5
> and r26,r23
> lds r31,x+6
> and r31,r24
> lds r30,x+7
> and r30,r25
> clr r13
> inc r13
> cp r14,r18
> brne .L3
> cp r15,r19
> brne .L3
> cp r16,r20
> brne .L3
> cp r17,r21
> brne .L3
> cp r27,r22
> brne .L3
> cp r26,r23
> brne .L3
> cp r31,r24
> brne .L3
> cpse r30,r25
> .L3:
> clr r13
> .L2:
> mov r24,r13
> /* epilogue start */
> pop r17
> pop r16
> pop r15
> pop r14
> pop r13
> ret
> .size mask, .-mask
> .comm x,8,1
>
>
> I can't tell how good or bad this assembler is but I note a couple of
> similarities with my backends assembler output:
> - It doesn't do if-conversion like Richard suggested. So (x & a) == a
> is not converted to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0.
> - The assignment of r13 to 1 is done as 'clr r13; inc r13' _before_ the jumps.
>
> The only assignment to r13 is as in my case after the jumps as 'clr
> 13' to set up the return value. I am not sure if this situation causes
> a lot of register pressure, however I think it doesn't in avr but it
> does in my backend. AVR has 32 registers to play with, mine can only
> deal with 3 in the destination operand position.
What I was expecting IRA to do is
1) split live-range at kills, thus if a constant is assigned to a pseudo
then the constant has its own live-range
2) pseudos that are equal to a constant are assigned hard registers
last if re-materializing them during reload is cheaper than spilling them
I suspect that 1) is not happening; I hope that 2) happens already.
Correct?
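Schematically, 1) means that after splitting at a kill the constant gets
a pseudo of its own (my schematic, not the thread's exact RTL):

before splitting:            after splitting:
  r24 <- 1                     r25 <- 1
  use of r24                   use of r25
  r24 <- 0   (kills r24)       r26 <- 0
  use of r24                   use of r26

With the narrower ranges, the pseudo holding the constant 1 no longer
conflicts across the whole block, so for 2) it can be assigned late or
rematerialized rather than spilled.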
Richard.
> --
> PMatos
>
* Re: Move insn out of the way
[not found] ` <4E431BD8.8060705@redhat.com>
2011-08-11 8:12 ` Paulo J. Matos
@ 2011-08-11 12:22 ` Paulo J. Matos
1 sibling, 0 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-11 12:22 UTC (permalink / raw)
To: Vladimir Makarov; +Cc: gcc, Richard Guenther
On Thu, Aug 11, 2011 at 1:01 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> I can not reproduce the problem. It would be nice to give all info (the
> code without includes and all options). In this case I could have more info
> to say more definitely about the reason of the problem in IRA.
>
Let me add another example using the avr backend that produces really
strange code. The code has a similar nature:
_Bool simple(unsigned long *a, unsigned long *b) { return *a == *b; }
Generates the following assembler when compiled with -Os in gcc-4.6:
simple:
push r16
push r17
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
mov r30,r24
mov r31,r25
ldi r24,lo8(1)
ld r16,Z
ldd r17,Z+1
ldd r18,Z+2
ldd r19,Z+3
mov r30,r22
mov r31,r23
ld r20,Z
ldd r21,Z+1
ldd r22,Z+2
ldd r23,Z+3
cp r16,r20
cpc r17,r21
cpc r18,r22
cpc r19,r23
breq .L2
ldi r24,lo8(0)
.L2:
/* epilogue start */
pop r17
pop r16
ret
Again, here the placement of the return value is not very relevant
because I guess there's not much register pressure, but when there is,
on my arch, the resulting code grows by 5 words simply due to the
position of the constant assignment. In the above case, the constant
assignment is the 5th instruction, when it could be much closer to the
end.
I am interested in knowing whether this is indeed an IRA problem and I
have to wait for a fix, or whether there's something I need to do in the
backend to tell GCC to delay the constant assignment.
Cheers,
--
PMatos
* Re: Move insn out of the way
2011-08-11 8:49 ` Richard Guenther
@ 2011-08-11 14:27 ` Vladimir Makarov
2011-08-12 10:01 ` Paulo J. Matos
0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2011-08-11 14:27 UTC (permalink / raw)
To: Richard Guenther; +Cc: Paulo J. Matos, gcc
On 08/11/2011 04:49 AM, Richard Guenther wrote:
> On Thu, Aug 11, 2011 at 10:11 AM, Paulo J. Matos<paulo@matos-sorge.com> wrote:
>> On Thu, Aug 11, 2011 at 1:01 AM, Vladimir Makarov<vmakarov@redhat.com> wrote:
>>> I can not reproduce the problem. It would be nice to give all info (the
>>> code without includes and all options). In this case I could have more info
>>> to say more definitely about the reason of the problem in IRA.
>>>
>> One of the issue with these problems of mine is that they are tied to
>> my backend, but not always. I think I managed to reproduce a similar
>> result in the avr backend using GCC4.6.1
>>
>> test.c:
>> long long x;
>> _Bool mask (long long a)
>> {
>> return (x& a) == a;
>> }
>>
>> $ avr-cc1 -Os test.c
>>
>> This generates the following assembler:
>> mask:
>> push r13
>> push r14
>> push r15
>> push r16
>> push r17
>> /* prologue: function */
>> /* frame size = 0 */
>> /* stack size = 5 */
>> .L__stack_usage = 5
>> lds r14,x
>> and r14,r18
>> lds r15,x+1
>> and r15,r19
>> lds r16,x+2
>> and r16,r20
>> lds r17,x+3
>> and r17,r21
>> lds r27,x+4
>> and r27,r22
>> lds r26,x+5
>> and r26,r23
>> lds r31,x+6
>> and r31,r24
>> lds r30,x+7
>> and r30,r25
>> clr r13
>> inc r13
>> cp r14,r18
>> brne .L3
>> cp r15,r19
>> brne .L3
>> cp r16,r20
>> brne .L3
>> cp r17,r21
>> brne .L3
>> cp r27,r22
>> brne .L3
>> cp r26,r23
>> brne .L3
>> cp r31,r24
>> brne .L3
>> cpse r30,r25
>> .L3:
>> clr r13
>> .L2:
>> mov r24,r13
>> /* epilogue start */
>> pop r17
>> pop r16
>> pop r15
>> pop r14
>> pop r13
>> ret
>> .size mask, .-mask
>> .comm x,8,1
>>
>>
>> I can't tell how good or bad this assembler is but I note a couple of
>> similarities with my backends assembler output:
>> - It doesn't do if-conversion like Richard suggested. So (x& a) == a
>> is not converted to ((xl& al) ^ al) | ((xh& ah) ^ ah) == 0.
>> - The assignment of r13 to 1 is done as 'clr r13; inc r13' _before_ the jumps.
>>
>> The only assignment to r13 is as in my case after the jumps as 'clr
>> 13' to set up the return value. I am not sure if this situation causes
>> a lot of register pressure, however I think it doesn't in avr but it
>> does in my backend. AVR has 32 registers to play with, mine can only
>> deal with 3 in the destination operand position.
> What I was expecting IRA to do is
>
> 1) split live-range at kills, thus if a constant is assigned to a pseudo
> then the constant has its own live-range
>
> 2) pseudos that are equal to a constant are assigned hard registers
> last if re-materializing them during reload is cheaper than spilling them
>
> I suspect that 1) is not happening, I hope that 2) would happen already.
>
> Correct?
>
Yes, that is mostly correct. The first could be done by -fweb (if
the live range where the pseudo is equal to the constant is disjoint).
It could also be done by Jeff Law's project, which can provide
splitting not only at the borders of loops.
Some problems might be solved even in LRA (a new project I am working
on), which would spill the pseudo assigned the constant, assign the hard
registers to conflicting non-reload pseudos (spilled in IRA), and
inherit the hard register for the reload pseudos of the spilled pseudo
(if insns cannot use the constant directly), achieving live range
splitting for the spilled pseudo this way. The reload pass cannot do
this because it does not assign hard registers to pseudos spilled in IRA
when a hard register is freed by spilling a conflicting pseudo for
reloads.
Actually, the same problem exists in the old RA. IRA differs from it
mostly by:
o live range splitting at the most important program points (loop borders)
o better coloring
o better choice of hard registers
o better coalescing
o better communication with the reload pass
* Re: Move insn out of the way
2011-08-11 14:27 ` Vladimir Makarov
@ 2011-08-12 10:01 ` Paulo J. Matos
2011-08-12 14:22 ` Vladimir Makarov
2011-08-12 16:12 ` Jeff Law
0 siblings, 2 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-12 10:01 UTC (permalink / raw)
To: Vladimir Makarov; +Cc: Richard Guenther, gcc
On Thu, Aug 11, 2011 at 3:27 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> Yes, that is mostly correct. The first could be done by -fweb (if the live
> range where the pseudo is equal to the constant is disjoint). The first
> could be done also by Jeff Law's project which can provide splitting not
> only on the border of loops.
>
I was thinking that one possible short-term solution would be to add a
new pass just before IRA which sinks constant assignments: an insn that
assigns a constant to a register would be moved down as far as possible,
to just before the first use of the register within its basic block, or,
if the register has no use inside the current BB, to the end of the BB.
What do you think about this? Would it work? I know it's not very
general; however, it's useful at least for my backend to get this right
as soon as possible, due to several size test failures we have which are
a consequence of this problem.
Paulo Matos
* Re: Move insn out of the way
2011-08-12 10:01 ` Paulo J. Matos
@ 2011-08-12 14:22 ` Vladimir Makarov
2011-08-12 15:06 ` Paulo J. Matos
2011-08-12 16:12 ` Jeff Law
1 sibling, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2011-08-12 14:22 UTC (permalink / raw)
To: Paulo J. Matos; +Cc: Richard Guenther, gcc
On 08/12/2011 06:00 AM, Paulo J. Matos wrote:
> On Thu, Aug 11, 2011 at 3:27 PM, Vladimir Makarov<vmakarov@redhat.com> wrote:
>> Yes, that is mostly correct. The first could be done by -fweb (if the live
>> range where the pseudo is equal to the constant is disjoint). The first
>> could be done also by Jeff Law's project which can provide splitting not
>> only on the border of loops.
>>
> I was thinking that one possible solution in the short term would be
> to add a new pass just before IRA which does constant assignment
> moves. So, an insn where a register which is assigned a constant can
> be moved as much as possible to the place right before the use of the
> register or if there's no use of the register inside the current BB,
> it can be moved as the last instruction of the BB.
>
> What do you think about this? Would this work? I know it's not very
> general, however, it's useful at least for my backend to get this
> right as soon as possible due to several size test failures we have
> which are a consequence of this problem.
Sorry, Paulo. I don't think it is a good idea to have such a general
pass. Depending on its value, a constant could be prohibited from being
used directly in an insn. And moving the assignment of the constant
would most probably worsen the insn schedule on targets where the 1st
insn scheduling is a default.
Moving the pass before the 1st insn scheduling could work if
register-pressure-sensitive insn scheduling is used. Still, it is too
specialized a pass. I think register pressure relief, as described in
Simpson's thesis, would be a more general approach.
But to be honest, I think the best solution would be in the RA, because
it is dealing with insn constraints and costs. I'll think about solving
this problem in the RA.
* Re: Move insn out of the way
2011-08-12 14:22 ` Vladimir Makarov
@ 2011-08-12 15:06 ` Paulo J. Matos
0 siblings, 0 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-12 15:06 UTC (permalink / raw)
To: Vladimir Makarov; +Cc: Richard Guenther, gcc
On Fri, Aug 12, 2011 at 3:21 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
> Sorry, Paulo. I don't think it is a good idea to have such a general pass.
Thanks for the observation and the points you made. I understand and
agree that this should be sorted out at the IRA level. What I might do
in the meantime is implement such a pass in my port of GCC until it is
fixed upstream.
>
> But to be honest, I think, the best solution would be in RA because it is
> dealing with insn constraints and costs. I'll think about solving this
> problem in RA.
>
Thanks! I will be eagerly waiting for an update.
Cheers,
--
PMatos
* Re: Move insn out of the way
2011-08-12 10:01 ` Paulo J. Matos
2011-08-12 14:22 ` Vladimir Makarov
@ 2011-08-12 16:12 ` Jeff Law
1 sibling, 0 replies; 15+ messages in thread
From: Jeff Law @ 2011-08-12 16:12 UTC (permalink / raw)
To: Paulo J. Matos; +Cc: Vladimir Makarov, Richard Guenther, gcc
On 08/12/11 04:00, Paulo J. Matos wrote:
> On Thu, Aug 11, 2011 at 3:27 PM, Vladimir Makarov
> <vmakarov@redhat.com> wrote:
>> Yes, that is mostly correct. The first could be done by -fweb (if
>> the live range where the pseudo is equal to the constant is
>> disjoint). The first could be done also by Jeff Law's project
>> which can provide splitting not only on the border of loops.
>>
>
> I was thinking that one possible solution in the short term would be
> to add a new pass just before IRA which does constant assignment
> moves. So, an insn where a register which is assigned a constant can
> be moved as much as possible to the place right before the use of
> the register or if there's no use of the register inside the current
> BB, it can be moved as the last instruction of the BB.
I thought we already had code to do this when a pseudo does not get a
hard reg and has an appropriate REG_EQUIV note on its assignment insn.
jeff