Move insn out of the way

public inbox for gcc@gcc.gnu.org

* Move insn out of the way
@ 2011-08-10 11:20 Paulo J. Matos
  2011-08-10 11:40 ` Richard Guenther
  0 siblings, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 11:20 UTC (permalink / raw)
  To: gcc

Hi,

I am having a size optimisation issue with GCC-4.6.1.
The problem boils down to this: I don't know the best way to hint to
GCC that a given insn would be better placed elsewhere.

The C code is simple:
#include <stdint.h>

uint32_t x;  /* global; type assumed from the two QImode loads below */

int16_t mask(uint32_t a)
{
     return (x & a) == a;
}

int16_t is QImode and uint32_t is HImode.
After combine the insn chain (which is unmodified all the way to ira) is 
(in simplified form):
regQI 27 <- regQI AH [a]
regQI 28 <- regQI AL [a+1]
regQI AL <- andQI(regQI 28, memQI(symbolrefQI(x) + 1))
regQI AH <- andQI(regQI 27, memQI(symbolrefQI(x)))
regQI 30 <- regQI AL
regQI 29 <- regQI AH
regQI 24 <- 1
if regQI 29 != regQI 27
    goto labelref 20
if regQI 30 != regQI 28
    goto labelref 20
goto labelref 22
labelref 20
regQI 24 <- 0
labelref 22
regQI AL <- regQI 24

The problem is that `regQI 24 <- 1' is placed before the jumps.
Since regQI 24 ends up in AL, IRA allocates regQI 24 to AL, which
creates many conflicts and reloads. If that insn were moved to after
the jumps, just before the `goto labelref 22', all would be fine,
because by then regs 27, 28, 29 and 30 are dead.
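
To make the two placements concrete, here is a minimal C-level sketch of
the control flow. It is only a model, not compiler output: the function
names, the global `x' and the split into 16-bit halves are assumptions for
illustration, and a compiler is free to reorder the source-level assignment.

```c
#include <stdint.h>

/* Hypothetical stand-in for the global used by mask(). */
static uint32_t x;

/* Mirrors the reported insn chain: the result is set to 1 *before* the
   compares, so it is live across both conditional jumps. */
static int mask_early(uint32_t a)
{
    uint16_t a_hi = (uint16_t)(a >> 16), a_lo = (uint16_t)a;
    uint16_t t_hi = (uint16_t)((x >> 16) & a_hi);
    uint16_t t_lo = (uint16_t)(x & a_lo);
    int result = 1;                 /* regQI 24 <- 1 */

    if (t_hi != a_hi) goto fail;
    if (t_lo != a_lo) goto fail;
    goto done;
fail:
    result = 0;                     /* regQI 24 <- 0 */
done:
    return result;
}

/* The desired placement: the 1 is only set once the compared values are
   dead, so the result does not compete with them for a register. */
static int mask_late(uint32_t a)
{
    uint16_t a_hi = (uint16_t)(a >> 16), a_lo = (uint16_t)a;
    uint16_t t_hi = (uint16_t)((x >> 16) & a_hi);
    uint16_t t_lo = (uint16_t)(x & a_lo);
    int result;

    if (t_hi != a_hi) goto fail;
    if (t_lo != a_lo) goto fail;
    result = 1;
    goto done;
fail:
    result = 0;
done:
    return result;
}
```

Both functions return the same value for every input; they differ only in
how long the result is live.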

It's obviously hard to point to a solution, but I was wondering if
there's a way to hint to GCC that moving an insn might help here, or
whether I should look into why an existing pass is not already
doing that.

Cheers,

-- 
PMatos


* Re: Move insn out of the way
  2011-08-10 11:20 Move insn out of the way Paulo J. Matos
@ 2011-08-10 11:40 ` Richard Guenther
  2011-08-10 11:42   ` Richard Guenther
  2011-08-10 13:46   ` Paulo J. Matos
  0 siblings, 2 replies; 15+ messages in thread
From: Richard Guenther @ 2011-08-10 11:40 UTC (permalink / raw)
  To: Paulo J. Matos; +Cc: gcc

On Wed, Aug 10, 2011 at 12:29 PM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
> Hi,
>
> I am having a size optimisation issue with GCC-4.6.1.
> The problem boils down to the fact that I have no idea on the best way to
> hint to GCC that a given insn would make more sense someplace else.
>
> The C code is simple:
> int16_t mask(uint32_t a)
> {
>    return (x & a) == a;
> }
>
> int16_t is QImode and uint32_t is HImode.
> After combine the insn chain (which is unmodified all the way to ira) is (in
> simplified form):
> regQI 27 <- regQI AH [a]
> regQI 28 <- regQI AL [a+1]
> regQI AL <- andQI(regQI 28, memQI(symbolrefQI(x) + 1))
> regQI AH <- andQI(regQI 27, memQI(symbolrefQI(x))
> regQI 30 <- regQI AL
> regQI 29 <- regQI AH
> regQI 24 <- 1
> if regQI 29 != regQI 27
>   goto labelref 20
> if regQI 30 != regQI 28
>   goto labelref 20
> goto labelref 22
> labelref 20
> regQI 24 <- 0
> labelref 22
> regQI AL <- regQI 24
>
> The problem resides in `regQI 24 <- 1' being before the jumps.
> Since regQI 24 is going to AL, IRA decides to allocate regQI 24 to AL, which
> creates loads of conflicts and reloads. If that same insn would be moved to
> after the jumps and before the `goto labelref 22' then all would be fine
> cause by then regs 27, 28, 29, 30 are dead.
>
> It's obviously hard to point to a solution but I was wondering if there's a
> way to hint to GCC that moving an insn might help the code issue. Or if I
> should look into a why an existing pass is not already doing that.

On x86 we expand the code to (((xl & al) ^ al) | ((xh & ah) ^ ah)) == 0
which is then if-converted.  Modified testcase:

long long x;
_Bool __attribute__((regparm(2))) mask (long long a)
{
  return (x & a) == a;
}

on i?86 gets you

mask:
.LFB0:
        .cfi_startproc
        pushl   %ebx
        .cfi_def_cfa_offset 8
        .cfi_offset 3, -8
        movl    %eax, %ebx
        andl    x, %ebx
        movl    %edx, %ecx
        andl    x+4, %ecx
        xorl    %ebx, %eax
        xorl    %ecx, %edx
        orl     %edx, %eax
        sete    %al
        popl    %ebx
        .cfi_restore 3
        .cfi_def_cfa_offset 4
        ret

so I wonder if you should investigate why the xor variant doesn't trigger
for you?  On i?86 if-conversion probably solves your specific issue,
but I guess the initial expansion is where you could improve placement
of the 1 (after all, the 0 is after the jumps).
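
The equivalence behind the xor variant can be checked at the C level. The
following is a minimal sketch, assuming 32-bit halves of a 64-bit value; the
function names are made up for illustration and this is not the code GCC
uses internally.

```c
#include <stdint.h>

/* Branch-free form: each half of (x & a) is xor-ed with the matching half
   of a; the result is all-zero exactly when (x & a) == a. */
static int mask_xor(uint64_t x, uint64_t a)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);

    return (((xl & al) ^ al) | ((xh & ah) ^ ah)) == 0;
}

/* Reference form, as written in the testcase. */
static int mask_ref(uint64_t x, uint64_t a)
{
    return (x & a) == a;
}
```

Because the xor form has a single comparison at the end, there is no chain
of conditional jumps for a constant to be live across.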

Richard.

> Cheers,
>
> --
> PMatos
>
>


* Re: Move insn out of the way
  2011-08-10 11:40 ` Richard Guenther
@ 2011-08-10 11:42   ` Richard Guenther
  2011-08-10 13:55     ` Paulo J. Matos
  2011-08-10 13:46   ` Paulo J. Matos
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-10 11:42 UTC (permalink / raw)
  To: Paulo J. Matos; +Cc: gcc, Vladimir N. Makarov

On Wed, Aug 10, 2011 at 1:40 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Wed, Aug 10, 2011 at 12:29 PM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
>> Hi,
>>
>> I am having a size optimisation issue with GCC-4.6.1.
>> The problem boils down to the fact that I have no idea on the best way to
>> hint to GCC that a given insn would make more sense someplace else.
>>
>> The C code is simple:
>> int16_t mask(uint32_t a)
>> {
>>    return (x & a) == a;
>> }
>>
>> int16_t is QImode and uint32_t is HImode.
>> After combine the insn chain (which is unmodified all the way to ira) is (in
>> simplified form):
>> regQI 27 <- regQI AH [a]
>> regQI 28 <- regQI AL [a+1]
>> regQI AL <- andQI(regQI 28, memQI(symbolrefQI(x) + 1))
>> regQI AH <- andQI(regQI 27, memQI(symbolrefQI(x))
>> regQI 30 <- regQI AL
>> regQI 29 <- regQI AH
>> regQI 24 <- 1
>> if regQI 29 != regQI 27
>>   goto labelref 20
>> if regQI 30 != regQI 28
>>   goto labelref 20
>> goto labelref 22
>> labelref 20
>> regQI 24 <- 0
>> labelref 22
>> regQI AL <- regQI 24
>>
>> The problem resides in `regQI 24 <- 1' being before the jumps.
>> Since regQI 24 is going to AL, IRA decides to allocate regQI 24 to AL, which
>> creates loads of conflicts and reloads. If that same insn would be moved to
>> after the jumps and before the `goto labelref 22' then all would be fine
>> cause by then regs 27, 28, 29, 30 are dead.
>>
>> It's obviously hard to point to a solution but I was wondering if there's a
>> way to hint to GCC that moving an insn might help the code issue. Or if I
>> should look into a why an existing pass is not already doing that.
>
> On x86 we expand the code to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0
> which is then if-converted.  Modified testcase:
>
> long long x;
> _Bool __attribute__((regparm(2))) mask (long long a)
> {
>  return (x & a) == a;
> }
>
> on i?86 gets you
>
> mask:
> .LFB0:
>        .cfi_startproc
>        pushl   %ebx
>        .cfi_def_cfa_offset 8
>        .cfi_offset 3, -8
>        movl    %eax, %ebx
>        andl    x, %ebx
>        movl    %edx, %ecx
>        andl    x+4, %ecx
>        xorl    %ebx, %eax
>        xorl    %ecx, %edx
>        orl     %edx, %eax
>        sete    %al
>        popl    %ebx
>        .cfi_restore 3
>        .cfi_def_cfa_offset 4
>        ret
>
> so I wonder if you should investigate why the xor variant doesn't trigger
> for you?  On i?86 if-conversion probably solves your specific issue,
> but I guess the initial expansion is where you could improve placement
> of the 1 (after all, the 0 is after the jumps).

Oh, and I wonder if/why IRA can/does not rematerialize the constant
instead of spilling it.  Might be a cost issue that it doesn't delay
allocating a reg for 1 as that is cheap to reload (is it?).

Richard.

> Richard.
>
>> Cheers,
>>
>> --
>> PMatos
>>
>>
>


* Re: Move insn out of the way
  2011-08-10 11:42   ` Richard Guenther
@ 2011-08-10 13:55     ` Paulo J. Matos
       [not found]       ` <4E431BD8.8060705@redhat.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 13:55 UTC (permalink / raw)
  To: gcc

On 10/08/11 12:42, Richard Guenther wrote:
>
> Oh, and I wonder if/why IRA can/does not rematerialize the constant
> instead of spilling it.  Might be a cost issue that it doesn't delay
> allocating a reg for 1 as that is cheap to reload (is it?).
>

I would indeed expect IRA to move the constant assignment. However it 
doesn't. The cost of a constant as per RTX_COSTS is 1 since it takes 
exactly one instruction to actually do that (optimizing for size).
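
The cost question can be made concrete with a toy comparison. The sketch
below is not GCC's cost model and uses no GCC API; the function name and
the cost parameters are hypothetical. It assumes that spilling pays for the
defining insn once, plus one store and one load per use, whereas
rematerializing repeats the constant load at each use.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: returns true when re-creating the
   constant at each use is no more expensive than spilling it. */
static bool prefer_remat(int const_cost, int store_cost, int load_cost,
                         int n_uses)
{
    int remat_total = const_cost * n_uses;
    int spill_total = const_cost + store_cost + load_cost * n_uses;

    return remat_total <= spill_total;
}
```

With a constant cost of 1, as stated above, rematerialization wins in this
toy model whenever a reload from the stack also costs at least 1.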

-- 
PMatos


[parent not found: <4E431BD8.8060705@redhat.com>]

* Re: Move insn out of the way
       [not found]       ` <4E431BD8.8060705@redhat.com>
@ 2011-08-11  8:12         ` Paulo J. Matos
  2011-08-11  8:49           ` Richard Guenther
  2011-08-11 12:22         ` Paulo J. Matos
  1 sibling, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-11  8:12 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc, Richard Guenther

On Thu, Aug 11, 2011 at 1:01 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> I can not reproduce the problem.  It would be nice to give all info (the
> code without includes and all options).  In this case I could have more info
> to say more definitely about the reason of the problem in IRA.
>

One of the issues with these problems of mine is that they are usually
tied to my backend, though not always. I think I managed to reproduce a
similar result in the avr backend using GCC 4.6.1.

test.c:
long long x;
_Bool mask (long long a)
{
  return (x & a) == a;
}

$ avr-cc1 -Os test.c

This generates the following assembler:
mask:
        push r13
        push r14
        push r15
        push r16
        push r17
/* prologue: function */
/* frame size = 0 */
/* stack size = 5 */
.L__stack_usage = 5
        lds r14,x
        and r14,r18
        lds r15,x+1
        and r15,r19
        lds r16,x+2
        and r16,r20
        lds r17,x+3
        and r17,r21
        lds r27,x+4
        and r27,r22
        lds r26,x+5
        and r26,r23
        lds r31,x+6
        and r31,r24
        lds r30,x+7
        and r30,r25
        clr r13
        inc r13
        cp r14,r18
        brne .L3
        cp r15,r19
        brne .L3
        cp r16,r20
        brne .L3
        cp r17,r21
        brne .L3
        cp r27,r22
        brne .L3
        cp r26,r23
        brne .L3
        cp r31,r24
        brne .L3
        cpse r30,r25
.L3:
        clr r13
.L2:
        mov r24,r13
/* epilogue start */
        pop r17
        pop r16
        pop r15
        pop r14
        pop r13
        ret
        .size   mask, .-mask
        .comm x,8,1


I can't tell how good or bad this assembler is, but I note a couple of
similarities with my backend's assembler output:
- It doesn't do if-conversion like Richard suggested. So (x & a) == a
is not converted to (((xl & al) ^ al) | ((xh & ah) ^ ah)) == 0.
- The assignment of 1 to r13 is done as 'clr r13; inc r13' _before_ the jumps.

The only other assignment to r13 is, as in my case, after the jumps, as
'clr r13' to set up the return value. I am not sure whether this situation
causes a lot of register pressure; I think it doesn't on avr, but it
does in my backend. AVR has 32 registers to play with, whereas mine can
only use 3 in the destination operand position.

-- 
PMatos


* Re: Move insn out of the way
  2011-08-11  8:12         ` Paulo J. Matos
@ 2011-08-11  8:49           ` Richard Guenther
  2011-08-11 14:27             ` Vladimir Makarov
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-11  8:49 UTC (permalink / raw)
  To: Paulo J. Matos; +Cc: Vladimir Makarov, gcc

On Thu, Aug 11, 2011 at 10:11 AM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
> On Thu, Aug 11, 2011 at 1:01 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>> I can not reproduce the problem.  It would be nice to give all info (the
>> code without includes and all options).  In this case I could have more info
>> to say more definitely about the reason of the problem in IRA.
>>
>
> One of the issue with these problems of mine is that they are tied to
> my backend, but not always. I think I managed to reproduce a similar
> result in the avr backend using GCC4.6.1
>
> test.c:
> long long x;
> _Bool mask (long long a)
> {
>  return (x & a) == a;
> }
>
> $ avr-cc1 -Os test.c
>
> This generates the following assembler:
> mask:
>        push r13
>        push r14
>        push r15
>        push r16
>        push r17
> /* prologue: function */
> /* frame size = 0 */
> /* stack size = 5 */
> .L__stack_usage = 5
>        lds r14,x
>        and r14,r18
>        lds r15,x+1
>        and r15,r19
>        lds r16,x+2
>        and r16,r20
>        lds r17,x+3
>        and r17,r21
>        lds r27,x+4
>        and r27,r22
>        lds r26,x+5
>        and r26,r23
>        lds r31,x+6
>        and r31,r24
>        lds r30,x+7
>        and r30,r25
>        clr r13
>        inc r13
>        cp r14,r18
>        brne .L3
>        cp r15,r19
>        brne .L3
>        cp r16,r20
>        brne .L3
>        cp r17,r21
>        brne .L3
>        cp r27,r22
>        brne .L3
>        cp r26,r23
>        brne .L3
>        cp r31,r24
>        brne .L3
>        cpse r30,r25
> .L3:
>        clr r13
> .L2:
>        mov r24,r13
> /* epilogue start */
>        pop r17
>        pop r16
>        pop r15
>        pop r14
>        pop r13
>        ret
>        .size   mask, .-mask
>        .comm x,8,1
>
>
> I can't tell how good or bad this assembler is but I note a couple of
> similarities with my backends assembler output:
> - It doesn't do if-conversion like Richard suggested. So (x & a) == a
> is not converted to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0.
> - The assignment of r13 to 1 is done as 'clr r13; inc r13' _before_ the jumps.
>
> The only assignment to r13 is as in my case after the jumps as 'clr
> 13' to set up the return value. I am not sure if this situation causes
> a lot of register pressure, however I think it doesn't in avr but it
> does in my backend. AVR has 32 registers to play with, mine can only
> deal with 3 in the destination operand position.

What I was expecting IRA to do is

 1) split live-range at kills, thus if a constant is assigned to a pseudo
 then the constant has its own live-range

 2) pseudos that are equal to a constant are assigned hard registers
 last if re-materializing them during reload is cheaper than spilling them

I suspect that 1) is not happening, I hope that 2) would happen already.

Correct?

Richard.

> --
> PMatos
>


* Re: Move insn out of the way
  2011-08-11  8:49           ` Richard Guenther
@ 2011-08-11 14:27             ` Vladimir Makarov
  2011-08-12 10:01               ` Paulo J. Matos
  0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2011-08-11 14:27 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Paulo J. Matos, gcc

On 08/11/2011 04:49 AM, Richard Guenther wrote:
> On Thu, Aug 11, 2011 at 10:11 AM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
>> On Thu, Aug 11, 2011 at 1:01 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>> I can not reproduce the problem.  It would be nice to give all info (the
>>> code without includes and all options).  In this case I could have more info
>>> to say more definitely about the reason of the problem in IRA.
>>>
>> One of the issue with these problems of mine is that they are tied to
>> my backend, but not always. I think I managed to reproduce a similar
>> result in the avr backend using GCC4.6.1
>>
>> test.c:
>> long long x;
>> _Bool mask (long long a)
>> {
>>   return (x & a) == a;
>> }
>>
>> $ avr-cc1 -Os test.c
>>
>> This generates the following assembler:
>> mask:
>>         push r13
>>         push r14
>>         push r15
>>         push r16
>>         push r17
>> /* prologue: function */
>> /* frame size = 0 */
>> /* stack size = 5 */
>> .L__stack_usage = 5
>>         lds r14,x
>>         and r14,r18
>>         lds r15,x+1
>>         and r15,r19
>>         lds r16,x+2
>>         and r16,r20
>>         lds r17,x+3
>>         and r17,r21
>>         lds r27,x+4
>>         and r27,r22
>>         lds r26,x+5
>>         and r26,r23
>>         lds r31,x+6
>>         and r31,r24
>>         lds r30,x+7
>>         and r30,r25
>>         clr r13
>>         inc r13
>>         cp r14,r18
>>         brne .L3
>>         cp r15,r19
>>         brne .L3
>>         cp r16,r20
>>         brne .L3
>>         cp r17,r21
>>         brne .L3
>>         cp r27,r22
>>         brne .L3
>>         cp r26,r23
>>         brne .L3
>>         cp r31,r24
>>         brne .L3
>>         cpse r30,r25
>> .L3:
>>         clr r13
>> .L2:
>>         mov r24,r13
>> /* epilogue start */
>>         pop r17
>>         pop r16
>>         pop r15
>>         pop r14
>>         pop r13
>>         ret
>>         .size   mask, .-mask
>>         .comm x,8,1
>>
>>
>> I can't tell how good or bad this assembler is but I note a couple of
>> similarities with my backends assembler output:
>> - It doesn't do if-conversion like Richard suggested. So (x & a) == a
>> is not converted to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0.
>> - The assignment of r13 to 1 is done as 'clr r13; inc r13' _before_ the jumps.
>>
>> The only assignment to r13 is as in my case after the jumps as 'clr
>> 13' to set up the return value. I am not sure if this situation causes
>> a lot of register pressure, however I think it doesn't in avr but it
>> does in my backend. AVR has 32 registers to play with, mine can only
>> deal with 3 in the destination operand position.
> What I was expecting IRA to do is
>
>   1) split live-range at kills, thus if a constant is assigned to a pseudo
>   then the constant has its own live-range
>
>   2) pseudos that are equal to a constant are assigned hard registers
>   last if re-materializing them during reload is cheaper than spilling them
>
> I suspect that 1) is not happening, I hope that 2) would happen already.
>
> Correct?
>
   Yes, that is mostly correct.  The first could be done by -fweb (if
the live range where the pseudo is equal to the constant is disjoint).
The first could also be done by Jeff Law's project, which can provide
splitting not only on the borders of loops.
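
As a source-level analogy for what web construction does, consider the
sketch below. It is an illustration only: -fweb works on RTL pseudos, not
on C variables, and the function names are made up. Note also that in the
simplified insn chain from the first message both assignments to regQI 24
reach the same final use, so they would appear to form a single web, which
is the case the parenthetical above excludes.

```c
/* One variable reused for two unrelated values: a single name with two
   disjoint live ranges. */
static int reuse_one(int a, int b)
{
    int t = 1;          /* t holds a constant ...                  */
    int r = a + t;      /* ... and this is the last use of it      */

    t = b * 2;          /* unrelated new value in the same name    */
    return r + t;
}

/* After web-style renaming: each independent value gets its own name, so
   the allocator can treat the two ranges separately. */
static int split_webs(int a, int b)
{
    int t1 = 1;
    int r = a + t1;
    int t2 = b * 2;

    return r + t2;
}
```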

   Some problems might be solved even in LRA (a new project I am working
on), which would spill the pseudo assigned to the constant, assign the
hard registers to conflicting non-reload pseudos (spilled in IRA), and
inherit the hard register for the reload pseudos of the spilled pseudo
(if insns cannot use the constant directly), achieving live range
splitting for the spilled pseudo this way.  The reload pass cannot do
this because it does not assign hard registers to pseudos spilled in IRA
when a hard register is freed by spilling a conflicting pseudo for reloads.

Actually the same problem exists in the old RA.  IRA is different from 
it mostly by:

o live range splitting at the most important program points (loop borders)
o better coloring
o better choosing hard registers
o better coalescing
o better communication with reload pass


* Re: Move insn out of the way
  2011-08-11 14:27             ` Vladimir Makarov
@ 2011-08-12 10:01               ` Paulo J. Matos
  2011-08-12 14:22                 ` Vladimir Makarov
  2011-08-12 16:12                 ` Jeff Law
  0 siblings, 2 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-12 10:01 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Richard Guenther, gcc

On Thu, Aug 11, 2011 at 3:27 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>  Yes, that is mostly correct.  The first could be done by -fweb (if the live
> range where the pseudo is equal to the constant is disjoint).  The first
> could be done also by Jeff Law's project which can provide splitting not
> only on the border of loops.
>

I was thinking that one possible short-term solution would be
to add a new pass just before IRA which moves constant
assignments. An insn that assigns a constant to a register would
be moved as far down as possible: to just before the first use of the
register or, if there's no use of the register inside the current BB,
to the end of the BB.
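
A toy model of such a pass is sketched below. Everything in it is
hypothetical: the insn structure, the opcode names and the function are
made up for illustration and have nothing to do with GCC's rtx or pass
infrastructure. The sketch is local to one basic block and treats the
terminating branch as a barrier, so moving a set across the conditional
jumps of the original example would additionally need cross-block motion,
which it does not attempt.

```c
#include <stddef.h>

enum op { OP_SET_CONST, OP_MOVE, OP_AND, OP_BRANCH_NE };

/* Toy insn, not GCC's rtx: dest is the register written (-1 if none),
   src1/src2 are the registers read (-1 if unused). */
struct insn {
    enum op op;
    int dest;
    int src1, src2;
    int imm;            /* constant value for OP_SET_CONST */
};

static int reads(const struct insn *i, int reg)
{
    return i->src1 == reg || i->src2 == reg;
}

/* Sink every OP_SET_CONST in bb[0..n) as far down as is safe: stop before
   the first later insn that reads or writes the same register, and before
   a branch. */
static void sink_const_sets(struct insn *bb, size_t n)
{
    /* Walk bottom-up so an insn that has already been sunk is not
       visited a second time. */
    for (size_t k = n; k-- > 0; ) {
        if (bb[k].op != OP_SET_CONST)
            continue;

        struct insn c = bb[k];
        size_t j = k;

        while (j + 1 < n
               && bb[j + 1].op != OP_BRANCH_NE
               && !reads(&bb[j + 1], c.dest)
               && bb[j + 1].dest != c.dest) {
            bb[j] = bb[j + 1];      /* shift the following insn up */
            j++;
        }
        bb[j] = c;                  /* drop the constant set in place */
    }
}
```

Stopping at a write to the same register, and not only at a read, keeps
the order of the two definitions intact.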

What do you think about this? Would this work? I know it's not very
general, however, it's useful at least for my backend to get this
right as soon as possible due to several size test failures we have
which are a consequence of this problem.

Paulo Matos


* Re: Move insn out of the way
  2011-08-12 10:01               ` Paulo J. Matos
@ 2011-08-12 14:22                 ` Vladimir Makarov
  2011-08-12 15:06                   ` Paulo J. Matos
  2011-08-12 16:12                 ` Jeff Law
  1 sibling, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2011-08-12 14:22 UTC (permalink / raw)
  To: Paulo J. Matos; +Cc: Richard Guenther, gcc

On 08/12/2011 06:00 AM, Paulo J. Matos wrote:
> On Thu, Aug 11, 2011 at 3:27 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>   Yes, that is mostly correct.  The first could be done by -fweb (if the live
>> range where the pseudo is equal to the constant is disjoint).  The first
>> could be done also by Jeff Law's project which can provide splitting not
>> only on the border of loops.
>>
> I was thinking that one possible solution in the short term would be
> to add a new pass just before IRA which does constant assignment
> moves. So, an insn where a register which is assigned a constant can
> be moved as much as possible to the place right before the use of the
> register or if there's no use of the register inside the current BB,
> it can be moved as the last instruction of the BB.
>
> What do you think about this? Would this work? I know it's not very
> general, however, it's useful at least for my backend to get this
> right as soon as possible due to several size test failures we have
> which are a consequence of this problem.

Sorry, Paulo.  I don't think it is a good idea to have such a general
pass.  Depending on its value, a constant may not be allowed as an
operand in an insn.  Moving the constant assignment would most probably
worsen the insn schedule on targets where the 1st insn scheduling is
enabled by default.

Moving the pass before the 1st insn scheduling could work if register
pressure sensitive insn scheduling is used.  Still, it is too specialized
a pass.  I think register pressure relief as described in Simpson's
thesis would be a more general approach.

But to be honest, I think the best solution would be in the RA, because
it deals with insn constraints and costs.  I'll think about solving
this problem in the RA.


* Re: Move insn out of the way
  2011-08-12 14:22                 ` Vladimir Makarov
@ 2011-08-12 15:06                   ` Paulo J. Matos
  0 siblings, 0 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-12 15:06 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: Richard Guenther, gcc

On Fri, Aug 12, 2011 at 3:21 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
> Sorry, Paulo.  I don't think it is a good idea to have such a general pass.

Thanks for the observation and the points you made. I understand and
agree that this should be sorted at the IRA level. What I might do in
the meantime is to implement such a pass on my port of GCC until it is
sorted upstream.

>
> But to be honest, I think, the best solution would be in RA because it is
> dealing with insn constraints and costs.  I'll think about solving this
> problem in RA.
>

Thanks! I will be eagerly waiting for an update.

Cheers,

-- 
PMatos


* Re: Move insn out of the way
  2011-08-12 10:01               ` Paulo J. Matos
  2011-08-12 14:22                 ` Vladimir Makarov
@ 2011-08-12 16:12                 ` Jeff Law
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff Law @ 2011-08-12 16:12 UTC (permalink / raw)
  To: Paulo J. Matos; +Cc: Vladimir Makarov, Richard Guenther, gcc

On 08/12/11 04:00, Paulo J. Matos wrote:
> On Thu, Aug 11, 2011 at 3:27 PM, Vladimir Makarov
> <vmakarov@redhat.com> wrote:
>> Yes, that is mostly correct.  The first could be done by -fweb (if
>> the live range where the pseudo is equal to the constant is
>> disjoint).  The first could be done also by Jeff Law's project
>> which can provide splitting not only on the border of loops.
>>
>
> I was thinking that one possible solution in the short term would be
> to add a new pass just before IRA which does constant assignment
> moves. So, an insn where a register which is assigned a constant can
> be moved as much as possible to the place right before the use of
> the register or if there's no use of the register inside the current
> BB, it can be moved as the last instruction of the BB.
I thought we already had code to do this when a pseudo does not
get a hard reg and has an appropriate REG_EQUIV note on
its assignment insn.

jeff


* Re: Move insn out of the way
       [not found]       ` <4E431BD8.8060705@redhat.com>
  2011-08-11  8:12         ` Paulo J. Matos
@ 2011-08-11 12:22         ` Paulo J. Matos
  1 sibling, 0 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-11 12:22 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc, Richard Guenther

On Thu, Aug 11, 2011 at 1:01 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> I can not reproduce the problem.  It would be nice to give all info (the
> code without includes and all options).  In this case I could have more info
> to say more definitely about the reason of the problem in IRA.
>

Let me add another example using the avr backend that produces really
strange code. The code has a similar nature:
_Bool simple(unsigned long *a, unsigned long *b) { return *a == *b; }

Generates the following assembler when compiled with -Os in gcc-4.6:
simple:
        push r16
        push r17
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
        mov r30,r24
        mov r31,r25
        ldi r24,lo8(1)
        ld r16,Z
        ldd r17,Z+1
        ldd r18,Z+2
        ldd r19,Z+3
        mov r30,r22
        mov r31,r23
        ld r20,Z
        ldd r21,Z+1
        ldd r22,Z+2
        ldd r23,Z+3
        cp r16,r20
        cpc r17,r21
        cpc r18,r22
        cpc r19,r23
        breq .L2
        ldi r24,lo8(0)
.L2:
/* epilogue start */
        pop r17
        pop r16
        ret

Again, the placement of the return value is not very relevant here
because I guess there's not much register pressure, but when there is,
as on my arch, the resulting code grows by 5 words simply due to
the position of the constant assignment. In the above case, the
constant assignment is the 5th instruction, when it could be much
closer to the end.

I am interested in knowing whether this is indeed an IRA problem and I
have to wait for a fix, or whether there's something I need to do in the
backend to tell GCC to delay the constant assignment.

Cheers,

-- 
PMatos


* Re: Move insn out of the way
  2011-08-10 11:40 ` Richard Guenther
  2011-08-10 11:42   ` Richard Guenther
@ 2011-08-10 13:46   ` Paulo J. Matos
  2011-08-10 13:51     ` Richard Guenther
  1 sibling, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 13:46 UTC (permalink / raw)
  To: gcc

On 10/08/11 12:40, Richard Guenther wrote:
>
> On x86 we expand the code to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0
> which is then if-converted.  Modified testcase:
>
> long long x;
> _Bool __attribute__((regparm(2))) mask (long long a)
> {
>    return (x & a) == a;
> }
>
> on i?86 gets you
>
> mask:
> .LFB0:
>          .cfi_startproc
>          pushl   %ebx
>          .cfi_def_cfa_offset 8
>          .cfi_offset 3, -8
>          movl    %eax, %ebx
>          andl    x, %ebx
>          movl    %edx, %ecx
>          andl    x+4, %ecx
>          xorl    %ebx, %eax
>          xorl    %ecx, %edx
>          orl     %edx, %eax
>          sete    %al
>          popl    %ebx
>          .cfi_restore 3
>          .cfi_def_cfa_offset 4
>          ret
>
> so I wonder if you should investigate why the xor variant doesn't trigger
> for you?

I can reproduce this result in GCC 4.6.1 for x86.
I can't understand what you mean by this, though. From inspecting the
logs, it seems that the if-conversion is done manually at expand time.
The final pass before expand shows the original (x & a) == a; however,
after expand the RTL already contains xor, ior, etc. So I guess I would
need to do something similar in my backend. I can't, however, find where
in i386(.md|.c) this is actually happening.

> On i?86 if-conversion probably solves your specific issue,
> but I guess the initial expansion is where you could improve placement
> of the 1 (after all, the 0 is after the jumps).
>

This is happening on my own backend so I guess anything that is 
implemented to do if-conversion on i386 needs to be implemented also on 
my backend. Can you point me to the code on i386 so I can take a look at it?

Cheers,

-- 
PMatos


* Re: Move insn out of the way
  2011-08-10 13:46   ` Paulo J. Matos
@ 2011-08-10 13:51     ` Richard Guenther
  2011-08-10 14:14       ` Paulo J. Matos
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-10 13:51 UTC (permalink / raw)
  To: Paulo J. Matos; +Cc: gcc

On Wed, Aug 10, 2011 at 3:46 PM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
> On 10/08/11 12:40, Richard Guenther wrote:
>>
>> On x86 we expand the code to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0
>> which is then if-converted.  Modified testcase:
>>
>> long long x;
>> _Bool __attribute__((regparm(2))) mask (long long a)
>> {
>>   return (x & a) == a;
>> }
>>
>> on i?86 gets you
>>
>> mask:
>> .LFB0:
>>         .cfi_startproc
>>         pushl   %ebx
>>         .cfi_def_cfa_offset 8
>>         .cfi_offset 3, -8
>>         movl    %eax, %ebx
>>         andl    x, %ebx
>>         movl    %edx, %ecx
>>         andl    x+4, %ecx
>>         xorl    %ebx, %eax
>>         xorl    %ecx, %edx
>>         orl     %edx, %eax
>>         sete    %al
>>         popl    %ebx
>>         .cfi_restore 3
>>         .cfi_def_cfa_offset 4
>>         ret
>>
>> so I wonder if you should investigate why the xor variant doesn't trigger
>> for you?
>
> I can reproduce this result in GCC 4.6.1 for x86.
> I can't understand what you mean by this though. From inspecting the logs it
> seems that the if-conversion is done manually at expand time. The final pass
> before expand shows the original (x & a) == a, however, after expand the rtl
> already contains xor, ior, etc. So I guess I would need to do something
> similar in my backend. I can't however, find in the i386(.md|.c) where this
> is actually happening.
>
>> On i?86 if-conversion probably solves your specific issue,
>> but I guess the initial expansion is where you could improve placement
>> of the 1 (after all, the 0 is after the jumps).
>>
>
> This is happening on my own backend so I guess anything that is implemented
> to do if-conversion on i386 needs to be implemented also on my backend. Can
> you point me to the code on i386 so I can take a look at it?

I think it's all happening in generic code via do_store_flag.

Richard.

> Cheers,
>
> --
> PMatos
>
>


* Re: Move insn out of the way
  2011-08-10 13:51     ` Richard Guenther
@ 2011-08-10 14:14       ` Paulo J. Matos
  0 siblings, 0 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 14:14 UTC (permalink / raw)
  To: gcc

On 10/08/11 14:51, Richard Guenther wrote:
>
> I think it's all happening in generic code via do_store_flag.
>

Ah, now I understand your previous question. I wonder if it's not
triggered because I don't have cstore<mode>4 defined. It might be that,
but I have to look deeper.

-- 
PMatos


Thread overview: 15+ messages
2011-08-10 11:20 Move insn out of the way Paulo J. Matos
2011-08-10 11:40 ` Richard Guenther
2011-08-10 11:42   ` Richard Guenther
2011-08-10 13:55     ` Paulo J. Matos
     [not found]       ` <4E431BD8.8060705@redhat.com>
2011-08-11  8:12         ` Paulo J. Matos
2011-08-11  8:49           ` Richard Guenther
2011-08-11 14:27             ` Vladimir Makarov
2011-08-12 10:01               ` Paulo J. Matos
2011-08-12 14:22                 ` Vladimir Makarov
2011-08-12 15:06                   ` Paulo J. Matos
2011-08-12 16:12                 ` Jeff Law
2011-08-11 12:22         ` Paulo J. Matos
2011-08-10 13:46   ` Paulo J. Matos
2011-08-10 13:51     ` Richard Guenther
2011-08-10 14:14       ` Paulo J. Matos
