* Move insn out of the way
@ 2011-08-10 11:20 Paulo J. Matos
  2011-08-10 11:40 ` Richard Guenther
  0 siblings, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 11:20 UTC
  To: gcc

Hi,

I am having a size optimisation issue with GCC 4.6.1.
The problem boils down to the fact that I have no idea of the best way to
hint to GCC that a given insn would make more sense someplace else.

The C code is simple:

    int16_t mask(uint32_t a)
    {
        return (x & a) == a;
    }

int16_t is QImode and uint32_t is HImode.
After combine, the insn chain (which is unmodified all the way to IRA) is,
in simplified form:

    regQI 27 <- regQI AH [a]
    regQI 28 <- regQI AL [a+1]
    regQI AL <- andQI(regQI 28, memQI(symbolrefQI(x) + 1))
    regQI AH <- andQI(regQI 27, memQI(symbolrefQI(x)))
    regQI 30 <- regQI AL
    regQI 29 <- regQI AH
    regQI 24 <- 1
    if regQI 29 != regQI 27
        goto labelref 20
    if regQI 30 != regQI 28
        goto labelref 20
    goto labelref 22
    labelref 20
    regQI 24 <- 0
    labelref 22
    regQI AL <- regQI 24

The problem resides in `regQI 24 <- 1' being before the jumps.
Since regQI 24 is going to AL, IRA decides to allocate regQI 24 to AL, which
creates loads of conflicts and reloads. If that same insn were moved to
after the jumps and before the `goto labelref 22', all would be fine,
because by then regs 27, 28, 29 and 30 are dead.

It's obviously hard to point to a solution, but I was wondering if there's a
way to hint to GCC that moving an insn might help the code issue, or if I
should look into why an existing pass is not already doing that.

Cheers,

--
PMatos
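The conflict described in this message can be illustrated with a toy live-range model. This is only a sketch, not how IRA represents things: the `live_range` struct, the helper names and the insn indices are assumptions made for illustration, the chain is treated as a straight-line listing (control flow is ignored), and the hard registers AL/AH are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* A pseudo is live from the insn that defines it up to its last use.
   Indices are positions in a straight-line listing of the insn chain. */
struct live_range {
    int pseudo;    /* pseudo register number */
    int def;       /* index of the defining insn */
    int last_use;  /* index of the last insn that reads it */
};

/* Two ranges conflict if they overlap; a pseudo that dies in the same
   insn where another one is born does not conflict with it. */
int ranges_conflict(const struct live_range *a, const struct live_range *b)
{
    return a->def < b->last_use && b->def < a->last_use;
}

/* Count how many other pseudos conflict with ranges[idx]. */
int count_conflicts(const struct live_range *ranges, size_t n, size_t idx)
{
    int conflicts = 0;
    assert(idx < n);
    for (size_t i = 0; i < n; i++)
        if (i != idx && ranges_conflict(&ranges[i], &ranges[idx]))
            conflicts++;
    return conflicts;
}

/* Original order: `regQI 24 <- 1' is insn 6, the jumps are insns 7 and 8. */
const struct live_range chain_before[] = {
    {27, 0, 7}, {28, 1, 8}, {29, 5, 7}, {30, 4, 8}, {24, 6, 11},
};

/* Constant set moved after the jumps: jumps are insns 6 and 7,
   `regQI 24 <- 1' becomes insn 8. */
const struct live_range chain_after[] = {
    {27, 0, 6}, {28, 1, 7}, {29, 5, 6}, {30, 4, 7}, {24, 8, 11},
};
```

In this model, pseudo 24 (index 4 in both arrays) conflicts with all four other pseudos in the original order and with none of them once the constant set sits after the jumps, which matches the observation that regs 27 to 30 are dead by then.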
* Re: Move insn out of the way
  2011-08-10 11:20 Move insn out of the way Paulo J. Matos
@ 2011-08-10 11:40 ` Richard Guenther
  2011-08-10 11:42   ` Richard Guenther
  2011-08-10 13:46   ` Paulo J. Matos
  0 siblings, 2 replies; 15+ messages in thread
From: Richard Guenther @ 2011-08-10 11:40 UTC
  To: Paulo J. Matos; +Cc: gcc

On Wed, Aug 10, 2011 at 12:29 PM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
> The problem resides in `regQI 24 <- 1' being before the jumps.
> Since regQI 24 is going to AL, IRA decides to allocate regQI 24 to AL, which
> creates loads of conflicts and reloads. If that same insn would be moved to
> after the jumps and before the `goto labelref 22' then all would be fine
> cause by then regs 27, 28, 29, 30 are dead.
>
> It's obviously hard to point to a solution but I was wondering if there's a
> way to hint to GCC that moving an insn might help the code issue. Or if I
> should look into a why an existing pass is not already doing that.

On x86 we expand the code to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0
which is then if-converted.  Modified testcase:

    long long x;
    _Bool __attribute__((regparm(2))) mask (long long a)
    {
      return (x & a) == a;
    }

on i?86 gets you

    mask:
    .LFB0:
            .cfi_startproc
            pushl   %ebx
            .cfi_def_cfa_offset 8
            .cfi_offset 3, -8
            movl    %eax, %ebx
            andl    x, %ebx
            movl    %edx, %ecx
            andl    x+4, %ecx
            xorl    %ebx, %eax
            xorl    %ecx, %edx
            orl     %edx, %eax
            sete    %al
            popl    %ebx
            .cfi_restore 3
            .cfi_def_cfa_offset 4
            ret

so I wonder if you should investigate why the xor variant doesn't trigger
for you?  On i?86 if-conversion probably solves your specific issue,
but I guess the initial expansion is where you could improve placement
of the 1 (after all, the 0 is after the jumps).

Richard.
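To make the xor/ior expansion mentioned in this message concrete, here is a small C sketch comparing it with a branch-per-half comparison. It is not GCC's expander code; the function names and the split of a 64-bit value into 32-bit halves are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form: test one half at a time and clear the result on a
   mismatch, similar in shape to the insn chain in the first message. */
int mask_branchy(uint64_t x, uint64_t a)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    int result = 1;

    if ((xh & ah) != ah || (xl & al) != al)
        result = 0;
    return result;
}

/* Branch-free form: each xor is zero exactly when that half matches,
   so the ior of both is zero exactly when the whole value matches. */
int mask_xor(uint64_t x, uint64_t a)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);

    return (((xl & al) ^ al) | ((xh & ah) ^ ah)) == 0;
}

/* Returns the common result, asserting that both forms agree. */
int mask_checked(uint64_t x, uint64_t a)
{
    int r = mask_xor(x, a);
    assert(r == mask_branchy(x, a));
    return r;
}
```

The branch-free form needs no constant 1 or 0 held in a register across the comparisons, which is why it sidesteps the placement problem on targets where it is used.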
* Re: Move insn out of the way
  2011-08-10 11:40 ` Richard Guenther
@ 2011-08-10 11:42   ` Richard Guenther
  2011-08-10 13:55     ` Paulo J. Matos
  2011-08-10 13:46   ` Paulo J. Matos
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-10 11:42 UTC
  To: Paulo J. Matos; +Cc: gcc, Vladimir N. Makarov

On Wed, Aug 10, 2011 at 1:40 PM, Richard Guenther <richard.guenther@gmail.com> wrote:
> so I wonder if you should investigate why the xor variant doesn't trigger
> for you?  On i?86 if-conversion probably solves your specific issue,
> but I guess the initial expansion is where you could improve placement
> of the 1 (after all, the 0 is after the jumps).

Oh, and I wonder if/why IRA can/does not rematerialize the constant
instead of spilling it.  Might be a cost issue that it doesn't delay
allocating a reg for 1 as that is cheap to reload (is it?).

Richard.
* Re: Move insn out of the way
  2011-08-10 11:42   ` Richard Guenther
@ 2011-08-10 13:55     ` Paulo J. Matos
       [not found]       ` <4E431BD8.8060705@redhat.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 13:55 UTC
  To: gcc

On 10/08/11 12:42, Richard Guenther wrote:
> Oh, and I wonder if/why IRA can/does not rematerialize the constant
> instead of spilling it.  Might be a cost issue that it doesn't delay
> allocating a reg for 1 as that is cheap to reload (is it?).

I would indeed expect IRA to move the constant assignment. However, it
doesn't. The cost of a constant as per RTX_COSTS is 1, since it takes
exactly one instruction to load it (optimizing for size).

--
PMatos
[parent not found: <4E431BD8.8060705@redhat.com>]
* Re: Move insn out of the way
       [not found]       ` <4E431BD8.8060705@redhat.com>
@ 2011-08-11  8:12         ` Paulo J. Matos
  2011-08-11  8:49           ` Richard Guenther
  2011-08-11 12:22         ` Paulo J. Matos
  1 sibling, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-11 8:12 UTC
  To: Vladimir Makarov; +Cc: gcc, Richard Guenther

On Thu, Aug 11, 2011 at 1:01 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> I can not reproduce the problem.  It would be nice to give all info (the
> code without includes and all options).  In this case I could have more info
> to say more definitely about the reason of the problem in IRA.

One of the issues with these problems of mine is that they are tied to
my backend, but not always. I think I managed to reproduce a similar
result in the avr backend using GCC 4.6.1.

test.c:

    long long x;
    _Bool mask (long long a)
    {
      return (x & a) == a;
    }

$ avr-cc1 -Os test.c

This generates the following assembler:

    mask:
            push r13
            push r14
            push r15
            push r16
            push r17
    /* prologue: function */
    /* frame size = 0 */
    /* stack size = 5 */
    .L__stack_usage = 5
            lds r14,x
            and r14,r18
            lds r15,x+1
            and r15,r19
            lds r16,x+2
            and r16,r20
            lds r17,x+3
            and r17,r21
            lds r27,x+4
            and r27,r22
            lds r26,x+5
            and r26,r23
            lds r31,x+6
            and r31,r24
            lds r30,x+7
            and r30,r25
            clr r13
            inc r13
            cp r14,r18
            brne .L3
            cp r15,r19
            brne .L3
            cp r16,r20
            brne .L3
            cp r17,r21
            brne .L3
            cp r27,r22
            brne .L3
            cp r26,r23
            brne .L3
            cp r31,r24
            brne .L3
            cpse r30,r25
    .L3:
            clr r13
    .L2:
            mov r24,r13
    /* epilogue start */
            pop r17
            pop r16
            pop r15
            pop r14
            pop r13
            ret
            .size   mask, .-mask
            .comm x,8,1

I can't tell how good or bad this assembler is, but I note a couple of
similarities with my backend's assembler output:
- It doesn't do if-conversion like Richard suggested. So (x & a) == a
  is not converted to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0.
- The assignment of 1 to r13 is done as 'clr r13; inc r13' _before_ the jumps.

The only other assignment to r13 is, as in my case, after the jumps:
'clr r13', which sets up the return value. I am not sure whether this
situation causes a lot of register pressure on AVR; I think it doesn't,
but it does in my backend. AVR has 32 registers to play with, mine can
only deal with 3 in the destination operand position.

--
PMatos
* Re: Move insn out of the way
  2011-08-11  8:12         ` Paulo J. Matos
@ 2011-08-11  8:49           ` Richard Guenther
  2011-08-11 14:27             ` Vladimir Makarov
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-11 8:49 UTC
  To: Paulo J. Matos; +Cc: Vladimir Makarov, gcc

On Thu, Aug 11, 2011 at 10:11 AM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
> The only assignment to r13 is as in my case after the jumps as 'clr
> 13' to set up the return value. I am not sure if this situation causes
> a lot of register pressure, however I think it doesn't in avr but it
> does in my backend. AVR has 32 registers to play with, mine can only
> deal with 3 in the destination operand position.

What I was expecting IRA to do is

 1) split live-range at kills, thus if a constant is assigned to a pseudo
    then the constant has its own live-range

 2) pseudos that are equal to a constant are assigned hard registers
    last if re-materializing them during reload is cheaper than spilling them

I suspect that 1) is not happening, I hope that 2) would happen already.

Correct?

Richard.
* Re: Move insn out of the way
  2011-08-11  8:49           ` Richard Guenther
@ 2011-08-11 14:27             ` Vladimir Makarov
  2011-08-12 10:01               ` Paulo J. Matos
  0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2011-08-11 14:27 UTC
  To: Richard Guenther; +Cc: Paulo J. Matos, gcc

On 08/11/2011 04:49 AM, Richard Guenther wrote:
> What I was expecting IRA to do is
>
>  1) split live-range at kills, thus if a constant is assigned to a pseudo
>     then the constant has its own live-range
>
>  2) pseudos that are equal to a constant are assigned hard registers
>     last if re-materializing them during reload is cheaper than spilling them
>
> I suspect that 1) is not happening, I hope that 2) would happen already.
>
> Correct?

Yes, that is mostly correct.  The first could be done by -fweb (if the
live range where the pseudo is equal to the constant is disjoint).  The
first could also be done by Jeff Law's project, which can provide
splitting not only on the border of loops.

Some problems might be solved even in LRA (a new project I am working
on), which would spill the pseudo assigned to the constant, assign the
hard registers to conflicting non-reload pseudos (spilled in IRA), and
inherit the hard register for the reload pseudos of the spilled pseudo
(if insns can not use the constant directly), achieving live range
splitting for the spilled pseudo this way.

The reload pass can not do this because it does not assign hard registers
to pseudos spilled in IRA when a hard register is freed by spilling a
conflicting pseudo for reloads.

Actually, the same problem exists in the old RA.  IRA differs from it
mostly by:

  o live range splitting at the most important program points (loop borders)
  o better coloring
  o better choosing hard registers
  o better coalescing
  o better communication with reload pass
* Re: Move insn out of the way
  2011-08-11 14:27             ` Vladimir Makarov
@ 2011-08-12 10:01               ` Paulo J. Matos
  2011-08-12 14:22                 ` Vladimir Makarov
  2011-08-12 16:12                 ` Jeff Law
  0 siblings, 2 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-12 10:01 UTC
  To: Vladimir Makarov; +Cc: Richard Guenther, gcc

On Thu, Aug 11, 2011 at 3:27 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> Yes, that is mostly correct.  The first could be done by -fweb (if the live
> range where the pseudo is equal to the constant is disjoint).  The first
> could be done also by Jeff Law's project which can provide splitting not
> only on the border of loops.

I was thinking that one possible solution in the short term would be
to add a new pass just before IRA which moves constant assignments.
An insn that assigns a constant to a register would be moved as far
as possible: to the place right before the use of the register or,
if there's no use of the register inside the current BB, to the
position of last instruction of the BB.

What do you think about this? Would this work? I know it's not very
general; however, it's useful at least for my backend to get this
right as soon as possible, because several of our size test failures
are a consequence of this problem.

Paulo Matos
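The pass proposed in this message can be sketched on a toy instruction list. This is a minimal illustration only: it does not use GCC's RTL or dataflow APIs, the `insn` struct and all function names are hypothetical, and it assumes a constant set may be moved past any insn that does not mention its destination register and that a jump, if present, is the last insn of the block.

```c
#include <assert.h>
#include <stddef.h>

#define NO_REG (-1)

/* Toy insn: at most one destination and two source registers. */
struct insn {
    int dest;          /* register written, or NO_REG */
    int src1, src2;    /* registers read, or NO_REG */
    int is_const_set;  /* nonzero for "dest <- constant" */
    int is_jump;       /* nonzero for a jump ending the block */
};

/* Nonzero if INSN reads or writes REG. */
int mentions_reg(const struct insn *insn, int reg)
{
    return insn->dest == reg || insn->src1 == reg || insn->src2 == reg;
}

/* Sink every constant set in BB[0..N) to just before the first later
   insn that mentions its destination, or to just before the trailing
   jump (or the end of the block) if nothing mentions it. */
void sink_const_sets(struct insn *bb, size_t n)
{
    assert(bb != NULL || n == 0);
    /* Walk backwards so that moving an insn never disturbs the
       positions of insns that are still to be visited. */
    for (size_t i = n; i-- > 0;) {
        if (!bb[i].is_const_set)
            continue;
        size_t stop = i + 1;
        while (stop < n && !bb[stop].is_jump
               && !mentions_reg(&bb[stop], bb[i].dest))
            stop++;
        /* Rotate bb[i] down to position stop - 1. */
        struct insn moved = bb[i];
        for (size_t j = i; j + 1 < stop; j++)
            bb[j] = bb[j + 1];
        bb[stop - 1] = moved;
    }
}

/* Example block: the constant set comes first, its only use is last. */
struct insn example_bb[] = {
    {24, NO_REG, NO_REG, 1, 0},   /* r24 <- 1      */
    {29, 27, NO_REG, 0, 0},       /* r29 <- f(r27) */
    {30, 28, NO_REG, 0, 0},       /* r30 <- f(r28) */
    {NO_REG, 24, NO_REG, 0, 0},   /* use of r24    */
};

/* Example block with no use of r24 and a trailing jump. */
struct insn example_jump_bb[] = {
    {24, NO_REG, NO_REG, 1, 0},   /* r24 <- 1        */
    {29, 27, NO_REG, 0, 0},       /* r29 <- f(r27)   */
    {NO_REG, 29, NO_REG, 0, 1},   /* jump testing r29 */
};
```

Calling `sink_const_sets(example_bb, 4)` leaves the block in the order r29, r30, r24, use. Because the sketch works on one block at a time, it would presumably not be enough for the insn chain in the first message, where the desired position lies in a later block.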
* Re: Move insn out of the way
  2011-08-12 10:01               ` Paulo J. Matos
@ 2011-08-12 14:22                 ` Vladimir Makarov
  2011-08-12 15:06                   ` Paulo J. Matos
  2011-08-12 16:12                 ` Jeff Law
  1 sibling, 1 reply; 15+ messages in thread
From: Vladimir Makarov @ 2011-08-12 14:22 UTC
  To: Paulo J. Matos; +Cc: Richard Guenther, gcc

On 08/12/2011 06:00 AM, Paulo J. Matos wrote:
> I was thinking that one possible solution in the short term would be
> to add a new pass just before IRA which does constant assignment
> moves. So, an insn where a register which is assigned a constant can
> be moved as much as possible to the place right before the use of the
> register or if there's no use of the register inside the current BB,
> it can be moved as the last instruction of the BB.
>
> What do you think about this? Would this work? I know it's not very
> general, however, it's useful at least for my backend to get this
> right as soon as possible due to several size test failures we have
> which are a consequence of this problem.

Sorry, Paulo.  I don't think it is a good idea to have such a general
pass.  Depending on its value, a constant could be prohibited from being
used in an insn.  Moving the constant assignment most probably worsens
the insn schedule on targets where the 1st insn scheduling is a default.
But moving the pass before the 1st insn scheduling could work if
register pressure sensitive insn scheduling is used.  Still, it is too
specialized a pass.  I think register pressure relief as it is described
in Simpson's thesis would be a more general approach.

But to be honest, I think the best solution would be in RA, because it
is dealing with insn constraints and costs.  I'll think about solving
this problem in RA.
* Re: Move insn out of the way
  2011-08-12 14:22                 ` Vladimir Makarov
@ 2011-08-12 15:06                   ` Paulo J. Matos
  0 siblings, 0 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-12 15:06 UTC
  To: Vladimir Makarov; +Cc: Richard Guenther, gcc

On Fri, Aug 12, 2011 at 3:21 PM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> Sorry, Paulo.  I don't think it is a good idea to have such a general pass.

Thanks for the observation and the points you made. I understand and
agree that this should be sorted at the IRA level. What I might do in
the meantime is to implement such a pass on my port of GCC until it is
sorted upstream.

> But to be honest, I think, the best solution would be in RA because it is
> dealing with insn constraints and costs.  I'll think about solving this
> problem in RA.

Thanks! I will be eagerly waiting for an update.

Cheers,

--
PMatos
* Re: Move insn out of the way
  2011-08-12 10:01               ` Paulo J. Matos
  2011-08-12 14:22                 ` Vladimir Makarov
@ 2011-08-12 16:12                 ` Jeff Law
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff Law @ 2011-08-12 16:12 UTC
  To: Paulo J. Matos; +Cc: Vladimir Makarov, Richard Guenther, gcc

On 08/12/11 04:00, Paulo J. Matos wrote:
> I was thinking that one possible solution in the short term would be
> to add a new pass just before IRA which does constant assignment
> moves. So, an insn where a register which is assigned a constant can
> be moved as much as possible to the place right before the use of
> the register or if there's no use of the register inside the current
> BB, it can be moved as the last instruction of the BB.

I thought we already had code to do this in response to a pseudo not
getting a hard reg and the pseudo has an appropriate REG_EQUIV note on
its assignment insn.

jeff
* Re: Move insn out of the way
       [not found]       ` <4E431BD8.8060705@redhat.com>
  2011-08-11  8:12         ` Paulo J. Matos
@ 2011-08-11 12:22         ` Paulo J. Matos
  1 sibling, 0 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-11 12:22 UTC
  To: Vladimir Makarov; +Cc: gcc, Richard Guenther

On Thu, Aug 11, 2011 at 1:01 AM, Vladimir Makarov <vmakarov@redhat.com> wrote:
> I can not reproduce the problem.  It would be nice to give all info (the
> code without includes and all options).  In this case I could have more info
> to say more definitely about the reason of the problem in IRA.

Let me add another example using the avr backend that produces really
strange code. The code has a similar nature:

    _Bool simple(unsigned long *a, unsigned long *b)
    {
      return *a == *b;
    }

This generates the following assembler when compiled with -Os in gcc-4.6:

    simple:
            push r16
            push r17
    /* prologue: function */
    /* frame size = 0 */
    /* stack size = 2 */
    .L__stack_usage = 2
            mov r30,r24
            mov r31,r25
            ldi r24,lo8(1)
            ld r16,Z
            ldd r17,Z+1
            ldd r18,Z+2
            ldd r19,Z+3
            mov r30,r22
            mov r31,r23
            ld r20,Z
            ldd r21,Z+1
            ldd r22,Z+2
            ldd r23,Z+3
            cp r16,r20
            cpc r17,r21
            cpc r18,r22
            cpc r19,r23
            breq .L2
            ldi r24,lo8(0)
    .L2:
    /* epilogue start */
            pop r17
            pop r16
            ret

Again, the placement of the return value is not very relevant here,
because I guess there's not much register pressure. But when there is,
as on my arch, the resulting code grows by 5 words simply due to the
position of the constant assignment. In the above case, the constant
assignment is the 5th instruction, when it could be much closer to the
end.

I am interested in knowing whether this is indeed an IRA problem and I
have to wait for a fix, or whether there's something I need to do in
the backend to tell GCC to delay the constant assignment.

Cheers,

--
PMatos
* Re: Move insn out of the way
  2011-08-10 11:40 ` Richard Guenther
  2011-08-10 11:42   ` Richard Guenther
@ 2011-08-10 13:46   ` Paulo J. Matos
  2011-08-10 13:51     ` Richard Guenther
  1 sibling, 1 reply; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 13:46 UTC
  To: gcc

On 10/08/11 12:40, Richard Guenther wrote:
> On x86 we expand the code to ((xl & al) ^ al) | ((xh & ah) ^ ah) == 0
> which is then if-converted.
> [...]
> so I wonder if you should investigate why the xor variant doesn't trigger
> for you?

I can reproduce this result in GCC 4.6.1 for x86.
I can't understand what you mean by this though. From inspecting the
logs it seems that the if-conversion is done manually at expand time.
The final pass before expand shows the original (x & a) == a; however,
after expand the RTL already contains xor, ior, etc. So I guess I would
need to do something similar in my backend. I can't, however, find where
in i386(.md|.c) this is actually happening.

> On i?86 if-conversion probably solves your specific issue,
> but I guess the initial expansion is where you could improve placement
> of the 1 (after all, the 0 is after the jumps).

This is happening on my own backend, so I guess anything that is
implemented to do if-conversion on i386 also needs to be implemented on
my backend. Can you point me to the code on i386 so I can take a look at
it?

Cheers,

--
PMatos
* Re: Move insn out of the way
  2011-08-10 13:46   ` Paulo J. Matos
@ 2011-08-10 13:51     ` Richard Guenther
  2011-08-10 14:14       ` Paulo J. Matos
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Guenther @ 2011-08-10 13:51 UTC
  To: Paulo J. Matos; +Cc: gcc

On Wed, Aug 10, 2011 at 3:46 PM, Paulo J. Matos <paulo@matos-sorge.com> wrote:
> This is happening on my own backend so I guess anything that is implemented
> to do if-conversion on i386 needs to be implemented also on my backend. Can
> you point me to the code on i386 so I can take a look at it?

I think it's all happening in generic code via do_store_flag.

Richard.
* Re: Move insn out of the way
  2011-08-10 13:51     ` Richard Guenther
@ 2011-08-10 14:14       ` Paulo J. Matos
  0 siblings, 0 replies; 15+ messages in thread
From: Paulo J. Matos @ 2011-08-10 14:14 UTC
  To: gcc

On 10/08/11 14:51, Richard Guenther wrote:
> I think it's all happening in generic code via do_store_flag.

Ah, now I understand your previous question. I wonder if it's not
triggered because I don't have cstore<mode>4 defined. It might be that,
but I have to look deeper.

--
PMatos