* new unroller vs ppc
@ 2003-03-21 2:18 Dale Johannesen
2003-03-21 8:40 ` Zdenek Dvorak
0 siblings, 1 reply; 13+ messages in thread
From: Dale Johannesen @ 2003-03-21 2:18 UTC (permalink / raw)
To: Zdenek Dvorak, gcc; +Cc: Dale Johannesen
fyi, the new loop unroller doesn't seem to work very well on ppc:
int a[100];
int foo() {
  int i;
  for ( i=0; i<100; i++ )
    a[i] = 6;
}
(3.3 compiler)
(each of the store-with-update insns is dependent on the previous one, so
you can only issue one per cycle. This is not so good.)
L29:
stw r0,0(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
stwu r0,4(r9)
addi r9,r9,4
bdnz L29
(3.4 -fold-unroll-loops)
(This is much better. 25 times seems like a bit much,
but otherwise this is optimal. I know, I can tell it to unroll less.)
L58:
stw r0,0(r9)
stw r0,4(r9)
stw r0,8(r9)
stw r0,12(r9)
stw r0,16(r9)
stw r0,20(r9)
stw r0,24(r9)
stw r0,28(r9)
stw r0,32(r9)
stw r0,36(r9)
stw r0,40(r9)
stw r0,44(r9)
stw r0,48(r9)
stw r0,52(r9)
stw r0,56(r9)
stw r0,60(r9)
stw r0,64(r9)
stw r0,68(r9)
stw r0,72(r9)
stw r0,76(r9)
stw r0,80(r9)
stw r0,84(r9)
stw r0,88(r9)
stw r0,92(r9)
stw r0,96(r9)
addi r9,r9,100
bdnz L58
(3.4 -funroll-all-loops)
(it will not unroll at all with -funroll-loops)
b L8
L19:
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdz L17
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdz L17
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdz L17
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdz L17
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdz L17
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
bdz L17
stwx r11,r10,r0
bdz L17
L8:
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
bdnz L19
(3.4 -fno-branch-count-reg)
L5:
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
stwx r11,r10,r0
slwi r0,r9,2
addi r9,r9,1
cmpwi cr7,r9,99
stwx r11,r10,r0
ble+ cr7,L5
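The three listings above differ mainly in how each unrolled store gets its
address. A rough C analogue of the three shapes is sketched below. The
function names and the unroll factor of 4 are illustrative assumptions, not
anything GCC emits, and an optimizing compiler is free to turn any one of
these shapes into another.

```c
#include <assert.h>
#include <string.h>

#define N 100          /* array size from the test case */
#define UNROLL 4       /* illustrative unroll factor (assumption) */

/* Shape 1: the pointer is bumped by every store, like the stwu chain.
   Each store needs the pointer value produced by the previous one. */
static void fill_update_chain(int *p, int v)
{
    assert(N % UNROLL == 0);
    for (int n = 0; n < N; n += UNROLL) {
        *p++ = v;
        *p++ = v;
        *p++ = v;
        *p++ = v;
    }
}

/* Shape 2: constant offsets from one base that is bumped once per
   iteration, like the stw 0(r9), 4(r9), ... listing.  The stores do
   not depend on each other. */
static void fill_base_offsets(int *p, int v)
{
    for (int n = 0; n < N; n += UNROLL) {
        p[0] = v;
        p[1] = v;
        p[2] = v;
        p[3] = v;
        p += UNROLL;
    }
}

/* Shape 3: one counter incremented after every copy and re-scaled into
   an address each time, like the addi/slwi/stwx sequence. */
static void fill_reindex(int *base, int v)
{
    int i = 0;
    while (i < N) {
        base[i] = v; i++;
        base[i] = v; i++;
        base[i] = v; i++;
        base[i] = v; i++;
    }
}

/* Returns 1 if all three shapes store the same values. */
static int shapes_agree(void)
{
    int x[N], y[N], z[N];
    fill_update_chain(x, 6);
    fill_base_offsets(y, 6);
    fill_reindex(z, 6);
    return memcmp(x, y, sizeof x) == 0 && memcmp(y, z, sizeof y) == 0;
}
```

All three fill the array identically; only the address dependencies differ.
In the first shape every store waits on the pointer written by the previous
one, in the second all stores hang off a single base, and in the third every
copy recomputes its index from one shared counter.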
* Re: new unroller vs ppc
2003-03-21 2:18 new unroller vs ppc Dale Johannesen
@ 2003-03-21 8:40 ` Zdenek Dvorak
2003-03-26 20:25 ` David Edelsohn
2003-03-29 8:43 ` David Edelsohn
0 siblings, 2 replies; 13+ messages in thread
From: Zdenek Dvorak @ 2003-03-21 8:40 UTC (permalink / raw)
To: Dale Johannesen; +Cc: gcc
Hello,
> fyi, the new loop unroller doesn't seem to work very well on ppc:
[snip]
Does this patch:
http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01564.html
help?
Zdenek
* Re: new unroller vs ppc
2003-03-21 8:40 ` Zdenek Dvorak
@ 2003-03-26 20:25 ` David Edelsohn
2003-03-29 8:43 ` David Edelsohn
1 sibling, 0 replies; 13+ messages in thread
From: David Edelsohn @ 2003-03-26 20:25 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: Dale Johannesen, gcc
I will try with your subreg patch, but just to let you know the
magnitude of the performance degradation created by the new unroller code
for PowerPC:
Base = old unroller, Peak = new unroller
Benchmark       Ref time  Base run  Base ratio  Ref time  Peak run  Peak ratio
168.wupwise         1600       262        611*      1600       270        593*
171.swim            3100       534        580*      3100       520        596*
172.mgrid           1800       317        567*      1800       518        347*
177.mesa            1400       293        477*      1400       308        454*
179.art             2600       233       1117*      2600       270        962*
183.equake          1300       129       1009*      1300       141        919*
188.ammp            2200       628        350*      2200       595        370*
200.sixtrack        1100       683        161*      1100       662        166*
181.mcf             1800       217        831*      1800       216        832*
197.parser          1800       592        304*      1800       579        311*
255.vortex          1900       289        657*      1900       287        662*
300.twolf           3000       465        645*      3000       469        640*
David
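For scale: a SPEC ratio is the reference time divided by the run time, times
100, so higher is better. The sketch below computes the relative change from
Base to Peak for a few of the ratios printed in the table. The struct and
function names are made up for illustration, and since the printed run times
appear to be rounded, the sketch works from the printed ratios rather than
recomputing them.

```c
#include <assert.h>
#include <stddef.h>

/* A few rows copied from the table: Base ratio (old unroller) and
   Peak ratio (new unroller). */
struct spec_row {
    const char *name;
    int base_ratio;
    int peak_ratio;
};

static const struct spec_row rows[] = {
    { "172.mgrid",   567, 347 },
    { "179.art",    1117, 962 },
    { "183.equake", 1009, 919 },
    { "188.ammp",    350, 370 },
};

/* Relative change in percent; negative means Peak is slower than Base. */
static double percent_change(int base_ratio, int peak_ratio)
{
    assert(base_ratio > 0);
    return 100.0 * (peak_ratio - base_ratio) / base_ratio;
}

/* Index into rows[] of the benchmark with the largest slowdown. */
static int worst_row(void)
{
    int worst = 0;
    for (size_t i = 1; i < sizeof rows / sizeof rows[0]; i++) {
        if (percent_change(rows[i].base_ratio, rows[i].peak_ratio)
            < percent_change(rows[worst].base_ratio, rows[worst].peak_ratio))
            worst = (int)i;
    }
    return worst;
}
```

With these figures, 172.mgrid drops by roughly 39% and 179.art by roughly
14%, while 188.ammp gains about 6%.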
* Re: new unroller vs ppc
2003-03-21 8:40 ` Zdenek Dvorak
2003-03-26 20:25 ` David Edelsohn
@ 2003-03-29 8:43 ` David Edelsohn
2003-03-29 14:04 ` Zdenek Dvorak
2003-04-07 2:56 ` Zdenek Dvorak
1 sibling, 2 replies; 13+ messages in thread
From: David Edelsohn @ 2003-03-29 8:43 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: Dale Johannesen, gcc
>>>>> Zdenek Dvorak writes:
Zdenek> Does this patch:
Zdenek> http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01564.html
Zdenek> help?
No, not really:
TEST            old-unroller   new-unroller   new-unroller w/patch
168.wupwise              630            605                    619
171.swim                 596            592                    568
172.mgrid                567            347                    346
177.mesa                 469            465                    460
179.art                 1154            992                    999
183.equake              1018            927                    957
188.ammp                 363            370                    369
200.sixtrack             209            110                     98
181.mcf                  834            824                    819
197.parser               313            312                    312
255.vortex               662            659                    659
300.twolf                650            646                    659
David
* Re: new unroller vs ppc
2003-03-29 8:43 ` David Edelsohn
@ 2003-03-29 14:04 ` Zdenek Dvorak
2003-04-07 2:56 ` Zdenek Dvorak
1 sibling, 0 replies; 13+ messages in thread
From: Zdenek Dvorak @ 2003-03-29 14:04 UTC (permalink / raw)
To: David Edelsohn; +Cc: Dale Johannesen, gcc
Hello,
> Zdenek> Does this patch:
> Zdenek> http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01564.html
> Zdenek> help?
>
> No, not really:
>
> TEST            old-unroller   new-unroller   new-unroller w/patch
> 168.wupwise              630            605                    619
> 171.swim                 596            592                    568
> 172.mgrid                567            347                    346
> 177.mesa                 469            465                    460
> 179.art                 1154            992                    999
> 183.equake              1018            927                    957
> 188.ammp                 363            370                    369
> 200.sixtrack             209            110                     98
> 181.mcf                  834            824                    819
> 197.parser               313            312                    312
> 255.vortex               662            659                    659
> 300.twolf                650            646                    659
OK, I will have a look at this sometime next week.
Zdenek
* Re: new unroller vs ppc
2003-03-29 8:43 ` David Edelsohn
2003-03-29 14:04 ` Zdenek Dvorak
@ 2003-04-07 2:56 ` Zdenek Dvorak
2003-04-07 16:06 ` Jan Hubicka
2003-04-07 20:20 ` Richard Henderson
1 sibling, 2 replies; 13+ messages in thread
From: Zdenek Dvorak @ 2003-04-07 2:56 UTC (permalink / raw)
To: David Edelsohn; +Cc: Dale Johannesen, gcc, jh, rth
Hello,
> Zdenek> Does this patch:
> Zdenek> http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01564.html
> Zdenek> help?
>
> No, not really:
[snip]
Looking at the assembler that comes out of the cross-compiler, I see two
problems:
1) we do not do induction variable splitting; the webizer pass
http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00501.html
should take care of this
2) this code in cse.c
    /* Don't associate these operations if they are a PLUS with the
       same constant and it is a power of two. These might be doable
       with a pre- or post-increment. Similarly for two subtracts of
       identical powers of two with post decrement. */
    if (code == PLUS && INTVAL (const_arg1) == INTVAL (inner_const)
        && ((HAVE_PRE_INCREMENT
             && exact_log2 (INTVAL (const_arg1)) >= 0)
            || (HAVE_POST_INCREMENT
                && exact_log2 (INTVAL (const_arg1)) >= 0)
            || (HAVE_PRE_DECREMENT
                && exact_log2 (- INTVAL (const_arg1)) >= 0)
            || (HAVE_POST_DECREMENT
                && exact_log2 (- INTVAL (const_arg1)) >= 0)))
      break;
prevents us from combining the increments of induction variables even
with webizer. I think the correct solution is to split
pre/post modify transformation out of flow.c (relatively easy,
I have the patch somewhere), to run it before cse and to cancel this
code.
Zdenek
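To make the effect of the quoted cse.c test concrete, here is a standalone
sketch of the same predicate. It is not GCC code: exact_log2_sketch and
skips_association are hypothetical names, the four have_* flags stand in for
the HAVE_* target macros, and the power-of-two helper only imitates the way
exact_log2 is used in that condition. The sketch also assumes the operation
is already known to be a PLUS.

```c
#include <assert.h>

/* Returns log2(x) when x is a positive power of two, otherwise -1.
   Stand-in for the way exact_log2 is used in the quoted test. */
static int exact_log2_sketch(long x)
{
    int n = 0;
    if (x <= 0 || (x & (x - 1)) != 0)
        return -1;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Returns 1 when the quoted condition would 'break', i.e. when
   (reg + inner_const) + const_arg1 is left un-associated.
   Assumes the operation is a PLUS and that const_arg1 != LONG_MIN,
   so that the negation below is well defined. */
static int skips_association(long const_arg1, long inner_const,
                             int have_pre_inc, int have_post_inc,
                             int have_pre_dec, int have_post_dec)
{
    if (const_arg1 != inner_const)
        return 0;
    return (have_pre_inc && exact_log2_sketch(const_arg1) >= 0)
        || (have_post_inc && exact_log2_sketch(const_arg1) >= 0)
        || (have_pre_dec && exact_log2_sketch(-const_arg1) >= 0)
        || (have_post_dec && exact_log2_sketch(-const_arg1) >= 0);
}
```

On a target that advertises pre-increment, skips_association(4, 4, 1, 0, 0, 0)
returns 1, so two consecutive additions of 4 are not folded into a single
addition of 8. Consecutive additions of the element size are exactly what an
unrolled loop over an int array produces.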
* Re: new unroller vs ppc
2003-04-07 2:56 ` Zdenek Dvorak
@ 2003-04-07 16:06 ` Jan Hubicka
2003-04-07 20:20 ` Richard Henderson
1 sibling, 0 replies; 13+ messages in thread
From: Jan Hubicka @ 2003-04-07 16:06 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: David Edelsohn, Dale Johannesen, gcc, jh, rth
> Hello,
>
> > Zdenek> Does this patch:
> > Zdenek> http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01564.html
> > Zdenek> help?
> >
> > No, not really:
> [snip]
>
> Looking at the assembler that comes out of the cross-compiler, I see two
> problems:
>
> 1) we do not do induction variable splitting; the webizer pass
> http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00501.html
> should take care of this
This patch appears to be having trouble getting in; however, I hope it will
one day. Or is there something particularly wrong with it?
> 2) this code in cse.c
>
>     /* Don't associate these operations if they are a PLUS with the
>        same constant and it is a power of two. These might be doable
>        with a pre- or post-increment. Similarly for two subtracts of
>        identical powers of two with post decrement. */
>
>     if (code == PLUS && INTVAL (const_arg1) == INTVAL (inner_const)
>         && ((HAVE_PRE_INCREMENT
>              && exact_log2 (INTVAL (const_arg1)) >= 0)
>             || (HAVE_POST_INCREMENT
>                 && exact_log2 (INTVAL (const_arg1)) >= 0)
>             || (HAVE_PRE_DECREMENT
>                 && exact_log2 (- INTVAL (const_arg1)) >= 0)
>             || (HAVE_POST_DECREMENT
>                 && exact_log2 (- INTVAL (const_arg1)) >= 0)))
>       break;
>
> prevents us from combining the increments of induction variables even
> with webizer. I think the correct solution is to split
> pre/post modify transformation out of flow.c (relatively easy,
> I have the patch somewhere), to run it before cse and to cancel this
> code.
I guess you won't be able to cancel it completely, only for the CSE2 pass.
The passes before flow don't expect autoincrements to occur.
Otherwise this sounds like a sane plan to me.
Honza
>
> Zdenek
* Re: new unroller vs ppc
2003-04-07 2:56 ` Zdenek Dvorak
2003-04-07 16:06 ` Jan Hubicka
@ 2003-04-07 20:20 ` Richard Henderson
2003-04-07 20:25 ` Zdenek Dvorak
1 sibling, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2003-04-07 20:20 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: David Edelsohn, Dale Johannesen, gcc, jh
On Mon, Apr 07, 2003 at 12:39:38AM +0200, Zdenek Dvorak wrote:
> I think the correct solution is to split pre/post modify transformation
> out of flow.c (relatively easy, I have the patch somewhere), to run it
> before cse and to cancel this code.
I don't believe that will work; cse is not prepared for
autoinc operands, iirc.
r~
* Re: new unroller vs ppc
2003-04-07 20:20 ` Richard Henderson
@ 2003-04-07 20:25 ` Zdenek Dvorak
2003-04-07 20:26 ` David Edelsohn
2003-04-07 20:32 ` Richard Henderson
0 siblings, 2 replies; 13+ messages in thread
From: Zdenek Dvorak @ 2003-04-07 20:25 UTC (permalink / raw)
To: Richard Henderson, David Edelsohn, Dale Johannesen, gcc, jh
Hello,
> > I think the correct solution is to split pre/post modify transformation
> > out of flow.c (relatively easy, I have the patch somewhere), to run it
> > before cse and to cancel this code.
>
> I don't believe that will work; cse is not prepared for
> autoinc operands, iirc.
Then what would you suggest to do? Cse apparently produces suboptimal
code in many cases due to this (I guess most of the increments it
ignores this way are not converted to autoincs anyway). I don't see
other way than to teach cse to cope with autoincs (but I am not brave
enough to attempt to do it).
Zdenek
* Re: new unroller vs ppc
2003-04-07 20:25 ` Zdenek Dvorak
@ 2003-04-07 20:26 ` David Edelsohn
2003-04-07 20:28 ` Jan Hubicka
2003-04-07 20:32 ` Richard Henderson
1 sibling, 1 reply; 13+ messages in thread
From: David Edelsohn @ 2003-04-07 20:26 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: Richard Henderson, Dale Johannesen, gcc, jh
>>>>> Zdenek Dvorak writes:
Zdenek> Then what would you suggest to do? Cse apparently produces suboptimal
Zdenek> code in many cases due to this (I guess most of the increments it
Zdenek> ignores this way are not converted to autoincs anyway). I don't see
Zdenek> other way than to teach cse to cope with autoincs (but I am not brave
Zdenek> enough to attempt to do it).
How was the old unroller able to produce better code with this
same impediment?
David
* Re: new unroller vs ppc
2003-04-07 20:26 ` David Edelsohn
@ 2003-04-07 20:28 ` Jan Hubicka
0 siblings, 0 replies; 13+ messages in thread
From: Jan Hubicka @ 2003-04-07 20:28 UTC (permalink / raw)
To: David Edelsohn; +Cc: Zdenek Dvorak, Richard Henderson, Dale Johannesen, gcc, jh
> >>>>> Zdenek Dvorak writes:
>
> Zdenek> Then what would you suggest to do? Cse apparently produces suboptimal
> Zdenek> code in many cases due to this (I guess most of the increments it
> Zdenek> ignores this way are not converted to autoincs anyway). I don't see
> Zdenek> other way than to teach cse to cope with autoincs (but I am not brave
> Zdenek> enough to attempt to do it).
>
> How was the old unroller able to produce better code with this
> same impediment?
It is hooked into the strength reduction pass, which knows about
autoincrements and predicts when they are available.
Honza
>
> David
* Re: new unroller vs ppc
2003-04-07 20:25 ` Zdenek Dvorak
2003-04-07 20:26 ` David Edelsohn
@ 2003-04-07 20:32 ` Richard Henderson
2003-04-07 20:49 ` tm_gccmail
1 sibling, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2003-04-07 20:32 UTC (permalink / raw)
To: Zdenek Dvorak; +Cc: David Edelsohn, Dale Johannesen, gcc, jh
On Mon, Apr 07, 2003 at 10:02:45PM +0200, Zdenek Dvorak wrote:
> Then what would you suggest to do?
Long term I think we should do a new autoinc pass. Currently
what we have is split between flow and regmove.
It would also be nice if it was able to interact with the
scheduler. A more aggressive autoinc pass can constrain the
scheduler too much and reduce overall performance. This is
especially visible on ia64.
r~
* Re: new unroller vs ppc
2003-04-07 20:32 ` Richard Henderson
@ 2003-04-07 20:49 ` tm_gccmail
0 siblings, 0 replies; 13+ messages in thread
From: tm_gccmail @ 2003-04-07 20:49 UTC (permalink / raw)
To: Richard Henderson; +Cc: Zdenek Dvorak, David Edelsohn, Dale Johannesen, gcc, jh
On Mon, 7 Apr 2003, Richard Henderson wrote:
> On Mon, Apr 07, 2003 at 10:02:45PM +0200, Zdenek Dvorak wrote:
> > Then what would you suggest to do?
>
> Long term I think we should do a new autoinc pass. Currently
> what we have is split between flow and regmove.
>
> It would also be nice if it was able to interact with the
> scheduler. A more aggressive autoinc pass can constrain the
> scheduler too much and reduce overall performance. This is
> especially visible on ia64.
>
>
> r~
To chime in with my two cents' worth:
We need to have an address generation pass which handles all the
idiosyncrasies of addressing in one pass, including:
1. Processors with only short displacements in indirect addressing modes.
2. Processors with both short and long displacements in indirect
addressing modes.
3. Processors with no offsettable addressing modes.
4. Processors with autoincrement/autodecrement.
5. Processors with different combinations of the above for different
modes.
I don't think all these cases can be handled cleanly with GCSE as has been
suggested recently.
Toshi