public inbox for gcc@gcc.gnu.org
* new unroller vs ppc
From: Dale Johannesen @ 2003-03-21  2:18 UTC
  To: Zdenek Dvorak, gcc; +Cc: Dale Johannesen

fyi, the new loop unroller doesn't seem to work very well on ppc:

int a[100];
int foo() {
   int i;
   for ( i=0; i<100; i++ )
      a[i] = 6;
}

(3.3 compiler)
(each of the store-with-update insns is dependent on the previous one, so
you can only issue one per cycle.  This is not so good.)
L29:
         stw r0,0(r9)
         stwu r0,4(r9)
         stwu r0,4(r9)
         stwu r0,4(r9)
         stwu r0,4(r9)
         stwu r0,4(r9)
         stwu r0,4(r9)
         stwu r0,4(r9)
         stwu r0,4(r9)
         stwu r0,4(r9)
         addi r9,r9,4
         bdnz L29

(3.4 -fold-unroll-loops)
(This is much better.  25 times seems like a bit much,
but otherwise this is optimal.  I know, I can tell it to unroll less.)
L58:
         stw r0,0(r9)
         stw r0,4(r9)
         stw r0,8(r9)
         stw r0,12(r9)
         stw r0,16(r9)
         stw r0,20(r9)
         stw r0,24(r9)
         stw r0,28(r9)
         stw r0,32(r9)
         stw r0,36(r9)
         stw r0,40(r9)
         stw r0,44(r9)
         stw r0,48(r9)
         stw r0,52(r9)
         stw r0,56(r9)
         stw r0,60(r9)
         stw r0,64(r9)
         stw r0,68(r9)
         stw r0,72(r9)
         stw r0,76(r9)
         stw r0,80(r9)
         stw r0,84(r9)
         stw r0,88(r9)
         stw r0,92(r9)
         stw r0,96(r9)
         addi r9,r9,100
         bdnz L58
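
For readers less familiar with PowerPC assembly, the difference between the two listings above can be sketched in C. This is only an illustration: the function names are made up, and the unroll factor of 4 is chosen for brevity rather than taken from either compiler. The first function chains every store through the pointer update, as stwu does; the second uses constant offsets from one base and bumps the base once per iteration.

```c
int a[100];

/* Update-form shape: each store also advances the pointer, so every
   store has to wait for the pointer produced by the previous one. */
void fill_update_form(void)
{
    int *p = a;
    for (int n = 0; n < 25; n++) {
        *p = 6;      /* like: stw  r0,0(r9) */
        *++p = 6;    /* like: stwu r0,4(r9) */
        *++p = 6;
        *++p = 6;
        p++;         /* like: addi r9,r9,4 */
    }
}

/* Displacement shape: all stores use the same base with constant
   offsets, so they do not depend on each other. */
void fill_displacement_form(void)
{
    int *p = a;
    for (int n = 0; n < 25; n++) {
        p[0] = 6;    /* like: stw r0,0(r9)  */
        p[1] = 6;    /* like: stw r0,4(r9)  */
        p[2] = 6;    /* like: stw r0,8(r9)  */
        p[3] = 6;    /* like: stw r0,12(r9) */
        p += 4;      /* one base update per iteration */
    }
}
```

Both functions fill the same 100 elements; only the dependency structure between the stores differs.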

(3.4 -funroll-all-loops)
(it will not unroll at all with -funroll-loops)
         b L8
L19:
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         bdz L17
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         bdz L17
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         bdz L17
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         bdz L17
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         bdz L17
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         addi r9,r9,1
         bdz L17
         stwx r11,r10,r0
         bdz L17
L8:
         slwi r0,r9,2
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         bdnz L19

(3.4 -fno-branch-count-reg)
L5:
         slwi r0,r9,2
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         addi r9,r9,1
         stwx r11,r10,r0
         slwi r0,r9,2
         addi r9,r9,1
         cmpwi cr7,r9,99
         stwx r11,r10,r0
         ble+ cr7,L5
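
The pattern in the last two listings can be approximated in C as below. This is a rough sketch, not compiler output: the function names and the unroll factor of 4 are assumptions. In the first function every copy of the body recomputes its byte offset from the shared index and then increments it, much like the slwi/addi/stwx triples. The second shows the shape one would hope for, with each copy using its own constant offset and the index advanced once.

```c
#include <stddef.h>

int a[100];

/* Shared-index shape: each copy scales the index into a byte offset,
   stores through base + offset, then increments the index. */
void fill_shared_index(void)
{
    char *base = (char *)a;
    int i = 0;
    do {
        *(int *)(base + (size_t)i * sizeof(int)) = 6;  /* scale + indexed store */
        i++;                                           /* index update */
        *(int *)(base + (size_t)i * sizeof(int)) = 6;
        i++;
        *(int *)(base + (size_t)i * sizeof(int)) = 6;
        i++;
        *(int *)(base + (size_t)i * sizeof(int)) = 6;
        i++;
    } while (i <= 99);   /* like: cmpwi cr7,r9,99 / ble+ */
}

/* Hoped-for shape: one index update per iteration, with each copy
   addressing its element at a constant distance from the index. */
void fill_split_index(void)
{
    int i = 0;
    do {
        a[i + 0] = 6;
        a[i + 1] = 6;
        a[i + 2] = 6;
        a[i + 3] = 6;
        i += 4;
    } while (i <= 99);
}
```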

* Re: new unroller vs ppc
From: Zdenek Dvorak @ 2003-03-21  8:40 UTC
  To: Dale Johannesen; +Cc: gcc

Hello,

> fyi, the new loop unroller doesn't seem to work very well on ppc:

[snip]

Does this patch:
http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01564.html
help?

Zdenek

* Re: new unroller vs ppc
From: David Edelsohn @ 2003-03-26 20:25 UTC
  To: Zdenek Dvorak; +Cc: Dale Johannesen, gcc

	I will try with your subreg patch, but just to let you know the
magnitude of the performance degradation created by the new unroller code
for PowerPC:

Base = old unroller, Peak = new unroller

                     Base      Base      Base      Peak      Peak      Peak
   Benchmark     Ref time  Run time     Ratio  Ref time  Run time     Ratio
   168.wupwise       1600       262       611*     1600       270       593*
   171.swim          3100       534       580*     3100       520       596*
   172.mgrid         1800       317       567*     1800       518       347*
   177.mesa          1400       293       477*     1400       308       454*
   179.art           2600       233      1117*     2600       270       962*
   183.equake        1300       129      1009*     1300       141       919*
   188.ammp          2200       628       350*     2200       595       370*
   200.sixtrack      1100       683       161*     1100       662       166*
   181.mcf           1800       217       831*     1800       216       832*
   197.parser        1800       592       304*     1800       579       311*
   255.vortex        1900       289       657*     1900       287       662*
   300.twolf         3000       465       645*     3000       469       640*

David

* Re: new unroller vs ppc
From: David Edelsohn @ 2003-03-29  8:43 UTC
  To: Zdenek Dvorak; +Cc: Dale Johannesen, gcc

>>>>> Zdenek Dvorak writes:

Zdenek> Does this patch:
Zdenek> http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01564.html
Zdenek> help?

	No, not really:

TEST		   old-unroller    new-unroller    new-unroller w/patch
168.wupwise		630		605		619
171.swim		596		592		568
172.mgrid		567		347		346
177.mesa		469		465		460
179.art			1154		992		999
183.equake		1018		927		957
188.ammp		363		370		369
200.sixtrack		209		110		 98
181.mcf			834		824		819
197.parser		313		312		312
255.vortex		662		659		659
300.twolf		650		646		659
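
As a quick way to read the table, the relative change between two columns can be computed as below. This is just arithmetic on the old-unroller and new-unroller numbers quoted above; the struct and function names are illustrative, and it assumes the scores are SPEC ratios where higher is better.

```c
#include <stddef.h>

struct spec_row {
    const char *name;
    double old_score;   /* old-unroller column */
    double new_score;   /* new-unroller column */
};

/* Values copied from the first two columns of the table above. */
static const struct spec_row rows[] = {
    {"168.wupwise", 630, 605}, {"171.swim", 596, 592},
    {"172.mgrid", 567, 347},   {"177.mesa", 469, 465},
    {"179.art", 1154, 992},    {"183.equake", 1018, 927},
    {"188.ammp", 363, 370},    {"200.sixtrack", 209, 110},
    {"181.mcf", 834, 824},     {"197.parser", 313, 312},
    {"255.vortex", 662, 659},  {"300.twolf", 650, 646},
};

/* Percent change of new_score relative to old_score (negative = slower). */
double percent_change(double old_score, double new_score)
{
    return 100.0 * (new_score - old_score) / old_score;
}

/* Index of the row with the most negative percent change. */
size_t worst_regression(void)
{
    size_t worst = 0;
    for (size_t i = 1; i < sizeof rows / sizeof rows[0]; i++) {
        if (percent_change(rows[i].old_score, rows[i].new_score) <
            percent_change(rows[worst].old_score, rows[worst].new_score))
            worst = i;
    }
    return worst;
}
```

By this measure the largest drops from old-unroller to new-unroller are 200.sixtrack (about 47%) and 172.mgrid (about 39%); most of the other benchmarks move by a few percent or less.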

David

* Re: new unroller vs ppc
From: Zdenek Dvorak @ 2003-03-29 14:04 UTC
  To: David Edelsohn; +Cc: Dale Johannesen, gcc

Hello,

> Zdenek> Does this patch:
> Zdenek> http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01564.html
> Zdenek> help?
> 
> 	No, not really:
> 
> TEST		   old-unroller    new-unroller    new-unroller w/patch
> 168.wupwise		630		605		619
> 171.swim		596		592		568
> 172.mgrid		567		347		346
> 177.mesa		469		465		460
> 179.art			1154		992		999
> 183.equake		1018		927		957
> 188.ammp		363		370		369
> 200.sixtrack		209		110		 98
> 181.mcf			834		824		819
> 197.parser		313		312		312
> 255.vortex		662		659		659
> 300.twolf		650		646		659

OK, I will have a look at this sometime next week.

Zdenek

* Re: new unroller vs ppc
From: Zdenek Dvorak @ 2003-04-07  2:56 UTC
  To: David Edelsohn; +Cc: Dale Johannesen, gcc, jh, rth

Hello,

> Zdenek> Does this patch:
> Zdenek> http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01564.html
> Zdenek> help?
> 
> 	No, not really:
[snip]

Looking at the assembler that comes out of a cross-compiler, I see two
problems:

1) we do not do induction variable splitting; the webizer pass
   http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00501.html
   should take care of this
2) this code in cse.c

/* Don't associate these operations if they are a PLUS with the
   same constant and it is a power of two.  These might be doable
   with a pre- or post-increment.  Similarly for two subtracts of
   identical powers of two with post decrement.  */

  if (code == PLUS && INTVAL (const_arg1) == INTVAL (inner_const)
      && ((HAVE_PRE_INCREMENT
           && exact_log2 (INTVAL (const_arg1)) >= 0)
          || (HAVE_POST_INCREMENT
              && exact_log2 (INTVAL (const_arg1)) >= 0)
          || (HAVE_PRE_DECREMENT
              && exact_log2 (- INTVAL (const_arg1)) >= 0)
          || (HAVE_POST_DECREMENT
              && exact_log2 (- INTVAL (const_arg1)) >= 0)))
    break;

prevents us from combining the increments of induction variables even
with the webizer.  I think the correct solution is to split the
pre/post-modify transformation out of flow.c (relatively easy,
I have the patch somewhere), run it before cse, and remove this
code.
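
To make the effect of that test concrete, here is a small standalone model of it in C. It is not GCC code: struct autoinc_caps, exact_log2_model and cse_refuses_to_associate are hypothetical names, and the model covers only the PLUS case quoted above. It returns true exactly when the quoted condition would reach the break, that is, when two additions of the same power-of-two constant are left uncombined.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the HAVE_{PRE,POST}_{INCREMENT,DECREMENT}
   target macros. */
struct autoinc_caps {
    bool pre_inc, post_inc, pre_dec, post_dec;
};

/* Model of exact_log2: log2(x) if x is a positive power of two, else -1. */
static int exact_log2_model(long x)
{
    if (x <= 0 || (x & (x - 1)) != 0)
        return -1;
    int n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* True when the quoted condition holds for (reg + inner) + outer,
   i.e. when the two constants are NOT folded into one addition. */
bool cse_refuses_to_associate(const struct autoinc_caps *t,
                              long outer, long inner)
{
    if (outer != inner)
        return false;
    /* Avoid overflow on negation; this edge case is ignored by the model. */
    long negated = (outer == LONG_MIN) ? 0 : -outer;
    bool inc = (t->pre_inc || t->post_inc) && exact_log2_model(outer) >= 0;
    bool dec = (t->pre_dec || t->post_dec) && exact_log2_model(negated) >= 0;
    return inc || dec;
}
```

On a target modeled with pre-increment available, (p + 4) + 4 is left alone, while (p + 4) + 8 and (p + 12) + 12 can still be combined. Repeated additions of 4 are the pattern an unrolled loop stepping through an int array would produce.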

Zdenek

* Re: new unroller vs ppc
From: Jan Hubicka @ 2003-04-07 16:06 UTC
  To: Zdenek Dvorak; +Cc: David Edelsohn, Dale Johannesen, gcc, jh, rth

> Hello,
> 
> > Zdenek> Does this patch:
> > Zdenek> http://gcc.gnu.org/ml/gcc-patches/2003-03/msg01564.html
> > Zdenek> help?
> > 
> > 	No, not really:
> [snip]
> 
> looking at the assembler that comes out of crosscompiler, I see two
> problems:
> 
> 1) we do not do induction variable splitting; the webizer pass
>    http://gcc.gnu.org/ml/gcc-patches/2003-02/msg00501.html
>    should take care of this

This patch appears to have problems getting in; however, I hope it will
one day.  Or is there something particularly wrong with it?

> 2) this code in cse.c
> 
> /* Don't associate these operations if they are a PLUS with the
>    same constant and it is a power of two.  These might be doable
>    with a pre- or post-increment.  Similarly for two subtracts of
>    identical powers of two with post decrement.  */
> 
>   if (code == PLUS && INTVAL (const_arg1) == INTVAL (inner_const)
>       && ((HAVE_PRE_INCREMENT
>            && exact_log2 (INTVAL (const_arg1)) >= 0)
>           || (HAVE_POST_INCREMENT
>               && exact_log2 (INTVAL (const_arg1)) >= 0)
>           || (HAVE_PRE_DECREMENT
>               && exact_log2 (- INTVAL (const_arg1)) >= 0)
>           || (HAVE_POST_DECREMENT
>               && exact_log2 (- INTVAL (const_arg1)) >= 0)))
>     break;
> 
> prevents us from combining the increments of induction variables even
> with webizer.  I think the correct solution is to split
> pre/post modify transformation out of flow.c (relatively easy,
> I have the patch somewhere), to run it before cse and to cancel this
> code.
You won't be able to cancel it completely, I guess, only for the CSE2 pass.
The passes before flow don't expect autoincrements to happen.
Otherwise this sounds like a sane plan to me.

Honza
> 
> Zdenek

* Re: new unroller vs ppc
From: Richard Henderson @ 2003-04-07 20:20 UTC
  To: Zdenek Dvorak; +Cc: David Edelsohn, Dale Johannesen, gcc, jh

On Mon, Apr 07, 2003 at 12:39:38AM +0200, Zdenek Dvorak wrote:
> I think the correct solution is to split pre/post modify transformation
> out of flow.c (relatively easy, I have the patch somewhere), to run it
> before cse and to cancel this code.

I don't believe that will work; cse is not prepared for
autoinc operands, iirc.



r~

* Re: new unroller vs ppc
From: Zdenek Dvorak @ 2003-04-07 20:25 UTC
  To: Richard Henderson, David Edelsohn, Dale Johannesen, gcc, jh

Hello,

> > I think the correct solution is to split pre/post modify transformation
> > out of flow.c (relatively easy, I have the patch somewhere), to run it
> > before cse and to cancel this code.
> 
> I don't believe that will work; cse is not prepared for
> autoinc operands, iirc.

Then what would you suggest doing? Cse apparently produces suboptimal
code in many cases due to this (I guess most of the increments it
ignores this way are not converted to autoincs anyway). I don't see
any other way than to teach cse to cope with autoincs (but I am not brave
enough to attempt to do it).

Zdenek

* Re: new unroller vs ppc
From: David Edelsohn @ 2003-04-07 20:26 UTC
  To: Zdenek Dvorak; +Cc: Richard Henderson, Dale Johannesen, gcc, jh

>>>>> Zdenek Dvorak writes:

Zdenek> Then what would you suggest to do? Cse apparently produces suboptimal
Zdenek> code in many cases due to this (I guess most of the increments it
Zdenek> ignores this way are not converted to autoincs anyway). I don't see
Zdenek> other way than to teach cse to cope with autoincs (but I am not brave
Zdenek> enough to attempt to do it).

	How was the old unroller able to produce better code with this
same impediment?

David

* Re: new unroller vs ppc
From: Jan Hubicka @ 2003-04-07 20:28 UTC
  To: David Edelsohn; +Cc: Zdenek Dvorak, Richard Henderson, Dale Johannesen, gcc, jh

> >>>>> Zdenek Dvorak writes:
> 
> Zdenek> Then what would you suggest to do? Cse apparently produces suboptimal
> Zdenek> code in many cases due to this (I guess most of the increments it
> Zdenek> ignores this way are not converted to autoincs anyway). I don't see
> Zdenek> other way than to teach cse to cope with autoincs (but I am not brave
> Zdenek> enough to attempt to do it).
> 
> 	How was the old unroller able to produce better code with this
> same impediment?

It is hooked into strength reduction that knows about autoincrements and
predicts when they are available.

Honza
> 
> David

* Re: new unroller vs ppc
From: Richard Henderson @ 2003-04-07 20:32 UTC
  To: Zdenek Dvorak; +Cc: David Edelsohn, Dale Johannesen, gcc, jh

On Mon, Apr 07, 2003 at 10:02:45PM +0200, Zdenek Dvorak wrote:
> Then what would you suggest to do?

Long term I think we should do a new autoinc pass.  Currently
what we have is split between flow and regmove.

It would also be nice if it was able to interact with the
scheduler.  A more aggressive autoinc pass can constrain the
scheduler too much and reduce overall performance.  This is
especially visible on ia64.


r~

* Re: new unroller vs ppc
From: tm_gccmail @ 2003-04-07 20:49 UTC
  To: Richard Henderson; +Cc: Zdenek Dvorak, David Edelsohn, Dale Johannesen, gcc, jh

On Mon, 7 Apr 2003, Richard Henderson wrote:

> On Mon, Apr 07, 2003 at 10:02:45PM +0200, Zdenek Dvorak wrote:
> > Then what would you suggest to do?
> 
> Long term I think we should do a new autoinc pass.  Currently
> what we have is split between flow and regmove.
> 
> It would also be nice if it was able to interact with the
> scheduler.  A more aggressive autoinc pass can constrain the
> scheduler too much and reduce overall performance.  This is
> especially visible on ia64.
> 
> 
> r~

To chime in with my two cents' worth:

We need to have an address generation pass which handles all the
idiosyncrasies of addressing in one pass, including:

1. Processors with only short displacements in indirect addressing modes.

2. Processors with both short and long displacements in indirect
   addressing modes.

3. Processors with no offsettable addressing modes.

4. Processors with autoincrement/autodecrement.

5. Processors with different combinations of the above for different
   modes.

I don't think all these cases can be handled cleanly with GCSE as has been
suggested recently.
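
As a very rough illustration of the kind of per-target decision such a pass would have to make, here is a toy sketch in C. Everything in it is hypothetical (struct addr_caps, enum addr_form, choose_addr_form, and the policy of preferring a displacement over autoincrement); it is not a proposal for the actual pass, only a way to show how the cases in the list lead to different address forms for the same access.

```c
#include <stdbool.h>

/* Hypothetical description of what a target can do. */
struct addr_caps {
    long max_disp;      /* largest usable displacement; 0 = not offsettable */
    bool has_autoinc;   /* pre-increment addressing is available */
};

enum addr_form {
    ADDR_BASE_PLUS_DISP,  /* offset folded into the memory instruction */
    ADDR_AUTOINC,         /* access also steps the base register */
    ADDR_REGISTER         /* plain register indirect; any needed address
                             arithmetic is done by separate instructions */
};

/* Pick a form for an access at base + offset, where `step` is the
   distance between consecutive accesses.  Toy policy: prefer a
   displacement, then autoincrement, then separate arithmetic. */
enum addr_form choose_addr_form(const struct addr_caps *caps,
                                long offset, long step)
{
    if (offset == 0)
        return ADDR_REGISTER;
    if (offset > 0 && offset <= caps->max_disp)
        return ADDR_BASE_PLUS_DISP;
    if (caps->has_autoinc && offset == step)
        return ADDR_AUTOINC;
    return ADDR_REGISTER;
}
```

The ordering of the tests is exactly the sort of policy that would need to differ between targets, and possibly between machine modes on one target.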

Toshi

