[Bug middle-end/40029] New: [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)

public inbox for gcc-bugs@sourceware.org

* [Bug middle-end/40029]  New: [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
@ 2009-05-05 17:54 luisgpm at linux dot vnet dot ibm dot com
  2009-05-06  8:28 ` [Bug middle-end/40029] " rguenth at gcc dot gnu dot org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: luisgpm at linux dot vnet dot ibm dot com @ 2009-05-05 17:54 UTC (permalink / raw)
  To: gcc-bugs

SPEC CPU2000's swim and mgrid benchmarks slowed down by roughly 10% after the
merge of the alias improvement branch.

GCC was configured with the following:

/gcc/HEAD/configure --target=powerpc64-linux --host=powerpc64-linux
--build=powerpc64-linux --with-cpu=default32 --enable-threads=posix
--enable-shared --enable-__cxa_atexit --with-gmp=/gmp --with-mpfr=mpfr
--with-long-double-128 --enable-decimal-float --enable-secure-plt
--disable-bootstrap --disable-alsa --prefix=/install/gcc/HEAD
build_alias=powerpc64-linux host_alias=powerpc64-linux
target_alias=powerpc64-linux --enable-languages=c,c++,fortran --no-create
--no-recursion

Compile flags used: -m[32|64] -O3 -mcpu=power[4|5|6] -ffast-math
-ftree-loop-linear -funroll-loops -fpeel-loops

Will provide more details soon.


-- 
           Summary: [4.5 Regression] Big degradation on swim/mgrid on
                    powerpc 32/64 after alias improvement merge (gcc
                    r145494)
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: luisgpm at linux dot vnet dot ibm dot com
 GCC build triplet: powerpc*-*-*
  GCC host triplet: powerpc*-*-*
GCC target triplet: powerpc*-*-*


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029



* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-06  8:28 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.5.0



* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: luisgpm at linux dot vnet dot ibm dot com @ 2009-05-11 18:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from luisgpm at linux dot vnet dot ibm dot com  2009-05-11 18:04 -------
Good asm code for a hot loop in swim's "calc1" function

10001e10:       lfd     f12,-10672(r11)
10001e14:       lfd     f9,-10672(r9)
10001e18:       addi    r21,r21,16
10001e1c:       lfd     f7,-10680(r11)
10001e20:       lfd     f6,-10672(r6)
10001e24:       fmul    f3,f9,f9
10001e28:       cmpw    r21,r0
10001e2c:       fadd    f4,f7,f12
10001e30:       lfd     f22,-10680(r9)
10001e34:       lfd     f10,-10664(r9)
10001e38:       addi    r9,r9,16
10001e3c:       lfd     f23,-10672(r5)
10001e40:       lfd     f13,-10664(r5)
10001e44:       addi    r5,r5,16
10001e48:       lfd     f5,-10664(r11)
10001e4c:       fsub    f28,f23,f9
10001e50:       fsub    f25,f13,f10
10001e54:       lfd     f13,-10672(r4)
10001e58:       addi    r11,r11,16
10001e5c:       fadd    f5,f12,f5
10001e60:       fsub    f20,f13,f0
10001e64:       fmul    f9,f11,f9
10001e68:       fmadd   f27,f22,f22,f3
10001e6c:       fmadd   f30,f10,f10,f3
10001e70:       lfd     f3,-10680(r8)
10001e74:       fadd    f26,f4,f6
10001e78:       fmul    f10,f11,f10
10001e7c:       fmul    f24,f28,f2
10001e80:       fmul    f21,f25,f2
10001e84:       fmul    f4,f9,f4
10001e88:       fmadd   f22,f0,f0,f27
10001e8c:       fadd    f27,f8,f7
10001e90:       fadd    f23,f26,f8
10001e94:       fmul    f26,f0,f11
10001e98:       lfd     f8,-10664(r6)
10001e9c:       lfd     f0,-10664(r4)
10001ea0:       addi    r6,r6,16
10001ea4:       fadd    f29,f5,f8
10001ea8:       fsub    f25,f0,f13
10001eac:       addi    r4,r4,16
10001eb0:       fmsub   f28,f20,f1,f24
10001eb4:       lfd     f20,-10672(r8)
10001eb8:       fmul    f5,f10,f5
10001ebc:       addi    r8,r8,16
10001ec0:       stfd    f4,-10672(r22)
10001ec4:       stfd    f5,-10664(r22)
10001ec8:       addi    r22,r22,16
10001ecc:       fmul    f27,f26,f27
10001ed0:       fadd    f24,f6,f29
10001ed4:       fmsub   f29,f25,f1,f21
10001ed8:       fdiv    f28,f28,f23
10001edc:       fmadd   f25,f13,f13,f30
10001ee0:       fadd    f6,f6,f12
10001ee4:       fmadd   f30,f3,f3,f22
10001ee8:       stfd    f27,-10680(r3)
10001eec:       fdiv    f29,f29,f24
10001ef0:       fmadd   f3,f20,f20,f25
10001ef4:       fmul    f20,f13,f11
10001ef8:       fmadd   f7,f30,f31,f7
10001efc:       stfd    f7,-10680(r10)
10001f00:       fmadd   f12,f3,f31,f12
10001f04:       fmul    f13,f20,f6
10001f08:       stfd    f12,-10672(r10)
10001f0c:       stfd    f13,-10672(r3)
10001f10:       addi    r10,r10,16
10001f14:       addi    r3,r3,16
10001f18:       stfd    f28,-10672(r7)
10001f1c:       stfd    f29,-10664(r7)
10001f20:       addi    r7,r7,16
10001f24:       bne     10001e10 <calc1_+0x1b0>

----------
Bad asm code for the same loop

10001a60:       addis   r27,r9,-435
10001a64:       addis   r12,r11,-2176
10001a68:       lfd     f13,-7440(r27)
10001a6c:       lfd     f10,28344(r12)
10001a70:       addis   r8,r11,-1958
10001a74:       addis   r10,r11,-1740
10001a78:       fsub    f7,f10,f13
10001a7c:       lfd     f8,-704(r8)
10001a80:       lfd     f10,0(r9)
10001a84:       addis   r7,r9,-218
10001a88:       addis   r28,r9,1523
10001a8c:       lfd     f9,-29752(r10)
10001a90:       fadd    f6,f12,f10
10001a94:       fsub    f2,f8,f0
10001a98:       addis   r12,r11,218
10001a9c:       addis   r27,r9,2176
10001aa0:       fadd    f5,f11,f9
10001aa4:       fadd    f11,f11,f12
10001aa8:       addi    r9,r9,8
10001aac:       cmpw    r6,r9
10001ab0:       fmul    f1,f7,f30
10001ab4:       fmul    f7,f13,f13
10001ab8:       fmul    f13,f13,f3
10001abc:       fadd    f31,f5,f6
10001ac0:       lfd     f5,29040(r7)
10001ac4:       fmsub   f2,f2,f29,f1
10001ac8:       fmadd   f1,f0,f0,f7
10001acc:       fmul    f0,f0,f3
10001ad0:       fmul    f6,f13,f6
10001ad4:       stfd    f6,-6728(r28)
10001ad8:       fdiv    f2,f2,f31
10001adc:       fmadd   f5,f5,f5,f1
10001ae0:       fmul    f31,f0,f11
10001ae4:       fmr     f0,f8
10001ae8:       stfd    f31,0(r11)
10001aec:       fmr     f11,f9
10001af0:       addi    r11,r11,8
10001af4:       fadd    f1,f5,f4
10001af8:       fmr     f4,f7
10001afc:       fmadd   f5,f1,f28,f12
10001b00:       fmr     f12,f10
10001b04:       stfd    f5,-28344(r27)
10001b08:       stfd    f2,-29040(r12)
10001b0c:       bne+    10001a60 <calc1_+0xe0>

----------

Comparing the two, the good code traverses the arrays differently from the bad
code: it keeps several incremented pointers and uses smaller displacements for
each load/store, while the bad code rebuilds addresses with addis on every
iteration and uses bigger displacements.

The good case also appears to have a bigger unrolling factor (longer code,
more loads, pointers advanced by 16 bytes per iteration instead of 8).


-- 



* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-21 10:45 UTC (permalink / raw)
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Priority|P3                          |P2



* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-21 14:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from rguenth at gcc dot gnu dot org  2009-05-21 14:04 -------
That's

      DO 100 J=1,N
      DO 100 I=1,M
      CU(I+1,J) = .5D0*(P(I+1,J)+P(I,J))*U(I+1,J)
      CV(I,J+1) = .5D0*(P(I,J+1)+P(I,J))*V(I,J+1)
      Z(I+1,J+1) = (FSDX*(V(I+1,J+1)-V(I,J+1))-FSDY*(U(I+1,J+1)
     1          -U(I+1,J)))/(P(I,J)+P(I+1,J)+P(I+1,J+1)+P(I,J+1))
      H(I,J) = P(I,J)+.25D0*(U(I+1,J)*U(I+1,J)+U(I,J)*U(I,J)
     1               +V(I,J+1)*V(I,J+1)+V(I,J)*V(I,J))
  100 CONTINUE

right?

4.4 can do predictive commoning on it while trunk can't - this also unrolls
the loop twice.  On trunk we are likely confused by PRE that already
partially performs what predictive commoning would do.  Disabling PRE
makes predictive commoning work but doesn't unroll the loop (same as
with disabling PRE in 4.4).  It is likely the full redundancies PRE
discovers that cause the unrolling.

That said - this looks like yet another unfortunate pass ordering problem
to me.
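
For readers unfamiliar with the transformation, here is a minimal C sketch of
what predictive commoning does, on a simplified one-dimensional analogue of
the P(I+1,J)+P(I,J) term above.  The function names avg_naive and
avg_commoned and the reduction to one dimension are assumptions made for
illustration; this is neither GCC's implementation nor the swim source.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward form: p[i] and p[i + 1] are both loaded on every
   iteration, so each interior element of p is read twice overall. */
void avg_naive(const double *p, double *out, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        out[i] = 0.5 * (p[i + 1] + p[i]);
}

/* Commoned form: the value loaded as p[i + 1] is kept in a scalar and
   reused as p[i] by the next iteration, leaving one load per iteration. */
void avg_commoned(const double *p, double *out, size_t n)
{
    assert(n >= 1);
    double prev = p[0];
    for (size_t i = 0; i + 1 < n; i++) {
        double next = p[i + 1];
        out[i] = 0.5 * (next + prev);
        prev = next;   /* value carried across the loop back edge */
    }
}
```

Both functions write the same n - 1 results; the second trades one load per
iteration for a scalar carried from one iteration to the next.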


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-05-21 14:04:13
               date|                            |



* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-21 14:10 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from rguenth at gcc dot gnu dot org  2009-05-21 14:10 -------
Testcase:

      SUBROUTINE CALC1                                                          
      IMPLICIT REAL*8   (A-H, O-Z)                                              
      PARAMETER (N1=1335, N2=1335)                                              
      COMMON  U(N1,N2), V(N1,N2), P(N1,N2),                                     
     2        CU(N1,N2), CV(N1,N2),                                             
     *        Z(N1,N2), H(N1,N2)                                                
      COMMON /CONS/ DX,DY                                                       
      FSDX = 4.D0/DX                                                            
      FSDY = 4.D0/DY                                                            
      DO 100 J=1,N                                                              
      DO 100 I=1,M                                                              
      CU(I+1,J) = .5D0*(P(I+1,J)+P(I,J))*U(I+1,J)                               
      CV(I,J+1) = .5D0*(P(I,J+1)+P(I,J))*V(I,J+1)                               
      Z(I+1,J+1) = (FSDX*(V(I+1,J+1)-V(I,J+1))-FSDY*(U(I+1,J+1)                 
     1          -U(I+1,J)))/(P(I,J)+P(I+1,J)+P(I+1,J+1)+P(I,J+1))               
      H(I,J) = P(I,J)+.25D0*(U(I+1,J)*U(I+1,J)+U(I,J)*U(I,J)                    
     1               +V(I,J+1)*V(I,J+1)+V(I,J)*V(I,J))                          
  100 CONTINUE                                                                  
      RETURN                                                                    
      END


-- 



* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-27 20:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from rguenth at gcc dot gnu dot org  2009-05-27 20:57 -------
Actually PRE seems to be more powerful than predictive commoning here.  We
lose just one opportunity while gaining others.  With predictive commoning we have
8 loads and 4 stores, 11 multiplications and one division.
With PRE it is 6 loads and 4 stores, 10 multiplications and one division.
The only thing we gain from predictive commoning in 4.4 is unrolling the
loop once.
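
Why that unrolling can matter may be sketched in C as follows.  When a loaded
value is carried into the next iteration, a loop that is not unrolled needs
one register copy per carried value (plausibly the fmr instructions in the
bad listing earlier in the thread, though that reading is an assumption).
Unrolling by two lets the two scalars swap roles, so no copy is needed.  The
function name avg_unrolled2 and the one-dimensional averaging kernel are
hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Computes out[i] = 0.5 * (p[i + 1] + p[i]) for i in [0, n - 1),
   loading each element of p once and using no scalar-to-scalar copy. */
void avg_unrolled2(const double *p, double *out, size_t n)
{
    assert(n >= 1);
    double a = p[0];           /* holds p[i] at the top of each trip */
    size_t i = 0;

    /* Two original iterations per trip: a and b alternate between the
       "previous" and "next" roles instead of being copied. */
    for (; i + 2 < n; i += 2) {
        double b = p[i + 1];
        out[i] = 0.5 * (b + a);
        a = p[i + 2];
        out[i + 1] = 0.5 * (a + b);
    }

    /* Tail: at most one original iteration remains. */
    if (i + 1 < n)
        out[i] = 0.5 * (p[i + 1] + a);
}
```

The price is a larger loop body plus a tail for odd trip counts, which is in
line with the longer good listing that advances its pointers by 16 bytes.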


-- 



* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: luisgpm at linux dot vnet dot ibm dot com @ 2009-05-29 19:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from luisgpm at linux dot vnet dot ibm dot com  2009-05-29 19:52 -------
From predictive commoning we gain a bit more performance, probably due to the
bigger unrolling factor.

Any chance of the unrolling taking place while still using PRE?

Thanks,
Luis


-- 



* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenther at suse dot de @ 2009-05-29 20:15 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from rguenther at suse dot de  2009-05-29 20:15 -------

On Fri, 29 May 2009, luisgpm at linux dot vnet dot ibm dot com wrote:

> ------- Comment #5 from luisgpm at linux dot vnet dot ibm dot com  2009-05-29 19:52 -------
> From predictive commoning we gain a bit more performance, probably due to the
> bigger unrolling factor.
> 
> Any chance of the unrolling taking place while still using PRE?

-funroll[-all]-loops doesn't seem to do the job.  I didn't check if
enabling sms would do it.  Other unrolling on the tree level is only
implemented as side-effect of other optimizations (like vectorization
or predictive commoning or prefetching) :/

Richard.


-- 



* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-11-30 13:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from rguenth at gcc dot gnu dot org  2009-11-30 13:11 -------
I believe this was fixed with

2009-07-22  Michael Matz  <matz@suse.de>

        PR tree-optimization/35229
        PR tree-optimization/39300

        * tree-ssa-pre.c (includes): Include tree-scalar-evolution.h.
        (inhibit_phi_insertion): New function.
        (insert_into_preds_of_block): Call it for REFERENCEs.
        (init_pre): Initialize and finalize scalar evolutions.
        * Makefile.in (tree-ssa-pre.o): Depend on tree-scalar-evolution.h .


which avoids the PRE and enables predictive commoning again (on x86_64
only the tail loop of the vectorized variant is).


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

