public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/40029] New: [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: luisgpm at linux dot vnet dot ibm dot com @ 2009-05-05 17:54 UTC
To: gcc-bugs

CPU2000's swim and mgrid had ~10% slowdown after the merge of the alias
improvement branch.

GCC was configured with the following:

    /gcc/HEAD/configure --target=powerpc64-linux --host=powerpc64-linux
      --build=powerpc64-linux --with-cpu=default32 --enable-threads=posix
      --enable-shared --enable-__cxa_atexit --with-gmp=/gmp --with-mpfr=mpfr
      --with-long-double-128 --enable-decimal-float --enable-secure-plt
      --disable-bootstrap --disable-alsa --prefix=/install/gcc/HEAD
      build_alias=powerpc64-linux host_alias=powerpc64-linux
      target_alias=powerpc64-linux --enable-languages=c,c++,fortran
      --no-create --no-recursion

Compile flags used:

    -m[32|64] -O3 -mcpu=power[4|5|6] -ffast-math -ftree-loop-linear
    -funroll-loops -fpeel-loops

Will provide more details soon.

--
           Summary: [4.5 Regression] Big degradation on swim/mgrid on
                    powerpc 32/64 after alias improvement merge (gcc r145494)
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: luisgpm at linux dot vnet dot ibm dot com
 GCC build triplet: powerpc*-*-*
  GCC host triplet: powerpc*-*-*
GCC target triplet: powerpc*-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029

* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-06 8:28 UTC
To: gcc-bugs

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.5.0

* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: luisgpm at linux dot vnet dot ibm dot com @ 2009-05-11 18:04 UTC
To: gcc-bugs

------- Comment #1 from luisgpm at linux dot vnet dot ibm dot com  2009-05-11 18:04 -------
Good asm code for a hot loop in swim's "calc1" function

    10001e10: lfd f12,-10672(r11)
    10001e14: lfd f9,-10672(r9)
    10001e18: addi r21,r21,16
    10001e1c: lfd f7,-10680(r11)
    10001e20: lfd f6,-10672(r6)
    10001e24: fmul f3,f9,f9
    10001e28: cmpw r21,r0
    10001e2c: fadd f4,f7,f12
    10001e30: lfd f22,-10680(r9)
    10001e34: lfd f10,-10664(r9)
    10001e38: addi r9,r9,16
    10001e3c: lfd f23,-10672(r5)
    10001e40: lfd f13,-10664(r5)
    10001e44: addi r5,r5,16
    10001e48: lfd f5,-10664(r11)
    10001e4c: fsub f28,f23,f9
    10001e50: fsub f25,f13,f10
    10001e54: lfd f13,-10672(r4)
    10001e58: addi r11,r11,16
    10001e5c: fadd f5,f12,f5
    10001e60: fsub f20,f13,f0
    10001e64: fmul f9,f11,f9
    10001e68: fmadd f27,f22,f22,f3
    10001e6c: fmadd f30,f10,f10,f3
    10001e70: lfd f3,-10680(r8)
    10001e74: fadd f26,f4,f6
    10001e78: fmul f10,f11,f10
    10001e7c: fmul f24,f28,f2
    10001e80: fmul f21,f25,f2
    10001e84: fmul f4,f9,f4
    10001e88: fmadd f22,f0,f0,f27
    10001e8c: fadd f27,f8,f7
    10001e90: fadd f23,f26,f8
    10001e94: fmul f26,f0,f11
    10001e98: lfd f8,-10664(r6)
    10001e9c: lfd f0,-10664(r4)
    10001ea0: addi r6,r6,16
    10001ea4: fadd f29,f5,f8
    10001ea8: fsub f25,f0,f13
    10001eac: addi r4,r4,16
    10001eb0: fmsub f28,f20,f1,f24
    10001eb4: lfd f20,-10672(r8)
    10001eb8: fmul f5,f10,f5
    10001ebc: addi r8,r8,16
    10001ec0: stfd f4,-10672(r22)
    10001ec4: stfd f5,-10664(r22)
    10001ec8: addi r22,r22,16
    10001ecc: fmul f27,f26,f27
    10001ed0: fadd f24,f6,f29
    10001ed4: fmsub f29,f25,f1,f21
    10001ed8: fdiv f28,f28,f23
    10001edc: fmadd f25,f13,f13,f30
    10001ee0: fadd f6,f6,f12
    10001ee4: fmadd f30,f3,f3,f22
    10001ee8: stfd f27,-10680(r3)
    10001eec: fdiv f29,f29,f24
    10001ef0: fmadd f3,f20,f20,f25
    10001ef4: fmul f20,f13,f11
    10001ef8: fmadd f7,f30,f31,f7
    10001efc: stfd f7,-10680(r10)
    10001f00: fmadd f12,f3,f31,f12
    10001f04: fmul f13,f20,f6
    10001f08: stfd f12,-10672(r10)
    10001f0c: stfd f13,-10672(r3)
    10001f10: addi r10,r10,16
    10001f14: addi r3,r3,16
    10001f18: stfd f28,-10672(r7)
    10001f1c: stfd f29,-10664(r7)
    10001f20: addi r7,r7,16
    10001f24: bne 10001e10 <calc1_+0x1b0>

----------

Bad asm code for the same loop

    10001a60: addis r27,r9,-435
    10001a64: addis r12,r11,-2176
    10001a68: lfd f13,-7440(r27)
    10001a6c: lfd f10,28344(r12)
    10001a70: addis r8,r11,-1958
    10001a74: addis r10,r11,-1740
    10001a78: fsub f7,f10,f13
    10001a7c: lfd f8,-704(r8)
    10001a80: lfd f10,0(r9)
    10001a84: addis r7,r9,-218
    10001a88: addis r28,r9,1523
    10001a8c: lfd f9,-29752(r10)
    10001a90: fadd f6,f12,f10
    10001a94: fsub f2,f8,f0
    10001a98: addis r12,r11,218
    10001a9c: addis r27,r9,2176
    10001aa0: fadd f5,f11,f9
    10001aa4: fadd f11,f11,f12
    10001aa8: addi r9,r9,8
    10001aac: cmpw r6,r9
    10001ab0: fmul f1,f7,f30
    10001ab4: fmul f7,f13,f13
    10001ab8: fmul f13,f13,f3
    10001abc: fadd f31,f5,f6
    10001ac0: lfd f5,29040(r7)
    10001ac4: fmsub f2,f2,f29,f1
    10001ac8: fmadd f1,f0,f0,f7
    10001acc: fmul f0,f0,f3
    10001ad0: fmul f6,f13,f6
    10001ad4: stfd f6,-6728(r28)
    10001ad8: fdiv f2,f2,f31
    10001adc: fmadd f5,f5,f5,f1
    10001ae0: fmul f31,f0,f11
    10001ae4: fmr f0,f8
    10001ae8: stfd f31,0(r11)
    10001aec: fmr f11,f9
    10001af0: addi r11,r11,8
    10001af4: fadd f1,f5,f4
    10001af8: fmr f4,f7
    10001afc: fmadd f5,f1,f28,f12
    10001b00: fmr f12,f10
    10001b04: stfd f5,-28344(r27)
    10001b08: stfd f2,-29040(r12)
    10001b0c: bne+ 10001a60 <calc1_+0xe0>

----------

Looking into the differences between the two cases, the good code seems to
traverse the loop differently from the bad one, using smaller displacements
for each load/store, while the bad case uses bigger displacements.

It also looks like the good case has a bigger unrolling factor (longer code,
more loads) than the bad case.

* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-21 10:45 UTC
To: gcc-bugs

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Priority|P3                          |P2

* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-21 14:04 UTC
To: gcc-bugs

------- Comment #2 from rguenth at gcc dot gnu dot org  2009-05-21 14:04 -------
That's

      DO 100 J=1,N
      DO 100 I=1,M
      CU(I+1,J) = .5D0*(P(I+1,J)+P(I,J))*U(I+1,J)
      CV(I,J+1) = .5D0*(P(I,J+1)+P(I,J))*V(I,J+1)
      Z(I+1,J+1) = (FSDX*(V(I+1,J+1)-V(I,J+1))-FSDY*(U(I+1,J+1)
     1          -U(I+1,J)))/(P(I,J)+P(I+1,J)+P(I+1,J+1)+P(I,J+1))
      H(I,J) = P(I,J)+.25D0*(U(I+1,J)*U(I+1,J)+U(I,J)*U(I,J)
     1               +V(I,J+1)*V(I,J+1)+V(I,J)*V(I,J))
  100 CONTINUE

right?  4.4 can do predictive commoning on it while trunk can't - this
also unrolls the loop twice.  On trunk we are likely confused by PRE
that already partially performs what predictive commoning would do.

Disabling PRE makes predictive commoning work but doesn't unroll the loop
(same as with disabling PRE in 4.4).  It is likely the full redundancies PRE
discovers that cause the unrolling.

That said - this looks like yet another unfortunate pass ordering problem to
me.

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-05-21 14:04:13
               date|                            |
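
For readers unfamiliar with the term used in Comment #2: predictive commoning keeps a value loaded in one iteration in a scalar so that a later iteration can reuse it instead of reloading it from memory. The following is a minimal hand-written sketch in C, not GCC's implementation. It uses a one-dimensional stand-in for the CU statement of the loop; the function names `cu_naive` and `cu_commoned` and the flattening to 1-D arrays are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward form: p[i] and p[i + 1] are both loaded in every
   iteration, although p[i + 1] was already loaded as "p[i + 1]" one
   iteration earlier. */
void cu_naive(double *cu, const double *p, const double *u, size_t m)
{
    assert(cu && p && u);
    for (size_t i = 0; i + 1 < m; i++)
        cu[i + 1] = 0.5 * (p[i + 1] + p[i]) * u[i + 1];
}

/* Hand-applied predictive commoning: the value loaded as p[i + 1] is
   carried in a scalar and reused as p[i] in the next iteration, so only
   one load of p remains per iteration. */
void cu_commoned(double *cu, const double *p, const double *u, size_t m)
{
    assert(cu && p && u);
    if (m < 2)
        return;
    double p_prev = p[0];               /* plays the role of p[i] */
    for (size_t i = 0; i + 1 < m; i++) {
        double p_next = p[i + 1];       /* the only load of p here */
        cu[i + 1] = 0.5 * (p_next + p_prev) * u[i + 1];
        p_prev = p_next;                /* carry into the next iteration */
    }
}
```

Both functions perform the same floating-point operations in the same order, so they should produce identical results; the difference is only in how many loads of `p` each iteration needs.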

* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-21 14:10 UTC
To: gcc-bugs

------- Comment #3 from rguenth at gcc dot gnu dot org  2009-05-21 14:10 -------
Testcase:

      SUBROUTINE CALC1
      IMPLICIT REAL*8 (A-H, O-Z)
      PARAMETER (N1=1335, N2=1335)
      COMMON U(N1,N2), V(N1,N2), P(N1,N2),
     2       CU(N1,N2), CV(N1,N2),
     *       Z(N1,N2), H(N1,N2)
      COMMON /CONS/ DX,DY
      FSDX = 4.D0/DX
      FSDY = 4.D0/DY
      DO 100 J=1,N
      DO 100 I=1,M
      CU(I+1,J) = .5D0*(P(I+1,J)+P(I,J))*U(I+1,J)
      CV(I,J+1) = .5D0*(P(I,J+1)+P(I,J))*V(I,J+1)
      Z(I+1,J+1) = (FSDX*(V(I+1,J+1)-V(I,J+1))-FSDY*(U(I+1,J+1)
     1          -U(I+1,J)))/(P(I,J)+P(I+1,J)+P(I+1,J+1)+P(I,J+1))
      H(I,J) = P(I,J)+.25D0*(U(I+1,J)*U(I+1,J)+U(I,J)*U(I,J)
     1               +V(I,J+1)*V(I,J+1)+V(I,J)*V(I,J))
  100 CONTINUE
      RETURN
      END

* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-27 20:58 UTC
To: gcc-bugs

------- Comment #4 from rguenth at gcc dot gnu dot org  2009-05-27 20:57 -------
Actually PRE seems to be more powerful than predictive commoning here.  We just
lose one opportunity while gaining.  With predictive commoning we have 8 loads
and 4 stores, 11 multiplications and one division.  With PRE it is 6 loads
and 4 stores, 10 multiplications and one division.

The only thing we gain from predictive commoning in 4.4 is unrolling the
loop once.
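
One plausible reason why that extra unrolling matters: a loop that carries a value from one iteration to the next needs a copy at the end of each iteration, and unrolling by two lets the two scalars swap roles so the copy disappears. The slower listing in Comment #1 contains four `fmr` register moves per iteration, while the faster one has none, which would be consistent with this. The C sketch below only illustrates the idea on a simplified loop; the function names `sum_pairs_copy` and `sum_pairs_unrolled` are hypothetical and nothing here is taken from GCC's sources.

```c
#include <assert.h>
#include <stddef.h>

/* Carried-value loop: "prev" must be refreshed by a copy at the end of
   every iteration. */
void sum_pairs_copy(double *out, const double *p, size_t m)
{
    assert(out && p);
    if (m < 2)
        return;
    double prev = p[0];
    for (size_t i = 0; i + 1 < m; i++) {
        double next = p[i + 1];
        out[i] = prev + next;
        prev = next;                    /* the per-iteration copy */
    }
}

/* Same computation unrolled by two: a and b alternate between the
   "previous" and "next" roles, so no copy is needed in the loop body. */
void sum_pairs_unrolled(double *out, const double *p, size_t m)
{
    assert(out && p);
    if (m < 2)
        return;
    double a = p[0];
    size_t i = 0;
    for (; i + 2 < m; i += 2) {
        double b = p[i + 1];
        out[i] = a + b;                 /* a is "previous", b is "next" */
        a = p[i + 2];
        out[i + 1] = b + a;             /* roles swapped, no copy */
    }
    if (i + 1 < m)                      /* leftover single iteration */
        out[i] = a + p[i + 1];
}
```

The unrolled version trades the copy for a longer loop body and a remainder step, which matches the general shape of the difference between the two listings (stride 16 versus stride 8), though the real loop is of course more involved.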

* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: luisgpm at linux dot vnet dot ibm dot com @ 2009-05-29 19:52 UTC
To: gcc-bugs

------- Comment #5 from luisgpm at linux dot vnet dot ibm dot com  2009-05-29 19:52 -------
From predictive commoning we gain a bit more performance, probably due to the
bigger unrolling factor.

Any chance of the unrolling taking place while still using PRE?

Thanks,
Luis

* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenther at suse dot de @ 2009-05-29 20:15 UTC
To: gcc-bugs

------- Comment #6 from rguenther at suse dot de  2009-05-29 20:15 -------
Subject: Re:  [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)

On Fri, 29 May 2009, luisgpm at linux dot vnet dot ibm dot com wrote:

> ------- Comment #5 from luisgpm at linux dot vnet dot ibm dot com  2009-05-29 19:52 -------
> From predictive commoning we gain a bit more performance, probably due to the
> bigger unrolling factor.
>
> Any chance of the unrolling taking place while still using PRE?

-funroll[-all]-loops doesn't seem to do the job.  I didn't check if
enabling sms would do it.  Other unrolling on the tree level is only
implemented as side-effect of other optimizations (like vectorization
or predictive commoning or prefetching) :/

Richard.

* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-11-30 13:11 UTC
To: gcc-bugs

------- Comment #7 from rguenth at gcc dot gnu dot org  2009-11-30 13:11 -------
I believe this was fixed with

2009-07-22  Michael Matz  <matz@suse.de>

        PR tree-optimization/35229
        PR tree-optimization/39300
        * tree-ssa-pre.c (includes): Include tree-scalar-evolution.h.
        (inhibit_phi_insertion): New function.
        (insert_into_preds_of_block): Call it for REFERENCEs.
        (init_pre): Initialize and finalize scalar evolutions.
        * Makefile.in (tree-ssa-pre.o): Depend on tree-scalar-evolution.h .

which avoids the PRE and enables predictive commoning again (on x86_64
only the tail loop of the vectorized variant is).

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED