public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/40029] New: [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: luisgpm at linux dot vnet dot ibm dot com @ 2009-05-05 17:54 UTC (permalink / raw)
To: gcc-bugs
SPEC CPU2000's swim and mgrid benchmarks slowed down by ~10% after the merge of
the alias improvement branch.
GCC was configured with the following:
/gcc/HEAD/configure --target=powerpc64-linux --host=powerpc64-linux
--build=powerpc64-linux --with-cpu=default32 --enable-threads=posix
--enable-shared --enable-__cxa_atexit --with-gmp=/gmp --with-mpfr=mpfr
--with-long-double-128 --enable-decimal-float --enable-secure-plt
--disable-bootstrap --disable-alsa --prefix=/install/gcc/HEAD
build_alias=powerpc64-linux host_alias=powerpc64-linux
target_alias=powerpc64-linux --enable-languages=c,c++,fortran --no-create
--no-recursion
Compile flags used: -m[32|64] -O3 -mcpu=power[4|5|6] -ffast-math
-ftree-loop-linear -funroll-loops -fpeel-loops
Will provide more details soon.
--
Summary: [4.5 Regression] Big degradation on swim/mgrid on
powerpc 32/64 after alias improvement merge (gcc
r145494)
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: luisgpm at linux dot vnet dot ibm dot com
GCC build triplet: powerpc*-*-*
GCC host triplet: powerpc*-*-*
GCC target triplet: powerpc*-*-*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029
* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-06 8:28 UTC (permalink / raw)
To: gcc-bugs
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |4.5.0
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029
* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: luisgpm at linux dot vnet dot ibm dot com @ 2009-05-11 18:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from luisgpm at linux dot vnet dot ibm dot com 2009-05-11 18:04 -------
Good asm code for a hot loop in swim's "calc1" function
10001e10: lfd f12,-10672(r11)
10001e14: lfd f9,-10672(r9)
10001e18: addi r21,r21,16
10001e1c: lfd f7,-10680(r11)
10001e20: lfd f6,-10672(r6)
10001e24: fmul f3,f9,f9
10001e28: cmpw r21,r0
10001e2c: fadd f4,f7,f12
10001e30: lfd f22,-10680(r9)
10001e34: lfd f10,-10664(r9)
10001e38: addi r9,r9,16
10001e3c: lfd f23,-10672(r5)
10001e40: lfd f13,-10664(r5)
10001e44: addi r5,r5,16
10001e48: lfd f5,-10664(r11)
10001e4c: fsub f28,f23,f9
10001e50: fsub f25,f13,f10
10001e54: lfd f13,-10672(r4)
10001e58: addi r11,r11,16
10001e5c: fadd f5,f12,f5
10001e60: fsub f20,f13,f0
10001e64: fmul f9,f11,f9
10001e68: fmadd f27,f22,f22,f3
10001e6c: fmadd f30,f10,f10,f3
10001e70: lfd f3,-10680(r8)
10001e74: fadd f26,f4,f6
10001e78: fmul f10,f11,f10
10001e7c: fmul f24,f28,f2
10001e80: fmul f21,f25,f2
10001e84: fmul f4,f9,f4
10001e88: fmadd f22,f0,f0,f27
10001e8c: fadd f27,f8,f7
10001e90: fadd f23,f26,f8
10001e94: fmul f26,f0,f11
10001e98: lfd f8,-10664(r6)
10001e9c: lfd f0,-10664(r4)
10001ea0: addi r6,r6,16
10001ea4: fadd f29,f5,f8
10001ea8: fsub f25,f0,f13
10001eac: addi r4,r4,16
10001eb0: fmsub f28,f20,f1,f24
10001eb4: lfd f20,-10672(r8)
10001eb8: fmul f5,f10,f5
10001ebc: addi r8,r8,16
10001ec0: stfd f4,-10672(r22)
10001ec4: stfd f5,-10664(r22)
10001ec8: addi r22,r22,16
10001ecc: fmul f27,f26,f27
10001ed0: fadd f24,f6,f29
10001ed4: fmsub f29,f25,f1,f21
10001ed8: fdiv f28,f28,f23
10001edc: fmadd f25,f13,f13,f30
10001ee0: fadd f6,f6,f12
10001ee4: fmadd f30,f3,f3,f22
10001ee8: stfd f27,-10680(r3)
10001eec: fdiv f29,f29,f24
10001ef0: fmadd f3,f20,f20,f25
10001ef4: fmul f20,f13,f11
10001ef8: fmadd f7,f30,f31,f7
10001efc: stfd f7,-10680(r10)
10001f00: fmadd f12,f3,f31,f12
10001f04: fmul f13,f20,f6
10001f08: stfd f12,-10672(r10)
10001f0c: stfd f13,-10672(r3)
10001f10: addi r10,r10,16
10001f14: addi r3,r3,16
10001f18: stfd f28,-10672(r7)
10001f1c: stfd f29,-10664(r7)
10001f20: addi r7,r7,16
10001f24: bne 10001e10 <calc1_+0x1b0>
----------
Bad asm code for the same loop
10001a60: addis r27,r9,-435
10001a64: addis r12,r11,-2176
10001a68: lfd f13,-7440(r27)
10001a6c: lfd f10,28344(r12)
10001a70: addis r8,r11,-1958
10001a74: addis r10,r11,-1740
10001a78: fsub f7,f10,f13
10001a7c: lfd f8,-704(r8)
10001a80: lfd f10,0(r9)
10001a84: addis r7,r9,-218
10001a88: addis r28,r9,1523
10001a8c: lfd f9,-29752(r10)
10001a90: fadd f6,f12,f10
10001a94: fsub f2,f8,f0
10001a98: addis r12,r11,218
10001a9c: addis r27,r9,2176
10001aa0: fadd f5,f11,f9
10001aa4: fadd f11,f11,f12
10001aa8: addi r9,r9,8
10001aac: cmpw r6,r9
10001ab0: fmul f1,f7,f30
10001ab4: fmul f7,f13,f13
10001ab8: fmul f13,f13,f3
10001abc: fadd f31,f5,f6
10001ac0: lfd f5,29040(r7)
10001ac4: fmsub f2,f2,f29,f1
10001ac8: fmadd f1,f0,f0,f7
10001acc: fmul f0,f0,f3
10001ad0: fmul f6,f13,f6
10001ad4: stfd f6,-6728(r28)
10001ad8: fdiv f2,f2,f31
10001adc: fmadd f5,f5,f5,f1
10001ae0: fmul f31,f0,f11
10001ae4: fmr f0,f8
10001ae8: stfd f31,0(r11)
10001aec: fmr f11,f9
10001af0: addi r11,r11,8
10001af4: fadd f1,f5,f4
10001af8: fmr f4,f7
10001afc: fmadd f5,f1,f28,f12
10001b00: fmr f12,f10
10001b04: stfd f5,-28344(r27)
10001b08: stfd f2,-29040(r12)
10001b0c: bne+ 10001a60 <calc1_+0xe0>
----------
Looking at the differences between the two cases, the good code seems to
traverse the loop differently from the bad one, using small displacements for
each load/store, whereas the bad code uses much larger displacements.
The good case also appears to have a bigger unrolling factor (longer code,
more loads) than the bad case.
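
The size of the displacements in the bad loop can be worked out from the listing. The following is a minimal sketch, not part of the original report: it assumes each array is 1335 x 1335 REAL*8 (the dimensions given in the testcase posted later in this thread) and that every `addis` feeds the load/store that uses its result register. The helper names are hypothetical.

```python
# Assumption: every array is N1 x N2 REAL*8 with N1 = N2 = 1335,
# as in the CALC1 testcase from this thread.
N1 = N2 = 1335
ARRAY_BYTES = N1 * N2 * 8  # 14,257,800 bytes per array


def effective_offset(addis_imm, disp):
    """Byte offset formed by `addis rT,rB,imm` followed by an access disp(rT)."""
    return (addis_imm << 16) + disp


def split_by_array(offset):
    """Return (k, rest) with offset == k * ARRAY_BYTES + rest and |rest| minimal."""
    k = (offset + ARRAY_BYTES // 2) // ARRAY_BYTES
    return k, offset - k * ARRAY_BYTES


# (addis immediate, displacement) pairs copied from the bad loop above.
BAD_LOOP_ACCESSES = [
    (-435, -7440), (-2176, 28344), (-1958, -704), (-1740, -29752),
    (-218, 29040), (1523, -6728), (2176, -28344), (218, -29040),
]

if __name__ == "__main__":
    for imm, disp in BAD_LOOP_ACCESSES:
        k, rest = split_by_array(effective_offset(imm, disp))
        print(f"addis {imm:6d}, disp {disp:7d} -> {k:+d} arrays {rest:+d} bytes")
```

Under these assumptions every one of the eight offsets lands within 8 bytes of a whole multiple of the array size, which suggests the bad loop addresses several arrays from a shared induction register. The displacements in the good loop (-10680, -10672, -10664) instead sit within 16 bytes of one column, since 1335 * 8 = 10680.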
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029
* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-21 10:45 UTC (permalink / raw)
To: gcc-bugs
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Priority|P3 |P2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029
* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-21 14:04 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from rguenth at gcc dot gnu dot org 2009-05-21 14:04 -------
That's
      DO 100 J=1,N
      DO 100 I=1,M
      CU(I+1,J) = .5D0*(P(I+1,J)+P(I,J))*U(I+1,J)
      CV(I,J+1) = .5D0*(P(I,J+1)+P(I,J))*V(I,J+1)
      Z(I+1,J+1) = (FSDX*(V(I+1,J+1)-V(I,J+1))-FSDY*(U(I+1,J+1)
     1 -U(I+1,J)))/(P(I,J)+P(I+1,J)+P(I+1,J+1)+P(I,J+1))
      H(I,J) = P(I,J)+.25D0*(U(I+1,J)*U(I+1,J)+U(I,J)*U(I,J)
     1 +V(I,J+1)*V(I,J+1)+V(I,J)*V(I,J))
  100 CONTINUE
right?
4.4 can do predictive commoning on it while trunk can't - this also unrolls
the loop twice. On trunk we are likely confused by PRE that already
partially performs what predictive commoning would do. Disabling PRE
makes predictive commoning work but doesn't unroll the loop (same as
with disabling PRE in 4.4). It is likely the full redundancies PRE
discovers that cause the unrolling.
That said - this looks like yet another unfortunate pass ordering problem
to me.
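
For readers unfamiliar with the term: predictive commoning reuses values that were loaded or computed in an earlier iteration of a loop. The sketch below illustrates the idea on the CU statement only, for a single column J. It is written in Python purely for readability and is not how GCC implements the pass; the function names and the load counters are hypothetical.

```python
def cu_naive(p, u):
    """CU(I+1) = .5*(P(I+1)+P(I))*U(I+1), loading P(I) again in every iteration."""
    m = len(p) - 1
    cu = [0.0] * (m + 1)
    loads = 0
    for i in range(m):
        p_next, p_cur, u_next = p[i + 1], p[i], u[i + 1]
        loads += 3
        cu[i + 1] = 0.5 * (p_next + p_cur) * u_next
    return cu, loads


def cu_commoned(p, u):
    """Same result, but P(I) is carried over from the previous iteration's P(I+1)."""
    m = len(p) - 1
    cu = [0.0] * (m + 1)
    loads = 0
    if m <= 0:
        return cu, loads
    p_cur = p[0]  # one load before the loop
    loads += 1
    for i in range(m):
        p_next, u_next = p[i + 1], u[i + 1]
        loads += 2
        cu[i + 1] = 0.5 * (p_next + p_cur) * u_next
        p_cur = p_next  # value carried into the next iteration
    return cu, loads


if __name__ == "__main__":
    p = [1.0, 2.0, 4.0, 8.0]
    u = [0.0, 1.0, 2.0, 3.0]
    print(cu_naive(p, u))
    print(cu_commoned(p, u))
```

Both functions perform the same floating-point operations in the same order, so the results are identical; only the number of loads per iteration drops from three to two.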
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed|0 |1
Last reconfirmed|0000-00-00 00:00:00 |2009-05-21 14:04:13
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029
* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-21 14:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from rguenth at gcc dot gnu dot org 2009-05-21 14:10 -------
Testcase:
      SUBROUTINE CALC1
      IMPLICIT REAL*8 (A-H, O-Z)
      PARAMETER (N1=1335, N2=1335)
      COMMON U(N1,N2), V(N1,N2), P(N1,N2),
     2 CU(N1,N2), CV(N1,N2),
     * Z(N1,N2), H(N1,N2)
      COMMON /CONS/ DX,DY
      FSDX = 4.D0/DX
      FSDY = 4.D0/DY
      DO 100 J=1,N
      DO 100 I=1,M
      CU(I+1,J) = .5D0*(P(I+1,J)+P(I,J))*U(I+1,J)
      CV(I,J+1) = .5D0*(P(I,J+1)+P(I,J))*V(I,J+1)
      Z(I+1,J+1) = (FSDX*(V(I+1,J+1)-V(I,J+1))-FSDY*(U(I+1,J+1)
     1 -U(I+1,J)))/(P(I,J)+P(I+1,J)+P(I+1,J+1)+P(I,J+1))
      H(I,J) = P(I,J)+.25D0*(U(I+1,J)*U(I+1,J)+U(I,J)*U(I,J)
     1 +V(I,J+1)*V(I,J+1)+V(I,J)*V(I,J))
  100 CONTINUE
      RETURN
      END
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029
* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-05-27 20:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from rguenth at gcc dot gnu dot org 2009-05-27 20:57 -------
Actually PRE seems to be more powerful than predictive commoning here. We
just lose one opportunity while gaining others. With predictive commoning we have
8 loads and 4 stores, 11 multiplications and one division.
With PRE it is 6 loads and 4 stores, 10 multiplications and one division.
The only thing we gain from predictive commoning in 4.4 is unrolling the
loop once.
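
As a rough cross-check of the load counts, the reads in the loop body can be modelled as (array, I offset, J offset) triples. The sketch below is only a counting model under the assumption that a value loaded as X(I+1,...) in one inner-loop iteration can be kept in a register and reused as X(I,...) in the next. That holds here because the stores go to CU, CV, Z and H, which are distinct from U, V and P. It says nothing about how either pass actually reaches its result, and the names are hypothetical.

```python
# Distinct array elements read by one iteration of the inner (I) loop,
# as (array, i_offset, j_offset), taken from the Fortran source above.
LOADS = {
    ("P", 1, 0), ("P", 0, 0), ("U", 1, 0),   # first read in the CU statement
    ("P", 0, 1), ("V", 0, 1),                # first read in the CV statement
    ("V", 1, 1), ("U", 1, 1), ("P", 1, 1),   # first read in the Z statement
    ("U", 0, 0), ("V", 0, 0),                # first read in the H statement
}


def reusable_from_previous_iteration(loads):
    """References whose element was already read one I-iteration earlier.

    X(I+d, J+e) in iteration I is the same element as X(I+d+1, J+e)
    in iteration I-1, so it is reusable if that shifted reference is also read.
    """
    return {(a, di, dj) for (a, di, dj) in loads if (a, di + 1, dj) in loads}


if __name__ == "__main__":
    carried = reusable_from_previous_iteration(LOADS)
    print("distinct reads per iteration:", len(LOADS))
    print("carried from previous iteration:", sorted(carried))
    print("loads still needed:", len(LOADS) - len(carried))
```

In this model 10 distinct reads minus 4 carried values leaves 6 loads per iteration, which coincides with the figure quoted for PRE.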
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029
* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: luisgpm at linux dot vnet dot ibm dot com @ 2009-05-29 19:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from luisgpm at linux dot vnet dot ibm dot com 2009-05-29 19:52 -------
From predictive commoning we gain a bit more performance, probably due to the
bigger unrolling factor.
Any chance of the unrolling taking place while still using PRE?
Thanks,
Luis
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029
* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenther at suse dot de @ 2009-05-29 20:15 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from rguenther at suse dot de 2009-05-29 20:15 -------
Subject: Re: [4.5 Regression] Big degradation on
swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
On Fri, 29 May 2009, luisgpm at linux dot vnet dot ibm dot com wrote:
> ------- Comment #5 from luisgpm at linux dot vnet dot ibm dot com 2009-05-29 19:52 -------
> From predictive commoning we gain a bit more performance, probably due to the
> bigger unrolling factor.
>
> Any chance of the unrolling taking place while still using PRE?
-funroll[-all]-loops doesn't seem to do the job. I didn't check if
enabling sms would do it. Other unrolling on the tree level is only
implemented as side-effect of other optimizations (like vectorization
or predictive commoning or prefetching) :/
Richard.
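
One reason unrolling pairs naturally with carrying values across iterations is that it removes the register copy at the end of each iteration. The sketch below, again for the CU statement on a single column, is an illustration under that assumption and not GCC's implementation; the function names are hypothetical. The `fmr` instructions in the bad loop listed earlier in this thread may be copies of this kind.

```python
def cu_carried(p, u):
    """Carried value with one copy per iteration (p_cur = p_next)."""
    m = len(p) - 1
    cu = [0.0] * (m + 1)
    copies = 0
    if m <= 0:
        return cu, copies
    p_cur = p[0]
    for i in range(m):
        p_next = p[i + 1]
        cu[i + 1] = 0.5 * (p_next + p_cur) * u[i + 1]
        p_cur = p_next  # corresponds to a register move
        copies += 1
    return cu, copies


def cu_carried_unrolled2(p, u):
    """Unrolled by two: `a` and `b` swap roles, so no copy is needed."""
    m = len(p) - 1
    cu = [0.0] * (m + 1)
    if m <= 0:
        return cu, 0
    a = p[0]
    i = 0
    while i + 1 < m:
        b = p[i + 1]                           # b is "next", a is "current"
        cu[i + 1] = 0.5 * (b + a) * u[i + 1]
        a = p[i + 2]                           # a is "next", b is "current"
        cu[i + 2] = 0.5 * (a + b) * u[i + 2]
        i += 2
    if i < m:                                  # leftover iteration for odd m
        b = p[i + 1]
        cu[i + 1] = 0.5 * (b + a) * u[i + 1]
    return cu, 0


if __name__ == "__main__":
    p = [1.0, 2.0, 4.0, 8.0]
    u = [0.0, 1.0, 2.0, 3.0]
    print(cu_carried(p, u))
    print(cu_carried_unrolled2(p, u))
```

The two versions compute the same values; the unrolled one trades code size for the removal of the per-iteration copy.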
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029
* [Bug middle-end/40029] [4.5 Regression] Big degradation on swim/mgrid on powerpc 32/64 after alias improvement merge (gcc r145494)
From: rguenth at gcc dot gnu dot org @ 2009-11-30 13:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from rguenth at gcc dot gnu dot org 2009-11-30 13:11 -------
I believe this was fixed with
2009-07-22 Michael Matz <matz@suse.de>
PR tree-optimization/35229
PR tree-optimization/39300
* tree-ssa-pre.c (includes): Include tree-scalar-evolution.h.
(inhibit_phi_insertion): New function.
(insert_into_preds_of_block): Call it for REFERENCEs.
(init_pre): Initialize and finalize scalar evolutions.
* Makefile.in (tree-ssa-pre.o): Depend on tree-scalar-evolution.h .
which avoids the PRE and enables predictive commoning again (on x86_64
only the tail loop of the vectorized variant is).
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40029