* [Bug target/34163] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
2007-11-20 15:00 [Bug target/34163] New: 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64 burnus at gcc dot gnu dot org
@ 2008-04-21 7:12 ` ubizjak at gmail dot com
2008-04-21 9:10 ` rguenth at gcc dot gnu dot org
` (26 subsequent siblings)
27 siblings, 0 replies; 29+ messages in thread
From: ubizjak at gmail dot com @ 2008-04-21 7:12 UTC
To: gcc-bugs
------- Comment #1 from ubizjak at gmail dot com 2008-04-21 07:11 -------
Confirmed.
--
ubizjak at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-04-21 07:11:35
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2008-04-21 9:10 UTC
To: gcc-bugs
------- Comment #2 from rguenth at gcc dot gnu dot org 2008-04-21 09:09 -------
Well, this bug needs proper analysis and a testcase, but yes, I also see this
slowdown.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: ubizjak at gmail dot com @ 2008-04-22 16:44 UTC
To: gcc-bugs
------- Comment #3 from ubizjak at gmail dot com 2008-04-22 16:43 -------
(In reply to comment #2)
> Well, this bug needs proper analysis and a testcase, but yes, I also see this
> slowdown.
Richi, backing out your patch [1] is the only change that makes a difference
in the generated code.
[1] http://gcc.gnu.org/viewcvs?view=rev&revision=129796
The other suspected patches have no effect on the code generated with "gfortran
-march=opteron -ffast-math -funroll-loops -ftree-loop-linear -ftree-vectorize
-msse3 -O3".
I will check execution times, but since I have Core2, perhaps there will be no
slowdown. Can you try to benchmark on your target with [1] backed out?
--
ubizjak at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: ubizjak at gmail dot com @ 2008-04-22 16:52 UTC
To: gcc-bugs
------- Comment #4 from ubizjak at gmail dot com 2008-04-22 16:51 -------
Confirmed also on core2:
benchmarked with patch:
22 Apr 2008 18:47:04 gfortran - Compile nf
command=gfortran -march=opteron -ffast-math -funroll-loops -ftree-loop-linear
-ftree-vectorize -msse3 -O3 nf.s -o nf
22 Apr 2008 18:47:04 gfortran - Execute nf
nf Run # 1 21.46340 21.46340 - Error=100.0000%
nf Run # 2 21.45860 21.46100 - Error= 0.0112%
Geometric Mean Execution Time = 21.46 seconds
benchmarked without patch:
22 Apr 2008 18:44:51 gfortran - Compile nf
command=gfortran -march=opteron -ffast-math -funroll-loops -ftree-loop-linear
-ftree-vectorize -msse3 -O3 nf.s -o nf
22 Apr 2008 18:44:51 gfortran - Execute nf
nf Run # 1 19.46120 19.46120 - Error=100.0000%
nf Run # 2 19.46200 19.46160 - Error= 0.0021%
Geometric Mean Execution Time = 19.46 seconds
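For reference, the relative slowdown implied by the two geometric means above can be checked with a few lines of C. This is only a sketch: the function name slowdown_percent is made up for illustration and is not part of the benchmark harness.

```c
#include <assert.h>

/* Relative slowdown of `slow` versus `fast`, in percent.
 * Hypothetical helper, not part of the Polyhedron harness. */
double slowdown_percent(double fast, double slow)
{
    assert(fast > 0.0);   /* a zero or negative baseline is meaningless */
    return (slow - fast) / fast * 100.0;
}
```

With the times quoted above, slowdown_percent(19.46, 21.46) evaluates to roughly 10.3, in line with the 10% in the bug summary.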
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: pinskia at gcc dot gnu dot org @ 2008-04-22 18:14 UTC
To: gcc-bugs
------- Comment #5 from pinskia at gcc dot gnu dot org 2008-04-22 18:14 -------
(In reply to comment #3)
> [1] http://gcc.gnu.org/viewcvs?view=rev&revision=129796
It was a correctness fix, which usually will slow down generated code. :)
So you have to look at the difference to make sure that the code generated
before was actually producing no overflows.
Thanks,
Andrew Pinski
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2008-04-22 22:20 UTC
To: gcc-bugs
------- Comment #6 from rguenth at gcc dot gnu dot org 2008-04-22 22:20 -------
Indeed. It would be interesting to analyze what optimization the folding
enabled and see if that can be recovered somehow.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: ubizjak at gmail dot com @ 2008-04-24 19:57 UTC
To: gcc-bugs
------- Comment #7 from ubizjak at gmail dot com 2008-04-24 19:56 -------
Created an attachment (id=15527)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15527&action=view)
x86_64 asm dump of trisolve procedure (generated without the patch)
The whole difference is in the trisolve procedure (attached). Performance is
10% better if trisolve in the dump is replaced with the attached function.
I'm using -O2 -funroll-loops.
BTW: There are two loops in this asm (.L3 and .L5). In current asm, suspicious
parts are:
movsd 16(%r9), %xmm6
mulsd 16(%r8), %xmm6
and
mulsd -16(%rdx), %xmm0
mulsd -16(%r11), %xmm0
That is - loads from different addresses that are not present in non-patched
asm.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: ubizjak at gmail dot com @ 2008-04-25 9:56 UTC
To: gcc-bugs
------- Comment #8 from ubizjak at gmail dot com 2008-04-25 09:55 -------
The problem is indeed in trisolve:
subroutine trisolve(x,i1,i2)
  integer :: i1 , i2
  real(dpkind),dimension(i2)::x
  integer :: i
  x(i1) = gi(i1)* x(i1)
  do i = i1+1 , i2
    x(i) = gi(i)*(x(i)-au1(i-1)*x(i-1))
  enddo
  do i = i2-1 , i1 , -1
    x(i) = x(i) - gi(i)*au1(i)*x(i+1)
  enddo
end subroutine trisolve
Please note two very tight loops that calculate x[n] from the value x[n-1],
where x[n-1] is the result of a previous step.
The .127t.optimized tree dump for the first loop (the second loop is the same,
only going from the last element to the first) shows, in the non-regressed case:
<bb 4>:
MEM[base: ivtmp.297] = MEM[base: ivtmp.295] * ((MEM[base: ivtmp.297] -
MEM[base: ivtmp.300] * MEM[base: ivtmp.297, offset: 0x0fffffffffffffff8]));
ivtmp.295 = ivtmp.295 + D.3347;
ivtmp.297 = ivtmp.297 + 8;
ivtmp.300 = ivtmp.300 + 8;
ivtmp.304 = ivtmp.304 + 1;
if ((integer(kind=4)) ivtmp.304 == D.1652)
goto <bb 5>;
else
goto <bb 4>;
this code results in:
.L3:
movsd (%r9), %xmm10
addl $4, %edx
movsd (%rcx), %xmm9
X+> mulsd -8(%rcx), %xmm10
movsd 8(%rcx), %xmm7
movsd 16(%rcx), %xmm5
movsd 24(%rcx), %xmm3
subsd %xmm10, %xmm9
mulsd (%rax), %xmm9
addq %r10, %rax
1-> movsd %xmm9, (%rcx)
movsd 8(%r9), %xmm8
1+> mulsd %xmm9, %xmm8
subsd %xmm8, %xmm7
mulsd (%rax), %xmm7
addq %r10, %rax
2-> movsd %xmm7, 8(%rcx)
movsd 16(%r9), %xmm6
2+> mulsd %xmm7, %xmm6
subsd %xmm6, %xmm5
mulsd (%rax), %xmm5
addq %r10, %rax
3-> movsd %xmm5, 16(%rcx)
movsd 24(%r9), %xmm4
addq $32, %r9
3+> mulsd %xmm5, %xmm4
subsd %xmm4, %xmm3
mulsd (%rax), %xmm3
addq %r10, %rax
X-> movsd %xmm3, 24(%rcx)
addq $32, %rcx
cmpl %ebp, %edx
jne .L3
In the code above, it can be seen how unrolled iterations are linked together.
The result from previous iteration (marked with N->) enters next iteration
(marked with N+>).
BTW: Optimizer could also link X-> and X+> but this is probably too much...
Patched gcc regressed in this area:
<bb 4>:
MEM[base: ivtmp.297] = MEM[base: ivtmp.295] * ((MEM[base: ivtmp.297] -
MEM[base: ivtmp.300] * MEM[base: ivtmp.302]));
ivtmp.295 = ivtmp.295 + D.3349;
ivtmp.297 = ivtmp.297 + 8;
ivtmp.300 = ivtmp.300 + 8;
ivtmp.302 = ivtmp.302 + 8;
ivtmp.304 = ivtmp.304 + 1;
if ((integer(kind=4)) ivtmp.304 == D.1652)
goto <bb 5>;
else
goto <bb 4>;
this code results in:
.L3:
movsd (%r9), %xmm10
addl $4, %edx
movsd (%rcx), %xmm9
X+> mulsd (%r8), %xmm10
movsd 8(%rcx), %xmm7
movsd 16(%rcx), %xmm5
movsd 24(%rcx), %xmm3
subsd %xmm10, %xmm9
mulsd (%rax), %xmm9
addq %rbx, %rax
1-> movsd %xmm9, (%rcx)
movsd 8(%r9), %xmm8
1+> mulsd 8(%r8), %xmm8
subsd %xmm8, %xmm7
mulsd (%rax), %xmm7
addq %rbx, %rax
2-> movsd %xmm7, 8(%rcx)
movsd 16(%r9), %xmm6
2+> mulsd 16(%r8), %xmm6
subsd %xmm6, %xmm5
mulsd (%rax), %xmm5
addq %rbx, %rax
3-> movsd %xmm5, 16(%rcx)
movsd 24(%r9), %xmm4
addq $32, %r9
3+> mulsd 24(%r8), %xmm4
addq $32, %r8
subsd %xmm4, %xmm3
mulsd (%rax), %xmm3
addq %rbx, %rax
X-> movsd %xmm3, 24(%rcx)
addq $32, %rcx
cmpl %r12d, %edx
jne .L3
In the code above, the links are broken. In ".+>" case, gcc reloads from memory
the same value that is otherwise available in the register, marked with ".->".
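The difference between the two listings can be sketched at the source level in C. This is a hypothetical illustration, not code from the benchmark: the names forward_reload and forward_carried are invented here, the index range lo..hi is inclusive, and the arrays are passed as parameters instead of coming from the enclosing module.

```c
#include <assert.h>
#include <stddef.h>

/* Forward sweep as written in the source: x[i-1] is read back from
 * memory in every iteration (the shape of the regressed code). */
void forward_reload(double *x, const double *gi, const double *au1,
                    size_t lo, size_t hi)
{
    assert(lo <= hi);
    for (size_t i = lo + 1; i <= hi; i++)
        x[i] = gi[i] * (x[i] - au1[i - 1] * x[i - 1]);
}

/* Same sweep with the previous result carried in a local variable,
 * so the value just stored is reused without a reload (the shape of
 * the non-regressed code, where N-> feeds N+> through a register). */
void forward_carried(double *x, const double *gi, const double *au1,
                     size_t lo, size_t hi)
{
    assert(lo <= hi);
    double prev = x[lo];
    for (size_t i = lo + 1; i <= hi; i++) {
        prev = gi[i] * (x[i] - au1[i - 1] * prev);
        x[i] = prev;
    }
}
```

Both functions compute the same values. The second form only makes explicit the reuse that the compiler is expected to discover on its own in the first.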
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2008-04-25 10:24 UTC
To: gcc-bugs
------- Comment #9 from rguenth at gcc dot gnu dot org 2008-04-25 10:23 -------
Not hoisting the load from x(i) is a missed PRE opportunity. Complete testcase
for the second loop:
subroutine trisolve2(x,i1,i2,nxyz)
  integer :: nxyz
  real,dimension(nxyz):: au1
  real,allocatable,dimension(:) :: gi
  integer :: i1 , i2
  real,dimension(i2)::x
  integer :: i
  allocate(gi(nxyz))
  do i = i1+1 , i2
    x(i) = gi(i)*(x(i)-au1(i-1)*x(i-1))
  enddo
end subroutine trisolve2
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: ubizjak at gmail dot com @ 2008-04-25 11:08 UTC
To: gcc-bugs
------- Comment #10 from ubizjak at gmail dot com 2008-04-25 11:07 -------
(In reply to comment #9)
> Not hoisting the load from x(i) is a missed PRE opportunity. Complete testcase
> for the second loop:
This is actually the first loop.
Just for reference: "-O2 -funroll-loops" flags are needed.
--
ubizjak at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: ubizjak at gmail dot com @ 2008-12-27 11:24 UTC
To: gcc-bugs
--
ubizjak at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|10% performance regression  |[4.3/4.4 Regression] 10%
                   |since Nov 1 on Polyhedron's |performance regression since
                   |"NF" on AMD64               |Nov 1 on Polyhedron's "NF"
                   |                            |on AMD64
   Target Milestone|---                         |4.3.4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: ubizjak at gmail dot com @ 2008-12-27 11:28 UTC
To: gcc-bugs
--
ubizjak at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.4                       |4.3.3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2008-12-29 21:17 UTC
To: gcc-bugs
--
rguenth at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2009-01-24 10:28 UTC
To: gcc-bugs
------- Comment #11 from rguenth at gcc dot gnu dot org 2009-01-24 10:20 -------
GCC 4.3.3 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.3.3                       |4.3.4
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rob1weld at aol dot com @ 2009-01-28 3:54 UTC
To: gcc-bugs
------- Comment #12 from rob1weld at aol dot com 2009-01-28 03:54 -------
On the Trunk using "-O2" or "-O3" can produce slower code.
I built gcc version 4.4.0 20090126 [trunk revision 143680] for
i386-redhat-linux and was dismayed to find that libmudflap had a few FAILs:
Results for 4.4.0 20090126 (experimental) [trunk revision 143680] (GCC)
testsuite on i386-redhat-linux-gnu
http://gcc.gnu.org/ml/gcc-testresults/2009-01/msg02853.html
The file "libmudflap.cth/pass40-frag.c" fails with NO optimization due to:
6100 6200 WARNING: program timed out.
FAIL: libmudflap.cth/pass40-frag.c output pattern test
Since it is only completing 62% of its task, the timeout needs an appropriate
increase.
The file "libmudflap.cth/pass40-frag.c" fails with "-O2" due to:
4100 4200 4300 4400 4500 4600 4700 4800 4900 WARNING: program timed out.
FAIL: libmudflap.cth/pass40-frag.c (-O2) output pattern test
With "-O2" it only completes 49% (slower than default).
The file "libmudflap.cth/pass40-frag.c" fails with "-O3" due to:
5100 5200 5300 5400 5500 5600 5700 5800 WARNING: program timed out.
FAIL: libmudflap.cth/pass40-frag.c (-O3) output pattern test
With "-O3" it only completes 58% (slower than default, but faster than "-O2").
Rob
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: bonzini at gnu dot org @ 2009-02-16 10:23 UTC
To: gcc-bugs
------- Comment #13 from bonzini at gnu dot org 2009-02-16 10:23 -------
Predictive commoning does exactly what you want.
--
bonzini at gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonzini at gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: ubizjak at gmail dot com @ 2009-06-25 8:25 UTC
To: gcc-bugs
------- Comment #14 from ubizjak at gmail dot com 2009-06-25 08:25 -------
(In reply to comment #13)
> Predictive commoning does exactly what you want.
It is not effective for the testcase in Comment #9. The dumps for innermost
loop are the same for -O2 -funroll-loops [-fpredictive-commoning]:
.L6:
movss (%rsi), %xmm9
addl $4, %r8d
mulss (%rcx), %xmm9
movss (%rdx), %xmm8
movss 4(%rdx), %xmm6
movss 8(%rdx), %xmm4
movss 12(%rdx), %xmm2
subss %xmm9, %xmm8
mulss 0(%rbp), %xmm8
movss %xmm8, (%rdx)
movss 4(%rsi), %xmm7
mulss 4(%rcx), %xmm7
subss %xmm7, %xmm6
mulss 4(%rbp), %xmm6
movss %xmm6, 4(%rdx)
movss 8(%rsi), %xmm5
mulss 8(%rcx), %xmm5
subss %xmm5, %xmm4
mulss 8(%rbp), %xmm4
movss %xmm4, 8(%rdx)
movss 12(%rsi), %xmm3
addq $16, %rsi
mulss 12(%rcx), %xmm3
addq $16, %rcx
subss %xmm3, %xmm2
mulss 12(%rbp), %xmm2
addq $16, %rbp
movss %xmm2, 12(%rdx)
addq $16, %rdx
cmpl %r9d, %r8d
jne .L6
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: ubizjak at gmail dot com @ 2009-06-25 8:31 UTC
To: gcc-bugs
------- Comment #15 from ubizjak at gmail dot com 2009-06-25 08:31 -------
(In reply to comment #14)
> (In reply to comment #13)
> > Predictive commoning does exactly what you want.
Predictive commoning failed: no suitable chains
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2009-06-25 9:01 UTC
To: gcc-bugs
------- Comment #16 from rguenth at gcc dot gnu dot org 2009-06-25 09:01 -------
Executing predictive commoning without unrolling.
with -m32. One of the cases SCEV is confused about pointer-plus offsets
being sizetype:
(Data Ref:
stmt: (*x_58(D))[D.1627_54] = D.1638_71;
ref: (*x_58(D))[D.1627_54];
base_object: (*x_58(D))[0];
Access function 0: {(integer(kind=8)) i_43 + -1, +, 1}_1
Access function 1: 0B
vs.
(Data Ref:
stmt: D.1634_67 = (*x_58(D))[D.1632_62];
ref: (*x_58(D))[D.1632_62];
base_object: (*x_58(D))[0];
Access function 0: {(integer(kind=8)) (i_43 + -1) + -1, +, 1}_1
Access function 1: 0B
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: ubizjak at gmail dot com @ 2009-07-03 8:47 UTC
To: gcc-bugs
------- Comment #17 from ubizjak at gmail dot com 2009-07-03 08:46 -------
(In reply to comment #16)
> One of the cases SCEV is confused about pointer-plus offsets being sizetype:
Do we have a solution for this problem...?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenther at suse dot de @ 2009-07-03 9:08 UTC
To: gcc-bugs
------- Comment #18 from rguenther at suse dot de 2009-07-03 09:08 -------
Subject: Re: [4.3/4.4/4.5 Regression] 10% performance
regression since Nov 1 on Polyhedron's "NF" on AMD64
On Fri, 3 Jul 2009, ubizjak at gmail dot com wrote:
> ------- Comment #17 from ubizjak at gmail dot com 2009-07-03 08:46 -------
> (In reply to comment #16)
>
> > One of the cases SCEV is confused about pointer-plus offsets being sizetype:
>
> Do we have a solution for this problem...?
My hope is that no-undefined-overflow will somehow magically solve
these problems ... otherwise no, there is unfortunately no way out
here.
Richard.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2009-07-03 11:06 UTC
To: gcc-bugs
------- Comment #19 from rguenth at gcc dot gnu dot org 2009-07-03 11:05 -------
In fact, in this case we have the C equivalent
int i;
long j = (long)(i - 1);
vs.
long j = (long)i - 1;
which I believe are equivalent if overflow is undefined (or i - 1 does not
wrap).
It is just that fold obviously considers (long)i - 1 to be more expensive
than (long)(i - 1) and thus does not transform the latter into the former
(and it can't transform (long)i - 1 to (long)(i - 1) as if (long)i - 1
does not overflow there is no guarantee that i - 1 does not).
We should be able to do the former transformation during SCEV analysis
though.
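To make the distinction concrete, here is a small C sketch of the two forms. It is an illustration under assumptions, not the patch itself: the function names are invented, int32_t/int64_t stand in for the narrow and wide types, and the wrapping case is modelled with unsigned arithmetic so that the example stays free of undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* (T2)(t - x): subtract in the narrow type, then widen.
 * Only valid when i - 1 does not overflow. */
int64_t narrow_then_widen(int32_t i)
{
    assert(i > INT32_MIN);
    return (int64_t)(i - 1);
}

/* (T2)t - (T2)x: widen first, then subtract.
 * Cannot overflow for any 32-bit input. */
int64_t widen_then_sub(int32_t i)
{
    return (int64_t)i - 1;
}

/* What the narrow subtraction would give if it wrapped around,
 * computed through uint32_t to avoid signed overflow. */
int64_t narrow_wrapping(int32_t i)
{
    uint32_t w = (uint32_t)i - 1u;
    return (w <= (uint32_t)INT32_MAX) ? (int64_t)w
                                      : (int64_t)w - 4294967296LL;
}
```

For every i greater than INT32_MIN the first two functions agree, which is why the narrow form can be rewritten into the wide one when the narrow subtraction is known not to overflow. At i equal to INT32_MIN the wide form yields -2147483649 while a wrapping narrow subtraction would yield 2147483647, which is why the rewrite in the opposite direction is not safe in general.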
I have a patch which results in (-O3 -ffast-math -funroll-loops)
.L6:
mulss (%rcx), %xmm0
movss (%rdx), %xmm5
movss 4(%rdx), %xmm4
addl $4, %ebp
subss %xmm0, %xmm5
movss 8(%rdx), %xmm0
mulss (%rsi), %xmm5
movss %xmm5, (%rdx)
mulss 4(%rcx), %xmm5
subss %xmm5, %xmm4
mulss 4(%rsi), %xmm4
movss %xmm4, 4(%rdx)
movss 8(%rcx), %xmm3
mulss %xmm4, %xmm3
subss %xmm3, %xmm0
mulss 8(%rsi), %xmm0
movss %xmm0, 8(%rdx)
movss 12(%rcx), %xmm2
addq $16, %rcx
mulss %xmm0, %xmm2
movss 12(%rdx), %xmm0
subss %xmm2, %xmm0
mulss 12(%rsi), %xmm0
addq $16, %rsi
movss %xmm0, 12(%rdx)
addq $16, %rdx
cmpl %r8d, %ebp
jne .L6
--
rguenth at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2008-04-21 07:11:35         |2009-07-03 11:05:43
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2009-07-03 11:14 UTC
To: gcc-bugs
------- Comment #20 from rguenth at gcc dot gnu dot org 2009-07-03 11:14 -------
Before:
Time for setup 0.139
Time per iteration 0.271
Total Time 6.649
Time for setup 0.136
Time per iteration 0.265
Total Time 10.210
Time for setup 0.134
Time per iteration 0.265
Total Time 7.276
Time for setup 0.134
Time per iteration 0.260
Total Time 11.572
After:
Time for setup 0.114
Time per iteration 0.238
Total Time 5.834
Time for setup 0.111
Time per iteration 0.233
Total Time 8.948
Time for setup 0.110
Time per iteration 0.237
Total Time 6.504
Time for setup 0.112
Time per iteration 0.235
Total Time 10.454
which seems to exactly recover this regression.
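As a rough cross-check of that claim, the per-iteration times listed above can be averaged and compared. The helpers below are a sketch with made-up names (mean, reduction_percent). They are not part of the benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples; n must be non-zero. */
double mean(const double *v, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t k = 0; k < n; k++)
        sum += v[k];
    return sum / (double)n;
}

/* Percentage by which `after` is lower than `before`. */
double reduction_percent(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

Feeding in the four "Time per iteration" values from each group (0.271, 0.265, 0.265, 0.260 before and 0.238, 0.233, 0.237, 0.235 after) gives a reduction of about 11%, the same order as the reported 10% regression.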
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2009-07-03 11:22 UTC
To: gcc-bugs
------- Comment #21 from rguenth at gcc dot gnu dot org 2009-07-03 11:22 -------
Created an attachment (id=18133)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18133&action=view)
patch
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug target/34163] [4.3/4.4/4.5 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2009-07-03 14:11 UTC
To: gcc-bugs
------- Comment #22 from rguenth at gcc dot gnu dot org 2009-07-03 14:11 -------
Subject: Bug 34163
Author: rguenth
Date: Fri Jul 3 14:11:14 2009
New Revision: 149207
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=149207
Log:
2009-07-03  Richard Guenther  <rguenther@suse.de>

        PR middle-end/34163
        * tree-chrec.c (chrec_convert_1): Fold (T2)(t +- x) to
        (T2)t +- (T2)x if t +- x is known to not overflow and
        the conversion widens the operation.
        * Makefile.in (tree-chrec.o): Add $(FLAGS_H) dependency.

        * gfortran.dg/pr34163.f90: New testcase.
Added:
trunk/gcc/testsuite/gfortran.dg/pr34163.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-chrec.c
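
The identity named in the log entry can be illustrated outside the compiler.
The sketch below is not the code from tree-chrec.c; it only shows, with
int32_t standing in for the narrow type and int64_t for T2 (both choices
are assumptions), why widening each operand first gives the same value as
widening the sum whenever the narrow addition does not overflow. All
function names are hypothetical.

```c
#include <stdint.h>

/* Unfolded form, (T2)(t + x): add in the narrow type, then widen.
   Signed overflow here is undefined, so call it only when the sum fits. */
int64_t widen_after_add(int32_t t, int32_t x)
{
    return (int64_t)(t + x);
}

/* Folded form, (T2)t + (T2)x: widen each operand, then add.
   This cannot overflow for any pair of int32_t inputs. */
int64_t widen_before_add(int32_t t, int32_t x)
{
    return (int64_t)t + (int64_t)x;
}

/* The precondition from the log entry: t + x does not overflow
   in the narrow type.  Returns 1 if the sum fits, 0 otherwise. */
int add_fits_int32(int32_t t, int32_t x)
{
    int64_t wide = (int64_t)t + (int64_t)x;
    return wide >= INT32_MIN && wide <= INT32_MAX;
}
```

When add_fits_int32 holds, both forms return the same value, so the
conversion can be moved onto the operands. When it does not hold, only the
folded form is well defined, which is why the fold is restricted to the
no-overflow case.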
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug middle-end/34163] [4.3/4.4 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2009-07-03 14:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #23 from rguenth at gcc dot gnu dot org 2009-07-03 14:11 -------
Fixed on the trunk.
--
rguenth at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|rguenth at gcc dot gnu dot  |unassigned at gcc dot gnu
                   |org                         |dot org
             Status|ASSIGNED                    |NEW
          Component|target                      |middle-end
      Known to work|                            |4.5.0
            Summary|[4.3/4.4/4.5 Regression] 10%|[4.3/4.4 Regression] 10%
                   |performance regression since|performance regression since
                   |Nov 1 on Polyhedron's "NF"  |Nov 1 on Polyhedron's "NF"
                   |on AMD64                    |on AMD64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug middle-end/34163] [4.3/4.4 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2009-08-04 12:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #24 from rguenth at gcc dot gnu dot org 2009-08-04 12:28 -------
GCC 4.3.4 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.4 |4.3.5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163
* [Bug middle-end/34163] [4.3/4.4 Regression] 10% performance regression since Nov 1 on Polyhedron's "NF" on AMD64
From: rguenth at gcc dot gnu dot org @ 2010-05-22 18:20 UTC (permalink / raw)
To: gcc-bugs
------- Comment #25 from rguenth at gcc dot gnu dot org 2010-05-22 18:11 -------
GCC 4.3.5 is being released, adjusting target milestone.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|4.3.5 |4.3.6
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34163